Did China lie about COVID-19? – Did SARS-CoV-2 accidentally escape from a lab? – Part 4

[Note: A shorter, revised and on a few issues more up to date version of this piece has been published by Quillette as a four-part essay. The version on Quillette is more polished and a bit less polemical, but this one has more technical details on some issues, so which version you should read depends on what you’re looking for.]

But wildlife markets are not the only reason some people accuse China of being responsible for the pandemic. Many are saying that SARS-CoV-2 escaped from a lab or even that it was bioengineered by Chinese researchers and was accidentally released. Some even go as far as saying that China deliberately released it, but I won’t discuss this claim because it’s absurd on its face, so it would be a waste of time. But I want to talk about the theory according to which SARS-CoV-2 accidentally escaped from a lab, because it’s at least not preposterous. Moreover, not only has this version of the lab escape theory gained traction on social media, but it has also been peddled by U.S. officials and reputable news organizations, so it seems important to address it. However, it’s not easy to discuss this theory, because there is a kind of “death by a thousand cuts” strategy going on here. People bring up a lot of things which they claim support a version of the lab escape theory, but upon closer inspection, what they say turns out to be either downright confused or doesn’t really support the lab escape theory. However, when you point that out, they just reply with something like “this may be true, but there are dozens of other suspicious things, so I still think we should take the lab escape theory seriously”. I can’t realistically hope to address every argument that has been put forward in favor of this theory and, even if I could, I think it would be neither useful nor necessary. Instead, I’m going to focus on the arguments that proponents of the lab escape theory most frequently use, but more importantly I’m going to criticize what I take to be the fundamental logical flaw in their reasoning.

The lab escape theory started to spread on Chinese social media in February and was soon picked up by people on Twitter in the West. A video posted on April 1 by Matthew Tye, a US citizen who used to live in China and has a popular YouTube channel, seems to have played a significant role in popularizing this theory. Jim Geraghty of the National Review, whom I’ve already talked about, even wrote a piece about it. In this video, which has more than 2 million views as of this writing, Tye claims to have identified the Wuhan Institute of Virology (WIV) as the source of the outbreak in Wuhan. So I want to start with a discussion of Tye’s specific claims before moving on to other arguments that have been made in favor of the lab escape theory. Tye is clearly very hostile to the CCP and makes no effort to hide it. Unfortunately, this leads him to make a lot of nonsensical or even dishonest claims, so I think it’s regrettable that his video has been so popular. For instance, at the beginning of the video, he claims that the top-down leadership of China stifled and prevented “the release of any relevant information for months”. Of course, this has become a widely accepted narrative, but as I hope what I said previously has shown, reality is far more complicated and this claim is mostly nonsense. This is hardly the only example of nonsensical claims he makes in that video, but again I can’t discuss everything, so I will focus on what he claims to be strong evidence in favor of the lab escape theory.

The first piece of evidence he presents is a job ad posted on the website of the WIV on December 24 to recruit a postdoc researcher. (He’d previously noted that another job ad had been posted on November 18 to recruit someone to work on bat coronaviruses.) After noting this was before any news of the outbreak in Wuhan broke, he claims the ad basically says “we’ve discovered a new and terrible virus and would like to recruit people to come deal with it”. But here is a translation of the job posting in question, from which I’ve removed only the parts that were not relevant to the research topic. It was done with a combination of DeepL and Google Translate and improved with the help of a Chinese speaker:

Proposed recruitment direction 1: Ecological study of bat migration and virus transmission

Proposed [recruitment] direction 2: Research on the cross-species infection and pathogenicity of bat viruses

Introduction to PI

Shi Zhengli, Ph.D., Researcher, Head of Emerging Viruses Discipline Group, Wuhan Institute of Virus Research, Chinese Academy of Sciences, Director of Emerging Infectious Diseases Research Center, Wuhan Institute of Virus Research, Chinese Academy of Sciences, Director of Key Laboratory of Highly Pathogenic Biology and Biosecurity, Chinese Academy of Sciences, Editor-in-Chief of Virologica Sinica. Long-term research on the pathogen biology of important viruses carried by bats has been carried out to confirm the bat origin of major new infectious diseases in humans and animals such as SARS and SADS, and a large number of new viruses in bats and rodents have been discovered and identified. She has published more than 110 papers in SCI journals such as Nature, Science, Nat Rev Microbiol, Cell Host Microbe, Nat Microbiol, PLoS Pathog, and has been listed in the Elsevier “China Highly Cited Scholars” list (Immunology and Microbiology) for five consecutive years since 2014. Awarded “Advanced Worker” by the Chinese Academy of Sciences and “May Day Labor Medal” by the National Academy.

Main research directions of the subject group

The Emerging Viruses Section focuses on the pathogenesis of emerging viruses and their infection mechanisms, including the discovery of viruses in bats and rodents, the study of early warning and transmission patterns, the study of cross-species infection mechanisms and pathogenicity of important viruses transmitted by bats such as coronavirus, and serology and molecular diagnostic techniques of emerging viruses.

Clearly, Tye’s summary of the content of this ad is completely dishonest, since nowhere does it say that “a new and terrible virus” has just been discovered or anything like that.

In the rest of the video, Tye recycles a theory that had been circulating on Chinese social media for a while, as he readily admits. According to this theory, the outbreak in Wuhan started with a researcher at the WIV named Huang Yanling, who was infected in a lab, infected other people and eventually died. (Tye reports a rumor according to which she was cremated in secrecy and the workers at the funeral home were infected.) But there doesn’t seem to be any real evidence to support this story, which as far as I can tell was just made up out of thin air by random people on Chinese social media. It’s true that Huang Yanling used to be a graduate student at the WIV, but except for that (which is not really a reason to think she had anything to do with the outbreak), there is no reason to think there is any truth to this story. Tye claims that her picture and information were removed from the webpage of the lab in which she worked, even though they are still available for every other student who used to be part of that research group, whether or not they later moved on to another institution. However, this is a purely gratuitous assertion, since as far as I know there is no evidence whatsoever that her picture and information were ever on this webpage. (Moreover, by not scrolling down further, Tye’s presentation misleadingly suggests Huang Yanling is the only student who doesn’t have a picture on that webpage, but 2 other students also don’t have a picture, although they have a biography if you click on their name. He also leaves out that another student has a picture but no information.) Anyone who has ever spent time going through lists of graduate students or professors on the websites of research institutions knows that it’s pretty common for them to have just their name without any picture or information. This is not evidence of anything in any interesting sense of the term.

Still, this story gained enough traction on Chinese social media that, on February 16, the WIV made a statement to publicly deny it. According to this statement, Huang Yanling did work at the WIV in the past, but she graduated in 2015 and has lived in another province since then. This is consistent with her ResearchGate profile, according to which she hasn’t published since 2015, something that would be extremely unlikely if she were still working at a research institution. Tye contends that, if Huang Yanling were still alive, the CCP would have arranged a photo-op of some kind a long time ago to quell the rumor. According to him, the fact that it did no such thing is strong evidence that she is dead. I know many people find this argument compelling, but it’s really bad. Not only would a photo-op with Huang Yanling not quell the rumor, but it would probably do the opposite by raising awareness about it. I don’t even think we know what Huang Yanling looks like, because I don’t think we have any picture of her, so if the Chinese authorities did a photo-op with someone they claimed to be her, we’d have no way of knowing whether it was really her. Even if there is a picture of her out there, it wouldn’t be difficult for the Chinese authorities to find someone who looks like her to do a propaganda stunt, something people on the Internet would make sure to point out immediately. Tye may have never heard of the Streisand effect, but I’m pretty sure that the people in charge of propaganda at the CCP have.

In his piece about the lab escape theory, Geraghty tries to make the WIV’s denial look suspicious by telling this anecdote:

On February 17, Zhen Shuji, a Hong Kong correspondent from the French public-radio service Radio France Internationale, reported: “when a reporter from the Beijing News of the Mainland asked the institute for rumors about patient zero, the institute first denied that there was a researcher Huang Yanling, but after learning that the name of the person on the Internet did exist, acknowledged that the person had worked at the firm but has now left the office and is unaccounted for.”

Geraghty refrains from spelling this out, because the suggestion is pretty transparent. If Huang Yanling had really nothing to do with the outbreak, then how come the WIV initially denied that she had ever worked over there, before admitting it after people pointed out she was listed on the institute’s website? This argument is often used by people who think that Huang Yanling was patient zero.

The problem is that, once again, what Geraghty says is extremely misleading. Here is a translation of what the Radio France Internationale article says a few paragraphs after the passage he quoted:

Beijing News tried to verify the information with researcher Shi Zhengli, who specializes in bat coronavirus research, and Chen Quanjiao, who works on influenza; neither could confirm if there was a staff member named Huang Yanling in the institute. They further commented there were more than 1000 staff members in the institute, not a single one was infected during the pandemic. Netizens subsequently pointed out Huang’s name was on the official page of the institute, but the content related to her had been deleted.

So in the passage Geraghty quoted, Radio France Internationale is actually referring to Shi Zhengli and Chen Quanjiao’s claim they didn’t know whether there was a Huang Yanling at the WIV, but it’s written in a way that makes it sound like the WIV first denied she had ever worked there and then walked it back after it was caught in a lie. This interpretation is confirmed by the Beijing News article Radio France Internationale is talking about. A more suspicious man than me would suspect that Geraghty’s decision to quote the first passage but not the second was not entirely innocent.

So there is no real evidence that Huang Yanling had anything to do with the outbreak in Wuhan and, in fact, we have every reason to believe she stopped working at the WIV several years ago. But there are also many reasons to think Tye’s story is totally implausible. First, it’s worth keeping in mind that, according to the South China Morning Post (which claims to have seen confidential government data), the first person believed to have been infected by SARS-CoV-2 is a 55-year-old man. Without more details, it’s hard to know what to make of this report, but it’s another piece of evidence that makes Tye’s story unlikely. Moreover, if Huang Yanling was still a graduate student in 2015, she would have been in her late twenties or early thirties in 2019. But a woman of that age is overwhelmingly unlikely to die even if she is infected by SARS-CoV-2. Indeed, according to a recent French study, the infection fatality rate for a woman between 30 and 39 is only 0.01%. Thus, just on that basis, your prior that Tye’s story is false should be extremely high and therefore only very strong evidence should make you revise your credence, but as I just noted there is no evidence to speak of in favor of the claim that Huang Yanling was patient zero and died. Moreover, as I explained in the first part of this essay, everything indicates that nobody, including the local health authorities, had noticed the cluster of pneumonia in Wuhan until late December. However, if Huang Yanling were patient zero, she would presumably have been infected sometime in November. So if she had subsequently died, she would almost certainly have already been dead or at least been hospitalized in a very serious condition by late December. It’s very unlikely that, if a woman in her late twenties or early thirties who worked with viruses in a research lab had been hospitalized with pneumonia and soon after that cases of pneumonia of unknown etiology had started to show up in hospitals, people wouldn’t have connected the dots.
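To make the point about priors concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: it uses the 0.01% infection fatality rate cited above as a stand-in for the prior probability of the “she was infected and died” part of the story (a simplification, since the full story requires much more than that to be true), and the likelihood ratios are hypothetical values chosen to show how strong the evidence would have to be.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# 0.01% infection fatality rate for women aged 30-39 (the French study cited in the text),
# used here as an ASSUMED stand-in for the prior.
IFR_WOMAN_30_39 = 0.0001

# HYPOTHETICAL likelihood ratios: how many times more likely the evidence is
# if the story is true than if it is false.
for likelihood_ratio in (1, 10, 100, 1_000, 10_000):
    posterior = posterior_probability(IFR_WOMAN_30_39, likelihood_ratio)
    print(f"likelihood ratio {likelihood_ratio:>6}: posterior = {posterior:.4%}")
```

Under these assumptions, the evidence would have to be roughly ten thousand times more likely if the story were true than if it were false before the story became about as likely as not, whereas an unsourced rumor has a likelihood ratio close to 1 and leaves the prior essentially unchanged.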

The theory that Huang Yanling was patient zero is entirely without merit. It’s just a totally unsubstantiated story that some random people on the Internet conjured up, but unlike most such theories, many otherwise intelligent people take it seriously. Interestingly, similarly unsubstantiated theories circulate on Facebook all the time in the West, but nobody intelligent pays attention to them, except to refute them. Yet, in doing research for this piece, I have found that again and again even intelligent people take seriously completely unsubstantiated and wildly implausible theories originating from Chinese social media. I guess it’s because they don’t trust the Chinese government, which is fair, but it doesn’t mean that you should take seriously everything that random people on Chinese social media are saying. Most of what people say on Chinese social media is nonsense, just like most of what people say on social media in the West is nonsense. The fact that China is not a liberal democracy doesn’t make random people on the Internet more epistemically responsible than they are in the West. Of course, if you asked people who share Tye’s video and similar nonsense about the role of China in the COVID-19 pandemic whether they believe that, they would deny it, but they still behave as if they did or they wouldn’t share that kind of garbage. It’s very important that we maintain the same epistemic standards we ordinarily use when we talk about regimes we don’t like, because history has shown that when we don’t, disaster often ensues.

But not everybody who believes that SARS-CoV-2 escaped from a lab thinks that Huang Yanling was patient zero. There are many people who think it’s likely that SARS-CoV-2 originated from a lab in Wuhan, yet don’t claim to know which researcher was infected first, so I want to discuss the lab escape theory more broadly. We should distinguish between at least 2 versions of this theory, although this distinction is not as important as most proponents of the lab escape theory believe. According to the first version, SARS-CoV-2 emerged in the wild and infected animals on which samples were taken by researchers in Wuhan, which somehow led to someone being infected and ultimately to the outbreak. On the second version, SARS-CoV-2 was artificially created in a lab, accidentally infected someone and started the outbreak. Of course, we could further distinguish between versions of the lab escape theory depending on the exact scenario that is posited, but I don’t think it would really be useful. The lab escape theory has not just been promoted by random people on the Internet. Although he now appears to be slowly walking back from the lab escape theory, perhaps because even the U.S. intelligence community doesn’t believe it, Pompeo had previously claimed to have evidence the virus originated from a lab in Wuhan and was even ambiguous on whether he thought it had been bioengineered. Moreover, as we have seen, polls show that many people believe SARS-CoV-2 came from a lab. So I think it’s important to discuss whether, setting aside Tye’s ridiculous claims, there is any evidence to support a version of the lab escape theory.

People who defend the lab escape theory often cite 2 sources. The first is a document published under the title “Evidence SARS-CoV-2 Emerged From a Biological Laboratory in Wuhan, China” on a website called “Project Evidence” by a group of anonymous people who describe themselves as researchers. They claim to present evidence that SARS-CoV-2 accidentally escaped from a lab and, while they don’t conclude that it’s definitely what happened, this is clearly the impression they are trying to convey. While they reject the theory that SARS-CoV-2 was artificially created as a bioweapon and intentionally released, they don’t rule out that it was bioengineered and accidentally released, although some of the claims they make are more naturally interpreted as supporting the theory that it evolved naturally and escaped from a lab after some animals that carried the virus accidentally infected someone. The other source that proponents of the lab escape theory often cite is a blog post called “Lab-Made? SARS-CoV-2 Genealogy Through the Lens of Gain-of-Function Research”. It was written by Yuri Deigin, a biotech entrepreneur. He also doesn’t claim that SARS-CoV-2 definitely originated from a lab and, in the conclusion, even says that a natural origin is still more likely. However, throughout his post, he keeps pointing out what he calls “strange coincidences”, which he clearly thinks are evidence the virus was in fact created in a lab and accidentally escaped. Thus, although he claims a natural origin is still more likely, he at least thinks the probability that SARS-CoV-2 originated in a lab is pretty high, and many people who read his post came away with the conclusion that it did.

I think both documents, especially the first, are very confused, but they make so many points that I’m not going to try and debunk every mistake they contain, because I don’t think it would be useful. (The first document also favorably presents the theory that Huang Yanling was patient zero, which I already discussed.) This is just another example of the “death by a thousand cuts” strategy I already mentioned. They bury the reader under a huge mass of information which they claim supports the lab escape theory. In fact, none of the points they make really supports that theory, but most readers won’t realize that because they lack the knowledge they would need to understand why and/or don’t have the time to fact-check the claims in question. Even if they realize that some of the points made in those documents don’t really support the lab escape theory, most of them will still come away with the impression that it’s likely SARS-CoV-2 came from a lab, because there will still be many other claims that sound plausible to them. To be clear, I’m not claiming that Yuri Deigin and the people behind “Project Evidence” are disingenuous. On the contrary, I have no doubt they sincerely believe what they say, but they are clearly very confused about a number of things, so what they write is still very misleading. Thus, instead of trying to address every single point they make, I’m going to try to identify their central arguments, which I will then discuss at the highest level of generality possible so as to avoid getting bogged down in details that do not really matter. Before I do that, however, I have to explain a few things about the biology of SARS-CoV-2. This is going to be a little technical, but don’t worry, I will keep the details to a minimum and it won’t be long.

SARS-CoV-2 is able to infect human cells because it has something called the spike protein on its surface that allows it to bind a receptor called ACE2 found on certain human cells. Different coronaviruses have a different spike protein that can bind to different types of receptors and therefore they can infect different types of cells in different species. In particular, their spike proteins differ in something called the receptor-binding domain (RBD), which plays a key role in the binding process. As we have seen, the closest relative of SARS-CoV-2 that we currently know about is RaTG13-CoV, a bat coronavirus. Even though SARS-CoV-2 is genetically very close to RaTG13-CoV overall, they are very different in the part of the genome that codes for the RBD of the spike protein. Among other things, it’s because the RBD of SARS-CoV-2’s spike protein differs from that of RaTG13-CoV that it can infect humans pretty easily, whereas RaTG13-CoV probably can’t or not very efficiently. As it happens, even though SARS-CoV-2 is closer to RaTG13-CoV in the rest of the genome, in the part that codes for the RBD, it’s much closer to coronaviruses found in pangolins. This is why many people think that, although it ultimately originated in bats, the ancestor of SARS-CoV-2 went through pangolins, where it gained the ability to infect humans by recombination with a pangolin coronavirus. In other words, a pangolin that was already infected with a pangolin coronavirus somehow came in contact with a bat and was also infected by a bat coronavirus, then during replication the part of the genome of the pangolin coronavirus that codes for the RBD ended up in the genome of the bat coronavirus, which resulted in the emergence of a virus that could infect humans even though neither of the original viruses could or not very efficiently. We know this is possible since pangolins and bats have recently been observed living in the same burrows in Gabon and the same thing could happen in China.

Of course, this is just one possibility, but the actual story could be even more complicated. Perhaps the virus went from a bat to a pangolin, then back to a bat or some other animal, before infecting humans. It’s also possible that, unbeknownst to us, there are bat coronaviruses that are very similar to SARS-CoV-2 in the RBD, not because they acquired this part of their genome through recombination with a pangolin coronavirus, but because they independently evolved an RBD similar to that of the pangolin coronaviruses that have been found to be similar to SARS-CoV-2 in the RBD. In fact, it’s possible that pangolin coronaviruses acquired this RBD from such a bat coronavirus, not the other way around. In this scenario, there was still a recombination event, but pangolin coronaviruses acquired their RBD from the ancestor of SARS-CoV-2, which therefore is not really a chimera but only appears to be one because we haven’t yet found a bat coronavirus that has this kind of RBD. SARS-CoV-2 could have emerged by recombination between one of those bat coronaviruses and a relative of RaTG13-CoV without the involvement of any other animal. The truth is that we don’t know. However, since overall SARS-CoV-2 is genetically closer to RaTG13-CoV than to any other virus we know about, but is much closer to pangolin coronaviruses in the RBD, our best theory for the moment is that it was the result of a recombination event during which a relative of RaTG13-CoV acquired the RBD of a relative of those pangolin coronaviruses.

Such recombinations naturally occur all the time, but one could also have occurred artificially. Scientists often create such chimeras because it allows them to test experimentally the role played by different parts of a virus. For instance, if you want to know whether the RBD of one virus is what makes it more infectious to humans than another virus, you could take the RBD of the latter and replace that of the former with it to check if the resulting chimera can more easily infect human cells in culture. More generally, if you want to bioengineer a virus that can perform a certain task, you can either take pieces from different viruses and assemble them or start with a virus that already exists and simulate natural selection until you get something that can perform this task. Of course, those strategies are not mutually exclusive, and you can use a combination of both. It’s certainly possible that SARS-CoV-2 was created in that way. For instance, scientists could have replaced the RBD of a bat coronavirus related to RaTG13-CoV with that of a pangolin coronavirus related to those which have been recently discovered and were found to be similar to SARS-CoV-2 in the RBD, which might have given rise to SARS-CoV-2 either directly or after enough rounds of simulated evolution. Again, this is certainly possible, but lots of things are possible and most of them never happened. If you want to blame China for the pandemic on the ground that SARS-CoV-2 was artificially created in a lab from which it accidentally escaped, you have to show that it’s likely, not just that it’s possible. The same thing is true if you prefer the version of the lab escape theory on which SARS-CoV-2 evolved naturally, but accidentally escaped from a lab where Chinese scientists were studying it.

Both Deigin and the people behind “Project Evidence” point out that researchers at the WIV have been involved in the kind of research I just described with bat coronaviruses, which is true, but I don’t think it’s evidence the virus was bioengineered in a lab except in such a weak sense that it’s totally uninteresting. Sure, it raises the probability that SARS-CoV-2 was bioengineered at the WIV somewhat, but the fact that I’m a man also raises the probability that I’m a serial killer (since they are more often male), and surely it doesn’t raise it enough to warrant suspecting me of being a serial killer. One reason Deigin and the people behind “Project Evidence” think it’s evidence in a more robust sense that SARS-CoV-2 accidentally escaped from a lab is that RaTG13-CoV, its closest known relative, was discovered in samples taken from bats found in Yunnan, more than 1,000 kilometers away from Wuhan. But this fact is not a good reason to think that SARS-CoV-2 accidentally escaped from a lab in Wuhan, regardless of whether you think it was bioengineered or evolved naturally but was studied by scientists when someone was accidentally infected. That’s because bat coronaviruses are massively undersampled. We only know about a handful of them, but we know there are thousands of them we haven’t discovered, so there could be and probably are bats in Hubei or somewhere else closer to Wuhan than Yunnan that carry a much closer relative of SARS-CoV-2. It’s just that scientists have not stumbled upon them, which is entirely unsurprising, since the probability they would chance upon those particular coronaviruses is infinitesimal. (By the way, the same point applies to the furin cleavage site in the spike protein, which as Deigin points out no other coronavirus in the same family as SARS-CoV-2 we know about has. Sure, but we only know about a tiny fraction of the coronaviruses in that family, so this is not a reason to conclude those 4 amino acids were artificially inserted by scientists.) There are plenty of bats in Hubei and, as this study shows, they also carry coronaviruses.

In connection with this point, it’s also important to keep in mind that, although RaTG13-CoV is the closest relative of SARS-CoV-2 we know about, it’s still pretty distantly related to it since it’s only ~96% similar genetically (on the order of a thousand nucleotide differences across a genome of about 30,000 bases). Their most recent common ancestor probably goes back decades and there is no reason to think much closer relatives don’t circulate in Hubei. If scientists in Wuhan had accidentally been infected by RaTG13-CoV, assuming it even can infect humans, the result would definitely not have been SARS-CoV-2 but something much closer genetically to RaTG13-CoV. Similarly, if scientists at the WIV had artificially created a virus by replacing the RBD of RaTG13-CoV with that of a pangolin coronavirus, the result would have been very different from SARS-CoV-2 unless they had also used in vitro passages to simulate natural selection and create a very different virus. However, had they used RaTG13-CoV as a backbone and simulated natural selection to create SARS-CoV-2 in that way, there would be signs in the genome of the virus that it had recently been under strong selection (a large proportion of the nucleotide changes would have resulted in amino acid changes), but this is not the case. So if scientists artificially created SARS-CoV-2 by replacing the RBD of a bat coronavirus with that of a pangolin coronavirus, they almost certainly didn’t use RaTG13-CoV but another bat coronavirus nobody except them knows about. I guess it’s possible, but there is no reason to think it ever happened.
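For readers who want to see what the parenthetical about selection means in practice, here is a minimal Python sketch. It compares two aligned coding sequences codon by codon and counts how many differences leave the amino acid unchanged (synonymous) and how many change it (nonsynonymous). The two sequences are short, invented toy examples, not real viral data, and the count is deliberately crude: a proper analysis would normalize by the number of possible synonymous and nonsynonymous sites, which this sketch doesn’t attempt.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    synonymous = nonsynonymous = 0
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3]
        codon_b = seq_b[start:start + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # nucleotide change, same amino acid
        else:
            nonsynonymous += 1  # nucleotide change alters the protein
    return synonymous, nonsynonymous


# HYPOTHETICAL toy sequences (5 codons each), invented for illustration.
ANCESTOR = "ATGGCTAAAGGTCTG"
DERIVED = "ATGGCCAGAGGACTT"

syn, nonsyn = classify_codon_changes(ANCESTOR, DERIVED)
print(f"synonymous: {syn}, nonsynonymous: {nonsyn}")
```

In this toy pair, most of the differences are synonymous, which is the pattern you expect when changes accumulate without strong pressure to alter the protein. Strong recent selection would tend to push the balance toward nonsynonymous changes instead.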

If researchers at the WIV had been doing that kind of experiment, it’s surprising that no scientists elsewhere had heard about it through the grapevine, but so far no one has suggested they had heard that from their colleagues in Wuhan. Nor is there any evidence that anyone has discovered a bat coronavirus more closely related to SARS-CoV-2 than RaTG13-CoV. Thus, for the moment, it’s a purely ad hoc hypothesis. Only someone who is already committed to the lab escape theory would have a reason to make this hypothesis, because otherwise this version of the theory is untenable, but that’s not a reason to think the lab escape theory is true. Moreover, on this hypothesis, the researchers who created SARS-CoV-2 still used the RBD of pangolin coronaviruses that were found to be almost identical to SARS-CoV-2 in that region of the genome. But there was no reason to think this RBD should have been good at binding human ACE2 and, on the contrary, structural analyses would apparently have predicted it wasn’t optimal at doing so. So if they had wanted to create a chimera that can efficiently infect human cells by replacing the RBD of a bat coronavirus with that of another virus, scientists probably wouldn’t have used the RBD of those pangolin coronaviruses, but one they had reason to think would make the resulting chimera good at binding human ACE2. Of course, this is not dispositive, since they might still have wanted to do that for some other reason, but it makes a hypothesis that is already purely ad hoc unlikely to begin with.

Moreover, as far as we know, nobody at the WIV knew about those pangolin coronaviruses until February of this year, when a team of researchers at other institutions published a paper about them after realizing they were very similar to SARS-CoV-2 in the RBD and uploaded the genomes to public databases. Similarly, not only is there no evidence that anyone has discovered a bat coronavirus more closely related to SARS-CoV-2 than RaTG13-CoV, which could have been used as a backbone to create SARS-CoV-2, but as far as we know even RaTG13-CoV wasn’t fully sequenced by researchers at the WIV until January. According to Shi Zhengli and her co-authors, in the paper about SARS-CoV-2 they published in Nature at the beginning of February, they had previously detected the presence of RaTG13-CoV, along with various other coronaviruses, in samples taken from bats, but had only sequenced a short region of the genome coding for a protein called the RdRp. (As this paper explains, this region is highly conserved among coronaviruses, making it a good target to detect coronaviruses in samples by RT-PCR.) After SARS-CoV-2 emerged and was sequenced, they looked at the data they already had and noticed an RdRp sequence they had previously sequenced was very similar to that of SARS-CoV-2, so they went back to the sample and decided to sequence the full genome of the virus this RdRp belonged to, which is how they obtained the genome of RaTG13-CoV. Of course, this is what people at the WIV are saying and they could be lying, but their story is totally plausible and there is no evidence it’s false. They had no reason to sequence the whole genome, which is more expensive and time-consuming, until the homology with SARS-CoV-2 in the RdRp sequence was noticed.

Nevertheless, many people accuse Shi Zhengli and her team of having withheld the genome of RaTG13-CoV until earlier this year even though they had already sequenced it long before that, but there is absolutely no reason to think so. Deigin also thinks they probably had already sequenced RaTG13-CoV before the outbreak of SARS-CoV-2, but his argument is confused. He points out that RaTG13-CoV is probably the same as BtCoV/4991, another bat coronavirus whose discovery was reported by Shi Zhengli and her team in a 2016 paper and whose RdRp sequence was posted on NCBI at the same time. I think he is probably right about that since, among other things, the RdRp sequence of BtCoV/4991 is identical to that of RaTG13-CoV. But this doesn’t mean they had already sequenced the whole genome of RaTG13-CoV by 2016. In fact, if they had, they would presumably have published it. If you look at the papers that cite the article on the detection method consisting in targeting the RdRp region I mentioned above, you will see that most of them didn’t do whole genome sequencing but only sequenced the RdRp (including those published by researchers outside China), so there is nothing even remotely unusual about this. (In another paper they published in 2015, Shi Zhengli and her team obtained 64 RdRp sequences, but only selected 11 of the samples for whole genome sequencing.) Deigin thinks they probably had sequenced the whole genome back then because, in another paper they published in 2019, BtCoV/4991 is included in a phylogenetic tree and he doesn’t think “RaBtCoV/4991’s place in that tree was determined based solely on” a short RdRp sequence. However, not only is this exactly what the paper says they did, but if you look at papers that use the detection method based on the RdRp, you will see that people do phylogenetic trees of coronaviruses using only RdRp sequences all the time, so this argument is totally unconvincing.

Thus, we have every reason to believe that, as Shi Zhengli and her team say, the full genome of RaTG13-CoV was not sequenced until after the outbreak of SARS-CoV-2. Besides, even if RaTG13-CoV had already been sequenced long before that, this wouldn’t really give us a reason to think SARS-CoV-2 was artificially created by replacing the RBD of a bat coronavirus with that of a pangolin coronavirus, because as I already explained, even if SARS-CoV-2 was created in that way, RaTG13-CoV was almost certainly not used as a backbone. So in order to believe that SARS-CoV-2 was bioengineered, you have to assume that researchers at the WIV had previously discovered a bat coronavirus closely related to SARS-CoV-2, though almost certainly not RaTG13-CoV, but also pangolin coronaviruses almost identical in the RBD to those discovered by researchers at other institutions (who didn’t report this discovery until February of this year), then replaced the RBD of the former with that of the latter to create a chimera (not just a pseudovirus that can’t replicate after infecting a cell, as most of the studies mentioned by Deigin and the people behind “Project Evidence” did, but a fully infectious virus), without any researcher anywhere else in the world having ever heard anything about any of that. In other words, you have to make all sorts of purely ad hoc hypotheses, most of which are rather unlikely. Sure, it’s possible, but why on earth would you make all those hypotheses unless you already had decided that SARS-CoV-2 was artificially created at the WIV? Again, a lot of things are possible, but we don’t usually believe them without a good reason and we shouldn’t here either.

The same thing can be said about the version of the lab escape theory according to which SARS-CoV-2 evolved naturally, but ended up in a lab where it was studied by scientists and from which it accidentally escaped. Of course, we can’t rule this possibility out (as the people behind “Project Evidence” correctly note, viruses have escaped from labs before), but we also have no reason to think it actually happened. We’d have to assume that scientists at the WIV had discovered SARS-CoV-2 before the outbreak in Wuhan or perhaps another, closely related virus that somehow infected someone at the lab or in the vicinity, circulated in humans or possibly another species for a while and gave rise to SARS-CoV-2 by accumulating enough mutations that made it well-adapted for humans. Shi Zhengli says that, after she heard that a new coronavirus had appeared in Wuhan, she and her team went through their records but didn’t find anything that matched the sequences found in samples taken from patients suffering from pneumonia caused by the new virus. Obviously, she could be lying, but there is no evidence whatsoever that she is. As Dennis Carroll, the former director of USAID’s emerging threats division, told Vox, scientists like to talk and it would be surprising if people at the WIV had discovered a virus as unusual as SARS-CoV-2 without anyone else hearing anything about it. Again, this is hardly dispositive, but it makes a theory that is already not supported by any serious piece of evidence unlikely to begin with.

Of course, proponents of this version of the lab escape theory bring up various facts, alleged or real, which they think are evidence in favor of it, but I don’t think any of them are evidence in any interesting sense. Again, I can’t discuss all of them, but I want to briefly discuss this column published by Josh Rogin in the Washington Post, because it’s often cited by proponents of the lab escape theory and because it’s been published in a prestigious newspaper. (Shortly before that, David Ignatius, Rogin’s colleague and another usual conduit for U.S. intelligence officials looking to leak a story to the press, had already published a piece suggesting the virus might have escaped from a lab by accident. At this point, given how many stories have been leaked to the press by U.S. intelligence officials about China’s alleged responsibility for the pandemic, it should be clear that some people in the U.S. intelligence community are engaged in a campaign to make China look bad.) Rogin was apparently given at least partial access to State Department cables sent by U.S. officials in 2018, after a visit to the WIV (where the first laboratory to achieve the highest level of international bioresearch safety, BSL-4, in China had recently opened), in which they noted a “shortage of appropriately trained technicians and investigators needed to safely operate this high-containment laboratory”. Proponents of the lab escape theory argue that it’s evidence in favor of that theory because it raises the probability that SARS-CoV-2, whether it was artificially created or evolved naturally and was studied at the WIV, might have accidentally escaped from this laboratory.

But I don’t think Rogin’s column should increase the credence of a rational person that SARS-CoV-2 escaped from the WIV in any meaningful way. First, Rogin is spinning the passage about the lack of properly trained personnel in the BSL-4 laboratory as a warning that it could result in some kind of accident, but without seeing the rest of the cable and therefore knowing the context, it’s hard to determine whether the U.S. officials who wrote it were actually worried that something like that might happen. He says that “sources familiar with the cables said they were meant to sound an alarm about the grave safety concerns at the WIV lab, especially regarding its work with bat coronaviruses”, but this is not clear from any of the passages he quotes, so this could just be post-hoc spin on them. If Rogin has the whole cables, he could just publish them, so we can determine whether the way in which he and his sources spun them was not misleading, but he hasn’t done so. On the other hand, if he didn’t even see the cables in their entirety, he should never have published a column about them or, at the very least, he should have explicitly warned against the possibility that he was being manipulated by his sources to push a specific narrative that serves their agenda, whatever this may be. Unfortunately, most journalists who receive leaks from intelligence officials tend not to worry about that, which is probably why they are chosen to receive them in the first place. This is hardly a purely theoretical concern: there have been many examples of precisely that kind of manipulation, especially since 2016. In fact, as we have seen previously, U.S. intelligence officials have made leaks that were later shown to be misleading about the outbreak in Wuhan, so we don’t even have to go back very far to find reasons to question the motives of Rogin’s sources and the accuracy of their information.

The same thing can be said about the only other part of this cable that Rogin actually quotes:

“Most importantly,” the cable states, “the researchers also showed that various SARS-like coronaviruses can interact with ACE2, the human receptor identified for SARS-coronavirus. This finding strongly suggests that SARS-like coronaviruses from bats can be transmitted to humans to cause SARS-like diseases. From a public health perspective, this makes the continued surveillance of SARS-like coronaviruses in bats and study of the animal-human interface critical to future emerging coronavirus outbreak prediction and prevention.”

Rogin suggests this passage was meant as a warning that Shi Zhengli’s work on bat coronaviruses in particular represented a threat. However, without more context, this passage is more naturally interpreted as saying that the work in question was important to prevent the emergence of another coronavirus that can transmit to humans. If the context of this passage in the cable makes Rogin’s interpretation more likely, he just has to publish it, but I wouldn’t be holding my breath if I were you.

In his column, he also mentions the controversy that erupted in 2015 about the kind of research Shi Zhengli’s team has been doing, when a study involving the kind of experiments I described above was published in Nature and some researchers questioned whether creating such chimeras was really worth the risk. The people behind “Project Evidence” also cite this paper as evidence that people at the WIV were experimenting on bat coronaviruses. The problem is that, although people at the WIV were involved in that paper (in particular they provided the spike sequence used to create the chimeric virus), the controversial experiment was performed at the University of North Carolina, not the WIV. Thus, it’s misleading to bring up this study to claim that “other scientists questioned whether Shi’s team was taking unnecessary risks”, when in fact the WIV was not at the center of this controversy but US-based researchers were. In any case, research on SARS-like coronaviruses at the WIV might not even have been conducted at the BSL-4 lab, since this level of biosafety is not required for this kind of work, so even if we knew for a fact there were serious safety issues with the BSL-4 at the WIV, which we don’t, it would be largely irrelevant. The fact that there was not enough properly trained personnel at the WIV to safely operate a BSL-4 lab, even if that were a fact, would not show that research that doesn’t require that safety level could not be conducted safely over there. So even if we set aside the problems I discussed above, related to Rogin’s reliance on possibly misleading leaks by intelligence officials, I don’t think this story is evidence in favor of the lab escape theory in any interesting sense.

[Note: Since I wrote this part, the cables in question were published and, as I anticipated, the interpretation pushed by Rogin in his op-ed was extremely misleading. See part 3 of the version of this essay I published on Quillette for a discussion of what the cables actually show.]

In short, no matter which specific version of the lab theory you prefer, it requires that you make several hypotheses that seem purely ad hoc. In particular, you have to assume that, unbeknownst to anyone else, people at the WIV had discovered and experimented on a coronavirus more closely related to SARS-CoV-2 than RaTG13-CoV, perhaps SARS-CoV-2 itself if you think it evolved naturally. But there doesn’t seem to be any evidence that anything of the sort ever happened, so why on earth would you assume that it did? Sophisticated defenders of the lab escape theory admit there is no direct evidence that it did, but they think the fact that the outbreak started in Wuhan, precisely where one of the few labs where people study bat coronaviruses happens to be, constitutes indirect evidence. A semi-formal version of this argument was recently given by Michael Abramowicz in Reason. He uses Bayes’s theorem to compute P(\textrm{WCDCP}|\textrm{HSM}), i. e. the probability that the virus accidentally escaped from the Wuhan Center for Disease Control & Prevention (WCDCP) given that the first cases were detected in the neighborhood where Huanan Seafood Market (HSM) is located, a few hundred meters from the WCDCP. (Note that, because he thinks the first cases were detected in Huanan Seafood Market, he assumes that, if the virus escaped from a lab, it must have been from the WCDCP and not the WIV, since the former but not the latter is located near Huanan Seafood Market.) Making what he takes to be conservative assumptions, he estimates this probability to be 78.1%, whence he concludes that SARS-CoV-2 was probably released from the WCDCP by accident and that it’s how the outbreak started.

This sounds very convincing, but as I think is often the case with formal arguments, the appearance of precision and rigor is largely illusory. First, Abramowicz thinks the first cases were detected at Huanan Seafood Market, but as we have seen in part 3 this is not the case. We don’t know anything about where the first person known to have been infected with SARS-CoV-2 lived, except that it was somewhere in Wuhan. Thus, we have no reason to assume that, if the virus escaped from a lab, it was from the WCDCP and the probability we should be trying to estimate is P(\textrm{Lab}|\textrm{Wuhan}), i. e. the probability that SARS-CoV-2 escaped from a lab given that it was first detected in Wuhan. But more fundamentally, even if the first known case had really been someone who lived in the neighborhood where Huanan Seafood Market is located, it’s unclear what is the relevant thing to condition on when engaging in that exercise. In other words, why calculate P(\textrm{Lab}|\textrm{HSM}), i. e. the probability that SARS-CoV-2 escaped from a lab given that the first cases were detected in Huanan Seafood Market, rather than P(\textrm{Lab}|\textrm{Wuhan}) or indeed P(\textrm{Lab}|\textrm{HFC}), i. e. the probability that SARS-CoV-2 escaped from a lab given that the first person known to have been infected lived in the particular house where he happens to live? After all, even if the first known case lived in the neighborhood where Huanan Seafood Market is located, it would also be true that he lived in Wuhan and in this particular house. In fact, there are countless different conditional probabilities we could try to calculate, but they could lead to very different conclusions on whether SARS-CoV-2 accidentally escaped from a lab because they would in general not be the same and it’s not clear what principled reason there could be to prefer one to the others. 
This is known as the reference class problem and, while you may think it’s easy to solve, philosophers have been trying for decades and there is still no widely accepted solution.

Thus, people who think we can use Bayes’s theorem to show that SARS-CoV-2 probably escaped from a lab are fooling themselves, but let’s ignore this problem for the moment and calculate P(\textrm{Lab}|\textrm{Wuhan}), since in fact the first known case had no connection to Huanan Seafood Market and phylogenetic evidence suggests the outbreak didn’t originate from there. Actually, I think it’s useful to look at the question in a different way and ask whether P(\textrm{Lab}|\textrm{Wuhan}) > P(\textrm{!Lab}|\textrm{Wuhan}), i. e. whether SARS-CoV-2 is more likely to have escaped from a lab than to have infected people after a non-laboratory zoonosis given that the first people known to have been infected lived in Wuhan. It’s trivial to show with Bayes’s theorem plus a few algebraic manipulations that P(\textrm{Lab}|\textrm{Wuhan}) > P(\textrm{!Lab}|\textrm{Wuhan}) if and only if \frac{P(Wuhan|!Lab)}{P(Wuhan|Lab)} < \frac{P(Lab)}{P(!Lab)}. This inequality makes it easy to understand what it would take for the lab escape theory to be more likely than a natural origin of the outbreak. Nobody doubts that the probability that the outbreak started in Wuhan if we assume that SARS-CoV-2 did not escape from a lab is smaller than the probability that the outbreak started in Wuhan if we assume that it did escape from a lab. That’s because if the virus didn’t escape from a lab, the outbreak could have started from a lot of different places, but once you assume that it did escape from a lab there are only a handful of labs it could have come from and some of them are in Wuhan. So it should be uncontroversial that \frac{P(Wuhan|!Lab)}{P(Wuhan|Lab)} is less than 1. Similarly, nobody doubts that, if there is an outbreak of SARS-CoV-2 somewhere, it’s far more likely not to have started in a lab than to have originated from a lab. 
That’s because very few people work in labs where they come in contact with this kind of virus and they use very strict safety protocols whereas, as we shall see shortly, countless people come in contact with coronaviruses in non-safe environments. So it should also be uncontroversial that \frac{P(Lab)}{P(!Lab)} is less than 1. The question of whether the outbreak started in a lab hinges on which ratio is smaller than the other.
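For readers who want the “few algebraic manipulations” behind that equivalence spelled out, here is a short derivation. It assumes only that P(\textrm{Wuhan}), P(\textrm{Wuhan}|\textrm{Lab}) and P(\textrm{!Lab}) are all strictly positive, so that dividing by them doesn’t change the direction of the inequality.

```latex
\begin{align*}
P(\textrm{Lab}|\textrm{Wuhan}) > P(\textrm{!Lab}|\textrm{Wuhan})
&\iff \frac{P(\textrm{Wuhan}|\textrm{Lab})\,P(\textrm{Lab})}{P(\textrm{Wuhan})}
      > \frac{P(\textrm{Wuhan}|\textrm{!Lab})\,P(\textrm{!Lab})}{P(\textrm{Wuhan})}
      && \text{(Bayes's theorem on both sides)} \\
&\iff P(\textrm{Wuhan}|\textrm{Lab})\,P(\textrm{Lab})
      > P(\textrm{Wuhan}|\textrm{!Lab})\,P(\textrm{!Lab})
      && \text{(multiply by } P(\textrm{Wuhan}) > 0\text{)} \\
&\iff \frac{P(\textrm{Wuhan}|\textrm{!Lab})}{P(\textrm{Wuhan}|\textrm{Lab})}
      < \frac{P(\textrm{Lab})}{P(\textrm{!Lab})}
      && \text{(divide by } P(\textrm{Wuhan}|\textrm{Lab})\,P(\textrm{!Lab}) > 0\text{)}
\end{align*}
```

The left-hand ratio measures how much less expected Wuhan is as a starting point without a lab escape than with one, while the right-hand ratio is the prior odds of a lab escape.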

In his calculation, Abramowicz assumes that P(Lab) = 0.0005, so I will do the same. It means that \frac{P(Lab)}{P(!Lab)} is slightly greater than 0.0005, which is indeed much smaller than 1. Since he incorrectly thought that the first known cases of infection by SARS-CoV-2 were associated with Huanan Seafood Market, he didn’t make any assumptions about P(Wuhan|!Lab) and P(Wuhan|Lab), but about P(HSM|!WCDCP) and P(HSM|WCDCP). In the case of P(HSM|WCDCP), he assumed that it was equal to 0.5, i. e. that conditional on the pandemic originating from the WCDCP there was a 50% chance that the first cases would be detected in the neighborhood where Huanan Seafood Market is located. I’m going to assume that P(Wuhan|Lab) = 0.99, i. e. that conditional on the pandemic originating from a lab in Wuhan there was a 99% chance the first cases would be detected in that city. In the case of P(HSM|!WCDCP), he assumed that the probability of a pandemic starting in any location was proportional to the share of world population living in that area, so he divided the population of the neighborhood where Huanan Seafood Market is located by the world population and multiplied the result by 100 to err on the side of caution. I’m going to proceed similarly and divide the population of Wuhan, not by the world population, but by the population of China and South-East Asia (which is ~2 billion), since as far as I know this is the only region where coronaviruses related to SARS-CoV-2 are found. The result is that P(Wuhan|!Lab) is approximately equal to 0.005, so it follows that \frac{P(Wuhan|!Lab)}{P(Wuhan|Lab)} is slightly greater than 0.005. In other words, on those assumptions, \frac{P(Wuhan|!Lab)}{P(Wuhan|Lab)} is greater than \frac{P(Lab)}{P(!Lab)} by a factor of more than 10.

Since this factor is equal to the posterior odds \frac{P(!Lab|Wuhan)}{P(Lab|Wuhan)}, it means that P(Lab|Wuhan) is approximately equal to 8.5%, which is considerably lower than Abramowicz’s estimate of 78.1%, but I suspect that with more realistic assumptions it would be much lower than that. For instance, in order to compute P(Wuhan|!Lab), I have effectively assumed that any 2 places in China and South-East Asia with the same population had the same probability of being the origin of a non-laboratory zoonosis of SARS-CoV-2, but that is clearly false. Indeed, not only are wild animals carrying relatives of SARS-CoV-2 not uniformly distributed over China and South-East Asia, but a pandemic is more likely to start in a densely populated area than in a sparsely populated area where the same number of people live, because in a sparsely populated area even if someone is infected by a virus it’s more likely to be a dead-end as there are fewer people to whom it can be transmitted. However, one might dispute that \frac{P(Lab)}{P(!Lab)} is really as low as I and Abramowicz have assumed, which other things being equal would make P(Lab|Wuhan) higher. The truth is that we have so little data that we can reach totally opposed conclusions by making different but equally plausible assumptions and that’s the important point here. No matter whether you like the lab escape theory or think it’s nonsense, you can always make assumptions that will give you the result you want for which some kind of justification can be given, plug them into Bayes’s theorem and get the conclusion you had already decided was true. Since people are easily impressed by mathematical arguments, this will look convincing to a lot of people, but it won’t actually show anything. 
Moreover, as I already noted when I talked about the reference class problem, the problem is not just that you have a lot of degrees of freedom in what assumptions you make, but also that you have a lot of degrees of freedom in how you frame the question in the first place.
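To make that sensitivity concrete, here is a minimal Python sketch of the calculation. The baseline inputs (P(Lab) = 0.0005, P(Wuhan|Lab) = 0.99, P(Wuhan|!Lab) ≈ 0.005) are the ones used above; the function name `posterior_lab` and the list of alternative priors are my own illustrative choices, not estimates taken from Abramowicz or anyone else.

```python
def posterior_lab(prior_lab, p_wuhan_given_lab, p_wuhan_given_not_lab):
    """Return P(Lab | Wuhan) computed with Bayes's theorem."""
    joint_lab = p_wuhan_given_lab * prior_lab
    joint_not_lab = p_wuhan_given_not_lab * (1.0 - prior_lab)
    return joint_lab / (joint_lab + joint_not_lab)


# Baseline assumptions taken from the text.
PRIOR_LAB = 0.0005             # P(Lab), borrowed from Abramowicz
P_WUHAN_GIVEN_LAB = 0.99       # P(Wuhan | Lab)
P_WUHAN_GIVEN_NOT_LAB = 0.005  # P(Wuhan | !Lab), rounded population share

prior_odds = PRIOR_LAB / (1.0 - PRIOR_LAB)
likelihood_ratio = P_WUHAN_GIVEN_NOT_LAB / P_WUHAN_GIVEN_LAB
baseline = posterior_lab(PRIOR_LAB, P_WUHAN_GIVEN_LAB, P_WUHAN_GIVEN_NOT_LAB)

# Hypothetical alternative priors, chosen only to show how much the
# conclusion moves; none of them is an empirical estimate.
ALTERNATIVE_PRIORS = [0.00005, 0.0005, 0.005, 0.05]
posteriors = [
    posterior_lab(p, P_WUHAN_GIVEN_LAB, P_WUHAN_GIVEN_NOT_LAB)
    for p in ALTERNATIVE_PRIORS
]

if __name__ == "__main__":
    print(f"prior odds P(Lab)/P(!Lab):     {prior_odds:.6f}")
    print(f"P(Wuhan|!Lab)/P(Wuhan|Lab):    {likelihood_ratio:.6f}")
    print(f"baseline P(Lab|Wuhan):         {baseline:.1%}")
    for prior, post in zip(ALTERNATIVE_PRIORS, posteriors):
        print(f"P(Lab) = {prior:<8} ->  P(Lab|Wuhan) = {post:.1%}")
```

With the rounded value of 0.005 the baseline comes out at about 9%; the 8.5% quoted above corresponds to a slightly larger, unrounded P(Wuhan|!Lab) of roughly 0.0053. More to the point, moving the prior across the hypothetical values in the list takes the result from around 1% to around 91%, which is exactly why I don’t think this kind of exercise can settle anything.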

I think we shouldn’t fool ourselves into thinking that we’re going to prove or, on the contrary, decisively refute that SARS-CoV-2 escaped from a lab with mathematical tricks of this kind. They may have the appearance of rigor, but they are really tools of persuasion. The important point is that we have no evidence whatsoever that SARS-CoV-2 escaped from a lab and, since there would be nothing even remotely surprising about the pandemic starting with a naturally occurring zoonotic spillover event, we have no reason to suppose it did. Back in 2018, Shi Zhengli and her team published a paper that reported the results of a serological study they performed on 218 residents of 4 villages in Yunnan, near caves where they had previously found bat coronaviruses closely related to SARS-CoV-1 and, therefore, also to SARS-CoV-2. They found that 6 people, ~2.7% of the sample, had antibodies for such bat coronaviruses, which suggests they had previously been infected. In another study they published in 2019, they reported the results of a similar serological survey on 1,497 residents of rural areas in 3 provinces of Southern China (Yunnan, Guangxi and Guangdong), which found 9 people, ~0.6% of the sample, who tested positive for antibodies to bat coronaviruses. This may not seem like much, but antibodies to coronaviruses apparently wane pretty quickly, so the seroprevalence found by those surveys probably underestimates the proportion of people who have been infected by a bat coronavirus in those areas. Moreover, there are hundreds of millions of people in rural China, so it means that every year millions of people are probably infected by bat coronaviruses.

The overwhelming majority of those infections never result in a pandemic because those viruses are not very good at infecting humans and/or the people who are infected never transmit them to someone else because they live in low-density areas. (Even with SARS-CoV-2, which is quite good at infecting humans and for which most cases are located in densely populated areas, the vast majority of infections are dead-end.) But it just takes one person who has been infected by a bat coronavirus that, for whatever reason (perhaps it acquired a mutation while replicating in this person’s body), is good at infecting humans and transmits it to enough people, one of whom eventually travels to a densely populated area, to start a pandemic. Of course, such a chain of events is evidently very unlikely or we’d be constantly dealing with pandemics, but if millions of people are infected by bat coronaviruses every year in China it’s to be expected that it will happen from time to time. Presumably, it has been made significantly more likely by economic development and urbanization, which led to human encroachment on the habitat of bats that carry those coronaviruses and increased circulation between rural and urban areas. A likely scenario is that, somewhere in rural China, someone was infected by SARS-CoV-2 or its ancestor and traveled to Wuhan, or infected someone who did, to sell products or visit family, until eventually the virus reached Huanan Seafood Market, which served as a springboard from which it spread to the rest of Wuhan and eventually to the entire world. The original spillover event may not even have happened in Wuhan’s countryside or even in Hubei, since Wuhan is a major transportation hub in the middle of China, which a very large number of people from all over China pass through every day. We’ll likely never know how the outbreak started, but given how many people are likely infected by bat coronaviruses in China every year, that’s a plausible scenario. 
By contrast, only a very small number of people work in labs that study bat coronaviruses in Wuhan and, unlike regular people who come in contact with bats, they are trained to use strict safety protocols. So I don’t see why we should regard the lab escape theory as anything other than a theoretical possibility that is not particularly likely.