Did China lie about COVID-19? - Did China fudge the data?

[Note: A shorter, revised and on a few issues more up to date version of this piece has been published by Quillette as a four-part essay. The version on Quillette is more polished and a bit less polemical, but this one has more technical details on some issues, so which version you should read depends on what you’re looking for.]

Another accusation that is often made against China is that it fudged the figures about the epidemic. Back in April, Bloomberg reported that, according to anonymous U.S. officials, China intentionally released incomplete data on both the number of people who had been infected by SARS-CoV-2 and the number who died because of it. This report came as many people had already been claiming for weeks that China was fudging the data about the outbreak in Wuhan and in the rest of the country. Given how often anonymous leaks by members of the U.S. intelligence community have proven misleading or downright false recently, I don’t think we should give any weight to the report about the conclusions reached by the U.S. intelligence services and, since Bloomberg’s sources apparently refused to give any details about how they were reached, I even think it was completely irresponsible for any journalist to publish it. But it doesn’t mean that China didn’t manipulate the data it released and, since for a while those were the only data available on COVID-19 (which means they informed the policy response in other countries), it’s very important to know whether it did. One reason people look at China’s official counts as suspicious is that Beijing is often accused of manipulating economic statistics, although those claims seem very exaggerated. As with the lab escape theory, people have made countless arguments in support of the accusation that China fudged the figures about the epidemic, so I can’t reasonably hope to address all of them, but I will discuss the most common.

One argument I’ve heard is that it’s not plausible for the case fatality rate to be so low in China, when it’s so much higher in France, Italy and several other countries that are much richer than China. However, while it’s true that many countries have a much higher case fatality rate than China, many others don’t. Countries that have a lower case fatality rate than China include developed countries such as South Korea, Australia and Germany. I don’t see any legitimate reason to cherry-pick and only look at the countries that have been the most badly affected, while ignoring all the countries where the case fatality rate is much lower. As we shall see, in this debate, this kind of cherry-picking is a recurring problem.

Perhaps more importantly, between-country comparisons of case fatality rates are largely meaningless, because the number of cases varies greatly between countries for reasons that have nothing to do with the actual progress of the pandemic and everything to do with the fact that different countries have different testing policies, definitions of case, availabilities of tests, etc. In fact, those often vary over time in the same country, making even comparisons of the case fatality rate within a single country over time problematic. For reasons of this kind, data on the number of cases are of very low quality in almost every country, but many people claim that China has deliberately and artificially deflated the number of cases on its territory. A widespread argument to that effect is that, given that China is home to almost 1.4 billion people, it’s unlikely that it only had about 84,000 cases and 4,600 deaths. In general, data on the number of deaths should be more reliable than data on the number of cases, because there is less room for them to be affected by differences in definitions and policies, but people also claim that China’s official death toll is underestimated. In fact, this argument is not just widespread, I think at this point it’s almost treated as received wisdom. Indeed, the case fatality rate is just the number of deaths attributed to COVID-19 divided by the number of cases, so if China underreported cases and deaths in roughly the same proportion, the case fatality rate could easily be within the range observed in the rest of the world even though China’s figures are completely bogus. According to this argument, the official figures released by China hide the true extent of the outbreak, which can’t possibly have been contained so well. But this argument is obviously question-begging, because the extent to which China was able to contain the outbreak within its borders is precisely what is disputed.
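To make the point about proportional underreporting concrete, here is a minimal sketch. It uses the rounded official tallies quoted above, while the reporting fractions are purely hypothetical values chosen for illustration: whatever common fraction of cases and deaths is reported, the ratio between the two does not move.

```python
def case_fatality_rate(deaths: float, cases: float) -> float:
    """Deaths attributed to the disease divided by recorded cases."""
    return deaths / cases


# Rounded official tallies quoted in the text.
official_cases = 84_000
official_deaths = 4_600

# HYPOTHETICAL reporting fractions, for illustration only: 1.0 means
# everything was reported, 0.1 means only a tenth of cases AND deaths were.
for reported_fraction in (1.0, 0.5, 0.1):
    true_cases = official_cases / reported_fraction
    true_deaths = official_deaths / reported_fraction
    cfr = case_fatality_rate(true_deaths, true_cases)
    print(f"reported fraction {reported_fraction:.0%}: CFR = {cfr:.2%}")
```

The same case fatality rate comes out on every line, which is why a plausible-looking rate says nothing either way about whether the underlying counts are complete.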

Since the case fatality rate is just the number of deaths divided by the number of cases, instead of comparing China’s case fatality rate to that of other countries, one way to approach this problem is to compare China’s number of cases and number of deaths separately with those of other countries. In making this comparison, we should somehow account for the population of each country, because the number of cases/deaths is obviously going to be correlated with the population. However, for reasons I will explain shortly, there is no reason to expect this correlation to be very high and there is no obvious way to do that. For the moment, I will just divide the number of cases/deaths by the population, which is the most straightforward way to account for population. Let’s start with the number of cases and see whether China’s official tally stands out. As you can see, China’s official number of cases per 100,000 is definitely on the low end, but there are plenty of countries with a similar or lower number of cases per 100,000 (it’s lower in 14 countries out of 208 in the dataset), so there is nothing particularly suspicious about it.

This is even clearer if, instead of comparing China to every other country, you only look at countries in East Asia. This makes sense because the number of cases per capita is correlated within regions, while there are stark differences between regions. Indeed, if you only look at East Asian countries, you see that, compared to its neighbors, China’s number of cases per million is totally unremarkable since almost half of the countries in that region have more cases per million. As I already noted, the number of cases in a country is a very noisy indicator, because it’s affected by all sorts of things that have nothing to do with the extent of the epidemic in that country. Nevertheless, if the Chinese government deliberately underestimated the number of cases in China, it’s hardly obvious by looking at the numbers.

Let us now turn to the number of deaths. As I already noted, this is probably a better indicator, because there is less room for differences in definitions and policies to bias the comparison. Moreover, if we’re concerned about deliberate manipulations of the data, data about deaths are harder to fake, because dead people tend to get noticed. However, that is not to say that data about deaths are perfect: there is plenty of noise even in the absence of deliberate manipulations. For instance, not all countries have the resources to systematically test recently deceased people for SARS-CoV-2, which can make a comparison misleading even if no one is trying to fudge the data. Even if every recently deceased person were tested everywhere, some countries will attribute to COVID-19 any death of someone who tested positive, while others will try to make a determination of the cause of death and only attribute a death to COVID-19 if they decide the person would not have died had they not been infected. Still, data on the number of deaths is probably not as noisy as data on the number of cases, so let’s see how China’s number of deaths per million compares to that of other countries. As with the number of cases per capita, it’s on the low end, but many countries have a similar or even lower number of deaths per million (it’s lower in 40 countries out of 208 in the dataset), so China’s official death toll is hardly the red flag people think.

Again, if we just compare China to its neighbors instead of the entire world, it becomes even more unexceptional since more than half of the countries in East Asia have fewer deaths per million than China. The truth is that, across East Asia, the number of deaths attributed to COVID-19 is remarkably low compared to Europe and the US. Of course, if you compare China to Italy or Belgium, it’s going to look suspicious, but this is only because you’re comparing it to outliers. It’s a bit like comparing a regular person’s 100 meter time to Usain Bolt’s and concluding that he must have some kind of disability because he’s doing so poorly.

Moreover, as I already noted above, looking at the number of cases/deaths per million to account for population is no panacea and can even be very misleading. Indeed, even if one country has twice as many inhabitants as another, you wouldn’t necessarily expect it to have twice as many cases and twice as many deaths. This would be the case if the virus randomly and simultaneously appeared with equal probability at every point in a country, but this is not how it works. Instead, the virus first appears in one or several places, either because that’s where it originated from as in Wuhan or because it was introduced from elsewhere, then it spreads unless something is done to prevent it. Now, compared to most other countries, China is huge and has a huge population. But the virus has mostly been contained in Hubei and, even more specifically, in the Wuhan metropolitan area. This area is only home to between 0.8% and 4.1% of the Chinese population, depending on whether you consider only Wuhan itself, the metropolitan area or the province of Hubei. However, when looking at the number of cases/deaths per million, we are dividing the number of cases/deaths by the entire population of China. This is bound to make China look better than most countries because most countries aren’t so huge, so once the virus starts circulating in one region, a much larger proportion of the population is at risk. For instance, once the virus had established itself in the region of Paris, 18% of the French population was effectively at risk. Yet, when I compared the number of cases/deaths per capita in China with the number of cases/deaths per capita in France, I divided the number of cases/deaths in both France and China by their entire respective populations. Thus, even if France’s response to the pandemic had not been more incompetent than China’s (which it assuredly was), this alone would have resulted in a higher number of cases/deaths per million.
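The effect of the denominator can be sketched with a few lines of arithmetic. The population figure, the case tally and the 4.1% population share come from the text. The share of national cases located in the affected region (80%) is an assumption I picked only for illustration, not a figure from the data.

```python
def rate_per_100k(cases: float, population: float) -> float:
    """Cases per 100,000 inhabitants."""
    return cases / population * 100_000


national_population = 1_400_000_000  # approximate figure from the text
national_cases = 84_000              # rounded official tally from the text
region_population_share = 0.041      # upper bound from the text (Hubei)
region_case_share = 0.80             # ASSUMPTION, for illustration only

national_rate = rate_per_100k(national_cases, national_population)
regional_rate = rate_per_100k(
    national_cases * region_case_share,
    national_population * region_population_share,
)

print(f"whole country:   {national_rate:.1f} cases per 100,000")
print(f"affected region: {regional_rate:.1f} cases per 100,000")
print(f"ratio:           {regional_rate / national_rate:.1f}x")
```

Under these assumptions the regional rate is simply the national rate multiplied by the case share over the population share, so the smaller the affected region is relative to the country, the more the national figure flatters the country.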

In almost every country, different regions are very diversely affected by the pandemic, because the virus is not introduced everywhere simultaneously. Once things start going bad in the region or regions where the virus started circulating on a large scale first, the government and people start taking steps to slow down the spread by restricting movements, practicing social distancing, etc. The result is that regions where the virus was introduced first tend to be hit quite hard, while others are barely affected. This will bias a comparison that looks at the number of cases/deaths per capita against large countries with a large population distributed across many different areas. I know people think that, despite what the official figures show, the epidemic was also very bad outside Hubei in China, but frankly this is just nonsense. You may think that, either because the government deliberately fudged the numbers or because data about COVID-19 are very noisy everywhere, the official figures about what happened outside Hubei are not very reliable, but clearly it wasn’t nearly as bad as in Hubei or we’d know it. For instance, if the authorities had been forced to build a hospital in a hurry, as they did in Wuhan, in another province, it’s hard to believe nobody would have noticed. Indeed, when there was a resurgence of the epidemic in Beijing recently, everyone immediately knew it. Using phylogenetic evidence, a recent study about the epidemic in Guangdong, China’s most populous province, also showed that most infections were imported from elsewhere and that local circulation was very limited. This is strong evidence that, as the official figures suggest, China was able to limit the spread of the virus outside Hubei. 
In almost every country, it’s also the case that some regions were far more badly affected than others, but most countries are not as large and populous as China, so as I explained above dividing the number of cases/deaths by the whole population is going to make the comparison with them misleading.

If we compare the number of cases per 100,000 in the rest of the world with the number of cases per 100,000 not in China as a whole but in Hubei, here is what it looks like. As you can see, while the number of cases per 100,000 in China as a whole is on the low end compared to other countries, the number of cases per 100,000 in Hubei is just slightly below the median of the rest of the world, since 95 countries out of 208 in the dataset have fewer cases per 100,000.

Similarly, if we compare the number of deaths per million in Hubei with that in the rest of the world, we see that it’s well above the median since 148 countries out of 208 in the dataset have fewer deaths per million. Now, this comparison is not fair either, because Hubei was the worst affected region in China, so if we compared the number of cases/deaths per capita in Hubei with the number of cases/deaths per capita in the worst affected regions in other countries, China would look a lot better. I just made it to illustrate the point I made above, namely that it was misleading to compare the number of cases/deaths in China to that in other countries by looking at the number of cases/deaths in each country as a whole, since Hubei is home to a very small part of the Chinese population and the outbreak was mostly contained in Hubei.

When people claim that China is fudging the numbers about the epidemic on the ground that it’s not possible the government managed to contain the virus to the extent suggested by the official figures, they are essentially begging the question, because if the effort to contain the virus wasn’t as successful as the official figures suggest then obviously the figures in question are not accurate. If you want to argue that China manipulated the data about the outbreak, you have to produce independent evidence for that claim. You can’t just assume something that implies the official figures are false, on pain of circularity, but that’s basically what most people do when they accuse China of having manipulated the data. As we have seen, when you compare China’s official number of cases/deaths with the number of cases/deaths recorded in other countries, it doesn’t stand out in a way that warrants the suspicion, let alone the assertion, that Beijing deliberately manipulated the data to make the outbreak look better contained than it actually was. The truth is that, if you gave someone who has lived in a cave for the past few months the official figures about the epidemic in every country without revealing which is which, told them that a country fudged its numbers and asked them to guess which one it was, they would almost certainly not pick China but probably another East Asian country such as Vietnam or Taiwan. I don’t say that because I think that Vietnam and Taiwan fudged their numbers, which I don’t, but because if you just look at the data, they are clear outliers in a way China isn’t.

Moreover, given the lengths to which the Chinese government went to prevent the spread of the virus after the authorities publicly acknowledged sustained human-to-human transmission was occurring, it’s not particularly surprising that the outbreak was quickly brought under control in Hubei and that circulation of the virus was very limited in the rest of the country. I don’t think most people in the West realize how extreme the measures taken by the Chinese government to contain the outbreak were, but if you want to get a sense of that, this thread on Twitter by Nicholas Christakis is a good place to start. Lockdowns in China were a very different kind of animal than anything that was done anywhere in the West. They were far more restrictive and much more strictly enforced, which should not surprise anyone given the nature of the Chinese regime. Furthermore, many people seem to be under the impression that only Wuhan and other cities in Hubei were forced to go on lockdown, but while those policies varied in strictness geographically, they were far from limited to Hubei. According to a New York Times analysis in April, at that point, residential lockdowns of varying strictness covered 760 million people, more than half of the country’s population, based on announcements made by provinces and major cities. Christakis and his team estimated that more than 930 million people, i.e. 2/3 of the population, were subjected to some kind of movement restrictions. Moreover, while testing capacity was insufficient in Wuhan early on, in the rest of the country, where the authorities were not caught by surprise to the same extent, testing seems to have taken place on a massive scale. For instance, according to a paper on Guangdong I already mentioned, 1.6 million tests were performed in the province between January 30 and March 19, only 1,388 of which came back positive. 
Thus, the rate of positive tests during this period was about 0.09%, well under 1%, which is far less than almost anywhere else and indicates that testing was very extensive.

Another thing people often bring up when they claim China’s official figures about the epidemic are fake is this thread on Reddit. This blog post by a political scientist named Ben Hunt, which summarizes the argument, is also cited a lot. The author of the original Reddit post showed that, if you trained a quadratic model on China’s official data about the number of cases between January 20 and February 4, the fit was almost perfect. Moreover, as you can see on this chart, the prediction of the model out of sample remains pretty good for a few days after February 4, before diverging radically as the number of cases rapidly goes down.

The same thing holds if, instead of looking at the number of cases, you fit a quadratic model to the number of deaths using data between January 20 and February 4 as your training sample. As you can see, the prediction out of sample is pretty good for a few days, then starts underestimating the actual number of deaths by a few hundred and eventually starts wildly overestimating it as the growth of the number of deaths begins to slow down rapidly around February 24.

People seem to think that the fact that a quadratic model is able to predict the number of cases/deaths very accurately for a few days is strong evidence that China was fabricating the data, but it’s clearly not. As you can see on the charts, the model’s predictions don’t remain accurate for very long and, unless you arbitrarily focus on just one part of the curve, it’s obviously not well described by a quadratic model. When you point out that, after a few days, the predictions of the model start to diverge from reality, people like Hunt just reply that it’s “what you’d expect from a politically adjusted epidemic model over time … at some point you have to show a rate-of-change improvement from your epidemic control measures”. Well, this may be so, but it’s also exactly what you’d expect if China had suddenly taken radical steps to bring the epidemic under control, which indeed we know for a fact is what it did. This is not evidence of fraud, it’s just people doing sloppy analysis and asserting it’s evidence of fraud, which is not the same thing. Showing a nice-looking chart and telling a story that assumes China manipulated the data is not the same thing as proving that it did. If this were evidence of fraud, you could prove that many other countries fabricated their data, since it’s easy to do something very similar with them.
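To show how little a good quadratic fit proves, here is a minimal sketch on synthetic data. The logistic curve and all its parameters (`cap`, `rate`, `midpoint`) are assumptions made up for illustration and are not fitted to any country’s data. The curve has nothing quadratic or fraudulent about it, yet a quadratic fitted on an early window describes that window almost perfectly, predicts well for a couple of days, then overshoots badly once growth slows down.

```python
import math


def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t**2 via the normal equations."""
    # Augmented 3x3 system (X^T X | X^T y) for the basis [1, t, t^2].
    rows = [
        [sum(t ** (i + j) for t in ts) for j in range(3)]
        + [sum(y * t ** i for t, y in zip(ts, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(rows[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (rows[i][3] - known) / rows[i][i]
    return beta


def predict(beta, t):
    a, b, c = beta
    return a + b * t + c * t * t


def logistic(t, cap=80_000, rate=0.2, midpoint=20):
    """SYNTHETIC cumulative case curve; all parameters are hypothetical."""
    return cap / (1 + math.exp(-rate * (t - midpoint)))


# Train on an early window only, as in the Reddit exercise.
train_days = list(range(5, 16))
train_cases = [logistic(t) for t in train_days]
beta = fit_quadratic(train_days, train_cases)

# In-sample goodness of fit (R squared).
mean_cases = sum(train_cases) / len(train_cases)
ss_res = sum((y - predict(beta, t)) ** 2 for t, y in zip(train_days, train_cases))
ss_tot = sum((y - mean_cases) ** 2 for y in train_cases)
r_squared = 1 - ss_res / ss_tot
print(f"in-sample R^2: {r_squared:.4f}")

# Out of sample: shortly after the window, then long after growth has slowed.
for day in (17, 40):
    print(f"day {day}: actual {logistic(day):,.0f}, quadratic {predict(beta, day):,.0f}")
```

Any smooth curve looks roughly quadratic over a short enough window, so the same pattern of a near-perfect fit followed by divergence appears whether the data are real, synthetic or fabricated.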

For instance, here is what it looks like with South Korea, when you fit a quadratic model to the official data about the number of cases between February 15 and February 29. As you can see, the predictions out of sample remain very good for a few days, until the growth of the number of cases starts going down very rapidly and the predictions of the model start departing from reality completely. As in the case of China, when you look at the whole curve, it’s obvious that it’s not well-described by a quadratic model.

The same thing happens again when you fit a quadratic model to data about the number of deaths. Here, the predictions of the model stay relatively good for a longer period of time, but it’s clear that even this part of the curve is more linear than quadratic and, if we looked at it beyond April 1, it would look very similar to China with a collapse of the number of deaths.

Hunt argues that, even more than how well the predictions of the model approximate reality for a few days after February 4, the really damning part of this exercise is that a quadratic model can fit the data in the early phase of the epidemic, even if eventually it departs from reality. He claims this “should be impossible” because, before they are brought under control, epidemics always take the form of some kind of exponential function, not a quadratic function. The problem with this claim is that 1) it’s not true and 2) even if it were, Hunt’s conclusion still wouldn’t follow. First, the early growth dynamics of epidemics is only exponential in naive epidemiological models, but more sophisticated models that, for instance, don’t assume homogeneous population mixing predict sub-exponential growth patterns even in the absence of factors that mitigate the transmission rate over time and this is often what data about the early phase of real epidemics show. So the fact that a quadratic model fits the Chinese data about the outbreak during the early phase is not particularly surprising and certainly isn’t evidence that China manipulated the data. Moreover, even if it were true that, during the early phase, epidemics always grew exponentially, the fact that China’s official data fit a quadratic model in the early phase still wouldn’t show that they were fabricated. Indeed, even if the actual number of infections and deaths grew exponentially in the early phase, this wouldn’t necessarily be the case for recorded infections and deaths. As I noted above, even in the absence of deliberate manipulations, data about the number of cases and even data about the number of deaths are very noisy. In particular, data about the number of cases may say more about how testing capacity grows than about the actual number of infections, especially in the case of a novel pathogen for which there is no existing stock of the kind of reagents needed for testing by PCR. 
So I think it’s clear that, although it may seem prima facie convincing, this argument doesn’t actually show anything.
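The point about testing capacity can be illustrated with a toy model. Everything in it is an assumption made for the sake of the example: true new infections grow exponentially, daily testing capacity grows linearly, and recorded cases are capped by the number of tests available (in the extreme case where every test finds a case). Under those assumptions the recorded cumulative count is exactly quadratic even though the underlying epidemic is exponential.

```python
def recorded_cumulative_cases(days, initial_infections=50.0, growth=1.3,
                              base_capacity=20, capacity_step=10):
    """Cumulative recorded cases when testing capacity caps daily detections.

    All parameters are HYPOTHETICAL and chosen only for illustration.
    """
    cumulative, total = [], 0.0
    for t in range(days):
        true_new = initial_infections * growth ** t   # exponential true infections
        capacity = base_capacity + capacity_step * t  # linearly growing test supply
        total += min(true_new, capacity)              # cannot record more than tested
        cumulative.append(total)
    return cumulative


cumulative = recorded_cumulative_cases(30)

# A sequence is exactly quadratic when its second differences are constant.
second_differences = [
    cumulative[t + 1] - 2 * cumulative[t] + cumulative[t - 1]
    for t in range(1, len(cumulative) - 1)
]
print(sorted(set(second_differences)))
```

The second differences are constant and equal to the daily increase in capacity, so in this toy setting the shape of the recorded curve reflects the test supply and tells you nothing about the true growth rate.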

Another claim several people have made is that Benford’s law shows that China manipulated the data about the epidemic. This law states that, in many naturally occurring numerical datasets, the leading digit $d$ occurs at the frequency $\log_{10}\left(1+\frac{1}{d}\right)$. It has been found to hold empirically in a variety of contexts and has been used to demonstrate fraud in elections, finance, accounting, etc. I’m no specialist in Benford’s law, but from what I understand, except in some cases, there is no theoretical reason to expect data to obey the law. It’s just that, as a matter of empirical fact, they often do. If we knew that, in the absence of fraud, the number of cases/deaths attributed to COVID-19 should grow exponentially, there would actually be a reason to expect the data to obey Benford’s law. However, as I already noted, there is no reason to expect the number of cases/deaths to grow exponentially, not even in the early phase of the epidemic. Thus, insofar as the failure of a country’s official data to obey Benford’s law is suspicious, it’s only because it often seems to hold even though we generally don’t know why. This means that, even if a country’s data about the epidemic of COVID-19 don’t obey Benford’s law, it would hardly be conclusive evidence that it engaged in fraud. On the other hand, my sense is that it wouldn’t be easy to fabricate data about the epidemic that obey Benford’s law, so I think the fact that a country’s data obey Benford’s law, while hardly conclusive, is better evidence that it didn’t engage in fraud. A few months ago, some people checked whether the Chinese official data obeyed Benford’s law and found that it didn’t (here is one example), which strengthened the belief, already held by most at the time, that China was lying about how bad the outbreak was. However, several papers have since then done a similar analysis and concluded that China’s data obeyed Benford’s law (here, here and here), which suggests they are not fabricated.
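For readers who have never seen the distribution, the expected frequencies follow directly from the formula above. This is just a small sketch, and the function name `benford_probability` is mine.

```python
import math


def benford_probability(d: int) -> float:
    """Expected frequency of leading digit d (1 to 9) under Benford's law."""
    return math.log10(1 + 1 / d)


expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"leading digit {d}: {p:.1%}")
```

The frequencies sum to 1, and a leading 1 is expected about 30% of the time while a leading 9 is expected less than 5% of the time, which is what makes naively fabricated numbers stand out.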

I don’t know why some people found that China’s data violated Benford’s law, but for good measure, I did my own analysis using the daily number of new cases/deaths according to the ECDC. As you can see below, where I show the result for the daily number of new cases in a selection of countries, it confirms that China’s data obey Benford’s law. In fact, China’s data obey Benford’s law better than any other country, although this is probably because it has more days of data since the pandemic started over there. A chi-square test leads to the rejection of the hypothesis that the data-generating process obeys Benford’s law in Brazil, Canada, Denmark, France, Germany, Italy, Japan, Spain, Sweden and the United Kingdom. The p-value is also suspiciously low in New Zealand and Thailand.

Here is another chart that shows the result of the analysis for the daily number of new deaths in the same countries. Again, China’s data obey Benford’s law, but a chi-square test suggests a violation of that law in Australia, Brazil, Canada, Denmark, Italy, Japan, New Zealand and South Korea. The p-value is also suspiciously low in Thailand. Again, we have no theoretical reasons to expect the data to obey Benford’s law in any country, so it doesn’t show those countries engaged in fraud and I don’t believe they did. (In the full dataset, Benford’s law is violated in ~42% of the countries for the number of cases and 56% of them for the number of deaths at the conventional level of significance, which suggests that it’s not a good test of fraud in the case of the pandemic of COVID-19.) But it’s interesting that, even after seeing the results of this analysis, nobody is going to accuse any of those countries of fraud, yet that’s exactly what everybody would have done had China’s data violated Benford’s law. I think people should think more about whether their reasons for this double standard are as good as they think.
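For anyone who wants to run this kind of check themselves, here is a minimal sketch of a chi-square goodness-of-fit test against Benford’s law. It is not the code behind the charts: the two series below are synthetic stand-ins (powers of 2 for exponential growth, and the integers from 100 to 999 for uniformly distributed leading digits), and the function names are mine. The statistic is compared with 15.507, the standard 5% critical value of the chi-square distribution with 8 degrees of freedom.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 9 - 1 = 8 degrees of freedom.
CRITICAL_VALUE_5PCT_DF8 = 15.507


def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of the leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v > 0]  # zeros have no leading digit
    counts = Counter(digits)
    n = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# SYNTHETIC series, for illustration only.
exponential_series = [2 ** k for k in range(1, 201)]  # exponential growth
uniform_series = list(range(100, 1000))               # each leading digit equally likely

for name, series in (("exponential", exponential_series), ("uniform", uniform_series)):
    statistic = benford_chi_square(series)
    verdict = "rejected" if statistic > CRITICAL_VALUE_5PCT_DF8 else "not rejected"
    print(f"{name}: chi-square = {statistic:.1f}, Benford {verdict} at the 5% level")
```

The exponential series passes and the uniform one fails, which matches the remark above that exponential growth is one of the few cases where there is a theoretical reason to expect the law to hold. With daily counts, days with zero new cases are simply dropped, which is one of several choices that can change the result.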

At the beginning of April, the Chinese health authorities announced they would start including asymptomatic cases in the official number of cases they released every day, which also fueled speculation that China was deliberately fudging the numbers to make the epidemic seem contained when it wasn’t really under control. However, the fact that China didn’t count people who had tested positive for SARS-CoV-2 but didn’t have any symptoms was not a secret, it had been known since February and many articles were published about it after this decision was made. For instance, the New York Times published a story on February 12 about this decision, as did many other news organizations at the time. On February 20, Nature published a piece about the exclusion of asymptomatic cases from official counts, in which some experts were criticizing it while others argued the practice made sense. To be honest, I don’t think it makes any sense, but I also don’t think it’s very likely the Chinese health authorities made the decision to exclude asymptomatic cases from official counts to hide the true extent of the epidemic. Again, you have to keep in mind that it was still early days, so nobody knew much about the virus yet and in particular the role played by asymptomatic carriers of the virus in transmission was still very unclear. (In fact, even 5 months later, the role of asymptomatic carriers in transmission remains much debated.) The Chinese health authorities were clearly struggling to determine how to count infections, as shown by the fact that, between January 15 and February 20, they revised the definition of case seven times.

And while the official number of cases would have been significantly higher, had China used the most inclusive definition throughout the epidemic, not all revisions of the definition resulted in a lower number of cases. In particular, as the New York Times article mentioned above noted, on February 13, the health authorities decided to start counting people in the province of Hubei who hadn’t been laboratory-confirmed as cases if they had enough symptoms consistent with COVID-19, which resulted in a large spike of cases on that day. They did so because there was a shortage of PCR tests in Hubei at the time, so requiring such a test to count someone as a case would have made the epidemic look better than it was. This is not the kind of decision you expect from people who are trying to hide the true extent of the epidemic. Again, I don’t think it was a good idea for China to exclude asymptomatic cases from the official count, but it’s weird to accuse China of trying to make the epidemic look more contained than it was because of that when, as a matter of policy, several countries such as France are not even testing asymptomatic people and nobody is accusing them of fudging the numbers. (In fact, not only was France’s policy not to test people without symptoms, but it was to only test people with severe symptoms.) The truth is that, while back in March many people who attacked China’s decision not to count asymptomatic cases speculated that, despite what the official numbers suggested, the epidemic was still raging in several parts of the country, we now know this wasn’t the case since otherwise we would have noticed by now. There have been several localized resurgences of the epidemic in China since the end of March, such as the flare-up in Beijing in June (which forced the authorities to cancel flights and shut down schools again), but so far the authorities were able to contain all of them pretty quickly and didn’t try to hide them as far as we can tell.

People who claim that China’s government lied about the number of people who died of COVID-19 also like to talk about photos published by Caixin at the end of March, when funeral homes in Wuhan reopened after 2 months of lockdown, showing a large number of residents waiting in line to pick up the ashes of their dead and a truck loaded with 2,500 empty urns which it was apparently delivering to a funeral parlor, when the official death toll in the city is only 3,869. As various foreign media outlets reported, such as Radio Free Asia, Chinese social media users estimated that between 42,000 and 46,800 people had died in Wuhan during the lockdown. According to China’s National Bureau of Statistics, the death rate in Hubei was 7 per 1,000 in 2018 (the most recent year for which the data are available), so had it not been for the outbreak approximately 6,400 people should have died in Wuhan, a city of 11 million, during a period of 2 months. (Since mortality is higher in winter and it can vary quite a lot year-to-year, this figure could be a bit higher, but the number of deaths by month is not available, so it’s not possible to remove the seasonal effect.) It means that, even under conservative assumptions, those estimates imply that excess mortality due to COVID-19 was somewhere between 35,600 and 40,400, far more than the official death toll. The official number of deaths attributed to COVID-19 underestimates excess mortality in many countries, but not by a factor of 10, so if those estimates were reliable it would almost certainly mean that China’s authorities lied about how many people died of COVID-19.

The problem is that we have no reason to take those estimates seriously and every reason to dismiss them as baseless speculation. First, they are extrapolations from figures that are very thinly sourced, and they rest on various assumptions that are both unclear and, to the extent we can make out what they are, often seem very dubious. For instance, according to Radio Free Asia, “social media posts have estimated that all seven funeral homes in Wuhan are handing out 3,500 urns every day in total”. Since “funeral homes have informed families that they will try to complete cremations before the traditional grave-tending festival of Qing Ming on April 5, which would indicate a 12-day process beginning on March 23”, social media users just multiplied 3,500 by 12 and obtained the estimate of 42,000. We have no idea how the estimate of 3,500 urns a day was obtained, so we have no reason to think it’s accurate and, even if it is, this figure was estimated in the first days after funeral homes in Wuhan reopened (Radio Free Asia’s story was published on March 27), so we have no reason to think that funeral homes continued to hand out urns at the same rate after that. This is not serious and, were people not already convinced that China is lying about the outbreak, nobody would take it seriously. In fact, if Chinese state media were publishing the same kind of speculation, everyone would rightly call that propaganda. It’s nothing short of extraordinary that so many reputable news organizations chose to give credence to those estimates by publishing stories about them.

Moreover, if excess mortality due to COVID-19 were really as high as those estimates suggest, it would mean that the IFR in Wuhan was ridiculously high. Indeed, according to a serological survey conducted on 714 healthcare workers in Wuhan between March 30 and April 10 whose results were recently published in Nature, seroprevalence in that group was 3.8%. This is not a random sample but, if anything, healthcare workers were probably more likely to be infected than other people. Applied to Wuhan’s 11 million residents, that rate corresponds to about 418,000 infections. Thus, if we accept that excess mortality due to COVID-19 was between 35,600 and 40,400 in Wuhan, it means the IFR was between 8.5% and 9.7%. This is much higher than any estimate of the IFR anywhere in the world (a recent meta-analysis estimates the IFR at 0.66%), so those estimates don’t even pass the smell test. (On the other hand, with the official number of deaths, the IFR is approximately 0.9%, which is within the range of what has been observed elsewhere.) Of course, since it comes from China, I imagine that many people will simply reject the seroprevalence estimate. But it’s broadly consistent with the results of another study published by a team of researchers from Hong Kong, which is harder to suspect of manipulation and found that, among 452 Hong Kong residents evacuated from Hubei (80% of whom were from Wuhan) on March 4-5, 4% were seropositive. Those estimates would also imply there were between 3,236 and 3,672 deaths per million in Wuhan, which is significantly more than in NYC, where there were approximately 2,901 deaths per million between March 11 and May 2. Are we really supposed to believe that more people died in Wuhan, where the lockdown was incredibly strict and the sick were quarantined away from even their family in dedicated centers, than in a city where only a very lax shelter-in-place order was issued and in a state where the governor ordered recovering COVID-19 patients to be sent to nursing homes? Again, this is not serious.
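For readers who want to check the arithmetic, here is a minimal Python sketch of the IFR and deaths-per-million calculations above. It uses only figures quoted in this section and, like the text, assumes that the 3.8% seroprevalence measured among healthcare workers applies to the whole city. The function and variable names are illustrative and do not come from any of the studies cited.

```python
# Figures quoted in the essay (names are illustrative, not from any source).
POPULATION = 11_000_000       # population of Wuhan used in the essay
SEROPREVALENCE = 0.038        # 3.8% among the 714 healthcare workers surveyed
OFFICIAL_DEATHS = 3_869       # official COVID-19 death toll for Wuhan
EXCESS_LOW = 35_600           # excess deaths implied by the 42,000 estimate
EXCESS_HIGH = 40_400          # excess deaths implied by the 46,800 estimate


def implied_ifr(deaths, population=POPULATION, seroprevalence=SEROPREVALENCE):
    """Deaths divided by the estimated number of people ever infected.

    Assumption: the seroprevalence measured in the sample holds city-wide.
    """
    infected = population * seroprevalence
    return deaths / infected


def deaths_per_million(deaths, population=POPULATION):
    """Deaths scaled to a population of one million."""
    return deaths / population * 1_000_000


if __name__ == "__main__":
    scenarios = [
        ("social media estimate (low)", EXCESS_LOW),
        ("social media estimate (high)", EXCESS_HIGH),
        ("official death toll", OFFICIAL_DEATHS),
    ]
    for label, deaths in scenarios:
        print(
            f"{label}: IFR {implied_ifr(deaths):.1%}, "
            f"{deaths_per_million(deaths):,.0f} deaths per million"
        )
```

If true seroprevalence in the general population was lower than among healthcare workers, as suggested above, the number of infections would be smaller and every implied IFR would be higher, which makes the social media estimates even harder to believe.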

Another thing people often bring up when they argue that China deliberately understated the number of deaths caused by the virus is that, on April 17, the Chinese health authorities revised the number of deaths in Wuhan and added 1,290 deaths to the official count. People seem to think it’s evidence that China had previously been hiding the real number of deaths and decided to add a few deaths to the official tally in order to make it more realistic, but frankly that’s a ridiculous argument. Indeed, given that everybody was already accusing them of manipulating the data, it should have been obvious to the Chinese authorities that, on the contrary, any revision would only fuel the suspicion, which indeed is exactly what happened. People also seem to think that the fact the revision consisted in increasing the number of deaths by exactly 50% is somehow evidence of manipulation. As far as I know, nobody has spelled out explicitly what the argument is, but presumably it’s that, conditional on the revision being a manipulation, the probability that it would result in an increase of exactly 50% is higher than the unconditional probability of the same event, so Bayes’s theorem implies that the probability that the revision was a manipulation, conditional on the fact that it resulted in an increase of exactly 50%, is higher than the unconditional probability that it was a manipulation. But the assumption on which this whole argument rests is very implausible. As we have seen, if you want to say that China deliberately manipulated the data on the number of deaths, you have to assume it was able to fabricate data that obey Benford’s law. But are we really supposed to believe that people who are able to do that are stupid enough to increase the number of deaths in Wuhan by exactly 50%, because it’s a nice round number, when they add a few deaths to the official count to make it more credible? Color me skeptical.
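The Bayesian argument reconstructed above can be written out in symbols. The notation is shorthand introduced here for illustration, not something used by the people making the argument: M stands for “the revision was a manipulation” and E for “the revision increased the death toll by exactly 50%”, with both events assumed to have non-zero probability.

```latex
P(M \mid E) = \frac{P(E \mid M)\, P(M)}{P(E)}
\qquad\text{so}\qquad
P(M \mid E) > P(M) \iff P(E \mid M) > P(E)
```

The whole argument therefore turns on the premise that P(E | M) > P(E), that is, that a manipulator would be more likely than an honest recount to land on exactly 50%. That is the assumption questioned above.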

A far more plausible explanation is simply that, in the middle of the outbreak, it was very difficult for the local health authorities to accurately count the number of people who died of COVID-19 (if only because they lacked tests). So, after things calmed down, they tried to find people who had died of COVID-19 outside hospitals and therefore had not initially been counted, looked at their records to try to find deaths that might be due to COVID-19 but that they had initially missed, and revised the number of deaths as best they could. In fact, the health authorities have also made large revisions to the number of deaths in many Western countries and jurisdictions (such as France, Italy, Spain, the UK and New York), but nobody accused them of manipulating the data. There is no doubt that such revisions will happen again as the health authorities everywhere take the time to look at the data more closely. It’s likely that even China’s revised death toll still underestimates how many people died of COVID-19, but as excess mortality analyses show, this is true almost everywhere. It wouldn’t be surprising if China’s data on the number of deaths caused by COVID-19 were somewhat worse than similar data in Western democracies, not because China manipulated them, but because China’s vital statistics are still very poor compared to what is available in the West. China has a huge population and is still relatively poor, so it didn’t even have a mortality surveillance system that provided representative data on both the total number of deaths and the distribution by cause of death until 2013, and even the system that was put in place after that remains sample-based and only covers 24% of the population.

There are other arguments in favor of the view that China manipulated the data about the epidemic, but I have discussed the most common and I don’t think it would add much value to examine them. As I said at the beginning of this section, the people who accuse China of fraud bring up so many different things that it’s not possible to address all of them, but I hope that my discussion will have convinced you that the claim that China deliberately fudged the numbers about the epidemic is not well supported. If it hasn’t, I doubt that discussing even more arguments would, which is why I won’t. I’m sure that China’s data paint a very imperfect picture of the epidemic over there, but the same thing is true everywhere and I haven’t seen any evidence to convince me that it was worse in China or, to the extent it is, that it’s because the Chinese government deliberately manipulated the data to make the way it handled the crisis look better than it actually was, rather than because it doesn’t have the same resources as richer countries with a much smaller population to monitor epidemics. I’m not saying there weren’t cases of manipulation here and there. That wouldn’t be surprising in such a large, authoritarian country with a largely decentralized structure and as much corruption as China. But the claims of large-scale manipulation strike me as totally unsubstantiated and driven entirely by prejudice against the Chinese regime.

Conclusion

In this essay, I have examined in detail the accusations that are made against China in connection with the pandemic of COVID-19 and tried to remain as impartial as I could in doing so. My conclusion is that, while mistakes were made at the beginning of the crisis and the Chinese authorities were sometimes not entirely forthcoming with information about the epidemic, so there is a grain of truth to some of the accusations, a careful review of the evidence suggests that most of them are either wildly exaggerated, totally unsubstantiated or even completely nonsensical. In particular, the claim that China is somehow responsible for the botched response in most Western countries doesn’t stand up to even the most cursory scrutiny, yet it continues to be made not just by governments trying to scapegoat China for their own incompetence but also by journalists and citizens who should be more concerned about what their own government did or did not do. In many cases, I think it’s clear that, were people not prejudiced against the Chinese regime, they would use more reasonable evidentiary standards and would reject those accusations as unsubstantiated.

Unfortunately, the prejudice against the Chinese government has become so strong and pervasive in the West, especially in the US (where China is increasingly seen as the main geopolitical foe because it threatens American hegemony), that it creates incentives that are not conducive to a careful examination of the evidence, but allow unsubstantiated allegations that cast China in a dark light to spread largely unchecked, as shown by the polls I mentioned in the introduction of this essay. Indeed, not only does this prejudice mean that people adopt a lower evidentiary standard to examine such allegations, but anyone who points out they are unsubstantiated risks being accused of being China’s dupe. For instance, when I pointed out that a tweet made by the Portuguese commentator Bruno Maçães completely misrepresented the evidence, he simply replied with a joke instead of addressing my point:

This exchange is absolutely remarkable. I catch @MacaesBruno red-handed in a lie, but instead of deleting his tweet and apologizing, he replies with a joke! The specific point he makes really speaks volumes: basically, China is bad, so who cares about the truth? Shame on him. pic.twitter.com/K0PutM824L

— Philippe Lemoine (@phl43) April 23, 2020

Of course, he didn’t address my point because he couldn’t, but the problem is that increasingly people are allowed to get away with that kind of thing, because nobody wants to lay himself open to the accusation that he is defending a regime that everybody hates.

In the course of this essay, I have highlighted several cases of people who misrepresented the evidence in a similar way, but the problem is not limited to them. It’s also that even people who know that others are misrepresenting the evidence often won’t say it, either because they don’t want to be accused of helping China or because they think the ends justify the means and that, since the Chinese regime is evil, it’s not a big deal if people lie to make China look bad. This disregard for the truth is very dangerous, particularly on the part of journalists, who are supposed to do their best to discover the truth and not let such ideological considerations get in the way. It also contributes to a feedback loop I have observed in the past few months. People think it makes sense to blame China for the pandemic because they adopt low evidentiary standards when it comes to accusations against China, which makes them hysterical about China, which in turn leads them to lower their evidentiary standards even more, which makes them believe even more nonsensical accusations against China, etc. As I pointed out repeatedly throughout this essay, in many cases, if people had just paused to consider whether the accusations against China made sense, they would have immediately realized that they did not. There are plenty of legitimate criticisms of China to be made. I’m only asking that people focus on those instead of making stuff up and that they not use a double standard when examining the claims against China.

6 thoughts

Gus says:

September 7, 2020 at 4:50 pm

Where you say:

If we compare the number of cases per 100,000 in the rest of the world with the number of cases per 100,000 not in China as a whole but in Hubei, here is what it looks like:

doesn’t the chart immediately below show that Hubei is also lower than the median?
1. Philippe Lemoine says:
  
  September 7, 2020 at 5:52 pm
  
  Oops, you are right, I just corrected what I say below the chart. It doesn’t affect my argument though, since it’s still the case that 95 countries out of 208 in the dataset have fewer cases per 100,000 and, perhaps more importantly, Hubei’s number of deaths per capita is still much higher than the world’s median. Thanks for letting me know about this.
gda53 says:

September 17, 2020 at 1:45 pm

You fail to consider the apparent shut-down of the Wuhan Lab and surroundings in October, which is certainly indicative of an “accident” occurring there. This was widely reported, but perhaps it does not fit with your narrative?

You fail to account for the virus being found in the sewers around the world (Italy, Brazil etc) as early as November.

You assert the “pangolin” theory, which was one postulated early on, but fail to report on the later debunking of that theory (see for example Journal of Medical Virology – article published on 7 July, and Guangdong Academy of Science, published in May in the journal PLOS Pathogens)

You assert that the US authorities may be lying about their findings, but seem not to recognize or accept the enormous worldwide efforts of the CCP to shape the “narrative” i.e. lie themselves and have their factotums (the WHO etc) lie for them .

None of your “evidence” consists of actual evidence that the Wuhan Lab theory is inaccurate. It’s all just conjecture and attack of easy targets. Yet you seem more ready to adhere to the CCP narrative than to Occam’s Razor.

To quote the well-known Norwegian virologist Birger Sørensen (hardly a US pawn): “I think it’s more than 90 percent certain (that it originated in a lab). It’s at least a far more probable explanation than it having developed this way in nature”

Finally, I find it peculiar that someone who is clearly so intelligent otherwise would appear on his twitter page in a mask. Surely you are aware that masks (certainly the one you are wearing) are useless (or virtually useless) against the virus.

Perhaps it’s a signalling device? Or perhaps you want to hide your face from the twitter world?
Doryphore says:

September 20, 2020 at 12:30 am

“Since the case fatality rate is just the number of deaths divided by the number of cases,…”

I think you meant to say ‘divided by number of “confirmed” cases’? Divided by number of total cases would be the IFR, not CFR.
1. gda53 says:
  
  September 20, 2020 at 7:10 pm
  
  Fauci made the same “mistake” in his testimony to Congress in March. It’s what launched the campaigns for social distancing, lockdowns, and shelter-in-place orders to varying degrees nationwide.
  
  Lockdowns which were against the “science” before the China virus suddenly were the rage. Now, with hindsight (admittedly) we know (from places like Sweden, Florida, Arizona and Texas) that radical shutdowns really don’t work.
  
  Fauci (friend of China and the WHO) strikes again.
  
  Was it just a stupid error from someone who should be put out to pasture, or was it something more sinister? Given Fauci’s ideological and pecuniary connections to the left, it’s not an unfair question to ponder.
  
  Unlike the probability of a lab origin for the virus, which I would estimate at 80-90%, I would likely put this well under 50%. The probability that Fauci was just a stooge (useful idiot) or incompetent is much higher.
  
  I guess our host is too embarrassed to comment on my last post? Given his apparent non-loon ideological bent, the insistence on a non-lab origin for the Wuhan virus is……peculiar.
  
  It’s OK to admit when you’re wrong – I was wrong about Sweden.
noneandnoone says:

October 2, 2020 at 5:50 am

You really needed to write up a summary. A lot of the essay is not useful or applicable. The bottom line is this. The only real comparison one can make is between equivalent population densities. You cannot base it on countries. You touch on this but sort of ignore it in your conclusions while you revert back to scales that include populations that clearly arent comparable.

The only x factor in all of this is inherent social behavior. But the Chinese aren’t particularly hands off. They are even more touchy than westerners and far more touchy than the Japanese though less touchy than Middle Eastern people, Hispanics and possible eastern Euros. Still this doesnt really help their case.

Those 2 things above are really the biggest factors when calculating anything like a viral outbreak. Why? Because human behavior is not as different as many self important people like to believe. Humans are very similar overall. Much more alike in behavior than different.

Sometimes, when you focus too much on the rain drops, you end up missing the entire ocean.


Nec Pluribus Impar

Just because everyone says something doesn't mean it's true

Did China lie about COVID-19? – Did China fudge the data? – Part 5
