A quick look at Mumbai’s seroprevalence and fatality data


(Murad Banaji, 31/07/2020)

When we try to interpret Mumbai’s seroprevalence and fatality data in the light of Spain’s age-stratified infection fatality rate (IFR) values, we are led to some interesting observations. Broadly, Mumbai appears to have both unexpectedly high and unexpectedly low fatality rates in different age groups.

Spain’s IFR data is taken as reported here.

Mumbai’s age structure is taken from here. The number in the age-bracket 75-80 is guessed to be 70% of that in the age bracket 70-75.

Seroprevalence data is taken from this survey report.

Age-stratified COVID-19 fatality data is available for Mumbai for July 29th on p19 of this MCGM document. Assuming three weeks from infection to death recording, fatalities recorded by July 29th should reflect infections which had occurred by July 8th, and so it is prevalence on this date that we need.
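The date arithmetic here can be checked with a minimal sketch. The variable names are illustrative, and the three-week lag is the assumption stated above, not a measured quantity.

```python
from datetime import date, timedelta

# Assumption from the text: three weeks from infection to death recording.
INFECTION_TO_DEATH_RECORDING = timedelta(weeks=3)

fatality_data_date = date(2020, 7, 29)  # date of the MCGM fatality data

# Infections that could be reflected in these fatalities occurred by this date.
infection_cutoff = fatality_data_date - INFECTION_TO_DEATH_RECORDING

print(infection_cutoff.isoformat())  # 2020-07-08
```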

Prevalence is taken to be 38% in Mumbai by July 8th. This calculation is based on the following data. The headline claim of Mumbai’s seroprevalence study is that 57 per cent of those surveyed in slums and 16 per cent of those surveyed in non-slum areas were IgG seropositive. Further, it is assumed that 41.3% of the population resides in slums. A simple calculation (0.57*0.413 + 0.16*0.587) gives a seroprevalence figure in early-mid July of 33%. However, the authors of the study stress that their prevalence estimates do not take into account the sensitivity of the test (given as 93%). Assuming 100% specificity as given by the authors, each figure is corrected by dividing by the sensitivity: 57% seropositivity in the slums becomes 61%; 16% seropositivity in non-slum areas becomes 17%; and overall 33% seropositivity becomes 35%. Assuming a two-week delay between infection and seropositivity on average, the 35% seropositivity corresponds to 35% prevalence around June 24th. Taking into account the trajectory of the epidemic, a little modelling then gives a prevalence of about 38% by July 8th.
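The weighted average and the sensitivity correction can be reproduced with a short sketch. It uses only the figures quoted above; the variable and function names are my own, and the correction (dividing by sensitivity) is valid only under the stated assumption of 100% specificity. The final step from 35% to 38% relies on modelling that is not reproduced here.

```python
# Figures quoted in the text from the serosurvey report.
slum_seropositive = 0.57
non_slum_seropositive = 0.16
slum_population_share = 0.413  # assumed share of residents living in slums
test_sensitivity = 0.93        # specificity is assumed to be 100%


def correct_for_sensitivity(raw_rate, sensitivity):
    """With perfect specificity, true rate = observed positive rate / sensitivity."""
    return raw_rate / sensitivity


# Population-weighted average of slum and non-slum seropositivity.
raw_overall = (slum_seropositive * slum_population_share
               + non_slum_seropositive * (1 - slum_population_share))

corrected_slum = correct_for_sensitivity(slum_seropositive, test_sensitivity)
corrected_non_slum = correct_for_sensitivity(non_slum_seropositive, test_sensitivity)
corrected_overall = correct_for_sensitivity(raw_overall, test_sensitivity)

print(f"raw overall:        {raw_overall:.0%}")         # 33%
print(f"corrected slum:     {corrected_slum:.0%}")      # 61%
print(f"corrected non-slum: {corrected_non_slum:.0%}")  # 17%
print(f"corrected overall:  {corrected_overall:.0%}")   # 35%
```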

Mumbai’s population is taken to be 13.5 million, an increase of 8% from the 2012 values given here.

Data from the survey does not really allow us to infer how prevalence was distributed across age groups. The report here suggests roughly constant prevalence across age groups in the slums, and a slight drop in prevalence with age in the non-slum areas, from about 16% in the 25-60 age group down to 12.6% in the 60+ age group. However, prevalence in older groups may be considerably lower than in younger groups across the city as a whole since, presumably, the slums with high prevalence have a different age structure, skewed towards the younger end. So far, I have not been able to find a credible age pyramid for Mumbai’s slums.

Crude IFR estimate for Mumbai from seroprevalence and fatality data. A naive calculation of infection fatality rate (IFR) proceeds as follows. With the assumptions above, the 35% seroprevalence measured around July 8th reflects infections up to around June 24th, and these infections are reflected in fatalities recorded by July 15th, by which time there had been 5464 recorded COVID-19 deaths in Mumbai. From this we get an IFR of 5464/(0.35*13500000) = 0.12%. Rounded to 2 d.p., this is the same value as I estimated earlier, although there I had used the uncorrected values of prevalence.
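As a quick check of the arithmetic, the crude IFR can be computed directly from the three numbers quoted above. This is a minimal sketch with variable names of my own choosing.

```python
recorded_deaths = 5464           # recorded COVID-19 deaths by July 15th
population = 13_500_000          # assumed population of Mumbai
corrected_seroprevalence = 0.35  # sensitivity-corrected seroprevalence

# Estimated number of people infected, then deaths per infection.
estimated_infections = corrected_seroprevalence * population
crude_ifr = recorded_deaths / estimated_infections

print(f"crude IFR: {crude_ifr:.2%}")  # 0.12%
```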

Crude IFR estimate for Mumbai from Spanish seroprevalence and IFR data, combined with Mumbai’s age structure. We can match Mumbai’s age structure with Spain’s reported age-dependent IFR values to get an IFR value for Mumbai of 0.22%. This assumes that all age groups are equally likely to get infected.
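Under the equal-infection assumption, the population-level IFR is just the population-weighted average of the age-specific IFRs. The sketch below shows one way to compute it. The function name and the dict-based inputs are hypothetical, and no actual Spanish or Mumbai values are included, since the source tables are not reproduced in this post.

```python
def age_weighted_ifr(population_by_age, ifr_by_age):
    """Population-level IFR if every age group has the same infection rate.

    Both arguments are dicts keyed by the same age-bracket labels:
    population_by_age holds head counts, ifr_by_age holds IFRs as fractions.
    """
    if population_by_age.keys() != ifr_by_age.keys():
        raise ValueError("age brackets must match")
    total_population = sum(population_by_age.values())
    # With a uniform infection rate, the rate cancels out of deaths / infections,
    # leaving a weighted average of the age-specific IFRs.
    weighted_sum = sum(population_by_age[age] * ifr_by_age[age]
                       for age in population_by_age)
    return weighted_sum / total_population
```

Calling this with Mumbai’s head counts per age bracket and Spain’s IFR per bracket, using matching bracket labels, should give the kind of figure quoted above.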

A crude comparison between the two IFR estimates suggests either that

  1. Approximately 46% of Mumbai’s fatalities have been missed from official figures; or
  2. Mumbai genuinely has lower IFR (perhaps across age groups) than would be expected from Spain’s values; or
  3. a combination of these effects.

But a closer look reveals something more complex. To see this, we’ll see what we can infer from two assumptions. Neither assumption is plausible in itself, but exploring them is still useful.

Scenario 1. As a first thought experiment let’s suppose that Mumbai’s prevalence is uniformly distributed across all age groups. In other words, about 38% were infected in each age group by July 8th. In this case, multiplying the Spanish IFR values by the assumed numbers of infected individuals, we find that there are too many fatalities in the middle age groups (age 40-50: 66% more fatalities than expected, and age 50-60: 60% more fatalities than expected); and not enough in the high ones (age 70-80: 64% fewer fatalities than expected; and age 80+: 87% fewer fatalities than expected).

These calculations give an expected number of fatalities of about 11,506 (as compared to the observed 6244), implying a COVID-19 fatality undercount of about 46% (consistent with the IFR value of 0.22% above). But this aggregate value actually hides an apparently much higher IFR than Spain’s in the middle age groups, and a much lower IFR than expected in older groups. Simply put, in this scenario, the 40-60 age group is facing worse outcomes in Mumbai than in Spain, and the opposite is true for people who are 70+.
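The aggregate figures can be checked against each other with a short sketch. It uses only the totals quoted above, with variable names of my own.

```python
expected_deaths = 11_506  # expected under Scenario 1 with Spain's IFR values
observed_deaths = 6_244   # recorded COVID-19 deaths used in this comparison
population = 13_500_000   # assumed population of Mumbai
prevalence = 0.38         # assumed uniform prevalence by July 8th

# Share of expected deaths that do not appear in the recorded figure.
implied_undercount = 1 - observed_deaths / expected_deaths

# IFR implied by the expected deaths, for comparison with the 0.22% figure.
implied_ifr = expected_deaths / (prevalence * population)

print(f"implied undercount: {implied_undercount:.0%}")  # 46%
print(f"implied IFR:        {implied_ifr:.2%}")         # 0.22%
```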

One conclusion from these numbers is that the crude estimate of 46% of fatalities missed is misleading: overcounting of COVID-19 deaths on a significant scale is very unlikely, so the excess of recorded deaths in the middle age groups cannot be explained by errors in recording.

However, this scenario is based on assuming uniform prevalence across age groups. As mentioned above, the very different prevalence in slum vs. non-slum areas, and the likely younger demographic in slum areas compared to non-slum areas would imply increased prevalence in younger age groups and reduced prevalence in older groups. So, let’s consider another scenario.

Scenario 2. Let’s suppose that all fatalities have been recorded in Mumbai and ask what this would imply about prevalence in different age groups if we use Spain’s age-stratified IFR values. We just take the deaths in a given age group, divide by the IFR to get the estimated number of infections, and then divide by the population of that age group to get estimated prevalence. From these calculations, we would expect high prevalence in the age groups 40-50 (63%) and 50-60 (61%), and much lower prevalence in the age groups 70-80 (14%) and particularly 80+ (5%). These calculations give a population prevalence of 34%, not far from the measured value from the serosurvey.
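These implied prevalences are consistent with the Scenario 1 percentages. Expected deaths in Scenario 1 were computed at 38% prevalence, so the prevalence implied by the observed deaths is 38% scaled by the ratio of observed to expected deaths. The sketch below applies this to the four ratios quoted in Scenario 1; the dictionary and variable names are my own.

```python
baseline_prevalence = 0.38  # uniform prevalence assumed in Scenario 1

# Observed deaths relative to Scenario 1 expectations, as quoted in the text
# ("66% more" -> 1.66, "60% more" -> 1.60, "64% fewer" -> 0.36, "87% fewer" -> 0.13).
observed_to_expected = {
    "40-50": 1.66,
    "50-60": 1.60,
    "70-80": 0.36,
    "80+": 0.13,
}

# implied prevalence = deaths / (IFR * population)
#                    = baseline prevalence * (observed deaths / expected deaths)
implied_prevalence = {
    age: baseline_prevalence * ratio
    for age, ratio in observed_to_expected.items()
}

for age, prevalence in implied_prevalence.items():
    print(f"{age}: {prevalence:.0%}")
```

The results round to 63%, 61%, 14% and 5%, matching the values given for Scenario 2.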

However, several of the values are inconsistent with the serosurvey data. For example, prevalence of >60% in the age group 40-60 is inconsistent with an approximate prevalence of 60% in this age group in the slums and 16% in non-slum areas (values given in the survey report), since the citywide figure must lie between the two. Similarly, prevalence of only about 5% in the over 80s would imply a similarly low prevalence in at least the non-slum areas, which seems highly unlikely (although the seroprevalence data is not given specifically for the over 80s).

Conclusions. Although both experiments turn out to be problematic, they do allow us to say some things with confidence:

  1. Deaths in the 40-60 age range considerably exceed what we would expect using IFR values from Spain. Mumbai’s true IFR in this age range is higher than Spain’s, even without taking into account undercounting of deaths. If such undercounting is significant then the values would go up further. This should be a matter of concern. We should ask: is this about comorbidities in this age group? Is it about unavailability of healthcare? Or are there factors we do not yet fully understand?
  2. Deaths in the 70+ age group are much lower than expected from Spain’s IFR values, even if a significant majority in this age group live in the non-slum areas where prevalence is lower. Either there are specific reasons why fatality rates in this age group should be lower in India than in Spain, even while fatality rates in younger age groups are higher; or there has been significant fatality undercounting amongst elderly COVID-19 patients, perhaps with other underlying health conditions. If/when excess mortality data is made available for Mumbai during this period, it will be important to see its age stratification to check if, indeed, deaths of the elderly are being missed. To the extent that fatalities are low in this age group because the majority reside in low prevalence areas, this should also be a cause for concern, because it means that a large number of people who may potentially have serious outcomes are still vulnerable to infection despite the high prevalence in the city as a whole.

Putting all of this together, Mumbai’s data leads us to some very tentative but worrying conclusions which need to be matched against data from other cities.

  • First, the narrative of low fatality rates in India hides a picture which varies with age. In fact, if Mumbai’s data is anything to go by, India’s fatality rates may be higher than European ones in the age group 40-60.
  • Second, the bulk of the effect leading to apparently low IFR in India may be coming from the relatively low recorded fatalities amongst the elderly. Either there is some hidden variable leading to the elderly in India being somehow less vulnerable to COVID-19; or this is the age group in which fatality undercounting is occurring on the largest scale. This could be consistent with omission of fatalities from official statistics on account of comorbidities; and with families preferring not to have deaths of elderly relatives registered as COVID-19 deaths, of which there are anecdotal reports.

There are three kinds of data which would help to clarify the picture for Mumbai. 1) Accurate age-structure data for Mumbai’s slums and non-slum areas independently. 2) More granular age-stratified seroprevalence data. 3) Age-stratified excess mortality data for Mumbai.