Calculating a static, delay-adjusted estimate of disease severity
Source:vignettes/estimate_static_severity.Rmd
estimate_static_severity.Rmd
Understanding disease severity, and especially the case fatality risk
(CFR), is key to outbreak response. During an outbreak there is often a
delay between cases being reported, and the outcomes (for CFR, deaths)
of those cases being known, and accounting for these leads to better
estimates of CFR. cfr_static()
can be used to calculate a
static estimate of the severity of an outbreak using methods from Nishiura et al. (2009) while accounting for the
distribution of reporting delays.
New to calculating disease severity using cfr? You might want to see the “Get started” vignette first.
Use case
We want a static estimate of the severity of an outbreak in the form of the case fatality risk (CFR) while correcting for the delay in reporting the outcomes of cases.
What we have
- A time-series of cases and deaths, (cases may be substituted by another indicator of infections over time);
- Data on the distribution of delays, describing the probability an individual will die days after they were initially infected.
First load cfr and packages to access and plot data.
Severity of the 1976 Ebola Outbreak
This example data comes from the 1976 Ebola virus disease (EVD, or Ebola) outbreak in the Democratic Republic of the Congo (Camacho et al. 2014).
We focus on the roughly the first half of this dataset, by subsetting the data so that we only include days before 30th September, 1976.
data("ebola1976")
# view the first few rows
head(ebola1976)
#> date cases deaths
#> 1 1976-08-25 1 0
#> 2 1976-08-26 0 0
#> 3 1976-08-27 0 0
#> 4 1976-08-28 0 0
#> 5 1976-08-29 0 0
#> 6 1976-08-30 0 0
df_ebola_subset <- filter(ebola1976, date <= "1976-09-30")
Onset-to-death delay distribution
We retrieve the parameters of the distribution of durations (also called delays) between the onset of EVD symptoms and death from the literature (Barry et al. 2018). This is a Gamma distribution with shape = 2.40 and scale = 3.33.
Note that while we shall use a continuous distribution here, it is more appropriate to use a discrete distribution instead as we are working with daily data. See the vignette on delay distributions for more on when using a continuous instead of discrete distribution is acceptable, and on using discrete distributions with cfr.
Note also that we use the central estimates for each distribution parameter, and by ignoring uncertainty in these parameters the uncertainty in the resulting CFR is likely to be underestimated.
Intermediate step: Estimating cases with known outcomes
The function estimate_outcomes()
estimates the number of
cases whose outcomes are expected to be known by each day
of an outbreak, given a time-series of case onsets and the distribution
of delays between symptom onset and case outcome. In the context of CFR
estimation, the delay distribution is usually an ‘onset-to-death’
distribution.
The resulting data frame contains two new columns, “estimated_outcomes”, for the number of cases whose outcomes are expected to be known on each day, and “u_t” for the ratio of cumulative number of cases estimated to have known outcomes and the cumulative number of cases reported until each date specified in data.
# calculate known death outcomes
df_estimated_outcomes_ebola <- estimate_outcomes(
data = df_ebola_subset,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
# print head of data frame
head(df_estimated_outcomes_ebola)
#> date cases deaths estimated_outcomes u_t
#> 1 1976-08-25 1 0 0.00000000 0.00000000
#> 2 1976-08-26 0 0 0.03323030 0.03323030
#> 3 1976-08-27 0 0 0.06494676 0.09817706
#> 4 1976-08-28 0 0 0.08485286 0.18302992
#> 5 1976-08-29 0 0 0.09400738 0.27703731
#> 6 1976-08-30 0 0 0.09515185 0.37218916
# print tail of data frame
tail(df_estimated_outcomes_ebola)
#> date cases deaths estimated_outcomes u_t
#> 32 1976-09-25 11 8 7.787856 0.5283224
#> 33 1976-09-26 7 11 8.301252 0.5563894
#> 34 1976-09-27 7 7 8.625233 0.5840532
#> 35 1976-09-28 4 13 8.761617 0.6207698
#> 36 1976-09-29 4 12 8.659704 0.6552761
#> 37 1976-09-30 3 9 8.371787 0.6904737
The estimated outcomes are lower than the number of cases at the
beginning of an outbreak as the case outcomes are only likely to become
known some days later; this also means that u_t
is likely
to be lower than 1.0 early in the outbreak. The u_t
for an
outbreak that has ended will be much closer to 1.0, as the outcomes of
all reported cases are expected to be known (depending on the quality of
outbreak monitoring). This depends on the distribution of delays between
onset and outcome, and the ratio u_t
will ‘catch up’ to 1.0
faster when the onset-to-outcome delay is short.
Note that the period between onset and death may be shorter than the period between onset and full recovery, and should not be considered equivalent. For CFR estimation, we are primarily interested in the former as the goal is estimating severity in the form of a fatality risk (or ratio for past outbreaks).
Note that estimate_outcomes()
is
exported but is primarily intended for internal use.
Estimating the naive and corrected CFR
The function cfr_static()
wraps the internal function
estimate_outcomes()
to provide a static CFR by
automatically correcting for reporting delays if a delay density
function is provided.
# calculating the naive CFR
cfr_static(
data = df_ebola_subset
)
#> severity_estimate severity_low severity_high
#> 1 0.7197802 0.6485503 0.7836984
# calculating the corrected CFR
cfr_static(
df_ebola_subset,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> severity_estimate severity_low severity_high
#> 1 NA NA NA
Severity estimation methods
cfr_static()
automatically chooses a method of
calculating the severity estimate, based on whether delay correction is
applied, and further when delay correction is applied, depending on the
total number of cases and an initial rough estimate of the severity.
Naive estimate: When delay correction is not applied, the CFR is the ratio of total deaths to total cases, and the confidence interval is given by a Binomial test using
stats::binom.test()
where total deaths are successes, total cases are trials, and the hypothesised success is 1.0Delay correction, small outbreaks: For outbreaks where the total cases are below the user-specified ‘Poisson threshold’ (
poisson_threshold
, default = 100), the CFR and uncertainty around it is taken from a profile likelihood generated from a Binomial model of deaths (successes) and estimated known outcomes (trials).Delay correction, large outbreaks with low severity: For outbreaks with total cases greater than the Poisson threshold (default = 100) and with initial severity estimates < 0.05, the CFR and uncertainty are taken from a Poisson approximation of the Binomial profile likelihood (taking = for estimated outcomes and as the severity estimate).
Delay correction, large outbreaks with higher severity: For outbreaks with total cases greater than the Poisson threshold (default = 100) and with initial severity estimates 0.05, the CFR and uncertainty are taken from a Normal approximation of the Binomial profile likelihood.
Severity of the COVID-19 pandemic in the U.K.
This example shows static severity estimation using cfr and data from the Covid-19 pandemic in the United Kingdom.
We load example Covid-19 daily case and death data provided with the
cfr package as covid_data
, and subset for the
first year of U.K. data.
# get Covid data loaded with the package
data("covid_data")
# filter for the U.K
df_covid_uk <- filter(
covid_data,
country == "United Kingdom", date <= "2020-12-31"
)
# View the first few rows and recall necessary columns: date, cases, deaths
head(df_covid_uk)
#> date country cases deaths
#> 1 2020-01-03 United Kingdom 0 0
#> 2 2020-01-04 United Kingdom 0 0
#> 3 2020-01-05 United Kingdom 0 0
#> 4 2020-01-06 United Kingdom 0 0
#> 5 2020-01-07 United Kingdom 0 0
#> 6 2020-01-08 United Kingdom 0 0
Onset-to-death distribution for Covid-19
We retrieve the appropriate distribution for Covid-19 from Linton et al. (2020); this is a lognormal distribution with = 2.577 and = 0.440.
Note that Linton et al. (2020) fitted a discrete lognormal distribution and we use a continuous distribution, and that we are ignoring uncertainty in the distribution parameters and hence likely under-estimating uncertainty in the CFR.
Estimating the naive and corrected CFR
Finally, we calculate the naive and corrected CFRs for the Covid-19 pandemic in the U.K.
# calculating the naive CFR
cfr_static(
df_covid_uk
)
#> severity_estimate severity_low severity_high
#> 1 0.03640132 0.03617238 0.0366313
# calculating the corrected CFR
cfr_static(
df_covid_uk,
delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)
#> severity_estimate severity_low severity_high
#> 1 0.0465 0.0462 0.0467
Details: Adjusting for delays between two time series
cfr_static()
follows Nishiura et
al. (2009) to calculate a
quantity
for each day within the input data which represents the proportion of
cases with a known adverse outcome (usually death) on day
.
where is the value of the probability mass function at time , and , are the number of new cases and new deaths at time (respectively). We then use in the following likelihood function to estimate severity.
$$ {\sf L}(\theta | C_{t},D_{t},u_{t}) = \log{\dbinom{u_{t}C_{t}}{D_{t}}} + D_{t} \log{\theta} + (u_{t}C_{t} - D_{t})\log{(1 - \theta)}, $$
and are the cumulative number of cases and deaths (respectively) until time .
Lastly (severity) is estimated using simple maximum-likelihood methods, allowing the functions within this package to be quick and easy tools to use.
The precise severity measure — CFR, IFR, HFR, etc — that represents depends upon the input data given by the user.
Case fatality risk (CFR) requires case and death incidence data, with a case-to-death delay distribution (or close approximation, such as symptom onset-to-death).
Infection fatality risk (IFR) requires infection and death incidence data, with an exposure-to-death delay distribution (or close approximation).
Hospitalisation Fatality Risk (HFR) requires hospitalisation and death incidence data, and the appropriate delay distribution (or close approximation).