How to estimate R and k from cluster size data using Markov chain Monte Carlo?
Author
Adam Kucharski
Published
November 1, 2024
Ingredients
Use Bayesian estimation methods to estimate the reproduction number (\(R\)) and extent of superspreading, represented by the dispersion of a negative binomial distribution for individual-level seconday cases (\(k\)), from data on MERS-CoV- outbreak clusters.
We will use Markov chain Monte Carlo (MCMC), specifically a simple Metropolis-Hastings algorithm to estimate the parameters.
Steps in code
# Load required packageslibrary(epichains)library(MCMCpack)# Define datamers_clusters =c(rep(1,27),c(2,2),c(3,4),c(4,3),c(5,2),7,13,26)# Show summary table of frequenciesfreq_df <-as.data.frame(table(mers_clusters))names(freq_df) <-c("Cluster size", "Frequency")# Frequencies of MERS Clustersfreq_df#> Cluster size Frequency#> 1 1 27#> 2 2 3#> 3 3 2#> 4 4 2#> 5 5 1#> 6 7 1#> 7 13 1#> 8 26 1# Define likelihood functionlik_function <-function(param) {# Ensure positive parametersif (any(param <=0)) return(-Inf)# Extract values of R and k r_val <-as.numeric(param[1]) k_val <-as.numeric(param[2])# Define likelihood log_likelihood <- epichains::likelihood(chains = mers_clusters,statistic ="size",offspring_dist = rnbinom,size = k_val,mu = r_val )# Assume non-informative priors for R and k log_prior <-0# But could add informative priors here if required# Return log-posterior (log-likelihood + log-prior)return(log_likelihood + log_prior)}# Define number of MCMC iterationsn_iter <-1e4# Define 'burn in' period for fitting, to be discardedn_burn <-1e3# Initial guess for c(R,k):init_param <-c(R=0.5, k=0.5)# Run MCMC to estimate parametersresult_mcmcpack <- MCMCpack::MCMCmetrop1R( lik_function, theta.init = init_param, burnin = n_burn, mcmc = n_iter, thin =1)#> #> #> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#> The Metropolis acceptance rate was 0.57318#> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@# Calculate effective sample size (i.e. measure of MCMC mixing)ess_mcmcpack <- coda::effectiveSize(result_mcmcpack)# Plot posterior estimates# plot(result_mcmcpack)# Define helper function to calculate median and 95% credible interval# from data.frame of MCMC samplesget_param <-function(x){apply(x,2,function(y){ val <-signif(quantile(y,c(0.5,0.025,0.975)),3) val_text <-paste0(val[1], " (95%: CrI: ", val[2], "-", val[3], ")")})}# Get posterior median and 95% CrIposterior_estimates <-get_param(result_mcmcpack)# Compile tableresults_table <-data.frame(Package ="MCMCpack",Posterior_R = posterior_estimates[1],Posterior_k = posterior_estimates[2],ESS_R = ess_mcmcpack[1],ESS_k = ess_mcmcpack[2])# Output the table with kableknitr::kable(results_table, caption ="MCMC Comparison Table", align ='c')
The {epichains} package is loaded for the likelihood functions and {MCMCpack} for MCMC fitting.
The likelihood is defined using the likelihood() function in epichains, with a negative binomial used (rnbinom). This allows us to define the likelihood of observing a specific cluster size distribution, assuming the \(R\) and \(k\)
The MCMC is run using MCMCmetrop1R() from {MCMCpack}, with number of iterations mcmc and burn in period burnin specified. {MCMCpack} is an R package for Bayesian statistical inference through Markov Chain Monte Carlo (MCMC) methods, offering a broad array of algorithms and models for efficient and straightforward Bayesian estimation.
Finally, we output a parameter estimate table with {kable}.