Estimate R and k from cluster size data using Markov chain Monte Carlo
Author
Adam Kucharski
Published
June 26, 2025
What do we have?
Data on MERS-CoV- outbreak clusters
Steps in code
# Load required packageslibrary(epichains)library(MCMCpack)# Define datamers_clusters =c(rep(1,27),c(2,2),c(3,4),c(4,3),c(5,2),7,13,26)# Show summary table of frequenciesfreq_df <-as.data.frame(table(mers_clusters))names(freq_df) <-c("Cluster size", "Frequency")# Frequencies of MERS Clustersfreq_df#> Cluster size Frequency#> 1 1 27#> 2 2 3#> 3 3 2#> 4 4 2#> 5 5 1#> 6 7 1#> 7 13 1#> 8 26 1# Define likelihood functionlik_function <-function(param) {# Ensure positive parametersif (any(param <=0)) return(-Inf)# Extract values of R and k r_val <-as.numeric(param[1]) k_val <-as.numeric(param[2])# Define likelihood log_likelihood <- epichains::likelihood(chains = mers_clusters,statistic ="size",offspring_dist = rnbinom,size = k_val,mu = r_val )# Assume non-informative priors for R and k log_prior <-0# But could add informative priors here if required# Return log-posterior (log-likelihood + log-prior)return(log_likelihood + log_prior)}# Define number of MCMC iterationsn_iter <-1e4# Define 'burn in' period for fitting, to be discardedn_burn <-1e3# Initial guess for c(R,k):init_param <-c(R=0.5, k=0.5)# Run MCMC to estimate parametersresult_mcmcpack <- MCMCpack::MCMCmetrop1R( lik_function, theta.init = init_param, burnin = n_burn, mcmc = n_iter, thin =1)#> #> #> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#> The Metropolis acceptance rate was 0.57318#> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@# Calculate effective sample size (i.e. measure of MCMC mixing)ess_mcmcpack <- coda::effectiveSize(result_mcmcpack)# Plot posterior estimates# plot(result_mcmcpack)# Define helper function to calculate median and 95% credible interval# from data.frame of MCMC samplesget_param <-function(x){apply(x,2,function(y){ val <-signif(quantile(y,c(0.5,0.025,0.975)),3) val_text <-paste0(val[1], " (95%: CrI: ", val[2], "-", val[3], ")")})}# Get posterior median and 95% CrIposterior_estimates <-get_param(result_mcmcpack)# Compile tableresults_table <-data.frame(Package ="MCMCpack",Posterior_R = posterior_estimates[1],Posterior_k = posterior_estimates[2],ESS_R = ess_mcmcpack[1],ESS_k = ess_mcmcpack[2])# Output the table with kableknitr::kable(results_table, caption ="MCMC Comparison Table", align ='c')
MCMC Comparison Table
Package
Posterior_R
Posterior_k
ESS_R
ESS_k
var1
MCMCpack
0.63 (95%: CrI: 0.439-0.954)
1.13 (95%: CrI: 0.146-15.4)
440.8191
7.772885
Steps in detail
Use Bayesian estimation methods to estimate the reproduction number (\(R\)) and extent of superspreading, represented by the dispersion of a negative binomial distribution for individual-level seconday cases (\(k\)), from data on MERS-CoV- outbreak clusters.
We will use Markov chain Monte Carlo (MCMC), specifically a simple Metropolis-Hastings algorithm to estimate the parameters.
The {epichains} package is loaded for the likelihood functions and {MCMCpack} for MCMC fitting.
The likelihood is defined using the likelihood() function in epichains, with a negative binomial used (rnbinom). This allows us to define the likelihood of observing a specific cluster size distribution, assuming the \(R\) and \(k\)
The MCMC is run using MCMCmetrop1R() from {MCMCpack}, with number of iterations mcmc and burn in period burnin specified. {MCMCpack} is an R package for Bayesian statistical inference through Markov Chain Monte Carlo (MCMC) methods, offering a broad array of algorithms and models for efficient and straightforward Bayesian estimation.
Finally, we output a parameter estimate table with {kable}.