Estimate R and k from cluster size data using Markov chain Monte Carlo

Author

Adam Kucharski

Published

June 26, 2025

What do we have?

Data on MERS-CoV- outbreak clusters

Steps in code

# Load required packages
library(epichains)
library(MCMCpack)

# Define data
mers_clusters = c(rep(1,27),c(2,2),c(3,4),c(4,3),c(5,2),7,13,26)

# Show summary table of frequencies
freq_df <- as.data.frame(table(mers_clusters))
names(freq_df) <- c("Cluster size", "Frequency")

# Frequencies of MERS Clusters
freq_df
#>   Cluster size Frequency
#> 1            1        27
#> 2            2         3
#> 3            3         2
#> 4            4         2
#> 5            5         1
#> 6            7         1
#> 7           13         1
#> 8           26         1

# Define likelihood function
lik_function <- function(param) {
  # Ensure positive parameters
  if (any(param <= 0)) return(-Inf)
  
  # Extract values of R and k
  r_val <- as.numeric(param[1])
  k_val <- as.numeric(param[2])

  # Define likelihood
  log_likelihood <- epichains::likelihood(
    chains = mers_clusters,
    statistic = "size",
    offspring_dist = rnbinom,
    size = k_val,
    mu = r_val
  )
  
  # Assume non-informative priors for R and k
  log_prior <- 0 # But could add informative priors here if required

  # Return log-posterior (log-likelihood + log-prior)
  return(log_likelihood + log_prior)
}
# Define number of MCMC iterations
n_iter <- 1e4

# Define 'burn in' period for fitting, to be discarded
n_burn <- 1e3

# Initial guess for c(R,k):
init_param <- c(R=0.5, k=0.5)
# Run MCMC to estimate parameters
result_mcmcpack <- MCMCpack::MCMCmetrop1R(
  lik_function, 
  theta.init = init_param, 
  burnin = n_burn, 
  mcmc = n_iter, 
  thin = 1
)
#> 
#> 
#> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
#> The Metropolis acceptance rate was 0.57318
#> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Calculate effective sample size (i.e. measure of MCMC mixing)
ess_mcmcpack <- coda::effectiveSize(result_mcmcpack)

# Plot posterior estimates
# plot(result_mcmcpack)

# Define helper function to calculate median and 95% credible interval
# from data.frame of MCMC samples
get_param <- function(x){
  apply(x,2,function(y){
    val <- signif(quantile(y,c(0.5,0.025,0.975)),3)
    val_text <- paste0(val[1], " (95%: CrI: ", val[2], "-", val[3], ")")})
}

# Get posterior median and 95% CrI
posterior_estimates <- get_param(result_mcmcpack)

# Compile table
results_table <- data.frame(
  Package = "MCMCpack",
  Posterior_R = posterior_estimates[1],
  Posterior_k = posterior_estimates[2],
  ESS_R = ess_mcmcpack[1],
  ESS_k = ess_mcmcpack[2]
)

# Output the table with kable
knitr::kable(results_table, caption = "MCMC Comparison Table", align = 'c')

MCMC Comparison Table
	Package	Posterior_R	Posterior_k	ESS_R	ESS_k
var1	MCMCpack	0.63 (95%: CrI: 0.439-0.954)	1.13 (95%: CrI: 0.146-15.4)	440.8191	7.772885

Steps in detail

Use Bayesian estimation methods to estimate the reproduction number (\(R\)) and extent of superspreading, represented by the dispersion of a negative binomial distribution for individual-level seconday cases (\(k\)), from data on MERS-CoV- outbreak clusters.
We will use Markov chain Monte Carlo (MCMC), specifically a simple Metropolis-Hastings algorithm to estimate the parameters.
We use MERS cluster sizes from Cauchemez et al, Lancet Inf Dis, 2013.
The {epichains} package is loaded for the likelihood functions and {MCMCpack} for MCMC fitting.
The likelihood is defined using the likelihood() function in epichains, with a negative binomial used (rnbinom). This allows us to define the likelihood of observing a specific cluster size distribution, assuming the \(R\) and \(k\)
The MCMC is run using MCMCmetrop1R() from {MCMCpack}, with number of iterations mcmc and burn in period burnin specified. {MCMCpack} is an R package for Bayesian statistical inference through Markov Chain Monte Carlo (MCMC) methods, offering a broad array of algorithms and models for efficient and straightforward Bayesian estimation.
Finally, we output a parameter estimate table with {kable}.