finalsize provides methods to calculate the final proportion of individuals in different age groups infected during an epidemic, as predicted by an age-stratified SIR model. In this vignette we lay out the mathematical concepts behind the functionality available in the package.

## From SIR to final size with homogeneous mixing

The dynamics of the proportion of susceptible and infectious individuals over time in the standard susceptible-infectious-recovered (SIR) epidemic model are given by the following ordinary differential equations: $\frac{d S(t)}{d t} = -\beta S(t) I(t) \\ \frac{d I(t)}{d t} = \beta S(t) I(t) -\gamma I(t)$ where $$\beta$$ is the transmission rate and $$\gamma$$ is the rate of recovery.

To calculate the proportion infected, we need to derive an expression for the proportion of susceptibles who got infected during the epidemic, i.e. $$\phi = 1- S(\infty)/S(0)$$. Rather than simulating the model over a very long time period (i.e. $$t \rightarrow \infty$$), we can instead calculate $$dI/dS$$ and integrate:

$\frac{d I}{d S} = - 1 + \frac{\gamma}{\beta S} \\ I(t) = - S(t) + \frac{\gamma}{\beta} \log S(t) + c$ where $$c$$ depends on initial conditions. Under the assumption the population is fully susceptible initially (i.e. $$S(0)=1$$), and $$I(t) \rightarrow 0$$ as $$t \rightarrow \infty$$, we obtain the following: $\log S(\infty) = - R_0 [1-S(\infty) ] \\ \mathrm{log}~\phi -R_0 (1-\phi) = 0 = F(\phi)$ This equation doesn’t have an analytical solution, but there are various methods available to allow it to be solved numerically to find the unique solution in (0,1]. For example, we can use Newton’s method (solver = "newton" in the final_size() function) to find the final epidemic size:

1. Select an arbitrary initial estimate $$x_0$$ and tolerance $$T$$ ;
2. Solve $$\frac{dF}{d \phi} (x_0) \Delta x = −F(x_0) ~ \text{for } ~ \Delta x$$ ;
3. Set $$x_{i+1} = x_i + \Delta x$$;
4. Repeat until $$x_i$$ converges sufficiently (i.e. difference in each step is less than $$T$$)

## Varying susceptibility

We can extend the above formulation to account for variable susceptibility among individuals, following the methods outlined in Miller (2012). First, we define $$p(x)$$ to be the probability density for a randomly chosen individual to be of susceptibility type $$x$$. If we define the probability an individual of type $$x$$ to be ultimately infected as $$\phi(x)$$, then the overall proportion of the population to be infected is defined as follows, depending on whether $$p(x)$$ is a continuous or discrete distribution:

$\hat\phi = \int_0^\infty \phi(x) p(x) dx \\ \text{or} \\ \hat\phi = \sum_x \phi(x) p(x) dx \\$

If we define $$x\hat\phi$$ to be the expected number of transmissions to a given individual of type $$x$$, we can therefore calculate the probability that this individual remains susceptible. This gives the following relationship between $$\phi$$ and $$\hat\phi$$:

$1-\phi(x) = S(x,0) e^{-x \hat\phi } \\$ And hence we can combine the above to give: $\hat\phi = \int_0^\infty (1- S(x,0) e^{-x \hat\phi }) p(x) dx \\ \hat\phi = 1- \int_0^\infty S(x,0) e^{-x \hat\phi } p(x) dx$

Under the assumption that $$S(x,0)$$ is near 1 initially (i.e. f), we therefore obtain: $\hat\phi = 1- \int_0^\infty p(x) e^{-x \hat\phi } dx$

## Final size equation with heterogeneous mixing

We can extend the same derivation for the above homogenous model for populations with multiple age groups, following Andreasen (2011). The age-dependent ODEs are defined as follows:

$\frac{d S(a,t)}{d t} = -S(a,t) \sum_b \beta_{ab} I(b,t) \\ \frac{d I(a,t)}{d t} = S(a,t) \sum_b \beta_{ab} I(b,t) -\gamma I(a,t)$ Where $$\beta_{a,b}$$ is the transmission rate from group $$b$$ to group $$a$$. If we define $$R_{a,b} =\beta_{a,b}/\gamma$$ to be group specific reproduction number, we can rearrange the above and integrate to derive an expression for the final size in each age group:

$- \gamma \sum_b R_{a,b} I(b,t) = \frac{d S(a,t)}{d t} \frac{1}{S(a,t)} \\ - \gamma \sum_b R_{a,b} \int_0^\infty I(b,t) ~dt = \int_0^\infty \frac{d S(a,t)}{d t} \frac{1}{S(a,t)} ~dt = \mathrm{log}~S(a,\infty) -\mathrm{log}~S(a,0) = \mathrm{log}~\frac{S(a,\infty)}{S(a,0)} \\ -\gamma \int_0^\infty I(a,t) ~dt = \int_0^\infty \frac{d S(a,t)}{d t} +\frac{dI(a,t)}{dt} ~dt = S(a,\infty) -S(a,0) ~.$ and hence:

$\mathrm{log}~\frac{S(a,\infty)}{S(a,0)} = - \sum_b R_{a,b} [ S(b,\infty) -S(b,0)]$

If we define $$\underline\phi = (\phi_1, ... \phi_n)$$ to be a vector of final sizes in each of $$n$$ age group, we can rewrite the above as

$\mathrm{log}~\underline\phi -\mathbf{R} (1-\underline\phi) = 0 = F(\underline\phi)$ where $$\mathbf{R}$$ is a matrix with entries $$R_{i,j}$$. Once again, we can use Newton’s method, this time with $$\underline\phi$$ as a vector, and hence we need to solve a set of linear equations in step 2.

## References

Andreasen, Viggo. 2011. “The Final Size of an Epidemic and Its Relation to the Basic Reproduction Number.” Bulletin of Mathematical Biology 73 (10): 2305–21. https://doi.org/10.1007/s11538-010-9623-3.
Miller, Joel C. 2012. “A Note on the Derivation of Epidemic Final Sizes.” Bulletin of Mathematical Biology 74 (9): 2125–41. https://doi.org/10.1007/s11538-012-9749-6.