---
title: "beers"
author: "Wes Hinsley"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{beers}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE, echo = FALSE, results = "hide"}
library(beers)
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 7,
  fig.height = 5,
  comment = "#>"
)
```

## Introduction

The beers package provides the Beers ordinary and modified methods for interpolating between 5-yearly points, and subdividing 5-yearly age-bands.

The most likely usage is with demographic data presented at 5-yearly intervals, or in 5-yearly age-bands. The Beers algorithm can be used to interpolate or subdivide those data, respectively, and is notably used by the UN World Population Prospects (UNWPP) for that purpose.

## Background

The original algorithms were created by Henry S. Beers. I have not been able to locate these papers, but for the ordinary and modified methods respectively, the original papers are:

* "Discussion of Papers Presented in the Record, No. 68: 'Six-Term Formulas for Routine Actuarial Interpolation'." Henry S. Beers, The Record of the American Institute of Actuaries 34, Part I(69): 59-60, June 1945.
* "Modified Interpolation Formulas that Minimize Fourth Differences." Henry S. Beers, The Record of the American Institute of Actuaries 34, Part I(69): 19-29, June 1945.

The coefficients are published in:

* The Methods and Materials of Demography, 2nd Edition, Editors David A. Swanson and Jacob S. Siegel. Appendix C, Selected General Methods, D.H. Judson and Carole L. Popoff. p728-729. [http://demographybook.weebly.com/uploads/2/7/2/5/27251849/david_a._swanson_jacob_s._siegel_the_methods_and_materials_of_demography_second_edition__2004.pdf]

But note that there are two typos in this edition:

* p728, Beers Ordinary Interpolation, Middle Interval, N3.0, fourth column should be 0.0000, not 1.0000.
* p729, Beers Modified Interpolation, Last Interval, N5.6, 5th number should be +.8592, not +.8529.
  (See the symmetrical entry in First Interval, N1.4.)

## Usage - how?

Calling the functions is simple.

```{r example}
beers_int_ordinary(c(1, 2, 4, 8, 16, 32))
beers_int_modified(c(1, 2, 4, 8, 16, 32))
beers_sub_ordinary(c(10, 20, 40, 80, 160))
beers_sub_modified(c(10, 20, 40, 80, 160))
```

The interpolations require at least 6 points, for example population in 1950, 1955, 1960, 1965, 1970 and 1975, which gives 5 panels between the points in which to interpolate.

Subdivisions require at least 5 points to be subdivided, for example population in age ranges 0-4, 5-9, 10-14, 15-19 and 20-24.

The ordinary algorithms have two particular properties:

* For interpolation, all of the original data points (i.e., 1950, 1955, 1960...) are unchanged by the algorithm; interpolation occurs between the points.
* For subdivision, every subdivided set of 5 populations sums to the original value (i.e., the populations for ages 5, 6, 7, 8 and 9 sum to the original 5-9 value).

The modified algorithms carry out extra smoothing, such that:

* For interpolation, only the values of the first and final data points are preserved; the interpolation provides new values for *all* the intermediate points, including those for which you provided data.
* For subdivision, only the first and the final age-bands have the property that the subdivided populations sum to the original age-band population.

## Usage - why?

When writing the package, the purpose was to replicate as exactly as possible the algorithms used by UNWPP in their population interpolations, using the exact values given in the published tables. Presumably these figures are truncated from some function that is possibly documented in the 1945 papers, if only we could find them.

There are other options for interpolation of course, and perhaps the Beers algorithm here is approximately equivalent to one of the interpolation options available in R.
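Before comparing with other interpolators, the properties claimed for the ordinary methods under "Usage - how?" can be checked directly. The chunk below is a minimal sketch with illustrative inputs. It assumes that the interpolation output starts at the first original point and steps in single years, and that the subdivision output holds five consecutive values per input age-band; if either assumption is wrong, `all.equal` reports a mismatch rather than `TRUE`.

```{r ordinary_properties}
library(beers)

# Illustrative inputs, reused from the earlier example.
points <- c(1, 2, 4, 8, 16, 32)
bands  <- c(10, 20, 40, 80, 160)

# Interpolation: every fifth output value should equal an original point
# (assumes the output starts at the first point and steps in single years).
int_out <- as.numeric(beers_int_ordinary(points))
all.equal(int_out[seq(1, length(int_out), by = 5)], points)

# Subdivision: each run of five values should sum to its age-band total
# (assumes five consecutive output values per input band).
sub_out <- as.numeric(beers_sub_ordinary(bands))
all.equal(colSums(matrix(sub_out, nrow = 5)), bands)
```

Running the same checks on the modified functions should show the preservation holding only at the ends, as described above.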

```{r ukr_interp, echo = FALSE}
# 5-yearly population of Ukraine, 1950-2100
ukr <- data.frame(
  pop = c(37297648, 40019449, 42662149, 45261935, 47086761, 48758987,
          49968812, 50920778, 51464348, 50905677, 48840074, 46892163,
          45792501, 44657704, 43579234, 42452647, 41200374, 39896340,
          38658013, 37512851, 36415702, 35315013, 34190485, 33061130,
          31992330, 31056617, 30287940, 29673481, 29160406, 28678792,
          28185563),
  year = seq(1950, 2100, 5))

# Work in thousands
ukr$pop <- ukr$pop / 1000.0

# Annual series from each method
ordinary <- data.frame(pop = beers_int_ordinary(ukr$pop), year = seq(1950, 2100, 1))
modified <- data.frame(pop = beers_int_modified(ukr$pop), year = seq(1950, 2100, 1))
app_lin <- approx(x = ukr$year, y = ukr$pop, xout = seq(1950, 2100, 1), method = "linear")
r_spline <- spline(x = ukr$year, y = ukr$pop, n = 151, method = "fmm")

# Full range
plot(x = ukr$year, y = ukr$pop, main = "Interpolated Population of Ukraine",
     xlab = "Year", ylab = "Population (k)")
lines(x = ordinary$year, y = ordinary$pop, col = "red")
lines(x = modified$year, y = modified$pop, col = "blue")
lines(x = app_lin$x, y = app_lin$y, col = "darkgreen")
lines(x = r_spline$x, y = r_spline$y, col = "brown")
legend("topright",
       legend = c("Original", "Beers Ordinary", "Beers Modified", "R approx-linear", "R spline"),
       col = c("black", "red", "blue", "darkgreen", "brown"),
       lty = c(NA, 1, 1, 1, 1),
       pch = c(1, NA, NA, NA, NA))

# Zoom on 1980-1995
plot(x = ukr$year, y = ukr$pop, main = "Interpolated Population of Ukraine",
     xlim = c(1980, 1995), ylim = c(49500, 51500),
     xlab = "Year", ylab = "Population (k)")
lines(x = ordinary$year, y = ordinary$pop, col = "red")
lines(x = modified$year, y = modified$pop, col = "blue")
lines(x = app_lin$x, y = app_lin$y, col = "darkgreen")
lines(x = r_spline$x, y = r_spline$y, col = "brown")
legend("bottomright",
       legend = c("Original", "Beers Ordinary", "Beers Modified", "R approx-linear", "R spline"),
       col = c("black", "red", "blue", "darkgreen", "brown"),
       lty = c(NA, 1, 1, 1, 1),
       pch = c(1, NA, NA, NA, NA))
```

Results are fairly inconclusive.
So it looks like you should use the beers package if you specifically need the Beers method, for example to replicate UNWPP's interpolations; otherwise, you might as well look into R's `spline` function, which offers more flexible and better-documented options.
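
For anyone taking the spline route, here is a minimal sketch of the equivalent call. It uses `spline()` from the base stats package with the `fmm` method, as in the plots above. The `years` and `pop` vectors are illustrative values made up for this example, not real data. This is not a replacement for Beers, only an indication of the shape of the call.

```{r spline_sketch}
# Illustrative values only (not real data): six 5-yearly points.
years <- seq(1950, 1975, by = 5)
pop   <- c(1, 2, 4, 8, 16, 32)

# Interpolate to single years with a cubic spline from the stats package.
annual <- spline(x = years, y = pop,
                 xout = seq(1950, 1975, by = 1),
                 method = "fmm")

# An interpolating spline passes through the supplied points, so the
# 5-yearly values should be unchanged, as with Beers ordinary.
isTRUE(all.equal(annual$y[annual$x %in% years], pop))
```

Other `method` values, such as `"natural"`, give different end behaviour and may be worth trying on your own data.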