
ColOpenData is an R package for accessing curated and wrangled Colombian demographic, geospatial, climate and population projection data, retrieved from various open Colombian data sources. Colombian data is scattered across multiple web sources; the package addresses this by providing functions that let users select and load the datasets they need without a lengthy data acquisition process. The demographic and climate data are also offered in a tidy structure, which facilitates analysis and visualization.

ColOpenData is developed at Universidad de Los Andes as part of the Epiverse-TRACE program.

Installation

You can install the CRAN version of ColOpenData with:

install.packages("ColOpenData")

You can also install the development version of ColOpenData from GitHub with:

# install.packages("pak")
pak::pak("epiverse-trace/ColOpenData")

Quick Overview

ColOpenData contains data from two public data sources: the National Administrative Department of Statistics (DANE) and the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM). The available data is divided into four categories:

  • Demographic: Demographic and Socioeconomic data presents information from the National Population and Dwelling Census (CNPV) of 2018. The CNPV data corresponds to the most recent census available to date and the information is presented as an answer to three questions: How many are we?, Where are we? and How do we live?

  • Geospatial: This data is retrieved from the National Geostatistical Framework (MGN), which includes maps and a summarized version of the 2018 CNPV, aggregated to spatial geometries. The data is available at different aggregation levels including: Blocks, Urban and Rural Sections, Urban and Rural Sectors, Urban Areas, Municipalities and Departments.

  • Climate: Climate data is recovered from backup information provided by IDEAM, containing historical data from the first station in the country (January 1st 1920) until May 31st 2023. This backup includes temperature, precipitation, sunshine duration and wind direction, among others.

  • Population projections: Population projections data contains the population projections and back projections from 1950 to 2070, considering the post-COVID-19 update, which was calculated based on the results of the 2018 CNPV. Further information can be consulted on the DANE website.

Documentation for each module is available in the user vignettes.

Similar R packages are available for other countries, allowing users to download census, geospatial and climate data.

Lifecycle

This package is currently experimental, as defined by the RECON software lifecycle. It is therefore a functional draft that can be tested outside of the development team, but it may still change over time.

Contributions

Contributions are welcome via pull requests.

Code of Conduct

Please note that the ColOpenData project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Funding

This work is part of the TRACE-LAC research project funded by the International Development Research Centre (IDRC), Ottawa, Canada [109848-001-].