Understanding Demographic and Geospatial Datasets

Demographic and geospatial data are distributed across multiple datasets from the same source. To download a dataset, a naming framework is used that combines the source, the group, the year and a final details field that identifies each individual dataset. The details differ for every dataset and relate to the information it contains. The general framework is:

SOURCE_GROUP_YEARS_DETAILS

This naming framework is only used by the functions download_demographic and download_geospatial. For hands-on examples, please check A Deep Dive into Colombian Demographics Using ColOpenData and Maps and plots with ColOpenData.
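As an illustration of the SOURCE_GROUP_YEARS_DETAILS framework, the following base R sketch splits a dataset name into its four fields. The helper parse_dataset_name is hypothetical and not part of ColOpenData. It assumes that the first three fields contain no underscores and that everything after the third underscore belongs to the details field.

```r
# parse_dataset_name() is a hypothetical helper, not part of ColOpenData.
# Assumption: source, group and years contain no underscores; whatever
# follows the third underscore is treated as the details field.
parse_dataset_name <- function(name) {
  pattern <- "^([^_]+)_([^_]+)_([^_]+)_(.+)$"
  fields <- regmatches(name, regexec(pattern, name))[[1]]
  if (length(fields) != 5) {
    stop("Name does not follow SOURCE_GROUP_YEARS_DETAILS: ", name)
  }
  # Drop the full match and label the four captured groups
  stats::setNames(
    as.list(fields[-1]),
    c("source", "group", "years", "details")
  )
}

parsed <- parse_dataset_name("DANE_CNPVH_2018_1HD")
str(parsed)
```

Here the name DANE_CNPVH_2018_1HD is taken from the listings further down in this document.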

Demographic

Demographic datasets are available for municipalities and departments. They cover Dwellings, Households, Population and Population Projections, organized in five categories:

  • Viviendas (Dwellings)
  • Hogares (Households)
  • Personas Social (Persons social)
  • Personas Demográfico (Persons demographic)
  • Proyecciones Poblacionales (Population Projections)

All datasets are retrieved from the National Administrative Department of Statistics (DANE). Naming is stated as follows:

  • Source: DANE
  • Group: Names include the categories
    • Viviendas: CNPVV
    • Hogares: CNPVH
    • Personas Social: CNPVPS
    • Personas Demográfico: CNPVPD
    • Proyecciones Poblacionales: PP
  • Year:
    • Census data: 2018
    • Population projections: Various (see list_datasets below)
  • Details: These are related to each individual dataset. For further details please check the function list_datasets below.

Geospatial

The naming of geospatial datasets depends on the level of aggregation, since they are available from Blocks to Departments. All these datasets come from DANE and are part of the Geostatistical National Framework (MGN), which for 2018 included a summarized version of the National Population and Dwelling Census (CNPV). Naming is stated as follows:

  • Source: DANE
  • Group: MGN
  • Year: 2018
  • Details: These are related to the level of aggregation, and can be consulted using the function list_datasets.

Understanding Climate Dataset

This module’s data is stored in a single dataset, and the information required to use the related functions is the area of interest, the dates, and the tags to be consulted. Therefore, this module does not use the same naming framework. However, individual tags are required to download data and include:

Tags Variable
TSSM_CON Dry-bulb Temperature
THSM_CON Wet-bulb Temperature
TMN_CON Minimum Temperature
TMX_CON Maximum Temperature
TSTG_CON Dry-bulb Temperature (Thermograph)
HR_CAL Relative Humidity
HRHG_CON Relative Humidity (Hygrograph)
TV_CAL Vapour Pressure
TPR_CAL Dew Point
PTPM_CON Precipitation (Daily)
PTPG_CON Precipitation (Hourly)
EVTE_CON Evaporation
FA_CON Atmospheric Phenomenon
NB_CON Cloudiness
RCAM_CON Wind Trajectory
BSHG_CON Sunshine Duration
VVAG_CON Wind Speed
DVAG_CON Wind Direction
VVMXAG_CON Maximum Wind Speed
DVMXAG_CON Maximum Wind Direction

These tags are meant to be used for download using download_climate, download_climate_geom and download_climate_stations. See How to download climate data using ColOpenData for further details.
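Since downloads depend on passing valid tags, it can help to check the requested tags against the table above before calling any download function. The sketch below uses only base R. The helper check_tags is hypothetical and not part of ColOpenData, and the tag vector is copied from the table in this section.

```r
# Tags copied from the table above
climate_tags <- c(
  "TSSM_CON", "THSM_CON", "TMN_CON", "TMX_CON", "TSTG_CON", "HR_CAL",
  "HRHG_CON", "TV_CAL", "TPR_CAL", "PTPM_CON", "PTPG_CON", "EVTE_CON",
  "FA_CON", "NB_CON", "RCAM_CON", "BSHG_CON", "VVAG_CON", "DVAG_CON",
  "VVMXAG_CON", "DVMXAG_CON"
)

# check_tags() is a hypothetical helper, not part of ColOpenData.
# It stops with an error if any requested tag is not in the table.
check_tags <- function(tags, valid = climate_tags) {
  unknown <- setdiff(tags, valid)
  if (length(unknown) > 0) {
    stop("Unknown climate tag(s): ", paste(unknown, collapse = ", "))
  }
  invisible(tags)
}

# Minimum and maximum temperature are both valid tags
requested <- check_tags(c("TMN_CON", "TMX_CON"))
```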

List Data

To check the available datasets, you can use the list_datasets function. Its module parameter restricts the listing to a specific module: the default is "all", and the other accepted values are "demographic", "geospatial" and "climate".

# List all datasets
datasets <- list_datasets()
head(datasets)
#>                   name      group source year         level category
#> 1   DANE_MGN_2018_DPTO geospatial   DANE 2018    department     maps
#> 2   DANE_MGN_2018_MPIO geospatial   DANE 2018  municipality     maps
#> 3 DANE_MGN_2018_MPIOCL geospatial   DANE 2018  municipality     maps
#> 4   DANE_MGN_2018_SETU geospatial   DANE 2018  urban_sector     maps
#> 5   DANE_MGN_2018_SETR geospatial   DANE 2018  rural_sector     maps
#> 6   DANE_MGN_2018_SECU geospatial   DANE 2018 urban_section     maps
#>                                                                                 description
#> 1              Geographical and summarised census data from 2018 at the level of department
#> 2            Geographical and summarised census data from 2018 at the level of municipality
#> 3 Geographical and summarised census data from 2018 at the level of municipality with class
#> 4            Geographical and summarised census data from 2018 at the level of urban sector
#> 5            Geographical and summarised census data from 2018 at the level of rural sector
#> 6           Geographical and summarised census data from 2018 at the level of urban section
# List only demographic datasets
demo_datasets <- list_datasets(module = "demographic")
head(demo_datasets)
#>                   name       group source year        level   category
#> 11 DANE_CNPVH_2018_1HD demographic   DANE 2018   department households
#> 12 DANE_CNPVH_2018_1HM demographic   DANE 2018 municipality households
#> 13 DANE_CNPVH_2018_2HD demographic   DANE 2018   department households
#> 14 DANE_CNPVH_2018_2HM demographic   DANE 2018 municipality households
#> 15 DANE_CNPVH_2018_3HD demographic   DANE 2018   department households
#> 16 DANE_CNPVH_2018_3HM demographic   DANE 2018 municipality households
#>                                                                                                                      description
#> 11   Number of households with persons under 15 years of age and number of persons under 15 years of age, by department and area
#> 12 Number of households with persons under 15 years of age and number of persons under 15 years of age, by municipality and area
#> 13                      Number of households with senior citizens and number of persons aged 60 and over, by department and area
#> 14                    Number of households with senior citizens and number of persons aged 60 and over, by municipality and area
#> 15                                                                                Households by headship, by department and area
#> 16                                                                              Households by headship, by municipality and area

We highly recommend using View() instead of head() in the local environment for a cleaner and easier visualization of the information.

Using this function, we can retrieve the name, source, aggregation level and description of each individual dataset.

List Data Using Keywords

Sometimes, going through each dataset to find specific information can be tiring. If you want to quickly look for a specific word or set of words within the datasets, you can use the look_up function, which takes the following parameters:

  1. The module you wish to search within (default is "all").

  2. The word (or words) you are interested in (input as a character or vector of characters).

  3. The search condition: "and" to find datasets containing all specified words, or "or" to find datasets containing any of the specified words (default is "or"). If you are searching for a single word, you can use either "and" or "or" for this parameter.

# List all datasets, no matter the module, that present the word "age"
age_datasets <- look_up(keywords = "age")
head(age_datasets)
#>                    name       group source year        level
#> 11  DANE_CNPVH_2018_1HD demographic   DANE 2018   department
#> 12  DANE_CNPVH_2018_1HM demographic   DANE 2018 municipality
#> 13  DANE_CNPVH_2018_2HD demographic   DANE 2018   department
#> 14  DANE_CNPVH_2018_2HM demographic   DANE 2018 municipality
#> 17 DANE_CNPVPD_2018_1PD demographic   DANE 2018   department
#> 18 DANE_CNPVPD_2018_1PM demographic   DANE 2018 municipality
#>               category
#> 11          households
#> 12          households
#> 13          households
#> 14          households
#> 17 persons_demographic
#> 18 persons_demographic
#>                                                                                                                      description
#> 11   Number of households with persons under 15 years of age and number of persons under 15 years of age, by department and area
#> 12 Number of households with persons under 15 years of age and number of persons under 15 years of age, by municipality and area
#> 13                      Number of households with senior citizens and number of persons aged 60 and over, by department and area
#> 14                    Number of households with senior citizens and number of persons aged 60 and over, by municipality and area
#> 17                          Total census population, by department, area, age group, masculinity and femininity indexes, and sex
#> 18                        Total census population, by municipality, area, age group, masculinity and femininity indexes, and sex
# List all datasets in geospatial module that present the word "urban"
urban_datasets <- look_up(module = "geospatial", keywords = "urban")
head(urban_datasets)
#>                 name      group source year         level category
#> 4 DANE_MGN_2018_SETU geospatial   DANE 2018  urban_sector     maps
#> 6 DANE_MGN_2018_SECU geospatial   DANE 2018 urban_section     maps
#> 9   DANE_MGN_2018_ZU geospatial   DANE 2023    urban_zone     maps
#>                                                                       description
#> 4  Geographical and summarised census data from 2018 at the level of urban sector
#> 6 Geographical and summarised census data from 2018 at the level of urban section
#> 9    Geographical and summarised census data from 2018 at the level of urban zone
# List all datasets in demographic module that present both words "area" and "sex"
area_sex_datasets <- look_up(
  module = "demographic", keywords = c("area", "sex"),
  logic = "and"
)
head(area_sex_datasets)
#>                    name       group source year        level
#> 17 DANE_CNPVPD_2018_1PD demographic   DANE 2018   department
#> 18 DANE_CNPVPD_2018_1PM demographic   DANE 2018 municipality
#> 20 DANE_CNPVPD_2018_3PD demographic   DANE 2018   department
#> 21 DANE_CNPVPD_2018_3PM demographic   DANE 2018 municipality
#> 22 DANE_CNPVPD_2018_4PD demographic   DANE 2018   department
#> 23 DANE_CNPVPD_2018_4PM demographic   DANE 2018 municipality
#>               category
#> 17 persons_demographic
#> 18 persons_demographic
#> 20 persons_demographic
#> 21 persons_demographic
#> 22 persons_demographic
#> 23 persons_demographic
#>                                                                                                                       description
#> 17                           Total census population, by department, area, age group, masculinity and femininity indexes, and sex
#> 18                         Total census population, by municipality, area, age group, masculinity and femininity indexes, and sex
#> 20                                                           Total census population, by department, age group, age, area and sex
#> 21                                                         Total census population, by municipality, age group, age, area and sex
#> 22   Census population in particular households, by relationship or kinship to the head of household, by department, area and sex
#> 23 Census population in particular households, by relationship or kinship to the head of household, by municipality, area and sex
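The difference between the "and" and "or" search conditions can be illustrated with base R on a toy table. This is a sketch only: match_keywords is a hypothetical helper, not the package's look_up, and plain substring matching on the description is an assumption about how keywords are compared. The names and descriptions are copied from the listings above.

```r
# Toy catalogue with names and descriptions copied from the listings above
catalogue <- data.frame(
  name = c(
    "DANE_CNPVH_2018_3HD", "DANE_CNPVPD_2018_3PD", "DANE_MGN_2018_SETU"
  ),
  description = c(
    "Households by headship, by department and area",
    "Total census population, by department, age group, age, area and sex",
    "Geographical and summarised census data from 2018 at the level of urban sector"
  ),
  stringsAsFactors = FALSE
)

# match_keywords() is a hypothetical helper, not the package's look_up().
# Assumption: a keyword matches when it appears as a substring.
match_keywords <- function(descriptions, keywords, logic = c("or", "and")) {
  logic <- match.arg(logic)
  # One column per keyword, one row per description
  hits <- vapply(
    keywords,
    function(k) grepl(k, descriptions, fixed = TRUE),
    logical(length(descriptions))
  )
  hits <- matrix(hits, nrow = length(descriptions))
  if (logic == "and") {
    apply(hits, 1, all)
  } else {
    apply(hits, 1, any)
  }
}

words <- c("area", "sex")
both <- catalogue$name[match_keywords(catalogue$description, words, "and")]
either <- catalogue$name[match_keywords(catalogue$description, words, "or")]
```

With "and", only the dataset whose description contains both words is kept. With "or", any dataset mentioning at least one of them is kept.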

Dictionaries

Dictionaries are provided to understand some tags and column names inside each module. These dictionaries are provided in Spanish, since they are retrieved directly from the sources.

Demographic

Dictionaries are not provided for this module since they are not needed. Demographic datasets include comprehensive column names and variables, and they are self-contained.

Geospatial

Datasets inside the geospatial module contain a summarized version of the census, and a dictionary is needed to understand all aggregated variables. These dictionaries contain the necessary metadata to use the available information. To retrieve them, we can use the dictionary function, passing the dataset name as a parameter:

dict_mpio <- dictionary("DANE_MGN_2018_MPIO")
head(dict_mpio)
#>     variable         tipo longitud
#> 1 DPTO_CCDGO         Text        2
#> 2 MPIO_CCDGO         Text        3
#> 3 MPIO_CNMBR         Text      250
#> 4 MPIO_CDPMP         Text        5
#> 5    VERSION Long Integer       NA
#> 6       AREA       Double       NA
#>                                                                                     descripcion
#> 1                                                                       Código del departamento
#> 2                                                            Código que identifica al municipio
#> 3                                                                          Nombre del municipio
#> 4                                                Código concatenado que identifica al municipio
#> 5                                                              Año de la información geográfica
#> 6 Área del municipio en metros cuadrados  (Sistema de coordenadas planas MAGNA_Colombia_Bogota)
#>   categoria_original
#> 1               <NA>
#> 2               <NA>
#> 3               <NA>
#> 4               <NA>
#> 5               <NA>
#> 6               <NA>
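Once a dictionary is available, a common step is to look up the description of specific columns. The sketch below does this with base R on a small table copied from the first rows of the output above. The helper describe_columns is hypothetical and not part of ColOpenData.

```r
# First rows of the municipality dictionary, copied from the output above
dict_sample <- data.frame(
  variable = c("DPTO_CCDGO", "MPIO_CCDGO", "MPIO_CNMBR"),
  tipo = c("Text", "Text", "Text"),
  longitud = c(2, 3, 250),
  descripcion = c(
    "Código del departamento",
    "Código que identifica al municipio",
    "Nombre del municipio"
  ),
  stringsAsFactors = FALSE
)

# describe_columns() is a hypothetical helper, not part of ColOpenData.
# match() keeps the order of `columns` and yields NA for unknown names.
describe_columns <- function(columns, dict) {
  stats::setNames(dict$descripcion[match(columns, dict$variable)], columns)
}

described <- describe_columns(
  c("MPIO_CNMBR", "DPTO_CCDGO", "OTHER"),
  dict_sample
)
```

Columns that are absent from the dictionary, such as the made-up name OTHER, come back as NA rather than raising an error.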

Climate

Climate data is not stored in multiple datasets but as a single dataset with numerous tags. These tags can also be consulted through the dictionary function, using the name of the only climate dataset.

dict_climate <- dictionary("IDEAM_CLIMATE_2023_MAY")
head(dict_climate)
#>   etiqueta                                   variable
#> 1 TSSM_CON                Temperatura seca (ambiente)
#> 2 THSM_CON                         Temperatura húmeda
#> 3  TMN_CON                         Temperatura mínima
#> 4  TMX_CON                         Temperatura máxima
#> 5 TSTG_CON Temperatura seca (ambiente) del termógrafo
#> 6   HR_CAL                           Humedad relativa
#>                                frecuencia
#> 1 Horaria (07:00, 13:00, 18:00 y/o 19:00)
#> 2 Horaria (07:00, 13:00, 18:00 y/o 19:00)
#> 3                                  Diaria
#> 4                                  Diaria
#> 5                      Horaria (24 horas)
#> 6 Horaria (07:00, 13:00, 18:00 y/o 19:00)