Documentation and Dictionaries • ColOpenData

Naming and structure

Understanding Demographic Datasets

Demographic data is provided from multiple datasets of the same source. To download the data, a naming framework has been implemented, which includes the source, group, year and final details for individual identification. Details are different for every dataset and are related to the internal information they contain. The general frame can be used as follows:

SOURCE_GROUP_YEARS_DETAILS

Demographic datasets are available for municipalities and departments. They contain census data on dwellings, households and population, organized into the following four categories (population projections form a separate module, described further below):

Viviendas (Dwellings).
Hogares (Households).
Personas Social (Persons Social).
Personas Demográfico (Persons Demographic).

All datasets are retrieved from Departamento Administrativo Nacional de Estadística (DANE). Naming is stated as follows:

Source: DANE.
Group: Names include the categories.
- Viviendas: CNPVV.
- Hogares: CNPVH.
- Personas Social: CNPVPS.
- Personas Demográfico: CNPVPD.
Year:
- Census data: 2018
Details: These are related to each individual dataset. For further details please check the function list_datasets() below.
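The naming frame can be illustrated by splitting a dataset name into its components. The following is a minimal base R sketch: parse_dataset_name() is a hypothetical helper written for this example, not a ColOpenData function, and it assumes the name has exactly four parts separated by underscores, as the demographic names do.

```r
# Hypothetical helper (not part of ColOpenData): split a demographic dataset
# name of the form SOURCE_GROUP_YEAR_DETAILS into its four components.
parse_dataset_name <- function(name) {
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]
  if (length(parts) != 4) {
    stop("Expected a name with four parts separated by '_': ", name)
  }
  # Group codes and categories as listed in the naming description above
  groups <- c(
    CNPVV = "Viviendas",
    CNPVH = "Hogares",
    CNPVPS = "Personas Social",
    CNPVPD = "Personas Demográfico"
  )
  list(
    source = parts[1],
    group = parts[2],
    category = unname(groups[parts[2]]),
    year = parts[3],
    details = parts[4]
  )
}

parsed <- parse_dataset_name("DANE_CNPVH_2018_1HD")
parsed$category
```

Here the name DANE_CNPVH_2018_1HD resolves to the source DANE, the group CNPVH (Hogares), the year 2018 and the details 1HD.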

For hands-on examples, please check A Deep Dive into Colombian Demographics Using ColOpenData.

Understanding Geospatial Datasets

The naming of geospatial datasets reflects their level of aggregation, which ranges from blocks to departments. All these datasets come from DANE and are part of the National Geostatistical Framework (MGN), which for 2018 included a summarized version of the National Population and Dwelling Census (CNPV). Available spatial levels include: department, municipality, urban and rural sector, urban and rural section, urban zone and blocks. Please check Maps and plots with ColOpenData for further details.

Understanding Climate Dataset

This module’s data is retrieved from the Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM) and is stored as a single dataset. To use the related functions you need to provide the area of interest, the dates and the tags to be consulted. Tags are required to download data and include:

Tags	Variable
TSSM_CON	Dry-bulb Temperature
THSM_CON	Wet-bulb Temperature
TMN_CON	Minimum Temperature
TMX_CON	Maximum Temperature
TSTG_CON	Dry-bulb Temperature (Termograph)
HR_CAL	Relative Humidity
HRHG_CON	Relative Humidity (Hydrograph)
TV_CAL	Vapour Pressure
TPR_CAL	Dew Point
PTPM_CON	Precipitation (Daily)
PTPG_CON	Precipitation (Hourly)
EVTE_CON	Evaporation
FA_CON	Atmospheric Phenomenon
NB_CON	Cloudiness
RCAM_CON	Wind Trajectory
BSHG_CON	Sunshine Duration
VVAG_CON	Wind Speed
DVAG_CON	Wind Direction
VVMXAG_CON	Maximum Wind Speed
DVMXAG_CON	Maximum Wind Direction

These tags are meant to be used for download using download_climate(), download_climate_geom() and download_climate_stations(). See How to download climate data using ColOpenData for further details.

Understanding Population Projections

Population projections and back-projections retrieved from DANE are available at the national, department and municipality levels, and are disaggregated by sex and ethnicity (the latter is only available for municipalities). The names of the datasets relate to the source, years included, sex and ethnicity.

For examples of how to consult the data, please refer to Population Projection with ColOpenData.

List Data

To check the available datasets you can use the list_datasets() function. The module parameter restricts the listing to a specific module: the default is "all", and the other accepted values are "demographic", "geospatial", "climate" and "population_projections". The language parameter sets the language of the output, which can be Spanish ("ES", the default) or English ("EN").

library(ColOpenData)

datasets <- list_datasets(language = "EN")

head(datasets)
#> # A tibble: 6 × 7
#>   name                 group      source year  level        category description
#>   <chr>                <chr>      <chr>  <chr> <chr>        <chr>    <chr>      
#> 1 DANE_MGN_2018_DPTO   geospatial DANE   2018  department   maps     Geographic…
#> 2 DANE_MGN_2018_MPIO   geospatial DANE   2018  municipality maps     Geographic…
#> 3 DANE_MGN_2018_MPIOCL geospatial DANE   2018  municipalit… maps     Geographic…
#> 4 DANE_MGN_2018_SETU   geospatial DANE   2018  urban_sector maps     Geographic…
#> 5 DANE_MGN_2018_SETR   geospatial DANE   2018  rural_sector maps     Geographic…
#> 6 DANE_MGN_2018_SECU   geospatial DANE   2018  urban_secti… maps     Geographic…

To list only demographic datasets we can use:

demographic_datasets <- list_datasets(module = "demographic", language = "EN")

head(demographic_datasets)
#> # A tibble: 6 × 7
#>   name                group       source year  level        category description
#>   <chr>               <chr>       <chr>  <chr> <chr>        <chr>    <chr>      
#> 1 DANE_CNPVH_2018_1HD demographic DANE   2018  department   househo… Number of …
#> 2 DANE_CNPVH_2018_1HM demographic DANE   2018  municipality househo… Number of …
#> 3 DANE_CNPVH_2018_2HD demographic DANE   2018  department   househo… Number of …
#> 4 DANE_CNPVH_2018_2HM demographic DANE   2018  municipality househo… Number of …
#> 5 DANE_CNPVH_2018_3HD demographic DANE   2018  department   househo… Households…
#> 6 DANE_CNPVH_2018_3HM demographic DANE   2018  municipality househo… Households…

We highly recommend using View() instead of head() in the local environment for a cleaner and easier visualization of the information.

Using this function, we can retrieve the name, source, aggregation level and description of every available dataset.

List Data Using Keywords

Sometimes, going through each dataset to find specific information can be tiring. If you want to quickly look for a specific word or set of words within the datasets, you can use the look_up() function, which takes the following parameters:

The word (or words) you are interested in (input as a character or vector of characters).
The module you wish to search within (default is "all").
The search condition: "and" to find datasets containing all specified words, or "or" to find datasets containing any of the specified words (default is "or"). If you are searching for a single word, you can use either "and" or "or" for this parameter.
The language of the keywords, either "EN" or "ES" (default is "EN").

age_datasets <- look_up(keywords = "age")

head(age_datasets)
#> # A tibble: 6 × 7
#>   name                 group       source year  level       category description
#>   <chr>                <chr>       <chr>  <chr> <chr>       <chr>    <chr>      
#> 1 DANE_CNPVH_2018_1HD  demographic DANE   2018  department  househo… Number of …
#> 2 DANE_CNPVH_2018_1HM  demographic DANE   2018  municipali… househo… Number of …
#> 3 DANE_CNPVH_2018_2HD  demographic DANE   2018  department  househo… Number of …
#> 4 DANE_CNPVH_2018_2HM  demographic DANE   2018  municipali… househo… Number of …
#> 5 DANE_CNPVPD_2018_1PD demographic DANE   2018  department  persons… Total cens…
#> 6 DANE_CNPVPD_2018_1PM demographic DANE   2018  municipali… persons… Total cens…

We can specify a module and a search condition to make the search narrower and more precise.

area_sex_datasets <- look_up(
  keywords = c("area", "sex"),
  module = "demographic",
  logic = "and",
  language = "EN"
)

head(area_sex_datasets)
#> # A tibble: 6 × 7
#>   name                 group       source year  level       category description
#>   <chr>                <chr>       <chr>  <chr> <chr>       <chr>    <chr>      
#> 1 DANE_CNPVPD_2018_1PD demographic DANE   2018  department  persons… Total cens…
#> 2 DANE_CNPVPD_2018_1PM demographic DANE   2018  municipali… persons… Total cens…
#> 3 DANE_CNPVPD_2018_3PD demographic DANE   2018  department  persons… Total cens…
#> 4 DANE_CNPVPD_2018_3PM demographic DANE   2018  municipali… persons… Total cens…
#> 5 DANE_CNPVPD_2018_4PD demographic DANE   2018  department  persons… Census pop…
#> 6 DANE_CNPVPD_2018_4PM demographic DANE   2018  municipali… persons… Census pop…
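The difference between the "and" and "or" conditions can be sketched in base R. This is only an illustration of the matching idea and not the implementation used by look_up(); the descriptions below are hypothetical and do not come from the package.

```r
# Hypothetical dataset descriptions, invented for this illustration
descriptions <- c(
  "Population by area and sex",
  "Households by area",
  "Population by age group"
)
keywords <- c("area", "sex")

# Logical matrix: one row per description, one column per keyword
hits <- sapply(keywords, function(k) {
  grepl(k, descriptions, ignore.case = TRUE)
})

# "and": every keyword must appear; "or": at least one keyword must appear
match_and <- apply(hits, 1, all)
match_or <- apply(hits, 1, any)

descriptions[match_and]
descriptions[match_or]
```

With these inputs, only the first description satisfies "and", while the first two satisfy "or".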

Geospatial dictionaries

Datasets inside the geospatial module contain a summarized version of the census and a dictionary is needed to understand all aggregated variables. These dictionaries contain the necessary metadata to use the available information. To retrieve them, we can use the function geospatial_dictionary(), using the spatial level and language as parameters:

dict_mpio <- geospatial_dictionary(
  spatial_level = "municipality",
  language = "EN"
)

head(dict_mpio)
#> # A tibble: 6 × 4
#>   variable                 type         length description                      
#>   <chr>                    <chr>         <dbl> <chr>                            
#> 1 codigo_departamento      Text              2 Department code                  
#> 2 codigo_municipio_sin_con Text              3 Municipality code                
#> 3 municipio                Text            250 Municipality name                
#> 4 codigo_municipio         Text              5 Concatenated municipality code   
#> 5 version                  Long Integer     NA Year of the geographic informati…
#> 6 area                     Double           NA Municipality area in square mete…

Climate tags

Climate data is not stored in multiple datasets but as a single dataset with numerous tags. These tags can also be consulted through the function get_climate_tags(), which takes the language of the tags as its parameter: "EN" or "ES" (default is "ES").

dict_climate <- get_climate_tags(language = "EN")

head(dict_climate)
#>        tag                          variable
#> 1 TSSM_CON              Dry-bulb Temperature
#> 2 THSM_CON              Wet-bulb Temperature
#> 3  TMN_CON               Minimum Temperature
#> 4  TMX_CON               Maximum Temperature
#> 5 TSTG_CON Dry-bulb Temperature (Termograph)
#> 6   HR_CAL                 Relative Humidity
#>                                   frequency
#> 1 Hourly (07:00, 13:00, 18:00 and/or 19:00)
#> 2 Hourly (07:00, 13:00, 18:00 and/or 19:00)
#> 3                                     Daily
#> 4                                     Daily
#> 5                         Hourly (24 hours)
#> 6 Hourly (07:00, 13:00, 18:00 and/or 19:00)
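Since the tag table is a regular data frame, it can be filtered to find the tags of interest. The sketch below does not call the package: it rebuilds the six rows printed above as a stand-in data frame so that it runs on its own, and it assumes the column names tag, variable and frequency shown in that output.

```r
# Stand-in for the first six rows returned by get_climate_tags(language = "EN"),
# copied from the printed output above
hourly_obs <- "Hourly (07:00, 13:00, 18:00 and/or 19:00)"
dict_climate <- data.frame(
  tag = c("TSSM_CON", "THSM_CON", "TMN_CON", "TMX_CON", "TSTG_CON", "HR_CAL"),
  variable = c(
    "Dry-bulb Temperature", "Wet-bulb Temperature", "Minimum Temperature",
    "Maximum Temperature", "Dry-bulb Temperature (Termograph)",
    "Relative Humidity"
  ),
  frequency = c(
    hourly_obs, hourly_obs, "Daily", "Daily", "Hourly (24 hours)", hourly_obs
  ),
  stringsAsFactors = FALSE
)

# Keep only the tags that are reported once per day
daily_tags <- dict_climate$tag[dict_climate$frequency == "Daily"]
daily_tags
```

The same subsetting should apply to the full table returned by get_climate_tags(), provided its columns keep these names.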

DIVIPOLA

DIVIPOLA codification is a standardized frame for the whole country, and contains departments’ and municipalities’ codes. Departments have two digits for individual identification, while municipalities have five. The five numbers in municipalities’ codes include the department where they are located (first two digits) and the number of the municipality within the department (last three digits). The codes for each municipality and department can be consulted using the divipola_table() function.

divipola <- divipola_table()
head(divipola)
#>   codigo_departamento codigo_municipio departamento  municipio      tipo
#> 1                  05            05001    Antioquia   Medellín Municipio
#> 2                  05            05002    Antioquia  Abejorral Municipio
#> 3                  05            05004    Antioquia   Abriaquí Municipio
#> 4                  05            05021    Antioquia Alejandría Municipio
#> 5                  05            05030    Antioquia      Amagá Municipio
#> 6                  05            05031    Antioquia     Amalfi Municipio
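The structure of a municipality code can be checked with base R string functions. This is a small sketch using the code for Medellín from the table above; it also shows why codes should be kept as character strings, since converting them to numbers drops the leading zero.

```r
municipality_code <- "05001"

# First two digits: department; last three digits: municipality within it
department_code <- substr(municipality_code, 1, 2)
municipality_number <- substr(municipality_code, 3, 5)

# Converting to a number loses the leading zero ...
as_number <- as.integer(municipality_code)

# ... which can be restored by padding back to five digits
restored <- sprintf("%05d", as_number)
```

Here department_code is "05", municipality_number is "001", and restored is again "05001".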

To get the DIVIPOLA code of a municipality or department we can use the auxiliary functions name_to_code_mun() and name_to_code_dep() in ColOpenData. To retrieve a department code we only have to include the department’s name:

name_to_code_dep(department_name = "Guajira")
#> [1] "44"

To retrieve a municipality code we must include both the department name and the municipality name. This is necessary because some municipality names are repeated across departments.

name_to_code_mun(
  department_name = "Boyacá",
  municipality_name = "Tunja"
)
#> [1] "15001"

These individual codes can be used to filter information in the datasets.
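As a minimal sketch of such a filter, the example below uses a stand-in data frame built from the first rows of the DIVIPOLA table shown earlier, so that it runs without downloading anything. It assumes the dataset of interest has a codigo_municipio column stored as character; the actual column names should be checked in each dataset.

```r
# Stand-in data copied from the first rows of the DIVIPOLA table
divipola_sample <- data.frame(
  codigo_departamento = rep("05", 6),
  codigo_municipio = c("05001", "05002", "05004", "05021", "05030", "05031"),
  municipio = c(
    "Medellín", "Abejorral", "Abriaquí", "Alejandría", "Amagá", "Amalfi"
  ),
  stringsAsFactors = FALSE
)

# Keep only the rows that match a given municipality code
selected <- divipola_sample[divipola_sample$codigo_municipio == "05002", ]
selected$municipio
```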

Conversely, departments’ and municipalities’ codes can be translated into their official names using code_to_name_dep() and code_to_name_mun().

code_to_name_mun(municipality_code = "15001")
#> [1] "Tunja"