Documentation and Dictionaries
Naming and structure
Understanding Demographic Datasets
Demographic data is provided in multiple datasets from the same source. To identify each dataset, a naming framework has been implemented that includes the source, group, year and a final set of details. The details differ for every dataset and describe the information it contains. The general pattern is:
SOURCE_GROUP_YEARS_DETAILS
Demographic datasets are available for municipalities and departments, and contain data on dwellings, households and population, organized in four categories:
- Viviendas (Dwellings)
- Hogares (Households)
- Personas Social (Persons social)
- Personas Demográfico (Persons demographic)
All datasets are retrieved from the National Administrative Department of Statistics (DANE). Naming is structured as follows:
- Source: DANE
- Group: names include the categories
  - Viviendas: CNPVV
  - Hogares: CNPVH
  - Personas Social: CNPVPS
  - Personas Demográfico: CNPVPD
- Year:
  - Census data: 2018
- Details: these are specific to each individual dataset. For further details, please check the list_datasets function below.
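To make the naming pattern concrete, the following base R sketch splits a dataset name into its components. parse_dataset_name and group_labels are hypothetical helpers written for this example, not ColOpenData functions. The sketch assumes that the source, group and year fields contain no underscores (as in the demographic names listed in this vignette), so everything after the third underscore is treated as the details.

```r
# Hypothetical helper (not part of ColOpenData): split a name that follows the
# SOURCE_GROUP_YEARS_DETAILS pattern. Assumes the first three fields contain no
# underscores, so the details field takes everything after the third one.
parse_dataset_name <- function(name) {
  pattern <- "^([^_]+)_([^_]+)_([^_]+)_(.+)$"
  parts <- regmatches(name, regexec(pattern, name))[[1]]
  if (length(parts) == 0) {
    stop("Name does not follow SOURCE_GROUP_YEARS_DETAILS: ", name)
  }
  # parts[1] is the whole match; the four capture groups follow it
  list(source = parts[2], group = parts[3], year = parts[4], details = parts[5])
}

# Group codes and categories as listed above
group_labels <- c(
  CNPVV = "Viviendas (Dwellings)",
  CNPVH = "Hogares (Households)",
  CNPVPS = "Personas Social (Persons social)",
  CNPVPD = "Personas Demográfico (Persons demographic)"
)

parsed <- parse_dataset_name("DANE_CNPVH_2018_1HD")
parsed$year
#> [1] "2018"
unname(group_labels[parsed$group])
#> [1] "Hogares (Households)"
```

A regular expression with an open-ended last group is used instead of a plain strsplit so that any underscores inside the details field would be kept together.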
For hands-on examples, please check A Deep Dive into Colombian Demographics Using ColOpenData.
Understanding Geospatial Datasets
The naming of geospatial datasets reflects their level of aggregation, since they are available from blocks up to departments. All of these datasets come from DANE and are part of the National Geostatistical Framework (MGN), which for 2018 included a summarized version of the National Population and Dwelling Census (CNPV). Available spatial levels are: department, municipality, urban and rural sector, urban and rural section, urban zone and block. Please check Maps and plots with ColOpenData for further details.
Understanding Climate Dataset
This module’s data is stored in a single dataset. To use the related functions you need the area of interest, the dates, and the tags to be consulted. Each tag identifies one variable, and tags are required to download data. The available tags are:
Tags | Variable |
---|---|
TSSM_CON | Dry-bulb Temperature |
THSM_CON | Wet-bulb Temperature |
TMN_CON | Minimum Temperature |
TMX_CON | Maximum Temperature |
TSTG_CON | Dry-bulb Temperature (Thermograph) |
HR_CAL | Relative Humidity |
HRHG_CON | Relative Humidity (Hygrograph) |
TV_CAL | Vapour Pressure |
TPR_CAL | Dew Point |
PTPM_CON | Precipitation (Daily) |
PTPG_CON | Precipitation (Hourly) |
EVTE_CON | Evaporation |
FA_CON | Atmospheric Phenomenon |
NB_CON | Cloudiness |
RCAM_CON | Wind Trajectory |
BSHG_CON | Sunshine Duration |
VVAG_CON | Wind Speed |
DVAG_CON | Wind Direction |
VVMXAG_CON | Maximum Wind Speed |
DVMXAG_CON | Maximum Wind Direction |
These tags are used to download data with download_climate, download_climate_geom and download_climate_stations. See How to download climate data using ColOpenData for further details.
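Since a mistyped tag would only be noticed when a download is attempted, it can help to validate tags locally first. The sketch below stores the tags from the table above in a character vector. valid_climate_tags and check_climate_tags are hypothetical names for this example, not ColOpenData functions; the upper-casing and whitespace trimming are choices made here, not documented behaviour of the download functions.

```r
# Tags from the table above
valid_climate_tags <- c(
  "TSSM_CON", "THSM_CON", "TMN_CON", "TMX_CON", "TSTG_CON",
  "HR_CAL", "HRHG_CON", "TV_CAL", "TPR_CAL", "PTPM_CON",
  "PTPG_CON", "EVTE_CON", "FA_CON", "NB_CON", "RCAM_CON",
  "BSHG_CON", "VVAG_CON", "DVAG_CON", "VVMXAG_CON", "DVMXAG_CON"
)

# Hypothetical helper (not part of ColOpenData): normalize the requested tags
# and stop early if any of them is not in the list above.
check_climate_tags <- function(tags) {
  tags <- toupper(trimws(tags))  # tolerate lowercase input and stray spaces
  unknown <- setdiff(tags, valid_climate_tags)
  if (length(unknown) > 0) {
    stop("Unknown climate tag(s): ", paste(unknown, collapse = ", "))
  }
  tags
}

check_climate_tags(c("PTPM_CON", "tmx_con"))
#> [1] "PTPM_CON" "TMX_CON"
```

The validated vector could then be passed as the tag argument of the download functions mentioned above.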
Understanding Population Projections
Population projections and back-projections are available at the national, department and municipality levels, and are divided by sex and ethnicity (the latter is only available for municipalities). Dataset names reflect the source, the years included, sex and ethnicity.
For examples of how to consult the data, please refer to Population Projection with ColOpenData.
List Data
To check the available datasets, you can use the list_datasets function. Its output can be filtered with the module parameter to select a specific module. The default is "all", but it can be set to "demographic", "geospatial", "climate" or "population_projections".
datasets <- list_datasets()
head(datasets)
#> name group source year level category
#> 1 DANE_MGN_2018_DPTO geospatial DANE 2018 department maps
#> 2 DANE_MGN_2018_MPIO geospatial DANE 2018 municipality maps
#> 3 DANE_MGN_2018_MPIOCL geospatial DANE 2018 municipality_class maps
#> 4 DANE_MGN_2018_SETU geospatial DANE 2018 urban_sector maps
#> 5 DANE_MGN_2018_SETR geospatial DANE 2018 rural_sector maps
#> 6 DANE_MGN_2018_SECU geospatial DANE 2018 urban_section maps
#> description
#> 1 Geographical and summarised census data from 2018 at the level of department
#> 2 Geographical and summarised census data from 2018 at the level of municipality
#> 3 Geographical and summarised census data from 2018 at the level of municipality with class
#> 4 Geographical and summarised census data from 2018 at the level of urban sector
#> 5 Geographical and summarised census data from 2018 at the level of rural sector
#> 6 Geographical and summarised census data from 2018 at the level of urban section
To list only demographic datasets we can use:
demographic_datasets <- list_datasets(module = "demographic")
head(demographic_datasets)
#> name group source year level category
#> 11 DANE_CNPVH_2018_1HD demographic DANE 2018 department households
#> 12 DANE_CNPVH_2018_1HM demographic DANE 2018 municipality households
#> 13 DANE_CNPVH_2018_2HD demographic DANE 2018 department households
#> 14 DANE_CNPVH_2018_2HM demographic DANE 2018 municipality households
#> 15 DANE_CNPVH_2018_3HD demographic DANE 2018 department households
#> 16 DANE_CNPVH_2018_3HM demographic DANE 2018 municipality households
#> description
#> 11 Number of households with persons under 15 years of age and number of persons under 15 years of age, by department and area
#> 12 Number of households with persons under 15 years of age and number of persons under 15 years of age, by municipality and area
#> 13 Number of households with senior citizens and number of persons aged 60 and over, by department and area
#> 14 Number of households with senior citizens and number of persons aged 60 and over, by municipality and area
#> 15 Households by headship, by department and area
#> 16 Households by headship, by municipality and area
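Judging from the printed output, list_datasets returns a data frame with name, group, source, year, level, category and description columns, so the result can be narrowed further with base R. The sketch below rebuilds a few of the rows shown above so it runs without the package; in practice you would apply the same subset call to demographic_datasets directly.

```r
# A few rows copied from the demographic output above (other columns omitted)
demo <- data.frame(
  name = c("DANE_CNPVH_2018_1HD", "DANE_CNPVH_2018_1HM",
           "DANE_CNPVH_2018_2HD", "DANE_CNPVH_2018_2HM"),
  group = "demographic",
  level = c("department", "municipality", "department", "municipality"),
  category = "households",
  stringsAsFactors = FALSE
)

# Keep only municipality-level datasets and pull out their names
municipal <- subset(demo, level == "municipality")
municipal$name
#> [1] "DANE_CNPVH_2018_1HM" "DANE_CNPVH_2018_2HM"
```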
List Data Using Keywords
Sometimes, going through each dataset to find specific information can be tiring. To quickly look for a specific word or set of words within the datasets, you can use the look_up function, which takes the following parameters:
- module: the module you wish to search within (default is "all").
- keywords: the word (or words) you are interested in, given as a character string or a character vector.
- logic: the search condition, either "and" to find datasets containing all of the specified words, or "or" to find datasets containing any of them (default is "or"). If you are searching for a single word, either value works.
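The difference between the two search conditions can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration only, not the source code of look_up: match_keywords is an invented name, and case-insensitive substring matching is an assumption made here. The two descriptions are copied from the outputs in this vignette.

```r
# Hypothetical sketch (not ColOpenData's look_up() code): case-insensitive
# substring matching of keywords against dataset descriptions.
match_keywords <- function(descriptions, keywords, logic = c("or", "and")) {
  logic <- match.arg(logic)
  # One logical vector per keyword: does each description contain it?
  hits <- lapply(keywords, function(k) {
    grepl(tolower(k), tolower(descriptions), fixed = TRUE)
  })
  # "and" requires every keyword to match; "or" requires at least one
  combine <- if (logic == "and") `&` else `|`
  Reduce(combine, hits)
}

descriptions <- c(
  "Households by headship, by department and area",
  "Total census population, by department, area, age group, masculinity and femininity indexes, and sex"
)
match_keywords(descriptions, c("area", "sex"), logic = "and")
#> [1] FALSE  TRUE
match_keywords(descriptions, c("area", "sex"), logic = "or")
#> [1] TRUE TRUE
```

With plain substring matching, a keyword such as "age" also matches words like "aged", which is worth keeping in mind when choosing keywords.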
age_datasets <- look_up(keywords = "age")
head(age_datasets)
#> name group source year level
#> 11 DANE_CNPVH_2018_1HD demographic DANE 2018 department
#> 12 DANE_CNPVH_2018_1HM demographic DANE 2018 municipality
#> 13 DANE_CNPVH_2018_2HD demographic DANE 2018 department
#> 14 DANE_CNPVH_2018_2HM demographic DANE 2018 municipality
#> 17 DANE_CNPVPD_2018_1PD demographic DANE 2018 department
#> 18 DANE_CNPVPD_2018_1PM demographic DANE 2018 municipality
#> category
#> 11 households
#> 12 households
#> 13 households
#> 14 households
#> 17 persons_demographic
#> 18 persons_demographic
#> description
#> 11 Number of households with persons under 15 years of age and number of persons under 15 years of age, by department and area
#> 12 Number of households with persons under 15 years of age and number of persons under 15 years of age, by municipality and area
#> 13 Number of households with senior citizens and number of persons aged 60 and over, by department and area
#> 14 Number of households with senior citizens and number of persons aged 60 and over, by municipality and area
#> 17 Total census population, by department, area, age group, masculinity and femininity indexes, and sex
#> 18 Total census population, by municipality, area, age group, masculinity and femininity indexes, and sex
We can also specify a module and combine several keywords with logic = "and" to make a narrower, more precise search.
area_sex_datasets <- look_up(
module = "demographic", keywords = c("area", "sex"),
logic = "and"
)
head(area_sex_datasets)
#> name group source year level
#> 17 DANE_CNPVPD_2018_1PD demographic DANE 2018 department
#> 18 DANE_CNPVPD_2018_1PM demographic DANE 2018 municipality
#> 20 DANE_CNPVPD_2018_3PD demographic DANE 2018 department
#> 21 DANE_CNPVPD_2018_3PM demographic DANE 2018 municipality
#> 22 DANE_CNPVPD_2018_4PD demographic DANE 2018 department
#> 23 DANE_CNPVPD_2018_4PM demographic DANE 2018 municipality
#> category
#> 17 persons_demographic
#> 18 persons_demographic
#> 20 persons_demographic
#> 21 persons_demographic
#> 22 persons_demographic
#> 23 persons_demographic
#> description
#> 17 Total census population, by department, area, age group, masculinity and femininity indexes, and sex
#> 18 Total census population, by municipality, area, age group, masculinity and femininity indexes, and sex
#> 20 Total census population, by department, age group, age, area and sex
#> 21 Total census population, by municipality, age group, age, area and sex
#> 22 Census population in particular households, by relationship or kinship to the head of household, by department, area and sex
#> 23 Census population in particular households, by relationship or kinship to the head of household, by municipality, area and sex
Dictionaries
Dictionaries are provided to understand some tags and column names inside each module. These dictionaries are provided in Spanish, since they are retrieved directly from the sources.
Demographic
Dictionaries are not provided for this module, since they are not needed: demographic datasets include descriptive column names and variables, and are self-contained.
Geospatial
Datasets in the geospatial module contain a summarized version of the census, and a dictionary is needed to understand all of the aggregated variables. These dictionaries contain the metadata necessary to use the available information. To retrieve them, use the dictionary function with the dataset name as a parameter:
dict_mpio <- dictionary("DANE_MGN_2018_MPIO")
head(dict_mpio)
#> variable tipo longitud
#> 1 codigo_departamento Text 2
#> 2 codigo_municipio_sin_con Text 3
#> 3 municipio Text 250
#> 4 codigo_municipio Text 5
#> 5 version Long Integer NA
#> 6 area Double NA
#> descripcion
#> 1 Código del departamento
#> 2 Código que identifica al municipio
#> 3 Nombre del municipio
#> 4 Código concatenado que identifica al municipio
#> 5 Año de la información geográfica
#> 6 Área del municipio en metros cuadrados (Sistema de coordenadas planas MAGNA_Colombia_Bogota)
#> categoria_original
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
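A dictionary can be queried like any data frame, for instance to find the meaning of one column before working with a geospatial dataset. This sketch rebuilds three of the rows printed above (the variable, tipo and descripcion columns) so it runs offline; with the package installed, you would index dict_mpio in the same way.

```r
# Three rows copied from the dictionary output above
dict <- data.frame(
  variable = c("codigo_departamento", "codigo_municipio", "area"),
  tipo = c("Text", "Text", "Double"),
  descripcion = c(
    "Código del departamento",
    "Código concatenado que identifica al municipio",
    "Área del municipio en metros cuadrados (Sistema de coordenadas planas MAGNA_Colombia_Bogota)"
  ),
  stringsAsFactors = FALSE
)

# Look up the type and the (Spanish) description of a single variable
dict[dict$variable == "area", c("tipo", "descripcion")]
```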
Climate
Climate data is not stored in multiple datasets but in a single dataset with numerous tags. These tags can also be consulted through the dictionary function, using the name of the only available climate dataset:
dict_climate <- dictionary("IDEAM_CLIMATE_2023_MAY")
head(dict_climate)
#> etiqueta variable
#> 1 TSSM_CON Temperatura seca (ambiente)
#> 2 THSM_CON Temperatura húmeda
#> 3 TMN_CON Temperatura mínima
#> 4 TMX_CON Temperatura máxima
#> 5 TSTG_CON Temperatura seca (ambiente) del termógrafo
#> 6 HR_CAL Humedad relativa
#> frecuencia
#> 1 Horaria (07:00, 13:00, 18:00 y/o 19:00)
#> 2 Horaria (07:00, 13:00, 18:00 y/o 19:00)
#> 3 Diaria
#> 4 Diaria
#> 5 Horaria (24 horas)
#> 6 Horaria (07:00, 13:00, 18:00 y/o 19:00)
DIVIPOLA
DIVIPOLA coding is a standardized framework for the whole country that contains department and municipality codes. Departments have two digits for individual identification, while municipalities have five. A municipality’s five-digit code includes the department where it is located (first two digits) and the number of the municipality within the department (last three digits); for example, Medellín (05001) is municipality 001 in department 05. The codes for every municipality and department can be consulted with the divipola_table function:
divipola <- divipola_table()
head(divipola)
#> codigo_departamento codigo_municipio departamento municipio tipo
#> 1 05 05001 Antioquia Medellín Municipio
#> 2 05 05002 Antioquia Abejorral Municipio
#> 3 05 05004 Antioquia Abriaquí Municipio
#> 4 05 05021 Antioquia Alejandría Municipio
#> 5 05 05030 Antioquia Amagá Municipio
#> 6 05 05031 Antioquia Amalfi Municipio
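The two-part structure of a municipality code can be made explicit with substr. split_divipola and its output column numero_en_departamento are hypothetical names for this example, not part of ColOpenData. The sketch requires codes as text, as they appear in the table above, because storing them as numbers would drop the leading zero of departments such as 05.

```r
# Hypothetical helper (not part of ColOpenData): split five-digit municipality
# codes into the department part and the within-department part.
split_divipola <- function(code) {
  # Codes must be text so that leading zeros (e.g. "05") are preserved
  stopifnot(is.character(code), all(grepl("^[0-9]{5}$", code)))
  data.frame(
    codigo_municipio = code,
    codigo_departamento = substr(code, 1, 2),
    numero_en_departamento = substr(code, 3, 5),
    stringsAsFactors = FALSE
  )
}

split_divipola(c("05001", "15001"))

# A code that was read as a number can be padded back to five digits
sprintf("%05d", 5001L)
#> [1] "05001"
```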
To get the DIVIPOLA code of a municipality or department, we can use the auxiliary functions name_to_code_mun and name_to_code_dep in ColOpenData.
To retrieve a department code, we only need to include the department’s name:
name_to_code_dep("Guajira")
#> [1] "44"
To retrieve a municipality code, we must include both the department name and the municipality name, because some municipality names are repeated across departments.
name_to_code_mun("Boyacá", "Tunja")
#> [1] "15001"
These individual codes can be used to filter information in the datasets.
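For example, a department code can be used to keep only the rows of a table that belong to that department. The sketch below uses a small data frame built from rows shown in this vignette (the first Antioquia rows of the DIVIPOLA table, plus Tunja in department 15); with the package loaded, the code would come from name_to_code_dep instead of being typed in.

```r
# Rows taken from the DIVIPOLA output and the Tunja example in this vignette
rows <- data.frame(
  codigo_departamento = c("05", "05", "05", "15"),
  codigo_municipio = c("05001", "05002", "05004", "15001"),
  municipio = c("Medellín", "Abejorral", "Abriaquí", "Tunja"),
  stringsAsFactors = FALSE
)

dep_code <- "05"  # Antioquia, as shown in the DIVIPOLA table above
antioquia <- rows[rows$codigo_departamento == dep_code, ]
nrow(antioquia)
#> [1] 3

# Municipality codes start with their department code, so a prefix match also works
rows$municipio[startsWith(rows$codigo_municipio, "15")]
#> [1] "Tunja"
```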
Conversely, department and municipality codes can be translated into their official names using code_to_name_mun and code_to_name_dep.
code_to_name_mun("15001")
#> [1] "Tunja"