ColOpenData can be used to access open geospatial data from Colombia. This data is retrieved from the Geostatistical National Framework (MGN), published by the National Administrative Department of Statistics (DANE). The MGN contains the political-administrative division and is used to reference census statistical information. Further information can be obtained directly from DANE.
This package contains the 2018 version of the MGN, which also includes a summarized version of the National Population and Dwelling Census (CNPV) at different aggregation levels. Each level is stored in a different dataset, which can be retrieved using the download_geospatial function. It takes three arguments:
- spatial_level: character with the spatial level to be consulted.
- include_geom: logical for including (or not) the geometry. Default is TRUE.
- include_cnpv: logical for including (or not) CNPV demographic and socioeconomic information. Default is TRUE.
Available levels of aggregation come from the official spatial division provided by DANE, with their names corresponding to:
Code | Level | Name |
---|---|---|
DPTO | Department | DANE_MGN_2018_DPTO |
MPIO | Municipality | DANE_MGN_2018_MPIO |
MPIOCL | Municipality including Class | DANE_MGN_2018_MPIOCL |
MZN | Block | DANE_MGN_2018_MZN |
SECR | Rural Sector | DANE_MGN_2018_SECR |
SECU | Urban Sector | DANE_MGN_2018_SECU |
SETR | Rural Section | DANE_MGN_2018_SETR |
SETU | Urban Section | DANE_MGN_2018_SETU |
ZU | Urban Zone | DANE_MGN_2018_ZU |
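The dataset names in the table follow a regular pattern: the prefix DANE_MGN_2018_ followed by the level code. As a small sketch, a name can be assembled from its code. The helper mgn_dataset_name below is hypothetical and is not part of ColOpenData.

```r
# Hypothetical helper (not part of ColOpenData): build the dataset name
# for a spatial level code, following the pattern in the table above.
mgn_dataset_name <- function(level_code) {
  valid_codes <- c(
    "DPTO", "MPIO", "MPIOCL", "MZN", "SECR", "SECU", "SETR", "SETU", "ZU"
  )
  level_code <- toupper(level_code)
  unknown <- setdiff(level_code, valid_codes)
  if (length(unknown) > 0) {
    stop("Unknown spatial level code: ", paste(unknown, collapse = ", "))
  }
  paste0("DANE_MGN_2018_", level_code)
}

mgn_dataset_name("MPIO")
```

The resulting string has the same form as the dataset name passed to the dictionary function later in this vignette.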
In this vignette you will learn:
- How to download geospatial data using ColOpenData
- How to use census data included in geospatial datasets
- How to visualize spatial data using leaflet and ggplot2
We will use geospatial data at the Municipality (MPIO) level for the department of Tolima and calculate the percentage of houses with an internet connection in each municipality. We will then plot the results with both approaches: static plots with ggplot2 and dynamic plots with leaflet.
We will start by importing the needed libraries.
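A minimal sketch of that setup, assuming the three packages named in this vignette are already installed. Other helper packages may be convenient for data wrangling, but none is assumed here.

```r
# Packages named in this vignette (assumed to be installed).
library(ColOpenData) # data access: download_geospatial(), dictionary()
library(ggplot2)     # static maps with geom_sf()
library(leaflet)     # interactive maps; also provides the %>% pipe
```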
Disclaimer: all data is loaded into the environment of the user’s R session, but is not downloaded to the user’s computer. Spatial datasets can be very large and might take a while to load.
Downloading geospatial data
First, we download the data using the function download_geospatial, including the geometries and the census-related information.
mpio <- download_geospatial(
  spatial_level = "MPIO",
  include_geom = TRUE,
  include_cnpv = TRUE
)
#> Original data is retrieved from the National Administrative Department
#> of Statistics (Departamento Administrativo Nacional de Estadística -
#> DANE).
#> Reformatted by package authors.
#> Stored by Universidad de Los Andes under the Epiverse TRACE iniative.
head(mpio)
#> Simple feature collection with 6 features and 90 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -76.1027 ymin: 0.9764735 xmax: -74.89527 ymax: 2.326755
#> Geodetic CRS: WGS 84
#> codigo_departamento codigo_municipio_sin_con municipio
#> 1 18 001 Florencia
#> 2 18 029 Albania
#> 3 18 094 Belén de Los Andaquíes
#> 4 18 247 El Doncello
#> 5 18 256 El Paujíl
#> 6 18 410 La Montañita
#> codigo_municipio version area latitud longitud encuestas enc_etnico
#> 1 18001 2018 2547637532 1.749139 -75.55824 71877 32
#> 2 18029 2018 414122070 1.227865 -75.88233 2825 24
#> 3 18094 2018 1191618572 1.500923 -75.87565 4243 54
#> 4 18247 2018 1106076151 1.791386 -75.19394 8809 0
#> 5 18256 2018 1234734145 1.617746 -75.23404 5795 0
#> 6 18410 2018 1701061430 1.302860 -75.23573 5113 15
#> enc_no_etnico enc_resguardo_indigena enc_comun_negras enc_area_protegida
#> 1 71845 32 0 0
#> 2 2801 24 0 0
#> 3 4189 54 0 1
#> 4 8809 0 0 0
#> 5 5795 0 0 0
#> 6 5098 15 0 0
#> enc_area_no_protegida un_vivienda un_mixto un_no_res un_lea
#> 1 71877 61176 2178 8436 87
#> 2 2825 1826 49 948 2
#> 3 4242 3223 109 900 11
#> 4 8809 6598 357 1850 4
#> 5 5795 4891 204 695 5
#> 6 5113 4077 241 786 9
#> un_mixto_no_res_industria un_mixto_no_res_comercio un_mixto_no_res_servicios
#> 1 39 1550 566
#> 2 3 34 12
#> 3 4 99 6
#> 4 11 259 87
#> 5 4 161 38
#> 6 2 205 32
#> un_mixto_no_res_agro un_mixto_no_res_sin_info un_no_res_industria
#> 1 18 5 54
#> 2 0 0 3
#> 3 0 0 8
#> 4 0 0 21
#> 5 1 0 5
#> 6 2 0 3
#> un_no_res_comercio un_no_res_servicios un_no_res_agro un_no_res_institucional
#> 1 2591 1061 535 368
#> 2 21 32 728 53
#> 3 88 6 2 61
#> 4 334 124 807 89
#> 5 239 104 4 70
#> 6 103 123 17 84
#> un_no_res_lote un_no_res_parque un_no_res_minero un_no_res_proteccion
#> 1 3172 233 7 19
#> 2 92 5 0 0
#> 3 626 8 0 8
#> 4 361 42 0 27
#> 5 211 16 0 0
#> 6 362 30 0 0
#> u_no_res_construccion u_no_res_sin_info viviendas viv_casa viv_apartamento
#> 1 371 25 63354 47817 13764
#> 2 14 0 1875 1793 40
#> 3 93 0 3332 3189 113
#> 4 39 6 6955 6006 775
#> 5 45 1 5095 4700 145
#> 6 58 6 4318 3890 224
#> viv_cuarto viv_trad_indigena viv_trad_etnica viv_otro viv_ocupado_personas
#> 1 1624 21 8 120 49809
#> 2 22 17 0 3 1409
#> 3 24 2 2 2 2883
#> 4 160 1 1 12 5767
#> 5 188 4 1 57 4568
#> 6 161 6 2 35 3553
#> viv_ocupado_sin_personas viv_temporal viv_desocupado hogares viv_energia
#> 1 2681 2150 8714 51430 48638
#> 2 13 55 398 1559 1300
#> 3 2 107 340 3161 2595
#> 4 304 388 496 6129 5375
#> 5 5 323 199 5848 4195
#> 6 151 308 306 3748 2159
#> viv_sin_energia viv_energia_estrato_1 viv_energia_estrato_2
#> 1 1171 34851 10343
#> 2 109 1184 106
#> 3 288 2118 366
#> 4 392 3548 962
#> 5 373 3330 770
#> 6 1394 1964 144
#> viv_energia_estrato_3 viv_energia_estrato_4 viv_energia_estrato_5
#> 1 2169 509 13
#> 2 1 0 0
#> 3 17 2 1
#> 4 793 1 1
#> 5 84 0 1
#> 6 9 1 0
#> viv_energia_estrato_6 viv_energia_sin_estrato viv_acueducto viv_sin_acueducto
#> 1 3 750 45179 4630
#> 2 0 9 808 601
#> 3 0 91 2017 866
#> 4 1 69 4175 1592
#> 5 1 9 2505 2063
#> 6 0 41 1441 2112
#> viv_alcantarillado viv_sin_alcantarillado viv_gas viv_sin_gas
#> 1 41138 8671 37028 12074
#> 2 703 706 26 1371
#> 3 1806 1077 52 2796
#> 4 4323 1444 57 5549
#> 5 2359 2209 1463 3041
#> 6 1329 2224 67 3454
#> viv_sin_info_gas viv_rec_basuras viv_sin_rec_basuras viv_internet
#> 1 707 45491 4318 13362
#> 2 12 727 682 27
#> 3 35 1905 978 73
#> 4 161 4348 1419 211
#> 5 64 2414 2154 125
#> 6 32 1273 2280 64
#> viv_sin_internet viv_sin_info_internet personas per_leas
#> 1 35727 720 156789 4315
#> 2 1370 12 4514 151
#> 3 2775 35 9075 346
#> 4 5395 161 17775 203
#> 5 4379 64 13014 192
#> 6 3457 32 12128 604
#> per_hogares_particulares hombres mujeres per_0_a_9 per_10_a_19 per_20_a_29
#> 1 152474 77620 79169 25503 30249 29951
#> 2 4363 2323 2191 725 1016 717
#> 3 8729 4551 4524 1592 2254 1388
#> 4 17572 8790 8985 3047 3811 2601
#> 5 12822 6601 6413 2346 2882 2170
#> 6 11524 6437 5691 2229 3022 1836
#> per_30_a_39 per_40_a_49 per_50_a_59 per_60_a_69 per_70_a_79 per_80_mas
#> 1 23602 17235 14349 8969 4687 2244
#> 2 568 536 445 253 162 92
#> 3 1121 986 816 487 286 145
#> 4 2302 2032 1792 1135 707 348
#> 5 1587 1460 1188 703 430 248
#> 6 1563 1441 1010 578 323 126
#> per_ed_primaria per_ed_secundaria per_ed_superior per_ed_posgrado
#> 1 37918 14123 14606 856
#> 2 1696 150 98 0
#> 3 2596 418 171 12
#> 4 6091 712 347 26
#> 5 4805 261 226 0
#> 6 5011 384 134 0
#> per_ed_sin_educacion per_ed_sin_info shape_length shape_area
#> 1 5892 3799 2.942508 0.20692777
#> 2 215 46 1.112829 0.03361758
#> 3 720 123 2.234657 0.09674460
#> 4 1095 171 3.154370 0.08986744
#> 5 916 99 3.529316 0.10030928
#> 6 724 182 3.402939 0.13817351
#> geom
#> 1 MULTIPOLYGON (((-75.42074 2...
#> 2 MULTIPOLYGON (((-75.89506 1...
#> 3 MULTIPOLYGON (((-75.78705 1...
#> 4 MULTIPOLYGON (((-75.36167 2...
#> 5 MULTIPOLYGON (((-75.36638 2...
#> 6 MULTIPOLYGON (((-75.40346 1...
After downloading, we filter the data by department, using the DIVIPOLA code for Tolima. For further details on DIVIPOLA codification and functions, please refer to Documentation and Dictionaries.
name_to_code_dep("Tolima")
#> [1] "73"
To understand which column contains the department codes and filter for Tolima, we will need the corresponding dataset dictionary, which can be downloaded with the dictionary function. This function uses the dataset name to download the associated information. For further information, please refer to the documentation on dictionaries mentioned previously.
dict <- dictionary("DANE_MGN_2018_MPIO")
head(dict)
#> variable tipo longitud
#> 1 codigo_departamento Text 2
#> 2 codigo_municipio_sin_con Text 3
#> 3 municipio Text 250
#> 4 codigo_municipio Text 5
#> 5 version Long Integer NA
#> 6 area Double NA
#> descripcion
#> 1 Código del departamento
#> 2 Código que identifica al municipio
#> 3 Nombre del municipio
#> 4 Código concatenado que identifica al municipio
#> 5 Año de la información geográfica
#> 6 Área del municipio en metros cuadrados (Sistema de coordenadas planas MAGNA_Colombia_Bogota)
#> categoria_original
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
After exploring the dictionary, we can see that the column containing the department codes is codigo_departamento. We will filter based on that column.
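A minimal sketch of this filtering step on a toy table, so that it runs without downloading anything. The rows are placeholders for illustration, not census records. The comparison uses a string because the dictionary lists codigo_departamento as a text column.

```r
# Toy stand-in for the mpio dataset (placeholder rows, not census data).
mpio_toy <- data.frame(
  codigo_departamento = c("18", "18", "73", "73"),
  codigo_municipio = c("18001", "18029", "73001", "73024"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose department code matches Tolima ("73").
tolima_toy <- mpio_toy[mpio_toy$codigo_departamento == "73", ]
nrow(tolima_toy)
```

With the downloaded data, the equivalent would be `tolima <- mpio[mpio$codigo_departamento == "73", ]`. Row subsetting of this kind should keep the geometry column.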
To calculate the percentage of houses with internet connection, we need the number of houses with internet connection and the total number of houses in each municipality. From the dictionary, the number of houses with internet connection is viv_internet and the total number of houses is viviendas. We will calculate the percentage as follows:
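The calculation can be sketched as below, using the viviendas (total houses) and viv_internet (houses with internet connection) columns as they appear in the head(mpio) output above. The three rows reuse values printed in that output. Expressing the result on a 0 to 100 scale, rounded to two decimals, is an assumption of this sketch.

```r
# Values copied from the first three rows of the head(mpio) output.
toy <- data.frame(
  codigo_municipio = c("18001", "18029", "18094"),
  viviendas = c(63354, 1875, 3332),
  viv_internet = c(13362, 27, 73),
  stringsAsFactors = FALSE
)

# Percentage of houses with internet connection (0 to 100 scale).
toy$internet <- round(100 * toy$viv_internet / toy$viviendas, 2)
toy[, c("codigo_municipio", "internet")]
```

With the filtered data, the same expression would be applied to the Tolima subset, storing the result in a column named internet, which is the name the plotting code expects.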
Static plots (ggplot2)
ggplot2 can be used to generate static plots of spatial data by using the geometry geom_sf as follows:
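A minimal sketch of such a plot. To keep it runnable on its own, it builds a toy sf object with two square polygons and made-up internet values (assumptions for illustration only). With the real data, the Tolima dataset would take the place of toy.

```r
library(sf)
library(ggplot2)

# Helper for this sketch: a unit square whose lower-left corner is (x0, 0).
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy spatial dataset: two polygons with made-up values.
toy <- st_sf(
  internet = c(10, 40),
  geometry = st_sfc(unit_square(0), unit_square(1), crs = 4326)
)

# geom_sf() picks up the geometry column; fill maps the variable to color.
p <- ggplot(data = toy) +
  geom_sf(mapping = aes(fill = internet))
p
```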
By default, the generated plot uses a blue palette, which makes it hard to see small differences in internet coverage across municipalities. Color palettes and themes can be defined for each plot using aesthetics and scales, which are described in the ggplot2 documentation. We will use a two-color gradient to make the differences more visible.
ggplot(data = tolima) +
  geom_sf(mapping = aes(fill = internet), color = NA) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  ) +
  scale_fill_gradient("Percentage", low = "#10bed2", high = "#deff00") +
  ggtitle(
    label = "Internet coverage",
    subtitle = "Tolima, Colombia"
  )
Dynamic plots (leaflet)
For dynamic plots, we can use leaflet, which is an open-source library for interactive maps. To create the same plot, we will first create the color palette.
colfunc <- colorRampPalette(c("#10bed2", "#deff00"))
pal <- colorNumeric(
  palette = colfunc(100),
  domain = tolima$internet
)
With the previous color palette we can generate the interactive plot. The package also includes open-source base maps such as OpenStreetMap and CartoDB. For further details on leaflet, please refer to the package’s documentation.
leaflet(tolima) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    stroke = TRUE,
    weight = 0,
    color = NA,
    fillColor = ~ pal(tolima$internet),
    fillOpacity = 1,
    popup = paste0(tolima$internet)
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~ tolima$internet,
    opacity = 1,
    title = "Internet Coverage"
  )