Economic Inclusion and Human Mobility in Bogotá


Data and Code for the reproduction of the results in the paper: “Complexity-Based Informality, Mobility Barriers, and Economic Inclusion in Bogotá” by M. Coscia, F. Neffke, and R. Hausmann.

Folder “data” contains the raw data used for the analysis, namely:
– bogota_map.*: Shapefile of the city of Bogota.
– cell_attributes.csv: The attributes of the shapefile in CSV format without the geometry.
– dane_2011.csv: CSV file containing info about workers in Bogota from DANE’s Gran Encuesta de Hogares (aggregation of all 2011 months).
– encuesta_manual.pdf: Description of the variables in the encuesta data files (in Spanish).
– encuesta_mob_2011.dta: Stata data file with the results of the public DANE Encuesta Mobilidad conducted in 2011.
– encuesta_trips.dta: Stata data file with all trip information from the public DANE Encuesta Mobilidad conducted in 2011.
– gmaps_traveltimes_driving.csv: CSV file containing the distance and car travel time between the centroid of each of the cells in which we divide the city.
– gmaps_traveltimes_transit.csv: CSV file containing the distance and public transit travel time between the centroid of each of the cells in which we divide the city.
– manzana_cell.csv: CSV file mapping the city block ID (manzana) to its cell.
– od_traveltimes.csv: CSV file containing the number of observed commuters for each pair of cells, as aggregated from CDR cellphone data.

The other folders contain the scripts to reproduce all the figures and tables. In particular:
– Most of the data analysis scripts (*.py) need to be run in Python (tested on Ubuntu 18.04.1 with Python 3.6.5). They require several libraries, such as pandas, numpy, scipy, shapely, geopandas, and pysal, among others.
– Most of the plots are generated using Gnuplot v5.2 (*.gp).
– The maps (*.qgs) are generated using the free software QGIS (version 2.18).
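
Before running the analysis scripts, it can help to check that the Python libraries named above are importable. The following is a minimal sketch and not part of the repository: the list covers only the libraries explicitly named in this README (the scripts may need others), and the helper name missing_libraries is hypothetical.

```python
import importlib.util

# Libraries explicitly named in this README; the scripts may import others.
NAMED_LIBRARIES = ["pandas", "numpy", "scipy", "shapely", "geopandas", "pysal"]


def missing_libraries(names=NAMED_LIBRARIES):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All libraries named in the README are importable.")
```

The check only confirms that each library can be found; it does not verify that the installed versions match those used when the scripts were tested.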

The scripts follow the naming convention XX_scriptname.extension, where XX is a zero-padded numeral (e.g. “02”). Unless otherwise specified, the scripts need to be run in the order of their names, so always run the “01” script before attempting to run “02”. The folders are not standalone: each folder depends on the content of the “data” folder and/or the content generated in lower-numbered folders (e.g. “fig2” depends on some of the calculations made in “fig1”). The “supplementary” folder is equivalent to a “figX” folder, and should be run last.
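
The run order implied by the naming convention above can be sketched as follows. This is an illustrative helper, not part of the repository: the function name run_order and the example file names are hypothetical, and the pattern assumes only what the convention states (a numeric prefix, an underscore, a name, and an extension).

```python
import re

# Matches names such as "02_scriptname.py": numeric prefix, underscore, name, extension.
PATTERN = re.compile(r"^(\d+)_.+\..+$")


def run_order(filenames):
    """Return the files that follow the XX_ convention, sorted by their numeric prefix."""
    numbered = []
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Hypothetical file names, for illustration only.
example = ["02_plot.gp", "01_prepare.py", "notes.txt", "03_map.qgs"]
print(run_order(example))  # ['01_prepare.py', '02_plot.gp', '03_map.qgs']
```

To apply it to a real folder, pass the result of os.listdir for that folder. Files that do not follow the convention are skipped, so any script with special run instructions still needs to be handled by hand.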

To reproduce our results for different cities you’ll need these minimal inputs:
– An equivalent of cell_attributes.csv: for every spatial cell, the productivity of the economic activities included in that area
– An equivalent of encuesta_mob_2011.dta: a “cell to cell” table telling you how many workers commuted from one cell to another, how many worked but did not commute, and their skill level.
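
As an illustration of the second input, the sketch below aggregates worker records into a “cell to cell” table. It is a sketch under stated assumptions, not the format of encuesta_mob_2011.dta: the record layout (home cell, work cell, skill level), the use of None for workers who did not commute, the cell labels, and the skill categories are all hypothetical.

```python
from collections import Counter


def build_cell_to_cell(workers):
    """Aggregate (home_cell, work_cell, skill) records into two count tables.

    work_cell is None for workers who worked but did not commute
    (an encoding assumed here for illustration).
    """
    commuters = Counter()      # (home_cell, work_cell, skill) -> number of workers
    non_commuters = Counter()  # (home_cell, skill) -> number of workers
    for home, work, skill in workers:
        if work is None:
            non_commuters[(home, skill)] += 1
        else:
            commuters[(home, work, skill)] += 1
    return commuters, non_commuters


# Hypothetical records, for illustration only.
workers = [
    ("A", "B", "high"),
    ("A", "B", "high"),
    ("A", "C", "low"),
    ("B", None, "low"),
]
commuters, non_commuters = build_cell_to_cell(workers)
print(commuters[("A", "B", "high")])  # 2
print(non_commuters[("B", "low")])    # 1
```

Keeping the non-commuting workers in a separate table makes the distinction between "did not commute" and "commuted within the same cell" explicit; a different city's data may encode this differently.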