Criminal Activities and Migration

When something unpleasant is happening around you, the most natural reaction you might have is to try and get away from it. For instance, if you live in Mexico and there is a lot of criminal activity in the neighborhood, you might want to move, if you can afford to. This seems intuitive, but is it true? Science is often the process of systematically testing whether our intuitions about the world are true; you never know when the easy answer is wrong if you never investigate it!

This is what I did with Roxana Gutiérrez-Romero: I went back to my old love – the investigation of Mexican drug trafficking – and this resulted in the paper “Displacement and disconnection: the impact of violence on migration networks and highway traffic in Mexico” which was recently published in the Spatial Economic Analysis journal.

The question is simple: do we see a disproportionate increase in emigration (whether local within Mexico or international) from municipalities that experience spikes of criminal violence? Answering this question is quite hard, and it involves controlling for many other potential explanations that might drive emigration. Roxana did a remarkable job in figuring out how to control for those other factors, leaving me to worry about a simpler, networky question. Let’s focus on local internal migration for now.

Nodes are municipalities, connected by migration links. I’ll get to the color meaning in the text later.

We can take a snapshot of internal migration by creating a network of municipalities. Two municipalities are connected by a directed edge weighted by the number of people who change their residence from one to the other. The picture above depicts just that.
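If you want to see what building such a snapshot looks like in practice, here is a minimal sketch. It assumes the raw data comes as one (origin, destination) pair per person who changed residence; that input format, the function name `build_migration_network`, and the toy municipalities are all hypothetical and only serve as illustration.

```python
from collections import defaultdict

def build_migration_network(moves):
    """Aggregate residence changes into weighted directed edges.

    moves: iterable of (origin, destination) pairs, one per migrant.
    Returns a dict mapping (origin, destination) to the number of movers.
    """
    edges = defaultdict(int)
    for origin, destination in moves:
        if origin != destination:  # people who stay put are not migration
            edges[(origin, destination)] += 1
    return dict(edges)

# Hypothetical records for one five-year interval.
moves = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A"), ("C", "C")]
network = build_migration_network(moves)
print(network)
```

Keeping the edge direction matters here: the flow from A to B and the flow from B to A are two different links with two different weights.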

A single network doesn’t tell us much: we need at least two different points in time. For each pair of municipalities, we have data about the migration links for several five-year intervals (2005-2010, 2010-2015, 2015-2020). We cannot simply compare the edge weights in two subsequent snapshots, as the change we observe might be just a random fluctuation. Moreover, how do we know whether a change is significant when there are many migration links? Immigration from a municipality might have increased at the same time as all other incoming links have decreased. For this reason, the question morphs into a network science one: if we have an edge observed across two different five-year intervals with two different weights, how do we know the edge weight changed in a statistically significant way?

We got help from an unexpected ally: network backboning. Normally, network backboning is the process of determining whether an edge measured with a noisy process exists. However, by using the Noise-Corrected approach I developed with Frank Neffke a while ago, we can do more than that.

See, noise-corrected backboning achieves the task of verifying an edge’s existence by modeling it, estimating its expected weight and variance. The same edge at different times will have different weights and different variances. By using bootstrapping, a fancy word that means “draw many random numbers from a distribution characterized by the edge’s weight and standard deviation,” we can create an edge weight distribution and figure out whether the edge’s weight truly increased, decreased, or stayed the same.
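To make the bootstrapping idea concrete, here is a minimal sketch. It assumes each edge weight can be approximated by a normal distribution with the estimated mean and standard deviation; that is a simplification for illustration, not necessarily what the paper does. The function name `compare_edge`, the 5% threshold, and the example numbers are all hypothetical.

```python
import random

def compare_edge(mean_before, sd_before, mean_after, sd_after,
                 draws=10_000, alpha=0.05, seed=42):
    """Classify an edge as 'increased', 'decreased', or 'stable'.

    Draws paired random weights for the two snapshots and checks how
    often the later weight exceeds the earlier one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    higher = 0
    for _ in range(draws):
        before = rng.gauss(mean_before, sd_before)
        after = rng.gauss(mean_after, sd_after)
        if after > before:
            higher += 1
    share = higher / draws
    if share >= 1 - alpha:
        return "increased"
    if share <= alpha:
        return "decreased"
    return "stable"

# Hypothetical edges: a clear jump, a clear drop, and a change lost in the noise.
print(compare_edge(100, 5, 150, 5))
print(compare_edge(150, 5, 100, 5))
print(compare_edge(100, 20, 102, 20))
```

The point of the third call is that a small difference between two noisy weights should not count as a change: the two weight distributions overlap almost completely.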

This is what you see in the picture above: green edges showed an increase in migration, red edges a decrease, and yellow ones stayed about the same. We can aggregate a municipality’s net migration change, which we use for the node’s color. As a robustness check, we create the same network, but using highway traffic instead of migration:

We can use this estimation of increased/decreased migration/traffic as the variable we want to predict in a big and complicated regression that takes into account many possible alternative explanations – ask Roxana for the painfully precise details, she worked literal years on it.

What Roxana found was that our intuition is accurate: violence indeed is associated with increased emigration. We also checked international emigration to the US (which accounts for 90% of Mexican emigration) and found a similar effect: violence is associated with a 5% rise in emigration to the US and a 3% drop in return migration from the US.

So, for once, we don’t have a puzzling counter-intuitive result: we indeed see violence and criminal activity discouraging people from staying around. It would be interesting to see whether this holds in different contexts and scenarios.

Data Trips Diary: Bogotá

My last post on this blog was about mobility in Colombia. For that study, I had the opportunity to dunk my hands into a bag filled with interesting data. To do so, I traveled to Bogotá. It is a fascinating place and I decided to dedicate this post to it: what the city looks like under the lens of some simple mobility and economic data analysis. If I repeat the experience somewhere else in the future, I will be more than happy to make this a recurring column of this blog.

The cliché would demand from me a celebration of the chaos in Bogotá. After all, we are talking about one of the top five largest capitals in Latin America, the chaos continent par excellence. Yet, your data goggles would tell you a different story. Bogotá is extremely organized. Even to the point of being scary. There is a very strict division of social strata: the city government assigns each block a number from 1 (poorest) to 6 (richest) according to its level of development, and the blocks are very clustered and homogeneous:

[Figure: sisben_strata, map of Bogotá’s blocks colored by social stratum]

In the picture: red=1, blue=2, green=3, purple=4, yellow=5 and orange=6 (grey = not classified). That map doesn’t seem very chaotic to me, rather organized and clustered. One might feel uneasy about it, but that is how things are. The clustering is not only on the social stratum of the block, but also in where people work. If you take a taxi ride, you will find entire blocks filled with the very same economic activities. Not knowing that, during one of my cab rides I thought that everybody in Bogotá was a car mechanic… until we got past that block.

The order also emerges when you look at the way people use the city. My personal experience was of incredulity: I went from the city hall to the house of a co-worker and it felt like moving to a different city. After a left turn, the big crowded highway with improvised selling stands disappeared into a suburban park with no cars and total quiet. In fact, Bogotá looks like four different cities:

[Figure: bogota_mobilityclusters, map of Bogotá’s commuting communities]

Here I represented each city block as a node in a network and I connected two blocks if people commute between them. Then I ran a community discovery algorithm and plotted the result on the map. Each color represents an area that does not see a lot of inter-commutes with the other areas, at least compared with its own intra-commutes.
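For the curious, this is roughly what "many intra-commutes, few inter-commutes" means in code. The sketch below uses a naive greedy modularity merge as a stand-in; the post does not say which community discovery algorithm produced the map, so treat the algorithm choice, the function name `greedy_communities`, and the toy commute counts as assumptions.

```python
from itertools import combinations

def greedy_communities(edges):
    """Group nodes by greedily merging the pair of communities that
    increases modularity the most, until no merge helps.

    edges: dict mapping (u, v) to an undirected commute weight.
    Returns a list of sets of nodes.
    """
    m = sum(edges.values())  # total edge weight
    strength = {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
    communities = {node: {node} for node in strength}  # one node each

    def between(a, b):
        # total commute weight crossing from community a to community b
        return sum(w for (u, v), w in edges.items()
                   if (u in a and v in b) or (u in b and v in a))

    def degree(c):
        return sum(strength[n] for n in c)

    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(sorted(communities), 2):
            ca, cb = communities[a], communities[b]
            # modularity gain obtained by merging ca and cb
            gain = between(ca, cb) / m - degree(ca) * degree(cb) / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break
        a, b = best_pair
        communities[a] |= communities.pop(b)
    return sorted(communities.values(), key=sorted)

# Hypothetical blocks: two tightly knit triangles joined by one weak link.
commutes = {("A", "B"): 10, ("B", "C"): 10, ("A", "C"): 10,
            ("D", "E"): 10, ("E", "F"): 10, ("D", "F"): 10,
            ("C", "D"): 1}
areas = greedy_communities(commutes)
print(areas)
```

This brute-force version recomputes everything at each step, so it is only suitable for toy examples; a city-sized network needs a proper implementation.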

Human mobility is interesting because it gives you an idea of the pulse of a place. Looking at the commute data we discovered that a big city like Bogotá gets even bigger during a working day. Almost half a million people pour into the capital every day to work and use its services, which means that the population of the city increases, in a matter of hours, by more than 5%.

[Figure: bogota1]

It’s unsurprising to see that this does not happen during a typical Sunday. The difference is not only in volume, but also in destination: people go to different places on weekends.

[Figure: cell_avgdaycommuters_weekdaydifference, weekday versus weekend commuters per block]

Here, the red blocks are visited more during weekdays, the white blocks are visited more on weekends. It seems that there is an axis that is more popular during weekdays: that is where the good jobs are. The white area is predominantly residential.
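A map like this boils down to one number per block. Here is a minimal sketch of how it could be computed, assuming one record per block per day with a commuter count; the record layout, the function name `weekday_weekend_difference`, and the numbers are hypothetical.

```python
from collections import defaultdict

def weekday_weekend_difference(observations):
    """Average daily commuters on weekdays minus the weekend average.

    observations: iterable of (cell, day_type, commuters), where
    day_type is either "weekday" or "weekend".
    Positive values mean the cell is busier on weekdays.
    """
    sums = defaultdict(lambda: {"weekday": [0, 0], "weekend": [0, 0]})
    for cell, day_type, commuters in observations:
        bucket = sums[cell][day_type]
        bucket[0] += commuters  # running total of commuters
        bucket[1] += 1          # number of observed days
    difference = {}
    for cell, buckets in sums.items():
        averages = {}
        for day_type, (total, days) in buckets.items():
            averages[day_type] = total / days if days else 0.0
        difference[cell] = averages["weekday"] - averages["weekend"]
    return difference

# Hypothetical daily counts for two blocks.
observations = [
    ("office_block", "weekday", 900), ("office_block", "weekday", 1100),
    ("office_block", "weekend", 200),
    ("park_block", "weekday", 100), ("park_block", "weekend", 400),
]
difference = weekday_weekend_difference(observations)
print(difference)
```

Using daily averages rather than raw totals keeps the comparison fair, since a week has more weekdays than weekend days.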

Crossing this commute information with the data on establishments from the chamber of commerce (camara de comercio), we can also know which business types are more visited during weekends, because many commuters are stopping in areas hosting such businesses. There is a lot of shopping going on (comercio al por menor) and of course visits to pubs (Expendio De Bebidas Alcoholicas Para El Consumo Dentro Del Establecimiento). It matches well with my personal experience as, once my data quests were over, my local guide (Andres Gomez) led me to Andres Carne de Res, a bedlam of music, food and lights, absolutely not to be missed if you find yourself in Bogotá. My personal advice is to be careful about your beverage requests: I discovered too late that a mojito there is served in a soup bowl larger than my skull.

Most of what I wrote here (minus the mojito misadventure) is included in a report I put together with my travel companion (Frank Neffke) and another local (Eduardo Lora). You can find it in the working paper collection of the Center for International Development. I sure hope that my data future will bring me to explore other places as interesting as the capital of Colombia.

Data-Driven Borders

What defines the human division of territory? Think about it: cities are placed in particular areas for a number of good reasons: communication routes, natural resources, migration flows. But once cities are located in a given spot, who decides where one city ends and another begins? Likewise, who decides on the borders of a region or a nation and how? This decision, more often than not, is quite random.

Sometimes administrative borders are defined by natural barriers like mountains and rivers. This makes practical sense, although it is not always clear why the border should be that particular mountain or that particular river. In fact, the main criterion is usually historical: it’s because some dynasty of dudes conquered that area and then got lazy and didn’t go on (this may be the official version: unofficially, maybe, it’s because they found somebody who kicked their asses all day long, just like the complicated relationship of the Romans with the Parthians).

Of course, the borders of states or regions are sometimes re-arranged to better fit practical administrative purposes. In any case, these are nothing more than sub-optimal adjustments of a far-from-optimal process. Network analysis can be useful in this context, because it can provide an objective way to divide the territory according to a particular theory (and it can provide pretty pictures too).

The theory here is very simple: two territories are related if a lot of people travel regularly from one to the other. If people constantly travel back and forth between two territories, then it probably makes sense to combine these territories into one administrative unit. So, how do we determine which territories should be merged, and which shouldn’t be? This problem is easily solvable in network theory, because it contains a network in its very basic definition: two areas are strongly connected if many people travel from one to the other. What we aim for is a grouping of territories. This looks really familiar to the eyes of some readers of this website: grouping nodes in a network. Yes! Community discovery!

I am not claiming to be the first one to see the problem this way. A number of people have already worked on it: the two most important that I can think of are Brockmann et al. and Ratti et al. However, I am reporting this because I also have a paper on the topic. And, of course, I think it’s better than the alternatives, for a number of reasons that I won’t report because they are boring for non-nerds. But then again, I am a narcissist, so I can’t resist giving you the short list:

  • The previous works are based on not so perfect data: Brockmann et al. work with the banknote trajectories recorded by the “Where’s George?” website (an awesome idea, take a look at it), while Ratti et al. use cellphone mobility data. Neither is an exact representation of how people move, and both contain critical error terms. In our work, we use GPS trajectories with very high frequency and precision: we are studying the real thing.
  • The previous works use outdated methods for community discovery which cannot detect small communities: we use a more up-to-date method that is considered the state-of-the-art of community discovery. For example, in Brockmann et al. the entire west part of the United States is apparently one single area, grouping California and Montana and creating a region of 60-something million people.
  • We actually create a framework that establishes the correct methodology to approach the problem in general, instead of just studying one particular case.

But enough blabbering! I promised pretty pictures and I’ll give you pretty pictures. The general shared methodology is the following (in the pictures, the example of mobility in Tuscany, Italy):

1) We divide the territory into cells (either a regular grid or very fine grained census cells);

2) We connect the nodes according to how many cars went from one cell to the other;

3) We forget about geography and we obtain a complex network (here, the node layout has nothing to do with their location on the map);

4) We apply community discovery, grouping sets of nodes (territories) that are visited by the same people;

5) We put the nodes back in their geographical positions, obtaining the borders we were yearning for.
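The first steps of this recipe can be sketched in a few lines. The example below covers steps 1 to 3 only: it snaps trip endpoints to a regular grid and counts trips between cells. The grid origin, the cell size, the function names `cell_of` and `trips_to_edges`, and the coordinates are hypothetical values picked for illustration, not the ones used in the paper.

```python
from collections import Counter

def cell_of(lat, lon, origin=(43.0, 10.0), size=0.1):
    """Step 1: map a coordinate to the (row, col) of a regular grid cell.

    origin and size (in degrees) are illustrative assumptions.
    """
    row = int((lat - origin[0]) // size)
    col = int((lon - origin[1]) // size)
    return row, col

def trips_to_edges(trips):
    """Steps 2 and 3: count the trips going from one cell to another.

    trips: iterable of ((lat, lon), (lat, lon)) start and end points.
    The result only knows cell identifiers and weights, so geography
    is effectively forgotten until the cells are drawn again.
    """
    edges = Counter()
    for start, end in trips:
        a, b = cell_of(*start), cell_of(*end)
        if a != b:  # trips inside a single cell create no link
            edges[(a, b)] += 1
    return edges

# Hypothetical car trips.
trips = [
    ((43.25, 10.35), (43.75, 11.25)),
    ((43.26, 10.36), (43.76, 11.26)),
    ((43.75, 11.25), (43.25, 10.35)),
    ((43.25, 10.35), (43.27, 10.38)),  # starts and ends in the same cell
]
edges = trips_to_edges(trips)
print(edges)
```

Step 4 would then run a community discovery algorithm on `edges`, and step 5 is just a matter of coloring each cell by the community it ends up in, since every node identifier is also a grid position.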

Funnily enough, Italy is undergoing a re-organization process of its regions and provinces. The results in Tuscany are very similar to the insights of our work (not perfectly similar, as the current process is just a merge of the existing provinces and not a real re-design):

On the left the new provinces (colors) on top of the old ones (lines), on the right our clusters.

The match suggests that our data-driven borders follow the general intuition about what the borders should look like. However, unlike the solution adopted by the policy-makers, they are not just a blind merge of the existing provinces, which makes them more connected with reality. Hurrah for network analysis!