16 February 2016

Data Trips Diary: Bogotá

My last post on this blog was about mobility in Colombia. For that study, I had the opportunity to dunk my hands into a bag filled with interesting data. To do so, I traveled to Bogotá. It is a fascinating place and I decided to dedicate this post to it: what the city looks like under the lens of some simple mobility and economic data analysis. If I repeat the experience somewhere else in the future, I will be more than happy to make this a recurring column of this blog.

The cliché would demand from me a celebration of the chaos in Bogotá. After all, we are talking about one of the five largest capitals in Latin America, the chaos continent par excellence. Yet, your data goggles would tell you a different story. Bogotá is extremely organized, even to the point of being scary. There is a very strict division of social strata: the city government assigns each block a number from 1 (poorest) to 6 (richest) according to its level of development, and the blocks are very clustered and homogeneous:

sisben_strata

In the picture: red=1, blue=2, green=3, purple=4, yellow=5 and orange=6 (grey = not classified). That map doesn’t seem very chaotic to me, rather organized and clustered. One might feel uneasy about it, but that is how things are. The clustering is not only in the social stratum of the block, but also in where people work. If you take a taxi ride, you will find entire blocks filled with the very same economic activity. Not knowing that, during one of my cab rides I thought that everybody in Bogotá was a car mechanic… until we got past that block.

The order also emerges when you look at the way people use the city. My personal experience was one of incredulity: I went from the city hall to the house of a co-worker and it felt like moving to a different city. After a left turn, the big crowded highway with improvised selling stands disappeared into a suburban park with no cars and total quiet. In fact, Bogotá looks like four different cities:

bogota_mobilityclusters

Here I represented each city block as a node in a network and I connected two blocks if people commute between them. Then I ran a community discovery algorithm and plotted the result on the map. Each color represents an area that does not see a lot of inter-commutes with the other areas, at least compared with its own intra-commutes.
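The post does not say which community discovery algorithm was used, so what follows is only a minimal sketch of the idea, with weighted label propagation as a stand-in. The block names and commuter counts are toy values, and `build_adjacency`, `label_propagation` and `communities` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical commuter counts between pairs of blocks (toy data, not from the study).
flows = {
    ("A", "B"): 120, ("A", "C"): 90, ("B", "C"): 100,
    ("D", "E"): 110, ("D", "F"): 95, ("E", "F"): 105,
    ("C", "D"): 5,  # weak link between the two groups
}

def build_adjacency(flows):
    """Turn pairwise commuter counts into an undirected weighted adjacency map."""
    adj = defaultdict(dict)
    for (a, b), w in flows.items():
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w
    return adj

def label_propagation(adj, max_iter=100):
    """Stand-in algorithm: each block adopts the label carrying the most commuter weight."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps the sketch deterministic
            score = defaultdict(float)
            for neighbour, w in adj[node].items():
                score[labels[neighbour]] += w
            best = max(score.values())
            candidates = sorted(l for l, s in score.items() if s == best)
            # Keep the current label on ties, otherwise take the smallest best label.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels

def communities(labels):
    """Group blocks by their final label."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(g) for g in groups.values())

print(communities(label_propagation(build_adjacency(flows))))
```

On this toy data the two tightly connected triples end up in separate groups, because the single weak link between them carries far less weight than the links inside each group.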

Human mobility is interesting because it gives you an idea of the pulse of a place. Looking at the commute data we discovered that a big city like Bogotá gets even bigger during a working day. Almost half a million people pour into the capital every day to work and use its services, which means that the population of the city increases, in a matter of hours, by more than 5%.

bogota1

It’s unsurprising to see that this does not happen during a typical Sunday. The difference is not only in volume, but also in destination: people go to different places on weekends.

cell_avgdaycommuters_weekdaydifference

Here, the red blocks are visited more during weekdays, the white blocks are visited more on weekends. It seems that there is an axis that is more popular during weekdays: that is where the good jobs are. The white areas are predominantly residential.
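A map like this can be derived from a simple per-block difference. The sketch below assumes the quantity plotted is the average daily visitors on weekdays minus the average on weekend days. That is a guess based on the figure name, and all numbers and function names are made up for illustration.

```python
# Hypothetical average daily visitors per block (toy numbers).
weekday_avg = {"block_1": 900.0, "block_2": 300.0, "block_3": 450.0}
weekend_avg = {"block_1": 200.0, "block_2": 400.0, "block_3": 150.0}

def weekday_minus_weekend(weekday_avg, weekend_avg):
    """Positive values mean the block is visited more on weekdays."""
    blocks = set(weekday_avg) | set(weekend_avg)
    return {b: weekday_avg.get(b, 0.0) - weekend_avg.get(b, 0.0)
            for b in sorted(blocks)}

def colour(diff):
    """Map the sign of the difference to the two colours used in the picture."""
    return "red" if diff > 0 else "white"

diff = weekday_minus_weekend(weekday_avg, weekend_avg)
for block, value in diff.items():
    print(block, value, colour(value))
```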

Crossing this commute information with the data on establishments from the chamber of commerce (camara de comercio), we can also tell which business types are visited more during weekends, because many commuters stop in areas hosting such businesses. There is a lot of shopping going on (comercio al por menor) and of course visits to pubs (Expendio De Bebidas Alcoholicas Para El Consumo Dentro Del Establecimiento). It matches well with my personal experience as, once my data quests were over, my local guide (Andres Gomez) led me to Andres Carne de Res, a bedlam of music, food and lights, absolutely not to be missed if you find yourself in Bogotá. My personal advice is to be careful about your beverage requests: I discovered too late that a mojito there is served in a soup bowl larger than my skull.

Most of what I wrote here (minus the mojito misadventure) is included in a report I put together with my travel companion (Frank Neffke) and another local (Eduardo Lora). You can find it in the working paper collection of the Center for International Development. I sure hope that my data future will bring me to explore other places as interesting as the capital of Colombia.

15 January 2016

The Limited Power of Telecommunication

As a kid from the 80s*, I remember how revolutionary the cellphone era was. It happened so fast. It seemed that, overnight, you could carry in your pocket a device connecting you to everybody you knew, no matter how far. To me, it changed everything. But did it? Yes, over-apprehensive parents can check on their babies at the swipe of a finger, and whoever does not carry their cellphone with them at all times is labeled a weirdo (I’m guilty of that). But the telecommunication revolution promised something more: the elimination of distance in communication. Did it deliver? This question was the motivation for the paper “Evidence That Calls-Based and Mobility Networks Are Isomorphic” which I wrote with my boss Ricardo Hausmann and which recently appeared in PLoS One.

The question is rather daring, so we decided to take it step by step. The simplest thing we came up with was: let’s draw a map of cellphone calls and see if it looks like a geographical map. If it does, we might be onto something. To do so, we obtained data from telecommunication operators in Colombia. They provided us with call detail records, where identifiers were encrypted to preserve the anonymity of the people making and receiving the calls. We also aggregated the data to make even the slightest re-identification impossible: every ID was associated with the municipality in which it spent most of its time, and so all data was lumped together at the municipality level. At this point, we could draw a map of which municipalities had significant call traffic with one another. This we called the “Call-based” network:
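The aggregation described above can be sketched in a few lines. This is only an illustration: the record layout, the toy IDs and the `min_calls` cut-off are assumptions, and the plain count threshold stands in for whatever significance criterion the paper actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical records: (anonymized ID, municipality where the ID was observed).
sightings = [
    ("u1", "Bogotá"), ("u1", "Bogotá"), ("u1", "Soacha"),
    ("u2", "Soacha"), ("u2", "Soacha"),
    ("u3", "Medellín"), ("u3", "Medellín"), ("u3", "Bogotá"),
]
# Hypothetical call records: (caller ID, callee ID).
calls = [("u1", "u2"), ("u2", "u1"), ("u1", "u2"), ("u1", "u3"), ("u2", "u3")]

def home_municipality(sightings):
    """Assign each ID to the municipality where it was observed most often."""
    seen = defaultdict(Counter)
    for user, municipality in sightings:
        seen[user][municipality] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in seen.items()}

def call_network(calls, home, min_calls=2):
    """Count calls between municipalities and keep pairs with enough traffic.

    min_calls is a placeholder threshold, not the paper's significance test.
    """
    traffic = Counter()
    for caller, callee in calls:
        a, b = home[caller], home[callee]
        if a != b:
            traffic[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in traffic.items() if n >= min_calls}

home = home_municipality(sightings)
print(call_network(calls, home))
```

Once IDs are replaced by their home municipality, no individual record survives in the network: only counts between pairs of municipalities remain.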

colombia_social

Before jumping to conclusions with this picture, we built a sister network. Since we know the location of a phone when it makes a call, we can keep a record of the different municipalities where we spotted it. Again, we joined together all data at the municipality level. This sister network is then a “Mobility” network of Colombia:
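One possible reading of this construction, as a hedged sketch: two municipalities are linked once for every phone that was spotted in both. The linking rule, the toy sightings and the function name `mobility_network` are assumptions, and the paper may define the edges differently.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (anonymized ID, municipality where the phone was spotted).
sightings = [
    ("u1", "Bogotá"), ("u1", "Soacha"), ("u1", "Bogotá"),
    ("u2", "Soacha"), ("u2", "Bogotá"),
    ("u3", "Medellín"), ("u3", "Bogotá"),
    ("u4", "Cali"),
]

def mobility_network(sightings):
    """Weight each municipality pair by the number of phones seen in both places."""
    visited = defaultdict(set)
    for user, municipality in sightings:
        visited[user].add(municipality)
    edges = Counter()
    for places in visited.values():
        for pair in combinations(sorted(places), 2):
            edges[pair] += 1
    return dict(edges)

print(mobility_network(sightings))
```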

colombia_mobility

It seems there’s something here. The two networks appear to be similar: Bogotá seems to be a prominent center and the connections have a geographical component embedded into them. To make this more evident, we drew the networks on a Colombian map. The color of the municipalities is the same as the color of the nodes in the pictures above: nodes with the same color are closely related in the network, that is, they form network clusters.

plos1

The call-based network is on the left, the mobility network is on the right. Blocks of the same color on the left are a clear indication of the call connections being influenced by geography. If there were no relation, the map would look like a Harlequin shirt, with colors scattered evenly across the territory. Mobility clusters are also short-range, although the pattern is harder to see because I had to use many more colors: the clusters are smaller. But the two networks are closely related: in fact, the larger call-based clusters contain the smaller mobility ones, as we show in the paper. We can say that there is a strong relationship between calls and mobility.
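The containment claim can be checked with a simple purity measure: for each mobility cluster, the share of its municipalities that fall inside a single call-based cluster. This is an illustrative sketch with toy labels, not the test used in the paper, and `nesting_purity` is a hypothetical name.

```python
from collections import Counter, defaultdict

# Hypothetical cluster assignments per municipality (toy labels).
call_cluster = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
mobility_cluster = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 3}

def nesting_purity(mobility_cluster, call_cluster):
    """For each mobility cluster, the fraction of members in its dominant call cluster.

    A value of 1.0 means the mobility cluster sits entirely inside one call cluster.
    """
    members = defaultdict(list)
    for municipality, cluster in mobility_cluster.items():
        members[cluster].append(municipality)
    purity = {}
    for cluster, municipalities in members.items():
        counts = Counter(call_cluster[m] for m in municipalities)
        purity[cluster] = counts.most_common(1)[0][1] / len(municipalities)
    return purity

print(nesting_purity(mobility_cluster, call_cluster))
```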

This is nice, because it fits with many works in computer science that actually use social relationships to predict human mobility… and vice versa. On the other hand, it is not nice because the existence of these papers also tells us ours is not a new result. Moreover, my starting point was to hint that the call-based and mobility networks are obeying the same laws, not that they are merely correlated. We need to go a step further.

Our step was to consider the difference that distance makes in the two networks. When looking at mobility, the distance between an origin and a destination is an important cost. In the call-based network, things are a bit trickier. If modern telecommunication really delivered what it promised, distance should be a really low cost, and probably non-linear. You do not need to be in the same place at any given time to start a social relationship, and even if we move to opposite ends of the world, we can still call each other. As a consequence, there shouldn’t be a way to scale the cost of distance in the call-based network to look like the one in the mobility network.

When we attempted to perform such scaling, we discovered it was actually possible. We checked, at any given distance, the ratio between commuters and callers. If two municipalities are 50 km apart, and there are twice as many commuters as callers, we have a dot at coordinates (50, 2). If we take two municipalities 100 km apart, and the commuters are just a third of the number of callers, the data point is at coordinates (100, .33). Once we consider all data points, we can fit our green line, AKA the scaling function from calls to mobility:
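The data points and the fit can be sketched as follows. The post does not state the functional form of the green line, so the power law `ratio = c * distance ** k` fitted here by least squares in log-log space is purely an assumption. The toy pairs include the two examples from the text plus invented ones.

```python
import math

# Hypothetical municipality pairs: (distance in km, commuters, callers).
pairs = [
    (50, 200, 100),   # ratio 2, the (50, 2) example from the text
    (100, 100, 300),  # ratio about 0.33, the (100, .33) example
    (25, 900, 150),   # invented
    (200, 20, 400),   # invented
]

def ratio_points(pairs):
    """One (distance, commuters / callers) point per municipality pair."""
    return [(d, commuters / callers)
            for d, commuters, callers in pairs
            if commuters > 0 and callers > 0]

def fit_power_law(points):
    """Least-squares fit of log(ratio) = log(c) + k * log(distance).

    The power-law form is an assumption, not the form used in the paper.
    """
    xs = [math.log(d) for d, _ in points]
    ys = [math.log(r) for _, r in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return c, k

c, k = fit_power_law(ratio_points(pairs))
print(f"ratio(distance) is approximately {c:.1f} * distance ** {k:.2f}")
```

A negative `k` means that commuters thin out faster than callers as distance grows.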

plos3

When we used this adjustment to calculate new call-based clusters using the distance cost “as if” it were the mobility network, we obtained the mobility clusters. We detail in the paper the reasons why this is not as circular as it seems. In practice, our green line is a transformation function that morphs the call-based network into the mobility network. If modern telecommunication really killed distance, that green line shouldn’t exist, or at least it should be so wobbly as to be practically useless.
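One way to picture the transformation, as a rough sketch: since the line gives the commuters-to-callers ratio at each distance, multiplying the call volume of an edge by that ratio yields the commuter volume the line would predict. The power-law form, the parameter values `c` and `k`, and the edge data are all hypothetical, and the paper's actual procedure may differ.

```python
def predicted_commuters(calls, distance_km, c, k):
    """Rescale a call count by an assumed scaling function ratio(d) = c * d ** k."""
    return calls * c * distance_km ** k

# Hypothetical parameters and call edges: pair -> (number of calls, distance in km).
c, k = 100.0, -1.0
call_edges = {("m1", "m2"): (400, 50.0), ("m1", "m3"): (400, 200.0)}

rescaled = {pair: predicted_commuters(n, d, c, k)
            for pair, (n, d) in call_edges.items()}
print(rescaled)
```

With equal call volumes, the closer pair receives the larger rescaled weight, which is how distance re-enters the call-based network before clusters are recomputed.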

There are many ways in which you could interpret this result. One that Ricardo and I like focuses on the relationship between face-to-face and electronically mediated meetings. It’s not like the people you call are the ones you really would rather meet but cannot. It’s more like you call AND you meet, whenever it is possible. Face-to-face and electronically mediated meetings are not really substitutes in this world, they are more like complements. To come back to my opening, I’d say new technologies didn’t eliminate distance from the communication equation. Alleviated it, yes. But ultimately, it’s more like an increased bandwidth than a revolution. At least so far.


* Shut up, I’m still in my twenties. Everybody knows 1996 was only 10 years ago.
