Archive | Memetics

22 January 2015 ~ 0 Comments

Surprising Facts About Shortest Paths

Maybe it’s the new year, maybe it’s the fact that I haven’t published anything new recently, but today I wanted to take a look at my publication history. This, for a scientist, is something not unlike a time machine, bringing her back to an earlier age. What was I thinking in 2009? What sparked my interest, and which tools did I later refine to get to where I am now? It’s usually a humbling (not to say embarrassing) operation, as past papers always look so awful, mine at least. But small interesting bits can be found, like the one I retrieved today, about shortest paths in communication networks.

A shortest path in a network is the most efficient way to go from one node to another. You start from your origin and you choose to follow an edge to another node. Then you choose an edge again, and so on until you get to your destination. When your choices are perfect and you followed the minimum possible number of edges, that’s a shortest path (it’s A shortest path and not THE shortest path because there might be alternative paths of the same length). Now, in creating this path, you obviously visited some nodes in between, unless your origin and destination are directly connected. It turns out that some nodes are crossed by a lot of shortest paths: it’s a characteristic of real world networks. This is interesting, so scientists decided to create a measure called betweenness centrality. For each node, betweenness centrality is the share of all possible shortest paths in the network that pass through it.
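To make the definition concrete, here is a minimal sketch of betweenness centrality for a small undirected, unweighted network. It uses Brandes' algorithm, which is a standard way to compute the measure and not necessarily what the paper used. The graph representation (a dictionary mapping each node to the set of its neighbours) and the function name are assumptions made for illustration. When several shortest paths connect the same pair, each one counts for an equal fraction.

```python
from collections import deque


def betweenness_centrality(graph):
    """Share of shortest paths that pass through each node.

    `graph` maps each node to the set of its neighbours
    (undirected, unweighted). This is an illustrative sketch.
    """
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        predecessors = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    n_paths[neighbour] += n_paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating path shares.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Every pair was seen from both endpoints, so divide by the number
    # of ordered pairs of other nodes to turn the count into a share.
    n = len(graph)
    pairs = (n - 1) * (n - 2)
    return {node: (value / pairs if pairs else 0.0)
            for node, value in score.items()}
```

On a three-station line a-b-c, every trip between the two ends has to cross b, so b gets the maximum score and the two ends get zero.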

Intuitively, these nodes are important. Think about a rail network, where the nodes are the train stations. High betweenness stations see a lot of trains passing through them. They are big and important to make connections run faster: if they didn’t exist every train would have to make detours and would take longer to bring you home. A good engineer would then craft rail networks in such a way as to have these hubs and make her passengers happy. However, it turns out that this intuitive rule is not universally applicable. For example, some communication networks aren’t willing to let this happen. Michele Berlingerio, Fosca Giannotti and I stumbled upon this interesting result while working on a paper titled Mining the Temporal Dimension of the Information Propagation.

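[Figure: extract of the communication network for one subject, with the email timestamp next to each edge]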

We built two communication networks. One is corporate-based: it’s the web of emails exchanged across the Enron employee ecosystem. The email record was publicly released during the investigation into the company’s financial meltdown. An employee is connected to all the employees she emailed. The second is more horizontal in nature, with no work hierarchies. We took users from different email newsgroups and connected them if they sent a message to the same thread. It’s the nerdy version of commenting on the same status update on Facebook. Unlike most communication network papers, we didn’t stop there. Every edge still carries some temporal information, namely the moment in which the email was sent. Above you have an extract of the network for a particular subject, where we have the email timestamp next to each edge.

Here’s where the magic happens. With some data mining wizardry, we are able to tell the characteristic reaction times of different nodes in the network. We can divide these nodes into classes: high degree nodes, nodes inside a smaller community where everybody replies to everybody else and, yes, nodes with high betweenness centrality, our train station hubs. For every measure (characteristic), nodes are divided into five classes. Let’s consider betweenness. Class 1 contains all nodes which have betweenness 0, i.e. those through which no shortest path passes. From class 2 to 5 we have nodes of increasing betweenness. So, nodes in class 3 have a medium-low betweenness centrality and nodes in class 5 are the most central nodes in the network. At this point, we can plot the average reaction times for nodes belonging to different classes in the two networks.
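The class construction can be sketched in a few lines. This is only an illustration under stated assumptions: class 1 holds the zero-betweenness nodes, and the remaining nodes are ranked and cut into four bins of (nearly) equal size, which matches the remark that the classes contain the same number of nodes. The function names and the input dictionaries are hypothetical, and the paper's exact binning may differ.

```python
def assign_classes(betweenness):
    """Map each node to a class from 1 to 5, given node -> betweenness."""
    # Class 1: nodes crossed by no shortest path at all.
    classes = {node: 1 for node, value in betweenness.items() if value == 0}
    # Rank the remaining nodes by increasing betweenness.
    ranked = sorted((node for node in betweenness if betweenness[node] > 0),
                    key=lambda node: betweenness[node])
    for position, node in enumerate(ranked):
        # Cut the ranking into four bins of (nearly) equal size: classes 2..5.
        classes[node] = 2 + position * 4 // len(ranked)
    return classes


def mean_reaction_time(classes, reaction_time):
    """Average reaction time of the nodes in each class."""
    totals, counts = {}, {}
    for node, cls in classes.items():
        totals[cls] = totals.get(cls, 0.0) + reaction_time[node]
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: totals[cls] / counts[cls] for cls in sorted(totals)}
```

Feeding `mean_reaction_time` the output of `assign_classes` and a dictionary of per-node reaction times gives one average per class, which is the kind of quantity the plots compare.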

[Figure: average reaction times by node class in the two networks; Enron is on the left]

The first thing that jumps out is that Enron’s communications (on the left) are much more dependent on the node’s characteristics (whether the characteristic is degree or betweenness doesn’t seem to matter) than those of Newsgroups, given the higher spread. But the interesting bit, at least for me, comes when you only look at betweenness centrality (the dashed line with crosses). Nodes with low (class 2) and medium-low (class 3) betweenness centrality have low reaction times, while more central nodes have significantly higher reaction times. Note that the classes have the same number of nodes in them, so we are not looking at statistical aberrations*. This does not happen in Newsgroups, due to the different nature of the communication in there: corporate in Enron versus topic-driven in Newsgroups.

The result carries some counterintuitive implications. In a corporate communication network the shortest path is not the fastest. In other words, don’t let your train pass through the central hub for a shortcut, ’cause it’s going to stay there for a long long time. It looks like people’s brains are less elastic than our train stations. You can’t add more platforms and personnel to let more things pass through them: if your communication network has large hubs, they are going to work slower. Surprisingly, this does not hold for the degree (solid line): it doesn’t seem to matter how many people you interact with, only that you are the person through which many shortest paths pass.

I can see myself trying to bring this line of research back from the dead. This premature paper needs quite some sanity checks (understatement alert), but it can go a long way. It can be part of the manual on how to build an efficient communication culture in your organization. Don’t overload people. Don’t create over-important nodes in the network, because you can’t allow all your communications to pass through them. Do keep in mind that your team is not a clockwork, it’s a brain-work. And brains don’t work like clocks.


* That’s also the reason to ditch class 1: it contains outliers and it is not comparable in size to the other classes.

 


13 November 2014 ~ 3 Comments

Average is Boring

You fire up a thesaurus online and you look for synonyms of the word “interesting”. You can find words like “unusual”, “exotic”, “striking”. These are all antonyms of “average”. Average is the grey uniform shirt of the post office employee calling out the number of the next person in the queue, or the government-approved video that teaches you how to properly wash your hands. Of course “average is boring”. Why should we be interested in the average? I am. Because if we understand the average we understand how to avoid it. We can rekindle our interest for lost subjects, each in its own unique way. Even washing your hands. We can live in the tail of the distribution, instead of on top of the bell.


My quest for destroying the Average is a follow-up of my earlier paper on memes. Its subtitle is “How similarity kills a meme’s success” and it was published in Scientific Reports. We are after the confirmation that the successful memes are unique, weird, unexpected. They escape from the blob of your average meme like a spring snake in a can. The starting point of every mission is to know your enemy. It hides itself in internet image memes, those images you can find everywhere on the Web with a usually funny text on top of them, just like this one.

I lined up a collection of these memes, downloaded from Memegenerator.net, and I started examining them, like a full-metal-jacket drill instructor. I demanded that they reveal everything about each other to me. I started with their name, the string of text associated with them, like “Socially Awkward Penguin” or “Bad Luck Brian”. I noted these strings down and compared their similarity, just like Google does when it suggests “Did you mean…?”. This was already enough to know who is related to whom (I’m looking at you, band of penguins).
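The post does not say which string similarity was used, so the following is only a sketch of one common choice: the edit (Levenshtein) distance, rescaled to a similarity between 0 and 1. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of single-character edits turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """1.0 for identical names (ignoring case), lower for distant ones."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest
```

With this measure, "Socially Awkward Penguin" comes out much closer to "Socially Awesome Penguin" than to "Bad Luck Brian", which is the kind of family resemblance the name comparison is meant to catch.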

Then it was time to examine what they look like. All of them gave me their best template picture and I ran it through the electronic eye of SURF, an amazing computer vision software able to detect image features. Again, I patiently noted down who looked like whom. Finally, I asked them to tell me everything about their history. I collected anything that was ever said on Memegenerator.net, meaning all the texts that the users wrote when creating an instance belonging to each meme. For example, the creation of this picture:

[Image: an instance of the Philosoraptor meme]

results in associating “If guns don’t … toast toast toast?” with the Philosoraptor meme. I condensed all this text into a given number of topics and exposed which of the memes are talking about the same things. At this point, I had all I needed to know about who is average and who could spark our interest. It’s an even more nerdy version of Hot or Not. So I created a network of memes, connecting two memes if they are similar to each other. I enlarged and highlighted in orange the memes that are widely used and popular. I won’t keep you on your toes any longer: here is the result.

[Figure: the network of memes, with widely used and popular memes enlarged and highlighted in orange]

I knew it! The big, orange nodes are the cool guys. And they avoid mingling in the center of the neighborhood. They stay on the periphery, they want to be special, and they are. This conclusion is supported by all kinds of robustness checks, but I’m not going to report them because it’s hard enough for me to keep you awake while you have to read through all this boring stuff. “Ok”, you now think, “You proved what we already knew. Good job. What was this for?”.

This result is not as expected as you might think. Let it settle down in your brain for a second: I am saying that given your name, your image template and your topic I can tell you if you are likely to be successful or not. Plenty of smart people have proof in hand that a meme’s content isn’t necessary to explain why some memes are successful and some are less memorable than your average Congress hearing. They have plenty of good reasons to say that. In fact, you will never hear me reciting guru-like advice to reach success like “be different”. That’s just bollocks.

Instead of selling the popularity snake oil, I am describing what the path to success looks like. The works I cited do not do that. Some describe how the system works. It’s a bit like telling you that, given how the feudal system worked in the Middle Ages, some people had to be emperors. It doesn’t say so much about what characteristics the emperors had. Otherwise they tell you how good an emperor already on the throne could be. But not so much about how he got to sit on that fancy chair wearing that silly hat. By looking at the content in a different way, and by posing different questions, I started writing emperors’ biographies and I noticed that they all have something in common. At the very least, I am the court jester.

We are not enemies and we are not contradicting each other. We are examining the same, big and complex ecosystem of silly-pictures-on-the-internet with different spectacles. We all want to see if we can describe human cultural production as a concrete thing following understandable laws. If you want to send a rocket to the moon, you need to know how and why a ball falls back to the ground when you throw it up. Tedious, yes, but fundamental. Now, if you’ll excuse me, I have a lot of balls to throw.


10 October 2013 ~ 0 Comments

The Paradox of Social Controllability

“It’s a bit sad that some among the most brilliant minds of our generation are working tirelessly on strategies to increase clicks on online ads” popped up on my Facebook stream some days ago (I don’t remember who wrote it, so you are welcome to contact me to restore credit where credit is due :) ). But the point still remains. I actually don’t find it that bad. Yes, it’s bad, but it could be worse. It reminds me of other “wrong” reasons to make incredible advances in science and stuff. For example, war is responsible for many technological advancements. Even if the aim of online marketing is just to increase revenues, what it actually requires is to understand human psychology, behavior and social interactions. In practice, that’s philosophy of the human mind at its best: how does the brain work? How does a collection of brains work? What drives our behavior and needs?

When you put together many minds in the real world, you have to deal with complex networks. We are not connected with one another at random, and the eyes of our friends are the channel through which we observe the world. This fact is studied in complex network analysis, in the sub-branch of cascade behaviors. Cascade behaviors happen when a person in a social network decides to modify her behavior according to the behavior of the people she is connected to. As a consequence, there are some people in the network who are in a very particular position: given the people they know and their prominence among them, they can modify their behavior and they will modify their friends’ behavior and so on and so forth, changing forever how every node in the network behaves. And that’s the cascade. If you find a way to identify these prominent actors in the network, you can control the behavior of the entire system. Now you can see why there is a mountain of work about it. In the computer science approach, we have threshold models that simulate the cascade for many starting nodes and thus identify the practical leaders (for example Jon Kleinberg’s work); in physics we have models aiming at understanding the degree of controllability of complex systems (I’ll go with Laszlo Barabasi in this).
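A threshold cascade can be sketched very compactly. In this simplified version, which assumes a single global threshold instead of one per person, a node adopts the new behavior as soon as the share of its neighbours that already adopted reaches the threshold. The graph format (node to set of neighbours) and the function names are hypothetical.

```python
def threshold_cascade(graph, seeds, threshold):
    """Return the set of nodes that end up adopting the behaviour.

    graph: dict mapping each node to the set of its neighbours.
    seeds: the nodes that adopt first (the candidate leaders).
    threshold: share of adopting neighbours needed to follow them.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            if node in active or not neighbours:
                continue
            share = len(neighbours & active) / len(neighbours)
            if share >= threshold:
                active.add(node)
                changed = True
    return active


def cascade_sizes(graph, threshold):
    """Cascade size triggered by each node when it is the only seed."""
    return {node: len(threshold_cascade(graph, {node}, threshold))
            for node in graph}
```

Running `cascade_sizes` over every node and picking the largest values is the brute-force way to look for "practical leaders": the nodes whose change of behaviour reaches the most people.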


Visualization of network cascade, from my good friend Mauro Martino. The red dots at the bottom are the “drivers”, who influence the collection of green nodes they are attached to.

Genuinely curious about the topic, I started my own track of research on it. One thing that Diego Pennacchioli, Giulio Rossetti, Luca Pappalardo, Dino Pedreschi, Fosca Giannotti and I found curious is that everybody working on social prominence was looking at it from a monodimensional perspective. That means: the only thing they are interested in is how to maximize the number of nodes influenced by the leaders. The bigger this number, the better. All fun and games, but why? I can think about several scenarios where the final total number is not the most important thing. For example:

  • What if I want people to buy a product? The total number of people knowing about the product is nice, but I want them to be strongly committed, strongly enough to buy it.
  • What if I am actually looking to reach a particular person? Then I care how deeply my message can go through the network.
  • What if I just care about my friends? Then screw their friends (and everybody else), as long as I can influence a wide range of my direct connections!

To calculate our measure we need to infer the diffusion trees. So from the left, where the number on each arrow gives you the action time of the node at the base of the arrow, we go to the right by selecting the lowest possible combination of arrows.

Strength, depth and width of social prominence. That’s why our paper is called “The Three Dimensions of Social Prominence” (check it out). Strength is how committed the people you influenced are to keep doing what you influenced them to do. Depth is how many degrees of separation (or, how far) the cascade of influence that you triggered can go. Width is simply the ratio of your friends that you are able to influence. By analyzing how much a user in Last.fm (a social website based on music) is able to influence her friends in listening to new artists, we found a collection of very interesting facts.
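Two of the three dimensions are easy to illustrate once a diffusion tree is available. The sketch below assumes the tree is given as a dictionary from each node to the list of nodes it directly influenced, and takes depth as the longest chain of influence, which is one possible reading of "how far the cascade can go". Strength is left out because it needs data on how committed each influenced person is. All names here are hypothetical and the paper's exact formulas may differ.

```python
def depth(tree, leader):
    """Longest chain of influence starting from `leader`, in hops."""
    children = tree.get(leader, [])
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)


def width(tree, friends, leader):
    """Share of the leader's friends that the leader directly influenced."""
    if not friends[leader]:
        return 0.0
    influenced = set(tree.get(leader, [])) & set(friends[leader])
    return len(influenced) / len(friends[leader])
```

For example, if Ann has four friends, directly influences two of them, and one of those two goes on to influence somebody else, her width is 0.5 and her depth is 2.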

For example, it is well known that in social networks there are some nodes that are structurally very important. They are the central users, the ones that keep the network connected. Intuitively, they are the only way, or the easiest way, through which a signal (in our case social influence) can go from one part of the network to the other. Guess what: they can’t do it. We found a significant anti-correlation between centrality and width and depth. That is bad news, because those nodes are the ones in the only position with a theoretical ability of controlling the network and a practical inability in doing so. I like to call it “The Paradox of Social Controllability” (hence, the post title).

The anti-correlation between depth and strength.

Another piece of food for thought is the trade-off between strength and depth. While width is unrelated to both, we found that if you want to go deeply into the network, then you can’t expect that the people you touch will be extremely committed to your message.

The third big thing is the distribution of connections per leader. We found that the leaders showing highest values of strength, depth and width were those who used Last.fm with average frequency. The highly connected and very active users (hubs, in network lingo) scored poorly, as we saw. So did the occasional users, the ones with just two or three connections (that is the majority of the system). The people who have control over the network are the mildly engaged. They are you, in practice: chances are that you are not a record label, nor a music fanatic, but just a person with her tastes and preferences. So you have control. Problem is, the control is scattered equally on the vast set of people like you.

To conclude, we saw what wonderful things network cascades are: they could empower us to do a lot of good. We also saw how there are theoretical results about the possibility of identifying people who can trigger them. But my unfortunate conclusion is about the paradox between theory and practice. Those who theoretically should, apparently can’t.


20 May 2013 ~ 3 Comments

Memetics, or: How I can spend my entire day on Reddit claiming that I’m working

In his 1976 book “The Selfish Gene”, Richard Dawkins proposed a shift in the way we look at evolution: instead of considering the organisms as the center of evolution, Dawkins proposed (providing tons of evidence) to consider single genes as the fundamental unit of evolution. I am not a biologist nor interested in genetics, so this idea should not concern me. However, Dawkins added one chapter to his book. He felt that it could be possible that culture, too, is made out of self-replicating units, just like genes, that can compete and/or collaborate with each other in forming “cultural organisms”. He decided to call these units “memes”.

The idea of memes was mostly in the realm of intellectual and serious researchers (not like me); you can check out some pretty serious books like “Metamagical Themas” by Hofstadter or “Thought Contagion: How Belief Spreads Through Society” by Lynch. But then something terrible was brought to the world. Then, the World Wide Web happened, bringing with itself a nexus of inside jokes, large communities, mind hives, social media, 2.0s, God knows what. Oh and cats. Have one, please:

With the WWW, studying memes became easier, because on the Internet every piece of information has to be stored somehow somewhere. This is not something I discovered by myself, there are plenty of smart guys out there doing marvelous research. I’ll give just three examples out of possibly tens or hundreds:

  • Studies about memes competing for the attention of people in a social network like “Clash of the contagions: Cooperation and competition in information diffusion” or “Competition among memes in a world with limited attention”;
  • Studies about the adoption of conventions and behaviors by people, like “The emergence of conventions in online social networks” or “Cooperative behavior cascades in human social networks”;
  • Studies about how information diffuses in networks, like “Virality and susceptibility in information diffusions” or “Mining the temporal dimension of the information propagation” which, absolutely incidentally, is a paper of mine.

There is one thing that I find to be mostly missing in the current state of the research on memes. Many, if not all, of the above mentioned works are focused on understanding how memes spread from one person to another and they ask what the dynamics are, given that human minds are connected through a social network. In other words, what we have been studying is mostly the network of connections, regardless of what kinds of messages are passing through it. Now, most of the time these “messages” are about penguins that don’t know how to talk to girls:

and in that case I grant you that you can fairly ignore it. But my reasoning is that if we want to really understand memes and memetics, we can’t put all of our effort into just analyzing the networks they live in. It is like trying to understand genes and animals by analyzing only the environment they inhabit. If you want to know how to behave in front of a “tiger” without ever having met one, it is possibly useful to understand something about the forest it is dwelling in, but I strongly advise you to also take a look at its claws, teeth and how fast it can run or climb.

That is exactly what I study in a paper that I got accepted at the ICWSM conference, titled “Competition and Success in the Meme Pool: a Case Study on Quickmeme.com”. What I did was fairly simple: I downloaded a bunch of memes from Quickmeme.com and I studied the patterns of their appearances and upvotes across a year’s worth of data. Using some boring data analysis techniques borrowed from ecology, I was able to understand which memes compete (or collaborate) with which other ones, which characteristics make memes more likely to survive and whether there are hints as to the existence of “meme organisms” (there are. One of my favorites is the small nerd-humor cluster:

).

One of the nicest products of my paper was a simple visualization to help us understand the effect of some of the characteristics that are associated with successful memes. As characteristics I took the number of memes in competition and in collaboration with the meme, whether the meme is part of a coherent group of memes (an “organism”) and if the meme had a very large popularity peak or not. The result, in the picture below, tells us an interesting story. In the picture, the odds of success are connected by arrows that represent the filters I used to group the memes, based on their characteristics.

This picture is saying: in general, memes have a 35.47% probability of being successful (given the definition of “successful” I gave in the paper). If a meme has a popularity peak that is larger than the average, then its probability of success decreases. This means that, my dear meme*, if you want to survive you have to keep a low profile. And, if you really can’t keep a low profile, then don’t make too many enemies (or your odds will go down to 6.25%). On the other hand, if you kept a low profile, then make as many enemies as you can, but only if you can count on many friends too, especially if you can be in a tightly connected meme organism (80.3%!). This is an exciting result that seems to suggest that memes are indeed collaborating together in complex cultural organisms because that’s how they can survive.
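Each number in that picture is a conditional success rate: the share of successful memes among those that pass a given chain of filters. A minimal sketch of that computation follows. The field names and the four toy records are hypothetical and only show the mechanics; they are not the Quickmeme data and do not reproduce the percentages above.

```python
def success_rate(memes, **filters):
    """Share of successful memes among those matching every filter.

    Returns None when no meme matches, since the rate is undefined.
    """
    selected = [meme for meme in memes
                if all(meme[key] == value for key, value in filters.items())]
    if not selected:
        return None
    return sum(meme["successful"] for meme in selected) / len(selected)


# Hypothetical toy records, for illustration only.
toy_memes = [
    {"high_peak": True, "many_competitors": True, "successful": False},
    {"high_peak": True, "many_competitors": False, "successful": False},
    {"high_peak": False, "many_competitors": True, "successful": True},
    {"high_peak": False, "many_competitors": False, "successful": True},
]

overall = success_rate(toy_memes)
low_profile = success_rate(toy_memes, high_peak=False)
```

Adding one more keyword argument to the call corresponds to following one more arrow in the picture.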

What I did was just scratching the surface of meme-centered studies, as opposed to the network-centered meme studies. I am planning to study more deeply the causal effect between a meme and its fitness to survive in the World Wild Web and to understand the mechanics of how memes evolve and mutate. Oh, and if you feel like it, I am also releasing the data that I collected for my study. It is in the “Quickmeme” entry under the Datasets tab.


* I deeply apologize to Dawkins, any readers (luckily they are few) and to the scientific community as a whole, for my personification of memes. I know that memes do not have a mind, therefore they can’t “decide” to do anything, but it really makes it so much easier to write!
