23 January 2018

Hitting the Front Page Triggers an Evolutionary Arms Race

I’m a conformist. Just like everyone else in computer science working on memes, I am lured by the Holy Grail of questions. Can I figure out what makes content go viral? I really want to answer “yes”, but my absence from Reddit’s front page speaks louder than any advice I could give you. That didn’t dissuade me from trying to look for a question fitting my preferred answer. After building Earth and waiting a few billion years for it to process the question, this is what I got: “can I figure out what makes content not go viral?” I guess that’s half of the job, right?

In 2014 I proposed my explanation, a rather trivial one: the content that does not go viral is the content that is unoriginal, too close a copy of something that is already out there. My explanation sounds uncontroversial: to be successful you have to be original, yeah, thank you very much. But there was a little problem with it: karma trains. Very often, topics stay popular for multiple days: Reddit and social media are flooded with posts about the same thing, seemingly tapping into a never-ending pit of attention and upvotes. Unoriginal content actually makes it to the front page. Was it time to throw my theory in the dustbin?* I didn’t think so. So, in this month’s issue of Communications of the ACM, I published a sequel: “Popularity Spikes Hurt Future Chances for Viral Propagation of Protomemes.”

I need to defuse the danger that karma trains represent for my theory, and I do so in two parts, asking two questions. First: is it really true that, in karma trains, content that stays closer to the original post gets more attention and success? Second: is it really true that karma trains contain exclusively unoriginal content? For these questions, I specifically define a karma train as the collection of posts using a meme after it hit the front page. To answer them I focus mainly on Reddit. I use data kindly donated to me by Tim Weninger from the University of Notre Dame (in particular from this paper of his). I look at all frequently used catchphrases (hence the word “protomeme” in the title: my definition is a bit wider than just memes) and I track how successful they are on each day.

For the first question I have to debunk the notion that unoriginal content is successful in karma trains. First, I check if a meme hit the front page on a particular day. Then I look at all the Reddit posts containing that meme the day after. A karma train implies that more people will use the meme — so, more posts using the catchphrase — and that posts including the meme will be on average more successful. The first part is true: karma trains do exist and, after a meme hits the front page, more people will use it. But the second part is crucially false: on average these posts score poorly. This is not just regression towards the mean: obviously if the meme just hit the front page, its average popularity the day after can only go down. But I control for that. I control for the entire history of the meme. Its average popularity the day after hitting the front page is significantly lower than its regular average popularity, its recent popularity trends, and the average popularity of everything else that day.

So what gives? If the meme is doing poorly, why are karma trains a thing? Because, over those many attempts, a few are going to hit the front page again. Since the front page is very noticeable, we’re tricked into thinking that all posts using the meme are doing well. This transitions nicely into the second part of the paper. Why are those few posts successful? Bell-shaped random distributions teach us that, if you try enough times, one of the attempts is going to be really good. Is that the case? Are we looking at statistical aberrations or is there something interesting? There is: these posts do not end up at the top by chance. There’s something special about them.
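The “try enough times” argument is the null hypothesis the successful posts have to beat. A toy sketch, assuming post quality is an independent draw from a standard normal distribution (an illustration of the argument, not the paper's model; `best_of` is a hypothetical name): the best of n attempts keeps creeping up as n grows, even though no single attempt is special.

```python
import random


def best_of(n_attempts: int, seed: int = 0) -> float:
    """Best 'quality' among n attempts, each drawn independently from a
    standard normal (bell-shaped) distribution. Toy model, not the paper's."""
    rng = random.Random(seed)
    return max(rng.gauss(0.0, 1.0) for _ in range(n_attempts))


# With a fixed seed, every larger n sees the same first draws plus new ones,
# so the best attempt can only stay equal or improve as n grows.
for n in (10, 100, 1_000, 10_000):
    print(n, round(best_of(n), 2))
```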

I made up a measure of post originality. Given that a post contains a meme, I want to know if it is repeating it in a boring way, or if it is adding something to the mix. It answers the question: how canonical is the usage of the meme in this post? That is why I called this measure “canonicity”. In practice, for every word that ever appeared in a Reddit title, I calculate the probability that the word is going to be used in a post including that meme. So, for every new post I can calculate the average probability of the words in its title, ending up with an estimate of how surprising this particular title is: the higher the average, the more canonical (and the less surprising) the title.
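The canonicity computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: titles are tokenized by lowercasing and splitting on whitespace, P(word | meme) is taken as the share of earlier titles containing the meme that also contain the word, and the function names are hypothetical.

```python
from collections import Counter


def word_probabilities(titles_with_meme):
    """Hypothetical helper: estimate P(word | meme) as the share of past
    titles containing the meme that also contain each word (once per title)."""
    n_titles = len(titles_with_meme)
    counts = Counter()
    for title in titles_with_meme:
        counts.update(set(title.lower().split()))
    return {word: c / n_titles for word, c in counts.items()}


def canonicity(title, probs):
    """Hypothetical helper: average P(word | meme) over the distinct words of
    a new title. Words never seen with the meme contribute 0, lowering the score."""
    words = set(title.lower().split())
    if not words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in words) / len(words)


# Toy titles for a catchphrase; its own words get probability 1 here
# (whether the paper excludes them is not stated in the post).
past = [
    "one does not simply walk into mordor",
    "one does not simply eat one chip",
    "one does not simply walk away",
]
probs = word_probabilities(past)
print(round(canonicity("one does not simply walk into mordor", probs), 3))      # → 0.762
print(round(canonicity("one does not simply refactor legacy code", probs), 3))  # → 0.571
```

The copy of an earlier title scores as more canonical than the title that brings new words to the catchphrase.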

You know what I’m going to say next. The more unsurprising a post is, the less likely it is to be successful. A high-canonicity post has roughly half the odds of being widely successful (e.g. hitting the front page) as a low-canonicity one. And the fact that there are low-canonicity posts in karma trains is interesting in itself. It confirms my hunch that, among the posts that jump on the bandwagon of popular memes, there are also original thoughts. This is the evolutionary arms race I mention in the title: when a meme hits the front page, subsequent implementations of it have to constantly innovate, or else the meme will be infested by high-canonicity copies and fade into oblivion. This is generally true for all memes, but the effect is on steroids for recently successful memes, because they are the ones being copied the most in a given time period.
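“Half the odds” is a statement about odds, not probabilities. Assuming it refers to an odds ratio (the post does not say how it was estimated), and writing p_high and p_low for the probabilities that a high- and a low-canonicity post becomes widely successful, the claim reads:

```latex
\text{odds}(p) = \frac{p}{1 - p}, \qquad
\frac{\text{odds}(p_{\text{high}})}{\text{odds}(p_{\text{low}})} \approx \frac{1}{2}
```

Since front-page success is rare, p is small and odds(p) is close to p, so in practice this is also roughly half the probability.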

The story has another interesting turn. Low canonicity is a double-edged sword. It gives you better odds of hitting the front page, but if you fail at that, your performance is going to be atrocious. In that case, high-canonicity posts actually do better than low-canonicity ones. In other words, after hitting the front page a meme makes a sort of “canonicity sandwich”: the very best AND very worst posts are low-canonicity ones, and in the middle you have high-canonicity posts. Why does this happen? Maybe it’s because of familiarity. Familiar content is reassuring, so people “get it” and upvote. It just does not soar high enough. Or it could be a million other reasons that I haven’t tested yet, so I can only speculate.

What the canonicity sandwich means is that content originality has a varying effect: high canonicity harms you in some cases, but it’s good for you in others. This discovery is important, because other researchers have found that a post’s content doesn’t seem to explain its success very well. The sandwich effect might very well be the cause of our disagreement.

To wrap up, I hope I put up a credible defense of my theory in the face of karma trains. These annoying meme critters are dangerous to my explanation of popularity, because they directly contradict it. Karma trains seem to be a collection of popular unoriginal content: the exact thing my theory says shouldn’t exist. Upon closer inspection, I noticed that (a) it isn’t really true that karma train posts are particularly successful and (b) it isn’t really true that they only contain unoriginal content. So, my theory is going to die another day, like in all good James Bond flicks**.


* Yes, but I need tenure, so I’ll have to put up a fight.

** Which Die Another Day wasn’t.


13 November 2014

Average is Boring

You fire up a thesaurus online and you look for synonyms of the word “interesting”. You can find words like “unusual”, “exotic”, “striking”. These are all antonyms of “average”. Average is the grey uniform shirt of the post office employee calling out the number of the next person in the queue, or the government-approved video that teaches you how to properly wash your hands. Of course “average is boring”. Why should we be interested in the average? I am. Because if we understand the average, we understand how to avoid it. We can rekindle our interest in lost subjects, each in its own unique way. Even washing your hands. We can live in the tail of the distribution, instead of on top of the bell.


My quest to destroy the Average is a follow-up to my earlier paper on memes. Its subtitle is “How similarity kills a meme’s success” and it has been published in Scientific Reports. We are after confirmation that successful memes are unique, weird, unexpected. They escape from the blob of your average meme like a spring snake in a can. The starting point of every mission is to know your enemy. It hides itself in internet image memes, those images you can find everywhere on the Web with a usually funny text on top of them, just like this one.

I lined up a collection of these memes, downloaded from Memegenerator.net, and I started examining them like a full-metal-jacket drill instructor. I demanded that they reveal everything about each other. I started with their name, the string of text associated with them, like “Socially Awkward Penguin” or “Bad Luck Brian”. I noted these strings down and compared their similarity, just like Google does when it suggests “Did you mean…?”. This was already enough to know who is related to whom (I’m looking at you, band of penguins).
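The post does not say which string similarity it used. A common choice for “Did you mean…?”-style matching is edit (Levenshtein) distance, so here is a sketch assuming a Levenshtein distance rescaled to a [0, 1] similarity; the function names are illustrative, not from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]: 1.0 means identical names (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(round(name_similarity("Socially Awkward Penguin", "Socially Awesome Penguin"), 3))  # → 0.792
print(round(name_similarity("Socially Awkward Penguin", "Bad Luck Brian"), 3))
```

The two penguins differ in only five characters, which is why they end up as close relatives.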

Then it was time to examine what they look like. All of them gave me their best template picture and I ran it through the electronic eye of SURF, an amazing computer vision algorithm able to detect image features. Again, I patiently noted down who looked like whom. Finally, I asked them to tell me everything about their history. I collected anything that was ever said on Memegenerator.net, meaning all the texts that the users wrote when creating an instance belonging to each meme. For example, the creation of this picture:

[Image: a Philosoraptor meme instance]

results in associating “If guns don’t … toast toast toast?” with the Philosoraptor meme. I condensed all this text into a given number of topics and exposed which of the memes are talking about the same things. At this point, I had all I needed to know about who is average and who could spark our interest. It’s an even nerdier version of Hot or Not. So I created a network of memes, connecting two memes if they are similar to each other. I enlarged and highlighted in orange the memes that are widely used and popular. I won’t keep you in suspense any longer: here is the result.

[Figure: network of memes connected by similarity; popular memes enlarged and highlighted in orange]

I knew it! The big, orange nodes are the cool guys. And they avoid mingling in the center of the neighborhood. They stay on the periphery, they want to be special, and they are. This conclusion is supported by all kinds of robustness checks, but I’m not going to report them, because it’s hard enough for me to keep you awake while you read through all this boring stuff. “OK,” you now think, “you proved what we already knew. Good job. What was this for?”
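The network construction can be sketched like this, assuming (hypothetically) that the name, image, and topic similarities are simply averaged and that an edge is drawn above a fixed threshold; the paper's actual combination and threshold may differ. Degree, the number of similar neighbours, serves here only as a crude stand-in for how central a meme sits, and the scores below are invented toy numbers.

```python
from itertools import combinations


def build_meme_network(similarity, names, threshold=0.5):
    """Connect two memes when the average of their per-dimension similarity
    scores (e.g. name, image, topic, each in [0, 1]) exceeds `threshold`.
    `similarity` maps frozenset({a, b}) to a tuple of scores."""
    adjacency = {name: set() for name in names}
    for a, b in combinations(names, 2):
        scores = similarity.get(frozenset((a, b)))
        if scores and sum(scores) / len(scores) > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree(adjacency):
    """Number of similar neighbours per meme (crude proxy: low degree ~ periphery)."""
    return {name: len(neighbours) for name, neighbours in adjacency.items()}


# Invented toy scores: three penguin memes that resemble each other,
# and a Philosoraptor that resembles none of them.
penguins = ["Socially Awkward Penguin", "Socially Awesome Penguin",
            "Socially Awkward Awesome Penguin"]
names = penguins + ["Philosoraptor"]
toy = {frozenset(pair): (0.8, 0.9, 0.6) for pair in combinations(penguins, 2)}
toy.update({frozenset((p, "Philosoraptor")): (0.1, 0.2, 0.3) for p in penguins})

print(degree(build_meme_network(toy, names)))
# → {'Socially Awkward Penguin': 2, 'Socially Awesome Penguin': 2,
#    'Socially Awkward Awesome Penguin': 2, 'Philosoraptor': 0}
```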

This result is not as expected as you might think. Let it settle down in your brain for a second: I am saying that, given your name, your image template and your topic, I can tell you if you are likely to be successful or not. Plenty of smart people have proof in hand that a meme’s content isn’t necessary to explain why some memes are successful and some are less memorable than your average Congress hearing. They have plenty of good reasons to say that. In fact, you will never hear me reciting guru-like advice to reach success, like “be different”. That’s just bollocks.

Instead of selling the popularity snake oil, I am describing what the path to success looks like. The works I cited do not do that. Some describe how the system works. It’s a bit like telling you that, given how the feudal system worked in the Middle Ages, some people had to be emperors. It doesn’t say much about what characteristics the emperors had. Others tell you how good an emperor already on the throne could be, but not so much about how he got to sit on that fancy chair wearing that silly hat. By looking at the content in a different way, and by posing different questions, I started writing emperors’ biographies and I noticed that they all have something in common. At the very least, I am the court jester.

We are not enemies and we are not contradicting each other. We are examining the same big and complex ecosystem of silly-pictures-on-the-internet with different spectacles. We all want to see if we can describe human cultural production as a concrete thing following understandable laws. If you want to send a rocket to the moon, you need to know how and why a ball you throw up falls back to the ground. Tedious, yes, but fundamental. Now, if you’ll excuse me, I have a lot of balls to throw.


20 May 2013

Memetics, or: How I can spend my entire day on Reddit claiming that I’m working

In his 1976 book “The Selfish Gene”, Richard Dawkins proposed a shift in the way we look at evolution: instead of considering organisms as the center of evolution, Dawkins proposed (providing tons of evidence) to consider single genes as the fundamental unit of evolution. I am neither a biologist nor interested in genetics, so this idea should not concern me. However, Dawkins added one chapter to his book. He felt that culture, too, could be made out of self-replicating units, just like genes, that can compete and/or collaborate with each other in forming “cultural organisms”. He decided to call these units “memes”.

The idea of memes was mostly in the realm of intellectual and serious researchers (not like me); you can check out some pretty serious books like “Metamagical Themas” by Hofstadter or “Thought Contagion: How Belief Spreads Through Society” by Lynch. But then something terrible was brought to the world: the World Wide Web happened, bringing with it a nexus of inside jokes, large communities, hive minds, social media, 2.0s, God knows what. Oh, and cats. Have one, please:

With the WWW, studying memes became easier, because on the Internet every piece of information has to be stored somehow somewhere. This is not something I discovered by myself, there are plenty of smart guys out there doing marvelous research. I’ll give just three examples out of possibly tens or hundreds:

  • Studies about memes competing for the attention of people in a social network, like “Clash of the contagions: Cooperation and competition in information diffusion” or “Competition among memes in a world with limited attention”;
  • Studies about the adoption of conventions and behaviors by people, like “The emergence of conventions in online social networks” or “Cooperative behavior cascades in human social networks”;
  • Studies about how information diffuses in networks, like “Virality and susceptibility in information diffusions” or “Mining the temporal dimension of the information propagation” which, absolutely incidentally, is a paper of mine.

There is one thing that I find to be mostly missing in the current state of the research on memes. Many, if not all, of the above-mentioned works are focused on understanding how memes spread from one person to another, and they ask what the dynamics are, given that human minds are connected through a social network. In other words, what we have been studying is mostly the network of connections, regardless of what kinds of messages are passing through it. Now, most of the time these “messages” are about penguins that don’t know how to talk to girls:

and in that case I grant you that you can fairly ignore them. But my reasoning is that if we really want to understand memes and memetics, we can’t put all of our effort into just analyzing the networks they live in. It is like trying to understand genes and animals by analyzing only the environment they inhabit. If you want to know how to behave in front of a “tiger” without ever having met one, it is possibly useful to understand something about the forest it dwells in, but I strongly advise you to also take a look at its claws, its teeth, and how fast it can run or climb.

That is exactly what I study in a paper that was accepted at the ICWSM conference, titled “Competition and Success in the Meme Pool: a Case Study on Quickmeme.com”. What I did was fairly simple: I downloaded a bunch of memes from Quickmeme.com and studied the patterns of their appearances and upvotes across a year’s worth of data. Using some boring data analysis techniques borrowed from ecology, I was able to understand which memes compete (or collaborate) with which other ones, which characteristics make memes more likely to survive, and whether there are hints of the existence of “meme organisms” (there are; one of my favorites is the small nerd-humor cluster).

One of the nicest products of my paper was a simple visualization to help us understand the effect of some meme characteristics that are associated with success. As characteristics, I took the number of memes in competition and in collaboration with the meme, whether the meme is part of a coherent group of memes (an “organism”), and whether the meme had a very large popularity peak. The result, in the picture below, tells us an interesting story. In the picture, the odds of success are connected by arrows that represent the filters I used to group the memes based on their characteristics.

This picture is saying: in general, memes have a 35.47% probability of being successful (given the definition of “successful” I gave in the paper). If a meme has a popularity peak that is larger than the average, then its probability of success decreases. This means that, my dear meme*, if you want to survive you have to keep a low profile. And, if you really can’t keep a low profile, then don’t make too many enemies (or your odds will go down to 6.25%). On the other hand, if you kept a low profile, then make as many enemies as you can, but only if you can count on many friends too, especially if you can be in a tightly connected meme organism (80.3%!). This is an exciting result that seems to suggest that memes are indeed collaborating together in complex cultural organisms because that’s how they can survive.
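Each node in that picture is a success probability computed on a subset of memes, and each arrow adds a filter. A sketch of that computation, with hypothetical field names and made-up toy records (the paper's definitions of success, competition, and collaboration are not reproduced here):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Meme:
    # Hypothetical features; the paper defines these more carefully.
    large_peak: bool      # popularity peak larger than average
    competitors: int      # number of memes it competes with
    collaborators: int    # number of memes it collaborates with
    in_organism: bool     # part of a tightly connected meme group
    successful: bool


def success_rate(memes: Iterable[Meme],
                 condition: Callable[[Meme], bool] = lambda m: True) -> Optional[float]:
    """Share of successful memes among those passing `condition`
    (one node of the tree); None if no meme passes the filter."""
    group = [m for m in memes if condition(m)]
    if not group:
        return None
    return sum(m.successful for m in group) / len(group)


# Toy records, invented for illustration only.
memes = [
    Meme(large_peak=True, competitors=10, collaborators=1, in_organism=False, successful=False),
    Meme(large_peak=True, competitors=2, collaborators=3, in_organism=False, successful=True),
    Meme(large_peak=False, competitors=8, collaborators=9, in_organism=True, successful=True),
    Meme(large_peak=False, competitors=1, collaborators=0, in_organism=False, successful=False),
]
print(success_rate(memes))                                                # → 0.5
print(success_rate(memes, lambda m: m.large_peak))                        # → 0.5
print(success_rate(memes, lambda m: not m.large_peak and m.in_organism))  # → 1.0
```

Following an arrow in the tree corresponds to adding one more clause to the condition.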

What I did was just scratch the surface of meme-centered studies, as opposed to network-centered meme studies. I am planning to study more deeply the causal relationship between a meme and its fitness to survive in the World Wild Web, and to understand the mechanics of how memes evolve and mutate. Oh, and if you feel like it, I am also releasing the data that I collected for my study. It is in the “Quickmeme” entry under the Datasets tab.


* I deeply apologize to Dawkins, to any readers (luckily they are few) and to the scientific community as a whole for my personification of memes. I know that memes do not have minds, so they can’t “decide” to do anything, but it really makes writing so much easier!
