This is the companion data of the paper “Average is Boring: How Similarity Kills a Meme’s Success”. Please cite the paper if you find the data useful!

You can download the data by clicking here. The data was collected from June 27th, 2013 to July 6th, 2013.

The ZIP archive contains two text files.

The first file, “instanceid_generatoridfiltered_votes_text_timestep”, contains the actual data in 5 tab-separated columns. The columns contain:

  1. ID of the meme implementation: this is a progressive ID. You can use it to retrieve the corresponding meme implementation using the URL<implementation_id>. So meme “10057023” can be retrieved at the URL;
  2. ID of the meme that has been used for the implementation;
  3. Number of upvotes that the meme implementation got until the crawling time;
  4. Text of the implementation, i.e. the text that the user superimposed on the meme;
  5. The bimonthly timestep that has been derived from the meme implementation’s ID.
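
The column layout above can be read with a few lines of Python. This is only a minimal sketch: the names `Instance` and `read_instances` are made up for illustration, and it assumes that the file has no header row, is UTF-8 encoded, and stores the upvote count as an integer. IDs and timesteps are kept as plain strings, since their exact format is not assumed here.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Instance(NamedTuple):
    """One meme implementation (field names are illustrative, not official)."""
    instance_id: str   # column 1: ID of the meme implementation
    generator_id: str  # column 2: ID of the meme used
    votes: int         # column 3: upvotes at crawling time
    text: str          # column 4: text superimposed on the meme
    timestep: str      # column 5: bimonthly timestep


def read_instances(lines: Iterable[str]) -> Iterator[Instance]:
    """Yield one Instance per well-formed line (5 tab-separated fields)."""
    # QUOTE_NONE: quotation marks inside the meme text are ordinary characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip lines that do not match the documented layout
        instance_id, generator_id, votes, text, timestep = row
        yield Instance(instance_id, generator_id, int(votes), text, timestep)
```

To use it on the real file, pass an open file handle, for example `open("instanceid_generatoridfiltered_votes_text_timestep", encoding="utf-8", newline="")`, to `read_instances`.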

The second file, “generators”, contains additional metadata on the memes in 4 tab-separated columns:

  1. ID of the meme: this column is the primary key of the file, and column #2 of the previous file refers to it like a foreign key in a relational database;
  2. URL slug of the meme;
  3. Meme name;
  4. URL of the meme template image.
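
The key relationship between the two files can be used to attach meme names to implementations. The sketch below is one possible way to do it, not a prescribed method: `Generator`, `read_generators` and `votes_per_meme` are hypothetical names, and the code assumes no header rows and integer upvote counts. It takes raw rows (lists of 5 strings) from the first file and sums the upvotes per meme name.

```python
import csv
from typing import Dict, Iterable, List, NamedTuple


class Generator(NamedTuple):
    """Metadata of one meme (field names are illustrative, not official)."""
    generator_id: str  # column 1: primary key
    slug: str          # column 2: URL slug
    name: str          # column 3: meme name
    image_url: str     # column 4: URL of the template image


def read_generators(lines: Iterable[str]) -> Dict[str, Generator]:
    """Map each meme ID to its metadata, skipping malformed lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: Generator(*row) for row in reader if len(row) == 4}


def votes_per_meme(instance_rows: Iterable[List[str]],
                   generators: Dict[str, Generator]) -> Dict[str, int]:
    """Sum upvotes (column 3) per meme name, joining on column 2."""
    totals: Dict[str, int] = {}
    for row in instance_rows:
        generator_id, votes = row[1], int(row[2])
        meme = generators.get(generator_id)
        # IDs missing from the "generators" file are grouped together.
        name = meme.name if meme else "(unknown)"
        totals[name] = totals.get(name, 0) + votes
    return totals
```

A plain dictionary lookup is enough here because the meme ID is unique in the “generators” file; for heavier analysis the same join could be done in a database or a data frame library.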

You can find an additional meme dataset, extracted from a different website, at this page.

Happy meme hunting (again)!