05 March 2024 ~ 0 Comments

My Winter in Cultural Data Analytics

Cultural analytics means using data analysis techniques to understand culture — now or in the past. The aim is to include as many sources as possible: not just text, but also pictures, music, sculptures, performance arts, and everything that makes a culture. This winter I was fairly involved with the cultural analytics group CUDAN in Tallinn, and I wanted to share my experiences.

CUDAN organized the 2023 Cultural Data Analytics Conference, which took place in December 13th to 16th. The event was a fantastic showcase of the diversity and the thriving community that is doing work in the field. Differently than other posts I made about my conference experiences, you don’t have to take my word for its awesomeness, because all the talks were recorded and are available on YouTube. You can find them at the conference page I linked above.

My highlights of the conference were:

  • Alberto Acerbi & Joe Stubbersfield’s telephone game with an LLM. Humans have well-known biases when internalizing stories. In a telephone game, you ask humans to sum up stories, and they will preferably remember some things but not others — for instance, they’re more likely to remember parts of the story that conform to their gender biases. Does ChatGPT do the same? It turns out that it does! (Check out the paper)
  • Olena Mykhailenko’s report on evolving values and political orientations of rural Canadians. Besides being an awesome example of how qualitative analysis can and does fit in cultural analytics, it was also an occasion to be exposed to a worldview that is extremely distant from the one most of the people in the audience are used to. It was a universe-expanding experience at multiple levels!
  • Vejune Zemaityte et al.’s work on the Soviet newsreel production industry. I hardly need to add anything to that (how cool is it to work on Soviet newsreels? Maybe it’s my cinephile soul speaking), but the data itself is fascinating: extremely rich and spanning practically a century, with discernible eras and temporal patterns.
  • Mauro Martino’s AI art exhibit. Mauro is an old friend of mine, and he’s always doing super cool stuff. In this case, he created a movie with Stable Diffusion, recreating the feel of living in Milan without actually using any image from Milan. The movie is being shown in various airports around the world.
  • Chico Camargo & Isabel Sebire made a fantastic analysis of narrative tropes analyzing the network of concepts extracted from TV Tropes (warning: don’t click the link if you want to get anything done today).

But my absolute favorite can only be: Corinna Coupette et al.’s “All the world’s a (hyper)graph: A data drama”. The presentation is about a relational database on Shakespeare plays, connecting characters according to their co-appearances. The paper describing the database is… well. It is written in the form of a Shakespearean play, with the authors struggling with the reviewers. This is utterly brilliant, bravo! See it for yourself as I cannot make it justice here.

As for myself, I was presenting a work with Camilla Mazzucato on our network analysis of the Turkish Neolithic site of Çatalhöyük. We’re trying to figure out if the material culture we find in buildings — all the various jewels, tools, and other artifacts — tell us anything about the social and biological relationships between the people who lived in those buildings. We can do that because the people at Çatalhöyük used to bury their dead in the foundations of a new building (humans are weird). You can see the presentation here:

After the conference, I was kindly invited to hold a seminar at CUDAN. This was a much longer dive into the kind of things that interest me. Specifically, I focused on how to use my node attribute analysis techniques (node vector distances, Pearson correlations on networks, and more to come) to serve cultural data analytics. You can see the full two hour discussion here:

And that’s about it! Cultural analytics is fun and I look forward to be even more involved in it!

Continue Reading

29 January 2013 ~ 0 Comments

Exploring Classical Archaeology

Science is awesome. It’s awesome to write and to read papers and learning a lot in the process. All this awesomeness comes with a price: the price of popularity. In the last decades, universities and research institutes became better and better in capturing talented people and in multiplying their scientific output. As a result, the number of peer-reviewed conferences and journals exploded, as well as the number of papers itself (the actual numbers are kind of scary). When browsing papers in this open sea of scientific publications, it’s hard to know what is relevant and hopeless to know what is related to what else.

Let’s make an example. Suppose you are back from a holiday in Italy and you are still amazed by the beautiful Greek temples of Paestum. You are a scientist, so you want to read papers (sigh). You go to a bibliographic database. You search for “Paestum” and you get a couple of hundreds works that spans from focused papers on Paestum to publications that mention Paestum by accident. They are sorted more or less by importance, as you would expect from Google Search. There’s not really much that tells you briefly what it is related to Paestum, where Paestum is in the landscape of classical archaeology and which are the sub-fields Paestum is more relevant to.

With this problem in mind, I teamed up with Maximilian Schich, a very bright guy I met when I was a guest researcher at Northeastern University in 2011. Max is an atypical art historian with a strong background in network analysis and he had the problem of finding a way to make sense of 370,000 publications by 88,000 authors collected in the Archäologische Bibliographie, a bibliographic database that collects and classifies literature in classical archaeology since 1956. Every publication is classified using 45,000 different classifications (think of tags describing the content of a paper).

Given our common interest in networks, and the fact that we were sharing a desk with a gigantic window providing inspiring landscapes for several months, we decided to team up and the result was a paper published in a KDD workshop. To solve our quest for Paestum, we created a browsing framework that adds two extra levels to the plain paper search I just described: a global level and a meso-level.

The global level aims at providing a general picture of a field, excluding details but allowing to understand where and how big are the sub-fields composing one field. It will tell us where Paestum is in the landscape of classical archaeology. At the global level, we created a network of classifications by connecting two of them if they are used to classify the same publication. On this network, we performed overlapping community discovery, i.e. we grouped together sets of classifications present in a set of related publications, allowing classifications to be in different communities at the same time. Instead of obtaining the expected structurless hairball, our community network shows structure. Classifications can be of different types: locations, people, periods, subject themes … . We assigned a color to each type. Then, we characterize each community (and link) with the type of classifications they contain.

We found that there is an uneven and structured distribution of the different types of classifications in communities and clusters of communities (see the above picture: the colors are not randomly placed, click on it to enlarge). We found the first pill to cure our Paestum headache: when you look for it in the global level, you obtain 12 different communities, each one giving you a piece of information of where Paestum is in the landscape of classical archaeology

The meso-level stands in the middle between the papers and the global level. Its function is to provide information about what significantly characterizes a sub-field, in our case the sub-fields and all the other classifications relevant for Paestum. In the meso-level we are interested in putting together a coherent set of classifications that properly describe a sub-field of classical archaeology. To create it, we consider papers as customers “purchasing” classifications at the classification supermarket (remember: each publication is tagged according to its content). We then mine association rules from these purchases. Association rules are a mining tool that efficiently explore all possible significant purchases of the same products by the same customers, with surprising results in the same line of the (urban) legendary beer-diapers correlation. In our case, we end up with a subject theme network where we understand which subject theme is related with which other (in the below picture, the Plastic Art and Sculpture branch, click on it to enlarge).

In this meso-level we can characterize each one of the 12 communities with the sub-fields Paestum is related to: the period of time of the construction of the temples, the Magna Graecia geographical cluster, the fate of ancient monuments (pieces of the temples were used in other buildings), you get the idea. You have the possibility of switching back on the global level, by checking one of the related classifications connected to Paestum in one (or more than one) community and go on virtually to infinity (and beyond). Here’s what Paestum looks like in our system:

Exploring the two layers is lots of fun, because they provide complementary information. By jumping from one to another, you can find interesting and possibly unexplored combinations of classifications. On the one hand, the global level gives you an overview of the sub-fields and where and how the different sub-fields relate to each other, at the price of having a community network, where the single classifications disappeared. On the other hand, the meso-level focuses on the significant connections between single classifications and it highlights a true description of what a sub-field is about, with the caveat that we lack a general picture of where this sub-field is located in classical archaeology. In other words, you can create your own research niche in classical archaeology and be a successful scientist in the field (please acknowledge us if you do).

If you like the pictures and you want to have a clearer idea, you can check out the poster related to the paper, as it has a much higher level of detail, it’s an easier read than the paper itself and it’s a great piece of decoration for your living room.

As said above, science is awesome. When science goes meta and it uses itself to make sense of itself, it’s breathtaking.

Continue Reading