Michele Coscia

31 January 2024

Predictability, Home Advantage, and Fairness in Team Sports

There was a nice paper published a while ago by the excellent Taha Yasseri showing that soccer is becoming more predictable over time: from the early 90s to now, models trying to guess who would win a game have grown in accuracy. I got curious and asked myself: does this hold only for soccer, or is it a general phenomenon across team sports? The result of that question is the paper “Which sport is becoming more predictable? A cross-discipline analysis of predictability in team sports,” which has just appeared in EPJ Data Science.

My idea was that, as more and more money and professionalism flow into sport, the richer teams would become stronger over time and dominate for a season, which would make them richer, and therefore more dominant, and richer still, until you get Juventus, which finished first or second in almost 50% of the 119 soccer league seasons played in Italy.

My first step was to gather data on 300,000 matches played across 49 leagues in nine disciplines (baseball, basketball, cricket, football, handball, hockey, rugby, soccer, and volleyball). My second step was to blatantly steal the entire methodology from Taha’s paper because, hey, why innovate when you can just copy the best? (Besides, this way I could reproduce and confirm their finding; at least, that’s the story I tell myself to fall asleep at night.)

Predictability (y axis, higher means more predictable) over time (x axis) across all disciplines. No clear trend here!

The first answer I got was that Taha was right, but mostly only about soccer. Along with volleyball (and maybe baseball), soccer is one of the few disciplines getting more predictable over time. The rest are a mixed bag of non-significant results and actual decreases in predictability.
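The post borrows its predictability measure from Taha’s paper without describing it. Purely as an illustration (a swapped-in technique, not the paper’s methodology), one simple way to score a season’s predictability is the share of games whose winner a running Elo rating calls correctly. The sketch assumes hypothetical `(home, away, home_won)` tuples in chronological order with draws removed; `k` and `home_bonus` are arbitrary illustrative values, not tuned parameters.

```python
def elo_predictability(matches, k=20.0, home_bonus=50.0):
    """Share of matches whose winner a running Elo model called correctly.

    matches: hypothetical chronological list of (home, away, home_won) tuples,
    draws removed. k and home_bonus are illustrative, untuned values.
    """
    rating = {}
    correct = 0
    for home, away, home_won in matches:
        r_home = rating.get(home, 1500.0)
        r_away = rating.get(away, 1500.0)
        # Expected score of the home team, with a flat bonus for playing at home.
        p_home = 1.0 / (1.0 + 10 ** ((r_away - (r_home + home_bonus)) / 400.0))
        # Predict the home team whenever the model gives it better-than-even odds.
        correct += (p_home > 0.5) == home_won
        # Update only after predicting, so a result never informs its own forecast.
        delta = k * (float(home_won) - p_home)
        rating[home] = r_home + delta
        rating[away] = r_away - delta
    return correct / len(matches)


# Toy fixture list, invented for illustration only.
season = [("A", "B", True), ("B", "C", False), ("C", "A", False), ("A", "C", True)]
print(elo_predictability(season))
```

Computing this number season by season and looking at how it moves over the years would give a crude predictability-over-time curve; the `home_bonus` term is where home advantage enters such a model.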

One factor that could influence these results is home advantage. Normally, the team playing at home has slightly higher odds of winning. And, sometimes, not so slightly. In the elite rugby tournament in France, the home team wins something like 80% of the time. To give an idea, 2014 French champions Toulon won only 4 of their 13 away games, and two of those were against the bottom two teams of the league, which got relegated that season.

It’s all in the pilou pilou. Would you really go to Toulon and tell this guy you expect to win? Didn’t think so.

Well, this is something that actually changed almost universally across disciplines: home advantage has been shrinking across the board, from an average home-win probability of 64% in 2011 to 55% post-pandemic. Home advantage did shrink during Covid, but the trend started almost a decade before the pandemic. The little bugger did nothing to help (playing matches behind closed doors altered the dynamics of the games), but it only sped up the trend; it didn’t create it.
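The home-advantage figures above are shares of decided games won by the home team. A minimal sketch of that bookkeeping, assuming hypothetical `(season, home_won)` records with draws excluded; the toy records are invented and are not real league data.

```python
from collections import defaultdict


def home_win_share_by_season(results):
    """Share of decided matches won by the home team, per season.

    results: hypothetical iterable of (season, home_won) pairs, draws excluded.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for season, home_won in results:
        games[season] += 1
        wins[season] += int(home_won)
    return {season: wins[season] / games[season] for season in sorted(games)}


# Invented toy records, only to show the mechanics.
toy = [(2011, True), (2011, True), (2011, False), (2022, True), (2022, False)]
print(home_win_share_by_season(toy))  # {2011: 0.6666666666666666, 2022: 0.5}
```

The same arithmetic applies to a single team: Toulon’s 4 wins in 13 away games correspond to an away-win share of 4/13, or about 31%.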

What about my original hypothesis? Is the rich-get-richer effect behind predictability? This can be tested, because most American sports are managed under a socialist regime: players have unions, the worst-performing teams in one season get to pick the best rookies for the next, etc. In Europe, players don’t have unions, and if you have enough money you can buy whomever you want.

Boxplot with the distributions of predictability for European sports (red) and American ones (green). The higher the box, the more predictable the results.

When I split leagues by the management system they follow, I can clearly see that those under the European capitalist system tend to be more predictable. So next time you’re talking with somebody preaching laissez-faire anarcho-capitalism, tell them that, at least, under socialism you don’t get bored at the stadium knowing in advance who’ll win.

Tags: complex networks, covid19, data mining, data science, machine learning, predictability, socialism, sport, team sports

23 July 2019

Lipari 2019 Report

Conferencing

Last week I answered the call of duty and attended the complex networks workshop on the gorgeous Mediterranean island of Lipari (I know, I’m a selfless hero). I thank the organizers for the invitation, particularly Giancarlo Ruffo, fellow nerd Roberta Sinatra, and Alfredo Ferro. This is my usual report, highlighting the things that most impressed me during the visit. Well, excluding the granitas, the beaches, and the walks, because this is not a blog about tourism, however difficult it might be to tell the difference.

Unlike NetSci, there were no parallel sessions, so I was able to attend everything. But I cannot report on everything: I have neither the space nor the skill. So, to keep this post from overflowing and taking over the entire blog, I need to set some rules. I will only write about a single talk per session, excluding the session in which I presented: I was too tense, mentally preparing for my talk, to do that session justice.

Any overrepresentation of Italian speakers in the following line-up is — quite obviously — part of your imagination.

Get ready for a bunch of sunset pictures. Did you know Lipari is a net exporter of sunsets?

Session 1: Ronaldo Menezes talked about spatial concentration and temporal regularities in crime. Turns out, you can use network and data science to fight the mob. One of Ronaldo’s take-home messages was that police should try to nudge criminals to operate outside the areas where they’re used to working. The more you push them into unfamiliar territory, the more mistakes they make.

Session 2: The theme of the workshop was brain research, and Giulia Bassignana’s talk on multiple sclerosis was the first that caught my eye. Giulia presented some models to study the degeneration of physical connections in the brain. While I love all that is related to the brain, seeing people working on the actual physical connections tickles me more than looking at correlation networks from fMRI data, and Giulia was really spot on.

Session 3: Daniela Paolotti presented a wide array of applications of data science for the greater good. Her talk was so amazing it deserves an entire blog post by itself. So I’ll selfishly only mention a slice of it: a project in which Daniela is able to predict the spread of Zika by analyzing human mobility patterns from cellphone data. Why selfishly? Because I humbly played a small role in it by providing the cellphone data from Colombia.

That on the background is Stromboli. With my proverbial bravery, I did not get any closer than this to that lava-spewing monster.

Session 4: If some of you are looking for an academic job this year, I suggest you talk with Alessandra Urbinati, who presented an intriguing analysis of scientific migration networks. Alessandra showed which countries are emitters, which are attractors, and which are both. My move to Denmark seems to have been spot on, as it ranks highly as an attractor. Among countries of comparable size, only Switzerland does a bit better, which is probably why my sister works there (always one-upping me!).

Session 6: As is her custom, Tina Eliassi-Rad proved yet again that she is completely unable to give an uninteresting talk. This time she talked about an extremely smart way to count occurrences of graph motifs without going through the notoriously expensive graph isomorphism problem. Her trick was to use the spectrum of non-backtracking matrices. Tina specializes in finding excellent solutions to complex problems by discovering hidden pathways through apparently unrelated techniques. (Seriously, Tina rocks.)
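The details of Tina’s method are not given here, so what follows is only a sketch of why the non-backtracking spectrum knows about motifs, not her algorithm. Assume a simple undirected graph. The non-backtracking (Hashimoto) matrix B has one row and column per directed edge, with a 1 where edge u→v can be followed by v→y without going straight back to u. The trace of B^k, which equals the sum of the k-th powers of B’s eigenvalues, counts closed non-backtracking walks of length k; for k = 3 and k = 4 those walks are exactly triangles and squares, so their counts come out of the eigenvalues with no isomorphism test. The function names are hypothetical.

```python
import numpy as np


def non_backtracking_matrix(edges):
    """Hashimoto matrix of a simple undirected graph given as a list of (u, v) edges."""
    # Every undirected edge {u, v} becomes two directed edges, u->v and v->u.
    directed = list(edges) + [(v, u) for u, v in edges]
    size = len(directed)
    B = np.zeros((size, size))
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            # u->v may be followed by v->y, except by going straight back to u.
            if x == v and y != u:
                B[i, j] = 1.0
    return B


def short_cycle_counts(edges):
    """Triangles and 4-cycles counted from the eigenvalues of B alone."""
    eig = np.linalg.eigvals(non_backtracking_matrix(edges))
    # trace(B^k) = sum of eigenvalues^k = number of closed non-backtracking walks of length k.
    walks3 = float(np.sum(eig ** 3).real)
    walks4 = float(np.sum(eig ** 4).real)
    # In a simple graph these walks are exactly triangles (each seen 6 times:
    # 3 starting points x 2 directions) and squares (each seen 8 times).
    return round(walks3 / 6), round(walks4 / 8)


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(short_cycle_counts(k4))  # (4, 3)
```

Building B as a dense matrix is only sensible for toy graphs; the point is that the motif counts are functions of the spectrum.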

Session 7: Ciro Cattuto’s talk on graph embeddings really had it all. Not only did Ciro present an extremely smart way to create graph embeddings for time-evolving networks, but he also schooled everybody on the basics of the embedding technique. Basically, many graph embedding techniques boil down to representing nodes as vectors learned from random walks, which can then be used as input for machine learning. I always love it when a talk not only introduces a new technique, but also has pedagogical elements that make you a better researcher.
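A minimal sketch of the random-walk idea on a static graph, assuming a hypothetical adjacency dict with nodes numbered 0..n-1. To stay dependency-free it swaps the skip-gram training used by methods such as DeepWalk and node2vec for a closely related matrix-factorization alternative: a truncated SVD of the positive pointwise mutual information (PPMI) of walk co-occurrences. It is not Ciro’s time-evolving method, and all parameter values are illustrative.

```python
import numpy as np


def random_walks(adj, walks_per_node, walk_length, rng):
    """Uniform random walks started from every node, DeepWalk-style."""
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(int(rng.choice(adj[walk[-1]])))
            walks.append(walk)
    return walks


def embed(adj, dim=2, window=2, walks_per_node=50, walk_length=20, seed=0):
    """Node vectors from a truncated SVD of the PPMI of walk co-occurrences."""
    rng = np.random.default_rng(seed)
    n = len(adj)
    counts = np.zeros((n, n))
    # Two nodes co-occur when they appear within `window` steps on the same walk.
    for walk in random_walks(adj, walks_per_node, walk_length, rng):
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                counts[u, v] += 1
                counts[v, u] += 1
    # Positive PMI: how much more often than chance two nodes co-occur (negatives clipped to 0).
    total = counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore"):
        ppmi = np.maximum(np.log(counts / expected), 0.0)
    # Keep the top `dim` singular directions as the embedding.
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two 5-node cliques joined by a single edge (4-5).
adj = {}
for block in (range(0, 5), range(5, 10)):
    for i in block:
        adj[i] = [j for j in block if j != i]
adj[4].append(5)
adj[5].append(4)
emb = embed(adj)
print(cosine(emb[0], emb[1]), cosine(emb[0], emb[9]))
```

Nodes in the same clique should end up with similar vectors, while nodes from different cliques should point in different directions, which is what makes such vectors useful as machine-learning features.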

To be fair, we tried to apply some natural selection and get rid of the weakest network scientists by climbing Vulcano. Turns out, we are all pretty fit, so we’re back to evaluating ourselves via the quality of our work, I guess. *shrugging emoticon*

Session 8: Philipp Hövel spoke about the accelerating dynamics of collective attention. Have you ever felt that memes and fads pop in and out of existence faster and faster? Philipp showed it’s not your imagination: we’re getting better and quicker at producing popular content on social media. This exhausts humanity’s limited attention more rapidly and results in ever-shorter meme cycles.

Session 9: Only tangentially related to networks, Daniel Fraiman talked about some intriguing auction models. The question is: how do you price a product with zero marginal cost — meaning that, once you have the infrastructure, producing the next item is essentially free? The answer is that you don’t: you have an auction where people state their price freely, and at each new bid the current highest bidder gets the next item. This model works surprisingly well in making the full system converge to the actual value of the product.

Session 10: Andrea Tacchella’s talk was another one close to my heart. He taught us a new and better way to build the Product Space. I am the author of its current incarnation in the Atlas of Economic Complexity, so I ought to hate Andrea. However, my Product Space is from 2011 and I think it is high time for a better version. And Andrea’s is that version.
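The post does not describe Andrea’s new construction. For context, here is a minimal sketch of the classic Product Space proximity from Hidalgo et al. (2007), assuming a hypothetical countries × products export matrix in which every country and every product has some nonzero exports. A country counts as a significant exporter of a product when its Balassa revealed comparative advantage (RCA) is at least 1, and the proximity of two products is the minimum of the two conditional probabilities of exporting one given the other. Published visualizations typically add further steps, such as pruning the network to a backbone, which are not shown.

```python
import numpy as np


def product_proximity(exports):
    """Classic Product Space proximity from a countries x products export matrix.

    exports: hypothetical 2D array, exports[c, p] = value of product p exported by country c.
    """
    x = np.asarray(exports, dtype=float)
    # Revealed comparative advantage (Balassa index).
    share_in_country = x / x.sum(axis=1, keepdims=True)
    share_in_world = x.sum(axis=0) / x.sum()
    rca = share_in_country / share_in_world
    m = (rca >= 1).astype(float)      # 1 if country c is a significant exporter of p
    co = m.T @ m                      # number of countries exporting both p and q
    ubiquity = np.diag(co)            # number of countries exporting p
    # phi[p, q] = min(P(p | q), P(q | p)) = co[p, q] / max(ubiquity[p], ubiquity[q])
    denom = np.maximum.outer(ubiquity, ubiquity)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = np.where(denom > 0, co / denom, 0.0)
    np.fill_diagonal(phi, 0.0)        # a product is not linked to itself
    return phi


# Invented toy matrix: 3 countries (rows) x 3 products (columns).
print(product_proximity([[5, 5, 0], [4, 0, 6], [0, 0, 10]]))
```

In this toy matrix, product 1 is only ever exported alongside product 0, but product 0 is also exported without it, so their proximity is min(1, 1/2) = 0.5.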

Is this group photo a possible contender against the 5th Solvay Conference of 1927 for the best conference group picture? … No, it isn’t, not even close. Why would anyone even bring that up?

Session 11: Did I mention graph isomorphism before? Did I also mention how fiendishly complex a problem that is? Good. If you can avoid dealing with it, you’ll be happier. But, when life throws graph isomorphism problems at you, first you make isomorphism lemonade, then you can hardly do better than calling Alfredo Pulvirenti. Alfredo showed a very efficient way to solve the problem for labeled multigraphs.

Session 12: The friendship paradox is a well-known counter-intuitive aspect of social networks: on average, your friends are more popular than you. Johan Bollen noticed that there is also a correlation between the number of friends you have and how happy you are. Thus, he discovered that there is a happiness paradox: on average, your friends are happier than you. Since we evaluate our happiness by comparison, the consequence is that seeing all these happy people on social media makes us miserable. The solution? Unplug from Facebook, for instance. If you don’t want to do that, Johan suggests that verbalizing what makes you unhappy is a great way to feel better almost instantly.
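Both paradoxes come from the same averaging effect: a person with many friends appears in many friend lists, so averaging over friends overweights well-connected people. A minimal sketch, assuming a hypothetical adjacency dict and a per-person attribute; the happiness scores are invented for illustration and are not data from the talk.

```python
def paradox(adj, value):
    """Average of `value` over everyone vs. over everyone's friends.

    adj: hypothetical dict mapping each person to the list of their friends.
    value: hypothetical dict mapping each person to a number (degree, happiness, ...).
    """
    people_avg = sum(value[n] for n in adj) / len(adj)
    # Each friendship contributes one entry, so well-connected people are counted many times.
    friend_values = [value[f] for friends in adj.values() for f in friends]
    friends_avg = sum(friend_values) / len(friend_values)
    return people_avg, friends_avg


# A star: one hub with four friends, each of whom only knows the hub.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
degree = {n: len(friends) for n, friends in star.items()}
print(paradox(star, degree))  # (1.6, 2.5)

# Invented happiness scores that grow with the number of friends.
happiness = {0: 0.9, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}
print(paradox(star, happiness))
```

With degree as the attribute you get the friendship paradox; with any attribute positively correlated with degree, such as the happiness scores here, the friends’ average comes out higher too.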

And now I have to go back to Copenhagen? Really?

Now, was this the kind of conference where you find yourself on a boat at 1 AM singing the Italian theme of Daitarn 3 on a guitar with two broken strings? I’m not saying it was, but I am saying that that is an oddly specific mental image. Where was I going with this concluding paragraph? I’m not sure, so maybe I should call it quits. Invite me again, pls.

Tags: brain networks, complex networks, data science, lipari, workshop

09 July 2018

Small Changes, Big Changes

General Info

A very quick note to point out a tiny change on my website. The top banner that welcomes you to the home page of this blog has changed. It is an insignificant text difference, which hides a slightly larger one in my life. I’ve changed jobs! After six years as a postdoctoral fellow, I decided I’d had enough and made the big jump: I’m now an assistant professor. I’ve also decided to hop to the other side of the Atlantic pond: I left the United States to make a home in Denmark. My new affiliation is the IT University of Copenhagen.

There are a few more things changing in the coming weeks and months. The most important one is that I’ll start teaching the Network Analysis class. This is a class for the first semester of the second year of the Data Science Bachelor program. This Data Science ship is captained by the amazing Natalie Schluter. I’m not alone in this adventure: my friend Luca Rossi will be co-teaching with me this year. I look forward to contributing to the development of data science and network science in the Copenhagen area.

This means that you’ll soon see a new tab in the top menu of the blog: “Teaching”. I’ll use it to host the materials from the classes I’ll be teaching: slides, additional texts, etc. This should give you a hint of my lecturing style and (why not?) let you see if you can learn something new.

If you are in the Copenhagen area and you’d like to talk a bit of network business, you can probably find me in my shiny new office, trying to hide behind the largest monitors I could find. The office number is 4E04, meaning that I’m on the 4th floor, wing E, office 4 (on the right).

Interesting times ahead!

Tags: assistant professor, career, data science, new job, teaching
