Sport Predictability


This is the data and code necessary to reproduce the results from the paper “Which Sport is Becoming more Predictable? A Cross-Discipline Analysis of Predictability in Team Sports.”

The data folder contains only the observations that we can share. It excludes the disciplines of baseball, football, handball, hockey, and volleyball, whose data come from proprietary sources. To obtain the proprietary data, contact Enetpulse at sales@enetpulse.com.

The code folder requires the following Python libraries to run: pandas, statsmodels, trueskill, numpy, networkx, sklearn, scipy.
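Before running anything, it can help to confirm that these libraries are importable. The snippet below is a minimal sketch rather than part of the original package; the helper name missing_modules is hypothetical. Note that sklearn is the import name of the package installed as scikit-learn.

```python
import importlib.util

# Import names of the libraries listed above.
# sklearn is the import name of the package distributed as scikit-learn.
REQUIRED = ["pandas", "statsmodels", "trueskill", "numpy",
            "networkx", "sklearn", "scipy"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are importable.")
```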

The provided scripts should be run in the order of their numbering, because some scripts depend on the outputs of previous ones. They reproduce the main results of the paper:

01_table_1.py: prints the LaTeX code of Table 1. Runs without parameters.

02_tables_2_s1.py: prints the LaTeX code for Table 2 or S1. Requires one command line positional parameter: the predictor to use. Options are: pagerank, elo, naive. Throws an error if the predictor is not specified.

03_table_3.py: prints a tab-separated table with the information used to make Table 3. Requires one command line positional parameter: the predictor to use. Options are: pagerank, elo, naive. Throws an error if the predictor is not specified.

04_fig_1.py: prints a tab-separated table with the information used to make Figure 1. The figure will look different from the one in the paper because it is based only on the subset of disciplines for which we can share the data. Requires one command line positional parameter: the predictor to use. Options are: pagerank, elo, naive. Throws an error if the predictor is not specified.

05_tables_4_s2.py: prints Table 4 on screen, then writes a CSV file to recreate Table S2. You will need to run a simple linear model to get the regression table.

06_figs_2_3_part1.py & 07_figs_2_3_part2.py: print two tab-separated tables with the information used to make Figures 2 and 3. Both run without parameters. The first script generates each discipline’s overall result (the thick red line in the foreground of the figures); the second script generates the results by league (the thin faded lines in the background).

08_tables_5_6_s3_s4.py: prints on screen Tables 5, 6, S3 and S4. Runs without parameters.

09_figs_4_s1_s2.py: saves the data underlying Figures 4, S1 and S2 in tab-separated files, then prints the results of the Mann-Whitney U test on screen. Note that the results differ from those in the paper because fewer disciplines are considered. Runs without parameters.

10_fig_5.py: saves the data underlying Figure 5 in tab-separated files. Runs without parameters.

predictor.py: A custom helper library implementing our prediction framework as outlined in Figure 6. It is used by the other scripts and should not be run on its own.
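The numbered scripts above can be run in sequence with a small driver. The following is a hypothetical sketch, not part of the original package. It assumes that it is launched from inside the code folder, that every file with a two-digit prefix is one of the scripts listed above, and that 05_tables_4_s2.py takes no parameters (its entry above does not say). The function names are illustrative only.

```python
import os
import re
import subprocess
import sys

# Scripts that, according to the list above, require the predictor argument.
NEEDS_PREDICTOR = {"02_tables_2_s1.py", "03_table_3.py", "04_fig_1.py"}

# Predictor options named in the list above.
PREDICTORS = ("pagerank", "elo", "naive")


def numbered_scripts(filenames):
    """Return the files named NN_*.py, sorted by their numeric prefix."""
    pattern = re.compile(r"^(\d{2})_.*\.py$")
    found = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


def build_command(script, predictor):
    """Build the command line for one script, adding the predictor if needed."""
    if predictor not in PREDICTORS:
        raise ValueError(f"unknown predictor: {predictor}")
    command = [sys.executable, script]
    if script in NEEDS_PREDICTOR:
        command.append(predictor)
    return command


def run_all(code_dir=".", predictor="pagerank"):
    """Run every numbered script in order, stopping at the first failure."""
    for script in numbered_scripts(os.listdir(code_dir)):
        subprocess.run(build_command(script, predictor), cwd=code_dir, check=True)
```

Calling run_all(predictor="elo") from the code folder would run scripts 01 through 10 in order, passing elo only to the three scripts that require a predictor. predictor.py is skipped because its name has no numeric prefix.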
