Community Discovery Network

Download the Algorithm Similarity Network

This is the companion supplementary material for the paper “Discovering Communities of Community Discovery”, currently under review to the ASONAM conference. The package includes the data & code to replicate the main results of the paper.

If you’re only interested in the main Algorithm Similarity Network, you can find it in the asn.tsv file in the main folder of the ZIP archive. The network’s nodes are algorithm solving the community discovery problem. I ran more than a thousand benchmarks and I counted the number of times in which the two algorithms returned very similar results, which is the basis for the edges. This version of the network has been filtered using the noise corrected backboning technique. The file has four columns: the first two are the algorithms’ names, and the second two are: nij (the number of benchmark in which the two algorithms agree) and score (a measure of significance of the edge weight).

Otherwise, the replication material is divided into two folders.

01_network contains the full non-backboned network (asn_all_complete_top5count.tsv) and various view of it, plus Python scripts to generate figures 1 & 2 in the paper.

02_robustness contains several alternative ways to build ASN: by averaging similarity scores (asn_all_complete_nmiavg.tsv); imposing a hard threshold to count the similarities between two algorithms (asn_all_backbone_hardthresh.tsv); only considering LFR benchmarks (asn_lfr_complete_top5count.tsv); or only considering real world networks as benchmarks (asn_real_complete_top5count.tsv).

Note that the Python scripts require a set of standard Python scientific libraries (numpy, scipy, pandas, scikit-learn, networkx, …). The package also includes compiled binaries to calculate Infomap communities (code from Daniel Edler & Martin Rosvall) and the overlapping normalized mutual information (code from McDaid’s GitHub). I compiled them on Ubuntu 18.04, thus they might not work on other systems and you’re encouraged to re-compile them yourself.

Due to space constraints, I could not cite the authors of all 73 algorithms in my paper. Thus I report here a map on how to find out which implementations I used. The tag I use to identify the algorithm is consistent with the one I use in the figure. (Disclaimer: of course, some of these links are bound to break in the future. I accessed them last time around November 2018, so the Internet Archive could help)

edgebetween, fastgreedy, hrg, labelperc, leadeig, louvain, spinglass, walktrap: igraph implementation in R (https://igraph.org/r/)
mcl: https://www.micans.org/mcl/#source
tabu, extr: [5] and [6] in http://deim.urv.cat/~sergio.gomez/radatools.php
ganet, ganet+, moganet: http://staff.icar.cnr.it/pizzuti/codes.html
ganxis: https://sites.google.com/site/communitydetectionslpa/
conclude: http://www.emilio.ferrara.name/code/conclude/
conga, copra, cliquemod, peacock: http://gregory.org/research/networks/
mlrmcl: https://sites.google.com/site/stochasticflowclustering/
metis: http://glaros.dtc.umn.edu/gkhome/metis/hmetis/download
slpa, fluid, kerlin, kclique: networkx implementation in python (https://networkx.github.io/documentation/stable/)
pmm: http://leitang.net/heterogeneous_network.html
crossass: https://faculty.mccombs.utexas.edu/deepayan.chakrabarti/software.html
demon: https://www.michelecoscia.com/?page_id=42
bigclam, agm: part of the SNAP library (https://snap.stanford.edu/)
hlc: http://barabasilab.neu.edu/projects/linkcommunities/
tiles: https://github.com/GiulioRossetti/TILES
oslom: https://sites.google.com/site/andrealancichinetti/software
kmeans, dbscan, ward, agglomerative, spectral, meanshift, affinity, birch: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster
code-dense: https://link.springer.com/article/10.1007/s10618-014-0373-y
moses, collapsed-sbm: https://sites.google.com/site/aaronmcdaid/downloads
gce: https://sites.google.com/site/greedycliqueexpansion/
ilcd: http://cazabetremy.fr/rRessources/iLCD.html
svinet, mmsb: https://github.com/premgopalan/svinet
bnmtf: http://www.cse.ust.hk/~dyyeung/code/BNMTF.zip
rmcl: https://rdrr.io/github/DavidGilgien/ML.RMCL/man/ML_RMCL.html
OLC: http://www-personal.umich.edu/~mejn/OverlappingLinkCommunities.zip
cme-td, cme-bu: https://github.com/linhongseba/ContentMapEquation
edgeclust: http://homes.sice.indiana.edu/filiradi/Data/radetal_algorithm.tgz
infocentr: https://www.w3.org/People/Massimo/papers/2004/community_pre_04.pdf (my own implementation)
msg, vm: http://www.biochem-caflisch.uzh.ch/node/385
ocg: http://tagc.univ-mrs.fr/tagc/index.php/software/ocg
savi: http://dsec.pku.edu.cn/~tieli/
mixnet: http://www.math-evry.cnrs.fr/logiciels/mixnet/mixnet
vbmod: https://github.com/jhofman/vbmod_python
bridgebound, bagrowLocal, clausetLocal, lwplocal: https://github.com/kleinmind/bridge-bounding
infomap, infomap-overlap: http://www.mapequation.org/code.html#Download-and-compile
graclus: http://www.cs.utexas.edu/users/dml/Software/graclus.html
graclus2stage: https://www.researchgate.net/profile/Padraig_Cunningham/publication/222089361_Community_Finding_in_Large_Social_Networks_Through_Problem_Decomposition/links/09e4150fe534d66b1c000000/Community-Finding-in-Large-Social-Networks-Through-Problem-Decomposition.pdf (my own implementation)
fuzzyclust: https://github.com/ntamas/fuzzyclust
netcarto: http://seeslab.info/downloads/network-cartography-netcarto/
linecomms: https://sites.google.com/site/linegraphs/ (to generate the line graph + igraph’s implementation of Louvain)

The 819 real world networks we use as a benchmark come from the following papers/links:

E. Gabasova, “The Star Wars social network.” Evelina Gabasova’s Blog
T. Snijders, G. van de Bunt, and C. Steglich, “Introduction to stochastic actor-based models for network dynamics.” Social Networks 32(1), 44-60 (2010)
G. Van de Bunt, M. Van Duijn, and T. Snijders. “Friendship networks through time: An actor-oriented dynamic statistical network model.” Computational & Mathematical Organization Theory 5.2. 167-192. (1999)
C. J. Rhodes and P. Jones, “Inferring missing links in partially observed social networks.” Journal of the Operational Research Society 60(10), 1373-1383 (2009)
K. Descormiers, and C. Morselli, “Alliances, conflicts, and contradictions in Montreal’s street gang landscape.” International Criminal Justice Review 21(3), 297-314 (2011)
S. R. Sundaresan et al., “Network metrics reveal differences in social organization between two fission-fusion species, Grevy’s zebra and onager.” Oecologia 151(1), 140-149 (2007)
M. W. Schein and M. H. Fohrman. “Social dominance relationships in a herd of dairy cattle.” The British J. of Animal Behaviour 3(2), 45-55 (1955)
A. G. S. Framis, “Illegal networks or criminal organizations: Power, roles and facilitators in four cocaine trafficking structures.” In ‘Third Annual Illicit Networks Workshop,’ Montreal (2011)
A. Beveridge and J. Shan, “Network of Thrones.” Math Horizons 23(4), 18-22 (2016)
W. De Nooy, “A literary playground: Literary criticism and balance theory.” Poetics 26 (5-6), 385-404 (1999)
http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm
http://wwwlovre.appspot.com/support.jsp
D. F Lott, “Dominance relations and breeding rate in mature male American bison.” Zeitschrift für Tierpsychologie 49(4), 418-432 (1979)
J. Kaminski et al., “Moviegalaxies – Social Networks in Movies.” https://doi.org/10.7910/DVN/T4HBA3, Harvard Dataverse, V3 (2018)

Download the Algorithm Similarity Network