## Node Vector Distance

This archive contains the code and data needed to replicate the study of network node distance measures detailed in the paper “The Node Vector Distance Problem in Complex Networks”. (For the code accompanying the paper “Generalized Euclidean Measure to Estimate Network Distances”, scroll down.) We provide only the code and data we have the right to share, and point to the original sources where the material could not be repackaged.

The implementation folder contains the library, written in Python 3.6.5, that implements all known node vector distance measures. The minimal code to use it, assuming network_distance.py is on your path or in the folder you run from, is the following:

    import network_distance as nd

    ge_dist = nd.ge(src, trg, G)

Here G is an unweighted networkx graph, and src and trg are two dictionaries whose keys are the nodes the agent occupies and whose values are the occupation intensities. The code above calculates the Laplacian Generalized Euclidean distance.
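
To make the expected inputs concrete, here is a minimal sketch that builds a small graph and two occupation dictionaries, then computes a Laplacian-based generalized Euclidean distance by hand as the square root of (src - trg)ᵀ L⁺ (src - trg), where L⁺ is the pseudoinverse of the graph Laplacian. The function `ge_sketch` is a hypothetical stand-in written for illustration; it is not the library's implementation, and `nd.ge` may differ from it (for example in how it normalizes the vectors).

```python
import networkx as nx
import numpy as np


def ge_sketch(src, trg, G):
    """Illustrative only: sqrt((src - trg)^T L^+ (src - trg)) on an unweighted graph."""
    nodes = list(G.nodes)
    index = {node: i for i, node in enumerate(nodes)}

    # Build the combinatorial Laplacian L = D - A edge by edge.
    L = np.zeros((len(nodes), len(nodes)))
    for u, v in G.edges():
        if u == v:
            continue  # self-loops do not contribute to the Laplacian
        i, j = index[u], index[v]
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1

    # Dense difference vector between the two occupation dictionaries.
    diff = np.zeros(len(nodes))
    for node, intensity in src.items():
        diff[index[node]] += intensity
    for node, intensity in trg.items():
        diff[index[node]] -= intensity

    squared = diff @ np.linalg.pinv(L) @ diff
    return float(np.sqrt(max(squared, 0.0)))  # clip tiny negative rounding errors


# Toy input (made up for this example): a path 0 - 1 - 2 - 3.
G = nx.path_graph(4)
src = {0: 1.0}  # the agent sits entirely on node 0
trg = {3: 1.0}  # the agent sits entirely on node 3

print(ge_sketch(src, trg, G))
```

With network_distance.py available, the same `src`, `trg`, and `G` objects are the kind of arguments that would be passed to `nd.ge(src, trg, G)`.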

The library prerequisites are the following (these are the versions the library was developed with; newer or older versions may still work): Numpy 1.17.2, Scipy 1.3.1, Pandas 0.25.1, Networkx 2.4, pyemd 0.5.1. Some experiments additionally require statsmodels 0.10.1.
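
If you want to compare your environment against the versions listed above, a small check along these lines may help. This is a sketch, not part of the archive: the helper name `report_versions` is made up here, and it relies on `importlib.metadata`, which needs Python 3.8 or newer (later than the 3.6.5 the library was developed with).

```python
from importlib import metadata

# Versions the library was developed with, as listed in this README.
DEVELOPED_WITH = {
    "numpy": "1.17.2",
    "scipy": "1.3.1",
    "pandas": "0.25.1",
    "networkx": "2.4",
    "pyemd": "0.5.1",
    "statsmodels": "0.10.1",  # only needed for some experiments
}


def report_versions(expected=DEVELOPED_WITH):
    """Map each package to (version developed with, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in report_versions().items():
    found = installed if installed is not None else "not installed"
    print(f"{package}: developed with {wanted}, found {found}")
```

A mismatch is not necessarily a problem; it only tells you where to look first if something breaks.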

WARNING: The calculation of shortest path lengths relies on the multiprocessing library, and MAPF relies on running an external binary through the subprocess library. Both operations are tested and work reliably on Ubuntu 18.04.1 LTS. Windows might have issues with the code as it stands, so you might need to calculate the shortest paths independently of the library.
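
One way to calculate shortest path lengths without multiprocessing is to use networkx directly in a single process. The sketch below shows only that computation on a toy graph; the variable name `spl` is arbitrary, and how to hand the result to the library's functions is not covered by this README, so check network_distance.py for the expected format.

```python
import networkx as nx

# Toy graph (made up for this example): a path 0 - 1 - 2 - 3.
G = nx.path_graph(4)

# all_pairs_shortest_path_length yields (source, {target: length}) pairs.
# It runs in a single process, so no multiprocessing is involved.
spl = {source: dict(lengths) for source, lengths in nx.all_pairs_shortest_path_length(G)}

print(spl[0][3])  # number of hops between node 0 and node 3
```

On large networks this nested dictionary grows with the square of the number of nodes, so it may be worth storing it on disk once and reusing it.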

Each folder in this archive lets you replicate the figures and tables of the results section. The folder name indicates which figure or table it replicates.

To run the MAPF node vector distance (and reproduce its results) you will need the binary insolver_reLOC from here. For some experiments, you will also need the binary_network benchmark from here to generate LFR synthetic networks.

This is the code necessary to replicate the results of the paper “Generalized Euclidean Measure to Estimate Network Distances”.

The archive contains the library implementing, in Python, the generalized Euclidean approach, the Graph Fourier Transform, and the Earth Mover Distance.

Each folder allows the replication of a subsection of the experiments. Simply cd into the folders and run the scripts. Each script should generate (among other things) a csv file with the result of the experiment.

IMPORTANT NOTES:

– We cannot repackage the Anobii dataset from Section 5.3. The script in the corresponding folder will work if you obtain the following files from the original authors and place them in the sec53 folder:

– anobii-friendship.dat: a space-separated unweighted edgelist of the social relationships.

– anobii-bookeshelves-*.dat: a set of six files, each containing a space-separated list of which book is in which user’s bookshelf. The first column is the id of the user, the second column is the id of the book.

– anobii-books-*.dat: a set of six files, each containing a tab-separated list of metadata per book. The columns are, in order: id of the book, isbn of the book, title of the book, author of the book.

– The Python libraries we use assume Python 3 with the following additional packages: numpy, scipy, pandas, sklearn, networkx, and pyemd.
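
To illustrate the three Anobii file layouts described in the notes above, here is a hedged sketch that parses them with pandas. The rows are made-up placeholders held in memory (no real Anobii data), the column names are chosen for this example, and the sketch assumes the files have no header row, which the notes do not state. The scripts in the sec53 folder may read the files differently.

```python
import io

import pandas as pd

# Placeholder content mimicking the described layouts; not real Anobii data.
friendship_file = io.StringIO("1 2\n2 3\n")              # space-separated edgelist
bookshelf_file = io.StringIO("1 10\n2 10\n3 11\n")       # user id, book id
books_file = io.StringIO(
    "10\t0000000000\tExample Title A\tExample Author A\n"
    "11\t0000000001\tExample Title B\tExample Author B\n"
)                                                         # tab-separated metadata

friendships = pd.read_csv(
    friendship_file, sep=" ", header=None, names=["user_a", "user_b"]
)
bookshelves = pd.read_csv(
    bookshelf_file, sep=" ", header=None, names=["user_id", "book_id"]
)
books = pd.read_csv(
    books_file,
    sep="\t",
    header=None,
    names=["book_id", "isbn", "title", "author"],
    dtype={"isbn": str},  # keep leading zeros in ISBNs
)

# Attach book metadata to each bookshelf entry.
shelves_with_titles = bookshelves.merge(books, on="book_id", how="left")
print(shelves_with_titles)
```

With the real files in place, the `io.StringIO` objects would be replaced by the corresponding file paths.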