## Node Vector Distance

This page serves as an entry point for multiple publications. If you only want the NVD library **click here**. You can use it by putting the Python file in your path and then use it as:

import network_distance as nd

ge_dist = nd.ge(src, trg, G)

Assuming that src and trg are Python dictionaries whose keys are node IDs and values are the value corresponding to the node; and G is a networkx object. More advanced usage instructions are provided to replicate the results of the following papers (click to scroll):

- Pearson Correlations on Complex Networks;
- The Node Vector Distance Problem in Complex Networks;
- Generalized Euclidean Measure to Estimate Network Distances.

**Download Pearson Correlations on Complex Networks material**

This is the code necessary to reproduce the results in the paper “Pearson Correlations on Complex Networks”. The folder “implementation” contains the code necessary to calculate (among other things) the network Pearson correlation. Simply put the Python script in your path and then use it like shown:

import network_distance as nd

netcorr = nd.correlation(v1, v2, G)

v1 and v2 are python dictionaries and G is a networkx object.

The library has the following dependencies: numpy, pandas, networkx, pyemd, scipy. For this specific paper you only need numpy and networkx, so you could delete all the code you don’t need. Be aware that the calculation of shortest path lengths relies on the multiprocessing library. It has been tested on Ubuntu 20.04, but not on Windows or Mac. It likely won’t work on those platforms unless patched.

Each of the “secXX” folders allows you to replicate the result from that specific subsection. All scripts can be just ran in the terminal as they are. They either print their output on standard output in the terminal or in a file. Specific instructions:

sec031:

- fig02.py outputs four files with two tab-separated columns: the network correlation (column 1) and the non-network Pearson (column 2) for the random vectors.
- fig03.py outputs a file with two tab-separated columns: the network correlation (column 1) and the non-network Pearson (column 2) for a Stochastic BlockModel.

sec032:

- fig04.py produces an LFR network and some node attributes for differently correlated vectors. Note that the LFR benchmark creation might fail, so you might need to try a few times before succeeding.
- fig05a.py outputs the table with network correlation values for a random sample of pairs of sources at a given distance in a GNM graph. Outputs a file with two tab-separated columns: the network distance between sources and the correlation value.
- fig05b.py is the same as fig05a.py, but for a Small World network rather than a GNM.
- tab1-2_*.py create the input for the regression between the path length between sources and the network correlation. Each script produces the correlation for its corresponding network topology. Run all these scripts before tab1-2.r.
- tab1-2.r runs the regression and calculates accuracy and mean absolute error to reproduce Tables 1 and 2 in the paper.

sec04:

- fig6.py should be run before tab3.r. It creates a file with four tab-separated columns: the two countries and their network and non-network correlations.
- tab3.r runs the regression between countries’ trade volume and geographical distance, and their trade correlations.
- tab4_part1.py performs the optimization of the product space via simulated annealing. WARNING: it takes soooooo long. That’s why I include already an optimized version (which is slightly more optimized than the version used in the paper).
- tab4_part2.py is an equivalent of fig6.py, but for tab4.r
- tab4.r runs the regression between countries’ trade volume and geographical distance and their trade correlations, on the Product Space at the 2 digit level, with and without optimization.

**Download Pearson Correlations on Complex Networks material**

This archive contains the code and the data to replicate the study of network node distance measures detailed in the paper “The Node Vector Distance Problem in Complex Networks“. We provide only the code and the data we have the right to share, pointing to the original sources when the original material could not be repackaged.

The implementation folder contains the library implementing in Python 3.6.5 all known node vector distance measures. The minimal code to use it, assuming you placed network_distance.py in your path or in the same folder of execution, is the following:

import network_distance as nd

ge_dist = nd.ge(src, trg, G)

Assuming that G is a networkx unweighted graph and that src and trg are two dictionaries, whose keys are the nodes the agent is occupying and whose values are the occupation intensity. The above code calculates the Laplacian Generalized Euclidean distance.

The library prerequisites are the following (the versions are the ones for which the library has been developed, newer or older version could still work): Numpy 1.17.2, Scipy 1.3.1, Pandas 0.25.1, Networkx 2.4, pyemd 0.5.1. Additionally, some experiments also require statsmodels 0.10.1.

WARNING: The calculation of shortest path lengths relies on the multiprocessing library. Moreover, MAPF relies on running an external binary with the subprocess library. Both operations are tested and work reliably on Ubuntu 18.04.1 LTS. I know Windows might have issues with the code as it stands. Thus you might need to calculate the shortest paths independently from the library.

Each folder in this archive allows you to replicate the figures and the tables of the result section. The specific figure and table to replicate is determined by the folder name.

To run the MAPF node vector distance (and reproduce its results) you will need the binary insolver_reLOC from here. For some experiments, you will also need the binary_network benchmark from here to generate LFR synthetic networks.

This is the code necessary to replicate the results of the paper “Generalized Euclidean Measure to Estimate Network Distances”

The archive contains the library implementing in python the generalized Euclidean approach, the Graph Fourier Transform, and the Earth Mover Distance.

Each folder allows the replication of a subsection of the experiments. Simply cd into the folders and run the scripts. Each script should generate (among other things) a csv file with the result of the experiment.

IMPORTANT NOTES:

– We cannot repackage the Anobii dataset from Section 5.3. The script in the corresponding folder would work if you manage to obtain the following files from the original authors and place them in the sec53 folder:

– anobii-friendship.dat: a space-separated unweighted edgelist of the social relationships.

– anobii-bookeshelves-*.dat: a set of six files, each containing a space-separated list of which book is in which user’s bookshelf. The first column is the id of the user, the second column is the id of the book.

– anobii-books-*.dat: a set of six files, each containing a tab-separated list of metadata per book. The columns are, in order: id of the book, isbn of the book, title of the book, author of the book.

– The python libraries we use assume you are using Python 3 with the following additional packages: numpy, scipy, pandas, sklearn, networkx, and pyemd.