Supermarket Data

With Diego Pennacchioli, Salvatore Rinzivillo, Dino Pedreschi and Fosca Giannotti

This is the dataset released as a companion to the paper “Explaining the Product Range Effect in Purchase Data”, presented at the BigData 2013 conference.

To download the data, click here.

The ZIP archive contains three files:

  • supermarket_distances: three columns. The first column is the customer id, the second is the shop id and the third is the distance between the customer’s house and the shop location. The distance is calculated in meters as a straight line, so it does not take the road network into account.
  • supermarket_prices: two columns. The first column is the product id and the second is its unit price. The price is in euros and is calculated as the average unit price over the time span of the dataset.
  • supermarket_purchases: four columns. The first column is the customer id, the second is the product id, the third is the shop id and the fourth is the total quantity of that product the customer bought in that particular shop. The data covers purchases from January 2007 to December 2011.
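Since supermarket_purchases has one row per customer, product and shop, a natural first step is to sum over shops to get one total per customer and product. The sketch below does that. It assumes that each line holds four whitespace-separated fields with no header row; the actual delimiter in the ZIP may differ, and the function name load_purchases is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def load_purchases(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Sum the quantity each customer bought of each product across all shops.

    Assumption: every line is "customer product shop quantity", separated by
    whitespace (spaces or tabs), with no header row.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        customer, product, _shop, quantity = fields
        totals[(customer, product)] += float(quantity)
    return dict(totals)


# Usage with the extracted file (the path is an assumption):
# with open("supermarket_purchases") as f:
#     purchases = load_purchases(f)
```

Because a file object yields lines, the same function works on an open file or on any list of strings.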

To recreate the analysis, reconstruct the customer-product purchase matrix from supermarket_purchases, then compute the eigenvectors associated with its rows and columns to obtain customer and product sophistication.
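The sketch below shows one plausible way to carry out this step. It is modelled on the Economic Complexity Index (the Hidalgo and Hausmann method of reflections), and the paper may differ in its details, for example in how the matrix is binarized. Several choices here are assumptions: a customer-product cell is 1 if the customer bought the product at all; customer sophistication is the eigenvector for the second-largest eigenvalue of D_c^-1 M D_p^-1 M^T; and the sign is oriented so that scores grow with the number of distinct products a customer buys. The input is a dict mapping (customer, product) to a total quantity. The function name sophistication is hypothetical.

```python
import numpy as np


def sophistication(purchases):
    """Return (customer_scores, product_scores) as dicts of z-scored values."""
    # Keep only positive quantities so every row and column has degree >= 1.
    pairs = [(c, p) for (c, p), qty in purchases.items() if qty > 0]
    customers = sorted({c for c, _ in pairs})
    products = sorted({p for _, p in pairs})
    row = {c: i for i, c in enumerate(customers)}
    col = {p: j for j, p in enumerate(products)}

    # Binary customer x product matrix (assumption: 1 = bought at least once).
    m = np.zeros((len(customers), len(products)))
    for c, p in pairs:
        m[row[c], col[p]] = 1.0

    k_c = m.sum(axis=1)  # distinct products per customer
    k_p = m.sum(axis=0)  # distinct customers per product

    # D_c^-1 M D_p^-1 M^T is row-stochastic. It is similar to the symmetric
    # S = D_c^-1/2 M D_p^-1 M^T D_c^-1/2, so we can use eigh on S.
    inv_sqrt_kc = 1.0 / np.sqrt(k_c)
    s = (inv_sqrt_kc[:, None] * m / k_p) @ (m.T * inv_sqrt_kc[None, :])
    _, vecs = np.linalg.eigh(s)  # eigenvalues in ascending order

    # The largest eigenvalue (1) gives a constant vector; the second largest
    # carries the ranking. Map back from S to the original matrix.
    cust = inv_sqrt_kc * vecs[:, -2]

    # Eigenvector signs are arbitrary: orient scores to grow with k_c.
    if np.dot(cust - cust.mean(), k_c - k_c.mean()) < 0:
        cust = -cust

    # Product score: average score of the customers who buy the product.
    # This is an eigenvector of D_p^-1 M^T D_c^-1 M with the same eigenvalue.
    prod = (m.T @ cust) / k_p

    def zscore(x):
        return (x - x.mean()) / x.std()

    return (dict(zip(customers, zscore(cust))),
            dict(zip(products, zscore(prod))))


# Toy example: A buys everything, B two products, C only the most common one.
toy = {("A", "p1"): 1, ("A", "p2"): 1, ("A", "p3"): 1,
       ("B", "p1"): 1, ("B", "p2"): 1, ("C", "p1"): 1}
customer_scores, product_scores = sophistication(toy)
```

The sketch builds dense matrices and an n-by-n customer matrix. On the full dataset this would likely need a sparse representation and a sparse eigensolver instead.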

If you use this data, please cite: Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F., Explaining the Product Range Effect in Purchase Data. In BigData, 2013.