Tradesy data

Ruining He, UCSD

Julian McAuley, UCSD

Description

This dataset contains images and purchase data for products on Tradesy.com, including around 410 thousand actions by 20 thousand users.

This dataset includes images, image features, and purchases.

Files

Purchases

tradesy.json.gz is a gzip-compressed JSON file of user data. Each user has four lists, "selling", "sold", "bought", and "want", each of which is a list of item IDs.
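As a rough illustration of working with the four lists, the sketch below counts the actions of a single user. It assumes that a user's lists are available as a mapping from list name to a list of item IDs; the helper name `count_actions` and the sample record are hypothetical and were made up for this example, so check the real file layout before relying on it.

```python
# The four list names given in the description above.
LIST_NAMES = ("selling", "sold", "bought", "want")

def count_actions(user_lists):
    """Return the number of item IDs in each of the four lists for one user."""
    # Assumption: a missing list is treated as empty.
    return {name: len(user_lists.get(name, [])) for name in LIST_NAMES}

# Hypothetical record for illustration only; not taken from the dataset.
example_user = {
    "selling": ["12", "57"],
    "sold": [],
    "bought": ["903"],
    "want": ["57", "88", "91"],
}

counts = count_actions(example_user)
total = sum(counts.values())
```

Summing such totals over all users should give a figure comparable to the action count quoted in the description.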

raw purchase data (4.3 MB) - all user lists

URLs

tradesy_item_urls.json.gz maps item IDs to their image URL and the product page for the item.

item and image URLs (21 MB)

Visual Features

We extracted visual features from each product image using a deep CNN (see citation below). Image features are stored in a binary format: each record consists of 10 characters (the product ID) followed by 4096 floats of 4 bytes each, and this layout is repeated for every product. See the code below for further help reading the data.

Note that item IDs are padded at the end (with SPACES) to be 10 characters long, so you need to strip the padding when loading the data.
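Because every record has the same size, the layout described above allows counting records and jumping straight to one of them. The following is a minimal sketch under stated assumptions: floats are 4 bytes in the machine's native byte order (as the `'f'` format in the reading script suggests), IDs are ASCII, and the names `count_records` and `read_record` are hypothetical helpers, not part of the dataset's tooling.

```python
import os
import struct

ID_BYTES = 10                               # space-padded item ID
NUM_FLOATS = 4096                           # feature dimensions per item
RECORD_BYTES = ID_BYTES + 4 * NUM_FLOATS    # 10 + 16384 = 16394 bytes

def count_records(path):
    """Number of complete records in the file, derived from its size."""
    return os.path.getsize(path) // RECORD_BYTES

def read_record(path, index):
    """Read the record at a zero-based position without scanning earlier ones."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_BYTES)
        raw = f.read(RECORD_BYTES)
    if len(raw) < RECORD_BYTES:
        raise IndexError("record index out of range")
    # Assumption: IDs are ASCII; trailing spaces are padding only.
    item_id = raw[:ID_BYTES].decode("ascii").rstrip(" ")
    # "=" selects native byte order with standard 4-byte floats.
    feature = struct.unpack("=%df" % NUM_FLOATS, raw[ID_BYTES:])
    return item_id, feature
```

Random access like this avoids holding the whole feature file in memory when only a few items are needed.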

visual features (5.5 GB) - visual features for all products

Citation

Please cite the following if you use the data in any way:

VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback
R. He, J. McAuley
AAAI, 2016

Code

Reading the data

The data can be treated as Python dictionary objects. A simple script to read the user data is as follows:

import gzip
userData = eval(gzip.open(path, 'r').read())
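Since `eval` executes whatever expression the file contains, a more cautious variant is to parse the contents with `ast.literal_eval`, which accepts only Python literals. This is a sketch of an alternative, not the authors' script: it assumes the whole file is a single Python literal (as the one-line `eval` call implies) and that the text is UTF-8 encoded; `load_user_data` is a hypothetical name.

```python
import ast
import gzip

def load_user_data(path):
    """Parse a gzip-compressed file that holds one Python literal."""
    # "rt" decodes the decompressed bytes to text before parsing.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return ast.literal_eval(f.read())
```

If the file turns out to contain JSON-only tokens such as `true` or `null`, both `eval` and `ast.literal_eval` will fail, and `json.loads` would be the tool to try instead.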

Read image features

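import struct

def readImageFeatures(path):
    # Each record is a 10-byte item ID followed by 4096 4-byte floats.
    with open(path, 'rb') as f:
        while True:
            itemId = f.read(10)
            if not itemId:
                break  # end of file
            # Strip the space padding; IDs are assumed to be ASCII.
            itemId = itemId.decode('ascii').strip()
            feature = list(struct.unpack('4096f', f.read(4 * 4096)))
            yield itemId, feature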