Ruining He, UCSD
Julian McAuley, UCSD
This dataset contains images and products purchased on Tradesy.com, including around 410 thousand actions by 20 thousand users.
This dataset includes images, image features, and purchases.
tradesy.json.gz is a gzip-compressed JSON file of user data. Each user has four lists ("selling", "sold", "bought", and "want"), each of which is a list of item IDs.
raw purchase data (4.3 MB) - all user lists
tradesy_item_urls.json.gz maps item IDs to their image url and the product page for the item.
item and image URLs (21 MB)
We extracted visual features from each product image using a deep CNN (see citation below). Image features are stored in a binary format: each record consists of 10 characters (the product ID) followed by 4096 floats, and this record is repeated for every product. See below for further help reading the data.
Note that item IDs are padded at the end with spaces to be 10 characters long, so you need to strip the trailing spaces when loading the data.
visual features (5.5 GB) - visual features for all products
Please cite the following if you use the data in any way:
VBPR: Visual Bayesian personalized ranking from implicit feedback
R. He, J. McAuley
Reading the data
The data can be treated as Python dictionary objects. A simple script to read user data is as follows:
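The sketch below is one possible reader, not necessarily the authors' original script. It assumes the file holds one record per line, and that each line is either strict JSON or a Python dictionary literal; the function name parse_records is hypothetical.

```python
import ast
import gzip
import json


def parse_records(path):
    """Yield one dictionary per non-empty line of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                # First try strict JSON (double-quoted keys and strings).
                yield json.loads(line)
            except json.JSONDecodeError:
                # Assumption: the line is a Python dict literal instead.
                # literal_eval accepts only literals, so it is safer than eval.
                yield ast.literal_eval(line)
```

For example, `for user in parse_records("tradesy.json.gz"): print(user.keys())` would show which fields each user record carries without assuming their names in advance.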
Read image features
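The binary layout described above (a 10-character, space-padded item ID followed by 4096 floats, repeated for every product) can be read with a small generator. This is a sketch under stated assumptions rather than the authors' own code: it assumes 4-byte floats in the machine's native byte order, and the name read_image_features is hypothetical.

```python
import array

FEATURE_DIM = 4096  # floats per product, as stated in the format description
ID_LENGTH = 10      # bytes reserved for the space-padded item ID


def read_image_features(path):
    """Yield (item_id, features) pairs from the binary feature file."""
    with open(path, "rb") as handle:
        while True:
            raw_id = handle.read(ID_LENGTH)
            if len(raw_id) < ID_LENGTH:
                break  # end of file reached
            values = array.array("f")
            # Assumption: 4-byte floats in native byte order.
            values.fromfile(handle, FEATURE_DIM)
            # Strip the trailing spaces used to pad the ID to 10 characters.
            yield raw_id.decode("ascii").rstrip(" "), values.tolist()
```

Because the function is a generator, the 5.5 GB file is streamed one product at a time and never has to fit in memory at once.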