Amazon question/answer data
Julian McAuley, UCSD
Description
This dataset contains Question and Answer data from Amazon, totaling around 1.4 million answered questions.
This dataset can be combined with the Amazon product review data by matching the ASINs in the Q/A dataset with the ASINs in the review data. The review data also includes product metadata (product titles, etc.).
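A minimal sketch of that join follows. It assumes both datasets have already been loaded as lists of Python dictionaries, and that each metadata record stores its ASIN under 'asin' and its product title under 'title'. The metadata key names are an assumption and are not documented on this page. The function name attach_titles is hypothetical.

```python
from typing import Dict, Iterable, List


def attach_titles(questions: Iterable[dict], metadata: Iterable[dict]) -> List[dict]:
    """Return copies of the Q/A records with a 'title' field taken from metadata.

    Assumes (hypothetically) that metadata records use the keys 'asin' and 'title'.
    Questions whose ASIN has no metadata entry get a title of None.
    """
    # Index the metadata by ASIN so that each lookup is a single dict access.
    title_by_asin: Dict[str, str] = {
        m["asin"]: m.get("title") for m in metadata if "asin" in m
    }
    joined = []
    for q in questions:
        record = dict(q)  # copy so the input records are not mutated
        record["title"] = title_by_asin.get(q["asin"])
        joined.append(record)
    return joined
```

Indexing the metadata in a dictionary keeps the join linear in the size of both inputs. Once both datasets are in pandas, merging on the asin column is an equivalent alternative.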
Files
Each question (and its answer) is stored as a record with the following fields:
- asin - ID of the product, e.g. B000050B6Z
- questionType - type of question, either 'yes/no' or 'open-ended'
- answerType - type of answer: 'Y', 'N', or '?' (if the polarity of the answer could not be predicted). Only present for yes/no questions.
- answerTime - raw answer timestamp
- unixTime - answer timestamp converted to unix time
- question - question text
- answer - answer text
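The fields above can be interpreted with a few lines of Python. This is a minimal sketch that assumes a record has already been parsed into a dictionary with the keys listed above, and that unixTime holds seconds since the Unix epoch. The helper name describe_record is hypothetical.

```python
from datetime import datetime, timezone


def describe_record(record: dict) -> str:
    """Summarize one Q/A record using the documented fields.

    answerType is only present for yes/no questions, so it is read with .get().
    """
    # unixTime is assumed to be seconds since the Unix epoch, interpreted as UTC.
    answered = datetime.fromtimestamp(record["unixTime"], tz=timezone.utc)
    summary = f"{record['asin']} [{record['questionType']}] answered {answered:%Y-%m-%d}"
    if record["questionType"] == "yes/no":
        # 'Y', 'N', or '?' when the polarity of the answer could not be predicted.
        labels = {"Y": "yes", "N": "no", "?": "unknown"}
        polarity = labels.get(record.get("answerType"), "missing")
        summary += f", polarity: {polarity}"
    return summary
```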
Per-category files
Below are files for individual product categories, which have already had duplicate item reviews removed.
- Appliances (9,011 questions)
- Arts Crafts and Sewing (21,262 questions)
- Automotive (89,923 questions)
- Baby (28,933 questions)
- Beauty (42,422 questions)
- Cell Phones and Accessories (85,865 questions)
- Clothing Shoes and Jewelry (22,068 questions)
- Electronics (314,263 questions)
- Grocery and Gourmet Food (19,538 questions)
- Health and Personal Care (80,496 questions)
- Home and Kitchen (184,439 questions)
- Industrial and Scientific (12,136 questions)
- Musical Instruments (23,322 questions)
- Office Products (43,608 questions)
- Patio Lawn and Garden (59,595 questions)
- Pet Supplies (36,607 questions)
- Software (10,636 questions)
- Sports and Outdoors (146,891 questions)
- Tools and Home Improvement (101,088 questions)
- Toys and Games (51,486 questions)
- Video Games (13,307 questions)
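As a quick consistency check against the description above, the per-category question counts sum to roughly 1.4 million. The snippet below only copies the counts from this list. The name PER_CATEGORY_QUESTIONS is illustrative.

```python
# Question counts copied from the per-category list above.
PER_CATEGORY_QUESTIONS = {
    "Appliances": 9_011,
    "Arts Crafts and Sewing": 21_262,
    "Automotive": 89_923,
    "Baby": 28_933,
    "Beauty": 42_422,
    "Cell Phones and Accessories": 85_865,
    "Clothing Shoes and Jewelry": 22_068,
    "Electronics": 314_263,
    "Grocery and Gourmet Food": 19_538,
    "Health and Personal Care": 80_496,
    "Home and Kitchen": 184_439,
    "Industrial and Scientific": 12_136,
    "Musical Instruments": 23_322,
    "Office Products": 43_608,
    "Patio Lawn and Garden": 59_595,
    "Pet Supplies": 36_607,
    "Software": 10_636,
    "Sports and Outdoors": 146_891,
    "Tools and Home Improvement": 101_088,
    "Toys and Games": 51_486,
    "Video Games": 13_307,
}

total = sum(PER_CATEGORY_QUESTIONS.values())
print(f"{len(PER_CATEGORY_QUESTIONS)} categories, {total:,} questions")
# prints: 21 categories, 1,396,896 questions
```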
Questions with multiple answers
Below are updated Q/A files as used in our ICDM paper. Importantly, these files include multiple answers to each question, allowing the ambiguity of answers to be studied.
- Automotive (59,415 questions, 233,784 answers)
- Baby (21,996 questions, 82,034 answers)
- Beauty (32,936 questions, 125,652 answers)
- Cell Phones and Accessories (60,761 questions, 237,220 answers)
- Clothing Shoes and Jewelry (17,233 questions, 66,709 answers)
- Electronics (231,449 questions, 867,921 answers)
- Grocery and Gourmet Food (15,373 questions, 62,243 answers)
- Health and Personal Care (63,962 questions, 255,209 answers)
- Home and Kitchen (148,728 questions, 611,335 answers)
- Musical Instruments (17,971 questions, 67,326 answers)
- Office Products (33,984 questions, 130,088 answers)
- Patio Lawn and Garden (47,574 questions, 193,780 answers)
- Pet Supplies (30,848 questions, 133,274 answers)
- Sports and Outdoors (114,496 questions, 444,900 answers)
- Tools and Home Improvement (81,609 questions, 327,597 answers)
- Toys and Games (39,549 questions, 151,779 answers)
- Video Games (7,744 questions, 28,893 answers)
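Each question in these files can carry several answers, so a simple summary is the average number of answers per question in each category. The counts below are copied from the list above. The computation is plain arithmetic and says nothing about how answers are distributed across individual questions. The name COUNTS is illustrative.

```python
# (questions, answers) copied from the multiple-answer list above.
COUNTS = {
    "Automotive": (59_415, 233_784),
    "Baby": (21_996, 82_034),
    "Beauty": (32_936, 125_652),
    "Cell Phones and Accessories": (60_761, 237_220),
    "Clothing Shoes and Jewelry": (17_233, 66_709),
    "Electronics": (231_449, 867_921),
    "Grocery and Gourmet Food": (15_373, 62_243),
    "Health and Personal Care": (63_962, 255_209),
    "Home and Kitchen": (148_728, 611_335),
    "Musical Instruments": (17_971, 67_326),
    "Office Products": (33_984, 130_088),
    "Patio Lawn and Garden": (47_574, 193_780),
    "Pet Supplies": (30_848, 133_274),
    "Sports and Outdoors": (114_496, 444_900),
    "Tools and Home Improvement": (81_609, 327_597),
    "Toys and Games": (39_549, 151_779),
    "Video Games": (7_744, 28_893),
}

# Average answers per question within each category.
for category, (questions, answers) in COUNTS.items():
    print(f"{category}: {answers / questions:.2f} answers per question")

# Pooled average across all listed categories.
total_q = sum(q for q, _ in COUNTS.values())
total_a = sum(a for _, a in COUNTS.values())
print(f"Overall: {total_a / total_q:.2f} answers per question")
```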
Citation
Please cite the following if you use the data in any way:
Modeling ambiguity, subjectivity, and diverging viewpoints in opinion question answering systems
Mengting Wan, Julian McAuley
International Conference on Data Mining (ICDM), 2016
Addressing complex and subjective product-related queries with customer reviews
Julian McAuley, Alex Yang
World Wide Web (WWW), 2016
Code
Reading the data
Each record can be treated as a Python dictionary, so any of the above files can be read with a short Python script.
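Below is a minimal sketch of such a reader. It assumes the files are gzip-compressed and UTF-8 encoded, with one record per line written as a Python dictionary literal. That layout is an assumption, consistent with the note that the data can be read with Python's 'eval'. The sketch uses ast.literal_eval, which accepts only literals, instead of eval, so that reading a file cannot execute arbitrary code. The file name in the usage comment is hypothetical.

```python
import ast
import gzip
from typing import Iterator


def parse(path: str) -> Iterator[dict]:
    """Yield one dictionary per line of a gzip-compressed Q/A file.

    Assumes each line is a Python dict literal; ast.literal_eval is used
    instead of eval so that only literals are accepted.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield ast.literal_eval(line)


# Usage (hypothetical file name):
# for record in parse("qa_Video_Games.json.gz"):
#     print(record["question"], "->", record["answer"])
```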
Convert to 'strict' JSON
The above data can be read with Python's 'eval', but it is not strict JSON. To use a language other than Python, first convert the data to strict JSON.
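A minimal sketch of that conversion follows. It assumes gzip-compressed, UTF-8 input with one Python dict literal per line. Each line is parsed with ast.literal_eval and re-serialized with json.dumps, so the output holds one strict JSON object per line (JSON Lines). The function name to_strict_json and the paths are hypothetical.

```python
import ast
import gzip
import json


def to_strict_json(in_path: str, out_path: str) -> int:
    """Convert a gzip-compressed file of Python dict literals to JSON Lines.

    Returns the number of records written.
    """
    count = 0
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            # Python literals (single quotes, True/None) become strict JSON here.
            dst.write(json.dumps(ast.literal_eval(line)) + "\n")
            count += 1
    return count


# Usage (hypothetical paths):
# to_strict_json("qa_Video_Games.json.gz", "qa_Video_Games.strict.json")
```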
Pandas data frame
The data can also be read into a pandas data frame.
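Here is a minimal sketch that assumes pandas is installed and that the files are gzip-compressed and UTF-8 encoded, with one Python dict literal per line. Records are parsed with ast.literal_eval and passed to the pandas.DataFrame constructor as a list of dictionaries. The helper name load_frame is hypothetical.

```python
import ast
import gzip

import pandas as pd


def load_frame(path: str) -> pd.DataFrame:
    """Read a gzip-compressed Q/A file into a DataFrame, one row per record."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        records = [ast.literal_eval(line) for line in f if line.strip()]
    # Columns are the union of keys across records; keys missing from a
    # record (e.g. answerType on open-ended questions) become NaN.
    return pd.DataFrame(records)


# Usage (hypothetical file name):
# df = load_frame("qa_Video_Games.json.gz")
# print(df["questionType"].value_counts())
```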