Dataset Overview
Complete statistical portrait of the Home & Kitchen review data, powered by Amazon Reviews'23
The Home & Kitchen category is the second-largest in the Amazon Reviews'23 dataset, containing 67,409,944 reviews spanning roughly 25 years (1998–2023, the range covered by the yearly table below). This page presents a comprehensive statistical portrait of the data powering this site.
Data Source: Amazon Reviews'23 by McAuley Lab, UC San Diego.
Raw file: Home_and_Kitchen.jsonl
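For readers who want to reproduce these figures, here is a minimal sketch of scanning the raw file. It assumes the file is in JSON Lines format (one JSON object per line, as the `.jsonl` extension suggests). The helper names `iter_reviews` and `count_reviews` and the local path in the usage comment are hypothetical, not part of the dataset's tooling.

```python
import json
from typing import Iterable, Iterator


def iter_reviews(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one review dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)


def count_reviews(lines: Iterable[str]) -> int:
    """Count reviews without holding the whole file in memory."""
    return sum(1 for _ in iter_reviews(lines))


# Usage with the real file (the path is an assumption):
# with open("Home_and_Kitchen.jsonl", encoding="utf-8") as fh:
#     print(count_reviews(fh))
```

Streaming line by line keeps memory use flat, which matters for a file with tens of millions of rows.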
Rating Distribution
Figure: Distribution of star ratings across all Home & Kitchen reviews.
Amazon reviews skew heavily positive — this is a well-known platform bias.
| Rating | Count | Percentage |
|---|---|---|
| 5 ★ | 44,701,824 | 66.3% |
| 4 ★ | 7,536,382 | 11.2% |
| 3 ★ | 4,600,479 | 6.8% |
| 2 ★ | 3,410,803 | 5.1% |
| 1 ★ | 7,160,456 | 10.6% |
Average Rating: 4.18 / 5.00
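The average and the percentage column can be re-derived from the counts in the table alone. The sketch below does that arithmetic; the variable names are illustrative.

```python
# Star rating -> review count, copied from the table above.
rating_counts = {
    5: 44_701_824,
    4: 7_536_382,
    3: 4_600_479,
    2: 3_410_803,
    1: 7_160_456,
}

total = sum(rating_counts.values())

# Weighted mean: each star value weighted by how many reviews gave it.
mean_rating = sum(stars * n for stars, n in rating_counts.items()) / total

# Share of each star level, in percent.
shares = {stars: 100 * n / total for stars, n in rating_counts.items()}

print(total)                  # 67409944
print(round(mean_rating, 2))  # 4.18
print(round(shares[5], 1))    # 66.3
```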
Why This Matters
About 77.5% of reviews are 4–5 stars (66.3% + 11.2%). This means:
- Raw star ratings are a weak signal for product comparison, since most ratings sit at the top of the scale
- Negative reviews (1–2 stars) are disproportionately valuable for identifying product defects
- Review text is more informative than ratings for making purchase decisions
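One way to act on the second point is to pull out the 1–2 star reviews and group their text by product. This is a sketch under stated assumptions, not the site's actual pipeline: the field names `rating`, `parent_asin`, and `text` are assumed to match the raw records, and `low_star_texts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def low_star_texts(reviews: Iterable[dict],
                   max_stars: float = 2.0) -> Dict[str, List[str]]:
    """Group the text of low-rated reviews by product identifier."""
    by_product: Dict[str, List[str]] = defaultdict(list)
    for review in reviews:
        rating = review.get("rating")
        # Skip records with no rating instead of treating them as negative.
        if rating is not None and rating <= max_stars:
            by_product[review["parent_asin"]].append(review.get("text", ""))
    return dict(by_product)
```

The grouped text can then be read or searched per product for recurring complaints.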
Review Authenticity
| Metric | Count | Percentage |
|---|---|---|
| Verified Purchases | 62,921,814 | 93.3% |
| Unverified | 4,488,130 | 6.7% |
About 6.7% of reviews come from unverified sources (Vine reviewers, review swaps, direct reviews without purchase). The verified subset is generally more trustworthy.
Review Helpfulness
Amazon users can upvote reviews as “helpful.” This signal helps surface the most informative content.
Figure: Log-scale distribution of helpful votes. Note the extreme skew — most reviews receive 0 votes.
| Helpful Votes | Review Count | Percentage |
|---|---|---|
| 0 votes | 51,843,462 | 76.9% |
| ≥ 1 vote | 15,566,473 | 23.1% |
| ≥ 5 votes | 2,446,886 | 3.6% |
| ≥ 10 votes | 1,085,608 | 1.6% |
| ≥ 50 votes | 144,113 | 0.2% |
Average helpful votes per review: 1.02
Most helpful review: 34,619 votes
“What can I say about the 571B Banana Slicer that hasn’t already been said about the wheel, penicillin, or the iPhone…. this is one of the greatest inventions of all time. My husband and I would argu…”
Insight
Only 23.1% of reviews receive any helpful vote. This means:
- Reviews with 5+ helpful votes are rare signals of genuine insight
- 144,113 reviews (0.2%) carry 50+ votes; these are gold for content curation
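The threshold rows in the table are cumulative: a review with 60 votes is counted under ≥ 1, ≥ 5, ≥ 10, and ≥ 50. The sketch below shows how such a summary could be computed from a list of vote counts. The function name and output keys are hypothetical, and the per-review vote count is assumed to come from a field such as `helpful_vote`.

```python
from typing import Dict, Iterable, Sequence


def helpful_vote_summary(votes: Iterable[int],
                         thresholds: Sequence[int] = (1, 5, 10, 50)) -> Dict[str, float]:
    """Summarize helpful votes: zero count, cumulative thresholds, mean, max."""
    values = list(votes)
    if not values:
        raise ValueError("no reviews to summarize")
    summary: Dict[str, float] = {
        "total": len(values),
        "zero": sum(1 for v in values if v == 0),
        "mean": sum(values) / len(values),
        "max": max(values),
    }
    for t in thresholds:
        # Cumulative bucket: every review with at least t votes.
        summary[f"ge_{t}"] = sum(1 for v in values if v >= t)
    return summary
```

Because the buckets are nested, their percentages are not expected to sum to 100%. Only the "0 votes" and "≥ 1 vote" rows partition the data.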
Review Length Distribution
Figure: Review length at key percentiles. The dashed red line marks the average.
| Percentile | Characters |
|---|---|
| 25th | 47 |
| 50th (median) | 106 |
| 75th | 216 |
| 90th | 398 |
Average length: 179 characters
Half of all reviews are under 106 characters — roughly 1–2 sentences. Only 10% exceed 398 characters. Long, detailed reviews are scarce and valuable.
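As an illustration of how such percentiles can be obtained, here is a sketch using the nearest-rank definition. Two assumptions apply: the page does not state which percentile method it used, and "characters" is taken to mean the Python `len()` of the review text. Both function names are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def nearest_rank_percentile(sorted_values: Sequence[int], pct: int) -> int:
    """Return the nearest-rank percentile of an ascending sequence."""
    if not sorted_values:
        raise ValueError("empty input")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    n = len(sorted_values)
    rank = -(-pct * n // 100)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]


def length_summary(texts: Iterable[str]) -> Dict[int, int]:
    """Character-length percentiles for a collection of review texts."""
    lengths = sorted(len(t) for t in texts)
    return {p: nearest_rank_percentile(lengths, p) for p in (25, 50, 75, 90)}
```

For the full dataset, a histogram of lengths would likely be lighter on memory than sorting every value. The sorted list keeps the sketch short.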
User-Submitted Images
| Metric | Count | Percentage |
|---|---|---|
| Reviews with photos | 5,636,288 | 8.4% |
| Text-only reviews | 61,773,656 | 91.6% |
Only 8.4% of reviews include user photos, but photo reviews are often treated by shoppers as especially trustworthy.
Reviews Over Time
Figure: Review volume (bars) and average rating (line) per year.
| Year | Reviews | 5★ % | Verified % |
|---|---|---|---|
| 1998 | 2 | 100.0% | 0.0% |
| 1999 | 9 | 33.3% | 0.0% |
| 2000 | 1,001 | 64.0% | 13.3% |
| 2001 | 2,209 | 59.5% | 20.6% |
| 2002 | 3,695 | 55.8% | 24.1% |
| 2003 | 5,938 | 52.6% | 26.2% |
| 2004 | 7,353 | 49.8% | 25.5% |
| 2005 | 15,419 | 49.1% | 30.2% |
| 2006 | 24,972 | 52.8% | 36.7% |
| 2007 | 58,322 | 57.8% | 45.9% |
| 2008 | 68,207 | 56.4% | 48.1% |
| 2009 | 92,012 | 54.9% | 60.1% |
| 2010 | 159,296 | 55.2% | 78.9% |
| 2011 | 278,672 | 55.8% | 81.5% |
| 2012 | 500,445 | 58.4% | 85.6% |
| 2013 | 1,328,715 | 61.0% | 92.8% |
| 2014 | 2,222,935 | 63.6% | 89.0% |
| 2015 | 3,446,051 | 65.5% | 94.1% |
| 2016 | 4,493,405 | 66.7% | 92.3% |
| 2017 | 5,059,065 | 66.6% | 93.2% |
| 2018 | 5,833,876 | 67.3% | 95.3% |
| 2019 | 8,229,519 | 71.2% | 96.3% |
| 2020 | 10,540,330 | 68.6% | 96.2% |
| 2021 | 11,436,430 | 65.7% | 95.6% |
| 2022 | 9,601,338 | 63.1% | 90.3% |
| 2023 | 4,000,728 | 64.1% | 86.8% |
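A per-year breakdown like the table above could be produced with a single pass over the records. The sketch below assumes each record has a `timestamp` in Unix milliseconds, a numeric `rating`, and a boolean `verified_purchase`, and that years are taken in UTC. These field names and the UTC choice are assumptions, and `yearly_stats` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable


def yearly_stats(reviews: Iterable[dict]) -> Dict[int, dict]:
    """Per-year review count, 5-star share, and verified share (percent)."""
    counts = defaultdict(lambda: {"reviews": 0, "five_star": 0, "verified": 0})
    for review in reviews:
        # Timestamp assumed to be milliseconds since the Unix epoch.
        year = datetime.fromtimestamp(review["timestamp"] / 1000,
                                      tz=timezone.utc).year
        row = counts[year]
        row["reviews"] += 1
        row["five_star"] += review["rating"] == 5
        row["verified"] += bool(review["verified_purchase"])
    return {
        year: {
            "reviews": row["reviews"],
            "five_star_pct": round(100 * row["five_star"] / row["reviews"], 1),
            "verified_pct": round(100 * row["verified"] / row["reviews"], 1),
        }
        for year, row in sorted(counts.items())
    }
```

Percentages for the earliest years rest on very few reviews (2 in 1998, 9 in 1999), so those values are better read as anecdotes than as trends.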
Statistics computed from the raw dataset. Based on a full scan of all available reviews.