Training and testing data set of fine food reviews.
Usage
attach_small_fine_foods(envir = parent.frame(), quiet = FALSE, ...)
Arguments
- envir
Environment to load data sets into. Defaults to parent.frame().
- quiet
Logical; should the function announce which data sets are loaded?
- ...
Arguments passed to pins::pin_read().
Details
These data are from Amazon, which describes them as follows: "This dataset consists of reviews of fine foods from amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review."
A subset of the data is included here, split into a training set and a test set. For the training set, 10 products were sampled and all of their individual reviews were retained. Since the reviews within a product are correlated, we recommend resampling the training data using a leave-one-product-out approach, so that all reviews of a given product are held out together. For the test set, 500 products that were not included in the training set were sampled, and a single review was selected at random for each.
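The leave-one-product-out idea can be sketched in base R as follows. This is a minimal illustration, not the package's own method: `toy_training` is a made-up stand-in that only borrows the column names of `training_data`, and `splits` is a hypothetical name for the resulting list.

```r
# Toy stand-in for training_data: same column names, made-up rows.
toy_training <- data.frame(
  product = c("P1", "P1", "P2", "P2", "P2", "P3"),
  review  = c("loved it", "too sweet", "great value", "stale", "fine", "perfect"),
  score   = factor(
    c("great", "other", "great", "other", "other", "great"),
    levels = c("great", "other")
  ),
  stringsAsFactors = FALSE
)

# One resample per product: hold out all of that product's reviews
# for assessment and keep the reviews of every other product for fitting.
products <- unique(toy_training$product)
splits <- lapply(products, function(p) {
  held_out <- toy_training$product == p
  list(
    product    = p,
    analysis   = toy_training[!held_out, ],
    assessment = toy_training[held_out, ]
  )
})

# Sanity check: no product appears on both sides of a split.
for (s in splits) {
  stopifnot(length(intersect(s$analysis$product, s$assessment$product)) == 0)
}

length(splits)
#> [1] 3
```

If the rsample package is available, `rsample::group_vfold_cv(training_data, group = product)` should produce comparable splits, since it defaults to one fold per group when `v` is not supplied.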
There is a column for the product, a column for the text of the review, and a factor column for a class variable. The outcome is whether the reviewer gave the product a 5-star rating or not.
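As an illustration of how such a two-class outcome relates to star ratings, the sketch below derives it from a hypothetical `stars` vector. The raw ratings are not among the three columns described here, so `stars` and the level order are assumptions; only the labels `great` and `other` are taken from the printed output.

```r
# Hypothetical star ratings (1 to 5) for four made-up reviews.
stars <- c(5, 3, 4, 5)

# Two-level outcome: "great" for a 5-star rating, "other" for anything else.
score <- factor(
  ifelse(stars == 5, "great", "other"),
  levels = c("great", "other")  # level order is an assumption
)

table(score)
```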
tibble print
attach_small_fine_foods()
#> The following data sets have been loaded:
#> `training_data`, `testing_data`
#> Silence this message by setting `quiet = TRUE`.
training_data
#> # A tibble: 4,000 x 3
#> product review score
#> <chr> <chr> <fct>
#> 1 B000J0LSBG "this stuff is not stuffing its not good at all save yo~ other
#> 2 B000EYLDYE "I absolutely LOVE this dried fruit. LOVE IT. Whenever I ~ great
#> 3 B0026LIO9A "GREAT DEAL, CONVENIENT TOO. Much cheaper than WalMart and~ great
#> 4 B00473P8SK "Great flavor, we go through a ton of this sauce! I discove~ great
#> 5 B001SAWTNM "This is excellent salsa/hot sauce, but you can get it for ~ great
#> 6 B000FAG90U "Again, this is the best dogfood out there. One suggestion~ great
#> 7 B006BXTCEK "The box I received was filled with teas, hot chocolates, a~ other
#> 8 B002GWH5OY "This is delicious coffee which compares favorably with muc~ great
#> 9 B003R0MFYY "Don't let these little tiny cans fool you. They pack a lo~ great
#> 10 B001EO5ZXI "One of the nicest, smoothest cup of chai I've made. Nice m~ great
#> # i 3,990 more rows
testing_data
#> # A tibble: 1,000 x 3
#> product review score
#> <chr> <chr> <fct>
#> 1 B005GXFP60 "These are the best tasting gummy fruits I have ever eaten.~ great
#> 2 B000G7V394 "I have been a consumer of Snyders hard sourdough pretzels ~ great
#> 3 B004WJAULO "This tastes so bad, I'm considering throwing it away. But~ other
#> 4 B003D4MBOS "This product is way too pricey to have so little chocolate~ other
#> 5 B0030Z95B2 "I bought this for my Mom as a gift to accompany her Dolce ~ great
#> 6 B000LRH4WE "This thing is 7 dollars in US?I know its exported from Cyp~ other
#> 7 B000Z91SZW "This tea tastes like hot cocoa. Very pleasant experience.~ other
#> 8 B00563VNEI "This product is great for a quick cup of coffee. If you us~ great
#> 9 B0085NFX2O "Grilled out brats, chicken, and burgers for the entire fam~ great
#> 10 B000LRH7XK "I ordered 4 cans of this product. The product is fine, bu~ other
#> # i 990 more rows
glimpse()
tibble::glimpse(training_data)
#> Rows: 4,000
#> Columns: 3
#> $ product <chr> "B000J0LSBG", "B000EYLDYE", "B0026LIO9A", "B00473P8SK", "B001S~
#> $ review <chr> "this stuff is not stuffing its not good at all save your ~
#> $ score <fct> other, great, great, great, great, great, other, great, great,~
tibble::glimpse(testing_data)
#> Rows: 1,000
#> Columns: 3
#> $ product <chr> "B005GXFP60", "B000G7V394", "B004WJAULO", "B003D4MBOS", "B0030~
#> $ review <chr> "These are the best tasting gummy fruits I have ever eaten. Ca~
#> $ score <fct> great, great, other, other, great, other, other, great, great,~