A data set containing information on a subset of taxi trips in the city of Chicago in 2022.
Arguments
- ...
Arguments passed to pins::pin_read().
Details
The source data are originally described on the linked City of Chicago data portal. The data exported here are a pre-processed subset motivated by the modeling problem of predicting whether a rider will tip or not.
- tip
Whether the rider left a tip. A factor with levels "yes" and "no".
- distance
The trip distance, in odometer miles.
- company
The taxi company, as a factor. Companies that occurred few times were binned as "other".
- local
Whether the trip ended in the same community area as it began, as a factor. See the source data for community area values.
- dow
The day of the week in which the trip began, as a factor.
- month
The month in which the trip began, as a factor.
- hour
The hour of the day in which the trip began, as a numeric.
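The binning described for the company column can be illustrated with a small base R sketch. The company labels and the cutoff `min_n` below are made up for illustration; the threshold actually used to prepare this data set is not stated here.

```r
# Toy company labels (hypothetical, not taken from the taxi data)
company <- c("A", "A", "A", "B", "B", "C")

# Count how often each company occurs
counts <- table(company)

# Hypothetical cutoff: companies seen fewer than `min_n` times are "rare"
min_n <- 2
rare <- names(counts)[counts < min_n]

# Replace rare companies with "other" and store the result as a factor
binned <- factor(ifelse(company %in% rare, "other", company))

table(binned)
```

In this toy input only "C" falls below the cutoff, so it is the only label folded into "other".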
Previous releases of this data (with version = "20230630T214846Z-643d0") included additional columns:
- id
A unique identifier for the trip, as a factor.
- duration
The trip duration, in seconds.
- fare
The cost of the trip fare, in USD.
- tolls
The cost of tolls for the trip, in USD.
- extras
The cost of extra charges for the trip, in USD.
- total_cost
The total cost of the trip, in USD. This is the sum of the previous three columns plus tip.
- payment_type
Type of payment for the trip. A factor with levels "Credit Card", "Dispute", "Mobile", "No Charge", "Prcard", and "Unknown".
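Since extra arguments are passed on to pins::pin_read(), which accepts a `version` argument, the earlier release can presumably be requested as sketched below. This assumes `data_taxi()` is provided by the modeldatatoo package and that the pinned version is still available; the call needs network access.

```r
# Assumption: data_taxi() comes from the modeldatatoo package
library(modeldatatoo)

# Request the earlier release by the version string quoted above
taxi_old <- data_taxi(version = "20230630T214846Z-643d0")

# The current release, for comparison
taxi_new <- data_taxi()

# Columns that appear only in the earlier release
extra_cols <- setdiff(names(taxi_old), names(taxi_new))
extra_cols
```

Comparing column names this way is a quick check of which of the columns listed above are present before relying on them in downstream code.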
tibble print
data_taxi()
#> # A tibble: 10,000 x 7
#> tip distance company local dow month hour
#> <fct> <dbl> <fct> <fct> <fct> <fct> <int>
#> 1 yes 17.2 Chicago Independents no Thu Feb 16
#> 2 yes 0.88 City Service yes Thu Mar 8
#> 3 yes 18.1 other no Mon Feb 18
#> 4 yes 20.7 Chicago Independents no Mon Apr 8
#> 5 yes 12.2 Chicago Independents no Sun Mar 21
#> 6 yes 0.94 Sun Taxi yes Sat Apr 23
#> 7 yes 17.5 Flash Cab no Fri Mar 12
#> 8 yes 17.7 other no Sun Jan 6
#> 9 yes 1.85 Taxicab Insurance Agency Llc no Fri Apr 12
#> 10 yes 1.47 City Service no Tue Mar 14
#> # i 9,990 more rows
glimpse()
tibble::glimpse(data_taxi())
#> Rows: 10,000
#> Columns: 7
#> $ tip <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, y~
#> $ distance <dbl> 17.19, 0.88, 18.11, 20.70, 12.23, 0.94, 17.47, 17.67, 1.85, 1~
#> $ company <fct> Chicago Independents, City Service, other, Chicago Independen~
#> $ local <fct> no, yes, no, no, no, yes, no, no, no, no, no, no, no, yes, no~
#> $ dow <fct> Thu, Thu, Mon, Mon, Sun, Sat, Fri, Sun, Fri, Tue, Tue, Sun, W~
#> $ month <fct> Feb, Mar, Feb, Apr, Mar, Apr, Mar, Jan, Apr, Mar, Mar, Apr, A~
#> $ hour <int> 16, 8, 18, 8, 21, 23, 12, 6, 12, 14, 18, 11, 12, 19, 17, 13, ~
Examples
# \donttest{
data_taxi()
#> # A tibble: 10,000 × 7
#> tip distance company local dow month hour
#> <fct> <dbl> <fct> <fct> <fct> <fct> <int>
#> 1 yes 17.2 Chicago Independents no Thu Feb 16
#> 2 yes 0.88 City Service yes Thu Mar 8
#> 3 yes 18.1 other no Mon Feb 18
#> 4 yes 20.7 Chicago Independents no Mon Apr 8
#> 5 yes 12.2 Chicago Independents no Sun Mar 21
#> 6 yes 0.94 Sun Taxi yes Sat Apr 23
#> 7 yes 17.5 Flash Cab no Fri Mar 12
#> 8 yes 17.7 other no Sun Jan 6
#> 9 yes 1.85 Taxicab Insurance Agency Llc no Fri Apr 12
#> 10 yes 1.47 City Service no Tue Mar 14
#> # ℹ 9,990 more rows
# }