A data set containing information on a subset of taxi trips in the city of Chicago in 2022.

Usage

data_taxi(...)

Arguments

...

Arguments passed to pins::pin_read().

Value

tibble

Details

The source data are described on the City of Chicago data portal. The data exported here are a pre-processed subset, motivated by the modeling problem of predicting whether a rider will tip.

tip

Whether the rider left a tip. A factor with levels "yes" and "no".

distance

The trip distance, in odometer miles.

company

The taxi company, as a factor. Companies that occurred only a few times were binned as "other".

local

Whether the trip ended in the same community area as it began. See the source data for community area values.

dow

The day of the week in which the trip began, as a factor.

month

The month in which the trip began, as a factor.

hour

The hour of the day in which the trip began, as an integer.
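To make the column definitions above concrete, here is a minimal base R sketch of how such columns could be derived from raw trip records. It is not the pre-processing actually used for this data set: the `raw` data frame, its column names (`pickup_area`, `dropoff_area`, `start_time`), the `min_trips` cutoff, the UTC time zone, and the factor level orders are all assumptions made for illustration.

```r
# Hypothetical raw trips; every name and value here is invented for illustration.
raw <- data.frame(
  company      = c("A Cab", "A Cab", "A Cab", "B Taxi", "C Ride"),
  pickup_area  = c(8, 32, 76, 8, 28),
  dropoff_area = c(8, 8, 32, 8, 6),
  start_time   = as.POSIXct(
    c("2022-01-03 06:15:00", "2022-02-17 16:40:00", "2022-03-11 12:05:00",
      "2022-04-02 23:30:00", "2022-04-24 21:10:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# company: bin companies seen fewer than `min_trips` times as "other".
# The real cutoff is not documented; 2 is an arbitrary choice.
min_trips <- 2
counts <- table(raw$company)
rare <- names(counts)[counts < min_trips]
company <- factor(ifelse(raw$company %in% rare, "other", raw$company))

# local: did the trip end in the community area where it began?
local <- factor(
  ifelse(raw$pickup_area == raw$dropoff_area, "yes", "no"),
  levels = c("yes", "no")
)

# dow, month, hour: taken from the trip start time.
# POSIXlt fields are used instead of format() so labels do not depend on locale.
lt <- as.POSIXlt(raw$start_time, tz = "UTC")
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
dow   <- factor(day_labels[lt$wday + 1], levels = day_labels)
month <- factor(month.abb[lt$mon + 1], levels = month.abb)
hour  <- as.integer(lt$hour)

taxi_like <- data.frame(company, local, dow, month, hour)
taxi_like
```

Using factors for `dow` and `month` keeps the labels categorical, while `hour` stays an integer, matching the column types shown in the printed tibble.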

Previous releases of this data (with version = "20230630T214846Z-643d0") included additional columns:

id

A unique identifier for the trip, as a factor.

duration

The trip duration, in seconds.

fare

The cost of the trip fare, in USD.

tolls

The cost of tolls for the trip, in USD.

extras

The cost of extra charges for the trip, in USD.

total_cost

The total cost of the trip, in USD. This is the sum of fare, tolls, and extras, plus the tip.

payment_type

Type of payment for the trip. A factor with levels "Credit Card", "Dispute", "Mobile", "No Charge", "Prcard", and "Unknown".
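Because `...` is forwarded to pins::pin_read(), which accepts a `version` argument, the earlier release can presumably be requested by passing the version string quoted above. The sketch below assumes that `data_taxi()` is provided by the modeldatatoo package and that a network connection is available; `extra_columns()` is a hypothetical helper written for this example, not part of any package.

```r
# Hypothetical helper: names of columns found in `previous` but not in `current`.
extra_columns <- function(current, previous) {
  setdiff(names(previous), names(current))
}

# Reading the pins needs the package and network access, so the calls are
# guarded and only run in an interactive session.
if (requireNamespace("modeldatatoo", quietly = TRUE) && interactive()) {
  taxi_current  <- modeldatatoo::data_taxi()
  # `version` is passed through `...` to pins::pin_read().
  taxi_previous <- modeldatatoo::data_taxi(version = "20230630T214846Z-643d0")

  # Columns that were dropped between the previous and the current release.
  extra_columns(taxi_current, taxi_previous)
}
```

If the version string is no longer available on the board, pins::pin_read() will raise an error rather than fall back to the latest release.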

tibble print

data_taxi()
#> # A tibble: 10,000 x 7
#>    tip   distance company                      local dow   month  hour
#>    <fct>    <dbl> <fct>                        <fct> <fct> <fct> <int>
#>  1 yes      17.2  Chicago Independents         no    Thu   Feb      16
#>  2 yes       0.88 City Service                 yes   Thu   Mar       8
#>  3 yes      18.1  other                        no    Mon   Feb      18
#>  4 yes      20.7  Chicago Independents         no    Mon   Apr       8
#>  5 yes      12.2  Chicago Independents         no    Sun   Mar      21
#>  6 yes       0.94 Sun Taxi                     yes   Sat   Apr      23
#>  7 yes      17.5  Flash Cab                    no    Fri   Mar      12
#>  8 yes      17.7  other                        no    Sun   Jan       6
#>  9 yes       1.85 Taxicab Insurance Agency Llc no    Fri   Apr      12
#> 10 yes       1.47 City Service                 no    Tue   Mar      14
#> # i 9,990 more rows

glimpse()

tibble::glimpse(data_taxi())
#> Rows: 10,000
#> Columns: 7
#> $ tip      <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, y~
#> $ distance <dbl> 17.19, 0.88, 18.11, 20.70, 12.23, 0.94, 17.47, 17.67, 1.85, 1~
#> $ company  <fct> Chicago Independents, City Service, other, Chicago Independen~
#> $ local    <fct> no, yes, no, no, no, yes, no, no, no, no, no, no, no, yes, no~
#> $ dow      <fct> Thu, Thu, Mon, Mon, Sun, Sat, Fri, Sun, Fri, Tue, Tue, Sun, W~
#> $ month    <fct> Feb, Mar, Feb, Apr, Mar, Apr, Mar, Jan, Apr, Mar, Mar, Apr, A~
#> $ hour     <int> 16, 8, 18, 8, 21, 23, 12, 6, 12, 14, 18, 11, 12, 19, 17, 13, ~

Examples

# \donttest{
data_taxi()
#> # A tibble: 10,000 × 7
#>    tip   distance company                      local dow   month  hour
#>    <fct>    <dbl> <fct>                        <fct> <fct> <fct> <int>
#>  1 yes      17.2  Chicago Independents         no    Thu   Feb      16
#>  2 yes       0.88 City Service                 yes   Thu   Mar       8
#>  3 yes      18.1  other                        no    Mon   Feb      18
#>  4 yes      20.7  Chicago Independents         no    Mon   Apr       8
#>  5 yes      12.2  Chicago Independents         no    Sun   Mar      21
#>  6 yes       0.94 Sun Taxi                     yes   Sat   Apr      23
#>  7 yes      17.5  Flash Cab                    no    Fri   Mar      12
#>  8 yes      17.7  other                        no    Sun   Jan       6
#>  9 yes       1.85 Taxicab Insurance Agency Llc no    Fri   Apr      12
#> 10 yes       1.47 City Service                 no    Tue   Mar      14
#> # ℹ 9,990 more rows
# }