Larsen and Clemmensen (2019) state: "This data set was published as the challenge at the Chimiometrie 2019 conference held in Montpellier and is available at the conference homepage. The data consist of 6915 training spectra and 600 test spectra measured at 550 (unknown) wavelengths. The target was the amount of soy oil (0-5.5%), lucerne (0-40%) and barley (0-52%) in a mixture."
The test set included a distribution shift caused by the use of a different instrument, and the competition was designed to measure how well models can be made robust to such a difference. However, since the test set outcomes are not available, only the training set is included here.
There are 6,915 rows and 553 columns. The columns whose names start with wvlgth_ are the spectral values; the numbers in the column names refer to the order of the measurements (as opposed to the wavelength). Fernández Pierna et al. (2020) suggest that the wavelengths range from 1300 nm to 2398 nm in 2 nm steps, which is consistent with the 550 spectral columns. The three outcome columns are "soy_oil", "lucerne", and "barley".
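Because the column names carry only the measurement order, it can help to attach approximate wavelengths before plotting or reporting. The sketch below assumes that the 550 spectral columns are named wvlgth_001 through wvlgth_550 and that they are evenly spaced from 1300 nm in 2 nm steps; both are inferences from the description above, not something the data set itself records. The helper wvlgth_to_nm() is a hypothetical name, not part of any package.

```r
# Map column names such as "wvlgth_001" to approximate wavelengths in nm.
# Assumption: evenly spaced points starting at 1300 nm with a 2 nm step.
wvlgth_to_nm <- function(col_names, start_nm = 1300, step_nm = 2) {
  # Strip the prefix and read the zero-padded position as an integer
  idx <- as.integer(sub("^wvlgth_", "", col_names))
  start_nm + step_nm * (idx - 1L)
}

# Assumed column names for the 550 spectral variables
spectral_names <- sprintf("wvlgth_%03d", 1:550)
nm <- wvlgth_to_nm(spectral_names)

# Under the stated assumptions the last column lands on 2398 nm
range(nm)
```

If the true spacing differs, only the start_nm and step_nm arguments need to change.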
Arguments
- ...: Arguments passed to pins::pin_read().
glimpse()
tibble::glimpse(data_chimiometrie_2019()[, 1:10])
#> Rows: 6,915
#> Columns: 10
#> $ soy_oil <dbl> 2.1, 2.1, 2.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,~
#> $ lucerne <dbl> 23.5712, 23.5712, 23.5712, 25.0000, 25.0000, 25.0000, 25.00~
#> $ barley <dbl> 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,~
#> $ wvlgth_001 <dbl> 0.2076995, 0.2064382, 0.2070081, 0.2057694, 0.2005429, 0.20~
#> $ wvlgth_002 <dbl> 0.2074427, 0.2062003, 0.2067785, 0.2055505, 0.2003232, 0.20~
#> $ wvlgth_003 <dbl> 0.2072212, 0.2059973, 0.2065901, 0.2053678, 0.2001469, 0.20~
#> $ wvlgth_004 <dbl> 0.2070317, 0.2058266, 0.2064396, 0.2052174, 0.2000069, 0.20~
#> $ wvlgth_005 <dbl> 0.2068830, 0.2056964, 0.2063288, 0.2051110, 0.1999092, 0.20~
#> $ wvlgth_006 <dbl> 0.2067773, 0.2056115, 0.2062618, 0.2050571, 0.1998616, 0.20~
#> $ wvlgth_007 <dbl> 0.2067083, 0.2055686, 0.2062386, 0.2050495, 0.1998592, 0.20~
References
J. Larsen and L. Clemmensen (2019) "Deep learning for Chemometric and non-translational data," arXiv.org, https://arxiv.org/abs/1910.00391.
J.A. Fernández Pierna, A. Laborde, L. Lakhal, M. Lesnoff, M. Martin, Y. Roggo, and P. Dardenne (2020) "The applicability of vibrational spectroscopy and multivariate analysis for the characterization of animal feed where the reference values do not follow a normal distribution: A new chemometric challenge posed at the 'Chimiométrie 2019' congress," Chemometrics and Intelligent Laboratory Systems, vol 202, p. 104026. doi:10.1016/j.chemolab.2020.104026
Examples
# \donttest{
data_chimiometrie_2019()
#> # A tibble: 6,915 × 553
#> soy_oil lucerne barley wvlgth_001 wvlgth_002 wvlgth_003 wvlgth_004
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2.1 23.6 0 0.208 0.207 0.207 0.207
#> 2 2.1 23.6 0 0.206 0.206 0.206 0.206
#> 3 2.1 23.6 0 0.207 0.207 0.207 0.206
#> 4 0.5 25 4 0.206 0.206 0.205 0.205
#> 5 0.5 25 4 0.201 0.200 0.200 0.200
#> 6 0.5 25 4 0.206 0.205 0.205 0.205
#> 7 0.5 25 4 0.202 0.202 0.201 0.201
#> 8 0.5 25 4 0.205 0.205 0.204 0.204
#> 9 0.5 25 4 0.205 0.205 0.204 0.204
#> 10 0.5 25 4 0.207 0.207 0.207 0.207
#> # ℹ 6,905 more rows
#> # ℹ 546 more variables: wvlgth_005 <dbl>, wvlgth_006 <dbl>,
#> # wvlgth_007 <dbl>, wvlgth_008 <dbl>, wvlgth_009 <dbl>,
#> # wvlgth_010 <dbl>, wvlgth_011 <dbl>, wvlgth_012 <dbl>,
#> # wvlgth_013 <dbl>, wvlgth_014 <dbl>, wvlgth_015 <dbl>,
#> # wvlgth_016 <dbl>, wvlgth_017 <dbl>, wvlgth_018 <dbl>,
#> # wvlgth_019 <dbl>, wvlgth_020 <dbl>, wvlgth_021 <dbl>, …
# }
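A common first step with this layout is to separate the three outcome columns from the spectral columns. The following is a minimal sketch using base R only; split_spectra() is a hypothetical helper, and the small data frame is two rows copied (rounded) from the printed output above so that the example runs without downloading the data.

```r
# Split a data frame shaped like data_chimiometrie_2019() into the outcome
# columns and a numeric matrix of spectra.
split_spectra <- function(data, outcomes = c("soy_oil", "lucerne", "barley")) {
  # Spectral columns are identified by their "wvlgth_" prefix
  is_spectral <- grepl("^wvlgth_", names(data))
  list(
    y = data[, outcomes, drop = FALSE],
    x = as.matrix(data[, is_spectral, drop = FALSE])
  )
}

# Stand-in with the same column layout: rows 1 and 4 of the printed tibble,
# first two spectral columns only.
toy <- data.frame(
  soy_oil = c(2.1, 0.5),
  lucerne = c(23.6, 25),
  barley = c(0, 4),
  wvlgth_001 = c(0.208, 0.206),
  wvlgth_002 = c(0.207, 0.206)
)

parts <- split_spectra(toy)
dim(parts$x)
```

Passing the result of data_chimiometrie_2019() in place of toy should work the same way, since the helper relies only on the column names.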