This assessment asks you to apply the concepts you have learned so far. It is open book, but the datasets are different from the ones you have worked with so far, so keep that in mind. You are expected to submit working code at the end as a knitted document in HTML format.
The assessment has two questions: one focuses on regression and the other on classification.
For both questions, first use filtering functions and a few plots to get a feel for the data. Then remove any features (columns) that are irrelevant to the prediction task.
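As a rough illustration of that first step, the sketch below filters, plots, and then drops a column. It runs on a small made-up data frame, not on either assessment dataset; the name `toy` and all its values are assumptions for illustration only.

```r
# Toy stand-in for a real dataset; all values below are made up for illustration.
set.seed(1)
toy <- data.frame(
  X            = 0:19,                                   # leftover row index
  AveragePrice = round(runif(20, 0.8, 2.2), 2),
  Total.Volume = round(runif(20, 1e4, 1e6)),
  type         = rep(c("conventional", "organic"), each = 10),
  region       = rep(c("Albany", "Boston"), times = 10),
  stringsAsFactors = TRUE
)

# Filtering: keep only the rows of interest and check their structure and ranges.
organic <- subset(toy, type == "organic" & AveragePrice > 1)
str(toy)
summary(organic$AveragePrice)

# Plots: distribution of the target, and the target split by a categorical feature.
hist(toy$AveragePrice, main = "Average price", xlab = "Price")
boxplot(AveragePrice ~ type, data = toy, main = "Price by type")

# Remove a column that carries no information (here, the row index X).
toy_clean <- toy[, setdiff(names(toy), "X")]
names(toy_clean)
```

Which columns count as irrelevant depends on the dataset; a leftover row index is only the most obvious candidate.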
Download the “avocado” dataset and fit a multiple linear regression that predicts the average price of an avocado from the other features. Validate the model with proper cross-validation. Put the code in the code block below, with comments explaining what you did, and finish with a conclusion.
In the conclusion, explain which features are good predictors and which are not, based on the hypothesis tests and the other results in the model summary.
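# Load the avocado dataset
filepath <- "./Datasets/avocado.csv"
avocados <- read.csv(filepath, header=TRUE)
# Fit a multiple linear regression of AveragePrice on all remaining columns
myModel <- lm(AveragePrice ~ ., data=avocados)
# Inspect the coefficient estimates, their t-tests, and the overall fit
summary(myModel)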
##
## Call:
## lm(formula = AveragePrice ~ ., data = avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9836 -0.1208 0.0033 0.1272 1.4155
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.811e+00 1.074e+00 3.548 0.000389 ***
## X -5.094e-02 2.105e-02 -2.420 0.015542 *
## Date2015-01-11 1.733e-02 3.723e-02 0.465 0.641640
## Date2015-01-18 -1.280e-02 5.211e-02 -0.246 0.806025
## Date2015-01-25 -5.786e-02 7.023e-02 -0.824 0.410039
## Date2015-02-01 -2.570e-01 8.964e-02 -2.867 0.004143 **
## Date2015-02-08 -2.450e-01 1.096e-01 -2.235 0.025435 *
## Date2015-02-15 -2.209e-01 1.300e-01 -1.699 0.089251 .
## Date2015-02-22 -2.967e-01 1.505e-01 -1.971 0.048743 *
## Date2015-03-01 -3.994e-01 1.712e-01 -2.333 0.019657 *
## Date2015-03-08 -4.008e-01 1.919e-01 -2.088 0.036781 *
## Date2015-03-15 -4.200e-01 2.127e-01 -1.974 0.048341 *
## Date2015-03-22 -5.142e-01 2.336e-01 -2.201 0.027717 *
## Date2015-03-29 -5.157e-01 2.545e-01 -2.027 0.042701 *
## Date2015-04-05 -5.513e-01 2.754e-01 -2.002 0.045294 *
## Date2015-04-12 -6.504e-01 2.963e-01 -2.195 0.028173 *
## Date2015-04-19 -6.949e-01 3.172e-01 -2.190 0.028514 *
## Date2015-04-26 -7.325e-01 3.382e-01 -2.166 0.030344 *
## Date2015-05-03 -8.820e-01 3.592e-01 -2.456 0.014077 *
## Date2015-05-10 -8.937e-01 3.802e-01 -2.351 0.018736 *
## Date2015-05-17 -9.177e-01 4.011e-01 -2.288 0.022158 *
## Date2015-05-24 -9.396e-01 4.221e-01 -2.226 0.026030 *
## Date2015-05-31 -9.892e-01 4.431e-01 -2.232 0.025613 *
## Date2015-06-07 -1.042e+00 4.641e-01 -2.246 0.024729 *
## Date2015-06-14 -1.073e+00 4.851e-01 -2.212 0.026964 *
## Date2015-06-21 -1.118e+00 5.061e-01 -2.209 0.027202 *
## Date2015-06-28 -1.170e+00 5.271e-01 -2.220 0.026435 *
## Date2015-07-05 -1.214e+00 5.482e-01 -2.214 0.026833 *
## Date2015-07-12 -1.262e+00 5.692e-01 -2.218 0.026565 *
## Date2015-07-19 -1.342e+00 5.902e-01 -2.275 0.022945 *
## Date2015-07-26 -1.359e+00 6.112e-01 -2.224 0.026173 *
## Date2015-08-02 -1.354e+00 6.322e-01 -2.141 0.032274 *
## Date2015-08-09 -1.450e+00 6.533e-01 -2.219 0.026496 *
## Date2015-08-16 -1.484e+00 6.743e-01 -2.201 0.027742 *
## Date2015-08-23 -1.539e+00 6.953e-01 -2.214 0.026864 *
## Date2015-08-30 -1.616e+00 7.164e-01 -2.256 0.024054 *
## Date2015-09-06 -1.651e+00 7.374e-01 -2.239 0.025164 *
## Date2015-09-13 -1.674e+00 7.584e-01 -2.207 0.027305 *
## Date2015-09-20 -1.741e+00 7.794e-01 -2.233 0.025538 *
## Date2015-09-27 -1.789e+00 8.005e-01 -2.235 0.025461 *
## Date2015-10-04 -1.872e+00 8.215e-01 -2.279 0.022707 *
## Date2015-10-11 -1.971e+00 8.425e-01 -2.339 0.019332 *
## Date2015-10-18 -2.000e+00 8.636e-01 -2.316 0.020575 *
## Date2015-10-25 -2.062e+00 8.846e-01 -2.330 0.019795 *
## Date2015-11-01 -2.190e+00 9.057e-01 -2.418 0.015605 *
## Date2015-11-08 -2.212e+00 9.267e-01 -2.387 0.017017 *
## Date2015-11-15 -2.255e+00 9.477e-01 -2.380 0.017346 *
## Date2015-11-22 -2.316e+00 9.688e-01 -2.390 0.016840 *
## Date2015-11-29 -2.355e+00 9.898e-01 -2.379 0.017349 *
## Date2015-12-06 -2.459e+00 1.011e+00 -2.432 0.015005 *
## Date2015-12-13 -2.512e+00 1.032e+00 -2.434 0.014925 *
## Date2015-12-20 -2.510e+00 1.053e+00 -2.384 0.017116 *
## Date2015-12-27 -2.607e+00 1.074e+00 -2.428 0.015198 *
## Date2016-01-03 -9.953e-02 3.072e-02 -3.240 0.001198 **
## Date2016-01-10 -1.268e-01 3.713e-02 -3.416 0.000637 ***
## Date2016-01-17 -1.352e-01 5.195e-02 -2.602 0.009273 **
## Date2016-01-24 -2.182e-01 7.003e-02 -3.116 0.001837 **
## Date2016-01-31 -2.593e-01 8.943e-02 -2.899 0.003750 **
## Date2016-02-07 -3.894e-01 1.094e-01 -3.558 0.000375 ***
## Date2016-02-14 -3.798e-01 1.298e-01 -2.926 0.003438 **
## Date2016-02-21 -3.851e-01 1.503e-01 -2.562 0.010409 *
## Date2016-02-28 -4.470e-01 1.710e-01 -2.614 0.008945 **
## Date2016-03-06 -5.090e-01 1.917e-01 -2.655 0.007942 **
## Date2016-03-13 -6.183e-01 2.125e-01 -2.909 0.003627 **
## Date2016-03-20 -6.689e-01 2.334e-01 -2.866 0.004159 **
## Date2016-03-27 -6.736e-01 2.543e-01 -2.649 0.008072 **
## Date2016-04-03 -7.335e-01 2.752e-01 -2.665 0.007695 **
## Date2016-04-10 -8.455e-01 2.961e-01 -2.855 0.004303 **
## Date2016-04-17 -8.510e-01 3.170e-01 -2.684 0.007277 **
## Date2016-04-24 -9.300e-01 3.380e-01 -2.752 0.005937 **
## Date2016-05-01 -1.005e+00 3.590e-01 -2.799 0.005135 **
## Date2016-05-08 -1.074e+00 3.799e-01 -2.826 0.004716 **
## Date2016-05-15 -1.059e+00 4.009e-01 -2.640 0.008289 **
## Date2016-05-22 -1.103e+00 4.219e-01 -2.614 0.008952 **
## Date2016-05-29 -1.131e+00 4.429e-01 -2.553 0.010692 *
## Date2016-06-05 -1.189e+00 4.639e-01 -2.563 0.010397 *
## Date2016-06-12 -1.182e+00 4.849e-01 -2.438 0.014792 *
## Date2016-06-19 -1.255e+00 5.059e-01 -2.482 0.013091 *
## Date2016-06-26 -1.275e+00 5.269e-01 -2.419 0.015579 *
## Date2016-07-03 -1.346e+00 5.479e-01 -2.456 0.014062 *
## Date2016-07-10 -1.335e+00 5.690e-01 -2.346 0.018976 *
## Date2016-07-17 -1.309e+00 5.900e-01 -2.219 0.026505 *
## Date2016-07-24 -1.306e+00 6.110e-01 -2.137 0.032617 *
## Date2016-07-31 -1.387e+00 6.320e-01 -2.194 0.028251 *
## Date2016-08-07 -1.469e+00 6.531e-01 -2.249 0.024535 *
## Date2016-08-14 -1.515e+00 6.741e-01 -2.248 0.024608 *
## Date2016-08-21 -1.570e+00 6.951e-01 -2.259 0.023909 *
## Date2016-08-28 -1.642e+00 7.162e-01 -2.293 0.021862 *
## Date2016-09-04 -1.712e+00 7.372e-01 -2.323 0.020204 *
## Date2016-09-11 -1.778e+00 7.582e-01 -2.345 0.019013 *
## Date2016-09-18 -1.739e+00 7.793e-01 -2.232 0.025652 *
## Date2016-09-25 -1.704e+00 8.003e-01 -2.129 0.033251 *
## Date2016-10-02 -1.714e+00 8.213e-01 -2.087 0.036940 *
## Date2016-10-09 -1.857e+00 8.424e-01 -2.204 0.027502 *
## Date2016-10-16 -1.907e+00 8.634e-01 -2.209 0.027219 *
## Date2016-10-23 -1.903e+00 8.844e-01 -2.151 0.031480 *
## Date2016-10-30 -1.792e+00 9.055e-01 -1.979 0.047825 *
## Date2016-11-06 -1.925e+00 9.265e-01 -2.078 0.037735 *
## Date2016-11-13 -2.011e+00 9.476e-01 -2.122 0.033857 *
## Date2016-11-20 -2.146e+00 9.686e-01 -2.216 0.026727 *
## Date2016-11-27 -2.200e+00 9.896e-01 -2.223 0.026196 *
## Date2016-12-04 -2.377e+00 1.011e+00 -2.352 0.018694 *
## Date2016-12-11 -2.490e+00 1.032e+00 -2.414 0.015792 *
## Date2016-12-18 -2.572e+00 1.053e+00 -2.443 0.014569 *
## Date2016-12-25 -2.591e+00 1.074e+00 -2.413 0.015842 *
## Date2017-01-01 2.647e-02 3.714e-02 0.713 0.476112
## Date2017-01-08 -7.687e-03 3.073e-02 -0.250 0.802493
## Date2017-01-15 -3.456e-02 3.735e-02 -0.925 0.354805
## Date2017-01-22 -1.811e-01 5.227e-02 -3.465 0.000531 ***
## Date2017-01-29 -1.826e-01 7.040e-02 -2.594 0.009487 **
## Date2017-02-05 -3.615e-01 8.981e-02 -4.026 5.71e-05 ***
## Date2017-02-12 -3.549e-01 1.098e-01 -3.231 0.001235 **
## Date2017-02-19 -3.419e-01 1.302e-01 -2.627 0.008627 **
## Date2017-02-26 -4.215e-01 1.507e-01 -2.797 0.005164 **
## Date2017-03-05 -4.251e-01 1.714e-01 -2.480 0.013129 *
## Date2017-03-12 -3.285e-01 1.921e-01 -1.710 0.087302 .
## Date2017-03-19 -3.471e-01 2.129e-01 -1.630 0.103043
## Date2017-03-26 -4.626e-01 2.338e-01 -1.979 0.047842 *
## Date2017-04-02 -4.685e-01 2.546e-01 -1.840 0.065827 .
## Date2017-04-09 -5.019e-01 2.755e-01 -1.821 0.068548 .
## Date2017-04-16 -5.145e-01 2.965e-01 -1.736 0.082668 .
## Date2017-04-23 -5.250e-01 3.174e-01 -1.654 0.098170 .
## Date2017-04-30 -5.692e-01 3.384e-01 -1.682 0.092565 .
## Date2017-05-07 -7.087e-01 3.594e-01 -1.972 0.048612 *
## Date2017-05-14 -6.952e-01 3.803e-01 -1.828 0.067596 .
## Date2017-05-21 -7.128e-01 4.013e-01 -1.776 0.075724 .
## Date2017-05-28 -7.456e-01 4.223e-01 -1.766 0.077483 .
## Date2017-06-04 -8.113e-01 4.433e-01 -1.830 0.067240 .
## Date2017-06-11 -9.062e-01 4.643e-01 -1.952 0.050979 .
## Date2017-06-18 -9.397e-01 4.850e-01 -1.938 0.052686 .
## Date2017-06-25 -9.705e-01 5.060e-01 -1.918 0.055126 .
## Date2017-07-02 -1.013e+00 5.269e-01 -1.922 0.054639 .
## Date2017-07-09 -1.093e+00 5.480e-01 -1.995 0.046086 *
## Date2017-07-16 -1.098e+00 5.690e-01 -1.929 0.053705 .
## Date2017-07-23 -1.168e+00 5.900e-01 -1.980 0.047707 *
## Date2017-07-30 -1.211e+00 6.110e-01 -1.982 0.047459 *
## Date2017-08-06 -1.229e+00 6.320e-01 -1.944 0.051912 .
## Date2017-08-13 -1.243e+00 6.531e-01 -1.903 0.057034 .
## Date2017-08-20 -1.199e+00 6.741e-01 -1.778 0.075360 .
## Date2017-08-27 -1.176e+00 6.951e-01 -1.691 0.090816 .
## Date2017-09-03 -1.182e+00 7.162e-01 -1.651 0.098748 .
## Date2017-09-10 -1.244e+00 7.372e-01 -1.687 0.091553 .
## Date2017-09-17 -1.305e+00 7.582e-01 -1.721 0.085223 .
## Date2017-09-24 -1.354e+00 7.793e-01 -1.737 0.082388 .
## Date2017-10-01 -1.367e+00 8.003e-01 -1.708 0.087733 .
## Date2017-10-08 -1.446e+00 8.213e-01 -1.760 0.078341 .
## Date2017-10-15 -1.557e+00 8.424e-01 -1.848 0.064581 .
## Date2017-10-22 -1.726e+00 8.634e-01 -1.999 0.045605 *
## Date2017-10-29 -1.840e+00 8.844e-01 -2.081 0.037469 *
## Date2017-11-05 -1.932e+00 9.055e-01 -2.134 0.032855 *
## Date2017-11-12 -2.032e+00 9.265e-01 -2.194 0.028274 *
## Date2017-11-19 -2.095e+00 9.476e-01 -2.211 0.027026 *
## Date2017-11-26 -2.143e+00 9.686e-01 -2.212 0.026972 *
## Date2017-12-03 -2.311e+00 9.896e-01 -2.335 0.019566 *
## Date2017-12-10 -2.404e+00 1.011e+00 -2.379 0.017391 *
## Date2017-12-17 -2.430e+00 1.032e+00 -2.355 0.018511 *
## Date2017-12-24 -2.411e+00 1.053e+00 -2.290 0.022038 *
## Date2017-12-31 -2.616e+00 1.074e+00 -2.436 0.014860 *
## Date2018-01-07 -1.983e+00 8.424e-01 -2.354 0.018601 *
## Date2018-01-14 -1.968e+00 8.634e-01 -2.279 0.022678 *
## Date2018-01-21 -2.066e+00 8.844e-01 -2.336 0.019528 *
## Date2018-01-28 -2.100e+00 9.055e-01 -2.319 0.020400 *
## Date2018-02-04 -2.307e+00 9.265e-01 -2.490 0.012781 *
## Date2018-02-11 -2.288e+00 9.476e-01 -2.415 0.015740 *
## Date2018-02-18 -2.273e+00 9.686e-01 -2.347 0.018951 *
## Date2018-02-25 -2.340e+00 9.896e-01 -2.364 0.018082 *
## Date2018-03-04 -2.399e+00 1.011e+00 -2.373 0.017634 *
## Date2018-03-11 -2.467e+00 1.032e+00 -2.391 0.016819 *
## Date2018-03-18 -2.538e+00 1.053e+00 -2.411 0.015919 *
## Date2018-03-25 -2.557e+00 1.074e+00 -2.381 0.017283 *
## Total.Volume -4.448e-05 3.524e-05 -1.262 0.206885
## X4046 4.447e-05 3.524e-05 1.262 0.207010
## X4225 4.446e-05 3.524e-05 1.262 0.207046
## X4770 4.467e-05 3.524e-05 1.268 0.204947
## Total.Bags -2.243e-02 2.622e-02 -0.856 0.392153
## Small.Bags 2.248e-02 2.622e-02 0.857 0.391218
## Large.Bags 2.248e-02 2.622e-02 0.857 0.391218
## XLarge.Bags 2.248e-02 2.622e-02 0.857 0.391201
## typeorganic 4.940e-01 3.542e-03 139.467 < 2e-16 ***
## year NA NA NA NA
## regionAtlanta -2.215e-01 1.737e-02 -12.747 < 2e-16 ***
## regionBaltimoreWashington -2.665e-02 1.739e-02 -1.532 0.125462
## regionBoise -2.136e-01 1.736e-02 -12.300 < 2e-16 ***
## regionBoston -2.859e-02 1.738e-02 -1.644 0.100126
## regionBuffaloRochester -4.458e-02 1.736e-02 -2.568 0.010239 *
## regionCalifornia -1.717e-01 1.776e-02 -9.664 < 2e-16 ***
## regionCharlotte 4.284e-02 1.737e-02 2.467 0.013634 *
## regionChicago -1.260e-02 1.745e-02 -0.722 0.470530
## regionCincinnatiDayton -3.513e-01 1.738e-02 -20.219 < 2e-16 ***
## regionColumbus -3.095e-01 1.736e-02 -17.828 < 2e-16 ***
## regionDallasFtWorth -4.738e-01 1.740e-02 -27.232 < 2e-16 ***
## regionDenver -3.384e-01 1.747e-02 -19.373 < 2e-16 ***
## regionDetroit -2.928e-01 1.740e-02 -16.832 < 2e-16 ***
## regionGrandRapids -5.949e-02 1.736e-02 -3.426 0.000613 ***
## regionGreatLakes -2.476e-01 1.805e-02 -13.713 < 2e-16 ***
## regionHarrisburgScranton -4.779e-02 1.736e-02 -2.753 0.005910 **
## regionHartfordSpringfield 2.586e-01 1.737e-02 14.892 < 2e-16 ***
## regionHouston -5.110e-01 1.739e-02 -29.387 < 2e-16 ***
## regionIndianapolis -2.476e-01 1.736e-02 -14.261 < 2e-16 ***
## regionJacksonville -4.966e-02 1.736e-02 -2.860 0.004238 **
## regionLasVegas -1.792e-01 1.736e-02 -10.321 < 2e-16 ***
## regionLosAngeles -3.541e-01 1.764e-02 -20.069 < 2e-16 ***
## regionLouisville -2.745e-01 1.736e-02 -15.814 < 2e-16 ***
## regionMiamiFtLauderdale -1.301e-01 1.738e-02 -7.486 7.40e-14 ***
## regionMidsouth -1.587e-01 1.754e-02 -9.047 < 2e-16 ***
## regionNashville -3.493e-01 1.736e-02 -20.116 < 2e-16 ***
## regionNewOrleansMobile -2.571e-01 1.737e-02 -14.806 < 2e-16 ***
## regionNewYork 1.711e-01 1.752e-02 9.765 < 2e-16 ***
## regionNortheast 5.327e-02 1.879e-02 2.835 0.004589 **
## regionNorthernNewEngland -8.248e-02 1.737e-02 -4.748 2.07e-06 ***
## regionOrlando -5.377e-02 1.737e-02 -3.096 0.001964 **
## regionPhiladelphia 7.187e-02 1.737e-02 4.138 3.52e-05 ***
## regionPhoenixTucson -3.319e-01 1.741e-02 -19.061 < 2e-16 ***
## regionPittsburgh -1.969e-01 1.736e-02 -11.345 < 2e-16 ***
## regionPlains -1.212e-01 1.741e-02 -6.959 3.54e-12 ***
## regionPortland -2.436e-01 1.738e-02 -14.018 < 2e-16 ***
## regionRaleighGreensboro -8.023e-03 1.737e-02 -0.462 0.644139
## regionRichmondNorfolk -2.701e-01 1.736e-02 -15.557 < 2e-16 ***
## regionRoanoke -3.132e-01 1.736e-02 -18.039 < 2e-16 ***
## regionSacramento 6.138e-02 1.736e-02 3.535 0.000408 ***
## regionSanDiego -1.631e-01 1.736e-02 -9.396 < 2e-16 ***
## regionSanFrancisco 2.450e-01 1.737e-02 14.098 < 2e-16 ***
## regionSeattle -1.178e-01 1.738e-02 -6.780 1.24e-11 ***
## regionSouthCarolina -1.580e-01 1.736e-02 -9.098 < 2e-16 ***
## regionSouthCentral -4.513e-01 1.799e-02 -25.094 < 2e-16 ***
## regionSoutheast -1.517e-01 1.785e-02 -8.495 < 2e-16 ***
## regionSpokane -1.157e-01 1.736e-02 -6.666 2.70e-11 ***
## regionStLouis -1.308e-01 1.736e-02 -7.536 5.06e-14 ***
## regionSyracuse -4.104e-02 1.736e-02 -2.364 0.018082 *
## regionTampa -1.509e-01 1.737e-02 -8.688 < 2e-16 ***
## regionTotalUS -2.173e-01 2.167e-02 -10.028 < 2e-16 ***
## regionWest -2.692e-01 1.811e-02 -14.864 < 2e-16 ***
## regionWestTexNewMexico -3.092e-01 1.847e-02 -16.745 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2257 on 18017 degrees of freedom
## Multiple R-squared: 0.6899, Adjusted R-squared: 0.686
## F-statistic: 173.5 on 231 and 18017 DF, p-value: < 2.2e-16
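The question asks for cross-validation, which the summary above does not cover. A minimal sketch of k-fold cross-validation in base R follows. It uses synthetic data with made-up values; the data frame `toy`, the choice of `k <- 5`, and the two predictors are assumptions for illustration, not the actual avocado analysis.

```r
# Synthetic data loosely shaped like the avocado columns; values are made up.
set.seed(42)
n <- 200
toy <- data.frame(
  Total.Volume = runif(n, 1e4, 1e6),
  type         = factor(sample(c("conventional", "organic"), n, replace = TRUE))
)
toy$AveragePrice <- 1.1 + 0.5 * (toy$type == "organic") -
  2e-7 * toy$Total.Volume + rnorm(n, sd = 0.2)

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold label for every row

rmse <- numeric(k)
for (i in 1:k) {
  train <- toy[fold != i, ]                # fit on the other k - 1 folds
  test  <- toy[fold == i, ]                # evaluate on the held-out fold
  fit   <- lm(AveragePrice ~ Total.Volume + type, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$AveragePrice - pred)^2))
}

rmse          # per-fold test error
mean(rmse)    # cross-validated RMSE
```

On the real data the same loop would wrap the `lm` call on `avocados`. The summary reports `year` as `NA` because of singularities (it is fully determined by `Date`), so it may be worth dropping redundant columns before cross-validating.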
Build a classification model that predicts the rating of a recipe from the remaining features in the recipe dataset. Again, finish with a conclusion.
filepath <- "./Datasets/epi_r.csv"
recipes <- read.csv(filepath, header=TRUE)
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
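# Optional 1000-row subset for quicker experiments (renamed so it does not mask base::subset)
recipes_subset <- recipes[1:1000,]
# Random forest classifier: rating is treated as a categorical class
myModel <- randomForest(as.factor(rating) ~ calories + protein + fat + sodium + advance.prep.required, data=recipes, na.action=na.exclude)
# Logistic regression for comparison; glm needs data and family arguments.
# Note: with a factor response, the binomial family models the first level against all other levels.
myGlm <- glm(as.factor(rating) ~ calories + protein + fat + sodium + advance.prep.required, data=recipes, family=binomial)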
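To support a conclusion, a classifier is usually evaluated on rows it has not seen. The sketch below shows one way to do that with a hold-out split and a confusion matrix. It swaps in a plain logistic regression on a binary target and uses synthetic data; `toy`, `high_rating`, the 30% split, and the 0.5 threshold are all assumptions for illustration, not part of the recipe dataset.

```r
# Synthetic stand-in for the recipe data; the numbers are made up.
set.seed(7)
n <- 300
toy <- data.frame(
  calories = runif(n, 50, 900),
  protein  = runif(n, 0, 60)
)
# Hypothetical binary target: 1 = "high rating", generated from the features plus noise.
p <- plogis(-1 + 0.004 * toy$calories - 0.03 * toy$protein)
toy$high_rating <- rbinom(n, 1, p)

# Hold out 30% of the rows for testing.
test_idx <- sample(n, size = round(0.3 * n))
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

# Fit on the training rows only, then predict class probabilities for the test rows.
fit  <- glm(high_rating ~ calories + protein, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Confusion matrix and accuracy on the held-out rows.
conf_mat <- table(predicted = pred, actual = test$high_rating)
conf_mat
accuracy <- mean(pred == test$high_rating)
accuracy
```

The same split-and-compare pattern applies to a random forest: fit on the training rows, call `predict` on the test rows, and tabulate predicted against actual classes.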