Final Assessment

This assessment asks you to apply the concepts you have learned so far. It is open book, but the datasets are different from the ones you have worked with, so keep that in mind. You are expected to submit working code at the end as a “knit” document in HTML format.

The assessment has two questions: one focuses on regression and the other on classification.

For both questions, first use filtering functions and some plots to get a feel for the data. Then remove any features (columns) that are irrelevant.
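A minimal sketch of this exploration step, assuming a small made-up data frame `toy` whose column names only loosely mirror the avocado data (all values are invented for illustration):

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  X            = 0:5,  # row index left over from a CSV export
  AveragePrice = c(1.33, 1.35, 0.93, 1.08, 1.28, 1.26),
  Total.Volume = c(64236, 54876, 118220, 78992, 51039, 55979),
  type         = c("conventional", "organic", "conventional",
                   "organic", "conventional", "organic")
)

# Filter: keep only the organic rows
organic <- subset(toy, type == "organic")

# Quick numeric overview of every column
print(summary(toy))

# Plots: distribution of the response and its relation to one predictor
hist(toy$AveragePrice, main = "Average price", xlab = "Price")
plot(toy$Total.Volume, toy$AveragePrice,
     xlab = "Total volume", ylab = "Average price")

# Drop a column that carries no information (here the row index X)
toyClean <- toy[, setdiff(names(toy), "X")]
```

Dropping columns by name with `setdiff()` keeps the code readable and does not break if the column order changes.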

First task: Regression

Download the “avocado” dataset and perform a multiple linear regression to predict the price of an avocado from the other features. Perform a proper cross-validation. The code should appear in the code block below, with comments explaining what you did. End with a conclusion.

In the conclusion, explain which features are useful predictors and which are not, based on the hypothesis tests and the results in the model summary.
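The hypothesis tests referred to here are the per-coefficient t-tests reported by `summary()`. As a rough sketch on simulated data (the names `sim`, `x1`, and `x2` are made up for illustration), the p-values can be pulled out of the summary table like this:

```r
set.seed(1)  # reproducible simulated data

# Hypothetical data: y depends on x1 but not on x2
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)
sim <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = sim)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- coef(summary(fit))

# Each p-value tests H0: coefficient = 0, given the other predictors
pvals <- coefs[, "Pr(>|t|)"]

# Predictors (excluding the intercept) significant at the 5% level
significant <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")
print(significant)
```

A small p-value only says the coefficient is unlikely to be zero given the other predictors in the model; it does not by itself measure how much the feature improves predictions.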

# Load the avocado dataset
filepath <- "./Datasets/avocado.csv"
avocados <- read.csv(filepath, header=TRUE)

# Fit a multiple linear regression of AveragePrice on all other columns
myModel <- lm(AveragePrice ~ ., data=avocados)
summary(myModel)
## 
## Call:
## lm(formula = AveragePrice ~ ., data = avocados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9836 -0.1208  0.0033  0.1272  1.4155 
## 
## Coefficients: (1 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                3.811e+00  1.074e+00   3.548 0.000389 ***
## X                         -5.094e-02  2.105e-02  -2.420 0.015542 *  
## Date2015-01-11             1.733e-02  3.723e-02   0.465 0.641640    
## Date2015-01-18            -1.280e-02  5.211e-02  -0.246 0.806025    
## Date2015-01-25            -5.786e-02  7.023e-02  -0.824 0.410039    
## Date2015-02-01            -2.570e-01  8.964e-02  -2.867 0.004143 ** 
## Date2015-02-08            -2.450e-01  1.096e-01  -2.235 0.025435 *  
## Date2015-02-15            -2.209e-01  1.300e-01  -1.699 0.089251 .  
## Date2015-02-22            -2.967e-01  1.505e-01  -1.971 0.048743 *  
## Date2015-03-01            -3.994e-01  1.712e-01  -2.333 0.019657 *  
## Date2015-03-08            -4.008e-01  1.919e-01  -2.088 0.036781 *  
## Date2015-03-15            -4.200e-01  2.127e-01  -1.974 0.048341 *  
## Date2015-03-22            -5.142e-01  2.336e-01  -2.201 0.027717 *  
## Date2015-03-29            -5.157e-01  2.545e-01  -2.027 0.042701 *  
## Date2015-04-05            -5.513e-01  2.754e-01  -2.002 0.045294 *  
## Date2015-04-12            -6.504e-01  2.963e-01  -2.195 0.028173 *  
## Date2015-04-19            -6.949e-01  3.172e-01  -2.190 0.028514 *  
## Date2015-04-26            -7.325e-01  3.382e-01  -2.166 0.030344 *  
## Date2015-05-03            -8.820e-01  3.592e-01  -2.456 0.014077 *  
## Date2015-05-10            -8.937e-01  3.802e-01  -2.351 0.018736 *  
## Date2015-05-17            -9.177e-01  4.011e-01  -2.288 0.022158 *  
## Date2015-05-24            -9.396e-01  4.221e-01  -2.226 0.026030 *  
## Date2015-05-31            -9.892e-01  4.431e-01  -2.232 0.025613 *  
## Date2015-06-07            -1.042e+00  4.641e-01  -2.246 0.024729 *  
## Date2015-06-14            -1.073e+00  4.851e-01  -2.212 0.026964 *  
## Date2015-06-21            -1.118e+00  5.061e-01  -2.209 0.027202 *  
## Date2015-06-28            -1.170e+00  5.271e-01  -2.220 0.026435 *  
## Date2015-07-05            -1.214e+00  5.482e-01  -2.214 0.026833 *  
## Date2015-07-12            -1.262e+00  5.692e-01  -2.218 0.026565 *  
## Date2015-07-19            -1.342e+00  5.902e-01  -2.275 0.022945 *  
## Date2015-07-26            -1.359e+00  6.112e-01  -2.224 0.026173 *  
## Date2015-08-02            -1.354e+00  6.322e-01  -2.141 0.032274 *  
## Date2015-08-09            -1.450e+00  6.533e-01  -2.219 0.026496 *  
## Date2015-08-16            -1.484e+00  6.743e-01  -2.201 0.027742 *  
## Date2015-08-23            -1.539e+00  6.953e-01  -2.214 0.026864 *  
## Date2015-08-30            -1.616e+00  7.164e-01  -2.256 0.024054 *  
## Date2015-09-06            -1.651e+00  7.374e-01  -2.239 0.025164 *  
## Date2015-09-13            -1.674e+00  7.584e-01  -2.207 0.027305 *  
## Date2015-09-20            -1.741e+00  7.794e-01  -2.233 0.025538 *  
## Date2015-09-27            -1.789e+00  8.005e-01  -2.235 0.025461 *  
## Date2015-10-04            -1.872e+00  8.215e-01  -2.279 0.022707 *  
## Date2015-10-11            -1.971e+00  8.425e-01  -2.339 0.019332 *  
## Date2015-10-18            -2.000e+00  8.636e-01  -2.316 0.020575 *  
## Date2015-10-25            -2.062e+00  8.846e-01  -2.330 0.019795 *  
## Date2015-11-01            -2.190e+00  9.057e-01  -2.418 0.015605 *  
## Date2015-11-08            -2.212e+00  9.267e-01  -2.387 0.017017 *  
## Date2015-11-15            -2.255e+00  9.477e-01  -2.380 0.017346 *  
## Date2015-11-22            -2.316e+00  9.688e-01  -2.390 0.016840 *  
## Date2015-11-29            -2.355e+00  9.898e-01  -2.379 0.017349 *  
## Date2015-12-06            -2.459e+00  1.011e+00  -2.432 0.015005 *  
## Date2015-12-13            -2.512e+00  1.032e+00  -2.434 0.014925 *  
## Date2015-12-20            -2.510e+00  1.053e+00  -2.384 0.017116 *  
## Date2015-12-27            -2.607e+00  1.074e+00  -2.428 0.015198 *  
## Date2016-01-03            -9.953e-02  3.072e-02  -3.240 0.001198 ** 
## Date2016-01-10            -1.268e-01  3.713e-02  -3.416 0.000637 ***
## Date2016-01-17            -1.352e-01  5.195e-02  -2.602 0.009273 ** 
## Date2016-01-24            -2.182e-01  7.003e-02  -3.116 0.001837 ** 
## Date2016-01-31            -2.593e-01  8.943e-02  -2.899 0.003750 ** 
## Date2016-02-07            -3.894e-01  1.094e-01  -3.558 0.000375 ***
## Date2016-02-14            -3.798e-01  1.298e-01  -2.926 0.003438 ** 
## Date2016-02-21            -3.851e-01  1.503e-01  -2.562 0.010409 *  
## Date2016-02-28            -4.470e-01  1.710e-01  -2.614 0.008945 ** 
## Date2016-03-06            -5.090e-01  1.917e-01  -2.655 0.007942 ** 
## Date2016-03-13            -6.183e-01  2.125e-01  -2.909 0.003627 ** 
## Date2016-03-20            -6.689e-01  2.334e-01  -2.866 0.004159 ** 
## Date2016-03-27            -6.736e-01  2.543e-01  -2.649 0.008072 ** 
## Date2016-04-03            -7.335e-01  2.752e-01  -2.665 0.007695 ** 
## Date2016-04-10            -8.455e-01  2.961e-01  -2.855 0.004303 ** 
## Date2016-04-17            -8.510e-01  3.170e-01  -2.684 0.007277 ** 
## Date2016-04-24            -9.300e-01  3.380e-01  -2.752 0.005937 ** 
## Date2016-05-01            -1.005e+00  3.590e-01  -2.799 0.005135 ** 
## Date2016-05-08            -1.074e+00  3.799e-01  -2.826 0.004716 ** 
## Date2016-05-15            -1.059e+00  4.009e-01  -2.640 0.008289 ** 
## Date2016-05-22            -1.103e+00  4.219e-01  -2.614 0.008952 ** 
## Date2016-05-29            -1.131e+00  4.429e-01  -2.553 0.010692 *  
## Date2016-06-05            -1.189e+00  4.639e-01  -2.563 0.010397 *  
## Date2016-06-12            -1.182e+00  4.849e-01  -2.438 0.014792 *  
## Date2016-06-19            -1.255e+00  5.059e-01  -2.482 0.013091 *  
## Date2016-06-26            -1.275e+00  5.269e-01  -2.419 0.015579 *  
## Date2016-07-03            -1.346e+00  5.479e-01  -2.456 0.014062 *  
## Date2016-07-10            -1.335e+00  5.690e-01  -2.346 0.018976 *  
## Date2016-07-17            -1.309e+00  5.900e-01  -2.219 0.026505 *  
## Date2016-07-24            -1.306e+00  6.110e-01  -2.137 0.032617 *  
## Date2016-07-31            -1.387e+00  6.320e-01  -2.194 0.028251 *  
## Date2016-08-07            -1.469e+00  6.531e-01  -2.249 0.024535 *  
## Date2016-08-14            -1.515e+00  6.741e-01  -2.248 0.024608 *  
## Date2016-08-21            -1.570e+00  6.951e-01  -2.259 0.023909 *  
## Date2016-08-28            -1.642e+00  7.162e-01  -2.293 0.021862 *  
## Date2016-09-04            -1.712e+00  7.372e-01  -2.323 0.020204 *  
## Date2016-09-11            -1.778e+00  7.582e-01  -2.345 0.019013 *  
## Date2016-09-18            -1.739e+00  7.793e-01  -2.232 0.025652 *  
## Date2016-09-25            -1.704e+00  8.003e-01  -2.129 0.033251 *  
## Date2016-10-02            -1.714e+00  8.213e-01  -2.087 0.036940 *  
## Date2016-10-09            -1.857e+00  8.424e-01  -2.204 0.027502 *  
## Date2016-10-16            -1.907e+00  8.634e-01  -2.209 0.027219 *  
## Date2016-10-23            -1.903e+00  8.844e-01  -2.151 0.031480 *  
## Date2016-10-30            -1.792e+00  9.055e-01  -1.979 0.047825 *  
## Date2016-11-06            -1.925e+00  9.265e-01  -2.078 0.037735 *  
## Date2016-11-13            -2.011e+00  9.476e-01  -2.122 0.033857 *  
## Date2016-11-20            -2.146e+00  9.686e-01  -2.216 0.026727 *  
## Date2016-11-27            -2.200e+00  9.896e-01  -2.223 0.026196 *  
## Date2016-12-04            -2.377e+00  1.011e+00  -2.352 0.018694 *  
## Date2016-12-11            -2.490e+00  1.032e+00  -2.414 0.015792 *  
## Date2016-12-18            -2.572e+00  1.053e+00  -2.443 0.014569 *  
## Date2016-12-25            -2.591e+00  1.074e+00  -2.413 0.015842 *  
## Date2017-01-01             2.647e-02  3.714e-02   0.713 0.476112    
## Date2017-01-08            -7.687e-03  3.073e-02  -0.250 0.802493    
## Date2017-01-15            -3.456e-02  3.735e-02  -0.925 0.354805    
## Date2017-01-22            -1.811e-01  5.227e-02  -3.465 0.000531 ***
## Date2017-01-29            -1.826e-01  7.040e-02  -2.594 0.009487 ** 
## Date2017-02-05            -3.615e-01  8.981e-02  -4.026 5.71e-05 ***
## Date2017-02-12            -3.549e-01  1.098e-01  -3.231 0.001235 ** 
## Date2017-02-19            -3.419e-01  1.302e-01  -2.627 0.008627 ** 
## Date2017-02-26            -4.215e-01  1.507e-01  -2.797 0.005164 ** 
## Date2017-03-05            -4.251e-01  1.714e-01  -2.480 0.013129 *  
## Date2017-03-12            -3.285e-01  1.921e-01  -1.710 0.087302 .  
## Date2017-03-19            -3.471e-01  2.129e-01  -1.630 0.103043    
## Date2017-03-26            -4.626e-01  2.338e-01  -1.979 0.047842 *  
## Date2017-04-02            -4.685e-01  2.546e-01  -1.840 0.065827 .  
## Date2017-04-09            -5.019e-01  2.755e-01  -1.821 0.068548 .  
## Date2017-04-16            -5.145e-01  2.965e-01  -1.736 0.082668 .  
## Date2017-04-23            -5.250e-01  3.174e-01  -1.654 0.098170 .  
## Date2017-04-30            -5.692e-01  3.384e-01  -1.682 0.092565 .  
## Date2017-05-07            -7.087e-01  3.594e-01  -1.972 0.048612 *  
## Date2017-05-14            -6.952e-01  3.803e-01  -1.828 0.067596 .  
## Date2017-05-21            -7.128e-01  4.013e-01  -1.776 0.075724 .  
## Date2017-05-28            -7.456e-01  4.223e-01  -1.766 0.077483 .  
## Date2017-06-04            -8.113e-01  4.433e-01  -1.830 0.067240 .  
## Date2017-06-11            -9.062e-01  4.643e-01  -1.952 0.050979 .  
## Date2017-06-18            -9.397e-01  4.850e-01  -1.938 0.052686 .  
## Date2017-06-25            -9.705e-01  5.060e-01  -1.918 0.055126 .  
## Date2017-07-02            -1.013e+00  5.269e-01  -1.922 0.054639 .  
## Date2017-07-09            -1.093e+00  5.480e-01  -1.995 0.046086 *  
## Date2017-07-16            -1.098e+00  5.690e-01  -1.929 0.053705 .  
## Date2017-07-23            -1.168e+00  5.900e-01  -1.980 0.047707 *  
## Date2017-07-30            -1.211e+00  6.110e-01  -1.982 0.047459 *  
## Date2017-08-06            -1.229e+00  6.320e-01  -1.944 0.051912 .  
## Date2017-08-13            -1.243e+00  6.531e-01  -1.903 0.057034 .  
## Date2017-08-20            -1.199e+00  6.741e-01  -1.778 0.075360 .  
## Date2017-08-27            -1.176e+00  6.951e-01  -1.691 0.090816 .  
## Date2017-09-03            -1.182e+00  7.162e-01  -1.651 0.098748 .  
## Date2017-09-10            -1.244e+00  7.372e-01  -1.687 0.091553 .  
## Date2017-09-17            -1.305e+00  7.582e-01  -1.721 0.085223 .  
## Date2017-09-24            -1.354e+00  7.793e-01  -1.737 0.082388 .  
## Date2017-10-01            -1.367e+00  8.003e-01  -1.708 0.087733 .  
## Date2017-10-08            -1.446e+00  8.213e-01  -1.760 0.078341 .  
## Date2017-10-15            -1.557e+00  8.424e-01  -1.848 0.064581 .  
## Date2017-10-22            -1.726e+00  8.634e-01  -1.999 0.045605 *  
## Date2017-10-29            -1.840e+00  8.844e-01  -2.081 0.037469 *  
## Date2017-11-05            -1.932e+00  9.055e-01  -2.134 0.032855 *  
## Date2017-11-12            -2.032e+00  9.265e-01  -2.194 0.028274 *  
## Date2017-11-19            -2.095e+00  9.476e-01  -2.211 0.027026 *  
## Date2017-11-26            -2.143e+00  9.686e-01  -2.212 0.026972 *  
## Date2017-12-03            -2.311e+00  9.896e-01  -2.335 0.019566 *  
## Date2017-12-10            -2.404e+00  1.011e+00  -2.379 0.017391 *  
## Date2017-12-17            -2.430e+00  1.032e+00  -2.355 0.018511 *  
## Date2017-12-24            -2.411e+00  1.053e+00  -2.290 0.022038 *  
## Date2017-12-31            -2.616e+00  1.074e+00  -2.436 0.014860 *  
## Date2018-01-07            -1.983e+00  8.424e-01  -2.354 0.018601 *  
## Date2018-01-14            -1.968e+00  8.634e-01  -2.279 0.022678 *  
## Date2018-01-21            -2.066e+00  8.844e-01  -2.336 0.019528 *  
## Date2018-01-28            -2.100e+00  9.055e-01  -2.319 0.020400 *  
## Date2018-02-04            -2.307e+00  9.265e-01  -2.490 0.012781 *  
## Date2018-02-11            -2.288e+00  9.476e-01  -2.415 0.015740 *  
## Date2018-02-18            -2.273e+00  9.686e-01  -2.347 0.018951 *  
## Date2018-02-25            -2.340e+00  9.896e-01  -2.364 0.018082 *  
## Date2018-03-04            -2.399e+00  1.011e+00  -2.373 0.017634 *  
## Date2018-03-11            -2.467e+00  1.032e+00  -2.391 0.016819 *  
## Date2018-03-18            -2.538e+00  1.053e+00  -2.411 0.015919 *  
## Date2018-03-25            -2.557e+00  1.074e+00  -2.381 0.017283 *  
## Total.Volume              -4.448e-05  3.524e-05  -1.262 0.206885    
## X4046                      4.447e-05  3.524e-05   1.262 0.207010    
## X4225                      4.446e-05  3.524e-05   1.262 0.207046    
## X4770                      4.467e-05  3.524e-05   1.268 0.204947    
## Total.Bags                -2.243e-02  2.622e-02  -0.856 0.392153    
## Small.Bags                 2.248e-02  2.622e-02   0.857 0.391218    
## Large.Bags                 2.248e-02  2.622e-02   0.857 0.391218    
## XLarge.Bags                2.248e-02  2.622e-02   0.857 0.391201    
## typeorganic                4.940e-01  3.542e-03 139.467  < 2e-16 ***
## year                              NA         NA      NA       NA    
## regionAtlanta             -2.215e-01  1.737e-02 -12.747  < 2e-16 ***
## regionBaltimoreWashington -2.665e-02  1.739e-02  -1.532 0.125462    
## regionBoise               -2.136e-01  1.736e-02 -12.300  < 2e-16 ***
## regionBoston              -2.859e-02  1.738e-02  -1.644 0.100126    
## regionBuffaloRochester    -4.458e-02  1.736e-02  -2.568 0.010239 *  
## regionCalifornia          -1.717e-01  1.776e-02  -9.664  < 2e-16 ***
## regionCharlotte            4.284e-02  1.737e-02   2.467 0.013634 *  
## regionChicago             -1.260e-02  1.745e-02  -0.722 0.470530    
## regionCincinnatiDayton    -3.513e-01  1.738e-02 -20.219  < 2e-16 ***
## regionColumbus            -3.095e-01  1.736e-02 -17.828  < 2e-16 ***
## regionDallasFtWorth       -4.738e-01  1.740e-02 -27.232  < 2e-16 ***
## regionDenver              -3.384e-01  1.747e-02 -19.373  < 2e-16 ***
## regionDetroit             -2.928e-01  1.740e-02 -16.832  < 2e-16 ***
## regionGrandRapids         -5.949e-02  1.736e-02  -3.426 0.000613 ***
## regionGreatLakes          -2.476e-01  1.805e-02 -13.713  < 2e-16 ***
## regionHarrisburgScranton  -4.779e-02  1.736e-02  -2.753 0.005910 ** 
## regionHartfordSpringfield  2.586e-01  1.737e-02  14.892  < 2e-16 ***
## regionHouston             -5.110e-01  1.739e-02 -29.387  < 2e-16 ***
## regionIndianapolis        -2.476e-01  1.736e-02 -14.261  < 2e-16 ***
## regionJacksonville        -4.966e-02  1.736e-02  -2.860 0.004238 ** 
## regionLasVegas            -1.792e-01  1.736e-02 -10.321  < 2e-16 ***
## regionLosAngeles          -3.541e-01  1.764e-02 -20.069  < 2e-16 ***
## regionLouisville          -2.745e-01  1.736e-02 -15.814  < 2e-16 ***
## regionMiamiFtLauderdale   -1.301e-01  1.738e-02  -7.486 7.40e-14 ***
## regionMidsouth            -1.587e-01  1.754e-02  -9.047  < 2e-16 ***
## regionNashville           -3.493e-01  1.736e-02 -20.116  < 2e-16 ***
## regionNewOrleansMobile    -2.571e-01  1.737e-02 -14.806  < 2e-16 ***
## regionNewYork              1.711e-01  1.752e-02   9.765  < 2e-16 ***
## regionNortheast            5.327e-02  1.879e-02   2.835 0.004589 ** 
## regionNorthernNewEngland  -8.248e-02  1.737e-02  -4.748 2.07e-06 ***
## regionOrlando             -5.377e-02  1.737e-02  -3.096 0.001964 ** 
## regionPhiladelphia         7.187e-02  1.737e-02   4.138 3.52e-05 ***
## regionPhoenixTucson       -3.319e-01  1.741e-02 -19.061  < 2e-16 ***
## regionPittsburgh          -1.969e-01  1.736e-02 -11.345  < 2e-16 ***
## regionPlains              -1.212e-01  1.741e-02  -6.959 3.54e-12 ***
## regionPortland            -2.436e-01  1.738e-02 -14.018  < 2e-16 ***
## regionRaleighGreensboro   -8.023e-03  1.737e-02  -0.462 0.644139    
## regionRichmondNorfolk     -2.701e-01  1.736e-02 -15.557  < 2e-16 ***
## regionRoanoke             -3.132e-01  1.736e-02 -18.039  < 2e-16 ***
## regionSacramento           6.138e-02  1.736e-02   3.535 0.000408 ***
## regionSanDiego            -1.631e-01  1.736e-02  -9.396  < 2e-16 ***
## regionSanFrancisco         2.450e-01  1.737e-02  14.098  < 2e-16 ***
## regionSeattle             -1.178e-01  1.738e-02  -6.780 1.24e-11 ***
## regionSouthCarolina       -1.580e-01  1.736e-02  -9.098  < 2e-16 ***
## regionSouthCentral        -4.513e-01  1.799e-02 -25.094  < 2e-16 ***
## regionSoutheast           -1.517e-01  1.785e-02  -8.495  < 2e-16 ***
## regionSpokane             -1.157e-01  1.736e-02  -6.666 2.70e-11 ***
## regionStLouis             -1.308e-01  1.736e-02  -7.536 5.06e-14 ***
## regionSyracuse            -4.104e-02  1.736e-02  -2.364 0.018082 *  
## regionTampa               -1.509e-01  1.737e-02  -8.688  < 2e-16 ***
## regionTotalUS             -2.173e-01  2.167e-02 -10.028  < 2e-16 ***
## regionWest                -2.692e-01  1.811e-02 -14.864  < 2e-16 ***
## regionWestTexNewMexico    -3.092e-01  1.847e-02 -16.745  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2257 on 18017 degrees of freedom
## Multiple R-squared:  0.6899, Adjusted R-squared:  0.686 
## F-statistic: 173.5 on 231 and 18017 DF,  p-value: < 2.2e-16
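
The task also asks for cross-validation, which the code above does not yet perform. A minimal sketch of k-fold cross-validation in base R, run on simulated data rather than the avocado file; `kFoldRMSE`, `sim`, and its columns are hypothetical names introduced only for illustration:

```r
set.seed(42)  # reproducible fold assignment and simulated data

# Hypothetical stand-in for the avocado data
n   <- 300
sim <- data.frame(
  volume = runif(n, 1, 10),
  type   = factor(sample(c("conventional", "organic"), n, replace = TRUE))
)
sim$price <- 1 - 0.02 * sim$volume + 0.5 * (sim$type == "organic") +
  rnorm(n, sd = 0.2)

# Hypothetical helper: k-fold cross-validated RMSE for a linear model
kFoldRMSE <- function(formula, data, k = 5) {
  response <- all.vars(formula)[1]                    # name of the response column
  folds <- sample(rep(1:k, length.out = nrow(data)))  # random fold labels
  rmse  <- numeric(k)
  for (i in 1:k) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)         # fit on the other k-1 folds
    pred  <- predict(fit, newdata = test)      # predict the held-out fold
    rmse[i] <- sqrt(mean((test[[response]] - pred)^2))
  }
  rmse
}

cvRMSE <- kFoldRMSE(price ~ volume + type, sim, k = 5)
print(mean(cvRMSE))
```

Factors with many levels need care in this setup: `predict()` fails if a held-out fold contains a factor level that was absent from the training folds.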

Second task: Classification

Build a classification model that predicts the rating of a recipe from the other features in a recipe dataset. Again, provide a conclusion.

filepath <- "./Datasets/epi_r.csv"
recipes <- read.csv(filepath, header=TRUE)

library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
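# First 1000 recipes (not used by the models below)
subset <- recipes[1:1000,]
# Random forest on a few nutritional features; rows with missing values are dropped
myModel <- randomForest(as.factor(rating) ~ calories + protein + fat + sodium + advance.prep.required, data=recipes, na.action=na.exclude)

# Logistic regression on the same features: glm() needs a data argument and a binomial family.
# With a multi-level factor response, the first level counts as failure and all others as success.
glm(as.factor(rating) ~ calories + protein + fat + sodium + advance.prep.required, data=recipes, family=binomial)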
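
A conclusion about a classifier usually rests on how it performs on rows it has not seen. The sketch below shows one way to measure that with a hold-out split and a confusion matrix, using only base R. It runs on simulated data with a binary label, not on the recipe file or its multi-level rating; `sim`, `calories`, and `highRating` are hypothetical names chosen for illustration:

```r
set.seed(7)  # reproducible split and simulated data

# Hypothetical data: a binary label driven by one numeric feature
n    <- 400
sim  <- data.frame(calories = rnorm(n, mean = 500, sd = 150))
prob <- plogis((sim$calories - 500) / 50)
sim$highRating <- factor(ifelse(runif(n) < prob, "yes", "no"),
                         levels = c("no", "yes"))

# Hold out 25% of the rows for testing
testIdx <- sample(n, size = n / 4)
train   <- sim[-testIdx, ]
test    <- sim[testIdx, ]

# Logistic regression: models P(highRating == "yes")
fit <- glm(highRating ~ calories, data = train, family = binomial)

# Predicted probabilities on the held-out rows, thresholded at 0.5
p    <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(p > 0.5, "yes", "no"), levels = c("no", "yes"))

# Confusion matrix and overall accuracy
confusion <- table(actual = test$highRating, predicted = pred)
accuracy  <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

The same `table()` call works for a multi-class prediction, where the diagonal still counts the correctly classified rows.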