--- title: "Assignment1" author: "Aditya Maheshwari" date: '2018-08-17' output: html_document --- # Final Assessment This assessment involves applying the concepts you have learned so far. It is open book, but the datasets are different so keep that in mind. It is expected that you will submit working code at the end in a "knit" document in html format. The assessment has 2 questions, one will focus on regression and the other on classification. For both questions, first use filtering functions and some plots to give a feeling of what the data is like. Then remove features (columns) that are irrelevant if they are not needed. # First task: Regression Download the "avocado" dataset and perform a multiple linear regression to predict the price of the avocado given the other features. Do a proper cross validation. The code should appear in the code block below with comments specifying what you did. Also, at the end there should be some sort of conclusion. Make sure you explain in the conclusion which features are good and which ones are not good (based on hypothesis testing and results in summary of the model). ```{r avocado} filepath <- "./Datasets/avocado.csv" avocados <- read.csv(filepath, header=TRUE) myModel <- lm(AveragePrice ~ ., data=avocados) summary(myModel) ``` # Second task: Classification Build a classification model that can predict which rating given the rest of the features in a recipe dataset. Again provide some sort of conclusion. ```{r recipes} filepath <- "./Datasets/epi_r.csv" recipes <- read.csv(filepath, header=TRUE) library(randomForest) subset <- recipes[1:1000,] ``` ```{r Model} myModel <- randomForest(as.factor(rating) ~ calories + protein + fat + sodium + advance.prep.required, data=recipes, na.action=na.exclude) ``` glm(as.factor(rating) ~ .)