Libraries

This section just ensures you have the correct libraries. When you run the code block, continue if no errors show up; otherwise, follow the instructions in the block or ask for help.

#install.packages("pryr")
data("cars")

# If tidyverse works
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.5
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# If tidyverse doesn't work:
  # 1. Comment out the "library(tidyverse)" line.
  # 2. Uncomment the below lines and run the block
#install.packages("dplyr")
#library(dplyr)

Introduction

Regression is a very important part of many statistical activities, ranging from simple prediction and displaying a trendline to serving as a component of sophisticated machine learning algorithms. Regression can also be used to select which features/variables in the data are useful, which matters when trying to reduce the size of a problem. At a high level, it is a mathematical technique used to predict the value of an outcome Y given input variables X.

Prerequisites

To understand regression, it’s important to know the basics of calculus and basic linear algebra (vectors, matrices, operations on matrices).

Some Regression Applications

  • Stock Prediction
  • Tweet Popularity
  • Reading the mind
  • House Prices
  • Really anything where there is data and you are trying to predict an output based on an input (interpolation or extrapolation)

Simple Linear Regression

Regression can be done in n dimensions, but the math remains very similar to the one-dimensional case. When regression uses only one input to predict an output, it is called simple linear regression.

Equation

The simple linear regression equation is:

\[ y_i = w_0 + w_1x_i + err_i \]

\(y_i\) stands for the output for input \(i\), \(w_0\) is the intercept term, \(w_1\) is the slope of the line, and \(err_i\) is the noise (error) term. Simple linear regression is simple because you are just fitting a line: this is the same equation for a line you have seen in calculus. The slightly complicated part is finding which line fits the set of points the “best”.
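
To see why “best” needs a definition, here is a quick sketch that draws two hand-picked candidate lines through the cars data used later in this notebook (the intercepts and slopes are arbitrary guesses, not fitted values):

# Two arbitrary candidate lines through the cars data (guesses, not fitted values)
plot(cars$speed, cars$dist, xlab="speed", ylab="dist")
abline(a=-10, b=3, col="red")    # candidate 1: w_0 = -10, w_1 = 3
abline(a=-20, b=4, col="blue")   # candidate 2: w_0 = -20, w_1 = 4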

Measuring Error

To decide what the best line is, we have to come up with a way of measuring how good one fit is compared to another. In many cases we can see which line fits best just by observation, but when two different lines look equally good we need a measure; here I will introduce a fairly standard one known as the residual sum of squares (RSS).

\[RSS = \sum_{i=1}^{n}(y_i - [w_0 + w_1x_i])^2\]

This is a relative measure of the distance from the real values to the fitted line. In the initial equation the noise (\(err_i\)) is the term that makes up the difference between the line and the observed value. Each difference is squared so that negative and positive distances count consistently, and so that points far from the line are penalized more heavily.
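
As a minimal sketch, the RSS of any candidate line can be computed directly from this formula (the rss function name and the two candidate lines below are just for illustration):

# Compute the RSS of a candidate line (w0, w1) on the cars data
rss <- function(w0, w1, x, y) {
  sum((y - (w0 + w1*x))^2)
}

# RSS of the two guessed lines from the plot above
rss(-10, 3, cars$speed, cars$dist)
rss(-20, 4, cars$speed, cars$dist)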

Gradient Descent

But how do we find the line with the minimum RSS?

You know from calculus that finding the minimum of something usually involves some sort of derivative. The RSS is a function of the intercept and slope, and exactly one line gives the least error, so as we vary the intercept and slope the RSS decreases towards a single minimum. (Think about why there can’t be two lines of minimum RSS.)

One algorithm used to find the minimizing line is gradient descent. It gives an approximately correct answer and works like hill climbing in reverse: it repeatedly steps downhill along the error surface. Gradient descent usually runs in the back end of most packages, but I will give a short description of the algorithm here. Many resources online can explain any step or piece of theory that is unclear.

The idea is that, given a step size (\(n\)), a weight vector (\(w\)), and a tolerance, we repeat the following update while not converged: \[w^{(t+1)} = w^{(t)} - n\nabla g(w^{(t)})\]

In the case of simple linear regression, we want: \[min_{w_0,w_1}\sum_{i=1}^{N}(y_i-[w_0+w_1 x_i])^2\]

So we need the gradient of the RSS expression with respect to \(w_0\) and \(w_1\), which (you can check it yourself) evaluates to:

\[ \nabla RSS(w_0, w_1) = \begin{bmatrix} -2\sum_{i=1}^{N}[y_i-(w_0+w_1x_i)] \\ -2\sum_{i=1}^{N}[y_i-(w_0+w_1x_i)]x_i \end{bmatrix} \]

This idea translates to more than two dimensions: in that case, rather than finding the minimum of a curve, we find the minimum of a surface (or hypersurface).
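
Here is a minimal sketch of gradient descent for simple linear regression on the cars data used below; the step size, tolerance, and iteration cap are illustrative choices, not tuned values:

# Gradient descent for simple linear regression on the built-in cars data
x <- cars$speed
y <- cars$dist

w <- c(0, 0)        # initial guess: intercept w0 = 0, slope w1 = 0
stepSize <- 5e-5    # the step size n in the update rule above
tolerance <- 0.01
maxIter <- 100000

for (iter in 1:maxIter) {
  resid <- y - (w[1] + w[2]*x)
  gradient <- c(-2*sum(resid),      # partial derivative of RSS w.r.t. w0
                -2*sum(resid*x))    # partial derivative of RSS w.r.t. w1
  if (sqrt(sum(gradient^2)) < tolerance) break
  w <- w - stepSize*gradient
}

w   # should be close to coef(lm(dist ~ speed, data=cars))

Because the inputs are not rescaled, the loop may need tens of thousands of (very cheap) iterations to converge; packages use smarter tricks, but the idea is the same.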

Regression Package and Application

To demonstrate linear regression in R, we will use the built-in “cars” dataset, which records the speed and stopping distance of 50 cars.

cars
##    speed dist
## 1      4    2
## 2      4   10
## 3      7    4
## 4      7   22
## 5      8   16
## 6      9   10
## 7     10   18
## 8     10   26
## 9     10   34
## 10    11   17
## 11    11   28
## 12    12   14
## 13    12   20
## 14    12   24
## 15    12   28
## 16    13   26
## 17    13   34
## 18    13   34
## 19    13   46
## 20    14   26
## 21    14   36
## 22    14   60
## 23    14   80
## 24    15   20
## 25    15   26
## 26    15   54
## 27    16   32
## 28    16   40
## 29    17   32
## 30    17   40
## 31    17   50
## 32    18   42
## 33    18   56
## 34    18   76
## 35    18   84
## 36    19   36
## 37    19   46
## 38    19   68
## 39    20   32
## 40    20   48
## 41    20   52
## 42    20   56
## 43    20   64
## 44    22   66
## 45    23   54
## 46    24   70
## 47    24   92
## 48    24   93
## 49    24  120
## 50    25   85

Filtering Questions

The following questions are some filtering tasks to help you get a better feel for the data (one possible dplyr pattern is sketched after the list):

  1. What’s the average speed of cars with distance larger than 30?

  2. What’s the average distance for cars with speed larger than 12?

  3. What’s the first quartile for the speed of cars with speed greater than 12 and distance greater than 50?

  4. For cars with distance greater than 80, is the mean or median of the speed higher?
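
As mentioned above, one possible (dplyr) pattern for these questions is sketched below; the thresholds shown match questions 1 and 3, and the other questions follow the same shape:

# Example pattern: filter the rows you care about, then summarise
cars %>%
  filter(dist > 30) %>%
  summarise(avg_speed = mean(speed))

# Quartiles can be pulled out with quantile(), e.g. the first quartile of speed
cars %>%
  filter(speed > 12, dist > 50) %>%
  pull(speed) %>%
  quantile(0.25)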

Plotting

First we should plot the variables. Here I will explore a few more specific plots, compared to the freestyle ones you used in Titanic.

We will explore scatter plots (which can reveal a trend), boxplots (which can identify outliers), and density plots (to check that the response variable is roughly normal and does not deviate too much).

# Scatter Plots
scatter.smooth(cars$speed, cars$dist, main="speed vs dist")

# Box Plots
par(mfrow=c(1,2))
boxplot(cars$speed,main="speed")
boxplot(cars$dist,main="dist")

# Density Plots
plot(density(cars$speed), main="speed")
plot(density(cars$dist), main="dist")

Correlations

Next we can check the correlation of the two variables to see if they are in fact related. Correlation is a measure of how strongly linear the relationship between the variables is. The correlation coefficient is calculated as follows: \[r_{xy} = \frac{s_{xy}}{s_x s_y}\] Note, \(s_{xy}\) is the covariance while \(s_x\) and \(s_y\) are the standard deviations.

The (sample) standard deviation is calculated with: \[s_x=\sqrt{\frac{\sum{(x_i - \bar{x})^2}}{n-1}}\] The (sample) covariance is calculated with: \[s_{xy}=\frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{n-1}\]

  1. QUESTION: Given that information, what range of values can the correlation coefficient take? Prove why those bounds hold.

A correlation coefficient close to 0 implies there is a weak linear relation, and positive or negative values close to the boundaries imply a strong positive linear or negative linear relationship, respectively. To explain this in more formal terms, a correlation coefficient close to 0 suggests that the variation in the response variable (Y) is largely unexplained by the predictor (X).

In R, you can calculate the correlation coefficient as follows:

cor(cars$speed, cars$dist)
## [1] 0.8068949
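
As a quick sanity check of the formula above, you can rebuild the same number from cov() and sd() (which use the n − 1 sample versions):

# Correlation rebuilt from covariance and standard deviations
cov(cars$speed, cars$dist) / (sd(cars$speed) * sd(cars$dist))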

Linear Model

This sub section will cover building the linear model and analyzing it.

Creating a linear model

In R, you can define and save a linear model (or any model) the same way you save functions or variables (with the assignment operator, “<-”). Below, we will create a linear model predicting distance from speed:

linearModel <- lm(dist ~ speed, data=cars)

print(linearModel)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Coefficients:
## (Intercept)        speed  
##     -17.579        3.932
  1. QUESTION: In the model above, what does “Intercept” and “speed” stand for using the regression equation defined at the beginning of this notebook?

Summarize specific features

We can see more information below:

summary(linearModel)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Make predictions

Here is a method for using the linear model above to predict data for new points:

# first we set a seed
set.seed(100) 

# we will separate some data to use for testing our model (in this case 10 percent)
testingRowIndex <- sample(1:nrow(cars), 0.1*nrow(cars))
testData  <- cars[testingRowIndex, ]

# predict distance from speed inputs
distPred <- predict(linearModel, testData)  

summary (linearModel)  # model summary
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Measure error

Now we can make a table which checks how close the predictions are to the actual values, and calculate the residual sum of squares.

actuals_preds <- data.frame(cbind(actuals=testData$dist, predicteds=distPred))  # make actuals_predicteds dataframe.
correlation_accuracy <- cor(actuals_preds)  # 82.7%
head(actuals_preds)
##    actuals predicteds
## 16      26  33.542219
## 13      20  29.609810
## 27      32  45.339445
## 3        4   9.947766
## 22      60  37.474628
# Residual Sum Squares
sum((actuals_preds$predicteds - actuals_preds$actuals) ** 2)
## [1] 869.9426

Plotting

Finally, we will plot our regression line on the points we have from the data using the following code:

plot(cars$speed, cars$dist)
abline(linearModel)

Conclusion

We can see that the data has a lot of variation, and therefore the linear model only does an okay job of predicting.

Small Tasks and Questions

  1. TASK: Train the model on a different training set by setting the seed to 10, then calculate the residual sum of squares… is it different? Why?

NOTE: You do this by setting a new seed (10) and rerunning the above code. It is a good idea to do this in a separate R Script or R Markdown Notebook.

  2. QUESTION: Looking at the plots and the model you just built, what distance do you think a car travelling at speed 30 would travel?

  3. TASK: What predictions do you get for the following set of speeds: 2, 3, 5, 15, 26, 30?

HINT: Put the data in the correct form using “as.data.frame” and a vector of the required speeds.

2nd HINT: Use the “colnames” function to give the dataframe column the right name for predictions, then predict using the method above.
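
If you get stuck, here is one minimal sketch of how the hints fit together (not the only possible solution):

# Put the new speeds into a dataframe whose column name matches the model input
newSpeeds <- as.data.frame(c(2, 3, 5, 15, 26, 30))
colnames(newSpeeds) <- "speed"

# Predict distances for the new speeds with the model built above
predict(linearModel, newSpeeds)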

Multiple Linear Regression

Multiple Linear Regression (MLR) works almost the same mechanically as simple linear regression (SLR), except that it accounts for multiple inputs. At a high level, the formula changes from \(y_i = w_0 + w_1x_i + err_i\) to \(y_i = w_0 + w_1x_{i1} + w_2x_{i2} + ... + err_i\)

Code for multiple regression

Code for it is as follows:

swissModel <- lm(Infant.Mortality ~ Fertility + Agriculture + Catholic, data=swiss)

summary(swissModel)
## 
## Call:
## lm(formula = Infant.Mortality ~ Fertility + Agriculture + Catholic, 
##     data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3672 -1.4970  0.0909  1.6089  5.3076 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.565009   2.362940   5.741  8.7e-07 ***
## Fertility    0.112074   0.036105   3.104  0.00337 ** 
## Agriculture -0.032335   0.019207  -1.683  0.09953 .  
## Catholic     0.003754   0.011045   0.340  0.73560    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.652 on 43 degrees of freedom
## Multiple R-squared:  0.225,  Adjusted R-squared:  0.1709 
## F-statistic: 4.161 on 3 and 43 DF,  p-value: 0.01128

Interpreting Results

Now is a good time to understand what the summary command says (a short sketch for pulling these numbers out of the model programmatically follows this list).

  • The first element (“Call”) tells you back the formula that was inputted.
  • The residuals section shows how the residuals are distributed. Residuals are the differences between the actual points and the model predictions; if the residuals are distributed symmetrically around 0, that is a good sign. It is usually a good idea to plot the residuals and check their distribution, though that may not always be available to you. Generally, if you see an even spread around 0 with no pattern of dispersion, it is a good sign.

plot(swissModel$residuals, main="Residuals Plot")
abline(h=0)

  • The coefficients have many components:
  • The first one is the estimate of the value of \(w_0\), \(w_1\) etc. in the regression equation.
    • In this case, the estimates say that with no other factors infant mortality will be about 13.565, and with every unit increase in Fertility the predicted infant mortality increases by about 0.11, and so on for the other features.
  • The standard error estimates the average amount that the estimate would vary from the actual response. This means (intuitively) that we want the standard error to be small compared to the estimate. Using that logic, it appears that in the output above Fertility is the only really good indicator, and Catholic is certainly a bad one.
  • The coefficient t-value represents how far (how many standard errors) the estimate is from 0. A value close to zero implies that this factor does not impact the output variable much. Therefore, for a useful predictor, we want this value to be as far from zero as possible (such as the intercept, Fertility, and even Agriculture).
  • Pr(>|t|) is the p-value which is used in hypothesis testing. To give a tldr version of hypothesis testing:
    • You are almost always testing whether the null hypothesis is true (meaning that the input you are testing has no effect on the response). If the p-value is larger than 0.05 you typically fail to reject the null hypothesis; otherwise you can reject it, and rejecting it means the input is important.
    • Typically a predictor fails this test for a reason discussed above (i.e. too much variation relative to the estimate, or a t-value too close to 0).
  • In the last section, there are many more measures of “goodness of fit”, of which the most notable is the “Adjusted R-squared”. The adjusted R-squared estimates how much of the variation in the response variable is explained by the inputs (in this case about 17%), which is poor.
  • Some people also look at the F-statistic, which is judged by how much larger it is than 1 (taking the number of variables and observations into account). Either way, the point is to assess how good the “fit” is.
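
As mentioned above, if you want these numbers programmatically rather than reading them off the printout, they can be pulled straight out of the fitted model; a short sketch:

# Pull pieces of the summary out of the fitted model
coef(swissModel)                    # the estimates w0, w1, ...
summary(swissModel)$coefficients    # estimates, std. errors, t values, p-values
summary(swissModel)$adj.r.squared   # adjusted R-squared
confint(swissModel)                 # 95% confidence intervals for the coefficients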

Discarding Variables

One byproduct of all of these pieces of information is that we can now discard features that add little value. That way, when predicting we can input fewer fields and run the model on fewer variables (both saving runtime).

There are two easy ways you can choose to discard variables:

    1. Remove features with a high p-value (see the sketch after this list).
    2. Try all subsets of features and choose the combination with the highest adjusted R-squared.
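
As a sketch of the first option, the Catholic term (which had the highest p-value above) could be dropped with update(), which refits the model with a modified formula:

# Refit the swiss model without the weakest predictor
swissModelReduced <- update(swissModel, . ~ . - Catholic)
summary(swissModelReduced)

# Compare adjusted R-squared before and after dropping the variable
summary(swissModel)$adj.r.squared
summary(swissModelReduced)$adj.r.squared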

Summary

In summary, all of the above methods are used a lot in traditional experiments, and are a large part of statistical theory. These diagnostics are often skipped today, but they are still good to know about: if you get odd results in the future, they can point you to the cause. Another very important part of these applications is the ability to discard useless variables.

Group Task

Questions:

Answer the questions from above (except for the proof in Q5, they are all numerical or TRUE/FALSE).

Fill in Answers for All Questions (above)

Task:

You will get a dataset and be required to create a model which can make predictions. The datasets will be split among the groups.

The task is to follow the form of the notebook above to create something in a similar style, and then, as a group, give a 2-3 minute presentation explaining what you have learned from the data.

You must:

  • Use some filters to highlight features of the data
  • Plot the data (use different plots for different things)
  • Build a linear model
  • Split the data and do some predictions
  • Measure the accuracy
  • Find some conclusion (i.e. which of the multiple inputs predicts the output best)

If you want to save a LOT of time, figure out how to do multiple regression and use all 3 inputs at once to predict the output.

Group1

Predict rating [output] given: complaints, privileges, learning [inputs]

data("attitude")
rating <- attitude[[1]]
complaints <- attitude[[2]]
privileges <- attitude[[3]]
learning <- attitude[[4]]

Group2

Predict rating [output] given: raises, critical, advance [inputs]

data("attitude")
rating <- attitude[[1]]
raises <- attitude[[5]]
critical <- attitude[[6]]
advance <- attitude[[7]]
attitude
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
## 11     64         53         53       58     58       67      34
## 12     67         60         47       39     59       74      41
## 13     69         62         57       42     55       63      25
## 14     68         83         83       45     59       77      35
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 17     74         85         64       69     79       79      63
## 18     65         60         65       75     55       80      60
## 19     65         70         46       57     75       85      46
## 20     50         58         68       54     64       78      52
## 21     50         40         33       34     43       64      33
## 22     64         61         52       62     66       80      41
## 23     53         66         52       50     63       80      37
## 24     40         37         42       58     50       57      49
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39

Group3

Predict Fertility[output] given: Infant.Mortality, Education, Agriculture [inputs]

data("swiss")
#output(y)
Fertility <- swiss[[1]]

Infant.Mortality <- swiss[[6]]
Education <- swiss[[4]]
Agriculture <- swiss[[2]]
#swiss

Group4

Predict Infant Mortality [output] given: Fertility, Agriculture, Catholic [input]

data("swiss")
#output(y)
Infant.Mortality <- swiss[[6]]

#input(x)
Fertility <- swiss[[1]]
Agriculture <- swiss[[2]]
Catholic <- swiss[[5]]
swiss
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Aigle             64.1        62.0          21        12     8.52
## Aubonne           66.9        67.5          14         7     2.27
## Avenches          68.9        60.7          19        12     4.43
## Cossonay          61.7        69.3          22         5     2.82
## Echallens         68.3        72.6          18         2    24.20
## Grandson          71.7        34.0          17         8     3.30
## Lausanne          55.7        19.4          26        28    12.11
## La Vallee         54.3        15.2          31        20     2.15
## Lavaux            65.1        73.0          19         9     2.84
## Morges            65.5        59.8          22        10     5.23
## Moudon            65.0        55.1          14         3     4.52
## Nyone             56.6        50.9          22        12    15.14
## Orbe              57.4        54.1          20         6     4.20
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Vevey             58.3        26.8          25        19    18.46
## Yverdon           65.4        49.5          15         8     6.10
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Boudry            70.4        38.4          26        12     5.62
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Aigle                    16.5
## Aubonne                  19.1
## Avenches                 22.7
## Cossonay                 18.7
## Echallens                21.2
## Grandson                 20.0
## Lausanne                 20.2
## La Vallee                10.8
## Lavaux                   20.0
## Morges                   18.0
## Moudon                   22.4
## Nyone                    16.7
## Orbe                     15.3
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Rolle                    16.3
## Vevey                    20.9
## Yverdon                  22.5
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## Boudry                   20.3
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## Val de Ruz               20.0
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3

Extra Topic: Polynomial Regression

It is possible you are wondering: why only linear regressions? Even in many dimensions, aren’t some curves better approximated with polynomials?

You are right! There is polynomial regression too, and it looks surprisingly similar to linear regression. I will show a quick example here, plotting functions directly in R. By the way, plotting functions in R can be useful for non-statistics reasons as well!

set.seed(20)

#values in sequence
q <- seq(from=0, to=20, by=0.1)
#true (noise-free) values of the function
y <- 500 + .4*(q-10)^3

# add some noise so graph looks better
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise

# plot points
plot(q, noisy.y, col='skyblue4', xlab='q')
lines(q,y,col='firebrick1', lwd=3)

# Cubic vs Quadratic vs Linear Model
modelCube <- lm(noisy.y ~ poly(q,3))
modelQuad <- lm(noisy.y ~ poly(q,2))
modelLinr <- lm(noisy.y ~ poly(q,1))

# Summarize
summary(modelCube)
## 
## Call:
## lm(formula = noisy.y ~ poly(q, 3))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -212.326  -51.186    4.276   61.485  165.960 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  513.615      5.602   91.69   <2e-16 ***
## poly(q, 3)1 2075.899     79.422   26.14   <2e-16 ***
## poly(q, 3)2 -108.004     79.422   -1.36    0.175    
## poly(q, 3)3  864.025     79.422   10.88   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 79.42 on 197 degrees of freedom
## Multiple R-squared:  0.8031, Adjusted R-squared:  0.8001 
## F-statistic: 267.8 on 3 and 197 DF,  p-value: < 2.2e-16
summary(modelQuad)
## 
## Call:
## lm(formula = noisy.y ~ poly(q, 2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -315.977  -61.517    3.223   68.062  233.512 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   513.62       7.07  72.649   <2e-16 ***
## poly(q, 2)1  2075.90     100.23  20.711   <2e-16 ***
## poly(q, 2)2  -108.00     100.23  -1.078    0.283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 100.2 on 198 degrees of freedom
## Multiple R-squared:  0.6848, Adjusted R-squared:  0.6816 
## F-statistic: 215.1 on 2 and 198 DF,  p-value: < 2.2e-16
summary(modelLinr)
## 
## Call:
## lm(formula = noisy.y ~ poly(q, 1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -329.81  -63.50    6.30   65.52  217.23 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  513.615      7.073   72.62   <2e-16 ***
## poly(q, 1)  2075.899    100.272   20.70   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 100.3 on 199 degrees of freedom
## Multiple R-squared:  0.6829, Adjusted R-squared:  0.6813 
## F-statistic: 428.6 on 1 and 199 DF,  p-value: < 2.2e-16
# Predictions + Lines
df <- data.frame(x=q,y)

predicted.intervals.Cube <- predict(modelCube,df)
lines(q,predicted.intervals.Cube, col='tan3', lwd=3)

predicted.intervals.Quad <- predict(modelQuad,df)
lines(q,predicted.intervals.Quad, col='darkolivegreen2', lwd=3)

predicted.intervals.Lin <- predict(modelLinr,df)
lines(q,predicted.intervals.Lin, col='steelblue4', lwd=3)

# Residual Sum of Squares (measured against the noise-free curve y, not noisy.y)
RSScube <- sum((predicted.intervals.Cube - y)**2)
RSSquad <- sum((predicted.intervals.Quad - y)**2)
RSSlin <- sum((predicted.intervals.Lin - y)**2)

# Print Results 
print(paste("RSScube (Tan): ", RSScube))
## [1] "RSScube (Tan):  55675.9249100325"
print(paste("RSSquad (Green): ", RSSquad))
## [1] "RSSquad (Green):  812791.814722302"
print(paste("RSSlin (Blue): ", RSSlin))
## [1] "RSSlin (Blue):  801127.027085057"