Introduction to Data Analytics/Science
June-July 2018


Instructors: Anil Maheshwari (AM), Aditya Maheshwari (AD), Saeed Mehrabi (SM)
E-mail: AM: anil@scs.carleton.ca, AD: aditya.yow@gmail.com, SM: smehrabi@gmail.com

Lectures: 09:30 - 11:30 AM in Richcraft 2311, June 21 - July 19, 2018

Course objectives: Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics, search engines, social networks, and machine learning.

Caution: The content of this course is fairly broad and covers a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp of the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory (including expectation and indicator random variables), basic data structures (lists, trees, hashing), and discrete mathematics (counting, permutations and combinations, and proof techniques such as induction and contradiction). Note that there will not be time to review this material. To appreciate the contents of this course, you must have a very good grasp of these topics, and preferably do an extra

General References:

Useful References related to various topics:



Tentative Topics

Some combination of the following topics:
(Instructors are listed in [ ].)

Week 0: June 21/22 [AM + AD]

  1. Randomly generate 100 numbers with mean 0 and standard deviation 1, and store them in a vector called 'two'.
  2. What is the expected value of the generated numbers, and what is their actual mean?
  3. Write a function that takes a number x and a seed y and returns x randomly generated numbers using seed y.
  4. Write a function that takes a vector as input and returns its mean, median, and mode in a vector.
  5. Generate 1000 numbers with mean 0 and standard deviation 1, and find how many of them are greater than 0.2 with a single command. Store these as a vector called 'six'.
  6. Multiply the vector 'six' by 'two' and store the result in 'tricky'. What is the result, and why is there a warning?
  7. Install the 'housingData' package and store the 'fipsCounty' data in a data frame.
  8. Which state appears the most times?
  9. Load the housing data into a data frame. Which state has the most houses sold? Which state has the highest average difference between list and selling price?
  10. Plot a graph of list price against selling price and find the outliers in the data. Does there appear to be a relationship between the two prices, and is there any similarity among the outliers?
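
The exercises above are meant to be worked in R. Purely as an illustration of the ideas in exercises 1-5, a minimal sketch in Python (standard library only) might look like the following. The function names generate_normals and summarize, and the seeds 2018 and 7, are assumptions made for this sketch and are not part of the exercises.

```python
import random
import statistics


def generate_normals(x, y):
    """Return x draws from a normal distribution with mean 0 and
    standard deviation 1, using seed y (exercise 3)."""
    rng = random.Random(y)  # private generator; the global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(x)]


def summarize(values):
    """Return [mean, median, mode] of a list of numbers (exercise 4).

    For data without repeated values, statistics.mode (Python 3.8+)
    returns the first value encountered, so the mode is only
    informative when values repeat.
    """
    return [
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    ]


# Exercise 1: 100 draws stored under the name used in the exercise.
two = generate_normals(100, 2018)  # the seed 2018 is arbitrary

# Exercise 2: the expected value is 0; the sample mean only approximates it.
sample_mean = statistics.mean(two)

# Exercise 5: count the draws above 0.2 in a single expression.
draws = generate_normals(1000, 7)  # the seed 7 is arbitrary
count_above = sum(v > 0.2 for v in draws)
```

Calling generate_normals twice with the same x and y returns the same numbers, which is the point of passing a seed. Exercise 6 concerns how R handles vectors of different lengths and has no direct counterpart in this sketch.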


Week 1: June 25-29 [AM + AD]

Week 2: July 3-6 [AM + AD]

Week 3: July 9-13 [AD]

  R-Introduction

  Tables and Graphs

  Linear Regression

  Classification

    • Logistic Regression
    • Decision Trees
    • Neural Nets

  Measuring Error

  Assignment-1

  Assignment-2

  Functional Programming in R

Week 4: July 16-19 [SM + AD]



Project Groups:

1. Members: Vedang, Tirth, Pranjal
   Topic: Face Recognition using Iris Scan. How to match the person in the database given the measurements from the iris scan.

2. Members: Bhaummi, Kajol, Vinita
   Topic: How to find the shops in geometric vicinity that match the given criteria of the user.

3. Members: Sanket, Nidhi, Shivam
   Topic: Fingerprint Recognition via the LSH scheme and studying the effect of varying the cell numbers for designing the hash functions.

4. Members: Vrajesh, Shruti, Aarsh
   Topic: Finding where to shop for groceries given the database of barcodes of produce with price, deals, expiry, etc.

5. Members: Panth, Ketul, Rushang
   Topic: How to analyze the success of businesses given their annual reports.

6. Members: Rahul, Smit, Meet
   Topic: How to find the best deals from online stores given multiple criteria including price, shipping time, buy local, etc.

7. Members: Juhi, Divyanshi
   Topic: Strategies for enabling remote desktops



Announcements:
1. Classes from June 21 to July 6 will be offered by Anil/Aditya, July 9-13 by Aditya, and July 16-19 by Saeed/Aditya.
2. Exercise 1 is posted.
3. Problem Set is posted (due July 3).
4. A short presentation related to your seminar topic is on Friday, June 29.
5. Mid-Term Quiz: Please email your answers to anil@scs.carleton.ca by 4 PM today.