Algorithms for Modern Data Sets (COMP 3801)
Fall 2018


Instructor: Anil Maheshwari
Office: Herzberg Building 5125B
E-mail: anil@scs.carleton.ca

Lectures: Monday & Wednesday  08:35 - 09:55 SA 404
Office hours: Wednesday 10:15 to 11:45 5125b HP
                       
(Combined Office Hour with COMP 3804) 


Teaching Assistant:  Darryl Hill   (Office Hours:  Tuesday 10AM to 11:30AM in  HP 5356, starting from September 10th)              

Course objectives: Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics,  search engines, social networks, and machine learning.

Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations, proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 

Textbooks:

General References:

Useful References related to various topics:


Topics

We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11, and possibly 10 and 6.  In addition to this there will be more material on Data Streaming from some research articles. In broad terms, some combination of the following topics:

Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms
Frequent Itemset
+
some probability+linear algebra as and when required.


Grading Scheme: (Tentative)

- 4 Assignments: 25%

  Assignment 1 --> PDF-File   LaTeX-file
  Assignment 2 --> PDF-File   LaTeX-file
  Assignment 3 --> PDF-File   LaTeX-file
  Assignment 4 --> PDF-File   LaTeX-file

- Mid-Term: 25%
- Final Exam: 50%





Fall 2018 Class-Wise Schedule:

Sep 05: Introduction to various topics in the course. A probability question from Newton-Pepys

Sep 10: Basic Linear Algebra: Concept of Eigenvalues/Eigenvectors, rank, column and row vector spaces. How they relate to this course. Linearity of Expectation for random variables with some applications.
Sep 12: Balls & Bins, Load Balancing, and Hashing - General Introduction.

Sep 17: Hashing Continued. Finding the Majority Element.
Sep 19:
Analysis of Kadane's Algorithm for Majority. Count-Min Sketch.
 
Sep 24: CMS + Bloom Filters
Sep 26:
A1 Due, CMS Analysis - How to find elements that occur at least n/k times in a stream of n elements. Bloom Filters, Web-Graphs

Oct 01: Page Rank - Web-Graph - Markovian Matrix - Principal Eigenvalue/vector (Section 5.1)
Oct 03: Page Rank - Dead Ends/Spider Traps/Teleporting/Computation of PageRank (Section 5.2+5.3)

Oct 10:
Concluding Page Rank + LSH (Chapter 3).
 
Oct 15:
LSH - MinHash Signatures - Jaccard Similarity.  [Chapter 3 in MMDS, Chapter in MyNotes under LSH]
Banding Technique and the function f(s)=1-(1-s^r)^b and its implications.
Oct 17:
A2 Due (On Friday Oct 19th by 11:45AM); Distance Measure - Metric - Sensitive family of functions, and the Banding Technique.
 

Oct 29:
Fingerprinting [Chapter 3 of textbook + MyNotes]
Oct 31:
MID-TERM (PDF, TeX)

Nov 05: Fingerprints + Sensitive Families
Nov 07: Adwords Problem

Nov 12:
Adwords Problem and Balance Algorithm [Chapter 8 of TextBook]
Nov 14:
Finishing Proof(s) of Balance Algorithm + Recommendation Systems [Chapter 9 of TextBook]
 

Nov 19:
K-Clustering and K++-Clustering [Section 7.3]. Intro to Recommendation Systems [Sections 9.3+9.4]
Nov 21:
A3 Due  Utility Matrix, Cosine Distance and UV decomposition.
 
Nov 26: 
Introduction to SVDs expressing a Matrix as sum of tensors + Counting 1s in a stream
               [ Sections 11.3 + Sections 4.6 + Chapter on Matrix for CS in My Notes]
Nov 28:  SVD + Counting 1s.

Dec 03: Dimensionality Reduction using SVDs + Counting 1s.
Dec 05:
A4 Due
              Counting 1s, review, and some remarks on Final Exam.

Dec 10: Final Exam - All Material Covered in the course in classes, assignments, from Sept 5- Dec. 5 (inclusive).
              For exam preparation: Review Assignments, Review Solved and Unsolved Problems in the Text Book/Notes. Exam consists of 6 Questions that cover the main topics in the course. Exam starts at 9AM (please figure out in which room it is in - TB 238?) and is for 2.5 Hours.



Undergraduate Academic Advisor:

The undergraduate advisor for the School of Computer Science is available in Room 5302C HP, by telephone at 520-2600, ext. 4364 or by email at undergraduate_advisor@scs.carleton.ca. The advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and the Writing Tutorial Services.
University Policies:  

Student Academic Integrity Policy: Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Some examples of offences are: plagiarism and unauthorized co-operation or collaboration. Information on this policy may be found in the Undergraduate Calendar.

Plagiarism: As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean.

Unauthorized Co-operation or Collaboration:
We follow all the rules and regulations set by Carleton University regarding accommodating students with any kind of need(s). Please consult with the appropriate authorities to see how you can be accommodated and please follow their procedures.

Equity Statements:

You may need special arrangements to meet your academic obligations during the term. For an accommodation request the processes are as follows:

Pregnancy obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website: http://www2.carleton.ca/equity/

Religious obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website: http://www2.carleton.ca/equity/

Academic Accommodations: The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or pmc@carleton.ca for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). Requests made within two weeks will be reviewed on a case-by-case basis. After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website (www.carleton.ca/pmc)  for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Medical Certificate: The following is a link to the official medical certificate accepted by Carleton University for the deferral of final examinations or assignments in undergraduate courses. To access the form, please go to http://www1.carleton.ca/registrar/forms/


Announcements:
  1. Practice few simple problems in Probability and Linear Algebra using R/R-Studio
  2. Assignment 1 is posted - Please start early and this assignment will help you with the Probability and Linear Algebra background that we require in this course.
  3. Assignment 2 is posted
  4. Assignment 3 is posted
  5. Assignment 4 is posted
  6. Please remember to finish teaching evaluation!