Algorithms for Modern Data Sets (COMP 3801)
Fall 2018

Instructor: Anil Maheshwari
Office: Herzberg Building 5125B

Lectures: Monday & Wednesday  08:35 - 09:55 SA 404
Office hours: Wednesday 10:15 to 11:45 5125b HP
(Combined Office Hour with COMP 3804) 

Teaching Assistant:  Darryl Hill   (Office Hours:  Tuesday 10AM to 11:30AM in  HP 5356, starting from September 10th)              

Course objectives: Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics,  search engines, social networks, and machine learning.

Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations, proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 


General References:

Useful References related to various topics:


We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11, and possibly 10 and 6.  In addition to this there will be more material on Data Streaming from some research articles. In broad terms, some combination of the following topics:

Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms
Frequent Itemset
some probability+linear algebra as and when required.

Grading Scheme: (Tentative)

- 4 Assignments: 25%

  Assignment 1 --> PDF-File   LaTeX-file
  Assignment 2 --> PDF-File   LaTeX-file

- Mid-Term: 25%
- Final Exam: 50%

Fall 2018 Class-Wise Schedule:

Sep 05: Introduction to various topics in the course. A probability question from Newton-Pepys

Sep 10: Basic Linear Algebra: Concept of Eigenvalues/Eigenvectors, rank, column and row vector spaces. How they relate to this course. Linearity of Expectation for random variables with some applications.
Sep 12: Balls & Bins, Load Balancing, and Hashing - General Introduction.

Sep 17: Hashing Continued. Finding the Majority Element.
Sep 19:
Analysis of Kadane's Algorithm for Majority. Count-Min Sketch.
Sep 24: CMS + Bloom Filters
Sep 26:
A1 Due, CMS Analysis - How to find elements that occur at least n/k times in a stream of n elements. Bloom Filters, Web-Graphs

Oct 01: Page Rank - Web-Graph - Markovian Matrix - Principal Eigenvalue/vector (Section 5.1)
Oct 03: Page Rank - Dead Ends/Spider Traps/Teleporting/Computation of PageRank (Section 5.2+5.3)

Oct 10:
Concluding Page Rank + LSH (Chapter 3).
Oct 15:
LSH - MinHash Signatures - Jaccard Similarity.  [Chapter 3 in MMDS, Chapter in MyNotes under LSH]
Banding Technique and the function f(s)=1-(1-s^r)^b and its implications.
Oct 17:
A2 Due (On Friday Oct 19th by 11:45AM); Distance Measure - Metric - Sensitive family of functions, and the Banding Technique.

Oct 29:

Oct 31:

Nov 05: 
Nov 07:

Nov 12:

Nov 14:


Nov 19:
A3 Due
Nov 21:

Nov 26: 
Nov 28: 

Dec 03:
Dec 05:
A4 Due
              Review and some remarks on Final Exam.

What was done in Fall 2017 Term?

Sept 06:
Bright & Early - the first class of Year 2017-18. Good Luck :)
               Introductory remarks. What to expect. A quick summary of some of the topics that we will cover.
               A bit of Linear Algebra and Probability.

Sept 11:
Introduction to Eigenvalues/Vectors; Characteristic Polynomials; Markov Matrices - recurrent/transient, reducible/irreducible, period; (Q1: Computing Eigenvalues/vectors of an identity matrix.)
Sept 13: Eigenvalues of Markov Matrices. Principal Eigenvalue/Eigenvector of Special type of Markov Matrices.  Structure of the web graph.

Sept 18: Markov Matrices and their connection to Pagerank (Q2: Computing steady state of a 2-state Markov Chain.)
Sept 20:
More details on Pagerank, LSH

Sept 25:
LSH Continued [Chapter 3 in Book and/or See my course notes]
Sept 27: LSH + Metric; (Q3: Show that the probability that signatures match equals the Jaccard Similarity.)

Oct  02: A1 Due,
Locality Sensitive Family
Oct  04: Due date of Project Proposal (pdf file) Nearest Neighbors + Fingerprinting

Oct  09: Thanksgiving

Oct  11:  4 minute Presentation on Project Summary

Oct  16: Data Streaming Algorithms - Bloom Filters; Heavy Hitters; Estimating Moments; DGIM algorithm for counting ones in a window.
Oct  18: Data Streaming Algorithms Continued

Oct 30: Review of problems from A1
Nov 01: Mid-Term

Nov 06: CMS, Counting #1s in the last N bit of the stream (the DGIM algorithm).
Nov 08: A2 Due. Analysis of CMS.
Nov 13: Online Matching + Balance Algorithm - Lower Bounds
Nov 15: Balance Algorithm - Lower/Upper Bounds; Collaborative Filtering
Nov 20: Collaborative Filtering; Finding Missing Entries in the Utility Matrix; SVDs
Nov 22: Project Presentations; More on SVDs

Nov 27: Project Presentations; More on SVDs.
Nov 29: Project Presentations

Dec 04: Project Presentations
Dec 06:
Project Presentations 

Dec 08: Due date of Final/Edited Project Reports; A3 Due.  Some problems on K-Means + Some comments on SVD decomposition.

Dec 11: Final Exam. TB 206 at 2PM.

Undergraduate Academic Advisor:

The undergraduate advisor for the School of Computer Science is available in Room 5302C HP, by telephone at 520-2600, ext. 4364 or by email at The advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and the Writing Tutorial Services.
University Policies:  

Student Academic Integrity Policy: Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Some examples of offences are: plagiarism and unauthorized co-operation or collaboration. Information on this policy may be found in the Undergraduate Calendar.

Plagiarism: As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean.

Unauthorized Co-operation or Collaboration:
We follow all the rules and regulations set by Carleton University regarding accommodating students with any kind of need(s). Please consult with the appropriate authorities to see how you can be accommodated and please follow their procedures.

Equity Statements:

You may need special arrangements to meet your academic obligations during the term. For an accommodation request the processes are as follows:

Pregnancy obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website:

Religious obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website:

Academic Accommodations: The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). Requests made within two weeks will be reviewed on a case-by-case basis. After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website (  for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Medical Certificate: The following is a link to the official medical certificate accepted by Carleton University for the deferral of final examinations or assignments in undergraduate courses. To access the form, please go to

  1. Practice few simple problems in Probability and Linear Algebra using R/R-Studio
  2. Assignment 1 is posted - Please start early and this assignment will help you with the Probability and Linear Algebra background that we require in this course.
  3. Assignment 2 is posted