Algorithms for Modern Data Sets (COMP 3801)
Fall 2017

Instructor: Anil Maheshwari
Office: Herzberg Building 5125B

Lectures: Monday & Wednesday  08:35 - 09:55 CB 2202
Office hours: Tuesday 10:00 to 11:30 5125b HP. 

Teaching assistants: It is unlikely we will have a TA for this course.

Course objectives: Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics,  search engines, social networks, and machine learning.

Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations, proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 


General References:

Useful References related to various topics:


Some combination of the following topics:
Link Analysis

Finding Similar Items - Locality Sensitive Hashing

Mining Data Streams - Frequency and Moment Estimates

Advertising on the web - Adwords & Online matching.

Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms, ...

Recommendation Systems - Collaborative Filtering

Dimensionality Reduction - Eigenvalues, PCA, SVD


Grading Scheme: TBD

Will involve some combination of Scribing, Assignments, In class tests and quiz(s), Seminar/Presentation/Project.

What is done in Fall 2017 Term:

Sept 06:
Bright & Early - the first class of Year 2017-18. Good Luck :)
               Introductory remarks. What to expect. A quick summary of some of the topics that we will cover.
               A bit of Linear Algebra and Probability.

Sept 11:
Sept 13:

Sept 18:
Sept 20:

Sept 25:
Sept 27:

Oct  02:
Oct  04:

Oct  09: Thanksgiving

Oct  11:

Oct  16:
Oct  18:

Oct 30:
Nov 01:

Nov 06:
Nov 08:
Nov 13:
Nov 15:
Nov 20:
Nov 22:
Nov 27:
Nov 29:

Dec 04:
Dec 06:
Dec 08:

Some of the projects from the past offering of this course:

Fall 17: Music Collaboration Graph; Counting Distinct Elements in Streams; Finding Regions of Interest in ImagesPredicting controversy of a Wikipedia Article

What follows below are the contents from the 1st offering of this course from Fall 2016. The format will likely change this term as we have over 20 students registered in the course:

Grading Scheme:
Assignment 1  (PDF FILE     LaTeX File)

Assignment 2  (PDF-File       LaTeX-File)

Assignment 3  There are two parts to this assignment.
Part 1: Take any four topics from the course - list what was the main idea and what were the main techniques used in your opinion for each of these topics.
Part 2: Do the same thing as Part 1 for the seminar/projects. Pick any two seminars (except your own) and highlight what was the main idea and what were the main techniques used.
The whole assignment should not exceed 3 pages. Think of writing two paragraphs for each topic - one paragraph highlighting the main idea and the other one highlighting the main technique.
What is done in Fall 2016 Term:

Sep 07:
What is a Matrix - Linear Transforms - Eigenvalues - Eigenvectors - What are Markov Matrices and their Eigenvalues. Other than the Wikipedia links on these topics, you may look at these links: 1 2  Book on Linear Algebra and its Applications by Gilbert Strang from MIT is a good source.

Sep 09: More on Eigenvalues, diagonalization and factorization of square matrices.  Page Rank Algorithm. 
Sep 14:
Eigenvalues of powers of A. PageRank Algorithm. Some useful links:
Sep 16: Pagerank + Data Streaming
  Consult the MMDS book for Data Streaming.

Sep 21: Data Streaming - Why simple sampling is not sufficient.

Sep 23:
Data Streaming - Bloom Filters

Sep 28:
Data Streaming - Flajolet - Martin algorithm for estimating distinct elements. An algorithm for Estimating Frequency Moments  (Alon, Matias, and Szegedy - JCSS 1999).

Sep 30: Handing of Project Proposal
             Data Streaming -  Analysis of AMS algorithm -  How to extend it for unknown value of n, how to estimate number of 1's in last N bits.

Oct 05:
How to estimate number of 1's in last N bits - algorithm of Datar, Gionis, Indyk, and Motwani from SIAM Jl. Computing 2002, and some of its applications.

Oct 07:
Introduction to Locality Sensitive Hashing (Consult Book and/or my Course Notes on Algorithm Design.)

Oct 12:
LSH Continued - Analysis, Applications in Document Matching, Fingerprint Matching.

Oct 14:
More Applications of LSH (Metric,  AND/OR Family of Functions, Applications using Hamming Distance & finding close points.

Oct 19:
  Due date for Assignment 1.
Matching Fingerprints             
              Advertising on the web - online algorithms for matching.

Oct 21:
Advertising on the web - online algorithms.

Nov 02:
Project Previews: Short 5-7 minute presentation on your projects. Balance Algorithm

Nov 04:
Finishing up Adwards - including variations of Balance Algorithm.

Nov 09:
Clustering: K-Means and K++-Means.  Algorithm by Arthur and Vassilvitski (SODA 2007)

Nov 11: 
Finishing up K++-Means and Introduction to Recommendation Systems.

Nov 16:
Recommendation Systems: Writing utility matrix as a product of two matrices. Introduction to SVD.
              Due Date for Assignment 2

Nov 18:
Dimensionality Reduction: SVD,  its variants, and applicationsCUR Decomposition by Mahoney and Drineas (PNAS 2008)

Nov 23:
Social Network as a Graph G: Second Eigenvalue and its applications to partitioning G. Counting triangles in G.

Nov 25:
Project Presentation  (BR--> Presentation Report) + Quiz

Nov 30:
Project Presentation (OB --> Presentation Report,   CE --> Presentation   Report)) + Quiz

Dec 02:
Project Presentation (CA   --> Presentation  Report,  TG - Presentation  Report)+ Quiz        

Dec 07:
Assignment 3 is due at Baker's Breakfast (4th floor University Center) at 8:30AM.

Undergraduate Academic Advisor:

The undergraduate advisor for the School of Computer Science is available in Room 5302C HP, by telephone at 520-2600, ext. 4364 or by email at The advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and the Writing Tutorial Services.
University Policies:

Student Academic Integrity Policy: Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Some examples of offences are: plagiarism and unauthorized co-operation or collaboration. Information on this policy may be found in the Undergraduate Calendar.

Plagiarism: As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean.

Unauthorized Co-operation or Collaboration:
Equity Statements:

You may need special arrangements to meet your academic obligations during the term. For an accommodation request the processes are as follows:

Pregnancy obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website:

Religious obligation: write to me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details visit the Equity Services website:

Academic Accommodations: The Paul Menton Centre for Students with Disabilities (PMC) provides services to students with Learning Disabilities (LD), psychiatric/mental health disabilities, Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorders (ASD), chronic medical conditions, and impairments in mobility, hearing, and vision. If you have a disability requiring academic accommodations in this course, please contact PMC at 613-520-6608 or for a formal evaluation. If you are already registered with the PMC, contact your PMC coordinator to send me your Letter of Accommodation at the beginning of the term, and no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). Requests made within two weeks will be reviewed on a case-by-case basis. After requesting accommodation from PMC, meet with me to ensure accommodation arrangements are made. Please consult the PMC website (  for the deadline to request accommodations for the formally-scheduled exam (if applicable).

Medical Certificate: The following is a link to the official medical certificate accepted by Carleton University for the deferral of final examinations or assignments in undergraduate courses. To access the form, please go to