Algorithms for Modern Data Sets (COMP 3801)

Weekly Schedule



Fall 2020 Offering of COMP 3801

Instructor: Anil Maheshwari
Office: HP 5125B

Lectures: Lectures are Monday and Wednesday at 19:35 to 20:55 PM. The course as such is planned in the HyFlex teaching room AT 102. But we will see whether we can do the lectures in-person. This depends on the mask requirements during the lectures and how many students will be physically coming to the classroom. We will do a polling and determine what is the best strategy.

To begin with, our first lecture on Wednesday September 8th will be in-person. Zoom link is already sent via e-mail and/or can be accessed in Brighspace.

Office hours: Office hours TBA
Please feel free to send me email at and post your questions in the forum (once we figure out how to set this up in Brightspace). I will also entertain your questions immediately following the class on Mondays and Wednesdays. So if you have questions, please attend the lecture (or at least join towards the end of the lecture.)

Teaching Assistant:  They will be basically marking the assignments, tests and the final exam.

Course objectives:  Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics, epidemic spreads, search engines and social networks. Topics may include data mining, hashing, streaming, clustering, recommendation systems, link analysis, dimensionality reduction, online, social networking, game theoretic and probabilistic algorithms.

Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations,  proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 


Useful References related to various topics:


We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11, and possibly 10 and 6.  In addition to this there will be more material on Data Streaming from some research articles. In broad terms, some combination of the following topics:

Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms
Frequent Itemset
+ some probability+linear algebra as and when required.

Grading Scheme (Tentative):

- Assignments: 40%

Assignment #
Posted On (Tentative)
Due Date

September 15

October 1

October 6

November 1
TeX-File + Figure
November 3

November 19
November 18

December 8

- Tests: 30%
: Plan is to hold two tests

Test #
Date and Time
October  13
November 24

- Final Exam: 30% Scheduled by the Exam Services (Seems to be on December 14th at 7 to 9:30 PM)

Weekly Schedule

Week #
(Video Recordings
at Brightspace)
Week 0

What is this course about?
Introduction + Course Logistics

Week 0/1

Majority, Heavy Hitters, and
Introduction to Count-Min Sketch.

Linear Algebra - Symmetric Matrices


Read Section 10.1 of Notes

Section 4.1 of Notes
Week 2

Bloom Filters

Linear Algebra - Eigenvalues


Section 10.2 of Notes

Section 4.2 - 4.4 of Notes
Week 3

Estimating Frequency Moments
F_0 and F_2
Markov Chains

Section 10.3 of Notes + Chapter 4 of MMDS

Section 4.7 of Notes
Weeks 3/4
Estimating Frequency Moments
F_0 and F_2
Pagerank Algorithm

Section 4.7 of Notes
Weeks 4/5 Balls and Bins + Assignment 1 Solutions

Test - 1 (October 13th)
Exercises of Chapter 4 of Notes and COMP 2804 Textbook

The test takes place during the class time slot of 19:35 to 20:55 on Brightspace. It will be completely online.  Please don't physically come to the class. The syllabus includes everything covered in the first 5 weeks of this course. There will be 4-5 questions, similar in style to Assignment. The test will be made available in Brightspace by 7:35 PM on October 13th. It is due at 20:55 PM (with a cutoff time at 21:00). Any late submissions will not be accepted. Please submit one single pdf document. Only the last version will be kept and marked. If you have any questions, either please send me an e-mail or post it in the forum if it may be of interest to others.
Week 6 Stream Statistics over Sliding Window

Locality Sensitive Hashing
Stats on Sliding

Section 10.4 of Notes

Chapter 9 of Notes
Sections: 9.1-9.3
Section: 9.6.5

Study Break (Oct 25-29)

Week 7
LSH - Metric Spaces, Hamming and Near Neighbors, Matching Fingerprints  

Adwords Problem


Chapter in on Advertising on the Web

Section 11.1 (My notes have proofs using the Linear Programming formulation)
Week 8
Balance Algorithm

Week 9

Collaborative Filtering
Solutions of Problems in Assignment 2.


Chapter 9 of MMDS
Week 10
Collaborative Filtering Contd.

Dimensionality Reduction



Section 4.5 of my notes + Chapter 11.3 and 11.4 of MMDS
Week 10-11
CUR+ Assignment 3 Problems

Test-2 (November 24th)

Test -2 takes place on Wednesday Nov. 24th during the class time slot (7:30 PM to 9:00 PM) via the Brightspace system. The material includes Stream Statistics over sliding window, locality-sensitive hashing, Adwords and Online Matching, Collaborative Filtering and SVD and its applications to recommendation systems. 
Week 12 Clustering

(How to become an expert)


k-means (Llyod's algorithm, MMDS Section 7.3) and k-means++ paper (Arthur and Vassilvitskii, 8th ACM-SIAM Symposium on Discrete algorithms, 2007)

Section 11.5.1, 11.5.2 of my notes.
Randomized MWU (without proofs) - see Section 11.5.3 and 11.5.4 of my notes.
Week 12/13



Maximum Weight Independent Sets (Recoverable Values)
December 14, 2021

Final Exam

Final Exam takes place on December 14th. It starts at 7 PM and ends at 9:30 PM. You need to upload your answers in the brightspace system by 9:45 PM. No e-mail submissions please. The course content includes all the material covered in the lectures, assignments, and tests.

Important Considerations:

Late assignments are not accepted. Assignments submissions are handled electronically and there is no "grace period" with respect to a deadline. Technical problems do not exempt you from this requirement. You are advised to:
periodically upload your progress (e.g. upload your progress at least daily)
attempt to submit your final submission at least one hour in advance of the
due date and time.

Undergraduate Academic Advisor:

The Undergraduate Advisor for the School of Computer Science is available in Room 5302C HP; by telephone at 520-2600, ext. 4364; or by email at  The undergraduate advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and Writing Tutorial Services.

University Policies

We follow all the rules and regulations set by Carleton University, Dean of Science, and the School of Computer Science regarding accommodating students with any kind of need(s). Please consult with the appropriate authorities to see how you can be accommodated and please follow their procedures. For information about Carleton's academic year, including registration and withdrawal dates, see Carleton's Academic Calendar. Following is a standard list of recommendations that we have been advised to provide you with respect to whom to contact for the appropriate action(s):

Pregnancy Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Religious Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Academic Accommodations for Students with Disabilities
If you have a documented disability requiring academic accommodations in this course, please contact the Paul Menton Centre for Students with Disabilities (PMC) at 613-520-6608 or for a formal evaluation or contact your PMC coordinator to send your instructor your Letter of Accommodation at the beginning of the term. You must also contact the PMC no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with your instructor as soon as possible to ensure accommodation arrangements are made. For more details, visit the Paul Menton Centre website.

Survivors of Sexual Violence. 
As a community, Carleton University is committed to maintaining a positive learning, working and living environment where sexual violence will not be tolerated, and survivors are supported through academic accommodations as per Carleton's Sexual Violence Policy. For more information about the services available at the university and to obtain information about sexual violence and/or support, visit:

Accommodation for Student Activities.
 Carleton University recognizes the substantial benefits, both to the individual student and for the university, that result from a student participating in activities beyond the classroom experience. Reasonable accommodation must be provided to students who compete or perform at the national or international level. Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, see the policy.

Student Academic Integrity Policy. 
Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Examples of punishable offences include: plagiarism and unauthorized co-operation or collaboration.

Plagiarism. As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean of Science.

Unauthorized Co-operation or Collaboration. 
Senate policy states that "to ensure fairness and equity in assessment of term work, students shall not co-operate or collaborate in the completion of an academic assignment, in whole or in part, when the instructor has indicated that the assignment is to be completed on an individual basis". For this course, the following holds:


  1.  Monday September 13th class will not take place. Instead a video is posted in the Brightspace.
  2. Assignment 1 is posted.