Algorithms for Modern Data Sets (COMP 3801)   

Weekly Schedule

Grading Scheme


Instructor: Anil Maheshwari
Office: HP 5125b
E-mail: anil@scs.carleton.ca


Lectures: Lectures are Wednesday and Fridays at 08:35 to 09:55 AM. See public class schedule for the location.

Office hours Mondays 08:30-09:45 AM (HP 5125b)

Please feel free to send me email at anil@scs.carleton.ca


Teaching Assistant:   Details will be posted here.


Course objectives:  Algorithmic design techniques for modern data sets arising in, for example, data mining, web analytics, epidemic spreads, search engines and social networks. Topics may include data mining, hashing, streaming, clustering, recommendation systems, link analysis, dimensionality reduction, online, social networking, game theoretic and probabilistic algorithms.


Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations,  proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 


Textbooks:

Useful References related to various topics:


Topics

We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11, and possibly 10 and 6.  In addition to this there will be more material on Data Streaming from some research articles. In broad terms, some combination of the following topics:

Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms
Frequent Itemset
Online Learning
+ some probability
+ linear algebra as and when required.


Grading Scheme (Tentative):
 
- 2 Tests: 2 x 15%
: Scheduled during the class time slots on October 10 and November 14.

- Final Exam: 25% Scheduled by the Exam Services


- Project: 45%

Outline is as follows:

  1. Pick a topic.
    You may look for papers/citations in recent proceedings of the conferences ACM-SIAM Symposium on Discrete Algorithms conference (SODA), SIGKDD, WWW,  KDD,  Data Minining and Knowledge Discovery, WSDM, ICLR, ICML.
  2. Initial Proposal: Submit one page draft by October 1st.  What is the topic you chose? Why? What problem(s) you will look at? What you plan to do? Outline of sections of your report? Main References. Prepare a 3 minute presentation which will be held during the class time slot on October 3rd. 
  3. Final Project Presentation:  Presentation is for approximately 10 minutes duration.  It will be held during last 3-4 weeks of the course.
  4. Project Report: The report format will likely be a research article. Its best to use LaTeX (e.g. see Overleaf). The sections will include:
      1. Introduction (Motivation, Problem Statement, Related Work, Short Summary of what you did).
      2. Preliminaries (In case you need to discuss some notation, definitions, etc. as background)
      3. Main Section - How did you solve the problem; State Algorithm; State its Analysis; State its Correctness.
      4. Experiments (in case you performed any simulation etc.)
      5. Conclusions  (Summary + What did you learn? + What do you think can be done in future?)
      6. References
      7. Report will be approximately 4 pages long and will be posted on the course web-page.
      8. You may use a double column format - for example the style file from Canadian Conference in Computational Geometry Style File from here:  http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip



Schedule of FALL 2025 Term

Sept 03:
Introduction +  Online Learning
Sept 05:

Sept 10:

Sept 12:

Sept 17:

Sept 19:

Sept 24:

Sept 26:

Oct 01: Project Proposal Draft Due

Oct 03:
3-Minute Project Proposal Presentation

Oct 08:

Oct 10:
In-Class Test 1
 
Oct 15:

Oct 17:

Oct 29:

Oct 31:

Nov 05:

Nov 07:

Nov 12:

Nov 14: In-Class Test 2

Nov 19:

Nov 21:

Nov 26:

Nov 28:

Dec 03:

FINAL EXAM: Scheduled by the exam services



Schedule from previous term(s)

Sep 04: Introduction +  Online Learning

Sep 06: MWU (contd.) + Bloom Filters
Sep 11: Bloom Filters (contd.) + Probability Basics
Sep 13: Balls and Bins + Intro to CMS
Sep 18: CMS
Sep 20: DGIM's Algorithm for Estimating 1's in Sliding Window
Sep 25:  Estimating 1's in Sliding Window (Contd.)

Sep 27: 
Estimation of Frequency Moments F_0 and F_2
Oct 02:  Estimation of Frequency Moments F_0 and F_2 (contd.)
 
Oct 04:  LSH
Oct 09:  LSH (Contd.) + Introduction to Matrices
Oct 16: Online Bipartite Matching

Oct 18: Max k-Coveragae

Oct 30:  Solutions to Assignment Problems + Balance Algorithm

Nov 01: MID-TERM. Starts at 8:30 AM in the classroom.

Nov 06: Balance Algorithm, k-coverage

Nov 08: Matrices
Nov 13: Markov Matrices and Pagerank
Nov 15: Page Rank 

Nov 20:
Clustering

Nov 22: Collaborative Filtering

Nov 27: Singular Value Decomposition with Applications

Nov 29: SVD +  Review 

Dec 04:  Scheduled Class Replaced by Office Hour

Final Exam: 
See the Exam Schedule for Room Location.
Important Considerations:

Late assignments are not accepted. Assignments submissions are handled electronically and there is no "grace period" with respect to a deadline. Technical problems do not exempt you from this requirement. You are advised to:

Undergraduate Academic Advisor:

The Undergraduate Advisor for the School of Computer Science is available in Room 5302 HP; or by email at scs.ug.advisor@scs.carleton.ca.  The undergraduate advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and Writing Tutorial Services.


University Policies

Carleton is committed to providing academic accessibility for all individuals. Please review the academic accommodation available to students here: https://students.carleton.ca/course-outline/. We follow all the rules and regulations set by Carleton University, Faculty of Science, and the School of Computer Science regarding accommodating students with any kind of need(s). Please consult with the appropriate authorities to see how you can be accommodated and please follow their procedures. For information about Carleton's academic year, including registration and withdrawal dates, see Carleton's Academic Calendar.  Following is a standard list of recommendations that we have been advised to provide you with respect to whom to contact for the appropriate action(s):

Pregnancy Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Religious Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Academic Accommodations for Students with Disabilities
If you have a documented disability requiring academic accommodations in this course, please contact the Paul Menton Centre for Students with Disabilities (PMC) at 613-520-6608 or pmc@carleton.ca for a formal evaluation or contact your PMC coordinator to send your instructor your Letter of Accommodation at the beginning of the term. You must also contact the PMC no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with your instructor as soon as possible to ensure accommodation arrangements are made. For more details, visit the Paul Menton Centre website.

Survivors of Sexual Violence. 
As a community, Carleton University is committed to maintaining a positive learning, working and living environment where sexual violence will not be tolerated, and survivors are supported through academic accommodations as per Carleton's Sexual Violence Policy. For more information about the services available at the university and to obtain information about sexual violence and/or support, visit: carleton.ca/sexual-violence-support.

Accommodation for Student Activities.
 Carleton University recognizes the substantial benefits, both to the individual student and for the university, that result from a student participating in activities beyond the classroom experience. Reasonable accommodation must be provided to students who compete or perform at the national or international level. Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, see the policy.

Student Academic Integrity Policy. 
Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Examples of punishable offences include: plagiarism and unauthorized co-operation or collaboration.

Plagiarism. As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean of Science. More information and standard sanction guidelines can be found here: https://science.carleton.ca/students/academic-integrity/.   Please note that content generated by an unauthorized A.I.-based tool *is* considered plagiarized material.

Unauthorized Co-operation or Collaboration. Senate policy states that "to ensure fairness and equity in assessment of term work, students shall not co-operate or collaborate in the completion of an academic assignment, in whole or in part, when the instructor has indicated that the assignment is to be completed on an individual basis". For this course, the following holds:

Important Dates: For important academic dates and deadlines, refer to  Carleton's Academic Calendar.


Announcements: Please attend classes to know any course related announcements.