Algorithms for Modern Data Sets (COMP 3801)

Weekly Schedule

Assignments

Announcements


Instructor: Anil Maheshwari
Office: HP 5125B (None of us can access it!)
E-mail: anil@scs.carleton.ca


Lectures: Prerecorded videos and reference material will be posted in cuLearn and its only meant for students in the course. Please do not share with anybody. I have already posted prerecorded lectures for Weeks 1-6. Therefore, till October 31st (Halloween) we won't have online lectures.

After the Fall break the lectures will be "online" via zoom.  You may attend and participate in live recordings in zoom. I plan to hold online lectures from November 2nd till the end of the term. Lectures will be recorded on Mondays and Wednesdays starting at 8:30AM from November 2nd. Recordings will be made available on cuLearn (modulo the technology glitches).


Office hours: Office hours following the online class on Wednesdays for the rest of the term.
Please feel free to send me email at anil@scs.carleton.ca and post your questions in cuLearn forum.
 

Teaching Assistant:  Yunkai Wang (He will mark Assignments and Tests)


Course objectives:  Algorithm design techniques for modern data sets arising in, for example, data mining, web analytics, epidemic spreads, search engines and social networks. Topics may include data mining, hashing, streaming, clustering, recommendation systems, link analysis, dimensionality reduction, online, social networking, game theoretic and probabilistic algorithms.


Caution: Note that you need a minimum of B+ in COMP 2804 to register in this course. The contents of this course are fairly broad, and will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on the analysis of algorithms (O-notation, recurrences, and complexity analysis), elementary probability theory including expectation and indicator random variables (contents of COMP 2804), the knowledge of basic data structures (lists, trees, hashing), and the knowledge of discrete mathematics (counting, permutations and combinations,  proof techniques:  induction, contradiction, ..). Note that there will not be time to review these material, and to appreciate the contents of this course, you must have a very good grasp on these topics. 


Textbooks:

Useful References related to various topics:


Topics

We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11, and possibly 10 and 6.  In addition to this there will be more material on Data Streaming from some research articles. In broad terms, some combination of the following topics:

Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of Graphs,  Dynamic Graph Algorithms
Frequent Itemset
+ some probability+linear algebra as and when required.


Grading Scheme (Tentative):

As decided by the majority of students:

- Assignments: 50%

Assignment #
Posted On (Tentative)
Due Date
I
September 14
(Posted and accessible from
cuLearn)
September 25
(before 11:59PM)
II
September 25
(Posted and accessible from cuLearn)
October 9
III
October 21
(posted and accessible from cuLearn)
November 6
IV
November 6
(posted and accessible from cuLearn)
November 18
V
December 5
December 14


- Tests: 50%
: Plan is to hold tests on Saturdays: October 17, November 21, and December 5.
We will take best 2 scores out of the 3 tests.

Test #
Date and Time
Syllabus
I
October 17  9AM to 11AM
(see cuLearn Annoucements)
All material from Weeks 1-5,
Assignments 1 and 2.
II
NEW DATE (DUE TO CULEARN MAINTENANCE)

November 23
(Starting at 8:15AM TILL 9:55 AM)
All materials from Week 6 to Week 9. This includes LSH, Matrix Algebra, Markov Chains, Pagerank, Online Bipartite Matching, Balance Algorithm and Collaborative Filtering and Assignments 3 and 4.

(All contents starting from Locality-Sensitive Hashing up to and including the Lecture on Wednesday November 18th + Assignments 3 and 4)
III
December 9, 2020
(Starting at 8:15AM TILL 9:55 AM)
All contents starting from Week 9 (Online Matching) up to Clustering (Lecture on Monday December 7th)



Weekly Schedule

Please see cuLearn for video recordings and additional details.
The topics reflect the title of the Videos (in most cases)


Week #
Topic
 References
 Raw Slides
Week 1 What is this course about?  (This video will
be replaced by the Online Lecture of Sept. 9th)
Introduction, Course Logistics
Majority and  Heavy Hitters
cuLearn Video


Outline


Majority
Week 2 Count-Min Sketch
Bloom Filters
cuLearn Video CMS

Filters
Week 3 Probability Basics
Balls and Bins Experiments

cuLearn Video Basics

Balls-Bins

Week 4
Hashing
Estimating Frequency Moment F_0
cuLearn Video Hashing

Frequency Moments
Week 5 Frequency Moment F_2
Stream Statistics over Sliding Window
cuLearn Video Sliding-Window
Week 6 Locality Sensitive Hashing
Matching Fingerprints 
cuLearn Video LSH
Week 7 +8
Linear Algebra
Eigenvalues and Eigenvectors
Pagerank Algorithm 
cuLearn Video
+
Section 4.6 (My Notes)
Chapter 5 (MMDS): Section 5.1 and Parts of Section 5.2

Matrices+PageRank
Week 8+9
Adwords Problem
Balance Algorithm
Online
Chapter 8(MMDS) : Sections 8.2-8.4
Online-Matching
Week 9 Recommendation Systems Online
(MMDS - Sections 9.3, 9.4)
Collaborative Filtering
Week 10+11+12
SVD
Dimensionality Reduction
Online
MMDS - Sections 11.3+11.4
My Notes - Section 4.4+4.5
SVD

CUR
Week 12+13 Clustering
Online  Clustering






Important Considerations:

Late assignments are not accepted. Assignments submissions are handled electronically (i.e., through cuLearn) and there is no "grace period" with respect to a deadline. Technical problems do not exempt you from this requirement. You are advised to:
periodically upload you progress (e.g. upload your progress at least daily)
attempt to submit your final submission at least one hour in advance of the
due date and time.


Undergraduate Academic Advisor:

The Undergraduate Advisor for the School of Computer Science is available in Room 5302C HP; by telephone at 520-2600, ext. 4364; or by email at undergraduate_advisor@scs.carleton.ca.  The undergraduate advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and Writing Tutorial Services.


University Policies

We follow all the rules and regulations set by Carleton University, Dean of Science, and the School of Computer Science regarding accommodating students with any kind of need(s). Please consult with the appropriate authorities to see how you can be accommodated and please follow their procedures. For information about Carleton's academic year, including registration and withdrawal dates, see Carleton's Academic Calendar. Following is a standard list of recommendations that we have been advised to provide you with respect to whom to contact for the appropriate action(s):

Pregnancy Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Religious Obligation. 
Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, visit Equity Services.

Academic Accommodations for Students with Disabilities
If you have a documented disability requiring academic accommodations in this course, please contact the Paul Menton Centre for Students with Disabilities (PMC) at 613-520-6608 or pmc@carleton.ca for a formal evaluation or contact your PMC coordinator to send your instructor your Letter of Accommodation at the beginning of the term. You must also contact the PMC no later than two weeks before the first in-class scheduled test or exam requiring accommodation (if applicable). After requesting accommodation from PMC, meet with your instructor as soon as possible to ensure accommodation arrangements are made. For more details, visit the Paul Menton Centre website.

Survivors of Sexual Violence. 
As a community, Carleton University is committed to maintaining a positive learning, working and living environment where sexual violence will not be tolerated, and survivors are supported through academic accommodations as per Carleton's Sexual Violence Policy. For more information about the services available at the university and to obtain information about sexual violence and/or support, visit: carleton.ca/sexual-violence-support.

Accommodation for Student Activities.
 Carleton University recognizes the substantial benefits, both to the individual student and for the university, that result from a student participating in activities beyond the classroom experience. Reasonable accommodation must be provided to students who compete or perform at the national or international level. Please contact your instructor with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need for accommodation is known to exist. For more details, see the policy.

Student Academic Integrity Policy. 
Every student should be familiar with the Carleton University student academic integrity policy. A student found in violation of academic integrity standards may be awarded penalties which range from a reprimand to receiving a grade of F in the course or even being expelled from the program or University. Examples of punishable offences include: plagiarism and unauthorized co-operation or collaboration.

Plagiarism. As defined by Senate, "plagiarism is presenting, whether intentional or not, the ideas, expression of ideas or work of others as one's own". Such reported offences will be reviewed by the office of the Dean of Science.

Unauthorized Co-operation or Collaboration. 
Senate policy states that "to ensure fairness and equity in assessment of term work, students shall not co-operate or collaborate in the completion of an academic assignment, in whole or in part, when the instructor has indicated that the assignment is to be completed on an individual basis". For this course, the following holds:

Announcements:

  1. All communications will be done via forums in cuLearn if it is for the interest for the whole class. Discussion that are of private nature can be done by sending me an e-mail.
  2. Material for each Week will likely be posted in advance.
  3. I am planning to hold my office hours during one of the lecture slots where you can ask any type of questions and clarifications etc. I request the TA to hold their office hour in the other time slot. If you need additional office hours, we can try to accommodate.
  4. I am planning to record some of the contents of this course in a series of short videos during Summer. Authorities are not allowing us to access the office on a more regular basis and they are understandably not able to provide a clear timeline due to the unpredictable and evolving situation of COVID-19,  Unfortunately the video recordings will be noisy due to the limitation of the location and equipment. I will try to provide ample references etc. for you to do self reading and try to help you as much as possible with extra discussion time etc. during the Fall Term.
  5. I am new to online lectures and there will several bumps! I am still hoping that you will appreciate how beautiful algorithmic techniques have influenced so many modern applications.
  6. Test 1 will take place on Saturday October 17th from 9AM till 11AM.
  7. Test 2 can't take place on Saturday Nov 21st as cuLearn is on maintenance. We will hold Test 2 during the class time slot on Monday November 23rd starting at 8:15AM.
  8. My last lecture for the course will be on Monday, December 7th. We will end the course with the topic of clustering (k-Means and k-Means++)
  9. Test 3 takes place during the class time slot on Wednesday December 9th
  10. I will hold office hours on Friday December 11th during the class time slot (We are following the Monday schedule on that day).