Fall 2018

- Mining of Massive Datasets (MMDS) by Leskovec, Rajaraman, Ullman (2nd Edition, Cambridge University Press, 2014)
- Discrete Structures for Computer Science: Counting, Recursion, and Probability by Michiel Smid
- My Notes on Topics in Algorithm Design

General References:

- Algorithm Design: Kleinberg and Tardos, Addison-Wesley.
- Networks, Crowds, and Markets: Reasoning About a Highly Connected World, By David Easley and Jon Kleinberg
- Networks, An Introduction: Newman.

- Web-page of the text
book: http://www.mmds.org/

- Statistics 110:Probability Video Lectures from Harvard
- 18.06 Introduction to Linear Algebra by Gilbert Strang
- Networked Life: 20 Q & A by Mung Chiang, Cambridge
- LaTeX
- IPE
- R

Useful References related to various topics:

- Linear Algebra:

- Eigenvalues (1 2)
- Video Lectures and the book of Gilbert Strang
- Chapter on Matrices for CS in My Notes on Topics in Algorithm Design

- PageRank:
- Page Rank
- Original paper by Brin and Page on PageRank: The anatomy of a large-scale hypertextual web search engine (1998).
- What
can you do with a web in your pocket by Brin, Motwani,
Page and Winograd.

- Link to the TED talk of Cedric Villani on "What's so sexy about Math" and the description of PageRank.
- Probability:

- STAT110
on Youtube

- Introduction to Probability book by Blitzstein and Hwang
- Discrete Structures for Computer Science: Counting, Recursion, and Probability by Michiel Smid
- Chapter on Probability in My Notes on Topics in Algorithm Design
- Introduction
to R

- Locality Sensitive Hashing
- Chapter 3 in TextBook
- Chapter on LSH in My Notes on Topics in Algorithm Design
- Useful references on LSH
- STOC98 paper of Indyk-Motwani
- Data Streaming
- Mining Data Streams chapter of MMDS
- Wikipedia
Article

- DGIM Article
- Count-Min Sketch Article
- AMS Article (Approximating frequency moments)
- Flajolet and Martin's Article on number of distinct elements in a stream
- Some parts are covered in the Chapter in My Notes on Algorithm Design
- My talk on Bloom Filters and
Count-Min-Sketch

- Adwords
- Chapter on Advertisement on the Web from the Book.
- SVD/Dimensionality Reduction:
- Chapter in the book on Dimensionality Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book, Chapter 13.

Topics

We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9, 11,
and possibly 10 and 6. In addition to this there will be
more material on Data Streaming from some research articles. In
broad terms, some combination of the following topics:

Link Analysis

Mining Data Streams - Frequency and Moment Estimates

Finding Similar Items - Locality Sensitive Hashing

Advertising on the web - Adwords & Online matching

Recommendation Systems - Collaborative Filtering

Dimensionality Reduction - Eigenvalues, PCA, SVD

Clustering - K-Means

Mining Social Networks - Community Detection, Partitioning of
Graphs, Dynamic Graph Algorithms

Frequent Itemset

+

some probability+linear algebra as and when required.

Grading Scheme: (Tentative)

- 4 Assignments: 25%

Assignment 1 --> PDF-File LaTeX-file

Assignment 2 --> PDF-File LaTeX-file

Assignment 3 --> PDF-File LaTeX-file

Assignment 4 --> PDF-File LaTeX-file

- Mid-Term: 25%

- Final Exam: 50%

Sep 05: Introduction to various topics in the course. A probability question from Newton-Pepys

Sep 10: Basic Linear Algebra: Concept of Eigenvalues/Eigenvectors, rank, column and row vector spaces. How they relate to this course. Linearity of Expectation for random variables with some applications.

Sep 12: Balls & Bins, Load Balancing, and Hashing - General Introduction.

Sep 17: Hashing Continued. Finding the Majority Element.

Sep 19: Analysis of Kadane's Algorithm for Majority. Count-Min Sketch.

Sep 24: CMS + Bloom Filters

Sep 26: A1 Due, CMS Analysis - How to find elements that occur at least n/k times in a stream of n elements. Bloom Filters, Web-Graphs

Oct 01: Page Rank - Web-Graph - Markovian Matrix - Principal Eigenvalue/vector (Section 5.1)

Oct 03: Page Rank - Dead Ends/Spider Traps/Teleporting/Computation of PageRank (Section 5.2+5.3)

Oct 10: Concluding Page Rank + LSH (Chapter 3).

Oct 15: LSH - MinHash Signatures - Jaccard Similarity. [Chapter 3 in MMDS, Chapter in MyNotes under LSH]

Banding Technique and the function f(s)=1-(1-s^r)^b and its implications.

Oct 17: A2 Due (On Friday Oct 19th by 11:45AM); Distance Measure - Metric - Sensitive family of functions, and the Banding Technique.

Oct 29: Fingerprinting [Chapter 3 of textbook + MyNotes]

Oct 31: MID-TERM (PDF, TeX)

Nov 05: Fingerprints + Sensitive Families

Nov 07: Adwords Problem

Nov 12: Adwords Problem and Balance Algorithm [Chapter 8 of TextBook]

Nov 14: Finishing Proof(s) of Balance Algorithm + Recommendation Systems [Chapter 9 of TextBook]

Nov 19: K-Clustering and K++-Clustering [Section 7.3]. Intro to Recommendation Systems [Sections 9.3+9.4]

Nov 21: A3 Due Utility Matrix, Cosine Distance and UV decomposition.

Nov 26: Introduction to SVDs expressing a Matrix as sum of tensors + Counting 1s in a stream

[ Sections 11.3 + Sections 4.6 + Chapter on Matrix for CS in My Notes]

Nov 28: SVD + Counting 1s.

Dec 03: Dimensionality Reduction using SVDs + Counting 1s.

Dec 05: A4 Due

Counting 1s, review, and some remarks on Final Exam.

Dec 10: Final Exam - All Material Covered in the course in classes, assignments, from Sept 5- Dec. 5 (inclusive).

For exam preparation: Review Assignments, Review Solved and Unsolved Problems in the Text Book/Notes. Exam consists of 6 Questions that cover the main topics in the course. Exam starts at 9AM (please figure out in which room it is in - TB 238?) and is for 2.5 Hours.

The undergraduate advisor for the School of Computer Science is available in Room 5302C HP, by telephone at 520-2600, ext. 4364 or by email at undergraduate_advisor@scs.carleton.ca. The advisor can assist with information about prerequisites and preclusions, course substitutions/equivalencies, understanding your academic audit and the remaining requirements for graduation. The undergraduate advisor will also refer students to appropriate resources such as the Science Student Success Centre, Learning Support Services and the Writing Tutorial Services.

- Students are encouraged to collaborate on assignments, but at the level of discussion only. When writing down the solutions, they must do so in their own words. Just a word of caution - in theoretically oriented courses, it is important to come up with your own ideas for the proof/an algorithm/a contradiction/etc. Sometimes these are like logical puzzles - if somebody tells you a solution then they are trivial and hard part is to come up with a solution. What we want to learn is how to solve them ourselves.
- Past experience has shown conclusively that those who do not put adequate effort into the assignments do not learn the material and have a probability near 1 of doing poorly on the exams.

Equity Statements:

You may need special arrangements to meet your academic obligations during the term. For an accommodation request the processes are as follows:

- Practice few simple problems in Probability and Linear Algebra using R/R-Studio
- Assignment 1 is posted - Please start early and this assignment will help you with the Probability and Linear Algebra background that we require in this course.
- Assignment 2 is posted
- Assignment 3 is posted
- Assignment 4 is posted
- Please remember to finish teaching evaluation!