Assignments

To learn some of the algorithmic techniques to handle data science problems.

Topics may include:

- Randomized
Algorithms in Linear Algebra

- Data Streaming
- Dimensionality Reduction
- Online Algorithms (including the role of Primal-Dual LPs in their analysis)
- Finding Similar Items using Hashing
- Nearest
Neighbor Searching

- Random Graphs and Second Moment Method
- Algorithmic
Aspects of Social Networks (Graph Partitioning,
Searching various substructures)

These topics may be adjusted based on the background and interest of the students in the class.

We will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on:

- Analysis of algorithms (O-notation, recurrences, and complexity analysis)
- Elementary probability theory including expectation, indicator random variables, Variance, Markov's Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing, BST)
- Discrete mathematics (counting, permutations and combinations, graph theory, proof techniques: induction, contradiction, ..)
- Algorithmic techniques (divide and conquer, greedy, dynamic programming, BFS, DFS, Connectivity, Shortest Paths, and what is NP-Compleness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).

- Foundations of Data Science by Blum, Hopcroft and Kanan.
- My Notes on Topics in Algorithm Design
- Mining of Massive Data Sets: mmds.org

- Several Research Articles that will be mentioned during the course.

Useful References related to various topics. This
will get modified as we go along the course.

- Linear Algebra:

- Eigenvalues (1 2)
- Video Lectures and the book of Gilbert Strang
- Chapter on Matrices for CS in My Notes on Topics in Algorithm Design

- PageRank:
- Page Rank
- Original paper by Brin and Page on PageRank: The anatomy of a large-scale hypertextual web search engine (1998).
- What
can you do with a web in your pocket by Brin, Motwani,
Page and Winograd.

- Link to the TED talk of Cedric Villani on "What's so sexy about Math" and the description of PageRank.
- Probability:

- STAT110
on Youtube

- Introduction to Probability book by Blitzstein and Hwang
- Discrete Structures for Computer Science: Counting, Recursion, and Probability by Michiel Smid
- Chapter on Probability in My Notes on Topics in Algorithm Design
- Introduction
to R

- Locality Sensitive Hashing
- Chapter 3 in Mining of Massive Data Sets by
Stanford Group (mmds.org)

- Chapter on LSH in My Notes on Topics in Algorithm Design
- Useful references on LSH
- STOC98 paper of Indyk-Motwani
- Data Streaming
- Mining Data Streams chapter of MMDS
- Wikipedia
Article

- DGIM Article
- Count-Min Sketch Article
- AMS Article (Approximating frequency moments)
- Flajolet and Martin's Article on number of distinct elements in a stream
- Some parts are covered in the Chapter in My Notes on Algorithm Design
- My talk on Bloom
Filters and Count-Min-Sketch

- Adwords
- Chapter on Advertisement on the Web from the MMDS Book.
- SVD/Dimensionality Reduction:
- Chapter in the MMDS book on Dimensionality Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.

- Assignments/Exercises: 30% (including a Take Home Exam)

- Scribe Notes: 15% (you will be responsible to type in content of (part of) one of the lectures in a LaTeX template and add some relevant exercises). Please have a look at the folder

https://people.scs.carleton.ca/~maheshwa/courses/5112/Scribe-Style/

where I have posted the cmu-scribe.sty tailored for us, and a sample (l1.tex and l1.pdf). If you have issues using this style file, please let me know. (Here is a course from CMU with good examples of scribe notes:

http://www.cs.cmu.edu/afs/cs/academic/class/15854-f05/www/ )

- Project: 55% (initial proposal 10% + short report 30% + presentation 15%). Outline is as follows:

- Pick a topic. (Look at references under
"Reference Material" and conferences in related
areas and also use Google Scholar) to see who
refers to those papers etc.)

__Initial Proposal:__Submit a 1-2 page draft. What is the topic you chose? Why? What problem(s) you will look at? What you plan to do? Outline of sections of your report? Main References. Due (in pdf format) by October 2nd.

- Report (
__Initial Draft__): Due by November 9.

- Introduction (Motivation, Problem Statement, Related Work, Short Summary of what you did).
- Preliminaries (In case you need to discuss some notation, definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness. (NOTE:
**This will be****expressed****in terms of m****ultiple****questio****ns and answers,**and I will explain this further. )

- Experiments (in case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you think
can be done in future?)

- References

__Recorded Presentation and Final Report:____Due on____November 30.__Presentation is a pre-recorded Video of approximately 10 minutes duration describing your project (this will be posted on cuLearn under course web-page). Report will be approximately 6 page long and will also be posted on cuLearn. (Take Home Exam will have questions from these reports/presentations).

Week # |
Topics |
Raw Slides |
References/ Comments |
Scribe Notes |

Week 1 | Lecture 1 (Sept. 11) Introduction. What is this course about? Course Logistics Random Sampling Lecture 2 (Sept. 16) Count-Min Sketch |
Course
Introduction Randomized Sampling Count-Min Sketch |
References are in slides. Topics on Algorithm Design (Section 10.1)+ References in the slides |
Mohamad+Mario (Lecture 1) |

Week 2 |
Lecture 3
(Sept. 18) Probability Basics Balls and Bins Experiments Lecture 4 (Sept. 23) Hashing Bloom Filters Frequency Moments (Introduction) |
Balls-Bins Hashing Bloom Filters |
References in Slide (Mitzenmacher+Upfal, Section 5.5) Topics on Algorithm Design (Section 10.2) + References in Slides |
Nathaniel+Selina (Lecture 3) NONE |

Week 3 |
Estimating Frequency Moments in a Data
Stream Sliding Window |
|||

Week 4 |
Locality-Sensitive Hashing LSH Families Applications |
|||

Week 5 |
Geometric Algorithms - Approximate Nearest Neighbors - Clustering - Core Sets |
|||

Week 6 |
Online Algorithms - Classical Examples - Matching - Analysis via Primal-Dual LP |
|||

Week 7 |
Regret Minimization - Multiplicative Weights Algorithm |
|||

Week 8 |
Linear Algebra Review Graph Partitioning Applications in Social Networks |
|||

Week 9 |
Dimensionality Reduction |
|||

Week 10 |
Second Moment Method Random Graphs |
|||

Week 11 |
Randomized Linear Algebra |
|||

Week 12 |
Project Presentations |

- periodically upload you progress (e.g. upload your progress at least daily)
- attempt to submit your final submission at least one hour in
advance of the due date and time
**.**

Pregnancy Obligation.

Religious Obligation.

Academic Accommodations for Students with Disabilities

Survivors of Sexual Violence.

Accommodation for Student Activities.

Student Academic Integrity Policy.

Unauthorized Co-operation or Collaboration.

- Students are
**NOT**allowed to collaborate on assignments. Please avoid using search engines to look for answers etc. Just a word of caution - in theoretically oriented courses, it is important to come up with your own ideas for the proof/an algorithm/a contradiction/etc. Sometimes these are like logical puzzles - if somebody tells you a solution then they are trivial and hard part is to come up with a solution. What we want to learn is how to solve them ourselves. - Past experience has shown conclusively that those who do not put adequate effort into the assignments do not learn the material and have a probability near 1 of doing poorly on the exams/tests.
- Penalties for
academic offences can be found on the ODS webpage:
*.*

- This is the very first offering of this
course! I will seek your advise in terms of any particular
algorithms that you will like to see in this course.

- All communications will be done via cuLearn forums. If you have a question that may benefit the whole class, please post it in the forum or ask during the class/office hour. (The discussion that are of private nature can be done by sending me an e-mail.)
- I am planning to take online lectures twice a week. Once I know the time zones of all the students, we will draw up a lecture schedule. Online lectures will be recorded (subject to all the technological issues involved). I am planning to make the recordings available on cuLearn. They won't be made available on any other outlets. You need a cuLearn account to access them. Please do not post or resend the links to the recorded video to anybody (thanks).
- During the online lectures we will post
quizzes. You are expected to answer them in the allocated time
limit. If we are not able to find a suitable time to hold the
lectures, we will find some alternative mechanism to host the
quizzes.
**(It is unlikely this will happen formally due to students in different time zones +****some are working full time).**

- I am planning to hold my office hours
immediately after the online lectures.

- I am new to online lectures and there will be
several bumps! I am still hoping that you will appreciate the
beautiful algorithmic techniques.

- For uOttawa students: Please try to figure out
how to obtain the cuLearn account. (I can't help you with the
administrative aspect - please contact graduate and
departmental administrators.)

- For your benefit I will be posting some of the
videos from COMP 3801 class. Those are gentle introduction to
a subset of the topics in this course. The reason for posting
them is that you may review them beforehand to get somewhat
familiar with the contents and the style of this course. This
course will be at a much faster pace. We will cover
significantly more and advanced material. Think of the COMP
3801 videos as the background preparation.

- Learn (La)TeX as soon as possible - will help
you to write your thesis later!