Assignments

To learn some of the algorithmic techniques to handle data science problems.

Topics may include:

- Data Streaming
- Frequency
Moments

- Dimensionality Reduction
- Online Algorithms (including the role of Primal-Dual LPs in their analysis)
- Finding Similar Items using Hashing
- Nearest
Neighbor Searching

- Approximation algorithms design techniques

- Algorithmic Aspects of Social Networks (Graph Partitioning, Searching various substructures)
- Randomized
Linear Algebraic techniques

- Polynomial
identity testing

These topics may be adjusted based on the background and interests of the students.

We will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on:

- Analysis of algorithms (O-notation, recurrences, and complexity analysis)
- Elementary probability theory including expectation, indicator random variables, Variance, Markov's Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing, BST)
- Discrete mathematics (counting, permutations and combinations, graph theory, proof techniques: induction, contradiction, ..)
- Algorithmic techniques (divide and conquer, greedy, dynamic programming, BFS, DFS, Connectivity, Shortest Paths, and what is NP-Compleness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).

- Foundations of Data Science by Blum, Hopcroft and Kanan.
- My Notes on Topics in Algorithm Design
- Mining of Massive Data Sets: mmds.org

- Several Research Articles that will be mentioned during the course.

Useful References related to various topics. This
will get modified as we go along the course.

- Linear Algebra:

- Eigenvalues (1 2)
- Video Lectures and the book of Gilbert Strang
- Chapter on Matrices for CS in My Notes on Topics in Algorithm Design

- PageRank:
- Page Rank
- Original paper by Brin and Page on PageRank: The anatomy of a large-scale hypertextual web search engine (1998).
- What
can you do with a web in your pocket by Brin, Motwani,
Page and Winograd.

- Link to the TED talk of Cedric Villani on "What's so sexy about Math" and the description of PageRank.
- Probability:

- STAT110
on Youtube

- Introduction to Probability book by Blitzstein and Hwang
- Discrete Structures for Computer Science: Counting, Recursion, and Probability by Michiel Smid
- Chapter on Probability in My Notes on Topics in Algorithm Design
- Introduction
to R

- Locality Sensitive Hashing
- Chapter 3 in Mining of Massive Data Sets by
Stanford Group (mmds.org)

- Chapter on LSH in My Notes on Topics in Algorithm Design
- Useful references on LSH
- STOC98 paper of Indyk-Motwani
- Data Streaming
- Mining Data Streams chapter of MMDS
- Wikipedia
Article

- DGIM Article
- Count-Min Sketch Article
- AMS Article (Approximating frequency moments)
- Flajolet and Martin's Article on number of distinct elements in a stream
- Some parts are covered in the Chapter in My Notes on Algorithm Design
- My talk on Bloom
Filters and Count-Min-Sketch

- Adwords
- Chapter on Advertisement on the Web from the MMDS Book.
- SVD/Dimensionality Reduction:
- Chapter in the MMDS book on Dimensionality Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.

There are two components, and you need to pass in both the components to pass this course:

**Assignments + Final Exam****50%:**We may have some assignments followed by a Final Exam during the exam period. Exam will be worth at least 35% marks for the course.**Project 50%**: (initial proposal 5% + initial presentation 5% + Final Report 20% + Final Presentation 20%). Outline is as follows:

- Pick a topic. (Look at references under "Reference Material" and conferences in related areas and also use Google Scholar to see who refers to those papers etc.)

Initial Proposal:Submit 1-2 page draft. What is your topic you chose? Why? What problem(s) you will look at? What you plan to do? Outline of sections of your report? Main References. Due (in pdf format) by February 2. Your 5 minute presentation is scheduled on February 4th during the class time slot.

Final Project Presentation:Scheduled in the week ofApril 4-8.Presentation is for approximately 12-15 minutes duration describing your project during the class time slot. (Final Exam will have questions from these reports/presentations).- Report: Due by April 4.
The report format will likely be a research article. Its best to use LaTeX (e.g. see Overleaf). The sections will include:

- Introduction (Motivation, Problem Statement, Related Work, Short Summary of what you did).
- Preliminaries (In case you need to discuss some notation, definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness.

- Experiments (in case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you
think can be done in future?)

- References
- Report
will be approximately 6 pages long
and will be posted on the course
web-page. Final Exam will have
some questions from these reports.

- You may use a double column format - for example the style file from Canadian Conference in Computational Geometry Style File from here: http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip

**Jan. 12:**

- Opening Remarks (Slides)
- Count-Min Sketch (Slides)

- See additional references in the slides
- J. Misra and D. Gries, Finding repeated elements, Science of Computer Programming, Vol. 2 (2): 143 -152, 1982.
- G. Cormode and S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications, Jl. Algorithms 55(1): 58-75, 2005
- Section 10.1 in My Notes

**Jan. 14:**

- Reporting heavy hitters using CMS
- Approximating frequency moments in a stream
(Slides)

- AMS Article (Alon, Matias, Szegedy, The space complexity of approximating the frequency moments, JCSS 58(1): 137-146, 1999)
- FM Article (Flajolet and Martin, Probabilistic counting algorithms for data base applications, JCSS 31: 182-209, 1985.
- Section 10.3 in My Notes

**Jan. 19:****Frequency Moments - II**- Estimation of F_0 and F_2 frequency
moments (continuation of lecture on Jan 14th).

**Jan. 21: Locality-Sensitive Hashing**

__Slides__- STOC98 paper of Indyk-Motwani
- Useful references on LSH
- Chapter 3 in Mining of Massive Data Sets by Stanford Group (mmds.org)
- Chapter 9 on LSH in My Notes on Topics in Algorithm Design

**Jan. 26:****LSH-II +****Polynomial Identity Testing**- Slides on Polynomial Identity Testing
- DeMillo and Lipton, A probabilistic remark on algebraic program testing, Information Processing Letters 7(4):193-195, 1978.
- Mulmuley, Vazirani and Vazirani, Matching is as easy as matrix inversion, Combinatorica 7(1):105-113, 1987.
- Mitzenmacher and Upfal, Probability and Computing, Cambridge.
- Motwani and Raghavan, Randomized Algorithm, Cambridge.

**Jan. 28:**(Continuation of last class)**Polynomial Identity Testing with applications to Matching**

**Feb. 02: MWIS**

- Slides on MWIS
- U. Feige and D. Reichman, Recoverable values for independent sets, Random Structures and Algorithms 46(1): 142-159, 2015.

**Feb. 04: Short Presentations on the Projects +**

**Approximation Algorithms via Local Search****Slides on Approximations via Local Search (skip Geometric Hitting Sets)**

**Max-Weight Cut in Graphs**

**Feb. 09:****Approximation Algorithms via Local Search**

- Finishing analysis of 2-approximation of Max-Weight Cuts. 5-approximation to k-Median in Metric Complete Graphs.

- A. Gupta and K. Tangwongsam, Simpler Analyses of Local Search Algorithms for Facility Location (arXiv Paper)
- Arya et al., Local search heuristics for k-median and facility location problems, SIAM Jl. Computing 33(3): 544-562, 2004.
- N.H. Mustafa and S. Ray, Improved results on geometric hitting set problems, Discrete and Computational Geometry 44:883-895, 2010
- R. Aschner, M.J. Katz, G. Morgenstern and Y. Yuditsky, Approximation schemes for covering and packing, WALCOM, Lecture Notes in Computer Science 7748: 89-100, Springer, 2013.

**Feb. 11:****Analysis of k-Medians + Dimensionality Reduction - I**- Slides on Dimensionality Reduction
- Matousek, Lectures on Discrete Geometry, Volume 212 of Graduate Texts in Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms, Cambridge University Press, 2009.
- Dasgupta, and Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss" Random Structures & Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemporary Mathematics 26:189-206, 1984.
- Chapter 12 in My
Notes

**Feb. 16: Dimensionality Reduction - II**

- Distortion, Embedding metric spaces in
low-dimensional L_\infty Spaces.

**Feb. 18:**

**J-L Theorem**

**Feb 23/25 - Winter Break - No Classes.**

**Mar. 02: MWU Method - Deterministic Schemes**

**Mar. 04: MWU Method - Randomized Schemes**

**Mar. 09:****MWU Method + Locality Sensitive Orderings**- Slides on Locality Sensitive Orderings
- Chan, Har-Peled, Jones, On locality-sensitive orderings and their applications, SIAM Jl. on Computing 49(3): 583-600, 2020.

**Mar. 11:****Locality Sensitive Orderings**

**Mar 16: Saeed's Talk on JL-Embedding Lower/Upper Bounds**

**Mar. 18:****Online Matching - I**- Online Bipartite Matching - Primal Dual LP Analysis
- See Section 11.1 - 11.4 of My Notes
- Karp, Vazirani, and Vazirani, An optimal algorithm for online birpartite matching ACM-STOC 1990
- Devanur,
Jain and Kleinberg,
Randomized primal-dual
analysis of ranking
for bipartite matching
**,**ACM-SIAM SODA 2013. - Mehta,
Saberi, Vazirani and
Vazirani, AdWords and
generalized online
matching, Jl. ACM Vol.
54, 2007.

- Kalyanasundaram
and Pruhs, An optimal
deterministic
algorithm for online
b-matching,
Theoretical Computer
Science 233(1-2):
319-325, 2000.

**Mar. 23:****Online Matching - II**- Online
Bipartite
Matching -
Waterlevel
Algorithm

**Mar. 25: Online Matching - III****Randomized and Balance Algorithm**

**Mar. 30:****Approximation Algorithms-I**- Min Cost st-cuts, Multiway Mincuts, Multicuts in Graphs
- Garg, Vazirani and Yannakakis, Primal-dual approximation algorithms for integral flow and multicuts in trees, Algorithmica 18(1): 3-20, 1997.
- Garg, Vazirani and Yannakakis, Multiway cuts in node weighted graphs, Journal of Algorithms 50(1): 49-61, 2004.
- Calinescu, Karloff and Rabani, Approximation algorithms for the 0-extension problem, ACM-SIAM SODA 2001.
- Fakcharoenphol, Rao and Talwar, A tight bound on approximating arbitrary metrics by tree metrics, JCSS 69(3): 485-487, 2004.
- The
design of approximation
algorithms, Williamson and
Shmoys, Cambridge University
Press, 2011.

**Apr. 01:****Approximation Algorithms-II**

**Apr. 06:****Presentations on the Projects**

**Apr. 08:****Presentations on the Projects**

**FINAL EXAM: Scheduled by Exam Services at Carleton**

Pregnancy Obligation.

Religious Obligation.

Academic Accommodations for Students with Disabilities

Survivors of Sexual Violence.

Accommodation for Student Activities.

Student Academic Integrity Policy.

Unauthorized Co-operation or Collaboration.

- Students are
**NOT**allowed to collaborate on assignments. Please avoid using search engines to look for answers etc. Just a word of caution - in theoretically oriented courses, it is important to come up with your own ideas for the proof/an algorithm/a contradiction/etc. Sometimes these are like logical puzzles - if somebody tells you a solution then they are trivial and hard part is to come up with a solution. What we want to learn is how to solve them ourselves. - Past experience has shown conclusively that those who do not put adequate effort into the assignments do not learn the material and have a probability near 1 of doing poorly on the exams/tests.
- Penalties for
academic offences can be found on the ODS webpage:
*.*

- Link
to the 1st offering of COMP 5112 in Fall 2020

- All communications will be done via Brightspace. If you have a question that may benefit the whole class, please post it in the forum or ask during the class/office hour. (The discussion that are of private nature can be done by sending me an e-mail.)
- I am planning to hold my office hours
immediately before and after the lectures.

- For uOttawa students: Please try to figure out
how to obtain the Brightspace account at Carleton. (I can't
help you with the administrative aspect - please contact
graduate and departmental administrators.)

- Learn (La)TeX as soon as possible - will help you to write your thesis later!