ADS-22: Algorithms for Data Science (Summer 2022)
Weekly Schedule
Assignments
Announcements
Instructor: Anil Maheshwari
E-mail: anil@scs.carleton.ca
Lectures: Delivered at RKMVERI in August
2022. This course is offered in a compressed format. We will likely
have 9 hours of lectures per week for three weeks.
Office hours: During the breaks and
immediately before/after the lectures. Alternatively, send me an
e-mail at anil@scs.carleton.ca
Course
objectives:
To learn some of the algorithmic
techniques to handle data science problems.
Topics may include:
- Randomized
Algorithms in Linear Algebra
- Multiplicative-Weight
Update Method with applications
- Techniques for designing approximation
algorithms
- Dimensionality
Reduction
- Online
Algorithms (including the role of Primal-Dual LPs in
their analysis)
- Polynomial identity testing
- Fixed-Parameter
Tractability
- Nearest Neighbor Searching
- Finding
Similar Items using Hashing
- Data Streaming
- Algorithmic
Aspects of Social Networks (Graph Partitioning,
Searching various substructures)
These topics may be adjusted based on the background and
interest of the students in the class.
Required Background:
We will cover a spectrum of techniques from the design and
analysis of algorithms. It is assumed that you have a very good
grasp on:
- Analysis of algorithms (O-notation,
recurrences, and complexity analysis)
- Elementary probability theory including
expectation, indicator random variables, Variance, Markov's
Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing,
BST)
- Discrete mathematics (counting, permutations
and combinations, graph theory, proof techniques:
induction, contradiction, ..)
- Algorithmic techniques (divide and conquer,
greedy, dynamic programming, BFS, DFS, Connectivity, Shortest
Paths, and what is NP-Compleness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).
Note that there will not be sufficient time to
review the background material to a satisfactory level during the
course. (In nutshell you must have a background that is equivalent
to the following Carleton Courses: COMP 1805, COMP 2402,
COMP 3804, and a course in Linear Algebra.)
Grading Scheme:
- Home
Work Exercises during the lectures. Onus is on you to record
the exercises and submit a solution before the beginning of
the next class.
- Project
- Pick any research paper of your interest
that is related to the course.
- Prepare a short (3-4 pages)
article in LaTeX, possibly using the overleaf system.
Note: Overleaf can be found at
https://www.overleaf.com and you may use
https://www.overleaf.com/latex/templates/lipics-v2019-sample/gqgybwgdpbpq
style file.
- Make a short
10-15 minute presentation to introduce the problem and
discuss the key points in the solution. Presentations
will take place around mid-late November.
- Test(s)
- Date to be set by RKMVERI
Schedule for
Summer 2022 ADS Course @RKMVERI
- Aug 02: Multiplicative-Weight
Update Method (Deterministic and Randomized Methods with costs
in [0,1])
- Slides
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164, 2012.
- Chapter 11.5
in My
Notes
- Aug 04: Minimax Theorem via MWU Method
- Applications
in game theory
- Aug 06: MWIS
- Completing
the proof of minimax theorem
- Maximum-Weight
Independent Set Problem
- Slides on
MWIS
- U. Feige and
D. Reichman, Recoverable values for
independent sets, Random Structures
and Algorithms 46(1): 142-159, 2015
- Aug 09: Approximation
Algorithms via
Local Search
- Slides
on
Approximations
via Local
Search
(skip
Geometric
Hitting Sets)
- 2-approximation
of Max-Weight Cuts.
- 5-approximation
to k-Median in Metric Complete
Graphs.
- A. Gupta and
K. Tangwongsam, Simpler Analyses of
Local Search Algorithms for Facility
Location (arXiv Paper)
- Arya et al.,
Local search heuristics for k-median
and facility location problems, SIAM
Jl. Computing 33(3): 544-562, 2004.
- N.H. Mustafa
and S. Ray, Improved results on
geometric hitting set problems,
Discrete and Computational Geometry
44:883-895, 2010
- R. Aschner,
M.J. Katz, G. Morgenstern and Y.
Yuditsky, Approximation schemes for
covering and packing, WALCOM,
Lecture Notes in Computer Science
7748: 89-100, Springer, 2013.
- Aug 11: 5-approximation to k-median
(contd.) + Locality-Sensitive Orderings (Approximate Nearest
Neighbor)
- Aug 11 (Afternoon): Locality-Sensitive
Hashing (Finding Similar Objects)
- Aug 13: Locality Sensitive Orderings
(Continued)
- Aug 15: Discussion on Projects +
Assignments
- Aug 16: Locality Sensitive Orderings +
Fixed-Parameter Tractable Algorithmic Techniques
- Slides
on FPT
- Downey and Fellows, Parameterized
Complexity. Springer, 1999.
- Cygan, Fomin, Kowalik, Lokshtanov, Marx,
Pilipczuk, Pilipczuk, and Saurabh, Parameterized
Algorithms,
Springer 2015.
- Aug 18: FPT techniques - Naive,
Branch-and-bound, LP
Note: Lectures in September will be on Zoom from 6:30 PM IST
(9AM EDT)
- Sept 07: Review of FPT techniques + Iterative Compression -
illustration via Vertex Cover.
- Sept 09: Iterative Compression - illustration
via Feedback Vertex Set.
- Sept 12: Dimensionality Reduction
- Video
- Slides
on Dimensionality Reduction
- Matousek, Lectures on Discrete
Geometry, Volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi,
Concentration of Measure for the Analysis of
Randomized Algorithms, Cambridge University
Press, 2009.
- Dasgupta, and Gupta, An elementary
proof of a theorem of Johnson and
Lindenstrauss" Random Structures &
Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss,
Extensions of Lipschitz mappings into a
Hilbert space, Contemporary Mathematics
26:189-206, 1984.
- Chapter 12 in My
Notes
- Sept 14: Dimensionality Reduction
- Video
- Universality of L_\infty
- Johnson and Lindenstrauss
Theorem
- Sept 16: Johnson and Lindenstrauss
Theorem
- Sept 19: Johnson and Lindenstrauss
Theorem + Online
Matching
- Video-JL
- Slides
- Video-Matching
- Online Bipartite Matching - Primal
Dual LP Analysis
- See Section 11.1 - 11.4
of My
Notes
- Karp,
Vazirani, and
Vazirani, An optimal
algorithm for online
birpartite matching
ACM-STOC 1990
- Devanur,
Jain and Kleinberg,
Randomized primal-dual
analysis of ranking
for bipartite matching,
ACM-SIAM SODA
2013.
- Mehta,
Saberi, Vazirani and
Vazirani, AdWords and
generalized online
matching, Jl. ACM Vol.
54, 2007.
- Kalyanasundaram
and Pruhs, An optimal
deterministic
algorithm for online
b-matching,
Theoretical Computer
Science 233(1-2):
319-325, 2000.
- Sept 21: Analysis of Greedy Matching using Primal-Dual LP +
Waterlevel Algorithm for Online Matching
- Sept 23: Waterlevel+Balance Algorithm for Online Matching
- Sept 26: Analysis of Balance Algorithm (LP-->
Complementary Slackness)
Project Presentations: Mid to Late November
Final Exam: Date to be set by RKMVERI
Weekly Schedule (This is from the
COMP 5112 Winter 2022 offering at Carleton):
- Opening Remarks (Slides)
- Count-Min Sketch (Slides)
- See additional references in the slides
- J. Misra and D. Gries, Finding repeated elements, Science
of Computer Programming, Vol. 2 (2): 143 -152, 1982.
- G. Cormode and S. Muthukrishnan, An improved data stream
summary: the count-min sketch and its applications, Jl.
Algorithms 55(1): 58-75, 2005
- Section 10.1 in My
Notes
- Jan. 14:
- Reporting heavy hitters using CMS
- Approximating frequency moments in a stream
(Slides)
- AMS
Article (Alon, Matias, Szegedy, The space complexity
of approximating the frequency moments, JCSS 58(1): 137-146,
1999)
- FM
Article (Flajolet and Martin, Probabilistic
counting algorithms for data base applications, JCSS 31:
182-209, 1985.
- Section 10.3 in My
Notes
- Jan. 19: Frequency Moments - II
- Estimation of F_0 and F_2 frequency
moments (continuation of lecture on Jan 14th).
- Jan. 21: Locality-Sensitive Hashing
- Jan. 26: LSH-II +
Polynomial Identity Testing
- Slides on
Polynomial Identity Testing
- DeMillo and
Lipton, A probabilistic remark on
algebraic program testing, Information
Processing Letters 7(4):193-195, 1978.
- Mulmuley,
Vazirani and Vazirani, Matching is as
easy as matrix inversion,
Combinatorica 7(1):105-113, 1987.
- Mitzenmacher
and Upfal, Probability and Computing,
Cambridge.
- Motwani and
Raghavan, Randomized Algorithm,
Cambridge.
- Jan. 28: Polynomial
Identity Testing with applications
to Matching (Continuation of last class)
- Feb. 02: MWIS
- Slides on
MWIS
- U. Feige and
D. Reichman, Recoverable values for
independent sets, Random Structures
and Algorithms 46(1): 142-159, 2015.
- Feb. 04: Approximation Algorithms via
Local Search
- Feb. 09: Approximation
Algorithms via Local Search
- Finishing
analysis of 2-approximation of
Max-Weight Cuts. 5-approximation to
k-Median in Metric Complete Graphs.
- A. Gupta and
K. Tangwongsam, Simpler Analyses of
Local Search Algorithms for Facility
Location (arXiv
Paper)
- Arya et al.,
Local search heuristics for k-median
and facility location problems, SIAM
Jl. Computing 33(3): 544-562, 2004.
- N.H. Mustafa
and S. Ray, Improved results on
geometric hitting set problems,
Discrete and Computational Geometry
44:883-895, 2010
- R. Aschner,
M.J. Katz, G. Morgenstern and Y.
Yuditsky, Approximation schemes for
covering and packing, WALCOM, Lecture
Notes in Computer Science 7748:
89-100, Springer, 2013.
- Feb. 11: Analysis of k-Medians +
Dimensionality Reduction - I
- Slides on Dimensionality Reduction
- Matousek, Lectures on Discrete
Geometry, Volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi,
Concentration of Measure for the Analysis of
Randomized Algorithms, Cambridge University
Press, 2009.
- Dasgupta, and Gupta, An elementary
proof of a theorem of Johnson and
Lindenstrauss" Random Structures &
Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss,
Extensions of Lipschitz mappings into a
Hilbert space, Contemporary Mathematics
26:189-206, 1984.
- Chapter 12 in My
Notes
- Feb. 16: Dimensionality Reduction -
II
- Distortion, Embedding metric spaces in
low-dimensional L_\infty Spaces.
- Feb 23/25 -
Winter Break - No Classes.
- Mar. 02: MWU Method - Deterministic Schemes
- Slides
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164, 2012.
- Chapter 11.5
in My
Notes
- Mar. 04: MWU Method - Randomized Schemes
- Mar. 09: MWU Method + Locality Sensitive Orderings
- Mar. 11: Locality Sensitive Orderings
- Mar. 18: Online Matching - I
- Online Bipartite Matching - Primal
Dual LP Analysis
- See Section 11.1 - 11.4
of My
Notes
- Karp,
Vazirani, and
Vazirani, An optimal
algorithm for online
birpartite matching
ACM-STOC 1990
- Devanur,
Jain and Kleinberg,
Randomized primal-dual
analysis of ranking
for bipartite matching,
ACM-SIAM SODA
2013.
- Mehta,
Saberi, Vazirani and
Vazirani, AdWords and
generalized online
matching, Jl. ACM Vol.
54, 2007.
- Kalyanasundaram
and Pruhs, An optimal
deterministic
algorithm for online
b-matching,
Theoretical Computer
Science 233(1-2):
319-325, 2000.
- Mar. 23: Online
Matching - II
- Online
Bipartite
Matching -
Waterlevel
Algorithm
- Mar. 25: Online Matching - III
- Randomized
and Balance
Algorithm
- Mar. 30: Approximation
Algorithms-I
- Min
Cost st-cuts, Multiway
Mincuts, Multicuts in Graphs
- Garg,
Vazirani and Yannakakis,
Primal-dual approximation
algorithms for integral flow
and multicuts in trees,
Algorithmica 18(1): 3-20,
1997.
- Garg,
Vazirani and Yannakakis,
Multiway cuts in node weighted
graphs, Journal of
Algorithms 50(1): 49-61, 2004.
- Calinescu,
Karloff and Rabani,
Approximation algorithms for
the 0-extension problem,
ACM-SIAM SODA 2001.
- Fakcharoenphol,
Rao and Talwar, A tight bound
on approximating arbitrary
metrics by tree metrics, JCSS
69(3): 485-487, 2004.
- The
design of approximation
algorithms, Williamson and
Shmoys, Cambridge University
Press, 2011.
- Apr. 01: Approximation Algorithms-II
Reference Material:
Useful References related to various topics. This
will get modified as we go along the course.
- PageRank:
- Probability:
- Locality Sensitive Hashing
- Data Streaming
- Adwords
- Chapter on Advertisement on the Web from the
MMDS Book.
- SVD/Dimensionality Reduction:
- Chapter in the MMDS book on Dimensionality
Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect
Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.
University Policies
We follow all the rules and regulations set by RKMVERI.
Student Academic Integrity Policy. Every student should
be familiar with the RKMVERI academic integrity policy. I am
teaching the course on a volunteer basis, and I am not familiar with
the rules and regulations of RKMVERI. Therefore, I will take and
follow the advise of RKMVERI's Head of CS Department on any matter
of concern.
Announcements:
- This is the 2nd offering of this course at
RKMVERI. The contents will be somewhat different from the
first offering as some of the material covered previously now
gets covered in your Data Mining course. Our course is modeled
after COMP 5112 and COMP 3801 that I teach at Carleton.