COMP 5112: Algorithms for Data Science (Winter 2022 Term)
Weekly Schedule
Assignments
Announcements
Instructor: Anil Maheshwari
Office: Herzberg Building 5125B
E-mail: anil@scs.carleton.ca
Lectures: Lectures twice a week. I am
planning to hold lectures on Wednesdays and Fridays at 2:30 PM. The
recorded videos will likely be posted in Brightspace. Initially, the
course is online, and hopefully, by the end-of-January, it will move
into the classroom setting in TB 446. If so, please come to the
class, though we will make suitable arrangements if you can't come
physically.
Office hours: During the breaks and
immediately before/after the lectures.
Course
objectives:
To learn some of the algorithmic
techniques to handle data science problems.
Topics may include:
- Data
Streaming
- Frequency
Moments
-
Dimensionality Reduction
- Online
Algorithms (including the role of Primal-Dual LPs in
their analysis)
- Finding
Similar Items using Hashing
- Nearest
Neighbor Searching
- Approximation algorithms design techniques
- Algorithmic
Aspects of Social Networks (Graph Partitioning,
Searching various substructures)
- Randomized
Linear Algebraic techniques
- Polynomial
identity testing
These topics may be adjusted based on the background and
interests of the students.
Required Background:
We will cover a spectrum of techniques from the design and
analysis of algorithms. It is assumed that you have a very good
grasp on:
- Analysis of algorithms (O-notation,
recurrences, and complexity analysis)
- Elementary probability theory including
expectation, indicator random variables, Variance, Markov's
Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing,
BST)
- Discrete mathematics (counting, permutations
and combinations, graph theory, proof techniques:
induction, contradiction, ..)
- Algorithmic techniques (divide and conquer,
greedy, dynamic programming, BFS, DFS, Connectivity, Shortest
Paths, and what is NP-Compleness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).
Note that there will not be sufficient time to
review the background material to a satisfactory level during the
course. (In nutshell you must have a background that is equivalent
to the following Carleton Courses: COMP 1805, COMP 2402,
COMP 3804, and a course in Linear Algebra.)
Reference Material:
Useful References related to various topics. This
will get modified as we go along the course.
- PageRank:
- Probability:
- Locality Sensitive Hashing
- Data Streaming
- Adwords
- Chapter on Advertisement on the Web from the
MMDS Book.
- SVD/Dimensionality Reduction:
- Chapter in the MMDS book on Dimensionality
Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect
Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.
Grading Scheme: (Tentative)
There are two components, and you need to pass in both the
components to pass this course:
- Assignments
+ Final Exam 50%: We may
have some assignments followed by a Final Exam during the
exam period. Exam will be worth at least 35% marks for the
course.
- Project
50%: (initial proposal 5% + initial presentation 5%
+ Final Report 20% + Final Presentation 20%). Outline is
as follows:
- Pick a topic. (Look at references
under "Reference Material" and conferences
in related areas and also use Google Scholar to
see who
refers to those papers etc.)
- Initial
Proposal: Submit 1-2 page
draft. What is your topic you
chose? Why? What problem(s) you will
look at? What you plan to do?
Outline of sections of your report?
Main References. Due (in pdf format)
by February
2. Your 5
minute presentation is scheduled
on February 4th during the class
time slot.
- Final
Project Presentation: Scheduled
in the week of April 4-8.
Presentation
is for approximately 12-15 minutes
duration describing your project during
the class time slot. (Final Exam
will have questions from these
reports/presentations).
- Report: Due by April 4. The
report format will likely be a research article. Its
best to use LaTeX (e.g. see Overleaf). The sections will
include:
- Introduction
(Motivation, Problem Statement, Related Work,
Short Summary of what you did).
- Preliminaries
(In case you need to discuss some notation,
definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness.
- Experiments (in
case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you
think can be done in future?)
- References
- Report
will be approximately 6 pages long
and will be posted on the course
web-page. Final Exam will have
some questions from these reports.
- You may
use a double column
format - for example
the style file from
Canadian Conference
in Computational
Geometry Style File
from here:
http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
Schedule for
Winter 2022 Term (Tentative):
- Opening Remarks (Slides)
- Count-Min Sketch (Slides)
- See additional references in the slides
-
J. Misra and D. Gries, Finding repeated elements, Science of
Computer Programming, Vol. 2 (2): 143 -152, 1982.
-
G. Cormode and S. Muthukrishnan, An improved data stream
summary: the count-min sketch and its applications, Jl.
Algorithms 55(1): 58-75, 2005
- Section 10.1 in My
Notes
- Jan. 14:
- Reporting heavy hitters using CMS
- Approximating frequency moments in a stream
(Slides)
- AMS
Article (Alon, Matias, Szegedy, The space complexity
of approximating the frequency moments, JCSS 58(1): 137-146,
1999)
- FM
Article (Flajolet and Martin, Probabilistic
counting algorithms for data base applications, JCSS 31:
182-209, 1985.
- Section 10.3 in My
Notes
- Jan. 19: Frequency Moments - II
- Estimation of F_0 and F_2 frequency
moments (continuation of lecture on Jan 14th).
- Jan. 21: Locality-Sensitive Hashing
- Jan. 26: LSH-II +
Polynomial Identity Testing
- Slides on
Polynomial Identity Testing
- DeMillo and
Lipton, A probabilistic remark on
algebraic program testing, Information
Processing Letters 7(4):193-195, 1978.
- Mulmuley,
Vazirani and Vazirani, Matching is as
easy as matrix inversion,
Combinatorica 7(1):105-113, 1987.
- Mitzenmacher
and Upfal, Probability and Computing,
Cambridge.
- Motwani and
Raghavan, Randomized Algorithm,
Cambridge.
- Jan. 28: Polynomial
Identity Testing with applications
to Matching (Continuation of last class)
- Feb. 02: MWIS
- Slides on
MWIS
- U. Feige and
D. Reichman, Recoverable values for
independent sets, Random Structures
and Algorithms 46(1): 142-159, 2015.
- Feb. 04: Short
Presentations on the Projects +
Approximation Algorithms via
Local Search
- Feb. 09: Approximation
Algorithms via Local Search
- Finishing
analysis of 2-approximation of
Max-Weight Cuts. 5-approximation to
k-Median in Metric Complete Graphs.
- A. Gupta and
K. Tangwongsam, Simpler Analyses of
Local Search Algorithms for Facility
Location (arXiv
Paper)
- Arya et al.,
Local search heuristics for k-median
and facility location problems, SIAM
Jl. Computing 33(3): 544-562, 2004.
- N.H. Mustafa
and S. Ray, Improved results on
geometric hitting set problems,
Discrete and Computational Geometry
44:883-895, 2010
- R. Aschner,
M.J. Katz, G. Morgenstern and Y.
Yuditsky, Approximation schemes for
covering and packing, WALCOM, Lecture
Notes in Computer Science 7748:
89-100, Springer, 2013.
- Feb. 11: Analysis of k-Medians +
Dimensionality Reduction - I
- Slides on Dimensionality Reduction
- Matousek, Lectures on Discrete
Geometry, Volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi,
Concentration of Measure for the Analysis of
Randomized Algorithms, Cambridge University
Press, 2009.
- Dasgupta, and Gupta, An elementary
proof of a theorem of Johnson and
Lindenstrauss" Random Structures &
Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss,
Extensions of Lipschitz mappings into a
Hilbert space, Contemporary Mathematics
26:189-206, 1984.
- Chapter 12 in My
Notes
- Feb. 16: Dimensionality Reduction -
II
- Distortion, Embedding metric spaces in
low-dimensional L_\infty Spaces.
- Feb 23/25 -
Winter Break - No Classes.
- Mar. 02: MWU Method - Deterministic Schemes
- Slides
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164, 2012.
- Chapter 11.5
in My
Notes
- Mar. 04: MWU Method - Randomized Schemes
- Mar. 09: MWU Method + Locality Sensitive Orderings
- Mar. 11: Locality Sensitive Orderings
- Mar 16: Saeed's Talk on JL-Embedding
Lower/Upper Bounds
- Mar. 18: Online Matching - I
- Online Bipartite Matching - Primal
Dual LP Analysis
- See Section 11.1 - 11.4
of My
Notes
- Karp,
Vazirani, and
Vazirani, An optimal
algorithm for online
birpartite matching
ACM-STOC 1990
- Devanur,
Jain and Kleinberg,
Randomized primal-dual
analysis of ranking
for bipartite matching,
ACM-SIAM SODA
2013.
- Mehta,
Saberi, Vazirani and
Vazirani, AdWords and
generalized online
matching, Jl. ACM Vol.
54, 2007.
- Kalyanasundaram
and Pruhs, An optimal
deterministic
algorithm for online
b-matching,
Theoretical Computer
Science 233(1-2):
319-325, 2000.
- Mar. 23: Online
Matching - II
- Online
Bipartite
Matching -
Waterlevel
Algorithm
- Mar. 25: Online Matching - III
- Randomized
and Balance
Algorithm
- Mar. 30: Approximation
Algorithms-I
- Min
Cost st-cuts, Multiway
Mincuts, Multicuts in Graphs
- Garg,
Vazirani and Yannakakis,
Primal-dual approximation
algorithms for integral flow
and multicuts in trees,
Algorithmica 18(1): 3-20,
1997.
- Garg,
Vazirani and Yannakakis,
Multiway cuts in node weighted
graphs, Journal of
Algorithms 50(1): 49-61, 2004.
- Calinescu,
Karloff and Rabani,
Approximation algorithms for
the 0-extension problem,
ACM-SIAM SODA 2001.
- Fakcharoenphol,
Rao and Talwar, A tight bound
on approximating arbitrary
metrics by tree metrics, JCSS
69(3): 485-487, 2004.
- The
design of approximation
algorithms, Williamson and
Shmoys, Cambridge University
Press, 2011.
- Apr. 01: Approximation Algorithms-II
- Apr. 06: Presentations on
the Projects
- Apr. 08: Presentations on
the Projects
- FINAL EXAM: Scheduled by Exam
Services at Carleton
University Policies
We follow all the rules and regulations set by Carleton
University, Dean of Science, and the School of Computer Science
regarding accommodating students with any kind of need(s). Please
consult with the appropriate authorities to see how you can be
accommodated and please follow their procedures. For
information about Carleton's academic year, including registration
and withdrawal dates, see Carleton's Academic Calendar.
Following is a standard list of recommendations that we have been
advised to provide you with respect to whom to contact for the
appropriate action(s):
Pregnancy Obligation. Please contact your instructor with
any requests for academic accommodation during the first two weeks
of class, or as soon as possible after the need for accommodation is
known to exist. For more details, visit Equity Services.
Religious Obligation. Please contact your instructor with
any requests for academic accommodation during the first two weeks
of class, or as soon as possible after the need for accommodation is
known to exist. For more details, visit Equity Services.
Academic Accommodations for Students with Disabilities If you
have a documented disability requiring academic accommodations in
this course, please contact the Paul Menton Centre for Students with
Disabilities (PMC) at 613-520-6608 or pmc@carleton.ca for
a formal evaluation or contact your PMC coordinator to send your
instructor your Letter of Accommodation at the beginning of the
term. You must also contact the PMC no later than two weeks before
the first in-class scheduled test or exam requiring accommodation
(if applicable). After requesting accommodation from PMC, meet with
your instructor as soon as possible to ensure accommodation
arrangements are made. For more details, visit the Paul
Menton Centre website.
Survivors of Sexual Violence. As a community, Carleton
University is committed to maintaining a positive learning, working
and living environment where sexual violence will not be tolerated,
and survivors are supported through academic accommodations as per
Carleton's Sexual Violence Policy. For more information about the
services available at the university and to obtain information about
sexual violence and/or support, visit:
carleton.ca/sexual-violence-support.
Accommodation for Student Activities. Carleton University
recognizes the substantial benefits, both to the individual student
and for the university, that result from a student participating in
activities beyond the classroom experience. Reasonable accommodation
must be provided to students who compete or perform at the national
or international level. Please contact your instructor with any
requests for academic accommodation during the first two weeks of
class, or as soon as possible after the need for accommodation is
known to exist. For more details, see the policy.
Student Academic Integrity Policy. Every student should
be familiar with the Carleton University student academic integrity
policy. A student found in violation of academic integrity standards
may be awarded penalties which range from a reprimand to receiving a
grade of F in the course or even being expelled from the
program or University. Examples of punishable offences include:
plagiarism and unauthorized co-operation or collaboration.
Plagiarism. As defined by Senate, "plagiarism is
presenting, whether intentional or not, the ideas, expression of
ideas or work of others as one's own". Such reported offences will
be reviewed by the office of the Dean of Science.
Unauthorized Co-operation or Collaboration. Senate policy
states that "to ensure fairness and equity in assessment of term
work, students shall not co-operate or collaborate in the completion
of an academic assignment, in whole or in part, when the instructor
has indicated that the assignment is to be completed on an
individual basis". For this course, the following holds:
- Students are NOT allowed to collaborate on
assignments. Please avoid using search engines to look for
answers etc. Just a word of caution - in theoretically oriented
courses, it is important to come up with your own ideas for the
proof/an algorithm/a contradiction/etc. Sometimes these are like
logical puzzles - if somebody tells you a solution then they are
trivial and hard part is to come up with a solution. What we
want to learn is how to solve them ourselves.
- Past experience has shown conclusively that those who do not
put adequate effort into the assignments do not learn the
material and have a probability near 1 of doing poorly on the
exams/tests.
- Penalties for
academic offences can be found on the ODS webpage: https://science.carleton.ca/academic-integrity/.
Announcements:
- Link
to the 1st offering of COMP 5112 in Fall 2020
- All communications will be done via Brightspace. If you have a question that may benefit
the whole class, please post it in the forum or ask
during the class/office hour. (The discussion that are of
private nature can be done by sending me an e-mail.)
- I am planning to hold my office hours
immediately before and after the lectures.
- For uOttawa students: Please try to figure out
how to obtain the Brightspace account at Carleton. (I can't
help you with the administrative aspect - please contact
graduate and departmental administrators.)
- Learn (La)TeX as soon as possible - will help
you to write your thesis later!