COMP 5112/COMP4900G: Algorithms for Data Science (Fall 2024
Term)
Weekly Schedule
Grading
Scheme
Instructor: Anil Maheshwari
Office: Herzberg Building 5125B
E-mail: anil@scs.carleton.ca
Lectures: Lectures on Mondays and
Wednesdays 10:05 - 11:25 AM.
See public class schedule/Brightspace
for the room location in Tory.
Office hours:
Mondays 08:30-09:45 AM
(HP 5125b). All general
announcements will be made during the class and/or via the
Brightspace system.
Teaching Assistant: Very unlikely
Course
objectives:
To learn some of the algorithmic
techniques to handle data science problems. Emphasis is on
providing correctness proofs, establishing competitive
ratios, and analyzing computational complexity for each of
the algorithms discussed during the course.
Topics may include:
- Approximation algorithms design
techniques
- Dimensionality
Reduction
- Online
Algorithms (including the role of Primal-Dual LPs in
their analysis)
- Finding
Similar Items using Locality-Sensitive Hashing
- Nearest
Neighbor Searching
- Clustering
- FPT
Algorithmic Design Techniques
- Algorithmic
Aspects of Social Networks (Graph Partitioning,
Searching various substructures)
These topics may be adjusted based on the background,
interests of the students, and the amount of lecture time
available.
Required Background:
We will cover a spectrum of techniques from the design and
analysis of algorithms. It is assumed that you have an excellent
grasp on:
- Analysis of algorithms (O-notation,
recurrences, and complexity analysis)
- Elementary probability theory including
expectation, indicator random variables, Variance, Markov's
Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing,
BST)
- Discrete mathematics (counting, permutations
and combinations, graph theory, proof techniques:
induction, contradiction, ..)
- Algorithmic techniques (divide and conquer,
greedy, dynamic programming, BFS, DFS, Connectivity, Shortest
Paths, and what is NP-Completeness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).
Note that there will not be sufficient time to
review the background material to a satisfactory level during the
course. (In nutshell you must have a background that is equivalent
to the following Carleton Courses: COMP 1805, COMP 2402,
COMP 3804, and a course in Linear Algebra.)
Reference Material:
Useful references related to various topics. This
will get modified as we go along in the course.
- Locality Sensitive Hashing
- Online Bipartite
Matching
- Karp,
Vazirani, and Vazirani, An
optimal algorithm for online
birpartite matching ACM-STOC
1990
- Devanur,
Jain and Kleinberg, Randomized
primal-dual analysis of ranking
for bipartite matching, ACM-SIAM
SODA 2013.
- Mehta,
Saberi, Vazirani and Vazirani,
AdWords and generalized online
matching, Jl. ACM Vol. 54, 2007.
- Kalyanasundaram
and Pruhs, An optimal
deterministic algorithm for
online b-matching, Theoretical
Computer Science 233(1-2):
319-325, 2000.
- Chapter on Advertisement on the Web from the
MMDS Book.
- Eden, Feldman, Fiat, and Segal, An
Economics-Based Analysis of RANKING for Online Bipartite
Matching, arXiv, 2020.
- Echenique, Immorlica, and Vazirani, Online
and Matching-Based Market Design, Cambridge University
Press, 2023.
- See Section 11.1
- 11.4 of Notes
- Randomized Load Balancing & Perfect
Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.
- Polynomial Identity Testing
- DeMillo and
Lipton, A probabilistic remark on
algebraic program testing, Information
Processing Letters 7(4):193-195, 1978.
- Mulmuley,
Vazirani and Vazirani, Matching is as
easy as matrix inversion,
Combinatorica 7(1):105-113, 1987.
- Mitzenmacher
and Upfal, Probability and Computing,
Cambridge.
- Motwani and
Raghavan, Randomized Algorithm,
Cambridge.
- MWIS
- U. Feige and
D. Reichman, Recoverable values for
independent sets, Random Structures
and Algorithms 46(1): 142-159, 2015.
- Even more Approximation Algorithms
- Turning down the noise in blogosphere,
El-Arini, Veda, Shahaf, Guestrin, KDD 2009.
- An analysis of approximations for maximizing
submodular set function. Mathematical Programming 14,
265-294, 1978.
- Multiplicative-Weight Update Method
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164, 2012.
- Chapter 11 of Notes.
- Locality-Sensitive Orderings
- Chan, Har-Peled, Jones, On
locality-sensitive orderings and their applications, SIAM
Jl. on Computing 49(3): 583-600, 2020.
- Dimensionality Reduction
- Matousek, Lectures on Discrete
Geometry, Volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi,
Concentration of Measure for the Analysis of
Randomized Algorithms, Cambridge University
Press, 2009.
- Dasgupta, and Gupta, An elementary
proof of a theorem of Johnson and
Lindenstrauss" Random Structures &
Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss,
Extensions of Lipschitz mappings into a
Hilbert space, Contemporary Mathematics
26:189-206, 1984.
- Chapter 12 of Notes
- Color Coding
- Alon, Yuster, and Zwick: Color Coding, Jl.
ACM 42(4): 844-856, 1995.
- Clustering
- Arthur and Vassilvitskii, k++-Means:
the advantages of careful seeding, ACM-SIAM SODA 2017.
- Articles worth pursuing for projects in
addition to above references
- E. Lee, Partitioning a graph into small
pieces with applications to path transversal, SODA 2017.
- J. Matousek, Chapter 10: Transversals and
Epsilon Nets in Lectures in Discrete Geometry, Springer,
2002.
- S. Har-Peled, Chapter 5, Geometric
Approximation Algorithms, AMS 2011
- S. Har-Peled, Chapter 15,
Geometric Approximation Algorithms, AMS 2011
- Agarwal, Har-Peled, R. Raychaudhry and S.
Sintos, Fast approximation algorithms for piercing boxes by
points, SODA 2024.
- Articles on bounded treewidth graphs
- A. Eden, M.Feldman, A. Fiat and K.
Segal, An Economics-Based
Analysis of RANKING for Online Bipartite Matching, SOSA
2021.
- N. Alon, M. Yossi and M. Szegedy, The space
complexity of approximating frequency moments, Jl Comp.
System Science 58(1), 1999. Also in ACM STOC 1996.
- R. Moser and G. Tardos, Constructive Proof
of Lovasz Local Lemma, Jl. ACM 57(2), 2010.
- G. Bodwin, An alternate proof of
near-optimal light spanners, SOSA 2024.
- A. Gupta. E. Lee and J. Li, Local search
based approach for set covering, SOSA 2023.
- J. Fakcharoenphol, S. Rao, and K. Talwar, A
tight bound on approximating arbitrarily metrics by tree
metrics, JCSS 69(3): 485-497, 2004.
-
R. Kupfer and N. Nisan, Finding a hidden edge, arXiv 2022.
- Basu et al., A sublinear algorithm for
approximate shortest paths in large networks, WSDM 2025.
-
Grading
Scheme: (Tentative)
There are three components:
- Final Exam
(20%, Scheduled by Exam Services): Final Exam will
encompass the course material as well as the key ideas
from the seminars presented by students.
- In Class Test
(15%):
In Class
test will cover all the lecture material
before the Fall break. This will be held
in the class on October
27th.
- Project (COMP 5112): (Written
project proposal (10%), 3-Minute Proposal Presentation
5% + Final Report 25% + Final Presentation 25%).
Outline is as follows:
- Pick a topic. (Look at references
under "Reference Material" and conferences
in related areas. Also, use Google Scholar to
see who
refers to those papers etc.) You may look for
papers/citations in recent proceedings of the
ACM-SIAM Symposium on Discrete Algorithms
conference and SIGKDD Conference for relevant
topics.
- Initial
Proposal: Submit one page
draft. What is the topic you
chose? Why? What problem(s) you will
look at? What you plan to do?
Outline of sections of your report?
Main References. Due (in pdf format)
by September
29. Your
3-minute presentation is
scheduled on October 1 during the class
time slot.
- Final
Project Presentation:
Scheduled in last 3-4 weeks of the
class. Presentation is for
approximately 15 minutes duration.
- Project
Report: Due couple of
days before your presentation.
The report format will
likely be a research article. Its best to use LaTeX
(e.g. see Overleaf). The sections will include:
- Introduction
(Motivation, Problem Statement, Related Work,
Short Summary of what you did).
- Preliminaries
(In case you need to discuss some notation,
definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness.
- Experiments (in
case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you
think can be done in future?)
- References
- Report
will be approximately 6 pages long
and will be posted on the course
web-page. Final Exam will have
some questions from these
reports.
- You may
use a double column
format - for example
the style file from
Canadian Conference
in Computational
Geometry Style File
from here:
http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
- It will
also help the
community if you
update/create the
relevant Wikipedia
page relevant to
your project. You
will be suitably
rewarded with bonus
marks.
- Project (COMP 4900): Same as above, except that the final project report
is of 4 pages and the final presentation is for about
12 minutes.
Fall 2025 Schedule
Sep
03: Course
Logistics +
MWU Method
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164,
2012.
- Chapter 11 in Notes.
- Summary
on MWU (with extra material)
Sept
08: MWU
(Cont.)
Randomized
Schemes
Sept 10:
Sept
15:
Sept 17:
Sept
22:
Sept 24:
Sept
29: Project Proposal Due
Oct 01: 3-Minute Proposal Presentation
Oct
06:
Oct
08:
Oct
15:
Oct
27: In Class Test
Oct
29:
Nov
03:
Nov
05:
Nov
10:
Nov
12: