COMP 5112/COMP4900G: Algorithms for Data Science (Fall 2023
Term)
Weekly Schedule
Instructor: Anil Maheshwari
Office: Herzberg Building 5125B
E-mail: anil@scs.carleton.ca
Lectures: Lectures on Tuesday 8:35 -
11:25 AM.
See public class schedule for the
location.
Office hours:
Mondays 08:35-09:55 AM
(HP 5125b). All general
announcements will be made during the class and/or via the
Brightspace system.
Teaching Assistant: Very unlikely
Course
objectives:
To learn some of the algorithmic
techniques to handle data science problems.
Topics may include:
- Data
Streaming
- Frequency
Moments
-
Dimensionality Reduction
- Online
Algorithms (including the role of Primal-Dual LPs in
their analysis)
- Finding
Similar Items using Locality-Sensitive Hashing
- Nearest
Neighbor Searching
- Approximation algorithms design techniques
- Algorithmic
Aspects of Social Networks (Graph Partitioning,
Searching various substructures)
- Randomized
Linear Algebraic techniques
- Polynomial
identity testing
These topics may be adjusted based on the background,
interests of the students, and the amount of lecture time
available.
Required Background:
We will cover a spectrum of techniques from the design and
analysis of algorithms. It is assumed that you have an excellent
grasp on:
- Analysis of algorithms (O-notation,
recurrences, and complexity analysis)
- Elementary probability theory including
expectation, indicator random variables, Variance, Markov's
Inequality (contents of COMP 2804)
- Basic data structures (lists, trees, hashing,
BST)
- Discrete mathematics (counting, permutations
and combinations, graph theory, proof techniques:
induction, contradiction, ..)
- Algorithmic techniques (divide and conquer,
greedy, dynamic programming, BFS, DFS, Connectivity, Shortest
Paths, and what is NP-Completeness)
- Linear Algebra (Eigenvalues/vectors,
Rank-Nullity, Vector Spaces, Norms).
Note that there will not be sufficient time to
review the background material to a satisfactory level during the
course. (In nutshell you must have a background that is equivalent
to the following Carleton Courses: COMP 1805, COMP 2402,
COMP 3804, and a course in Linear Algebra.)
Reference Material:
Useful references related to various topics. This
will get modified as we go along in the course.
- Locality Sensitive Hashing
- Online Bipartite
Matching
- Karp,
Vazirani, and Vazirani, An
optimal algorithm for online
birpartite matching ACM-STOC
1990
- Devanur,
Jain and Kleinberg, Randomized
primal-dual analysis of ranking
for bipartite matching, ACM-SIAM
SODA 2013.
- Mehta,
Saberi, Vazirani and Vazirani,
AdWords and generalized online
matching, Jl. ACM Vol. 54, 2007.
- Kalyanasundaram
and Pruhs, An optimal
deterministic algorithm for
online b-matching, Theoretical
Computer Science 233(1-2):
319-325, 2000.
- Chapter on Advertisement on the Web from the
MMDS Book.
- Eden, Feldman, Fiat, and Segal, An
Economics-Based Analysis of RANKING for Online Bipartite
Matching, arXiv, 2020.
- Echenique, Immorlica, and Vazirani, Online
and Matching-Based Market Design, Cambridge University
Press, 2023.
- See Section 11.1
- 11.4 of My
Notes
- Randomized Load Balancing & Perfect
Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.
- Polynomial Identity Testing
- DeMillo and
Lipton, A probabilistic remark on
algebraic program testing, Information
Processing Letters 7(4):193-195, 1978.
- Mulmuley,
Vazirani and Vazirani, Matching is as
easy as matrix inversion,
Combinatorica 7(1):105-113, 1987.
- Mitzenmacher
and Upfal, Probability and Computing,
Cambridge.
- Motwani and
Raghavan, Randomized Algorithm,
Cambridge.
- MWIS
- U. Feige and
D. Reichman, Recoverable values for
independent sets, Random Structures
and Algorithms 46(1): 142-159, 2015.
- Even more Approximation Algorithms
- Turning down the noise in blogosphere,
El-Arini, Veda, Shahaf, Guestrin, KDD 2009.
- An analysis of approximations for maximizing
submodular set function. Mathematical Programming 14,
265-294, 1978.
- Multiplicative-Weight Update Method
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164, 2012.
- Chapter 11 of my notes.
- Locality-Sensitive Orderings
- Chan, Har-Peled, Jones, On
locality-sensitive orderings and their applications, SIAM
Jl. on Computing 49(3): 583-600, 2020.
- Dimensionality Reduction
- Matousek, Lectures on Discrete
Geometry, Volume 212 of Graduate Texts in
Mathematics. Springer, New York, 2002.
- Dubhashi and Panconesi,
Concentration of Measure for the Analysis of
Randomized Algorithms, Cambridge University
Press, 2009.
- Dasgupta, and Gupta, An elementary
proof of a theorem of Johnson and
Lindenstrauss" Random Structures &
Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss,
Extensions of Lipschitz mappings into a
Hilbert space, Contemporary Mathematics
26:189-206, 1984.
- Chapter 12 of my notes
- Color Coding
- Alon, Yuster, and Zwick: Color Coding, Jl.
ACM 42(4): 844-856, 1995.
Grading Scheme: (Tentative)
|
COMP 4900 G |
COMP 5112
|
Assignments
|
25%
|
25%
|
Final Exam
|
25%
|
25%
|
Project
|
40%
|
50%
|
Class Participation
|
10%
|
|
There are four components:
- Final Exam:
A Final Exam
during the exam period. Final Exam is scheduled by the
exam services at Carleton and will take place during the
exam period in December.
- Assignments:
Few assignments during the course. Please only refer to
class notes and the reference material listed on
the web-page and/or during lectures for solving
assignment problems. Please do not collaborate.
Please cite all the references used for solving
each of the problems. All assignments need to be
submitted electronically using the brightspace
system.
- Class Participation: Attending the class, answering the questions, and
asking relevant questions.
- Project (COMP 5112): (Initial proposal
and presentation 10% + Final Report 20% + Final
Presentation 20%).
Outline is as follows:
- Pick a topic. (Look at references
under "Reference Material" and conferences
in related areas. Also, use Google Scholar to
see who
refers to those papers etc.) You may look for
papers/citations in recent proceedings of the
ACM-SIAM Symposium on Discrete Algorithms
conference and SIGKDD Conference for relevant
topics.
- Initial
Proposal: Submit one page
draft. What is the topic you
chose? Why? What problem(s) you will
look at? What you plan to do?
Outline of sections of your report?
Main References. Due (in pdf format)
by October 3.
Your 5
minute presentation is scheduled
on October 3 during the class
time slot.
- Final
Project Presentation: Scheduled
on December 05
during the class. (BTW, we may
have to schedule the presentations
outside the class time slot.) Presentation is for
approximately 20 minutes duration.
(Final Exam will have questions from these
reports/presentations).
- Project
Report: Due by December 05. The
report format will likely be a research article. Its
best to use LaTeX (e.g. see Overleaf). The sections will
include:
- Introduction
(Motivation, Problem Statement, Related Work,
Short Summary of what you did).
- Preliminaries
(In case you need to discuss some notation,
definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness.
- Experiments (in
case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you
think can be done in future?)
- References
- Report
will be approximately 6 pages long
and will be posted on the course
web-page. Final Exam will have
some questions from these
reports.
- You may
use a double column
format - for example
the style file from
Canadian Conference
in Computational
Geometry Style File
from here:
http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
- It will
also help the
community if you
update/create the
relevant Wikipedia
page relevant to
your project. You
will be suitably
rewarded with bonus
marks.
- Project (COMP 4900): (initial proposal and presentation 10% +
Final Report 30%)
Outline is as follows:
- Pick a topic. (Look at references
under "Reference Material" and conferences
in related areas. Also, use Google Scholar to
see who
refers to those papers etc.)
You may
look for papers/citations in
recent proceedings of the ACM-SIAM
Symposium on Discrete Algorithms
conference and SIGKDD Conference
for relevant topics.
- Initial
Proposal: Submit one page
draft. What is the topic you
chose? Why? What problem(s) you will
look at? What you plan to do?
Outline of sections of your report?
Main References. Due (in pdf format)
by October
3rd. Your 5
minute presentation is scheduled
on October 3 during the class
time slot.
- Project
Report: Due by December 5. The
report format will likely be a research article. Its
best to use LaTeX (e.g. see Overleaf). The sections will
include:
- Introduction
(Motivation, Problem Statement, Related Work,
Short Summary of what you did).
- Preliminaries
(In case you need to discuss some notation,
definitions, etc. as background)
- Main Section -
How did you solve the problem; State Algorithm;
State its Analysis; State its Correctness.
- Experiments (in
case you performed any simulation etc.)
- Conclusions
(Summary + What did you learn? + What do you
think can be done in future?)
- References
- Report
will be approximately 4 pages long
and will be posted on the course
web-page.
- You may
use a double column
format - for example
the style file from
Canadian Conference
in Computational
Geometry Style File
from here:
http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
- It will
also help the
community if you
update/create the
relevant Wikipedia
page relevant to
your project. You
will be suitably
rewarded with bonus
marks.
Fall 2023 Term Schedule:
Sep 12: Introduction + MWU Method
(Deterministic Schemes) + LSH (Jaccard
Similarity and Signatures)
- MWU
- Arora, Hazan and Kale, The multiplicative
weights update method: a meta-algorithm and
applications, Theory of Computing 8(1): 121-164,
2012.
- Chapter 11.1 of my notes.
- LSH
Sep 19: MWU - Randomized
Schemes + LSH
- MWU: Chapter
11.2 of my notes
- LSH: Chapter 8.3 and 8.4 of my notes.
Sep 26: LPs using MWU + Sensitive Family
of LSH functions + MWIS
- LPs using MWU
- Sensitive Family
- MWIS
- U. Feige
and D. Reichman, Recoverable
values for independent sets,
Random Structures and Algorithms
46(1): 142-159, 2015.
- Section
14.1.2 of Notes
Oct 03: Short Presentations on Projects +
MWIS +
Approximation
Algorithms -
Local Search
(Weighted
Max-Cut)
- Summary on MWIS
- Local Search
- A. Gupta and K. Tangwongsam, Simpler
Analyses of Local Search Algorithms for Facility
Location (arXiv
Paper)
- Arya et al., Local search heuristics
for k-median and facility location problems,
SIAM Jl. Computing 33(3): 544-562, 2004.
- N.H. Mustafa and S. Ray, Improved
results on geometric hitting set problems,
Discrete and Computational Geometry 44:883-895,
2010
- R. Aschner, M.J. Katz, G. Morgenstern
and Y. Yuditsky, Approximation schemes for
covering and packing, WALCOM, Lecture Notes in
Computer Science 7748: 89-100, Springer, 2013.
- Chapter 14.2 in Notes
- Summary
on Local Search
Oct
10:
Approximation
via Local
Search
(k-median) +
Max k-coverage
Oct
17: Locality
Sensitive
Orderings +
Dimensionality
Reduction
- Chan, Har-Peled, Jones, On
locality-sensitive orderings and their
applications, SIAM Jl. on Computing 49(3):
583-600, 2020.
- Quadtrees
- DFS Traversal
- Permutations - Orderings
- Dimensionality Reduction
- Matousek, Lectures on Discrete Geometry,
Volume 212 of Graduate Texts in Mathematics.
Springer, New York, 2002.
- Dubhashi and Panconesi, Concentration of
Measure for the Analysis of Randomized Algorithms,
Cambridge University Press, 2009.
- Dasgupta, and Gupta, An elementary proof
of a theorem of Johnson and Lindenstrauss" Random
Structures & Algorithms, 22 (1): 60-65, 2003.
- Johnson and Lindenstrauss, Extensions of
Lipschitz mappings into a Hilbert space,
Contemporary Mathematics 26:189-206, 1984.
- Chapter 12 of my notes.
Oct
31: Locality
Sensitive
Orderings +
Dimensionality
Reduction
Nov
07:
Dimensionality
Reduction and
Online
Bipartite
Matching
Nov
14:
Dimensionality
Reduction +
Online
Matching
Nov
21: Waterlevel
Algorithm for
Fractional
Matching + LP
rounding
Algorithms
Nov
28: Approximation
Algorithms
(using LP)
- Min Cost st-cuts,
Multiway Mincuts, Multicuts in
Graphs
- Garg, Vazirani and Yannakakis,
Primal-dual approximation algorithms for
integral flow and multicuts in trees,
Algorithmica 18(1): 3-20, 1997.
- Garg, Vazirani and Yannakakis,
Multiway cuts in node weighted graphs,
Journal of Algorithms 50(1): 49-61, 2004.
- Calinescu, Karloff and Rabani,
Approximation algorithms for the 0-extension
problem, ACM-SIAM SODA 2001.
- Fakcharoenphol, Rao and Talwar, A
tight bound on approximating arbitrary metrics
by tree metrics, JCSS 69(3): 485-487, 2004
- The design of approximation
algorithms, Williamson and Shmoys, Cambridge
University Press, 2011.
- Chapter 14.3 of Notes
- Summary
on Approximation Algorithms using Metric LPs