Algorithms for Data Science

Weekly Schedule


Instructor: Anil Maheshwari
Office: Herzberg Building 5125b (Online for time being)
E-mail: anil@scs.carleton.ca

Lectures: Online lectures twice a week. Lectures on Tuesday and Thursday at 6:30 AM & 7:30 AM EST (Eastern/Daylight Savings Standard Time).
Offered for the first time in the online format at Ramakrishna Mission Vivekananda Education and Research Institute (Belur) from February-May 2021, live from Ottawa.
Topics overlap with COMP 3801 and COMP 5112 courses at Carleton.

Office hours: During the breaks and immediately after the online lectures.         

Course objectives: 

To learn some of the algorithmic techniques to handle data science problems.
Topics may include:

Required Background:
 

We will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have a very good grasp on:

Reference Material:

Useful References related to various topics. This will get modified as we go along the course.



Weekly Schedule:

Lecture # & Date
Slides
Video Recording/Slides
References/Comments
1 (Feb 9th)
Course Introduction

Count-Min Sketch

L1
Topics Covered:
- Introduction
- Majority Element
- Heavy Hitters
- CMS Algorithm


Count-Min Sketch Article

See also Section 10.1 in My Notes
2 (Feb 16)
Count-Min Sketch
L2
CMS Table, Construction + Proofs
Heaps
Markov's inequality for discrete random variables
 
3 (Feb 18)
Count-Min Sketch

Probability Basics


L3-(CMS)

CMS-Scribbled Slides

L3-(Probability Basics)


Probability Basics
4 (Feb 23)
Balls and Bins

Bloom Filter




L4-Balls and Bins

L4-Bloom Filters


See also Section 10.2 in My Notes



5 (Feb 25)
Bloom Filters

Hashing
L5-Bloom Filters

L5-Hashing


Chapter in CLRS Algorithms Book.
6 (Mar 2)
Universal Hashing


L6-Universal Hashing
Chapter in CLRS Algorithms Book.
7 (Mar 4)
Perfect Hashing

Frequency-Moments
L7-Perfect Hashing

L7-F_0 Estimation
Chapter in CLRS Algorithms Book.

See Chapter 10 in my notes.
8 (Mar 9)
Frequency-Moments L8-F0/F2-Estimation

9 (Mar 16 )
Analysis of Frequency Moment F_2

DGIM Algorithm
F2-estimate


Counting 1s
See Chapter 10 in my notes.


See Chapter 10 in my notes.
10 (Mar 23)
DGIM Algorithm

Polynomial Testing
Counting-1s and Variants

Checking Bit Strings



11 (Mar 25)
Polynomial Testing
Multivariate Polynomials, Determinants, and Perfect Matchings
  • Chapter 7 - Motwani and Raghavan Randomized Algorithms Book
  • Chapter 1.3 - Mitzenmacher and Upfal Probability and Computing Book
  • Mulmuley, Vazirani and Vazirani - Matching is as easy as matrix inversion, Combinatorica 7(1): 105-113, 1987.
  • DeMillo and Lipton, A probabilistic remark on algebraic program testing, IPL 7(4):193-195, 1978
12 (Mar 30)
Finding Perfect Matching

Geometric Approximation:
Approximate Nearest Neighbor

Isolation Lemma + Matching


Approx-Nearest-Neighbor



Chan, Har-Peled and Jones, Locality-Sensitive Orderings, SIAM Jl. Computing 49(3):583-600, 2020
 13 (Apr 1)
Geometric Approximation:
Approximate Nearest Neighbor
Quadtrees, Orderings, ANN

14 (Apr 13)
Approximate Nearest Neighbors
Orderings + Applications

14 (Apr 13)

Locality-Sensitive Hashing


Documents, Minhash Signatures
STOC98 paper of Indyk-Motwani on "Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality.

Chapter on LSH in My Notes on Topics in Algorithm Design
15 (Apr 15)
Signature Matrix, S-Curve, Sensitive Family, Applications

Fingerprint Matching
Banding Technique, Metric Spaces and Sensitive Families
Chapter on LSH in My Notes on Topics in Algorithm Design
16 (Apr 20)
AND-OR Sensitive Family
Fingerprinting

Dimensionality Reductions
- Isometric Embeddings
-

AND-OR Family + Fingerprinting


Isometric Embedding



Chapter on Dimensionality Reduction in My Notes on Topics in Algorithm Design
17(Apr 22)
Dimensionality Reductions
- Universality of L_\infty
- Distortion
- Embeddings Metric Spaces in L_\infty Normed Spaces
- J-L. Lemma

- Universality of L_\infty
- Distortion
- Embeddings Metric Spaces in L_\infty Normed Spaces

18 (Apr 27)
- Embeddings Metric Spaces in L_\infty Normed Spaces

- J-L. Lemma
Embeddings Metric Spaces in L_\infty Normed Spaces

Normal Distribution and J-L Theorem



Chapter on Dimensionality Reduction in My Notes on Topics in Algorithm Design
19 (Apr 29)
Proof of J-L Theorem

Introduction to Online Matching and Adwords Problem
Proof of J-L Theorem

Introduction to Online Algorithms


Chapter on Online Algorithms in My Notes on Topics in Algorithm Design
20 (May 4)
Online Bipartite Matching
- Analysis via Primal-Dual


Analysis of Greedy Online Bipartite Matching via Primal-Dual
Matching LP


Chapter on Online Algorithms in My Notes on Topics in Algorithm Design

21 (May 6)
Online Matching
- Fractional Matching
- Randomized Matching

Fractional Matching - Primal Dual Analysis
-Karp, Vazirani and Vazirani, An optimal algorithm for online bipartite matching, STOC 1990

- Devanur, Jain and Kleinberg, Randomized Primal-Dual Analysis of RANKING for Online Bipartite Matching, SODA 2013

Chapter on Online Algorithms in My Notes on Topics in Algorithm Design
22 (May 11)
- Balance Algorithm
- Adwords




Introduction to MWU

Analysis of Balance Algorithm





Predicting Stock Market Index
- Mehta, Saberi, Vazirani and Vazirani, AdWords and generalized online matching, JACM 54(6), 2007

- Kalyanasundaram and Pruhs, An optimal deterministic algorithm for b-matching,  TCS 233: 319-325, 2000

Chapter on Online Algorithms in My Notes on Topics in Algorithm Design
23 (May 13)
Proof of Balance Algorithm

Multiplicative Weight Update Method
- Regret Minimization
- Application

Proof of Balance via Complementary Slackness

Deterministic Methods for MWU
Chapter on Online Algorithms in My Notes on Topics in Algorithm Design
Arora, Hazan, and Kale, Multiplicative-Weight Update Method: A Meta-Algorithm and Applications, Theory of Computing 6:121-164, 2012.

24 (May 18)
Multiplicative Weight Update Method - Randomization
Randomized MWU Methods
Chapter on Online Algorithms in My Notes on Topics in Algorithm Design
25 (May 20)
k-means Clustering Algorithms Clustering - k-Means& k++-Means



Introduction to Recommender System
David Arthur and Sergei  Vassilvitskii, k-Means++: the advantage of careful seeding, SODA 2007
https://cseweb.ucsd.edu/~dasgupta/291-geom/kmeans.pdf

mmds.org chapter on Dimensionality Reduction
26 (May 25)


Recommender System- Applications of SVD
Completing Proof of k++-Means

SVD and its Applications to Recommender System


See My Notes - Chapter on Matrices for CS for SVD.
27 (May 27)
SVDs

Course Summary
SVD+Comments on CUR

Topics Not Covered
Maximum Weight Independent Set
(Recoverable Values)


FPT - Vertex Cover

Page Rank

Graph Partitioning using Laplacian

Min-Cost Bipartite Matching

Graph Algorithms on Social Networks

Randomized Linear Algebra

Feige and Reichman, Recoverable Values for Independent Sets, Random Structures and Algorithms 46(1): 142-159, 2015.