COMP 5112/COMP4900G: Algorithms for Data Science (Fall 2023 Term)

Weekly Schedule      


Instructor: Anil Maheshwari
Office: Herzberg Building 5125B
E-mail: anil@scs.carleton.ca


Lectures: Lectures on Tuesday 8:35 - 11:25 AM. See public class schedule for the location.
Office hours  Mondays 08:35-09:55 AM (HP 5125b). All general announcements will be made during the class and/or via the Brightspace system.


Teaching Assistant: Very unlikely

Course objectives:   

To learn some of the algorithmic techniques to handle data science problems.
Topics may include:

These topics may be adjusted based on the background, interests of the students, and the amount of lecture time available.


Required Background:
 

We will cover a spectrum of techniques from the design and analysis of algorithms. It is assumed that you have an excellent grasp on:
Note that there will not be sufficient time to review the background material to a satisfactory level during the course. (In nutshell you must have a background that is equivalent to the following Carleton Courses:  COMP 1805, COMP 2402, COMP 3804, and a course in Linear Algebra.)

Reference Material:

Useful references related to various topics. This will get modified as we go along in the course.


Grading Scheme: (Tentative)


COMP 4900 G COMP 5112
Assignments
25%
25%
Final Exam
25%
25%
Project
40%
50%
Class Participation
10%


There are four components
:

Outline is as follows:

  1. Pick a topic. (Look at references under "Reference Material"  and conferences in related areas. Also, use Google Scholar to see who refers to those papers etc.) You may look for papers/citations in recent proceedings of the ACM-SIAM Symposium on Discrete Algorithms conference and SIGKDD Conference for relevant topics.
  2. Initial Proposal: Submit one page draft.  What is the topic you chose? Why? What problem(s) you will look at? What you plan to do? Outline of sections of your report? Main References. Due (in pdf format) by October 3. Your 5 minute presentation is scheduled on October 3 during the class time slot.
  3. Final Project Presentation: Scheduled on December 05 during the class.  (BTW, we may have to schedule the presentations outside the class time slot.) Presentation is for approximately 20 minutes duration.  (Final Exam will have questions from these reports/presentations).
  4. Project Report: Due by December 05. The report format will likely be a research article. Its best to use LaTeX (e.g. see Overleaf). The sections will include:
      1. Introduction (Motivation, Problem Statement, Related Work, Short Summary of what you did).
      2. Preliminaries (In case you need to discuss some notation, definitions, etc. as background)
      3. Main Section - How did you solve the problem; State Algorithm; State its Analysis; State its Correctness.
      4. Experiments (in case you performed any simulation etc.)
      5. Conclusions  (Summary + What did you learn? + What do you think can be done in future?)
      6. References
      7. Report will be approximately 6 pages long and will be posted on the course web-page. Final Exam will have some questions from these reports.  
      8. You may use a double column format - for example the style file from Canadian Conference in Computational Geometry Style File from here:  http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
      9. It will also help the community if you update/create the relevant Wikipedia page relevant to your project. You will be suitably rewarded with bonus marks. 

Outline is as follows:

  1. Pick a topic. (Look at references under "Reference Material"  and conferences in related areas. Also, use Google Scholar to see who refers to those papers etc.)
    You may look for papers/citations in recent proceedings of the ACM-SIAM Symposium on Discrete Algorithms conference and SIGKDD Conference for relevant topics.
  2. Initial Proposal: Submit one page draft.  What is the topic you chose? Why? What problem(s) you will look at? What you plan to do? Outline of sections of your report? Main References. Due (in pdf format) by October 3rd. Your 5 minute presentation is scheduled on October 3 during the class time slot.
  3. Project Report: Due by December 5. The report format will likely be a research article. Its best to use LaTeX (e.g. see Overleaf). The sections will include:
      1. Introduction (Motivation, Problem Statement, Related Work, Short Summary of what you did).
      2. Preliminaries (In case you need to discuss some notation, definitions, etc. as background)
      3. Main Section - How did you solve the problem; State Algorithm; State its Analysis; State its Correctness.
      4. Experiments (in case you performed any simulation etc.)
      5. Conclusions  (Summary + What did you learn? + What do you think can be done in future?)
      6. References
      7. Report will be approximately 4 pages long and will be posted on the course web-page.
      8. You may use a double column format - for example the style file from Canadian Conference in Computational Geometry Style File from here:  http://vga.usask.ca/cccg2020/CCCG2020-tex-template.zip
      9. It will also help the community if you update/create the relevant Wikipedia page relevant to your project. You will be suitably rewarded with bonus marks.



Fall 2023 Term Schedule:

Sep 12: Introduction + MWU Method (Deterministic Schemes) + LSH (Jaccard Similarity and Signatures)

Sep 19: MWU - Randomized Schemes + LSH

Sep 26: LPs using MWU + Sensitive Family of LSH functions + MWIS

Oct 03:
Short Presentations on Projects + MWIS + Approximation Algorithms - Local Search (Weighted Max-Cut)

Oct 10: Approximation via Local Search (k-median) + Max k-coverage
Oct 17: Locality Sensitive Orderings + Dimensionality Reduction

Oct 31: 
Locality Sensitive Orderings + Dimensionality Reduction
 
Nov 07: Dimensionality Reduction and Online Bipartite Matching

Nov 14: Dimensionality Reduction + Online Matching 
Nov 21: Waterlevel Algorithm for Fractional Matching + LP rounding Algorithms

Nov 28:
Approximation Algorithms (using LP)