Overview

Welcome to the web page of COMP 5118 - Trends in Big Data Management. This is a graduate-level course for MSc and PhD students at Carleton University and the University of Ottawa. Each year we focus on a few research areas in the general field of data management. These areas change from one term to another based on how active they currently are in the research community. This term, we focus on the following topics: Question Answering, Data Mining, Data Cleaning, Data Integration, Graph Processing, and Blockchain. Check the schedule below to see the papers we review this term. Most of the papers we will cover during the term are very recent and were published in top-tier conferences. This should give us a chance to learn what the data management research community is currently working on. Psst, this will also (hopefully) give you ideas for the course project, which you should take very seriously.

Contact Information

Herzberg Laboratories 5433
1125 Colonel By Dr
Ottawa, Ontario K1S 5B6

613-520-2600 ext. 4254
myFirstName.myLastNameWithoutHyphen@carleton.ca

There's also this anonymous feedback form, in which you can swear at me. But during the swearing spree, please give me some constructive feedback.

Grading

In this course, students will read and review papers for each class. During the class, some students will present the papers for the week. They and the rest of the class (including me) will then discuss these papers and our take on them. There is also a term-long project, which is worth the biggest chunk of your grade. The marks breakdown is as follows:

  • Project 45%
  • Presentations 25%
  • Paper Reviews 15%
  • Class Participation 15%
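
As a rough illustration of how the breakdown above combines into a final mark, here is a minimal Python sketch. The weights come from the list above; the function name `final_grade`, the 0-100 scale for each component, and the example scores are assumptions made for illustration only, not the official grading procedure.

```python
# Weights taken from the marks breakdown above.
WEIGHTS = {
    "project": 0.45,
    "presentations": 0.25,
    "paper_reviews": 0.15,
    "class_participation": 0.15,
}


def final_grade(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum: each component contributes weight * score.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "project": 80,
    "presentations": 90,
    "paper_reviews": 70,
    "class_participation": 100,
}
print(round(final_grade(example), 2))
```

With these made-up scores, the project alone contributes 36 of the final points, which shows why it deserves the most attention.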

Project

The research project could be any of the following:

  1. New research idea: A prototype implementation of a new research idea that addresses one of the drawbacks or limitations of an existing research work, or a completely new research idea that is inspired by any of your readings.
  2. Experimental Study: An experimental comparison and evaluation of existing work in a specific research topic. Students are not supposed to reimplement all of the existing solutions. Rather, they should be able to reuse an existing code base with minor changes to run the benchmark. The main contribution of the benchmark is to provide insights that were not available from the original work on the systems used in the evaluation.
  3. Survey: With the extensive research efforts in the topics covered in this course, a survey paper should summarize and categorize the major research contributions in a specific area. The survey should not be a mere summarization of existing papers. Rather, the students should provide their own insight on the surveyed body of work. For example, they can provide a categorization or a taxonomy that highlights the major research directions in that area. Students can also identify open research problems that have hardly been addressed in the literature.
  4. System Implementation and Reproducibility (must be an individual project): I have a number of systems I would like implemented. For your project, you could choose any of them, implement it, and reproduce the reported results.

The project can be done individually or in groups (except the system implementation). However, the assessment will take into consideration how many students are in the group. For example, if one student working alone demonstrates contributions equal to those of a team of three students, students should expect a high variance in grades.

The project deliverables will be:

  1. Project Proposal: This should be a two-page proposal (including references) in ACM Proceedings Format. To write a good proposal, I strongly suggest reading Jennifer Widom's tips for writing introductions. I also strongly suggest reading the whole thing as it's helpful for writing research papers in general. This proposal is due on February 22nd. If you have a solid idea that you would like to submit before the deadline to get better feedback and give yourself more time to work on the project, early submissions of the proposal are STRONGLY encouraged.
  2. Project Paper: Again, in ACM Proceedings Format. This should be at least 7 pages including references. Depending on the size of the group and its contributions, the paper could be longer, so there is no page limit. The due date for the project paper is April 9th (11:59 PM). Late submissions are allowed for two more weeks, with a hard deadline for submission on April 23rd.
  3. Source Code: Your source code is expected to be publicly available on GitHub. The GitHub link for your project should be in the project paper. Please write a good README that clearly describes how to run your code. The due date for the project source code is the same as for the project paper.

Presentations

There will be 21 presentations throughout the term. This workload may not be evenly distributed over the students taking this class. Therefore, any student who gives one more presentation than the average will get a bonus. Each presentation should be 30 to 35 minutes long, followed by 30 to 35 minutes of discussion of the paper. The presenter should not only present the details of the paper, but also suggest discussion points at the end of his/her presentation.

Paper Reviews

The paper reviews are due at 11:00 AM on the day of the class. The format for the review is fixed: a summary of the paper, three or more strong points, three or more weak points, and any additional comments you may have on the paper. The number of required fields is small, but you are expected to elaborate. As a rough guide, if your review is written in a Word document, it should be at least one page long in 12 pt font. Your two worst reviews will not count towards your grade.
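
One way to picture the "two worst reviews will not count" rule is the short Python sketch below. It assumes each review gets a numeric score and that the remaining reviews are simply averaged; both the scoring scale and the averaging are assumptions for illustration, and `review_average` is a hypothetical helper, not the actual marking scheme.

```python
def review_average(review_scores, dropped=2):
    """Average the review scores after discarding the `dropped` lowest ones."""
    if len(review_scores) <= dropped:
        raise ValueError("need more reviews than the number dropped")
    # Sort ascending and skip the lowest `dropped` scores.
    kept = sorted(review_scores)[dropped:]
    return sum(kept) / len(kept)


# Hypothetical scores out of 10 for five reviews.
scores = [6, 9, 4, 8, 10]
print(review_average(scores))
```

Under these assumptions, the two lowest scores (4 and 6) are ignored and only the remaining three count.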

Paper Review Submission Link

Class Participation

This is a seminar-based class, meaning that your participation in the class is essential. You are encouraged to ask questions, answer other students' questions, comment on the papers we discuss, etc.

Schedule

Date Topics Papers Speakers
January 8th Course Introduction & Recent Game Changers in Data Management N/A Ahmed El-Roby
January 15th Question Answering
  1. Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition
  2. Learning to Answer Complex Questions over Knowledge Bases with Query Composition
1. Ritika Bhatia
2. Razieh Tekieh
January 22nd Question Answering
Data Mining
  1. Leveraging Frequent Query Substructures to Generate Formal Queries for Complex Question Answering
  2. Mining an "Anti-Knowledge Base" from Wikipedia Updates with Applications to Fact Checking and Beyond
1. Yingjun Dai
2. Raghad Rowaida
January 29th Data Mining
  1. Maverick: Discovering Exceptional Facts from Knowledge Graphs
  2. Fractal: A General-Purpose Graph Pattern Mining System
1. Emmanuel Ayeleso
2. Abdelghny Orogat
February 26th Data Cleaning
  1. Auto-Detect: Data-Driven Error Detection in Tables
  2. HoloDetect: Few-Shot Learning for Error Detection
  3. Uni-Detect: A Unified Approach to Automated Error Detection in Tables
1. Patrick Killeen
2. Segun Odunade
3. Isabelle Liu
March 4th Data Integration
  1. Raha: A Configuration-Free Error Detection System
  2. Table Union Search on Open Data
  3. JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes
1. Anusha Umesh
2. Alex Gagnon
3. Razieh Tekieh
March 11th Data Integration
Graph Processing
Blockchain
  1. Ontology-based Entity Matching in Attributed Graphs
  2. The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
  3. Blockchain: The story so far
1. Emmanuel Ayeleso
2. Taslimur Rahman
3. Alex Gagnon
March 18th Graph Processing
Blockchain
  1. Scaling Up Subgraph Query Processing with Efficient Subgraph Matching
  2. Efficiently Answering Regular Simple Path Queries on Large Labeled Networks
  3. Blurring the Lines between Blockchains and Database Systems: the Case of Hyperledger Fabric
1. Raghad Rowaida
2. Fathima Nizwana Yusuf
3. Ziaullah Dawrankhil
March 25th Blockchain
  1. Blockchain Meets Database: Design and Implementation of a Blockchain Relational Database
  2. Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems
  3. Towards Scaling Blockchain Systems via Sharding
1. Fathima Nizwana Yusuf
2. Mostafa Elkaterji
3. Wilfredo Tovar