Introduction - FSU Computer Science

COP5725
Advanced Database Systems
Spring 2017
Introduction
Tallahassee, Florida, 2017
Welcome to COP5725!
• COP5725: Advanced Database Systems
– Course website (all you need to know about COP5725): http://www.cs.fsu.edu/~zhao/cop5725/main.html
– Time: 2pm-3:15pm, Mondays and Wednesdays
– Venue: LOV 103
• Please go over the syllabus carefully before taking the class!
Welcome to COP5725!
• Instructor
– Prof. Peixiang Zhao http://www.cs.fsu.edu/~zhao
– Office hours:
• Monday, Wednesday: 3:30pm-4:30pm
• Or by appointment
– Office: LOV 262
– Research interests:
• Database, data mining, information/social network and graph analysis
• TA
– Dr. Yongjiang Liang
– Office hours: Thursday 1:30pm – 2:30pm
– Office: MCH 106-A
The Goal of COP5725!
1. Reflection of the foundation:
– Climb up to the shoulders
– the foundational models, representations, systems, and techniques
for relational database systems, by way of reading and lectures
2. Projection on the outlook:
– And look out from here! Be inspired
– What are the next advanced database systems?
– by way of reading and presenting the classics and the state of the art, and by way of doing projects!
• “We can do it!”
The Contents of COP5725!
• Relational Database Internals
– Fundamentals for relational databases
– Data storage and representation
– Advanced indexing
– Query processing and execution
– Query optimization
– ……
• Advanced Database Topics
– Parallel/Distributed databases (MapReduce)
– Data mining (selected topics)
– Data on the Web
– ……
Welcome to COP5725!
• Textbook
– Database Systems: The Complete Book, 2nd edition, by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom
• Recommended reading
– Database Management Systems, 3rd edition, by Raghu Ramakrishnan and Johannes Gehrke
– Readings in Database Systems, 5th edition, by P. Bailis, J. Hellerstein and M. Stonebraker (http://www.redbook.io)
– The Web
• Prerequisites
– COP4710: Introduction to Database Systems
– COP4530: Data Structures and Algorithms
– Good programming skills
Welcome to COP5725!
• Components of the course
1. Two lectures every week (?)
2. Two assignments (10%)
3. A series of papers to be read and summarized (15%)
• A one- or two-page paper summary to be submitted in class on the due date
4. Paper presentation (5%)
• Every group will present one paper related to the project in class for 15(?) minutes
5. Semester-long project (30%)
• Research-flavor
• Implementation-flavor
6. A set of quizzes (5%)
7. Final exam (35%)
Paper Summaries
• Milestone papers in database systems
• Every paper will be posted early on the course website and can be downloaded from within the campus network
• The one- to two-page summary includes
– What is the problem?
– Why is this problem important and worthy of a thorough study?
– Why is this problem difficult?
– What are the innovative ideas and technical merits?
– Comments on the experimental evaluations
– Any drawbacks and potential improvements?
• Summarize based on your own understanding. Verbatim copying from the paper results in low scores
• The contents of the papers will be tested on the final exam!
Paper Presentation
• Every group will have a chance to select one paper to present in class
– The paper should be closely related to the project you are conducting
– The slides (pptx/ppt/pdf) should be sent to the instructor at least one day before the class in which you present
– The slide organization should follow the requirements for the paper summary
– 15(?) minutes for presentation and Q&A
• Students will sign up for the presentation in the near future
Project
• Theme: choose either of the two
1. Research-flavor: mainly for Ph.D. students
• Find an interesting, nontrivial data management problem and propose a novel and effective solution to it
2. Implementation-flavor: mainly for M.S. students
• Find interesting methods/algorithms in a data management paper, implement them, and perform experimental studies
• Teamwork: a group of one or two students (but no more!)
• The project is partitioned into multiple milestones, each of which requires deliverables
• Pay attention to the workload!
Multi-stage Project
1. Group formation (0%)
2. Project Proposal (10%)
– What do I want to do?
3. Literature Survey (20%)
– What is the state of the art?
4. Status report (10%)
– What have I achieved thus far?
5. Source code, software and final report (60%)
– Dude, these are my deliverables!
Implementation Project
• Topics:
– Choose a research paper published in the following conferences/journals after 2002, implement the idea and finish all experimental studies related to this idea
– Conferences: SIGMOD, VLDB, ICDE, KDD, ICDM, SDM, SIGIR, WWW, CIKM
– Journals: TODS, VLDB Journal, TKDD, TKDE
• Workload (in C/C++ or Java)
– 3000-5000 lines of code; real/synthetic data, experimental studies
• Expectation
– Source code, software, detailed readmes and scripts, and a final report
• Repeatability, Completeness of datasets and experimental studies, Efficiency, Effectiveness, Scalability ……
• You may demo your implementation to the TA
Research Project
• Topics:
– A state-of-the-art data management or mining problem in your research area
• Workload
– Problem definition, algorithm design and analysis, implementation (more than 3000 lines of code, in C/C++ or Java), experimental studies
– Your innovative ideas!
• Expectation
– A conference-quality (potentially publishable) paper
– Source code, software, detailed readmes and scripts
– You may demo your implementation to the TA
Quizzes
• The first quiz will be held on Wednesday 01/11
– Worth 3% of your total grade!
– Coverage:
• Fundamentals in relational DB
• Data structures and algorithms
• Remaining quizzes will be held throughout the semester
– Take attendance
– Collect feedback and suggestions from students
Is This Course Suitable For Me?
• First-day Attendance Policy at FSU
• Prerequisites MUST be satisfied
– Introduction to database systems
• Relational model, relational algebra, relational design, SQL, B/B+ tree, hashing, transaction management, crash recovery……
– Data structures and algorithms
• Difference between stack and queue?
• Worst-case complexity for insertion/deletion in Red-black trees?
• Dijkstra's algorithm for shortest-path computation
• Set-cover is NP-complete
• …….
• Feel comfortable with programming (a lot)
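As a quick self-check against the data structures and algorithms items above, the following is a minimal sketch in Python (the course projects themselves call for C/C++ or Java). It contrasts stack and queue removal order and gives one common heap-based formulation of Dijkstra's algorithm. The function names and the toy graph are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
import heapq
from collections import deque


def stack_order(items):
    """Remove items last-in, first-out (LIFO), as a stack does."""
    stack = list(items)
    return [stack.pop() for _ in range(len(stack))]


def queue_order(items):
    """Remove items first-in, first-out (FIFO), as a queue does."""
    queue = deque(items)
    return [queue.popleft() for _ in range(len(queue))]


def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph maps each node to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy graph, for illustration only.
toy_graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}

print(stack_order([1, 2, 3]))    # LIFO order
print(queue_order([1, 2, 3]))    # FIFO order
print(dijkstra(toy_graph, "a"))  # distances from "a"
```

If writing something like this from scratch feels unfamiliar, reviewing the prerequisite material before the first quiz is probably a good idea.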
COP5725 = How DB Knowledge is created + How to create more
• In terms of topics, COP5725 is not:
– about Linux + Apache + PHP + MySQL (LAMP)
– about designing DBs that are in BCNF
– about SQL3 and stored procedures
– about Oracle tuning and implementation
• In terms of methodology, COP5725 is not solely
– by reading the textbook and acing it
– by implementing a well-specified DB algorithm, e.g., B+tree
How to Get the Most out of COP5725?
• Read and think before class
– read the textbooks for related concepts
– read the papers
• Use lectures as a road map for studying
– Lecture notes won’t cover all the material
• Use your peers in learning
– discuss in/out of classes to enhance understanding
• Explore interesting projects creatively
– learning by doing
Any questions so far?
Evolution of Data Management
• Jim Gray: Evolution of Data Management. IEEE Computer 29(10): 38-46 (1996)
Prehistory Thoughts: Emergence of the Notion of DBMS
• William C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 6(1): 1-23 (1959)
• When data processing was mostly ad-hoc programs, generalization was needed, e.g.,
– sorting
– file maintenance
– data access
– modification and update
– report generation
– ……
How Did We Get Here?
• The dominating relational database system, which we take for granted now, was deemed impossible to implement and difficult to use in its early days
• But, quoting Jim Gray:
"These innovations give one of the best examples of research prototypes turning into products. The relational model, parallel database systems, active databases, and object-relational databases all came from the academic and industrial research labs. The development of database technology has been a textbook case of successful collaboration between academy and industry."
-- Evolution of Data Management
Examples
In Industry
In Science – Turing Awardees
CHARLES BACHMAN, 1973
JAMES GRAY, 1998
EDGAR CODD, 1981
MICHAEL STONEBRAKER, 2014
The Grand Challenges of Data Management
• The relational DBMS was invented in the early 1970s and is now a mature, 50+ billion industry
• What are we still working on? Big Data!
– https://www.youtube.com/watch?v=vbb-AjiXyh0
– http://www.youtube.com/watch?v=LrNlZ7-SMPk
• What is the ultimate advanced DB?
– Data of all sorts, prevalent on the Web!
– What have you been searching for lately?
– Is what you search for what you want?
• New challenges naturally arise
– structured vs. unstructured data
– querying vs. analysis vs. mining vs. learning
– closed “base” vs. the open Web
Have fun!
What Does 'Big Data' Mean and Who Will Win?
http://research.microsoft.com/apps/video/default.aspx?id=258302&l=i