Transcript document

Course Overview:
CS 395T Semantic Web, Ontologies and
Cloud Databases
Daniel P. Miranker
Objectives:
• Get to know each other
• Set expectations
1: Introduction
Data Management & Engineering
1
Course Requirements
• Several lab homeworks (completion grade)
– Build an ontology
– Write SPARQL queries
– Simple Hadoop exercise
• 2 paper presentations
– (may overlap term project)
• Term project
Presentation Content
• Miranker will present about 1/2 of CS386d,
Database Management Systems, in about
1/2 of the material’s normal time.
• Student presentations of papers.
--> Attendance is required
Papers
• Miranker will provide an initial set of papers
• Remainder of the class will be crowd-sourced.
– Students are each required to nominate >= 3 papers.
– The list is compiled.
– Each paper is assigned to 3 referees (just like a conference).
– Miranker organizes the class from the referee reports.
Presentations
• Elements of Public Speaking.
• Structure
– Two presentations,
• improvement will be noted
– Draft slides due one week ahead of time,
• will be reviewed in a one-on-one meeting
– Feedback from the class,
• Miranker not in the room
Database Systems Getting
Exciting Again
---> weren’t exciting for a long time.
In the recent past, DBMS
witnessed:
• Commoditization
– Database ---> database management system
----> relational database management system
– Canonical RDBMS architecture is a mainstay.
• (and so are architectures for Operating Systems and Networks)
DBMS Architecture
Query Engine
Transaction Manager
Storage Manager
DBMS Architecture
Storage Manager
• Exploit memory hierarchy
to compensate for slow
disks.
– working sets (from OS)
– search algorithms
• Specifics
– manage a heap of disk pages
– allocation of main memory
(buffer management)
– index methods, e.g. B+ tree
(access paths)
[Figure: Storage Manager; RAM organized as page buffers; indexes; data blocks]
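The buffer management bullet above can be sketched as a small page cache. This is only an illustration under stated assumptions, not how any particular DBMS does it: the `BufferPool` class, the LRU replacement policy, and the dictionary standing in for the disk are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: a fixed number of RAM page frames over a slow 'disk'."""

    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical disk: page_id -> page contents
        self.capacity = capacity      # number of page frames held in RAM
        self.frames = OrderedDict()   # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        self.disk_reads += 1                   # miss: pay for a disk read
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used page
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {i: f"page-{i}" for i in range(10)}
pool = BufferPool(disk, capacity=3)
for page_id in [0, 1, 2, 0, 1, 3, 0]:
    pool.get_page(page_id)
```

Seven page requests cost only four disk reads here, because the working set (pages 0 and 1) stays in RAM.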
DBMS Architecture, 2
Transaction Manager
• Manage many users sharing a database (speed)
• Cope with machine crashes
– Everything gets written at least 3 times.
– Every DB write is also logged to redundant disks (a.k.a. stable store)
• ACID properties
– Atomic
– Consistent
– Isolated
– Durable
[Figure: Transaction Manager; log; Storage Manager; RAM]
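The logging idea on this slide can be sketched as a toy write-ahead log: every write is appended to the log before the data is changed, and recovery replays only the writes of committed transactions. The `TinyWAL` class and its method names are hypothetical, and a Python list stands in for the stable store; real systems are far more involved.

```python
class TinyWAL:
    """Toy write-ahead log: log records are appended before data is changed."""

    def __init__(self):
        self.log = []     # stands in for the stable store (redundant disks)
        self.data = {}    # stands in for database pages in RAM

    def write(self, txn, key, value):
        self.log.append(("write", txn, key, value))  # log first
        self.data[key] = value                       # then update in memory

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """Rebuild state after a crash: replay writes of committed transactions only."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        recovered = {}
        for rec in self.log:
            if rec[0] == "write" and rec[1] in committed:
                _, _, key, value = rec
                recovered[key] = value
        return recovered


db = TinyWAL()
db.write("T1", "x", 1)
db.commit("T1")
db.write("T2", "x", 99)   # T2 never commits: the machine "crashes" here
db.data = {}              # RAM contents are lost
db.data = db.recover()
```

After recovery the committed write of T1 survives (durable) and the partial work of T2 is gone (atomic).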
DBMS Architecture 3
Query Engine
• SQL execution environment
– parse
– compile to logical operators
– optimize: Choose a good set of
access paths and sequence of
database operators
(a.k.a. a physical plan)
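The optimize step can be illustrated with a toy cost comparison between two access paths. The cost model below (page reads only, one page fetch per matching row through the index, a B+ tree fanout of 100) is an assumption made for the sketch, not the model of any real optimizer, and all function names are hypothetical.

```python
def btree_height(num_rows, fanout):
    """Levels needed for a B+ tree with the given fanout to reach num_rows entries."""
    height, reach = 1, fanout
    while reach < num_rows:
        reach *= fanout
        height += 1
    return height


def choose_access_path(num_rows, rows_per_page, matching_rows, fanout=100):
    """Return the cheaper access path and its estimated cost in page reads."""
    num_pages = -(-num_rows // rows_per_page)   # ceiling division
    costs = {
        "table scan": num_pages,                                       # read every page
        "index lookup": btree_height(num_rows, fanout) + matching_rows,  # descend, then fetch
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


selective = choose_access_path(num_rows=1_000_000, rows_per_page=100, matching_rows=100)
broad = choose_access_path(num_rows=1_000_000, rows_per_page=100, matching_rows=500_000)
```

Under these assumptions the selective query goes through the index, while the query matching half the table is cheaper as a full scan.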
What Changed?
• Internet
• Moore’s law
– Computing is forever getting cheaper
• Processing
• Storage
• Bandwidth
– People are not getting cheaper
Implications
• Business models founded on
– --> in the asymptote, computing and bandwidth are _ _ e e
• Economy of scale
• People costs dominate
--> Data Centers
--> Computing as Services
--> Massive application of commodity components
(Electricity?)
Software Engineering Implications
• A DBMS is a shared resource and the place to persist all data.
Thus, unambiguously:
• Content and programming of a DBMS
– is at the center of
– must interoperate with
all the other software development
Three Tier Architecture
• Three Tier Architecture
– Pervasive?
every hardware vendor
sells a preloaded rack.
XML has become a standard for
data transfer
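As a small illustration of XML as a transfer format, the sketch below serializes one database row to an XML message and parses it back on the receiving side. The `student` element and its fields are a made-up schema, chosen only for the example; it uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Sender: turn a database row (hypothetical schema) into an XML message.
row = {"id": "42", "name": "Ada", "dept": "CS"}
student = ET.Element("student", attrib={"id": row["id"]})
ET.SubElement(student, "name").text = row["name"]
ET.SubElement(student, "dept").text = row["dept"]
message = ET.tostring(student, encoding="unicode")

# Receiver: parse the message back into a row.
parsed = ET.fromstring(message)
received = {
    "id": parsed.get("id"),
    "name": parsed.findtext("name"),
    "dept": parsed.findtext("dept"),
}
```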
Service Oriented Architecture
Internet
Service:= (usually) a remote database query or transaction
Three Tier Architecture
• Three Tier Architecture
– Pervasive?
every hardware vendor sells a
preloaded rack.
– What does it mean if you know
about databases?
you’re a king
– What if you are a professor of
Computer Science?
Definitions (old slide 1)
• Database*: A collection of data
• Database Management System (DBMS): A software
system that provides a set of services on a database.
*A word on notation. Underlined terms are technical terms whose definition I expect you to
know well.
Examples: (old slide 2)
• Relational Database Management System
• Operating Systems
– What about operating systems?
– __________
– __________
– __________
• facebook
RDBMS Architecture Motivated
• Core Database Architecture
– How to cope with disks.
• The only [computational] moving part.
• It’s not changing.
What about disks?
• Why do computers have disks? (good)
– inexpensive, large, persistent storage.
• persistent storage: data is unaltered if the power goes off.
• Why do we wish they didn’t? (bad)
– slow
• 8-12 msec seek time, plus ~0.1 of that in rotational latency
– they break
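To put the "slow" bullet in numbers, here is a back-of-the-envelope calculation using only the figures on this slide (8-12 msec seek, rotational latency at roughly 0.1 of the seek time). The function name is hypothetical, and transfer time is ignored as a simplifying assumption.

```python
def random_reads_per_second(seek_ms, rotational_fraction=0.1):
    """Random page reads per second when each read pays one seek plus rotational latency."""
    access_ms = seek_ms * (1 + rotational_fraction)
    return 1000 / access_ms


fast = random_reads_per_second(8)    # best case from the slide
slow = random_reads_per_second(12)   # worst case from the slide
```

On these figures a disk serves on the order of 100 random page reads per second, which is the gap the storage manager's buffering and indexing try to hide.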
Solid State Disk Drives! (SSD)
• Have been promised for 40 years
• Long term impact on DBMS architecture promises
to be great.
• Current impact, negligible.
• Let’s look at the real numbers: ___________
Renaissance of Database Research
• Semantic Web
• Cloud Databases (NoSQL)
• Other Specialized Databases
– General purpose database, all things to all people.
• amortize cost of product over largest possible market
– Today the market is so large, functionality so broad
• focussed feature set --> more effective product
• market fragments are large enough to support specialized
products.
Semantic Web
• Knowledge-base techniques to simplify large-scale systems
• SPARQL/linked data query
• Data Integration
NoSQL Cloud Databases
• Non-ACID transaction models
• Fault-tolerance through redundancy vs. stable-store
[Figure: Query Engine; Transaction Manager; Storage Manager]
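The idea behind a SPARQL/linked data query can be sketched in Python as matching a triple pattern against a set of (subject, predicate, object) triples. This is not SPARQL itself and uses no RDF library; the `match` function, the `?variable` convention, and the sample triples are assumptions made for illustration.

```python
# A tiny set of (subject, predicate, object) triples; the data is made up.
triples = [
    ("miranker", "teaches", "cs395t"),
    ("cs395t", "covers", "semantic_web"),
    ("cs395t", "covers", "cloud_databases"),
    ("cs386d", "covers", "dbms"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break          # same variable bound to two different values
            elif term != value:
                break              # constant does not match
        else:
            yield binding


# Roughly: SELECT ?topic WHERE { cs395t covers ?topic }
topics = [b["?topic"] for b in match(("cs395t", "covers", "?topic"), triples)]
```

A real query engine would join several such patterns and choose an evaluation order, much as the relational query engine chooses a physical plan.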
Renaissance of Database Research
• Semantic Web
– started with focus on search
– now, data interchange and data integration
• Cloud Databases
• Other Specialized Databases
– General purpose database, all things to all people.
• amortize cost of product over largest possible market
– Today the market is so large, functionality so broad
• focussed feature set --> more effective product
• market fragments are large enough to support specialized
products.