Final Review

Download Report

Transcript Final Review

COP4540 Database
Management System
Final Review
Reviewed by Ramakrishna.
Parts of this are taken from
Fernando Farfan’s presentation
AGENDA
Exercises to do….
 Ch6
JDBC
 Ch7
3-Tier Architecture
 Ch8.
Storage and Indexing
 Ch10. Tree-Structured Indexing
 Ch7.
XML Documents
 Ch27. XQUERY: Querying XML Data

Exercises to do….

Remember to practice even numbered exercises
of the book

Solutions are available online at
http://www.cs.wisc.edu/~dbbook/openAccess/thi
rdEdition/supporting_material.htm

Important Recommended exercises: 8.3, 8.4,
8.5, 8.7, 8.10, 8.11
JDBC

What is JDBC ? Explain its purpose.
 JDBC
is Java DataBase Connectivity used to
enable integration of SQL with a general purpose
programming language.

Explain JDBC Architecture
4
components :
Application,
 Driver Manager,
 Data Source Specific Drivers,
 Data Sources (stored in
MySql,Oracle,DB2,MSSQL,Access and so on).

JDBC

Explain the individual steps required to
submit a query to a data source and to
retrieve results in JDBC ?
 JDBC
Driver Management (class.forName(..)),
 Establishing Connection (Connection Object),
 Executing SQL Statements:
Statement,
 PreparedStatement,
 CallableStatement.

 Retrieving
results ( Examining ResultSets).
Stored Procedures

Explain the term stored procedure, and give
examples why stored procedures are useful.
 Stored
procedures are programs that run on the
database server and can be called with a single SQL
statement.
 Runs inside the process space of the database
server.
 Stored procedures can also be used to reduce
network communication.
 They are locally-executed and results are packaged
in to one big result.
 Different users can re-use the stored procedure.
3-Tier Application Architecture

What is a 2-tier architecture ? Explain
different types of 2-tier architectures.

It’s a client-server architecture.
Client
Application Logic
Network
DBMS
Client
Architecture 1: Thin Clients ( web browsers)
3-Tier Application Architecture

Architecture 2: Thick Clients
Client
Application Logic
DBMS
Network
Client
Application Logic
Disadvantages ?
1) No central place to update
2) Need to trust client
3-tier architecture

What are the advantages of 3-tier
architecture ?
 Heterogeneous
Systems
 Thin
Clients
 Integrated Data Access
 Scalable
Network
Client (user interface)
Network
DBMS
(database)
Client (web browser)
Application Logic
(in C++ or Java..)
CH8. OVERVIEW OF STORAGE
AND INDEXING
DBMS stores vast quantities of data
 Data must persist across program executions

 Data
is stored on external storage devices.
The unit of information read from or written to
disk is a page.
 The cost of page I/O dominates the cost of
typical database operations.


Input from disk to main memory
 Output from main memory to disk
CH8. OVERVIEW OF STORAGE
AND INDEXING


Simplest file structure is an unordered file, or
heap file.
An index is a data structure to optimize certain
kinds of retrieval operations.
 CLUSTERED
- When a file is organized so that the
ordering of data records is the same as the ordering
of data entries in some index
 UNCLUSTERED

- It is un-clustered otherwise
There is UTMOST one clustered index and
several un-clustered ones for a table.
Example
Employee (Eid: integer, name: String,
age: integer…..)
 Assume data records sorted by name
 Assume an index on name

 Index

on name  CLUSTERED
Assume an index on age
 Index
on age  UNCLUSTERED
Example: Using Indexes

Employee Data Records : Sorted by Name
 Page


(1,Alex,30)
(2,Amy,21)
 Page



2:
(3, Bob,31)
(4, Brenda,21)
Select * from Employee where name like “A%”;
 Use

1:
index on name  Retrieve Page 1.
Select * from Employee where age = 21;
 Use
index on age  Retrieve Page 1 & Page 2.
Clustered and UnClustered
Indexes

Retrieval using a Clustered index
 Data
retrieval using minimum number of
Data page I/Os.

Retrieval using an Unclustered index
 In
worst case, 1 page I/O for every
qualifying tuple.
Another Example

Consider “Select E.dno from Employee E
where E.age > 40”;
 Assume
B+ Tree index on age.
 Is it better to use this index ? Or just do a
segment scan ?

Answer : Depends on several factors:
 Depends
on Selectivity of the condition.
 Whether the index is clustered or unclustered.
CH8. OVERVIEW OF STORAGE
AND INDEXING

8.3 Consider a relation stored as a randomly ordered
file for which the only index is an unclustered index
on a field called sal. If you want to retrieve all
records with sal > 20, is using the index always the
best alternative? Explain.

No. In this case, the index is unclustered, each qualifying
data entry could contain an rid that points to a distinct
data page, leading to as many data page I/Os as the
number of data entries that match the range query. In
this situation, using index is actually worse than file scan.
CH8. OVERVIEW OF STORAGE
AND INDEXING

8.5 Explain the difference between Hash indexes and
B+-tree indexes. In particular, discuss how equality
and range searches work.

Hash Index: Hashing function. Quickly maps a search
key value to a bucket. Inserts/Deletes are simple. Linked
list for collisions. Good at equality searches. Terrible for
range queries.

B+-tree Index: Hierarchical search data structure.
Maintenance is costly. Costly for individual record
lookup. Efficient for range queries.
CH8. OVERVIEW OF STORAGE
AND INDEXING

Consider the following relation
Answer: Update the salary of an employee
using the employee id.
CH8. OVERVIEW OF STORAGE
AND INDEXING
Update the age or employee id for some
department id
Update the salaries of all employees in
some department
Constraints
Constraints
Constraints
Constraints
Assume tables are already created.
ALTER TABLE ENROLLED add constraint cons1
check((select count(*) from ENROLLED group by
cname)>=5)
ALTER TABLE ENROLLED add constraint cons2
check((select count(*) from ENROLLED group by
cname)<=30)
Constraints
Views
Views
Tree-Structured Indexing

Explain ISAM and B+ Trees ? What are the
differences, advantages and disadvantages
?

ISAM – Indexes Sequential access Method
Effective for static files
 Unsuitable for dynamically changing files

 B+
Tree
Dynamic index structure
 Adjusts well to changes
 Does well for both range and equality selections

Tree-Structured Indexing

Insertions and deletions in ISAM
 Happen
only in leaf pages
 Insertions – adding records to the overflow chain
 Search – inefficient as the chain grows

In B+ tree even after several insertions and
deletions,
 The
tree is kept balanced
 Height of the tree  length of the path from root
to leaf
Query Evaluation
Consider a relation R(a,b,c,d,e) containing 5,000,000
records, where each data page of the relation holds
10 records. R is organized as a sorted file with
secondary indexes. Assume that R.a is a candidate
key for R, with values lying in the range 0 to
4,999,999, and that R is stored in R.a order. For each
of the following relational algebra queries, state
which of the following three approaches is most likely
to be the cheapest:
 Access the sorted file for R directly.
 Use a (clustered) B+ tree index on attribute R.a.
 Use a linear hashed index on attribute R.a.
Query Evaluation
CH7. XML DOCUMENTS

7.1 When is an XML document well-formed?
When is an XML document valid?

An XML document is valid if it has an
associated DTD and the document follows the
rules of the DTD. An XML document is wellformed if it follows three guidelines:
1.
2.
3.
It starts with an XML declaration.
It contains a root element that contains all other
elements.
All elements are properly nested.
CH27. XQUERY

<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
CH27. XQUERY
Query:
doc("books.xml")/bookstore/book/title
 Result:

<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
CH27. XQUERY
Query:
doc("books.xml")/bookstore/book[price
<30]
 Result:

<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
CH27. XQUERY
Query:
for $x in
doc("books.xml")/bookstore/book
where $x/price>30 order by $x/title
return $x/title
 Result:

<title lang="en">Learning XML</title>
<title lang="en">XQuery Kick Start</title>
CH27. XQUERY


Query:
<ul> {
for $x in doc("books.xml")/bookstore/book/title
order by $x
return <li>{$x}</li>
} </ul>
Result:
<ul>
<li><title lang="en">Everyday Italian</title></li>
<li><title lang="en">Harry Potter</title></li>
<li><title lang="en">Learning XML</title></li>
<li><title lang="en">XQuery Kick Start</title></li>
</ul>