Transcript ch15

Chapter 15
Database and the
Internet
Fundamentals of Database Management Systems
by
Mark L. Gillenson, Ph.D.
University of Memphis
Presentation by: Amita Goyal Chin, Ph.D.
Virginia Commonwealth University
John Wiley & Sons, Inc.
Chapter Objectives

List the four differences between the
Internet database environment and the
standard database environment.

Describe the database connectivity issues
in the Internet environment.

Describe the expanded set of data types
found in the Internet environment.
15-2
Chapter Objectives

Describe such database control issues as
performance, availability, scalability, and
security and privacy in the Internet
environment.

Describe the significance of data
extraction into XML in the Internet
environment.
15-3
The Internet

E-mail

File transfers

Electronic commerce





Shop online
Bank online
Get news online
Get entertainment online
Etc.
15-4
Internet & Databases

Essence of e-commerce activity is data stored in
databases.

Data for a company’s product selections comes
from a database.

Your orders with a company go into a database.

Checking your bank account balance requires
querying a database.
15-5
Internet & Non-Internet
Database Environment:
Similarities

Most (but not all) e-commerce databases
are relational databases and many are
transactional in nature.

SQL and other standard query tools can
be and are used in the e-commerce
environment by the companies that own
the databases.
15-6
Internet & Non-Internet
Database Environment:
Differences

Database Connectivity Issues

Expanded Set of Data Types

Database Control Issues

Data Extraction into XML
15-7
Database Connectivity Issues

In a simple database
environment, the
application program,
DBMS, and data are
all contained within a
single computer
(single or multi-user
environment).
15-8
Database Connectivity
Issues: Client/Server Systems

Two classes of computers.

Client computers are enduser PCs, connected to a
server computer on a LAN.

Server contains application
programs, the database
management system, and
the database, which all of
the clients share.
15-9
Database Connectivity
Issues: Client/Server Systems

When an end user wants to run an application or
retrieve data from the shared database:

client computers handle the initial processing (the
presentation or graphical user interface aspect)

the data is sent on to the server for processing by the
application code, including data retrieval from the
shared database.

server returns the results to the client PC.

client is responsible for formatting the screen display.
15-10
Database Connectivity
Issues: Client/Server System

Usually thought of as a system built on a
LAN.

World Wide Web (WWW) can be
considered to be a massive client/server
system built on the Internet.
15-11
Database Connectivity
Issues: Client/Server System WWW

The clients are the PCs that
individuals and companies use
to connect to the Internet.

The browsers in the PCs
handle the client side screen
presentation.

The servers are the company
Web servers with which
people use their PCs for ecommerce.
15-12
Database Connectivity Issues:
Hardware Components of the
Web - Database Connectivity

There are three levels
of computers in this
arrangement:



the client PCs
the Web server
the database server.
15-13
Database Connectivity Issues:
Specialized Web Software for
E-Commerce

When a client’s browser
sends a message to the
Web server, message
follows rules of the
Transmission Control
Protocol/Internet Protocol
(TCP/IP).

All Internet traffic
(including, e.g., e-mail)
must follow TCP/IP.
15-14
Database Connectivity Issues:
Specialized Web Software for
E-Commerce

The client’s message
must also follow the
rules of the Hypertext
Transfer Protocol
(HTTP), which is an
additional protocol
layer for WWW traffic
on the Internet.
15-15
Database Connectivity
Issues: TCP/IP & HTTP

TCP specifies how the message is broken
up into smaller “packets” for transmission.

IP deals with the address of the computer
to which the message is being sent.

HTTP indicates the type of browser in the
client and other information needed to
format Web pages.
15-16
Database Connectivity Issues:
The Web Database
Environment

There can be different kinds of:





Hardware, even just between the Web server and the
database server
Application software languages
Browsers on the client side
Data, not just data in relational databases.
Need specialized interfaces and specialized
software known as middleware.
15-17
Database Connectivity
Issues: Interfaces

Common Gateway Interface (CGI)

Application Program Interface (API)

These interfaces have software “scripts”
associated with them that allow them to
exchange data between the application in
the server and the databases controlled by
the database server.
15-18
Database Connectivity
Issues: ODBC

Open Database Connectivity

One standard way of accessing data.

Designed as an interface to relational
databases.
15-19
Database Connectivity
Issues: JDBC

Java Database Connectivity

Another standard way of accessing data.

Designed as an interface to relational
databases.
15-20
Database Connectivity
Issues: Middleware

Used for connecting the applications in the
Web server with the databases in the
database server.

Examples:
 Cold
Fusion
 Oracle Application Server
 Microsoft Active Server Pages (ASP)
 and others.
15-21
Expanded Set of Data Types

Data Types needed for various kinds of data:







Numeric
Character
Large text blocks
Graphic images
Video clips
Audio clips
Databases that support Web sites must be
capable of storing, searching, and retrieving this
wide variety of data.
15-22
Expanded Set of Data Types:
Oracle LOB Category

LOB = large object category, includes data types:

Binary LOB (BLOB) - for graphic images, photographs, video
clips, audio clips

Binary File (BFILE) - a pointer to a file external to the database.

Character LOB (CLOB) - for large text files or documents

National Character LOB (NCLOB) - for large text files or
documents in non-Latin characters
15-23
Expanded Set of Data Types:
Informix Universal Server

The object/relational DBMS Informix
Universal Server provides another style of
handling multimedia and large text data
using “Data Blades”:
 IMAGE
- a general-purpose image data type
 DOC - for storing large text blocks
 A set of data types, including point, line,
polygon, path, and circle, which can be used
for storing a variety of graphic images.
15-24
Database Control Issues

The Internet database environment
requires a special emphasis on:
 Performance
 Availability
 Scalability
 Security
and Privacy
15-25
Database Control Issues:
Performance

Internet traffic to a Web site, the number of
people or companies trying to access it
simultaneously, can vary greatly and because of
a variety of factors such as:





Time of day (worldwide basis)
Season of the year (e.g., Christmas shopping)
Popularity of a Web site
A major new product introduction
A major event (e.g., the Victoria’s Secret annual
fashion show)
15-26
Database Control Issues:
Performance

Spikes in Internet traffic require serious
predictive capacity planning.

System performance is also affected by
software design and database design.

Performance-boosting techniques should
be considered.
15-27
Database Control Issues:
Performance-Boosting
Techniques

Physical design techniques, including denormalization.

Database persistence - holding a temporary copy of the
retrieved data outside of the database.


Query cache - a special dedicated memory associated with the
Web server
Use canned queries instead of having employees write
their own SQL queries to access data over an internal
Intranet.
15-28
Database Control Issues:
Availability

A company’s Web site and the databases that it
accesses should be available to the public at all
times.

Three o’clock in the morning in one part of the
world is the middle of the day in another.

24/7
15-29
Database Control Issues:
Availability

An information system may be unavailable
because:
 A system
or telecommunications failure
 Failure of a support system, such as an
electrical outage
 Planned down period for system maintenance
 Excessive traffic that clogs the system
15-30
Database Control Issues:
Improving Availability Clustering

A cluster of several servers is built, each with its
own replicated copy of the database.

As queries come in over the Web, sophisticated
software checks the activity on each of the
servers and their databases and performs load
balancing, sending each particular query to a
server that is relatively idle at that moment.
15-31
Database Control Issues:
Scalability

An information system (IS) that supports a Web
site and its traffic growth must be scalable.

An IS must be capable of growing in size without
adversely affecting the operations of the site.

Hardware and software must be chosen that is
capable of rapid and major expansion.
15-32
Database Control Issues:
Security and Privacy

In the business-to-consumer e-commerce
environment, the company wants as many
people as possible to “visit” its Web site and buy
its products.

This means that hackers, data thieves, virus
writers, and anyone else with mischief on their
minds has an openly published entry point into
the company’s information system.
15-33
Database Control Issues:
Security Measures- Isolation

Separating the different parts of the
information system so that they run on
different computers.

The Web server and the database server
should be different computers.
 Should
be separated from the rest of the
company’s information system by being on a
separate LAN.
15-34
Database Control Issues:
Security Measures - Firewalls

Making major use of
firewalls.

Firewalls can be
separate “proxy”
computers that extract
data from incoming
messages and pass
the data on in a
different format to the
Web server.
15-35
Database Control Issues:
Privacy

Companies have long held personal data about
their customers in their databases.

Now companies are communicating digitally with
their customers through their Web sites,
including passing their personal data, over the
Internet.

The collected personal data in the company’s
database makes a tempting target for someone
out to steal such data.
15-36
Data Extraction into XML

XML = Extensible Markup Language

When a Web server sends a Web page to your
PC, the text and data in the page comes
formatted in HyperText Markup Language
(HTML).

Embedded HTML “tags” mark up the text and
data, instructing your PC’s browser on how to
display the page on your monitor.
15-37
Data Extraction into XML:
SGML

SGML = Standard Generalized Markup
Language

HTML is derived from SGML

SGML is also capable of indicating the meaning
of data.

XML, also derived from SGML, focuses on
meaning of data.
15-38
Data Extraction into XML:
XML DTD
 DTD =
<!ELEMENT book>
<!ELEMENT booknumbe r (#PCDATA)>
Document
<!ELEMENT booknam e (#PCDATA)>
type
<!ELEMENT publicationyea r (#PCDATA)>
definition
<!ELEMENT page s (#PCDATA)>
<!ELEMENT publishername (#PCDATA)>
 Good
<book>
Reading’s
<booknu mber>374566</booknumbe r>
database
<bookname>Catch-22</booknam e>
<publicationye ar>1955</publicationyear>
example
<pages>443</pages>
<publishername>Simon and Schuster</publishername>
15-39
Data Extraction into XML: XML
and Database Management

Supply chain management and
electronic data interchange (EDI)
require coordination between companies
of database constructs.

XML provides an independent layer of
data definition that is separate from the
particular formatting of each company’s
data in their databases.
15-40
Data Extraction into XML:
Good Reading Bookstores
15-41
“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or
translation of this work beyond that permitted in Section 117 of the 1976
United States Copyright Act without express permission of the copyright owner
is unlawful. Request for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not for distribution or resale. The
Publisher assumes no responsibility for errors, omissions, or damages caused
by the use of these programs or from the use of the information contained
herein.”
15-42