Database & Data Warehouse
Download
Report
Transcript Database & Data Warehouse
Database and Data Warehouse
October 8, 2015
LEARNING GOALS
Explain basic concepts of data management.
Describe traditional file systems and identify
their problems.
Define database management systems and
describe their various functions.
Explain how the relational database model
works.
Explain Object-Oriented databases.
Explain Data Warehouse, Data Mart
2
Mini case: Bank Accounts’ data
What basic data/info are required for opening
bank account?
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
3
What is a database?
Collection of related files containing records on
entities like people, places, events, things, etc.
Databases make data easy to access and
manage.
Customers Info
Accounts Info
Access and Management tools
Employees Info
4
Basic Concepts of Data Management
Table 1
Table 2
Form 1
Acc #:_______
Name:_______
Table 3
Report
Database:
Collection of data
organized in
different
containers
5
Basic Concepts of Database systems
Accounts table
AccountID
Customer
Type
Balance
660001
John Smith
Checking
$120.00
660002
Linda Martin
Saving
$9450.00
660003
Paul Graham
Checking
$3400.00
Each table has:
Fields
Records
1 Primary key
Table
– Two-dimensional structure composed of rows and columns
Field
– Like a column in a spreadsheet
Field name
– Like a column name in a spreadsheet
– Examples: AccountID, Customer, Type, Balance
Field values
– Actual data for the field
Record
– Set of fields that describe an entity (a person, an account, etc.)
Primary key
– A field, or group of fields, that uniquely identifies a record
6
Basic Concepts in Data Management
A Primary key could be a single field like in these tables
Primary key
AccountID
Customer
Type
Balance
660001
John Smith
Checking
$120.00
660002
Linda Martin
Saving
$9450.00
660003
Paul Graham
Checking
$3400.00
Primary key could be a composite key, i.e. multiple fields
7
Traditional File Systems
Early attempt to computerize manual filling system
System of files that store groups of records used by
a particular software application
Simple but with a cost
– Inability to share data
– Inadequate security
– Difficulties in maintenance and expansion
– Allows data duplication (e.g. redundancy)
Application 1
Application 2
Program 1
Program 2
Program 1
Program 2
File 1
File 1
File 1
File 1
File 2
File 2
File 2
File 2
File 3
File 3
File 3
File 3
8
Traditional File System Anomalies
Insertion anomaly
– Data needs to be entered more than once if
located in multiple file systems
Modification anomaly
– Redundant data in separate file systems
– Inconsistent data in your system
Deletion anomaly
– Failure to simultaneously delete all copies of
redundant data
– Deletion of critical data
9
Database Advantages
Database advantages from a business
perspective include:
– Ease of data insertion
• Example: can insert a new address once; and the
address is updated in all forms, reports, etc.
– Increased flexibility
• Handling changes quickly and easily
– Increased scalability and performance
• Scalability: how the DB can adapt to increased demand
– Reduced information redundancy & inconsistency
– Increased information integrity (quality)
• Can’t delete a record if related info is used in other container
– Increased information security
Common Database software
–
–
–
–
–
–
–
–
–
–
–
–
Desktop
Types of DBMSs
Server / Enterprise
Desktop
Handheld
– Designed to run on desktop computers
– Used by individuals or small businesses
– Requires little or no formal training
– Does not have all the capabilities of larger DBMSs
– Examples: Microsoft Access, FileMaker
12
Types of DBMSs (Cont.)
Server / Enterprise
– Designed for managing larger and complex databases by
large organizations
– Typically operate in a client/server setup
– Either centralized or distributed
• Centralized – all data on one server
– Easy to maintain
– Prone to run slowly when many simultaneous users
– No access if the one server goes down
• Distributed – each location has part of the database
– Very complex database administration
– Usually faster than centralized
– If one server crashes, others can still continue to operate.
– Examples: Oracle Enterprise, DB2, Microsoft SQL Server
13
Types of DBMSs (Cont.)
Handheld
– Designed to run on handheld devices
– Less complex and have less capabilities than
Desktop or Server DBMSs
– Example: Oracle Database Lite, IBM’s DB2
Everywhere.
14
Database Management System (DBMS)
Combination of software and data for
– Collecting, storing and managing data in a
database environment.
A DBMS includes:
– Database
– Database engine (for accessing and modifying the
DB content)
– Data Manipulation Language
Application 1
Program-1
Program-2
Application 2
Program-1
DBMS
Program-2
15
Database Management System (DBMS)
Software through which users and application
programs interact with a database
DBMS Functions
Create database structure (tables, relationships,
schema, etc.)
Transform data into information (reports, ..)
Provide user with different logical views of actual
database content
Provide security: password authentication, access control
– DBMSs control who can add, view, change, or delete
data in the database
Physical view
ID Name Amt
01 John 23.00
02 Linda 3.00
03 Paul 53.00
Logical views
ID
02
Name
Paul
Name
Linda
Amt
53.00
ID Name Amt
01 John 23.00
02 Linda 3.00
17
DBMS Functions (cont.)
Allowing multi-user access with control
– Control concurrency of access to data
– Prevent one user from accessing data that has not
been completely updated
• When selling tickets online, Ticketmaster allows you to
hold a ticket for only 2 minutes to make your purchase
decision, then the ticket is released to sell to someone
else – that is concurrency control
18
Database Models
Database model = a representation of the
relationship between structures (e.g. tables) in a
database
Common database models
– Flat file model
– Relational model (the most common, today)
– Object-oriented database model
– Hierarchical model
– Network model
19
Flat File Database model
Stores data in basic table structures
No relationship between tables
Used on PDAs for address book
20
Relational Database Model
Multiple two-dimensional tables related by common fields
Uses controlled redundancy to create fields that provide
linkage relationships between tables in the database
– These fields are called foreign keys – the secret to a
relational database
– A foreign key is a field, or group of fields, in one table
that is the primary key of another table
Handles One-to-Many and One-to-One
relationships
21
Object-Oriented Database model
Needed for multimedia applications that
manage images, voice, videos, graphics, etc.
Used in conjunction with Object-oriented
programming languages
Slower compared to relational DBMS for
processing large volume of transactions
Hybrid object-relational Databases are
emerging
22
Hierarchical Database Model
Data is organized into a tree-like structure using parentchild relationships.
Created in the 1960s by IBM
Limited to storing data in One-to-Many relationships
– One parent segment to many child segments
Not very flexible
Examples: IBM’s Information Management System (IMS)
and Windows registry.
Network Database model
Developed in 1969
Many-to-Many relationships between entities
Any record may be linked to any other record
Highly flexible but also highly complex
Hard to maintain
Rarely used today
Data Warehouse
a logical collection of information gathered from
many different data sources
Stores current and historical data
supports business analysis activities and decisionmaking tasks
The primary purpose of a data warehouse is to
aggregate information throughout an organization
into a single repository for decision-making purposes
25
Data Warehouse Fundamentals
Many organizations need internal, external, current,
and historical data
Data Warehouse are designed to, typically, store
and manage data from operational transaction
systems, Web site transactions, external sources,
etc.
26
Data Warehouse: A Multi-Tiered Architecture
Other
sources
Operational
DBs
Metadata
Extract
Transform
Load
Refresh
Monitor
&
Integrator
Data
Warehouse
OLAP Server
Serve
Analysis
Query
Reports
Data mining
Data Marts
Data Sources
Data Storage
OLAP Engine Front-End Tools
27
Multidimensional Analysis
Data mining – the process of analyzing data to
extract information not offered by the raw data alone
Data-mining tools use a variety of techniques (fuzzylogic, neural networks, intelligent agents) in order to
find patterns and relationships in large volumes of data
and infer rules that predict future behavior and guide
decision making
Other analytical tools: query tools, statistical tools,
etc. used to
Analyze data, determine relationships, and test hypotheses
about the data
28
Data Warehouse Fundamentals
Extraction, transformation, and loading (ETL) – a process that extracts
information from internal and external databases, transforms the information
using a common set of enterprise definitions, and loads the information into
a data warehouse.
Information Cleansing or Scrubbing
Organizations must maintain high-quality data
in the data warehouse
Information cleansing or scrubbing
– a process that weeds out and fixes or discards
inconsistent, incorrect, or incomplete information
– first, occurs during ETL. Then, when the data is in
the Data Warehouse using Information cleansing
or scrubbing tools.
30
Data Mart
Subset of data warehouses that is highly focused
and isolated for a specific population of users
Example: Marketing data mart, Sales data mart, etc.
31
Database vs. Data Warehouse
Databases contain information in a series of
two-dimensional tables
In a Data Warehouse and data mart,
information is multidimensional, it contains
layers of columns and rows
Total annual sales
of TV in U.S.A.
Date
2Qtr 3Qtr
4Qtr
sum
U.S.A
Canada
Mexico
sum
32
Country
TV
PC
VCR
sum
1Qtr
Summary Questions
Notes
1) What is a database, a table, a field, a record, a primary key, a composite
key?
2) What are the problems with traditional file systems?
3) What are the major functions of a DBMS?
4) (a) Name some Desktop DBMSs. (b) Name some Enterprise DBMSs. (c)
Handheld DBMSs
5) Describe hierarchical database model, network model
6)
What are the differences between Flat File, Relational, and Objectoriented database models?
7)
What is Data warehouse? Data Mart?
8)
What is Extraction, transformation, and loading (ETL)? What is datamining? What is Information cleansing or scrubbing?
33