data modelling - Applied Computer Science

Download Report

Transcript data modelling - Applied Computer Science

ACS1803 Lecture Outline 2
DATA MANAGEMENT CONCEPTS
Text, Ch. 3
How do we store data (numeric and character records)
in a computer so that we can optimize on storage and
retrieve data efficiently and flexibly [different possible
combinations]?
- we have already seen data management basics in
material on MS Access dbms
-
we now expand this to computers in general
The Database Approach
• Traditional approach to data management:
– Each distinct operational system used data files
dedicated to that system
• Database approach to data management:
– Pool of related data is shared by multiple
application programs
2
-
firstly, there were individual files for each set of programs
later, database approach:
o one centralized source of data for all programs
to tap into
o minimizes redundancy and provides retrieval
efficiency
o can have one database for some subject area:
student database, employee database
how do we structure the data [on disk] for a
database?
Different people need different combinations
[views] of data; yet the database must satisfy all views
Relational Database
- data set up as a centralized collection of
tables
- tables are linked by common columns of
data
o
- tables are designed together to minimize
repetition
- this is the most common database structure
-
- this is the one used in microcomputer
databases and many larger ones too
The database management system
-software that sets up the database structure
(‘skeleton’ on disk according to a certain model, fills the
structure with data and retrieves the data to provide
meaningful information (p. 130 bot)
-related to using a dbms:
- data dictionary: defines each field and record, explains
what each field means and who is authorized to update
it (p. 143)
-query language: used to extract data that satisfy
certain criteria from a database
The Database Management
System
Data Dictionary
• Is a document that database designers prepare to help individuals
enter data
• Provides several pieces of information about each attribute in the
database including:
- Name
- Key (is it a key or part of a key?)
- Data Type (date, alphanumeric, numeric, etc.)
- Valid Value (the format or numbers allowed)
• Can be used to enforce Business Rules which are captured by the
database designer to prevent illegal or illogical values from entering
the database. (e.g. who has authority to enter certain kinds of data)
6
e.g., SQL: Structured Query Language:
- Popular language for making requests to a
relational dbms (p. 145 bot)
e.g., SELECT LAST NAME, DEPARTMENT,
SALARY
FROM
EMPLOYEE
WHERE
DEPARTMENT = ‘4530’ AND SALARY > 25000
Popular dbms:
Access - individual
Oracle - Client / Server network
Data bases and computer networks
- can have centralized database (at one
location)
or, fragmented, distributed database: different
parts of the database are stored in the locations where
they are accessed most often, but continue to be fully
accessible to others;
Database Administrator (DBA): person in an IS
department responsible for the database (p. 147)
Databases on the Web:
- the only way for organizations to conduct business on
the Internet is to people outside the organizations
access to their databases (e.g., items for auction) 138
bot
Text databases
Databases are thought of as consisting of structured
records of fields; but can also have ‘databases’ of text
-
- these can be on the web and can use search
engines to search for keywords
- such ‘databases’ can come on CD-ROMs
Data Warehousing
- most regular organizational databases are
‘transactional’ and current; their contents change
daily
- but transaction data in such a
database as of a certain time (a ‘snapshot’) can be
archived
- a data warehouse is a large (relational)
database of archived data used to gain extra
insights to support management decision making
-combines data from databases across the
organization
--data mart: smaller collection of data that focuses
on a particular subject or department
Data Warehouses, Data Marts, and Data
Mining (continued)
Operational
Informational
Extract
Data
Department
Databases
• Day-to-day
department
transactions
• Used primarily by
departments
Extract
Data
Data
Warehouse
Data
Mart
• Extracted
department
transactions
• Used for
business
analysis
• Extracted subset
of a data
warehouse
• Used for highly
specific business
analysis
11
Data Warehousing and Mining
p. 155-157
• data in warehouses is accumulated and may reflect
many years of business activity
• various business analysis programs (models) can
then be applied to the data warehouse to gain insight
• organizations often choose mainframe computers
with multiple CPUs to store and manage data
warehouses;
•
•
Data Mining
• the process of selecting, exploring, and modelling
large amounts of data to discover previously
unknown relationships
- data mining software searches through large
amounts of data for meaningful patterns of information
often used by marketing managers to see
purchasing patterns
- can be used in banking: to find profitable
customers and patterns of fraud
Data Warehouses, Data Marts,
and Data Mining (continued)
• Predictive analysis:
– Form of data mining that combines historical
data with assumptions about future conditions
to predict outcomes of events
– Used by retailers to upgrade occasional
customers into frequent purchasers
– Software can be used to analyze a company’s
customer list and a year’s worth of sales data to
find new market segments
14
Database Design:
-before we can even start asking a dbms to set up
a database structure for us, we must know what data we
need to store and in what form
-this work of determining what our database should
look like is called database design
- it is the ‘up front’ work in using dbms
- - a first step in database design is data modelling
we look at a business and see what main items we
want to keep data about [data entities] and what are the
organizational relationships between these items
we then draw a diagram of entities and relationships
between them; this is called data modelling
- we draw an entity-relationship diagram
(see board)
STUDENT
takes
COURSE
Is taken by
- from this first step a data analyst would eventually
determine a database design (schema) (what tables we
will have and where the common columns will be)
Designing Databases - Associations
Entity Relationship Diagram (ERD)
• Diagramming tool used to express entity relationships
• Very useful in developing complex databases
Example
• Each Home Stadium has a Team (One-to-One)
• Each Team has Players (One-to-Many)
• Each Team participates in Games
• For each Player and Game there are Game Statistics
17
More design
• The entity-relationship diagram is then
augmented to show attributes of each entity
-Then, a large ERD is “trimmed down’
(normalized) and after that, entities usually
become tables and attributes become
columns in the tables
Relationships in an ERD
• Relationships between entities in the ERD
dictate where the common column should
be to relate certain tables
The Relational Model - Example
20
REA Model
• - an ERD where the entities model specific
business RESOURCES, EVENTS and
AGENTS
is called a Resource Event Agent model
- This model is useful in designing databases
for various business information systems
(e.g. modern accounting systems)
REA Example
• COOKIES(resource)
Offered for
SALE(ev) org by
SALESPERSON (ag)
attended by
CUSTOMER(ag)
REA Example
23
Why is database knowledge important?
how you structure your data, so you shall be able
to process your data
-***a database is the ‘back end’ of a computer-based
information system
-See handout (MS Word) on Data Modeling Explanation