DBMS - Computer Information Systems

Download Report

Transcript DBMS - Computer Information Systems

ICS321 / IBM205 Management
Information Systems
Dr. Ken Cosh
Databases: Foundations of
Business Intelligence
Review
• Last week we looked at;
– Hardware
• Input, Output devices, Processors, Memory
– Types of computer
• Mainframe, Midrange, PC’s
– The emergence of Client Server relationships
• Thin Clients, Fat Clients, P2P
– Networks
• Centralised, Decentralised, Distributed
This Weeks Topics
• The ‘Data’ Resource
– Organising Data
– Databases
The Importance of Good Data
• Garbage In, Garbage Out;
– Bad data will result in bad information (hence bad
decisions)
• An effective information system depends not
only on having good data, but on storing,
organising and accessing the data.
• Challenges to this include;
– Organisational Obstacles
– Data Integration and Quality
Organisational Obstacles
• Implementing new data models requires reexamining the role of data within an
organisation,
– Who has access to what data, and when?
– Changing the allocation (or sharing) of data can
impact on current power relationships, and so is often
met by political resistance.
• Traditionally data is stored in file format, with
each department having a selection of files.
Modern approaches introduce databases and
DBMS that can be shared across multiple
departments
Data Integration and Quality
• When moving data from a file format to a
database (or sharing data from one source
across multiple systems), converting data
formats can be costly, and risks losing
important information.
Traditional File Format
•
•
•
•
Bit (binary digit) – smallest unit a computer handles.
Byte – Group of bits which can represent a character
Field – Group of bytes which can represent a word
Record – Group of related fields (such as name, course,
grade, date)
• File – Group of records of the same type (such as a
group of students records)
• Database – Group of related files (such as grouping the
student’s personal history, financial records)
More terms
• Entity – a person, place, thing or event for
which data is stored about.
• Attribute – Any characteristic or quality
describing an entity, for example ‘age’.
• Key Field – Every record in a file, should
contain at least one key field, which
uniquely identifies instances of that record,
so that they can be retrieved, updated or
sorted, for example ‘order number’.
Key Terms
Database
Course File History File Finance File
File
NAME
John Stewart
Sarah Smith
Ken Cosh
COUR
ICS321
ICS321
ICS321
DATE
2005
2005
2005
GRADE
C+
B
A+
Record
NAME
John Stewart
COUR DATE
ICS321 2005
GRADE
C+
Field
John Stewart (NAME Field)
Byte
01001010 (Letter J in ASCII)
Bit
0
So What’s the Problem?
• Remember Systems within systems
(subsystems), interfacing systems and
adaptive systems?
– Each system tends to grow and adapt
independently.
– Functional units develop systems isolated
from other units.
– Each functional unit develops many
databases; personnel has personnel, payroll,
medical insurance, pensions, mailing file….
Problems
• Data Redundancy and Confusion
– Duplicate Data in multiple data files.
– The same data can have different names,
different meanings, different related data in
different places.
– The same name might be used for different
data in different places.
– Database confusion makes implementing a
SCM, CRM or Enterprise wide system difficult.
Problems 2
• Program-Data Dependence
– There is a tight relationship between the data in files
and the programs using them.
– Any changes to the data, results in necessary
changes to the programs that use the data.
– Maintaining data becomes costly.
• Lack of Flexibility
– Scheduled reports can easily be generated from the
data.
– Ad Hoc reports however are costly to generate.
While the information is somewhere in the system
getting it out is tricky.
Problems 3
• Poor Security
– Or poor control.
– There is now a lot of data in a lot of databases
throughout the organisation. It is difficult to control or
manage the data – who is accessing what data?
• Lack of Data Sharing & Availability
– With poor control over data, its difficult to share data
between functions.
– Accounts might benefit from some data that
manufacturing has, etc.
Databases
• “Group of related files (such as grouping
the student’s personal history, financial
records)”
• Ideally a database is a group of related
data which can be used to serve many
different applications, by centralising data
and minimising redundancy.
Data Management
Employees
Name
Address
ID
Position
Payroll
Hours Worked
Pay Rate
Tax
Gross Pay
Benefits
Life Insurance
Golf Membership
Healthcare
Retirement
Personnel
Applications
DBMS
Payroll
Applications
Benefits
Applications
DBMS
• The DBMS sits between the actual data and the
applications which use the data.
• This saves the user from needing to understand
the actual physical way the data is stored,
instead presenting a logical view of it.
• The user doesn’t need to know the data
definition language, but instead could use a data
manipulation language such as SQL.
• In reality often the manipulation language is
hidden within an application.
DBMS
Data Definition
Data Manipulation
Creating & Changing the
logical structure of a
database
Querying & making
changes to the information
Database
Application
Generation
Data Administration
Menus, data entry
screens, reports and
application software
Who can see what
information; methods
for backup and
recovery
Hierarchical Database
ROOT
FIRST
CHILD
Performance
Ratings
SECOND
CHILD
Employee
Compensation
Job Assignment
Benefits
Salary History
Pension
History
Life Insurance
Health
Hierarchical Data
• Suppose from the previous data structure,
we wanted to access the salary history for
all people with the job title “Assistant”,
accessing that data would not be easy.
• While certain scheduled reports can be
generated, ad hoc reports are not as
flexible.
Relational Databases
• Data is organised into tables, which could
be visualised as a spreadsheet. In each
table data is organised into rows / records
(or tuples).
• Any piece of data from any table can be
linked to any piece of data in another
table, so long as they have a common
data element (field).
Designing Databases
• Designing Relational Databases normally
begins from building an Entity Relationship
Diagram (E-R Diagram).
– An Entity is a person, place, thing, or event for
which data is collected and maintained.
– Deciding what data should be stored about
each entity
– Defining the relationships between entities.
• A customer can have many order numbers.
• A doctor treats many patients.
E-R Diagrams
1
M
Has
Customer
1
M
Treats
Doctor
1
Manager
Order No.
Patients
1
Leads
Department
Normalisation
• Process of creating small stable data structures
from complex groups of data.
• Relationships between entities can be ‘one to
one’, ‘one to many’ or ‘many to many’.
• An E-R diagram can become very complex with
many data elements to be stored for many
entities with complex relationships.
• Normalisation attempts to break entities down
into smaller entities, and tries to remove
complicated ‘many to many’ relationships.
Designing Databases
• Another key factor in designing databases is
designing how they will be distributed
– Will there be one central database server with clients
accessing the data?
• Single point of Risk
• Powerful and expensive server required.
– Will there be distributed data base servers, in multiple
physical locations?
– If distributed how and when will the data be updated?
(Batch process overnight?)
– What happens if 2 replicated databases have
conflicting updates?
SQL
• Structured Query Language
– The commonest data manipulation language for
relational databases.
– Used to query (get information from) databases, and
also to put information into databases.
• SELECT Part_Number, Supplier_Code FROM
Part WHERE Unit_Price < 25.00;
– This would return the all the part numbers and
supplier codes from the database called ‘part’ which
cost less than 25.00.
SQL
• Suppose we want to know;
– “By actual vs budgeted, how many size 8
shoes in black did we sell last month in the
southeast and southwest regions, compared
to the same month over the past 5 years.”
• A complex request…
– If we can build the query, it could effect the
performance of the database system!
• Especially if it is a live operating system.
Trends in Databases
• Multidimensional Visualisations
– Often managers want to see relationships
between large amounts of data.
• Sales of 4 different projects over a period of
months versus projected sales.
– Can implement a multidimensional
database…
– Or a multidimensional visualisation tool.
– When further dimensions need to be added,
virtual reality or embedded data can be used.
Trends in Databases
• Data warehouse
– A database of current and historical data which can
be accessed by anyone within the organisation.
– Backups are made of all the data from all applications
and stored in the warehouse.
– Data in the warehouse can not be changed though, it
is simply a record of company history, which can be
used to find trends etc.
• A data mart
– A subsection of a data warehouse specifically
targeted towards a certain group of people.
• Data mining
– The process of extracting patterns and or rules from a
data warehouse. Often used by marketing
departments to profile customers.
Object-Oriented DB
• Hierarchical and Relational databases assume
that data is in character or numerical form.
• How about databases that store data which can’t
easily be represented in files and tables (such
as graphics, sounds, java applets or any other
multimedia).
• O-O databases are designed to deal with these
diverse data types, however they tend to be a lot
slower than relational databases.