Normalisation - Pravin Shetty > Resume
Download
Report
Transcript Normalisation - Pravin Shetty > Resume
Lecture 3
This lecture will introduce
• More aspects of database design
• Modelling and transformations
• Normalisation
• Some examples of modelling
• And an excursion into the land of Non Structured Data
CSE3180 Summer 2005 Lect 3 / 1
Data Structure Diagrams
Requirements Analysis
Conceptual Design
Physical Design
Implementation Design
CSE3180 Summer 2005 Lect 3 / 2
Conceptual Design
- Create an enterprise (total) model
- Normalise user views
- Integrate user views
- Create a database
- Review
CSE3180 Summer 2005 Lect 3 / 3
Modelling User Requirements
DATA
Entity Relationship Modelling (Semantic)
Normalisation (Minimality) - meaning the minimum
number of attributes to define the Primary Key
PROCESS
Data Flow Analysis
Procedure Formation
Functional Analysis
CSE3180 Summer 2005 Lect 3 / 4
Well Structured Data
Advantages
• Better integration of applications through shared access
to common data.
· More adaptable systems because files and databases will
follow the natural structures inherent in data rather than
application oriented relationships.
· Processing should be simpler if data is held in its simplest
least redundant form.
CSE3180 Summer 2005 Lect 3 / 5
Data Focussed Development
1. Develop broad data model and broad function model.
2. Partition data model by function.
3. Develop detailed data model by each functional partition.
4. Develop the detailed procedure models by functional
partition.
CSE3180 Summer 2005 Lect 3 / 6
Primary Key Selection
A data item or element that describes one entity type.
STUDENT RESULT
StudentNo Unit Code
Year
Semester
Grade
S1
CSE4230
2003
1
N
S1
CSE4230
2003
2
HD
PRIMARY KEY or IDENTIFIER
· Some or all of the attributes describing an entity type serve
to uniquely identify each entity instance.
CSE3180 Summer 2005 Lect 3 / 7
Value Sets or Domains
Each attribute type supports one entity type and is often
based on an underlying domain or value set.
EMPLOYEE
EmpNo Emp_Gender
Emp_Name
E1
Jones
Female
Emp_Hire_Date
20/6/1989
DEPENDENT
Empno
Family_Name First_Name Gender Birthdate
E1
Jones
Amelia
Female 23/6/1986
CSE3180 Summer 2005 Lect 3 / 8
Synonyms or Aliases
· Be careful to differentiate between things and their names.
ITEM = PART = PRODUCT ???
· SYNONYM or ALIAS
The same object called different names by different
people.
· This problem concerns entity types, relationship types and
attribute types.
CSE3180 Summer 2005 Lect 3 / 9
Homonyms
Different things called the same name by different people ??
EMPLOYEE START DATE
· In the Personnel area this may mean the date an employee
started with the company.
· In a given department this may mean the date an employee
started with the department.
· This problem also concerns entity types, relationship types
and attribute types.
CSE3180 Summer 2005 Lect 3 / 10
Design and Data Structures
Data structures are the bricks and mortar that hold
databases together.
Data structures (for the ANSI/SPARC standard) are defined
in the internal model level and implemented in the physical
data organisation.
Data structures are often hidden from the application
programmer, since they are primarily used by the DBMS and
Operating System.
Understanding data structures is important for performance
reasons, to improve program design and allow easier
communication with DBMS specialists.
CSE3180 Summer 2005 Lect 3 / 11
Goals of Relational Design
What Relations should exist and what Attributes should
they contain?
Avoid Redundancy if possible
- minimise storage space
Avoid Anomalies
Avoid Nulls (but be careful)
Avoid Joins which produce spurious rows
CSE3180 Summer 2005 Lect 3 / 12
Some Assumptions
A group of attributes has a natural “inherent” structure
This structure is independent of the way the data is used
Normalisation (meaning : to be able to access complete, clear
and correct data - to and from tables)
• introduced by E.F. Codd together with relational database
theory
• originally Codd defined three normal forms
• later expanded to Boyce-Codd and fourth and fifth
normal forms
CSE3180 Summer 2005 Lect 3 / 13
A Sample Data Structure
PRIME _MINISTER
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
MARRIAGE
PM_NAME SPOUSE_NAME MAR_YR PM_AGE NR_CHILDREN
MINISTRY
MIN_NR PM_NAME PARTY DAY_COMM MTH_COMM YR_COMM
CSE3180 Summer 2005 Lect 3 / 14
Normalisation
Formal measure of why one grouping of attributes
may be better than another
Each Normal Form requires that a Relation satisfies
criteria for that form and this eliminates a different
kind of redundancy
Normalised Relations will remain consistent following
database operations and will store each fact only once
Database operations applied to unnormalised relations
may lead to anomalies
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
CSE3180 Summer 2005 Lect 3 / 15
Some Anomalies
Relation : ASSIGN
Person-Id Project_budget
Project
Time Spent on Project
S75
32
P1
7
S75
40
P2
8
S79
32
P1
4
S79
27
P3
1
S80
40
P2
5
-
17
P4
-
Null Values are considered to be anomalies
CSE3180 Summer 2005 Lect 3 / 16
Anomalies
Person-Id Project_budget
Project
Time Spent on Project
S75
32
P1
7
S75
40
P2
8
S79
32
P1
4
S79
27
P3
1
S80
40
P2
5
-
17
P4
-
Insertion Anomaly
add row ASSIGN values( ‘S85’,35,’P1’,9)
- two conflicting budgets for P1 S75 32 P1 7
Deletion Anomaly
S79 32 P1 4
delete row ASSIGN (‘S79’,27,’P3’,11)
- removes project budget for P3 S79 27 P3 1
CSE3180 Summer 2005 Lect 3 / 17
Functional Dependencies
- the values of one set of attributes affect the values
of another attributes
X
Y
The value of X determines the value of Y
The value of Y depends on the value of X
The simplest case is 1 attribute determines another single
attribute
AND the Primary Key value determines the values
of the non key attributes in each row.
CSE3180 Summer 2005 Lect 3 / 18
Functional Dependencies
Project
Project Budget
Person-Id
Project
Time Spent on Project
Functional Dependency Diagram
Project
Project Budget
CSE3180 Summer 2005 Lect 3 / 19
Functional Dependencies
• Two attributes are FUNCTIONALLY DEPENDENT if a
value for ONE attribute specifies a SINGLE value for the
other attribute
Grade
Rate of Pay
Driver
265.00
Crane Driver
310.95
Keyboard Operator 215.55
Systems Analyst
450.00
The Rate of Pay is functionally dependent on Grade
or, Grade determines Rate of Pay
or, grade----> Rate of Pay. Grade is a DETERMINANT
CSE3180 Summer 2005 Lect 3 / 20
Functional Dependencies
Some Terms: Primary Key Candidate Key
Foreign Key Concatenated Key
Non-Ambiguous Keys: An attribute (or attributes) of a table is
the Primary Key for that table if
1. All attributes in the table are functionally dependent on the
Primary Key
AND
2. Where the Primary Key is a collection of attributes
(concatenated key), no subcollection of the attributes used
also exhibits the functional dependence property
CSE3180 Summer 2005 Lect 3 / 21
Functional Dependencies
Translation:
1. The primary key will identify a given row (unique)
2. No portion of the primary key can be an independent
primary key
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
CSE3180 Summer 2005 Lect 3 / 22
Minimal Numbers of FD’s
Or, rationalise (minimise) the number of tables
in the physical database
Employee
Department
Employee determines Department
Department
Location
Department determines Location
Employee
Location
This FD is redundant.
It is known as a ‘Transitive Dependency’ and would be
excluded from the design model. (E-R diagram)
Employee determines Location
CSE3180 Summer 2005 Lect 3 / 23
Normalisation
1st Normal Form
Repeating groups must not occur
Unit
CSE3180
CSE2138
CSE9020
Student No. Result
9142717
9131618
9077184
8967384
8737980
9142717
6932475
C
D
P
N
P
P
HD
Name
Wilson
Renoir
Gilbey
Breton
Balzac
Wilson
Gilbey
1st Normal Form
CSE3180 Summer 2005 Lect 3 / 24
Normalisation
Corrected Table
Unit
CSE3180
CSE3180
CSE3180
CSE2138
CSE2138
CSE9020
CSE9020
StudentNo
9142717
9131618
9077184
8967384
8737980
9142717
6932475
Result
C
D
P
N
P
P
HD
Name
Wilson
Renoir
Gilbey
Breton
Balzac
Wilson
Gilbey
The stored
data must
be complete
Formally expressed as:
Results(unit code, studentno,result,name
CSE3180 Summer 2005 Lect 3 / 25
Normalisation
2nd Normal Form
Designed to eliminate Update Anomalies (4 categories)
Examples
1. Update
Where a Description alteration requires
total file scan and alterations made
2. Inconsistent Data Possibility of a code having more than
one description
3. Additions
Condition where a row cannot be fully
identified e.g. subject code only
4. Deletions
If subject Code CSE9020 is deleted,
location of associated data is not
possible
CSE3180 Summer 2005 Lect 3 / 26
Normalisation
3rd Normal Form
Final Stage ---> 3rd Normal Form
Watch for Determinants
Primary Keys are Determinants
Candidate Keys are Determinants
A table is in 3rd Normal Form (3NF) if
(i) It is in 2nd Normal Form and
(ii) Determinants are Candidate Keys
CSE3180 Summer 2005 Lect 3 / 27
Normalisation
As a check, review
1. Update
2. Inconsistent Data
3. Additions
4. Deletions
for Conditions and Difficulties of access (and retrieval)
CSE3180 Summer 2005 Lect 3 / 28
Normalisation Review
A relation is in 3NF if, and only if, the NONKEY ATTRIBUTES
are
(1) Mutually Independent
(2) Fully Dependent on the Total Primary Key
Or: A relation is in 3NF if, and only if, for the life of the
database, each row consists of a Primary Key Value which
identifies some Entity instance, together with a set of Zero
or more mutually independent attribute values which
describe the Entity in some way. (Date, C.J)
Or: Each field must represent a fact about the key, the Whole
Key and Nothing but the Key. (Kent, W)
CSE3180 Summer 2005 Lect 3 / 29
Normalisation - Examples
1. 1NF or First Normal Form Rule: Each Row
MUST CONTAIN the same number of columns
Example:
Course Instructor table
class code
lecturer
tutor
tutor
C3576
K567
B6745
R3289
Doe,J
Jones,R Smith,V
Nguyen,L
Fabbri,M Ong,W
Pratt,W
Archer,V Barrat,N
tutor
Ng,K
The number of columns (attributes) is not consistent
Create a table of TUTORS with Class Codes
CSE3180 Summer 2005 Lect 3 / 30
Normalisation - Examples
2NF: 2 criteria (1) The table must be in 1NF
(2) every attribute which is NOT part of the
Primary key must be functionally
dependent on the Primary Key
Building Room Seats No. of Levels
A
214
85
4
A
242A 25
4
B
213
135
6
The number of levels is dependent on Building, not Building +
Room.
Create another table - ‘building table’
CSE3180 Summer 2005 Lect 3 / 31
Solution
1 table :
Accommodation(building, room, seats)
Another table : Building(building, levels)
CSE3180 Summer 2005 Lect 3 / 32
Normalisation - Examples
3NF : A table is in 3rd Normal Form
if
(1) the table is in 2NF
and (2) every attribute which is NOT part of the Primary Key
is functionally dependent ONLY on the key meaning it is not dependent on any other non-key
attribute
CSE3180 Summer 2005 Lect 3 / 33
Normalisation - Examples
This table is NOT in 3NF:
ClassCode Lecturer
C3567
Doe,J
K4567
Fabbri,M
B7645
Nguyen,L
R3289
Pratt,W
ClassCode
C3567
K4567
B7645
R3289
Lecturer’s Office Phone
101 Bldg A
32412
312 Bldg B
31523
209 Bldg F
31518
406 Bldg E
32581
Lecturer
Lecturer Lecturer’s Office Phone
Doe,J
Doe,J
101 Bldg A
32412
Fabbri,M Fabbri,M 312 Bldg B
31523
Nguyen,L Nguyen,L 209 Bldg F
31518
Pratt,W
Pratt,W
406 Bldg E
32581
CSE3180 Summer 2005 Lect 3 / 34
Normalisation
Some Thoughts:
•
Databases should be constructed so that data occurs
only ONCE.
•
Redundant or repetitive data may produce
inconsistencies in the database during update
e.g. a person’s address is in 1 table and row only, and
other rows in other tables have keys referring to it.
(known as referenced and referencing keys/tables)
• A quick design test - ‘If a row is deleted, will valuable data
be lost ?’
If YES, remove the valuable information and place it in
another table
CSE3180 Summer 2005 Lect 3 / 35
Normalisation
Diagrammatic Functional Dependencies
supplier
name
p.k
supplier
number
status
city
part
name
p.k
part
number
Notice that these
attributes are
dependencies of the
primary key
color
mass
city
city assets
This attribute
IS NOT
CSE3180 Summer 2005 Lect 3 / 36
Other Normalisation Forms
Other forms of Normalisation: 4th Normal Form
4NF
A row should NOT contain two or more independent
multivalued facts about an entity.
Facts are independent of each other when there is no direct
connection between them. Generally occurs in a many to
many relationship
Example: A book publishing company can have many
authors producing many books
The books would (normally) be sold from many
booksellers stores
CSE3180 Summer 2005 Lect 3 / 37
Other Normalisation Forms
5th Normal Form
A table is in 5th Normal Form when its data can no longer be
decomposed into smaller tables without each row having
the same Primary Key
CSE3180 Summer 2005 Lect 3 / 38
MultiValued Dependencies MVD’s
Functional Dependencies cannot express all cases of
associations between attributes
Consider
PERSON and SKILL(s)
A PERSON may have one or many SKILLS
A SKILL may be owned by many PERSONS
PERSON --- > SKILL
SKILL ----> PERSON
Neither of these expressions cover the above statements
CSE3180 Summer 2005 Lect 3 / 39
MultiValued Dependencies MVD’s
An MVD defines that an ATTRIBUTE is always associated
with a given set of values of another ATTRIBUTE
PERSON --->> SKILL
Interpretation:
a single value of a set of attributes
determines a set of values of other attributes
AND
this set of values is the same, independent of
any other dependencies of the first named
attributes
PERSON ---->> skill_id, date_acquired
CSE3180 Summer 2005 Lect 3 / 40
Advanced Examples 4NF, 5NF
•
SALES_AREAS
Sales_Rep
Joe
Joe
Helen
Helen
Helen
Joe
Joe
Helen
Helen
Helen
Customer
BH Store
BH Store
MaxiStores
MaxiStores
MaxiStores
8-Up
8-Up
BH Store
BH Store
BH Store
Product_Class
Cosmetics
Haberdashery
Cosmetics
Clothing
Shoes
Cosmetics
Haberdashery
Cosmetics
Clothing
Shoes
CSE3180 Summer 2005 Lect 3 / 41
Advanced Examples 4NF, 5NF
In this table, there are MVD’s
Sales_Rep --->> Customer
Sales_Rep --->> Product_Class
FOURTH NORMAL FORM:
A relation is in 4NF if, for any multi-values dependency
X --->> Y, there are no attributes other than those in X and Y
In the table , Sales_Rep --->> Customer, BUT
also --->> Product_Class
Solution: Decompose SALES_AREAS into:
Sales_Rep(Sales_rep, Product_Class)
Areas(Sales_Rep, Customer)
CSE3180 Summer 2005 Lect 3 / 42
Advanced Examples 4NF, 5NF
FIFTH NORMAL FORM:
Restrictions on Dependencies (also illustrates a Business
Rule)
BH Store does not sell Haberdashery
MaxiStores do not sell Cosmetics
Thus the SALES_AREA cannot be decomposed as shown. It
can be decomposed into
Sales_Rep(Sales_Rep, Product_Class)
Areas( Sales_Rep, Customer)
Sells(Customer,Product_Sold)
CSE3180 Summer 2005 Lect 3 / 43
Advanced Examples 4NF, 5NF
IF not all Sales_Reps of a given Product_Class can sell that
Product_Class to a store, then the resultant table, based
on the original table, is
Sales_Rep
Customer
Product_Class
Joe
BH Store
Cosmetics
Helen
MaxiStores
Clothing
Helen
MaxiStores
Shoes
Joe
8-Up
Cosmetics
Joe
8-Up
Haberdashery
Helen
BH Store
Clothing
Helen
BH Store
Shoes
This relation cannot be decomposed and is in 5NF
CSE3180 Summer 2005 Lect 3 / 44
In Practice --1.Oracle insists that modelling to 3rd normal form is adequate
for all commercial databases
2. Since the emergence of the 4th and 5th forms, database
software (in the case of Oracle this is PL/SQL) is able to
provide the processing necessary to accommodate special
or extended control over data in database tables
3. Part of the development of a database is focussed on the
presence and understanding of ‘Business Rules’
4. There will be some examples of Business Rules in a later
lecture
5. You will need to develop Rules for your database as part of
its development and control
CSE3180 Summer 2005 Lect 3 / 45
Stages in Data Modelling - 1
1.
2.
3.
4.
Analyse User Requirements - User Views
Aggregate Functions - Highlight Synonyms
Prepare first cut of Entities
Determine Cardinality of Entities
– Develop E-R Diagrams
5. Determine Relationships 1:1, 1:M, M:N
6. Introduce Attributes - Data Structures (use Natural
Mapping)
7. Analyse Candidate Keys for Primary, Foreign and Access
key functions
8. Determine success of relationships
9. Review (if necessary) the E-R Diagram
CSE3180 Summer 2005 Lect 3 / 46
Stages in Data Modelling - 2
• Select and Test the Primary and Foreign Keys for each
table
• Select Candidate keys for access
• Normalise
• Test Normalisation
CSE3180 Summer 2005 Lect 3 / 47
Stages in Data Modelling - 3
• Review E-R Diagram and Data Structure Diagrams if
necessary
• Examine Data Structures/E-R Diagrams against original
views, requirements e.g. report contents
• Obtain confirmation and acceptance from User(s)
• Move to Physical Design Considerations
and, don’t forget the documentation !
CSE3180 Summer 2005 Lect 3 / 48
Data Modelling - Some Guidelines
• Do not prematurely combine entities into tables
• Concentrate on Access Mechanisms which can be shared
among requests
• Deviate from the model in a responsible manner (i.e. get
user acceptance, enthusiasm)
• Use table/view and attribute names which closely reflect
the data model names (Natural Mapping)
• Do not define multiple attributes as one composite attribute
in a table
CSE3180 Summer 2005 Lect 3 / 49
Database Design Guidelines
• Watch data types for each attribute, especially complex
keys
• Develop an ‘architecture’ to support ‘Business Rules’ for
procedures, integrity and access
• Look for developing technology trends
• Determine the reason/purpose of the proposed database
CSE3180 Summer 2005 Lect 3 / 50
Database Design Guidelines
• Determine the entities supported by the database - what
are the ‘main’ collections of data --> entities
• Determine the attributes of each entity
• Determine the Entity relationships
• Determine the Entity-Relationship types 1:1, 1:M, M:N
CSE3180 Summer 2005 Lect 3 / 51
Database Design Guidelines
Carefully assess the Constraints :
– Domain value sets
– Data types
– Calculations
– Unique keys
– Primary Keys
– Nulls (or even better, Not Nulls)
– ‘Special Processing’ requirements
– In a distributed environment, are the constraints ‘local’
or ‘global’
These are examples of the existence of Business Rules
CSE3180 Summer 2005 Lect 3 / 52
Database Design Guidelines
• Determine how the relationships can be related - Primary
Key/Foreign Key. Are additional attributes needed ?
• Re-assess your design - Logical, Can the objectives of (1)
be achieved ?
If no, what addition/reduction of Entities or Attributes is
necessary ?
• Modify your design as required, and re-evaluate
(it is an ITERATIVE Process)
CSE3180 Summer 2005 Lect 3 / 53
Some Common Design Problems
• Not all attributes in a table relate to the same entity or
subject. e.g. mixing passengers and aircraft details in the
same table
Hint: Re-examine the Primary Key and determinant
aspects
• Blank attributes in tables (no values)
Generally means they belong in another table
• Tables contains the same data-items
Can indicate that aggregation or consolidation might be
necessary
CSE3180 Summer 2005 Lect 3 / 54
Some Common Design Problems
• Continuous Design alterations
01. Normal during the logical design stage
Some suggestions:
Make prototypes, use sample data, determine
weaknesses, oversights
Consult with the user. Get approval / acceptance
Modify, re-design
02. Unworkable at physical design stage
A clear indication of a rushed or poorly developed logical
design. May be more appropriate to start again (and may
also be more cost effective)
CSE3180 Summer 2005 Lect 3 / 55
And those Key Terms ?
• A candidate key is an attribute or set of attributes whose
values uniquely identify individual rows of the entity set
• One candidate key is selected to act as the unique
identifier for instances of the entity. This candidate key is
called the primary key. The remaining candidate keys are
known as alternate keys
• A composite key is a candidate key which consists of more
than one attribute
• A foreign key is a column, or set of columns, which links
each referencing row in the table containing the foreign key
to the row in the referenced table containing the matching
candidate key value
CSE3180 Summer 2005 Lect 3 / 56
So far , we have dealt with ‘structured data’ All data in each table has the same form and
format
We are now going to look at some aspects of
Non Structured Data
CSE3180 Summer 2005 Lect 3 / 57
Data Management
Now we will look at:the imbalance between Information sourced from
structured data storage systems, mainly databases
and Information which is necessary for informed
management decision making processes based on non
structured data
CSE3180 Summer 2005 Lect 3 / 58
Non Structured Data
Up to date, we have studied some of the methods of file
organisation associated with efficient data management
and retrieval as embodied in the relational data base
model.
However, although industry has invested large amounts of
money in relational database applications, the greater
percentage of data access requirements are made of nonstructured data sources (a classic one being the World
Wide Web).
CSE3180 Summer 2005 Lect 3 / 59
Non-Structured Information
• Much effort and expense has been directed at the design,
capture, processing, storage and retrieval of data in a
structured form.
• There are a number of database management systems
(DBMS) which can store many trillions of bytes in
centralised, distributed and client/server based systems
• There are extensive backup, recovery and restart
procedures in place to ensure persistence and continuity of
data
CSE3180 Summer 2005 Lect 3 / 60
Non-Structured Information
The ratio of unstructured to structured information in many
organisations is approximately 8 to 1
It is easy to conclude that the the most important
component which drives much of the decision making in
key business processes is badly neglected.
Familiar examples are the World Wide Web,
Corporate intranets
email
on line discussion groups
CSE3180 Summer 2005 Lect 3 / 61
Non-Structured Information
Why is there this bias ?
Perhaps because structured information management has
been synonymous with information systems design ?
The technology for unstructured data management is
powerful, pervasive and well understood
CSE3180 Summer 2005 Lect 3 / 62
Non-Structured Information
Businesses are becoming more interconnected
Each new connection is made by, and relies on, exchange
of Information
One aspect of these ‘connections’ is the unpredictability of
the nodes and the nodal content
This impacts on the structured information model
CSE3180 Summer 2005 Lect 3 / 63
Non-Structured Information
The term ‘unstructured’ was invariably associated with
‘documents’.
They are a medium which we understand and use
There are other forms of unstructured data
audio
voice
images
graphical objects
These are forms of electronic documents
CSE3180 Summer 2005 Lect 3 / 64
Non-Structured Information
These forms are ‘unstructured’ because their exact content
and organisation are unpredictable
Unstructured information is any information type made up
of content which does not fit a predefined, descriptive
model or arrangement
It would be possible to impose (or superimpose) a structure
on a document to make document selection possible, but
there would be a cost
CSE3180 Summer 2005 Lect 3 / 65
Non-Structured Information
A document may need to be tracked using
author name
title
filing date
a short abstract of the content (sound familiar ?)
The effect of this is that the ‘content’ of the document can
now be accessed only through these 4 ‘keys’
Question : How could a lengthy research task be
categorised by value or content using this technique ?
CSE3180 Summer 2005 Lect 3 / 66
Non-Structured Information
That is another way of saying
the who
the when
the where
the why
and the how - realistic in today’s environment ?
So what is different with today’s decisions ?
The difference is the nature and kind of decisions made
CSE3180 Summer 2005 Lect 3 / 67
Non-Structured Information
Today’s business is driven by increasing rates of change
We have shifted from an industrial economy to a
knowledge driven economy
The need more information to support the decision-making
processes, and the dynamic nature of the business
environment means that the support from structured
information systems is starting to be inadequate
However there is a possibility of ‘information overload’ one of the accepted criteria of ‘knowledge work’ is a
competency to manage the increasing amounts of
information
CSE3180 Summer 2005 Lect 3 / 68
Non-Structured Information
In the current environment, the ‘products’ are now our
ideas
Ideas are driven by information
The term ‘Globalisation’ is appropriate
We think and we change or modify our plans - and this
has led to the loss of predictability
The the ‘knowledge environment’ business success
depends on the ability of knowledge workers to sift through
all of the available unstructured resources and to make
decisions - and faster than the competitors
The measurement of success is in degrees of innovation
CSE3180 Summer 2005 Lect 3 / 69
Non-Structured Information
So, what are the ‘sources of information’ ?
Corporate document bases
The Internet
The Extranets
Information subscription services
Dialog with Customers, suppliers, competitors
CSE3180 Summer 2005 Lect 3 / 70
Non-Structured Information
Are there any problems with this ?
In practice there are 2, and they are -
The 2 major problems of decision making are
1. Volume of information
2. The speed at which decisions need to be made
CSE3180 Summer 2005 Lect 3 / 71
Non-Structured Information
Information retrievals have moved away from the filtering of
unstructured data into a structured environment.
The emerging models accommodate the capture of
resources (and access) which leads to a dynamic and
unfiltered information repository which consists of joined
but separated sources.
Web sites are sought and searched via the Internet, and
possibly a corporate document repository is included in the
search.
CSE3180 Summer 2005 Lect 3 / 72
Non-Structured Information
Some of the information found will be transferred to a more
structured repository - possibly a competitive analysis
database.
A search in this way provides a set of search results but
does not change the information sources.
The Importance of Tools:
Users don’t normally have the time nor the skills to ‘get on
top of a variety and changing set of tools’ (what is your
experience with the various search engines you have used
on the Internet ?)
CSE3180 Summer 2005 Lect 3 / 73
Non-Structured Information
What is needed is a retrieval tool (hardware and software)
which understands how to work with different repositories
This leads to the ‘repository management system’ being
able to recognise the form and requirements of different
search tools or, if you like, software which recognises and
communicates exactly and completely with other software
- this is ‘intelligent’ software
Another aspect is the nature of a search - this has changed
from simple words or phrase retrievals to ‘context’ and
dynamic analysis and categorisation
CSE3180 Summer 2005 Lect 3 / 74
A Non Structured Data Query
Which of these 2 queries is likely to reveal real and
intelligent in-depth information ?
1. What did our customers order last month
or
2. What is the correlation between discounts we’ve offered
our best customers and predict how they will buy if we
double their discounts (or perhaps restrict or cancel their
customer discount cards)
CSE3180 Summer 2005 Lect 3 / 75
Non-Structured Information
Fortunately, existing systems support many different file
types in the base application - images, word-processing
systems, desktop publishing systems, spreadsheets, CAD
files ….
And just as fortunately, there are languages such as XML
which can handle this variety of data forms and content
2 methods of access
- user defined indexes in a full functioned database (3rd
party relational databases)
- a fully integrated full-text query engine
CSE3180 Summer 2005 Lect 3 / 76
What about ???
• Security
There are highly sophisticated security schemes (also
mentioned in Portals)
• Revision Tracking and Control
Essential for accurate and up to date information (what
revision number of MS-Word are you using ?)
• Document Check-in/Check-out
Recognition and tagging of all documents - none can be
ignored or ‘go missing’
• Usage audit trails
These features are standard features
CSE3180 Summer 2005 Lect 3 / 77
Non-Structured Information
Compound document managers
These are products which provide the facilities mentioned
in the previous overheads, but they have an interesting
extension
They treat documents as a collection of ‘pointers’ to various
and different collections of information - this ensures that
‘new’ data or information will always be part of a search.
(you recall the hypertext links - and have you used the links
which are attached to my Web page ?)
Other features which ensure document integrity include
roll-back, recovery, and audit trails
CSE3180 Summer 2005 Lect 3 / 78
Non-Structured Information
There is a single focal point around which unstructured
information management practice and technology have
converged
No surprise - it is the Internet
This facility has enabled the widespread distribution of
unstructured information from a very large base of
resources
The World Wide Web is a hyper-linked, unstructured
information repository with millions (?) of documents
CSE3180 Summer 2005 Lect 3 / 79
Non-Structured Information
The Web has made it possible for specialised content to be
published, as well as to advertise products
Specialised on-line information subscription services exist
which provide industry-specific information (for a fee)
Stock market information services and sites
Monash University has on-line examination results services
(and enrolments ?)
CSE3180 Summer 2005 Lect 3 / 80
Non-Structured Information
Internet technology is said to be ubiquitous - that is, it can
reach anywhere - if there are suitable devices of course
All current document management systems support some
degree of support for Web technology
Capabilities range
from
basic document viewing using dynamic HTML to make the
document ‘visible’ regardless of its source format
to
advanced document access and distribution
CSE3180 Summer 2005 Lect 3 / 81
Non-Structured Information
For instance a Computer Aided Design drawing (perhaps
the Burnley tunnel?) in a user’s browser without requiring
that the remote users have CAD software installed on their
machines
Or perhaps the Mitcham - Frankston ByPass construction
plans and estimates
Was the construction of the Monash tennis courts on Sir
John Monash Drive in this category ?
CSE3180 Summer 2005 Lect 3 / 82
Non-Structured Information
Some advanced search engines: (form flexibility)
Verity Inc
Information Server
PC Docs/Fulcrum SearchServer
Inktomi Corp
Search Engine
Externalisation - knowledge management
Autonomy Inc
Semio Corp
These use a semantic or lexical analysis engine to extract
meaning from the information repository
CSE3180 Summer 2005 Lect 3 / 83
Non-Structured Information
A traditional query would look like
‘interest rates AND stock prices’ and you would receive
documents which included the 2 search words
A query expressed as ‘I am interested in the effects of
interest rate changes on stock prices’ does not have specific
key words nor phrases.
Such an expression would be have its content meaning
derived from the information base. The result would be a
categorisation of topics contained in the information base.
These would then be further analysed by the user over a
varying number of criteria
CSE3180 Summer 2005 Lect 3 / 84
A few thoughts on ‘Detail’
Retailers :
Retailing is a competitive business. Success depends on
the Retailer knowing the customers.
There are 3 factors at play
1. The changing likes and dislikes of consumers.
New product preferences are the result of an aging
population, changes in family structure, flexible lifestyles
and to no small degree, enterprise bargaining and
working conditions.
A successful retailer must be aware and adapt to these
factors
CSE3180 Summer 2005 Lect 3 / 85
A few thoughts on ‘Detail’
2. The uniqueness of each customer.
Individual needs can only be accommodated by knowing
individual requirements.
‘Loyalty’ programmes are based on this premise
3. The importance of managing inventory levels, controlling
markdowns, maintaining margins.
Static inventory results in greater interest expense. This
acts as a barrier to reinvestment of stock which is moving.
Non-moving stock will (in many cases) require markdowns
to liquidate the stock, which negatively impacts margins
CSE3180 Summer 2005 Lect 3 / 86
A few thoughts on ‘Detail’
This latter point depends on the level of success with the
first 2 factors (changes, uniqueness).
Challenges to retail businesses are:
Knowing who the customer is.
Track and capture purchase history. Identify future
needs by storing data at a meaningful level of detail
As an example, a back to school marketing program
for customers who purchased school materials last
year.
CSE3180 Summer 2005 Lect 3 / 87
A few thoughts on ‘Detail’
Understand how the customer wishes to interact.
How will information be sought about a possible
purchase ? Personal visit, phone, Internet, email ..?
How will the sale be finalised - personal, phone,
Internet, credit card, cash , account entry ?
What is the customer’s preferred method of interaction
?
Perhaps a ‘personal’ profile should be developed
Be able to track and evaluate the strength of the
relationship with each customer
Purchase history plus other contacts - e.g. warranty
service, new or updated products
CSE3180 Summer 2005 Lect 3 / 88
A few thoughts on ‘Detail’
Know what is ‘enough’ detail.
Costs are associated with keeping details. Summary
information may however result in incorrect projections
and decisions.
Historical data (such as for air conditioners as opposed
to toothpaste)
Analysis of cyclical data for air conditioners should
assist in reducing risks of commitments and distribution
of such items
The revenue possibilities should outweigh the expense
of maintaining air conditioner data
The question is - how to decide what data to store
CSE3180 Summer 2005 Lect 3 / 89
A few thoughts on ‘Detail’
Incorporate the knowledge gained of customer needs into
‘business intelligence’
This knowledge should be used to
–
–
–
–
analyse past performance
get insight into current trends
blend this information into the business plan
develop systems which accurately reflect the customers’
needs
– develop systems which interact and support the profit
model
CSE3180 Summer 2005 Lect 3 / 90
Information Retrieval
Consider the following :
When I do count the clock that tells the time,
And see the brave day sunk in hideous night,
When I behold the violet past prime,
And sable curls all silvered o'er with white:
When lofty trees I see barren of leaves,
Which erst from heat did canopy the herd
And summer's green all girded up in sheaves
CSE3180 Summer 2005 Lect 3 / 91
Information Retrieval
Borne on the bier with white and bristly beard:
Then of thy beauty do I question make
That thou amongst the wastes of time must go,
Since sweets and beauties do themselves forsake,
And die as fast as they see others grow,
And nothing against time's scythe can make defence
Save breed to brave him when he takes thee hence
Shakespeare, Sonnet No.12
CSE3180 Summer 2005 Lect 3 / 92
Information Retrieval
Or , consider this:
Now is the winter of our discontent
Made glorious summer by this sun of York
And all the clouds that loured upon our house
In the deep bosom of the ocean buried
Now are our brows bound with victorious wreaths;
Our bruised arms hung up for monuments
Our stern alarums changed to merry greetings
Our dreadful marches to delightful measures
Shakespeare: Soliloquy ? ? ?
CSE3180 Summer 2005 Lect 3 / 93
Information Retrieval
Finally, another extract:
The peasants who survived the plague found themselves
in many cases afflicted by fresh burdens, for with fewer
people to work the land, overlords demanded a standstill in
wages and a return to feudal duties in full.
But, with the shortage of labour, workers naturally expected
to be valued more highly and to be given better pay and
more freedom.
Europe 14/15th Century
A history of the world - Rjunstead ( perhaps Enterprise Bargaining ?)
CSE3180 Summer 2005 Lect 3 / 94
Well, that’s enough for this week
Enjoy your weekend, and I’ll see you next Monday
(which should be the 29th of November,2004)
CSE3180 Summer 2005 Lect 3 / 95