IT for Business
Data for Business
1
Conventional Business Tools
Paper-Based
Letters
Telephone
Fax
Teleconferencing
Etc.
2
Evolution of Data for Business
Paper-based
(Basic Infrastructure)
E-Docs
(Standalone)
Network
- LAN
- WAN
3
Stand-alone Computer
Cashier
Customer Profile
Inventory
Employee Profile
Using computers in business without connectivity
4
Intranet Computer
Local Area Network (LAN)
5
Internet Computer
Wide Area Network (WAN)
[Figure: four LANs connected across an unsecured network, with a "Bad Guy" present]
6
Wireless Network
WLAN (Wireless LAN)
Wi-Fi
• A wireless network lets computer systems
connect to the internet or to other
machines
More convenient but more exposed to the
public
Needs better protection
• Use data encryption
7
Levels of Data Access
Executive
Manager
Employees
Outside Organization
Within Organization
8
Data Sharing
We need to:
• Guarantee each worker access to the
right information, at the right time, from
whatever source
We need to:
• Provide each worker with the
appropriate interfaces to work with this
information and make decisions
9
Scope of Data Sharing
Private (internal use)
• LAN (Intranet)
Public
• WAN (Internet)
10
Why Go Public?
Increase Productivity
• Online transaction
Open business opportunities
• Create partnership
11
Data Management
Centralized System
• Easy to manage
• Can lead to bottleneck problem at peak
times
Distributed System
• Hard to manage
• Provide better performance and
scalability
12
Centralized System
[Figure: Clients 1 to 6 all connect to a single server backed by one database (dB)]
13
Distributed DBMS
Data Partitioning
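One common way to partition data across the nodes of a distributed DBMS is hash partitioning. The sketch below is only an illustration under assumptions: the function names, the `customer_id` field, and the choice of CRC32 as the hash are all hypothetical and are not taken from the slides.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, key_field, num_partitions):
    """Split records into num_partitions lists based on the hash of key_field."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        idx = partition_for(str(rec[key_field]), num_partitions)
        partitions[idx].append(rec)
    return partitions


# Hypothetical customer records, for illustration only
customers = [{"customer_id": f"C{i:03d}"} for i in range(20)]
parts = partition_records(customers, "customer_id", 3)
```

Because the hash is stable, the same key always lands on the same partition, so a lookup only has to contact one node.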
14
Questions of Concern
What can be shared and what cannot be?
Is Data Privacy guaranteed by using IT
systems?
Is our current system sufficiently useful?
What do we really need?
15
Symmetric Cryptography
http://msdn.microsoft.com/en-us/library/aa480570.aspx
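In symmetric cryptography the same secret key is used to encrypt and to decrypt. The sketch below shows the idea with a toy XOR cipher (a one-time pad); it is an assumption chosen for illustration, not the scheme described at the linked page, and real systems would use a vetted cipher such as AES from a maintained library.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy cipher)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"Quarterly sales report"
# Both parties must hold this same key; it must stay secret and never be reused
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # sender encrypts
recovered = xor_bytes(ciphertext, shared_key)  # receiver decrypts with the same key
```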
16
Asymmetric Cryptography
http://msdn.microsoft.com/en-us/library/aa480570.aspx
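In asymmetric cryptography each party has a key pair: anyone may encrypt with the public key, but only the holder of the private key can decrypt. The sketch below uses textbook RSA with tiny primes purely as an illustration; the numbers and function names are assumptions, it omits padding, and it is not secure. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy key generation with tiny primes (real keys use very large primes)
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)


def encrypt(m: int, key) -> int:
    """Encrypt an integer message smaller than the modulus."""
    exponent, modulus = key
    if not 0 <= m < modulus:
        raise ValueError("message must be in the range [0, modulus)")
    return pow(m, exponent, modulus)


def decrypt(c: int, key) -> int:
    """Decrypt with the other key of the pair."""
    exponent, modulus = key
    return pow(c, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)     # anyone can do this
recovered = decrypt(ciphertext, private_key)  # only the key owner can do this
```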
17
Data Restriction
Public
• Information which may or must be open to the general public. It is defined
as information with no existing local, national or international legal
restrictions on access.
• Example: Course Catalog
Sensitive
• Information whose access must be guarded due to proprietary, ethical, or
privacy considerations.
• Example: Date of Birth, Ethnicity
Restricted
• Information protected because of protective statutes, policies or
regulations. This level also represents information that isn't by default
protected by legal statute, but for which the Information Owner has
exercised their right to restrict access.
• Example: Student Academic Record (FERPA)
Purdue University
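The three restriction levels above can be sketched as a simple access check. This is a minimal illustration under assumptions: treating the levels as an ordered scale, the field names, and the `can_read` helper are all hypothetical and do not come from the Purdue policy.

```python
from enum import IntEnum


class Restriction(IntEnum):
    """Restriction levels, ordered from least to most protected (an assumption)."""
    PUBLIC = 0
    SENSITIVE = 1
    RESTRICTED = 2


# Hypothetical mapping of data fields to levels, based on the slide's examples
FIELD_LEVELS = {
    "course_catalog": Restriction.PUBLIC,
    "date_of_birth": Restriction.SENSITIVE,
    "ethnicity": Restriction.SENSITIVE,
    "academic_record": Restriction.RESTRICTED,
}


def can_read(clearance: Restriction, field: str) -> bool:
    """Allow access only if the reader's clearance covers the field's level."""
    # Unknown fields default to the most protected level
    level = FIELD_LEVELS.get(field, Restriction.RESTRICTED)
    return clearance >= level
```

Defaulting unknown fields to the most protected level means a field that was never classified is not exposed by accident.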
18
Data Validation
Data validation is the process of ensuring that a program
operates on clean, correct and useful data.
It uses routines, often called "validation rules" or "check
routines", that check for correctness, meaningfulness,
and security of data that are input to the system.
Data validation checks that data are valid, sensible,
reasonable, and secure before they are processed.
19
Data Validation Methods
Format check
• Checks that the data is in a specified format (template), e.g., dates
have to be in the format DD/MM/YYYY.
Data type checks
• Checks that the input data matches the chosen data type,
e.g., in an input box accepting numeric data, if the letter 'O' were
typed instead of the number zero, an error message would appear.
Range check
• Checks that data lie within a specified range of values, e.g., the
month of a person's date of birth should lie between 1 and 12.
Limit check
• Unlike range checks, data is checked against one limit only, upper OR
lower, e.g., data should not be greater than 2 (≤2).
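The four checks above can be sketched as small functions. This is a minimal illustration, assuming string input for the format and type checks; the function names are hypothetical. Note that the format check only verifies the DD/MM/YYYY shape, not whether the date exists on the calendar.

```python
import re

# Two digits, slash, two digits, slash, four digits
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def format_check(text: str) -> bool:
    """Format check: the text must have the shape DD/MM/YYYY."""
    return DATE_PATTERN.fullmatch(text) is not None


def type_check(text: str) -> bool:
    """Data type check: the text must consist of ASCII digits only."""
    return text.isascii() and text.isdecimal()


def range_check(month: int) -> bool:
    """Range check: a month must lie between 1 and 12 inclusive."""
    return 1 <= month <= 12


def limit_check(value: float, upper: float = 2) -> bool:
    """Limit check: only an upper bound is tested (value must not exceed it)."""
    return value <= upper
```

For example, `type_check("1O")` fails because the second character is the letter 'O' rather than a zero.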
20
Data Validation Methods (cont.)
Presence check
• Checks that important data are actually present and have not
been missed out, e.g., customers may be required to have their
telephone numbers listed.
Spelling and grammar check
• Looks for spelling and grammatical errors.
Consistency Checks
• Checks fields to ensure data in these fields corresponds, e.g., If
Title = "Mr.", then Gender = "M".
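Presence and consistency checks can be sketched the same way. The record layout (a dictionary with `title`, `gender`, and `telephone` keys) and the function names are assumptions made for this illustration.

```python
def presence_check(record: dict, required_fields: list) -> list:
    """Return the required fields that are missing or blank."""
    missing = []
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def consistency_check(record: dict) -> bool:
    """Consistency check: if Title is "Mr.", then Gender must be "M"."""
    if record.get("title") == "Mr.":
        return record.get("gender") == "M"
    return True  # the rule does not apply to other titles


# Hypothetical customer record with a blank telephone number
customer = {"title": "Mr.", "gender": "M", "telephone": " "}
missing_fields = presence_check(customer, ["title", "telephone"])
```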
21
Dirty Data
Dirty data refers to inaccurate information/data, primarily
collected by means of data capture forms.
Dirty data is data that is:
• Misleading
• Incorrect or without standardized formatting
• Affected by spelling or punctuation errors, entered in
the wrong field, or duplicated
22
Causes of Dirty Data
Deliberate distortion of information
• A person could deliberately insert misleading or fictional data,
such as personal or biographical information. Because it
appears real, it may not be picked up by an administrator
or a validation routine
Typographical errors
Formatting issues
• Personal preferences for formatting of the data (such as phone
numbers) could lead to introduction of dirty data
Duplication errors
• Duplicate data may be caused by accidental double submission
on the forms; incorrect data joining; user error(s)
23
Dirty Data Prevention
Dirty data is commonly prevented using input masks or validation
rules.
Completely removing dirty data from a data source is
impossible or impractical in some cases.
24
Data Cleansing
Data cleansing or data scrubbing is the act of detecting
and correcting (or removing) corrupted or inaccurate
records from a record set, table, or database.
It refers to identifying incomplete, incorrect, inaccurate,
or irrelevant parts of the data and then replacing,
modifying or deleting them.
Data cleansing differs from data validation in that:
• validation is performed at entry time and rejects bad data before it enters
the system, whereas cleansing is performed on batches of data already stored.
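A batch cleansing pass can be sketched as follows. This is only an illustration under assumptions: the record layout (`name` and `phone` keys), the rule of reducing phone numbers to digits, and the function names are hypothetical, and real cleansing rules depend on the data source.

```python
def normalize_phone(raw: str) -> str:
    """Keep digits only, so differently formatted numbers compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def cleanse(records: list) -> list:
    """Return cleaned records: normalized, with incomplete and duplicate rows removed."""
    seen = set()
    clean = []
    for rec in records:
        name = rec.get("name", "").strip()
        phone = normalize_phone(rec.get("phone", ""))
        if not name or not phone:
            continue  # incomplete record: remove it
        key = (name.lower(), phone)
        if key in seen:
            continue  # duplicate of an earlier record: remove it
        seen.add(key)
        clean.append({"name": name, "phone": phone})
    return clean


# Made-up batch containing a formatting variant, a duplicate, and a blank name
batch = [
    {"name": "Ann Lee", "phone": "(555) 010-2000"},
    {"name": "ann lee ", "phone": "555-010-2000"},
    {"name": "", "phone": "555"},
]
cleaned = cleanse(batch)
```

Unlike the validation checks, this runs over data that is already stored, which is the distinction the slide draws.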
25
Steps in the Evolution of Data Mining
Data Collection (1960s)
• Business Question: "What was my total revenue in the last five years?"
• Enabling Technologies: Computers, tapes, disks
• Characteristics: Retrospective, static data delivery
Data Access (1980s)
• Business Question: "What were unit sales in New England last March?"
• Enabling Technologies: RDBMS, SQL, ODBC
• Characteristics: Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s)
• Business Question: "What were unit sales in New England last March? Drill down to Boston."
• Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
• Characteristics: Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today)
• Business Question: "What’s likely to happen to Boston unit sales next month? Why?"
• Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
• Characteristics: Prospective, proactive information delivery
26
http://www.thearling.com/text/dmwhite/dmwhite.htm
Data Storage Performance
Life Cycle of Data (dB)
• Active: Fast
• Less Active: Medium
• Historical: Slow
• Archive: Per Request
27
Data for Business
RFID Technology
28
Radio Frequency Identification (RFID)
An automatic method, relying on storing
and remotely retrieving data using
devices called “RFID tags”.
29
Types of RFID
Passive
• Does not have an internal power supply
• Range: 4 cm up to a few meters
RFID backscatter
Active
• Has its own power supply to broadcast a signal to the reader
• Range of hundreds of meters, with a 10-year battery lifetime
Semi-passive
• Has its own power supply for the chip but not for broadcasting a signal
• Greater sensitivity than passive tags, typically 100 times more
30
Example of RFID Tags
RFID in the form of sticker
An RFID tag used for electronic toll collection
31
Implantable RFID Chip
32
Logo of the Anti-RFID Campaign
33