Online Analytical Processing - San Francisco State University

Download Report

Transcript Online Analytical Processing - San Francisco State University

Business Intelligence
Topics
• Import/Export Data
• Chart
• Online Analytical Process, OLAP
– Excel’s Pivot table/Pivot chart
– Data visualization with dashboard
•
•
•
•
Scenario Management
Data warehousing
Data Mining
Big Data
Access Tools for Import/Export
• External Data
– Import
– Export
• Exchange data between:
– Other Access databases
– Excel
– Text file
– XML
– ODBC data sources
Data Exchange with Excel
• Import from Excel:
– The first row of Excel’s list should contain field
names
• Or define a name for the list range.
– External Data/Import/Excel
• Export to Excel:
– External Data/Export/Excel
• Note: Saved imports/exports
Open Database Connectivity (ODBC)
• Provide a standard to retrieve data from a
database.
• It manages one or more "database drivers“
that enables the communication between
database and applications.
• To access a database, we use ODBC facilities to
define a ODBC data source name for the
database.
ODBC Demo
• Define an ODBC data source name:
– Control Panel/Administrative tools/Data
source(ODBC)
• Import from an ODBC data source
– External Data/Import/ODBC data source
Export/Import XML File
• Export:
– External Data/More/XML
– XML Data file
– XML schema file
• Import:
– External Data/Import XML
Export to Word
• Write data to a Word file:
– External Data/More/Word
• Merge data with a Word document
– Select the table or query
– External Data/Word Merge
– Example: Write a letter to students whose GPA is
low and ask them to see their faculty advisor.
Excel’s Tools for Import/Export
• Data/Get External Data
– From Access
– From Web
– From Other Sources
• ODBC
• XML
– Existing Connections
• Saved Queries
• Save Query
• Refresh: Data/Refresh All
– Compare the difference between Copy/Paste and Import
Excel’s Web Query Demo
•
•
•
•
•
•
Data/From Web
Yahoo/Finance/Dow
Save query
Import
Using existing connection
Refresh
Charts
Charting Decision Rules
• An Internet Service Provider charges
customers based on hours used:
– First 10 hours
$15
– Each of the next 20 hours $2 per hour
– Hours over 30 hours
$1 per hour
Comparing Decision Rules
• Plan 2:
– First 20 hours:
– Hours over 20
$20
$1.5
• Plan 3:
– $35 unlimited access.
Charting Functions
• Demand function:
– P = 150 – 6*Q^2
• Supply function:
– P = 10* Q^2 + 2*Q
• Note:
– Positive area
– Value axis maximum/minimum value:
• Format Value Axis
Chart Stock Market Data
• Download Dow Jones Historical Data
– Yahoo/Finance/Dow Jones/Historical Data
• To chart:
– Insert/Chart/Other Charts/Stock chart
Display Chart on Web page
• Google Chart:
– https://developers.google.com/chart/?hl=en
On-Line Analytical Processing (OLAP) Tools
• The use of a set of graphical tools that provides users
with multidimensional views of their data and allows
them to analyze the data using simple windowing
techniques
• OLAP Operations
–
–
–
–
Cube slicing–come up with 2-D view of data
Drill-down–going from summary to more detailed views
Roll-up – the opposite direction of drill-down
Reaggregation – rearrange the order of dimensions
Slicing a data cube
Summary report
Example of drill-down
Starting with summary
data, users can obtain
details for particular
cells
Drill-down with
color added
Excel’s Pivot Table
• Insert/Pivot Table or Pivot Chart
– Drill down, rollup and reaggregation
– Filter
• Pivot Chart
– Filter
– Drilldown, rollup, reaggregation
• Import queries from Access to perform analysis.
– Sales related to: Customer’s location, Rating and
Products
Data Visualization
• Representing data in graphical/multimedia
formats for analysis.
– Web-based “dashboards”
• http://www.dundas.com/
– Dashboard Samples
» (Use FireFox to view)
– Tableau software:
• http://www.tableausoftware.com/
Scenario
• A scenario is an assumption about input variables.
• Excel’s Scenarios is a what-if-analysis tool. A scenario
is a set of values that Microsoft Excel saves and can
substitute automatically in your worksheet.
• You can use scenarios to forecast the outcome of a
worksheet model. You can create and save different
groups of values on a worksheet and then switch to
any of these new scenarios to view different results.
• Data/What If analysis/Scenario
Creating a Scenario
– Add scenario
• Changing cells
– Scenario Summary
• Resulting cells
• Demo: benefit782.xls
Data Warehouse
• Data warehouse is a repository of an
organization's electronically stored data.
• A data warehouse houses a standardized,
consistent, clean and integrated form of data
that:
– sourced from various operational systems in use
in the organization,
– structured in a way to specifically address the
reporting and analytic requirements.
Example:
Transaction Database
CID
Cname
City
OID
ODate
Rating
SalesPerson
Customer
1
Has
M
Order
M
Qty
Has
M
Product
Price
PID
Pname
Analyze Sales Data
Detailed Business Data
• Total sales:
– by product:
• Qty*Price of each detail line
• Sum (Qty*Price)
• Detailed business data: qty*price
• Total quantity sold:
– By product:
• Sum(Qty)
• Detailed business data: Qty
Dimensions for Data Analysis:
Factors relevant to the business data
• Analyze sales by Product
• Analyze sales related to Customer:
– Location: Sales by City
– Customer type: Sales by Rating
• Analyze sales related to Time:
– Quarterly, monthly, yearly Sales
• Analyze sales related to Employee:
– Sales by SalesPerson
Data Warehouse Design
- Star Schema • Dimension tables
– contain descriptions about the subjects of the
business such as customers, employees, locations,
products, time periods, etc.
• Fact table
– contain detailed business data with links to
dimension tables.
Star Schema
Location
Dimension
LocationCode
State
City
Can group by State, City
FactTable
LocationCode
PeriodCode
Rating
PID
Qty
Amount
Product
Dimension
PID
Pname
Category
CustomerRating
Dimension
Rating
Description
Period
Dimension
PeriodCode
Year
Quarter
Define Location Dimension
• Location:
– In the transaction database: City
– In the data warehouse we define Location to be
State, City
• San Francisco -> California, San Francisco
• Los Angeles -> California, Los Angeles
– Define Location Code:
• California, San Francisco -> L1
• California, Los Angeles -> L2
Define Period Dimension
• Period:
– In the transaction database: Odate
– In the data warehouse we define Period to be:
Year, Quarter
• Odate: 11/2/2003 -> 2003, 4
• Odate: 2/28/2003 -> 2003, 1
– Define Period Code:
• 2003, 4 -> 20034
• 2003, 1 -> 20031
The ETL Process
L
T
One,
companywide
warehouse
E
Periodic extraction  data is not completely current in warehouse
The ETL Process
• Capture/Extract
• Transform
– Scrub(data cleansing),derive
– Example:
• City -> LocationCode, State, City
• OrderDate -> PeriodCode, Year, Quarter
• Load and Index
ETL = Extract, transform, and load
Performing Analysis
• Analyze sales:
– by Location
– By Location and Customer Type
– By Location and Period
– By Period and Product
• Pivot Table:
– Drill down, roll up, reaggregation
Data Mining
• Knowledge discovery using a blend of
statistical, Artificial Intelligence, and
computer graphics techniques
• Goals:
– Explain observed events or conditions
– Explore data for new or unexpected relationships
Typical Data Mining Techniques
•
•
•
•
Statistical regression
Decision tree induction
Clustering – discover subgroups
Affinity – discover things with strong mutual
relationships
• Sequence association – discover cycles of evens and
behaviors
• Rule discovery – search for patterns and correlations
Typical Data Mining Applications
• Profiling populations
– High-value customers, credit risks, credit card fraud
•
•
•
•
Analysis of business trends
Target marketing
Campaign effectiveness
Product affinity
– Identifying products that are purchased concurrently
• Up-selling
– Identifying new products and services to sell to a customer
based on critical events
Affinity Analysis:
Market Basket Analysis
• Market Basket Analysis is a modeling technique
based upon the theory that if you buy a certain
group of items, you are more (or less) likely to buy
another group of items.
• The set of items a customer buys is referred to as an
itemset, and market basket analysis seeks to find
relationships between purchases.
• Typically the relationship will be in the form of a rule:
Example:
– IF {beer, no bar meal} THEN {chips}.
Basket Analysis and Cross- Selling
• For instance, customers are very likely to purchase
shampoo and conditioner together, so a retailer
would not put both items on promotion at the same
time. The promotion of one would likely drive sales
of the other.
• A widely used example of cross selling on the
internet with market basket analysis is
Amazon.com's use of suggestions of the type:
– "Customers who bought book A also bought book B", e.g.
Big Data
• Data Size:
– Gigabyte
– Terabyte: Terabyte USB
– Petabyte: Wal-Mart handles more than 1m
customer transactions every hour at more than 2.5
petabytes
– Exabyte: the amount of traffic flowing over the
internet will reach 667 exabytes annually by 2013
•
Big Data Challenges
• Big Data are high-volume, high-velocity,
and/or high-variety information assets that
require new forms of processing to enable
enhanced decision making, insight discovery
and process optimization.
• “3Vs":
– Volume: Size
– Velocity: Processing speed
– Variety:
• Structured: able to fit in a database table
• unstructured data
Applications
• Text mining: deriving high-quality information
from text.
– text categorization, text clustering, concept/entity
extraction, sentiment analysis, etc.
• Web mining:
– Web usage mining
– Web content mining
• Social media mining
– Salesforce Radian6 Social Marketing Cloud
• http://www.youtube.com/watch?v=EH1dcFh_-I4
NoSQL Database
• NotOnlySQL is a broad class of database
management systems identified by nonadherence to the widely used relational
database management system model.
• They are useful when working with a huge
quantity of data when the data's nature does
not require a relational model.