Business System Analysis & Decision Making

ISQS 3358, Business Intelligence
Supplemental Notes on the Term Project
Zhangxi Lin
Texas Tech University
Projects




Students will build a Hadoop system and explore and visualize a Hadoop-based data warehouse.
Students are divided into three cohorts. Cohort 1 uses Pentaho for data analysis, Cohort 2 uses Tableau for data analysis, and Cohort 3 works on a self-selected business intelligence topic.
Each cohort may host 2-4 teams with no more than 12 students in total, and each team is composed of 2-4 members.
Deliverables include a 15-minute team presentation and a term report of 6-10 pages.
Project contents




Each team will identify a big data topic and find the data it needs. The dataset does not have to be large, but it must be representative.
A data warehouse built on either SQL Server or Hadoop is acceptable.
Data analysis and visualization must be done well.
The report/presentation will cover the following points:





Business background
Data description
Data model design
ETL
Analytical results
HADOOP/SPARK Topics

No. 1: Data warehousing
Focus: Hadoop data warehouse design
Components: HDFS, HBase, Hive, NoSQL/NewSQL, Solr

No. 2: Publicly available big data services
Focus: Tools and free resources
Components: Hortonworks, Cloudera, HaaS, EC2

No. 3: MapReduce & data mining
Focus: Efficiency of distributed data/text mining
Components: Mahout, H2O, MLlib, R, Python

No. 4: Big data ETL
Focus: Heterogeneous data processing across platforms
Components: Kettle, Flume, Sqoop, Impala

No. 5: System management
Focus: Load balancing and system efficiency
Components: Oozie, ZooKeeper, Ambari, Loom, Ganglia, Mesos

No. 6: Application development platform
Focus: Algorithms and innovative development environments
Components: Tomcat, Neo4j, Titan, GraphX, Pig, Hue

No. 7: Tools & visualizations
Focus: Features for big data visualization and data utilization
Components: Pentaho, Tableau, Qlik, Saiku, Mondrian, Gephi

No. 8: Streaming data processing
Focus: Efficiency and effectiveness of real-time data processing
Components: Spark, Storm, Kafka, Avro
Data Warehousing Methodology
- Implementing a data warehouse systematically
Data Warehouse Development Methods
Data warehouse development approaches

Kimball Model: Data mart approach
Data marts -> EDW
Inmon Model: EDW approach
EDW -> Data marts
Which model is better?
There is no one-size-fits-all strategy for data warehousing
One alternative is the hosted warehouse
Comparison

Kimball Model




Kimball’s model follows a bottom-up approach. The data warehouse (DW) is provisioned from data marts (DM) as and when they become available or are required.
The data marts are sourced from OLTP systems, which are usually relational databases in third normal form (3NF).
The data warehouse, which is central to the model, is a de-normalized star schema. The OLAP cubes are built on this DW.
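To make the de-normalized star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), the columns, and the sample rows are all hypothetical and are not taken from the course material; the final query only imitates the kind of aggregation an OLAP cube would serve.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
    """)
    # Illustrative sample data only.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240420, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"), (2, "Antivirus", "Software")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20240115, 1, 2, 2000.0),
                      (20240115, 2, 5, 250.0),
                      (20240420, 1, 1, 1000.0)])

def sales_by_category_and_quarter(conn):
    """Aggregate the fact table along two dimensions, as a cube slice would."""
    return conn.execute("""
        SELECT p.category, d.quarter, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_quarter(conn)
```

Each dimension is one join away from the fact table, which is what keeps analytical queries of this shape simple.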
Inmon Model



Inmon’s model follows a top-down approach. The data warehouse (DW) is sourced from OLTP systems and is the central repository of data.
The data warehouse in Inmon’s model is in third normal form (3NF).
The data marts (DM) are provisioned out of the data warehouse as and when required.
Data marts in Inmon’s model are in 3NF, and the OLAP cubes are built from them.
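The top-down flow described above can be sketched in the same way: a central warehouse kept in 3NF, with a data mart provisioned out of it on request. This is only an illustration using sqlite3; the tables (region, customer, orders, mart_orders), the function name provision_orders_mart, and the data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Central warehouse in 3NF: region names live in exactly one table.
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO region VALUES (1, 'West'), (2, 'East');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
    INSERT INTO orders VALUES (100, 10, 500.0), (101, 10, 300.0), (102, 11, 700.0);
""")

def provision_orders_mart(conn, region_name):
    """Build a (hypothetical) regional orders mart from the warehouse tables."""
    conn.execute("DROP TABLE IF EXISTS mart_orders")
    conn.execute("""
        CREATE TABLE mart_orders (
            order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
        )
    """)
    # The mart is a subject-area subset selected out of the central repository.
    conn.execute("""
        INSERT INTO mart_orders
        SELECT o.order_id, o.customer_id, o.amount
        FROM orders o
        JOIN customer c ON c.customer_id = o.customer_id
        JOIN region r ON r.region_id = c.region_id
        WHERE r.region_name = ?
    """, (region_name,))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM mart_orders").fetchone()

west_summary = provision_orders_mart(conn, "West")
```

The warehouse tables stay untouched; the mart can be dropped and rebuilt whenever a department needs it.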
Strengths and Weaknesses

Scalable vs. structural




Kimball’s model is more scalable because of its bottom-up approach: you can start small and scale up over time. The ROI usually comes faster with Kimball’s model.
Because of this approach, however, it is difficult to create reusable structures and ETL for different data marts.
Inmon’s model, on the other hand, is more structured and easier to maintain, but it is rigid and takes more time to build. A significant advantage of Inmon’s model is that, because the DW is in 3NF, it is easier to build data mining models.
Both the Kimball and Inmon models agree on, and emphasize, that the DW is the central repository of data and that OLAP cubes are built from de-normalized star schemas.
In conclusion, when it comes to data modeling, it is irrelevant which camp you belong to as long as you understand why you are adopting a specific model. Sometimes it makes sense to take a hybrid approach.
General Data Warehouse Development Approaches

“Big bang” approach
Incremental approach:
  - Top-down incremental approach
  - Bottom-up incremental approach
ISQS 6339, Data Mgmt & BI, Zhangxi Lin
“Big Bang” Approach
Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts
Incremental Approach to Warehouse Development

Multiple iterations
Shorter implementations
Validation of each phase
[Figure: Increment 1, with phases Strategy, Definition, Analysis, Design, Build, and Production; labeled "Iterative"]
Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework
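As a rough illustration of the step "establish metadata, extraction, and load processes for the initial subject area", the sketch below extracts rows from a small CSV source, cleans them, loads them into a table, and records a metadata row about the load. Everything here is assumed for the example: the inline CSV, the sales and etl_metadata tables, and the rule that rows with a missing amount are skipped.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical source extract for a "sales" subject area.
SOURCE_CSV = """order_id,customer,amount
1,acme,100.50
2,globex,
3,acme,20
"""

def extract(text):
    """Read the source into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with a missing amount and normalize types and casing."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed cleaning rule: skip incomplete rows
        clean.append((int(row["order_id"]), row["customer"].title(), float(row["amount"])))
    return clean

def load(conn, subject_area, rows):
    """Load the cleaned rows and record when and how many rows were loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS etl_metadata "
                 "(subject_area TEXT, loaded_at TEXT, row_count INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("INSERT INTO etl_metadata VALUES (?, ?, ?)",
                 (subject_area, datetime.now(timezone.utc).isoformat(), len(rows)))
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, "sales", transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY order_id").fetchall()
meta_count = conn.execute("SELECT row_count FROM etl_metadata").fetchone()[0]
```

In a real project the same three stages would typically be handled by an ETL tool such as Kettle; the sketch only shows how the stages and the metadata record relate.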
Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by the increment
Create and populate the initial subject areas within the overall warehouse framework
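One way a team might make the choice of the initial increment explicit is a simple weighted score over the three criteria named above. The sketch below is purely illustrative: the 1-5 scales, the weights, the candidate subject areas, and the assumption that a smaller data volume makes a better first increment are all hypothetical and are not prescribed by the slides.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    political_pressure: int  # 1 (low) to 5 (high), assumed scale
    business_benefit: int    # 1 (low) to 5 (high), assumed scale
    data_volume: int         # 1 (small) to 5 (large), assumed scale

def score(candidate, weights=(0.3, 0.5, 0.2)):
    """Weighted score; the weights are arbitrary example values."""
    w_pressure, w_benefit, w_volume = weights
    # Assumption: smaller volume is easier to deliver first, so invert the scale.
    return (w_pressure * candidate.political_pressure
            + w_benefit * candidate.business_benefit
            + w_volume * (6 - candidate.data_volume))

candidates = [
    Candidate("Finance", political_pressure=5, business_benefit=3, data_volume=4),
    Candidate("Sales", political_pressure=3, business_benefit=5, data_volume=2),
    Candidate("HR", political_pressure=2, business_benefit=2, data_volume=1),
]

ranked = sorted(candidates, key=score, reverse=True)
first_increment = ranked[0].name
```

Changing the weights changes the ranking, which is the point: the scoring only documents the trade-off the team has already agreed on.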