Clardent Analytics NexJ Product Plan 2014
DMBOK2 DWBI
New Content
Martin Sykora
2015 April
IRMAC Toronto
Agenda
Introduction
In the beginning
Reports
Data Warehouse – 2 schools
BI Tools v1
DMBOK v1
DMBOK v2 New Content
Data Vault
In a Minute
Drama: Exa Forklift
Yellow Elephants
Visualizing Content
Data Sciences
Virtualization
DMBOK v2 Conceptual Architecture
2
7/16/2015
Introduction
Martin Sykora
• 25 years in data management
• Oracle, SAP, BusinessObjects
• Current Director of Analytics at NexJ Systems
• DMBOK2 DWBI and Big Data Sciences
• Queen's Master of Analytics 2016
In the beginning
Google – started ten years later
Apple – first PC with GUI+mouse
Internet – without www
Reports
Simple process, complex report development
Sources read, integrated, translated and aggregated
Aggregate results stored in a table
Table contents were read, subtotalled and sent to the report file or printer
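The batch report process above can be sketched in a few lines of Python. This is a minimal illustration, not the original tooling: the two source extracts, the region translation table and the `sales_summary` table are hypothetical, and an in-memory SQLite database stands in for the aggregate table.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source extracts from two systems that use different region codes.
SOURCE_A = [("N", "widgets", 120.0), ("S", "widgets", 80.0), ("N", "gadgets", 45.0)]
SOURCE_B = [("NORTH", "gadgets", 30.0), ("SOUTH", "gadgets", 55.0)]

# Hypothetical translation table used to integrate the two code sets.
REGION_CODES = {"N": "North", "S": "South", "NORTH": "North", "SOUTH": "South"}


def build_aggregate_table(conn: sqlite3.Connection) -> None:
    """Read, integrate, translate and aggregate the sources into one table."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for region, product, amount in SOURCE_A + SOURCE_B:
        totals[(REGION_CODES[region], product)] += amount
    conn.execute("CREATE TABLE sales_summary (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales_summary VALUES (?, ?, ?)",
        [(region, product, amount) for (region, product), amount in totals.items()],
    )


def write_report(conn: sqlite3.Connection) -> list[str]:
    """Read the table back, subtotal by region and return the report lines."""
    lines: list[str] = []
    current_region = None
    subtotal = 0.0
    rows = conn.execute(
        "SELECT region, product, amount FROM sales_summary ORDER BY region, product"
    )
    for region, product, amount in rows:
        # Control break: emit the previous region's subtotal when the region changes.
        if current_region is not None and region != current_region:
            lines.append(f"  Subtotal {current_region}: {subtotal:.2f}")
            subtotal = 0.0
        current_region = region
        subtotal += amount
        lines.append(f"{region:<6} {product:<8} {amount:>8.2f}")
    if current_region is not None:
        lines.append(f"  Subtotal {current_region}: {subtotal:.2f}")
    return lines


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_aggregate_table(conn)
    print("\n".join(write_report(conn)))  # "printer" = standard output here
```

The "complex" part in practice was the integration and translation logic, which here is reduced to a single lookup dictionary.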
Data Warehouse – 2 schools
[Diagram: Inmon architecture (application extracts, integration & transformation, enterprise data warehouse with enterprise data model and historical reference data, ODS, operational and analytical data marts, operational reports per app and integrated) beside Kimball architecture (operational source systems; data staging area with clean, combine, standardize and conform-dimension services, flat-file, relational and XML data stores, sorting and sequencing, no queries; data marts #1 to #N on the DW bus of conformed dimensions; data access tools for ad-hoc queries, report writers, analytic applications and forecasting, scoring and data mining models)]
Inmon
• Corporate Information Factory
• Normalized tables
Kimball
• Marts that satisfy business processes
• Dimensional data model
• Conformed dimensions & facts
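To make the Kimball terms concrete, here is a minimal star schema sketch using Python's built-in `sqlite3`. The table names (`dim_date`, `dim_customer`, `fact_sales`, `fact_churn`) and all rows are hypothetical. It shows why conformed dimensions matter: two fact tables for two business processes share the same customer dimension, so their results can be combined ("drilled across") by segment. Each fact table is aggregated on its own before merging, which avoids double-counting.

```python
import sqlite3

SCHEMA_AND_DATA = """
-- Conformed dimensions: shared by every fact table below.
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);

-- One fact table per business process (sales and churn).
CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date,
                         customer_key INTEGER REFERENCES dim_customer,
                         amount REAL);
CREATE TABLE fact_churn (date_key INTEGER REFERENCES dim_date,
                         customer_key INTEGER REFERENCES dim_customer,
                         churned INTEGER);

INSERT INTO dim_date VALUES (1, 2015, 3), (2, 2015, 4);
INSERT INTO dim_customer VALUES (10, 'Retail'), (11, 'Wealth'), (12, 'Retail');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 50.0), (2, 11, 250.0);
INSERT INTO fact_churn VALUES (2, 10, 1), (2, 11, 0), (2, 12, 1);
"""


def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA_AND_DATA)


def drill_across(conn: sqlite3.Connection) -> dict[str, tuple[float, int]]:
    """Aggregate each fact table by the conformed segment attribute, then merge."""
    sales = dict(conn.execute(
        "SELECT c.segment, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_customer c ON c.customer_key = f.customer_key GROUP BY c.segment"))
    churn = dict(conn.execute(
        "SELECT c.segment, SUM(f.churned) FROM fact_churn f "
        "JOIN dim_customer c ON c.customer_key = f.customer_key GROUP BY c.segment"))
    return {seg: (sales.get(seg, 0.0), churn.get(seg, 0))
            for seg in sorted(sales.keys() | churn.keys())}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(drill_across(conn))  # segment -> (sales amount, churned customers)
```

An Inmon-style design would instead keep these entities in normalized enterprise tables and derive marts like this one downstream.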
BI Tools V1
Focus on tool alignment and usage complexity
DMBOKv1
Data Warehouse loading process (Kimball or Inmon)
Operational Reporting & Analysis (OLAP)
Performance Management (Dashboarding & Scorecarding)
Internet Traffic
What Happens in an Internet Minute?
• 5 exabytes of data transferred monthly
• An exabyte is a unit of information equal to one quintillion (10^18) bytes
• 5,000,000,000,000,000,000 bytes, or 5 × 10^18
• In non-math speak:
• A Canadian dime is 1.22 mm thick
• A stack of 5 × 10^18 dimes would reach from the Earth to the Sun 40,775 times
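The dime arithmetic can be checked directly. This sketch only reuses the figures from the slide and its reference list (1.22 mm per Canadian dime, 149,600,000 km from the Earth to the Sun).

```python
EXABYTE = 10**18             # one quintillion
DIME_THICKNESS_MM = 1.22     # Canadian dime thickness
EARTH_SUN_KM = 149_600_000   # Earth-Sun distance used on the slide


def stack_trips(exabytes: int) -> float:
    """How many times a stack of (exabytes * 10^18) dimes spans the Earth-Sun distance."""
    dimes = exabytes * EXABYTE
    stack_km = dimes * DIME_THICKNESS_MM / 1_000_000  # mm -> km
    return stack_km / EARTH_SUN_KM


if __name__ == "__main__":
    print(f"{stack_trips(5):,.0f}")  # about 40,775 trips
```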
Drama: Forklift Load
This slide is only for dramatic effect – no data was harmed during the assembly or provisioning process
Hadoop – The Yellow Elephant
Scalable, durable, commodity hardware
Example: branch customer churn e-mail query
Query in place
• Hadoop can read many different file types without transformation
• No need to transport the files to a processing or database server
• Hadoop requests are processed with MapReduce jobs
Apache Hive QL interface
• Invokes a SQL-like script
• Creates MapReduce jobs
• Returns compiled results
MapReduce
• Sends the algorithm to the data
• Applied on the best available node-file pair
• Results then compiled
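The MapReduce pattern behind the branch churn e-mail query can be simulated in plain Python. This only illustrates the idea on one machine: the node files, e-mail text and the keyword rule for spotting churn are all hypothetical, and a real Hadoop job would run the map step on the nodes that hold each file.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical e-mail files, one list per data node: (branch, customer_id, text).
NODE_FILES = [
    [("Toronto", 1, "Please close my account"), ("Ottawa", 2, "Update my address")],
    [("Toronto", 3, "I am moving my investments; close the account"),
     ("Ottawa", 4, "Close my account")],
]

# Hypothetical keyword rule standing in for a real churn classifier.
CHURN_TERMS = ("close", "moving my investments")

Record = tuple[str, int, str]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, int]]:
    """Runs where the file lives: emit (branch, 1) for each churn-like e-mail."""
    for branch, _customer_id, text in records:
        if any(term in text.lower() for term in CHURN_TERMS):
            yield branch, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Compile the grouped results into one count per branch."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(node_files: list[list[Record]]) -> dict[str, int]:
    # Each node maps its own file; only the small (key, value) pairs move.
    intermediate = [pair for records in node_files for pair in map_phase(records)]
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(run_job(NODE_FILES))
```

A Hive QL query over the same files would generate equivalent map and reduce steps without the user writing them by hand.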
Viewing Hadoop Content
Naturally, we want to view the data
• It is very difficult to infer any relationships
• Graphing requires understanding of the data elements
• But we don't really know what's there
How can we pick a chart without comprehending the data elements?
Tree Map – Churn: Age by Wealth
We can immediately see that customers aged 40 or below are leaving
But why is this happening? Is it a regional issue?
Geographic – Churn by Wealth
Churned customers are scattered across multiple regions
The churn appears age-related; could it be the product/service offering?
Box Plot – Churn: Product by Age
Investment products are the most affected by churn
Actionable insights from data visualizations
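The drill-down across these charts amounts to computing churn rates over different groupings of the same records. A minimal sketch, assuming hypothetical customer records with age, wealth band, product and a churn flag (region is left out for brevity):

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, NamedTuple


class Customer(NamedTuple):
    age: int
    wealth: str      # hypothetical wealth band
    product: str
    churned: bool


# Hypothetical sample records, for illustration only.
CUSTOMERS = [
    Customer(34, "High", "Investment", True),
    Customer(29, "Mid", "Investment", True),
    Customer(38, "Low", "Chequing", False),
    Customer(45, "High", "Investment", False),
    Customer(52, "Mid", "Mortgage", False),
    Customer(61, "Low", "Chequing", True),
]


def age_band(age: int) -> str:
    return "40 or below" if age <= 40 else "over 40"


def churn_rate_by(customers: Iterable[Customer],
                  key: Callable[[Customer], Hashable]) -> dict:
    """Share of churned customers within each group produced by key()."""
    churned: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for c in customers:
        k = key(c)
        total[k] += 1
        churned[k] += c.churned  # bool counts as 0 or 1
    return {k: churned[k] / total[k] for k in total}


if __name__ == "__main__":
    # Tree map view: age by wealth
    print(churn_rate_by(CUSTOMERS, lambda c: (age_band(c.age), c.wealth)))
    # Box plot view: product by age
    print(churn_rate_by(CUSTOMERS, lambda c: (c.product, age_band(c.age))))
```

Each chart on the slides corresponds to a different `key` function over the same data; the visualization is what makes the pattern visible at a glance.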
Data Sciences
Visualization is a manual process; data science (DS) applies mathematical methods to analyze, process and manage big data results
It decomposes into the following areas:
• Data mining (unsupervised learning): profiling, data reduction, association, clustering
• Predictive analytics (supervised learning): classification, decision trees, CHAID, regression
• Advanced supervised learning: ensembles, neural networks, support vector machines
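As a rough illustration of the unsupervised/supervised split, the sketch below implements two of the listed techniques in plain Python: one-feature k-means clustering (no labels; the algorithm finds the groups itself) and ordinary least-squares regression (learns from known outcomes). The ages and x/y values are hypothetical.

```python
def kmeans_1d(values: list[float], k: int = 2, iterations: int = 20) -> list[float]:
    """Lloyd's algorithm on a single feature; returns the sorted centroids."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(kmeans_1d([25, 28, 31, 33, 58, 61, 64, 67]))  # two age groups
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))          # slope and intercept
```

Each of these functions is a tiny "model" in the sense used on this slide: its inputs and outputs differ, which is part of why managing many of them becomes an integration problem.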
Data scientists typically develop, train and manage many algorithmic programs referred to as models
The inputs (variables/parameters) vary from model to model, and the computed outcomes have many business consumers
How do we bring these pieces together?
Virtualization
Virtualization refers to technologies designed to provide a layer of abstraction between computer hardware systems and the software running on them
Common virtualization technologies:
• Server virtualization – a single physical server supplies multiple user environments; ideal for resource optimization
• Database virtualization – multiple copies of a single database image; ideal for testing activities
• Data virtualization – integration of any data from disparate data sources into coherent data services
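Data virtualization can be sketched as a view object that integrates sources at query time instead of copying them into a new store. In this minimal sketch the CRM table, the portfolio lookup and the `VirtualCustomerView` class are all hypothetical; data virtualization products typically add query optimization, caching and security on top of this basic idea.

```python
import sqlite3
from typing import Callable, Iterator


class VirtualCustomerView:
    """Hypothetical virtual view over two disparate sources.

    Nothing is materialized: each call to rows() reads both sources
    at request time and integrates them on the fly.
    """

    def __init__(self, crm: sqlite3.Connection, balance_of: Callable[[int], float]):
        self.crm = crm                # source 1: relational CRM table
        self.balance_of = balance_of  # source 2: e.g. a portfolio service lookup

    def rows(self) -> Iterator[tuple[int, str, float]]:
        query = "SELECT id, name FROM customer ORDER BY id"
        for customer_id, name in self.crm.execute(query):
            yield customer_id, name, self.balance_of(customer_id)


def demo() -> list[tuple[int, str, float]]:
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
    portfolio = {1: 1500.0}  # stand-in for a remote portfolio system
    view = VirtualCustomerView(crm, lambda cid: portfolio.get(cid, 0.0))
    return list(view.rows())


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the consumer only sees `rows()`, a source can later be swapped or physically materialized without changing the consumer.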
Data Virtualization
BIW Release Virtualization
Use virtualization to
• Foster agile delivery
• Prove concepts with the business
• Materialize only the necessary components
DMBOKv2 Conceptual Architecture
Thank You
DMBOK2
http://www.dama.org/content/body-knowledge
IRMAC Data Management Education
http://www.irmac.ca/
NEXJ Customer Data Management
http://www.nexj.com/products/financial-services/enterprise-customer-view/
Questions?
LinkedIn: ca.linkedin.com/in/martingsykora
Reference
Wikipedia, Dime_(Canadian_coin) (http://en.wikipedia.org/): value 0.10 CAD; mass 1.75 g; diameter 18.03 mm; thickness 1.22 mm
Distance from the Earth to the Sun: 149,600,000 km
http://www.masswerk.at/googleBBS/
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html
Annual global IP traffic will surpass the zettabyte (1000 exabytes) threshold in 2016. Global IP traffic will reach 1.1 zettabytes per year, or 91.3 exabytes (one billion gigabytes) per month, in 2016. By 2018, global IP traffic will reach 1.6 zettabytes per year, or 131.6 exabytes per month.