Clardent Analytics NexJ Product Plan 2014

DMBOK2 DWBI
New Content
Martin Sykora
2015 April
IRMAC Toronto
Agenda
 Introduction
 In the beginning
 Reports
 Data Warehouse – 2 schools
 BI Tools v1
 DMBOK v1
 DMBOK v2 New Content
 Data Vault
 In a Minute
 Drama: Exa Forklift
 Yellow Elephants
 Visualizing Content
 Data Sciences
 Virtualization
 DMBOK v2 Conceptual Architecture
7/16/2015
Introduction
Martin Sykora
• 25 years in data management
• Oracle, SAP, BusinessObjects
• Currently Director of Analytics at NexJ Systems
• DMBOK2 DWBI and Big Data Sciences
• Queens Masters of Analytics 2016

In the beginning
 Google – started ten years later
 Apple – first PC with GUI+mouse
 Internet – without www

Reports
 Simple process, complex report development
 Sources read, integrated, translated and aggregated
 Aggregate results stored in a table
 Table contents were read, subtotalled and sent to the report file or printer
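The batch reporting flow on this slide can be sketched in a few lines of Python. This is only an illustration: the two source extracts (`branch_a`, `branch_b`), their field names and their values are hypothetical, and the reports of that era were not written in Python.

```python
from collections import defaultdict

# Hypothetical source extracts; field names and values are illustrative only.
branch_a = [{"region": "East", "amt": "100.50"}, {"region": "West", "amt": "20.00"}]
branch_b = [{"REGION": "east", "AMOUNT": 30.0}]

def integrate(a_rows, b_rows):
    """Read both sources and translate them to one common layout."""
    for r in a_rows:
        yield {"region": r["region"].title(), "amount": float(r["amt"])}
    for r in b_rows:
        yield {"region": r["REGION"].title(), "amount": float(r["AMOUNT"])}

def aggregate(rows):
    """Aggregate amounts by region; the result plays the role of the stored table."""
    table = defaultdict(float)
    for r in rows:
        table[r["region"]] += r["amount"]
    return dict(table)

def render_report(table):
    """Read the aggregate table, add a grand total and format the report lines."""
    lines = [f"{region:<10}{amount:>10.2f}" for region, amount in sorted(table.items())]
    lines.append(f"{'TOTAL':<10}{sum(table.values()):>10.2f}")
    return "\n".join(lines)

aggregate_table = aggregate(integrate(branch_a, branch_b))
report = render_report(aggregate_table)
print(report)
```

The process has only three steps, but every new report needed its own integration and translation logic, which is where the development effort went.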
Data Warehouse – 2 schools
[Figure: the two data warehouse architectures, Inmon and Kimball, shown side by side. Labels recovered from the diagram follow.]
[Inmon side: Applications (App), Extract, Integration & Transformation, ODS, DW (Enterprise Data Model, Raw Detailed Data), Reference Data, Historical Reference Data, Data Marts (DM), Op DM, Operational Analysis, Exploratory Analysis, Operational Reports (per App), Operational Reports (integrated).]
[Kimball side: Operational Source Systems; Data Staging Area (services: clean, combine, standardize, conform dimensions; data store: flat files, relational tables, XML datasets; processing: sorting, sequencing; no queries); Data Presentation Area (Data Mart #1, Data Mart #2, DW Bus, Conformed Dimensions); Data Access Tools (ad-hoc queries, report writers, analytic applications; models: forecasting, scoring, data mining).]
 Inmon
• Corporate Information Factory
• Normalized tables
 Kimball
• Marts that satisfy a business process
• Dimensional Data Model
• Conformed Dimensions & Facts
BI Tools V1
 Focus on tool alignment and usage complexity
DMBOKv1
 Data Warehouse loading process (Kimball or Inmon)
 Operational Reporting & Analysis (OLAP)
 Performance Management (Dashboarding & Scorecarding)
Internet Traffic
What Happens in an Internet Minute?
• 5 Exabytes of data transferred monthly
• An Exabyte is a unit of information equal to one quintillion (10^18) bytes
• 5,000,000,000,000,000,000 bytes or 5 × 10^18
• In non-math speak
• Dime is 1.22 mm thick
• 5 Exa of dimes stacked would reach from the Earth to the Sun 40,775 times
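The dime comparison can be checked with a short calculation. The sketch below uses only figures quoted in this deck (dime thickness of 1.22 mm, Earth to Sun distance of 149,600,000 km) and assumes one dime per byte; the constant names are made up for the example.

```python
# Figures quoted on the slide and in its reference list.
BYTES_IN_5_EXABYTES = 5 * 10**18   # assumption: one dime per byte
DIME_THICKNESS_MM = 1.22           # Canadian dime
EARTH_SUN_KM = 149_600_000
MM_PER_KM = 1_000_000

# Height of the stack of dimes, converted from millimetres to kilometres.
stack_km = BYTES_IN_5_EXABYTES * DIME_THICKNESS_MM / MM_PER_KM

# Number of Earth-to-Sun distances covered by the stack.
trips = stack_km / EARTH_SUN_KM

print(f"Stack height: {stack_km:.3e} km")
print(f"Earth-to-Sun distances: {trips:,.0f}")
```

With these inputs the stack is about 6.1 × 10^12 km tall, and the ratio rounds to the 40,775 quoted on the slide.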
Drama: Forklift Load
This slide is only for dramatic effect – no data was harmed during the assembly or provisioning process
Hadoop - The Yellow Elephant
Scalable, Durable, Commodity Hardware
 Example
• Branch customer churn e-mail query
 Query in place
• Hadoop can read many different file types without transformation
• No need to transport the files to a processing or database server
• Hadoop requests processed with MapReduce jobs
 Apache Hive QL Interface
• Invoke a SQL-like script
• Creates MapReduce jobs
• Compiled results returned
 MapReduce
• Sends the algorithm to the data
• Applied on best available node-file pair
• Results then compiled
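The map, group and compile flow described on this slide can be imitated in a single Python process. This is a teaching sketch, not the Hadoop API: the per-node file splits, the branch names and the record layout (branch, churned flag) are all hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical file splits, one list per node; records are (branch, churned flag).
node_files = [
    [("Toronto", True), ("Ottawa", False), ("Toronto", True)],
    [("Ottawa", True), ("Toronto", False)],
]

def map_phase(records):
    """Runs where the data lives: emit (branch, 1) for each churned customer."""
    return [(branch, 1) for branch, churned in records if churned]

def shuffle(mapped_outputs):
    """Group the intermediate values by key across all nodes."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Compile the grouped values into one result per key."""
    return {key: sum(values) for key, values in groups.items()}

churn_by_branch = reduce_phase(shuffle(map_phase(f) for f in node_files))
print(churn_by_branch)
```

In a real cluster the `map_phase` function would be shipped to each node holding a file split, which is the sense in which the algorithm is sent to the data rather than the data to a central server.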
Viewing Hadoop Content
 Naturally we want to view the data
 Very difficult to infer any relationships
 Graphing requires understanding of the data elements
 But we don’t really know what’s there
 How can we pick a chart without comprehending the data elements?
Tree Map – Churn: Age by Wealth
 We can immediately see that customers aged 40 or below are leaving
 But why is this happening? Is this a regional issue?
Geographic – Churn by Wealth
 Churned customers are scattered across multiple regions
 Appears age-related; could it be the product/service offering?
Box Plot – Churn: Product by Age
 Churn has the most impact on investment products
 Actionable insights from data visualizations
Data Sciences
 Visualization is a manual process; data science (DS) applies mathematical methods to analyze, process and manage big data results
 Decomposed in the following sections
 Data Mining: Unsupervised Learning
 Profiling, Data Reduction, Association, Clustering
 Predictive Analytics: Supervised Learning
 Classification, Decision Trees, CHAID, Regression
 Advanced Supervised Learning
 Ensemble, Neural Networks, Support Vector Machines
 Data Scientists typically develop, train and manage many algorithmic programs referred to as models
 The inputs (variables/parameters) vary from model to model, and the computed outcomes have many business consumers
 How do we bring these pieces together?
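As one small instance of the supervised learning category on this slide, a one-level decision tree (a "stump") can be trained in plain Python. The customer records are invented for illustration, and the split criterion used here (fewest misclassified rows) is a simplification; CHAID and production decision-tree tools rely on statistical tests or impurity measures instead.

```python
# Hypothetical training data: (age, churned). Values are invented for illustration.
customers = [(25, True), (31, True), (38, True), (40, True),
             (45, False), (52, False), (60, False), (67, False)]

def train_stump(rows):
    """Find the age threshold that misclassifies the fewest rows.

    The resulting model predicts churn when age <= threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({age for age, _ in rows}):
        errors = sum((age <= threshold) != churned for age, churned in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

def predict(threshold, age):
    """Apply the trained model to a new customer."""
    return age <= threshold

threshold, errors = train_stump(customers)
print(threshold, errors, predict(threshold, 35), predict(threshold, 58))
```

The trained threshold is the model's only parameter here; real models carry many more inputs, which is why managing them and delivering their outcomes to business consumers becomes a problem of its own.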
Virtualization
 Virtualization refers to technologies designed to provide a layer of abstraction between computer hardware systems and the software running on them
 Common virtualization technologies
 Server virtualization – a single physical server supplies multiple user environments, ideal for resource optimization
 Database virtualization – multiple copies of a single database image, ideal for testing activities
 Data virtualization – integration of any data from disparate data sources into coherent data services
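Data virtualization, the last item above, can be illustrated with a small data service that joins two disparate sources at query time. This is a minimal sketch using only the Python standard library; the table, the CSV extract, the column names and the `customer_view` function are all hypothetical, and commercial data virtualization products add query optimization, caching and security on top of this idea.

```python
import csv
import io
import sqlite3

# Two hypothetical, disparate sources: a relational table and a CSV extract.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])

CSV_TEXT = "customer_id,balance\n1,250.00\n2,75.50\n"

def customer_view():
    """A virtual view: sources are read and joined on each call,
    and nothing is copied into a warehouse table."""
    balances = {int(r["customer_id"]): float(r["balance"])
                for r in csv.DictReader(io.StringIO(CSV_TEXT))}
    for cust_id, name in db.execute("SELECT id, name FROM customer ORDER BY id"):
        yield {"id": cust_id, "name": name, "balance": balances.get(cust_id)}

rows = list(customer_view())
print(rows)
```

Consumers call `customer_view()` as if it were a single source, so the underlying systems can change without the consumers noticing.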
Data Virtualization
BIW Release Virtualization
 Use virtualization to
• Foster agile delivery
• Prove concepts with business
• Materialize only necessary components
DMBOKv2 Conceptual Architecture
Thank You
 DMBOK2
 http://www.dama.org/content/body-knowledge
 IRMAC Data Management Education
 http://www.irmac.ca/
 NEXJ Customer Data Management
 http://www.nexj.com/products/financial-services/enterprise-customer-view/
 Questions?
 E-mail: [email protected]
 LinkedIn: ca.linkedin.com/in/martingsykora
Reference
 http://en.wikipedia.org/
 Dime_(Canadian_coin) Canada. Value, 0.10 CAD. Mass, 1.75 g. Diameter, 18.03 mm. Thickness, 1.22 mm
 Sun Distance to Earth: 149,600,000 km
 http://www.masswerk.at/googleBBS/
 http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html
 Annual global IP traffic will surpass the zettabyte (1000 exabytes) threshold in 2016. Global IP traffic will reach 1.1 zettabytes per year or 91.3 exabytes (one billion gigabytes) per month in 2016. By 2018, global IP traffic will reach 1.6 zettabytes per year, or 131.6 exabytes per month.