Big Data Exploration


Making Hadoop Ready for the Enterprise
Hadoop Summit, June 27, 2013
Anjul Bhambhri
Vice-President, IBM Big Data Development
© 2013 IBM Corporation
Big Data is the next Natural Resource
“We have for the first time an economy based on a key resource (Information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.”
- John Naisbitt
Cost efficiently processing the growing Volume: 40 ZB by 2020, 300x growth from 2005 (Source: IDC)
Responding to the increasing Velocity: 19 Billion RFID sensors and counting (Source: RFID Forecasts)
Collectively analyzing the broadening Variety: 80% of the world’s data is unstructured (Source: IBM Market Information)
Establishing the Veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions (Source: IBM. BAO for the Intelligent Enterprise)
Harvesting any resource requires Mining, Refining and Delivering
What if… you could detect neonatal infections sooner?
Solution
120 children monitored: 120K messages per sec, billion messages per day
24 hours earlier detection of infections
Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in ICU 24 hours in advance
Constant Contact Transforming
Marketing Campaign
Effectiveness with IBM Big Data
• Analyze 35 billion annual emails to
guide customers on best dates &
times to send emails for maximum
response
Benefits
• 40 times improvement in analysis
performance
• 15-25% performance increase in
customer email campaigns
• Analysis time reduced from hours to
seconds
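The analysis this slide describes, finding the send times that draw the best response, can be sketched at toy scale as a group-by over the hour of sending. The records, the field layout, and the function name below are invented for illustration; this is not Constant Contact data or IBM code.

```python
from collections import defaultdict

# Hypothetical send log: (hour of day the email was sent, whether it was opened).
SEND_LOG = [
    (9, True), (9, True), (9, False),
    (14, True), (14, False), (14, False),
    (20, False), (20, False),
]

def open_rate_by_hour(log):
    """Return {hour: fraction of emails sent in that hour that were opened}."""
    sent = defaultdict(int)
    opened = defaultdict(int)
    for hour, was_opened in log:
        sent[hour] += 1
        if was_opened:
            opened[hour] += 1
    return {hour: opened[hour] / sent[hour] for hour in sent}

rates = open_rate_by_hour(SEND_LOG)
# The recommended send time is the hour with the highest open rate.
best_hour = max(rates, key=rates.get)
print(best_hour, round(rates[best_hour], 2))
```

At production scale the same aggregation would run as a distributed job over billions of records; only the grouping logic is shown here.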
Automobile and
Manufacturing Quality
Control and Customer
Satisfaction
• Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve quality and operational efficiency
• Next generation of Enterprise Data Warehouse:
  • Inventory control of parts
  • Manufacturing equipment and assembly line data
  • Warranty and services data from dealers
  • Telemetry data from vehicles
New Opportunities with Big Data & Analytics
Transactional &
Application Data
Machine Data
Social Data
Enterprise Content
Big Data and Technology Platform
New Opportunities with Big Data & Analytics
Data Scientist
Business Analyst
User
Roles and Analytics
Big Data and Technology Platform
New Opportunities with Big Data & Analytics
Enrich
info base
Improve customer
interaction
Reduce
risk
Optimize
and monetize
Gain efficiency
and scale
New Outcomes
Roles and Analytics
Big Data and Technology Platform
Emerging Pattern of Big Data Implementation
[Architecture diagram with five zones]
Ingestion and Real-time Analytic Zone
Analytics and Reporting Zone
Warehousing Zone
Landing and Analytics Sandbox Zone
Metadata and Governance Zone
Components shown: Ingest; Filter, Transform; Correlate, Classify; Extract, Annotate; Connectors; Data Sinks; Query Engines; Cubes; Enterprise Warehouse; Data Marts; Hive/HBase Col Stores; Analytics MapReduce; Descriptive, Predictive Models; Documents in Variety of Formats; Widgets; Discovery, Visualizer; Search; Indexes, facets; Models; Repository, Workbench
The 5 Key Use Cases
Big Data Exploration: Find, visualize, understand all big data to improve decision making
Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
Operations Analysis: Analyze a variety of machine data for improved business results
Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Big Data Platform and Application Framework
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform
Visualization & Discovery: Gather, extract and explore data using best of breed visualization
Applications & Development
Systems Management
Accelerators: Speed time to value with analytic and application accelerators
Hadoop System: Cost-effectively analyze petabytes of structured and unstructured information
Stream Computing: Analyze streaming data and large data bursts for real-time insights
Data Warehouse: Deliver deep insight with advanced in-database analytics and operational analytics
Contextual Discovery: Index and federated discovery for contextual collaborative insights
Information Integration & Governance: Govern data quality and manage information lifecycle
Cloud | Mobile | Security
Enterprise Capabilities on Hadoop
Enterprise Capabilities
Visualization & Exploration
Development Tools
Advanced Engines
Connectors
Workload Optimization
Administration & Security
Open source
components
IBM-certified
Apache Hadoop
Key Platform Requirements
– Built-in analytics
– Enterprise-grade capabilities
– Integrated with enterprise software
– Ease of installation and management
– Reference hardware configurations
– World-class support
– Full open source compatibility
Business benefits
– Quicker time-to-value
– Reduced operational risk
– Enhanced business knowledge with flexible
analytical platform
– Leverages and complements existing
software investments
Big Data needs SQL
• Most existing applications in the enterprise use SQL
• SQL bridges the chasm between existing apps and Big Data
• SQL access to all data stored in Hadoop
  • Via JDBC/ODBC
  • Using rich standard SQL
• Intelligently leverage Map/Reduce parallelism OR direct access for achieving low latency
[Diagram: Application, SQL Language, JDBC / ODBC Driver, JDBC / ODBC Server, Big SQL Engine (in Hadoop), Data Sources: Hive tables, HBase tables, CSV files]
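To illustrate the point that existing SQL skills carry over to data held as plain files, here is a minimal sketch that loads CSV text into a table and aggregates it with standard SQL. It uses Python's built-in sqlite3 module purely as a stand-in: it is not Big SQL, it does not touch Hadoop, and the table name, column names, and sample rows are made up.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file stored in the cluster.
CSV_TEXT = """region,amount
east,100
west,250
east,50
"""

def total_by_region(csv_text):
    """Load CSV rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["region"], int(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result

totals = total_by_region(CSV_TEXT)
print(totals)  # [('east', 150), ('west', 250)]
```

An application written against a JDBC or ODBC driver would issue the same kind of SELECT; what changes is the engine that executes it.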
Text Analytics: Getting measurable insights
• Most of the world’s data is in unstructured or semi-structured text.
• Social media is rife with discussions about products and services
• Company internal information is locked in blobs, description fields, and sometimes even discarded
• How do you get a metrics-based understanding of facts from unstructured text?
Over 80% of stored information is unstructured*
Structural analysis
Mining and visualization
Healthcare Analytics: E-medical records, hospital reports
Public Sector: Case files, police records, emergency calls…
Automotive Quality Insight: Tech notes, call logs, online media
Insurance Fraud: Insurance claims
Social Media for Marketing: Twitter, Facebook, blogs, forums
How Text Analytics Works
World Cup 2010 Highlights
Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.
Extracted:
Arjen Robben | Striker | Netherlands
Iker Casillas | Keeper | Spain
Andres Iniesta | Winger | Spain
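A toy version of this extraction can be sketched with small dictionaries and a proximity rule. Everything here is an assumption made for illustration: the passage is retyped from the slide, a person is approximated as two adjacent capitalized words, and each person is paired with the nearest role and team mention by character distance. It is not how the IBM text analytics engine works internally.

```python
import re

PASSAGE = (
    "Early in the second half, Netherlands' striker, Arjen Robben, had a "
    "breakaway, but the keeper for Spain, Iker Casillas, made the save. "
    "Winger Andres Iniesta scored for Spain for the win."
)

# Small hand-made dictionaries (assumptions for this example).
ROLES = ["striker", "keeper", "winger"]
TEAMS = ["Netherlands", "Spain"]

# Two adjacent capitalized words, skipping a leading capitalized role word
# such as "Winger" at the start of a sentence.
PERSON = re.compile(
    r"\b(?!(?:%s)\b)[A-Z][a-z]+ [A-Z][a-z]+\b"
    % "|".join(role.capitalize() for role in ROLES)
)

def nearest(words, text, position):
    """Return the dictionary word with a mention closest to `position`."""
    best, best_dist = None, None
    for word in words:
        for match in re.finditer(re.escape(word), text, flags=re.IGNORECASE):
            dist = abs(match.start() - position)
            if best_dist is None or dist < best_dist:
                best, best_dist = word, dist
    return best

def extract(text):
    """Return (person, role, team) tuples in order of appearance."""
    records = []
    for match in PERSON.finditer(text):
        role = nearest(ROLES, text, match.start())
        team = nearest(TEAMS, text, match.start())
        records.append((match.group(), role.capitalize(), team))
    return records

records = extract(PASSAGE)
for record in records:
    print(record)
```

The nearest-mention rule happens to suffice for this passage; real text needs richer rules, which is the gap a declarative extraction language is meant to fill.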
Text Analytics Language and Runtime
[Diagram labels: Offline (Development Environment, General-Purpose Linguistic Parsers, Dictionaries); AQL Extractor; Cost-based optimization; Runtime (Input Documents, Text Analytics Runtime, Extracted Objects; Dominant Cost is CPU); operator graph of Select, Join and Dict operators over Role and Company, …]
AQL Extractor:
    create view Employment as
    select R.jobType as jobType,
           C.name as companyName
    from Company C, Role R
    where
      Follows(R.jobType, C.name, 0, 20)
      and ContainsDict('EmpAssociation.dict',
                       RightContext(R.jobType,10));
• Declarative SQL-like language
• Discovery tools for AQL development
• High-throughput
• Small memory footprint
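The join condition in the Employment view can be mimicked in a few lines. The sketch below assumes that Follows(a, b, min, max) holds when b begins between min and max characters after a ends, which is my reading of the predicate; the ContainsDict condition is left out, and the sample sentence, the two dictionaries, and all function names are hypothetical.

```python
from typing import NamedTuple

class Span(NamedTuple):
    begin: int  # offset of the first character
    end: int    # offset one past the last character
    text: str

def find_spans(document, dictionary):
    """Return a Span for every exact occurrence of a dictionary entry."""
    spans = []
    for entry in dictionary:
        start = document.find(entry)
        while start != -1:
            spans.append(Span(start, start + len(entry), entry))
            start = document.find(entry, start + 1)
    return spans

def follows(first, second, min_gap, max_gap):
    """True if `second` begins min_gap..max_gap characters after `first` ends."""
    gap = second.begin - first.end
    return min_gap <= gap <= max_gap

DOC = "Jane Doe was named chief engineer at Acme Corp last year."
ROLE_DICT = ["chief engineer", "director"]   # hypothetical dictionary
COMPANY_DICT = ["Acme Corp", "Globex"]       # hypothetical dictionary

roles = find_spans(DOC, ROLE_DICT)
companies = find_spans(DOC, COMPANY_DICT)

# Nested-loop join on the Follows condition, mirroring the view's where clause.
employment = [
    (r.text, c.text)
    for r in roles
    for c in companies
    if follows(r, c, 0, 20)
]
print(employment)  # [('chief engineer', 'Acme Corp')]
```

A nested-loop join is the simplest possible plan; the slide's point about cost-based optimization is that the runtime is free to choose a cheaper one for the same declarative statement.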
Enterprise Data
Tools
Business User
Data Scientist
Business Analyst
Developer
Administrator
Security and compliance in Big Data environments
Big Data Platform: Structured, Unstructured, Streaming
Clients
Hadoop Cluster
• Who is running specific big data requests?
• What map-reduce jobs are they running?
• Are these jobs part of an authorized program list accessing the data?
• Is there an exceptional number of file permission exceptions?
• Taps for Hadoop
• Collects and streams audit data to Collector
• Provides visibility for HDFS, MapReduce, RPC, Oozie, HBase, etc.
• Securely stores audit data collected by TAPs
• Provides analytics, reporting & compliance workflow automation
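Two of the audit questions on this slide, whether jobs come from an authorized program list and whether permission exceptions are unusually frequent, reduce to simple checks once audit records have been collected. The record layout, the allow-list, and the threshold below are assumptions made for this sketch and do not reflect the format of any particular audit product.

```python
from collections import Counter

# Hypothetical audit records: (user, program, event).
AUDIT_LOG = [
    ("alice", "daily_etl", "job_submitted"),
    ("bob", "adhoc_scan", "job_submitted"),
    ("bob", "adhoc_scan", "permission_denied"),
    ("bob", "adhoc_scan", "permission_denied"),
    ("bob", "adhoc_scan", "permission_denied"),
    ("alice", "daily_etl", "permission_denied"),
]
AUTHORIZED_PROGRAMS = {"daily_etl"}  # assumed allow-list
DENIAL_THRESHOLD = 2                 # assumed per-user limit

def unauthorized_jobs(log, allowed):
    """Return sorted (user, program) pairs for submitted jobs not on the allow-list."""
    return sorted(
        {(user, program)
         for user, program, event in log
         if event == "job_submitted" and program not in allowed}
    )

def users_over_denial_threshold(log, threshold):
    """Return sorted users with more than `threshold` permission-denied events."""
    denials = Counter(user for user, _, event in log if event == "permission_denied")
    return sorted(user for user, count in denials.items() if count > threshold)

flagged_jobs = unauthorized_jobs(AUDIT_LOG, AUTHORIZED_PROGRAMS)
flagged_users = users_over_denial_threshold(AUDIT_LOG, DENIAL_THRESHOLD)
print(flagged_jobs)   # [('bob', 'adhoc_scan')]
print(flagged_users)  # ['bob']
```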
Data Archiving and Masking on Hadoop
• Mask confidential data to avoid data
breach & meet privacy compliance
• Cost-effective query-able archiving
• Protect confidential data while preserving analytics
• Support compliance with privacy regulations
• Manage, apply retention policies for compliance
• Enable business users to query on Hot, Warm
and Cold data
Data Masking: Mask in Hadoop, Mask in-database
Before Masking: JASON MICHAELS
After Masking: ROBERT SMITH
Data Archiving: Extract, Mask, Compress, Load (Database, Hadoop)
Archive & Purge: Archive files with Complete Business Objects, Data Integrity, Schema, Metadata, Retention Policies
Query-able, Auditable, Restorable Data
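One way to mask a name while keeping the data usable for analytics is deterministic substitution: the same input always maps to the same realistic-looking replacement, so joins and counts on the masked column still line up. The sketch below is a generic illustration of that idea, not the masking method of any IBM product; the replacement pools, the secret, and the function names are made up.

```python
import hashlib

# Hypothetical replacement pools; a real deployment would use far larger ones.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA"]
LAST_NAMES = ["SMITH", "JONES", "BROWN", "DAVIS"]

def _pick(pool, value, secret):
    """Choose a pool entry from a keyed hash of the original value."""
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]

def mask_name(full_name, secret="demo-secret"):
    """Replace 'FIRST LAST' with a substitute that is stable for the same input."""
    first, last = full_name.split(" ", 1)
    return f"{_pick(FIRST_NAMES, first, secret)} {_pick(LAST_NAMES, last, secret)}"

masked = mask_name("JASON MICHAELS")
print(masked)
```

With pools this small, different people will often collide on the same substitute; the example shows only the stability property, not a production-grade scheme.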
Introducing PureData for Hadoop – BigInsights Appliance
Simplified Experience
• Designed for easy and quick deployment
• Built-in tools designed for users to derive value quickly
• Easy connectivity to common data warehouse systems
Built-in Expertise
• Enables ‘what-if analysis’ and advanced analytics
• Supports structured, semi-structured, and unstructured data
• Built-in text processing engine and library of annotators
to analyze large volumes of text-based information
• Data can be used in its native format
eliminating need to pre-define and map structures
Integration by Design
• InfoSphere BigInsights software, cluster management, and
IBM System x® servers
• Automatic parallelization and resource optimization to scale
economically
• Enterprise-class security and platform management
From Getting Started to Enterprise Deployment:
InfoSphere BigInsights Brings Hadoop to the Enterprise
[Diagram: editions plotted by Enterprise class against Breadth of capabilities]
PureData for Hadoop: Appliance simplicity for the enterprise (* Pre-announced)
Enterprise Edition: Sold by # of terabytes managed. Quick Start features PLUS: Accelerators
Quick Start Edition: Free download, non-production
Basic Edition: Free download
Apache Hadoop
Features shown: Enterprise Integration, Production support, Production-ready features, Big Sheets, Text Analytics, Big SQL, Workload optimization/Query support, Web-based mgmt console, Dev tools, Jaql, Integrated install, Connectors, Mgmt tools, IBM Hadoop Core
Streams - Real Time Analytics
InfoSphere Data Explorer – delivering insights at the point of impact
Providing unified, real-time access and fusion of big data unlocks greater insight and ROI
Data access & integration
• Index structured & unstructured data in place
• Support existing security
• Federate to external sources
• Leverage MDM, governance, and taxonomies
Discovery & navigation
• Clustering & categorization
• Contextual intelligence
• Easy-to-deploy applications
• All at the scale required for today’s big data challenges
Create unified view of ALL information for real-time monitoring
Increase productivity & leverage past work, increasing speed to market
Analyze customer data to unlock true customer value
Identify areas of information risk & ensure data compliance
Improve customer service & reduce call times
Organizations are Building Big Data Applications on Data Explorer
Streams: Data in motion
BigInsights: Data at rest
Warehouse: Structured Enterprise Data
Data Explorer: Semi- & unstructured enterprise data
Data Explorer App Builder
Get Started on Your Big Data Journey Today
Get Educated
• IBM Big Data: ibm.com/bigdata
• IBMBigDataHub.com
• BigDataUniversity.com
Get Your Hands on Big Data
• Download Quick Start: ibm.co/QuickStart
THINK