
Kamanja in Action:
Driving Value through Continuous Decisioning
Customer Advisory Council
April 2016
CONFIDENTIAL | © 2015 LigaData, Inc. All Rights Reserved.
Agenda
• Developments in Big Data | Open source in the bigger picture
• Lambda Architecture & Kamanja | Why it is important & how it applies to you
• Enabling Faster, Better Analytics | Modelling and Kamanja
• Continuous Decisioning in Action | Live Demo of Kamanja
• Future of Continuous Decisioning | Kamanja Architecture & Technology Roadmap
• Working Together | Feedback & Innovation for Kamanja Use Cases
BIG DATA LANDSCAPE (2016)
Open source technologies are central in areas of
greatest change in the big data ecosystem
The explosion of complexity creates pressure to evolve and innovate
Enterprises are racing to capitalize on three major advancements in the data space:
• Big Data (i.e., massive, inexpensive storage and distributed computing)
• Real-time processing
• Data Science
Why Open Source
Open Source Benefits: Cost, Quality, Security, Freedom
• Leverage robust community
• Higher quality of code
• No vendor lock-in
• Control over data and code
Introducing Lambda Architecture and Kamanja
Old ways of thinking are entrenched in the traditional decisioning architecture
• Two distinct, unlinked data processing channels exist in traditional decisioning:
  • Processing of events through a real-time decision engine, potentially with access to an offline data store
  • An asynchronous offline process where decision models are constructed and optimized
The core framework of Lambda Architecture is powerful but has fundamental limitations
• Enables advanced, real-time analytics by processing big data in batch and in real time, in parallel
• Provides views of the data that combine the best aspects of batch and real-time processing
• Limitations in input/output and model implementation inhibit direct extension to many classes of applications, including continuous decisioning
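As a hedged illustration of the "views" idea, the sketch below keeps a per-account running total: a batch layer recomputes it from the full history, a speed layer tracks events that arrived since the last batch run, and a query merges the two. The names (`batch_view`, `SpeedView`, `query`) are hypothetical; this is a generic Lambda Architecture sketch, not Kamanja code.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, float]  # (account id, amount)

def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute per-account totals from the full, immutable master dataset."""
    totals: Counter = Counter()
    for account, amount in master_dataset:
        totals[account] += amount
    return totals

class SpeedView:
    """Speed layer: incrementally track totals for events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def update(self, account: str, amount: float) -> None:
        self.totals[account] += amount

def query(account: str, batch: Counter, speed: SpeedView) -> float:
    """Serving layer: merge the precomputed batch view with the real-time view at query time."""
    return batch[account] + speed.totals[account]

history = [("acct-1", 100.0), ("acct-2", 40.0), ("acct-1", 25.0)]
batch = batch_view(history)
speed = SpeedView()
speed.update("acct-1", 10.0)  # arrived after the batch run
print(query("acct-1", batch, speed))  # 135.0
```

Both views here are passive aggregates: no decision or model update flows back into them, which is one way to read the limitation noted above.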
The Extended Lambda Architecture is critical to
enable continuous decisioning functionality
1. Decisioning is applied to all data immediately upon availability
2. Decisioning leverages all available data, including data stored in other layers
3. Enhancements to the decisioning process are enabled through continuous feedback of data and model updates
4. Actions may include triggering the start of other processes or sending alerts to a case management system
5. Standard case management reports and workflow are augmented with advanced data visualization, drill-through capabilities, and search
6. Models may be built and tested using all available data and a variety of tools, then quickly and easily deployed into production
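Points 1 to 4 and 6 can be illustrated with a small sketch, assuming a toy scoring model and in-memory dictionaries standing in for the stored-data layer and the case management system. `DecisionEngine`, `ratio_model`, and the profile fields are hypothetical and are not Kamanja's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Model = Callable[[dict, dict], float]  # (event, stored profile) -> score

@dataclass
class DecisionEngine:
    """Hypothetical engine: every event is decided on arrival, and its data is fed back into storage."""
    model: Model
    threshold: float
    profiles: Dict[str, dict] = field(default_factory=dict)  # stand-in for the stored-data layer
    alerts: List[str] = field(default_factory=list)          # stand-in for a case management system

    def deploy(self, new_model: Model) -> None:
        """Swap in a rebuilt model without stopping the stream; the next event uses it."""
        self.model = new_model

    def on_event(self, event: dict) -> bool:
        profile = self.profiles.setdefault(event["account"], {"count": 0, "total": 0.0})
        score = self.model(event, profile)  # decide using the event plus stored data
        flagged = score >= self.threshold
        if flagged:  # action: send an alert to case management
            self.alerts.append(f"alert: account={event['account']} score={score:.2f}")
        profile["count"] += 1  # feedback: the next decision sees this event
        profile["total"] += event["amount"]
        return flagged

def ratio_model(event: dict, profile: dict) -> float:
    """Toy model: current amount relative to the account's running average (1.0 with no history)."""
    if profile["count"] == 0:
        return 1.0
    return event["amount"] / (profile["total"] / profile["count"])

engine = DecisionEngine(model=ratio_model, threshold=3.0)
for amount in [50.0, 60.0, 400.0]:
    engine.on_event({"account": "a1", "amount": amount})
print(engine.alerts)  # ['alert: account=a1 score=7.27']
```

Because the profile is updated after every event, a change to stored data affects the very next decision, and `deploy` replaces the model between events without pausing the stream.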
Lambda Architecture with Kamanja
Leveraging Open Source Big Data Technologies
Why is continuous decisioning important?
Continuous decisioning is critical when…
• A decision must be made in real time
• Decisions should be based upon incoming event data and multiple sources of stored data
• Changes to stored data should immediately impact decision-making
• Model creation is complicated and requires access to many data points
• Models should adaptively evolve to optimize a decision’s performance
USE CASES
• Fraud
• Risk Analysis
• Customer Contact
• Cyber Crime
• Telephony Interception
• Security & Compliance
• Audit & Governance
• Customer churn/retention
• Marketing
• Real-Time Offer
LigaData launched and champions Kamanja, an open source continuous decisioning platform hardened for enterprise reliability requirements, scalable to IoT-level data volumes, and enabling low-latency use cases.
QUICK STATS
Building a Best in Class Decisioning Engine
• More than 40,000 man hours invested to date
• 116,000 lines of code already written
• 18 releases
COMPLEMENTARY TECHNOLOGIES
Modelling and Kamanja
Enabling Faster, Better Analytics
Modelling Approach on Kamanja
• Data mining modellers use many tools
  • Languages with rich, powerful libraries: R, Python
  • Data mining software packages: SAS Enterprise Miner, Salford Systems, Rapid Miner, KNIME, SPSS (18 packages produce PMML)
  • Default to a “combination of many models” (wisdom of the jury)
• Kamanja provides one process for productionizing models
  • Models can go into production in hours vs. weeks (focus on training)
  • Easier for the team to switch between software and algorithms
  • Easier to hire (not limited to being an “X shop”)
  • Helps deal with the expected shortage of ~1 million data scientists
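A minimal sketch of the “combination of many models” idea, assuming every productionized model exposes the same scoring interface (a function from a record to a score) regardless of the tool that built it. The models below are hypothetical stand-ins; averaging and majority voting are two common combiners, not necessarily the ones Kamanja uses.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Record = Dict[str, float]
Scorer = Callable[[Record], float]  # assumed common interface for every deployed model

def ensemble_score(models: Sequence[Scorer], record: Record) -> float:
    """Average the scores of independently built models ("wisdom of the jury")."""
    return mean(model(record) for model in models)

def majority_vote(models: Sequence[Scorer], record: Record, threshold: float = 0.5) -> bool:
    """Alternative combiner: each model votes by comparing its score with a threshold."""
    votes = [model(record) >= threshold for model in models]
    return sum(votes) > len(votes) / 2

# Hypothetical stand-ins for models built in different tools (e.g. R, KNIME, Python).
model_a = lambda r: 0.9 if r["amount"] > 500 else 0.1
model_b = lambda r: 0.7 if r["tenure_years"] < 2 else 0.2
model_c = lambda r: 0.4

record = {"amount": 900.0, "tenure_years": 1.0}
print(round(ensemble_score([model_a, model_b, model_c], record), 3))  # 0.667
print(majority_vote([model_a, model_b, model_c], record))             # True
```

Because each model only has to satisfy the shared `Scorer` signature, swapping the tool that produced one of them does not change the combining code.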
Kamanja Supports a Diverse Modelling Toolset
Matrix of Vendors and Algorithms for Continuous Decisioning
Modelling Approach on Kamanja:
Model Management
• Requirement: manage tens to tens of thousands of models
  • Per data segment (customer, network section, product area)
  • Per business strategy within a segment (cross-sell, attrition, fraud, …)
  • Per deployment within a segment (risk mitigation type, media channel)
• Models have a NORMAL LIFECYCLE
  • The training data captures one snapshot of the universe
  • Broad behavior normally drifts over time (e.g., bull vs. bear market)
  • Update by refreshing model training with more current data
  • Use A/B testing to transition from the old to the refreshed model
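The management requirements above can be sketched as a registry keyed by (segment, strategy, deployment), where each slot holds a production model and, during a refresh, a challenger that receives a fixed share of traffic. All class, method, and version names are hypothetical; hashing the entity id to keep A/B assignment stable is a common technique assumed here, not taken from Kamanja.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple

# (data segment, business strategy, deployment), e.g. ("retail", "attrition", "email")
Key = Tuple[str, str, str]

@dataclass
class Slot:
    champion: str                  # model version currently in production
    challenger: str = ""           # refreshed model under A/B test ("" means none)
    challenger_share: float = 0.0  # fraction of entities routed to the challenger

class ModelRegistry:
    """Hypothetical registry: one slot per (segment, strategy, deployment) combination."""

    def __init__(self) -> None:
        self.slots: Dict[Key, Slot] = {}

    def register(self, key: Key, version: str) -> None:
        self.slots[key] = Slot(champion=version)

    def start_ab_test(self, key: Key, refreshed: str, share: float) -> None:
        if not 0.0 < share < 1.0:
            raise ValueError("share must be between 0 and 1")
        slot = self.slots[key]
        slot.challenger, slot.challenger_share = refreshed, share

    def promote(self, key: Key) -> None:
        """End the test by making the refreshed model the new champion."""
        slot = self.slots[key]
        if not slot.challenger:
            raise ValueError("no A/B test in progress")
        self.slots[key] = Slot(champion=slot.challenger)

    def route(self, key: Key, entity_id: str) -> str:
        """Pick the model version for an entity; hashing keeps each entity in the same arm."""
        slot = self.slots[key]
        if not slot.challenger:
            return slot.champion
        bucket = int(hashlib.sha256(entity_id.encode("utf-8")).hexdigest(), 16) % 10_000
        return slot.challenger if bucket < slot.challenger_share * 10_000 else slot.champion
```

For example, `register(("retail", "attrition", "email"), "attrition-v1")`, then `start_ab_test(..., "attrition-v2", 0.2)` sends roughly 20% of customers to the refreshed model until `promote` retires the old one.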
Kamanja Demo
Continuous Decisioning in Action:
Live Kamanja Demo
To address these challenges, LigaData is creating a user interface for Kamanja that allows:
• Easy deployment of new models
• Intuitive, flexible monitoring of throughput and performance
• Easy filtering and drill-downs
Kamanja Architecture & Roadmap
Kamanja Architecture
Kamanja | Current Feature Set
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non-admin)
Performance & Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Models
• Languages - Java, Scala, PMML, JSON
• Data mining tools: 2 PMML producers validated (R, KNIME)
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for evolving Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype
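The deck lists DAG execution and parallel processing of models without describing how Kamanja schedules them. As an assumption-laden sketch, Python's `graphlib.TopologicalSorter` can run models in dependency order and expose which models are mutually independent (one "wave") and could therefore run in parallel. `run_dag` and the model names and features are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

ModelFn = Callable[[Dict[str, Any]], Any]

def run_dag(models: Dict[str, ModelFn],
            deps: Dict[str, List[str]],
            message: Dict[str, Any]) -> Tuple[Dict[str, Any], List[List[str]]]:
    """Run models in dependency order; return each model's output and the execution waves.

    Models in the same wave do not depend on each other, so an engine could run them in
    parallel; this sketch runs them one after another for simplicity.
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in models})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: Dict[str, Any] = {}
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        for name in ready:
            upstream = {d: results[d] for d in deps.get(name, [])}
            results[name] = models[name]({"message": message, **upstream})
            sorter.done(name)
    return results, waves

# Hypothetical fraud-style models: two feature models feed one decision model.
models = {
    "velocity": lambda ctx: ctx["message"]["txn_last_hour"] > 5,
    "geo": lambda ctx: ctx["message"]["country"] != ctx["message"]["home_country"],
    "risk": lambda ctx: "high" if ctx["velocity"] and ctx["geo"] else "low",
}
deps = {"risk": ["velocity", "geo"]}
results, waves = run_dag(models, deps, {"txn_last_hour": 8, "country": "BR", "home_country": "US"})
print(waves)            # [['geo', 'velocity'], ['risk']]
print(results["risk"])  # high
```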
The Future of Continuous Decisioning |
Near Term Priorities for Kamanja
• Enterprise Readiness | Help clients meet their security, audit (compliance), and resource efficiency needs
• Increased Performance & Scalability | Enable SLA compliance for applications
• Extended Model Support | Support models from popular data mining tools as well as models developed in Python
• Expanded Integrations & Interoperability | Support standard transports such as Flume, UDP and HTTP, as well as standard data formats such as Avro
• Increased Ease of Use | Enable more efficient development & testing of models, and develop an intuitive web UI to support efficient model management/development and administration
The Future of Continuous Decisioning |
Focus of Development for Enterprise Readiness
Enterprise Readiness
Existing Features
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non-admin)
Planned Features
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Integration with popular monitoring tools
• Integration with resource managers (e.g., YARN)
Rationale
• Enable resource sharing across Kamanja installations as well as other Big Data installations
• Meet enterprise security, audit and monitoring requirements
• Meet uptime requirements of Kamanja-based mission-critical applications
Focus of development to enhance performance and scalability
Key Developments
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA-aware
Rationale
• Enable dynamic scaling by decoupling parallel processing from physical partitioning
• Support SLA-critical applications by providing priority-based execution
• Dynamically adjust execution pipelines to optimize performance
• Optimize performance of Kamanja on Hadoop storage
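One common way to decouple parallel processing from physical partitioning, assumed here for illustration rather than taken from Kamanja, is to hash keys into a fixed number of logical partitions and map those onto however many workers are running. Scaling then changes only the partition-to-worker table, not which logical partition a key belongs to. `NUM_LOGICAL` and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List

NUM_LOGICAL = 64  # assumed fixed count, chosen larger than any expected worker count

def logical_partition(key: str) -> int:
    """Stable key -> logical partition mapping that never changes when the cluster scales."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_LOGICAL

def assign(workers: List[str]) -> Dict[int, str]:
    """Map every logical partition onto the workers that exist right now (round-robin)."""
    if not workers:
        raise ValueError("need at least one worker")
    return {p: workers[p % len(workers)] for p in range(NUM_LOGICAL)}

def worker_for(key: str, assignment: Dict[int, str]) -> str:
    return assignment[logical_partition(key)]

# Scaling from 2 to 4 workers only rewrites the assignment table; no key is rehashed.
small = assign(["w1", "w2"])
large = assign(["w1", "w2", "w3", "w4"])
moved = sum(small[p] != large[p] for p in range(NUM_LOGICAL))
print(moved)  # 32: half the logical partitions move to the new workers
```

Round-robin assignment happens to move the minimum number of partitions when the worker count doubles; for other changes (say 3 to 4 workers) a real system would want an assignment strategy that explicitly minimizes movement.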
Focus of development to expand the range of integration and interoperability
Key Developments
• Inputs - HTTP/UDP endpoints, Flume, Avro format
• Outputs - Avro format, graph DB, Elasticsearch
Rationale
• Support a wider range of data sources and transports used in the enterprise
• Consume and produce standard formats to enable better interoperability across systems
• Provide the ability to integrate with advanced analytical tools to populate data/decisions in real time
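To illustrate consuming standard formats independently of the transport, here is a sketch of a decoder registry: whatever delivers the bytes (HTTP, UDP, Flume, Kafka), the engine receives a dictionary. Only JSON and delimited text are shown because they need no third-party library; Avro decoding would require a package such as `avro` or `fastavro` and would be registered the same way. `DECODERS`, `make_delimited_decoder`, and `ingest` are hypothetical names.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Decoder = Callable[[bytes], dict]

def decode_json(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

def make_delimited_decoder(fields: List[str], delimiter: str = ",") -> Decoder:
    """Build a decoder for one delimited record; values stay strings until a schema step types them."""
    def decode(payload: bytes) -> dict:
        reader = csv.reader(io.StringIO(payload.decode("utf-8")), delimiter=delimiter)
        row = next(reader, [])
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(row)}")
        return dict(zip(fields, row))
    return decode

# Registry of message formats (hypothetical format names).
DECODERS: Dict[str, Decoder] = {
    "json": decode_json,
    "txn_csv": make_delimited_decoder(["account", "amount", "country"]),
}

def ingest(fmt: str, payload: bytes) -> dict:
    """Transport-agnostic entry point: whatever delivered the bytes, the engine sees a dict."""
    try:
        decoder = DECODERS[fmt]
    except KeyError:
        raise ValueError(f"no decoder registered for format {fmt!r}") from None
    return decoder(payload)

print(ingest("txn_csv", b"a1,12.5,US"))  # {'account': 'a1', 'amount': '12.5', 'country': 'US'}
```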
Focus of development to improve enterprise readiness
Key Developments
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Integration with popular monitoring tools
• Integration with resource managers (e.g., YARN)
Rationale
• Enable resource sharing across Kamanja installations as well as other Big Data installations
• Meet enterprise security, audit and monitoring requirements
• Meet uptime requirements of Kamanja-based mission-critical applications
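Tokenization can take several forms. A minimal sketch, assuming deterministic keyed tokenization with HMAC-SHA256: equal values map to equal tokens, so joins and counts still work on tokenized data, while the token itself does not reveal the raw value (reversing it would need a separate lookup vault, not shown). The `tok_` prefix, truncation length, and function names are illustrative assumptions, not Kamanja features.

```python
import hashlib
import hmac
from typing import Iterable

def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: equal inputs give equal tokens under the same key.
    Truncated to 16 hex characters (64 bits) for readability; an assumed choice."""
    return "tok_" + hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def tokenize_fields(record: dict, sensitive: Iterable[str], key: bytes) -> dict:
    """Return a copy of the record with the sensitive fields replaced by tokens."""
    out = dict(record)
    for name in sensitive:
        if out.get(name) is not None:
            out[name] = tokenize(str(out[name]), key)
    return out

key = b"demo-key-do-not-use"  # in practice, fetched from a key management system
txn = {"account": "4111111111111111", "amount": 10.0}
print(tokenize_fields(txn, ["account"], key)["account"].startswith("tok_"))  # True
```

For low-entropy inputs such as card numbers, keeping the key secret is what prevents brute-force recovery, so key storage matters as much as the hashing itself.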
Customer Feedback
Supplemental Materials
Kamanja Roadmap – Current State
Currently available features
Models
• Languages - Java, Scala, PMML, JSON
• Data mining tools: 2 PMML producers validated (R, KNIME)
Performance/Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non-admin)
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype
Kamanja Roadmap – Future State
Features targeted over the next 6 months
Models
• Languages - Java, Scala, PMML, JSON, Python
• Data mining tools: 9 PMML producers validated (R, KNIME, Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, TIBCO, Angoss)
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
• Avro format
• HTTP/UDP endpoints
• Flume
• Graph DB
• Elasticsearch
Performance/Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA-aware
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype complete
• UI for Model Management
• UI for Administration/Monitoring
• UI for Rule/Model Development
• IDE Integration
• Model testing & validation
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non-admin)
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Monitoring
• Integration with popular monitoring tools
• Integration with resource managers (e.g., YARN)
• No-shutdown upgrades
• Support for multiple storage back ends (e.g., Cassandra & HBase; HBase & Oracle)
The Future of Continuous Decisioning |
Focus of Development for Models
Models
Existing Features
• Languages - Java, Scala, PMML, JSON
• Data mining tools: 2 PMML producers validated (R, KNIME)
Planned Features
• Additional languages - Python
• Additional data mining tools, for 9 validated PMML producers in total: Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, TIBCO, Angoss
Rationale
• Support a commonly used language (Python) for developing custom data mining models
• Consume models produced by widely used data mining tools to reduce adoption barriers
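The deck does not describe Kamanja's PMML engine. As a hedged illustration of what a PMML producer hands over, here is a toy `RegressionModel` document (hypothetical field names and coefficients) and a minimal scorer built on `xml.etree.ElementTree`. It handles only `NumericPredictor` terms in the first `RegressionTable`; real producers emit many other elements (categorical predictors, normalization, transformations) that a full engine must support.

```python
import xml.etree.ElementTree as ET

PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="toy model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="tenure"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.01"/>
      <NumericPredictor name="tenure" exponent="2" coefficient="-0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def score_regression(pmml_text: str, record: dict) -> float:
    """intercept + sum(coefficient * x ** exponent) over NumericPredictor terms only."""
    root = ET.fromstring(pmml_text)
    # {*} matches any XML namespace (Python 3.8+), so the PMML version does not matter here.
    table = root.find(".//{*}RegressionModel/{*}RegressionTable")
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    result = float(table.get("intercept", "0"))
    for pred in table.findall("{*}NumericPredictor"):
        x = float(record[pred.get("name")])
        result += float(pred.get("coefficient")) * x ** int(pred.get("exponent", "1"))
    return result

print(round(score_regression(PMML_DOC, {"amount": 200, "tenure": 3}), 6))  # 1.6
```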
The Future of Continuous Decisioning |
Focus of Development for Ease of Use
Ease of Use
Existing Features
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype complete
Planned Features
• IDE Integration
• Model testing & validation
• Model Management
• Administration/Monitoring
• Rule/Model Development
Rationale
• Simplify the model change management process for system admins and power users
• Enable easier management and monitoring of production systems
• Reduce model development complexity by allowing developers to use standard tools, and reduce time to market
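A JSON validation utility of the kind listed under existing features might report every problem in a definition rather than stopping at the first. The sketch assumes a hypothetical message-definition layout (`name`, `version`, and a `fields` list whose entries have `name` and `type`); Kamanja's actual message definition format is not shown in the deck.

```python
import json
from typing import List

ALLOWED_TYPES = {"string", "int", "long", "double", "boolean"}  # hypothetical type names

def validate_message_def(text: str) -> List[str]:
    """Return a list of problems found in a JSON message definition (empty list = valid)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    errors = [f"missing required key '{k}'" for k in ("name", "version", "fields") if k not in doc]
    fields = doc.get("fields", [])
    if not isinstance(fields, list):
        errors.append("'fields' must be a list")
        fields = []
    seen = set()
    for i, fld in enumerate(fields):
        if not isinstance(fld, dict) or not isinstance(fld.get("name"), str):
            errors.append(f"fields[{i}] must be an object with a string 'name'")
            continue
        if fld["name"] in seen:
            errors.append(f"duplicate field name '{fld['name']}'")
        seen.add(fld["name"])
        if fld.get("type") not in ALLOWED_TYPES:
            errors.append(f"fields[{i}] ('{fld['name']}') has unsupported type {fld.get('type')!r}")
    return errors

good = ('{"name": "Transaction", "version": "1.0", "fields": ['
        '{"name": "account", "type": "string"}, {"name": "amount", "type": "double"}]}')
print(validate_message_def(good))  # []
```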
The Future of Continuous Decisioning |
Focus of Development for Integrations & Interoperability
Integrations & Interoperability
Existing Features
• Databases
• Logs
• Data Warehouses
• Flat files
• Social media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Files
• Applications
• Reporting tools
• HDFS
Planned Features
• Avro format
• HTTP/UDP endpoints
• Flume
• Graph DB
• Elasticsearch
Rationale
• Support a wider range of data sources and transports used in the enterprise
• Consume and produce standard formats to enable better interoperability across systems
• Provide the ability to integrate with advanced analytical tools to populate data/decisions in real time
The Future of Continuous Decisioning |
Focus of Development for Performance & Scalability
Performance & Scalability
Existing Features
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Planned Features
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA-aware
Rationale
• Enable dynamic scaling by decoupling parallel processing from physical partitioning
• Support SLA-critical applications by providing priority-based execution
• Dynamically adjust execution pipelines to optimize performance
• Optimize performance of Kamanja on Hadoop storage
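Priority-based execution for SLA-critical applications can be sketched as a heap ordered first by priority class and then by deadline (arrival time plus SLA budget). The priority convention (0 is most critical), `SlaScheduler`, the task names, and the millisecond timings are assumptions for illustration, not Kamanja's scheduler.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class Task:
    priority: int                     # 0 = most SLA-critical (assumed convention)
    deadline_ms: float                # arrival time + SLA budget
    seq: int                          # arrival order, breaks any remaining tie
    name: str = field(compare=False)  # not used for ordering

class SlaScheduler:
    """Hypothetical scheduler: highest priority first, earliest deadline first within a priority."""

    def __init__(self) -> None:
        self._heap: List[Task] = []
        self._seq = itertools.count()

    def submit(self, name: str, priority: int, arrival_ms: float, sla_ms: float) -> None:
        heapq.heappush(self._heap, Task(priority, arrival_ms + sla_ms, next(self._seq), name))

    def drain(self) -> List[str]:
        """Pop every queued task in execution order."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order

s = SlaScheduler()
s.submit("nightly-report-score", priority=2, arrival_ms=0, sla_ms=5000)
s.submit("fraud-score-a", priority=0, arrival_ms=10, sla_ms=50)
s.submit("offer-score", priority=1, arrival_ms=5, sla_ms=200)
s.submit("fraud-score-b", priority=0, arrival_ms=12, sla_ms=20)
print(s.drain())  # ['fraud-score-b', 'fraud-score-a', 'offer-score', 'nightly-report-score']
```

Within priority 0, `fraud-score-b` runs first because its deadline (32 ms) is earlier than `fraud-score-a`'s (60 ms), even though it arrived later.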
Focus of development to increase ease of use
Key Developments
Production
• UI for Model Management
• UI for Administration/Monitoring
Development
• UI for Rule/Model Development
• IDE Integration
• Model testing & validation
Rationale
• Simplify the model change management process for system admins and power users
• Enable easier management and monitoring of production systems
• Reduce model development complexity by allowing developers to use standard tools, and reduce time to market
Focus of development to expand model support
Key Developments
• Languages - Python
• Data mining tools: additional PMML producers validated (Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, TIBCO, Angoss)
Rationale
• Support a commonly used language for developing custom data mining models
• Consume models produced by widely used data mining tools to reduce adoption barriers
The Journey to Continuous Decisioning
Big Data for Detection and Investigation
Taking action at the soonest possible moment, based on all incoming events and historical data, leveraging the most sophisticated predictive models
• Data Management: retain the logs; meet compliance requirements
• Data Analysis: retrospective analysis; canned reports
• Near-Time Alerting: streamlined data pipeline; custom alerts
• Continuous Decisioning: real-time alerts; fully integrated with workflow; ability to learn and iterate on models easily
Continuous Decisioning: Kamanja
• Real-time Ingestion (structured / unstructured)
• Correlate: historical reference data lake, scoring models
• Event Decisioning: business rules, pattern analysis
• Notification (visualization, alert, case management and action)
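The four stages above can be sketched as plain functions, assuming an in-memory dictionary for the reference data lake, a toy scoring model, two example business rules, and a list standing in for case management. Every name, field, and the pipe-delimited input format are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for the historical reference data lake (hypothetical contents).
REFERENCE: Dict[str, dict] = {"a1": {"home_country": "US", "avg_amount": 80.0}}

def ingest_record(raw: str) -> dict:
    """Real-time ingestion: parse an 'account|amount|country' record (assumed wire format)."""
    account, amount, country = raw.split("|")
    return {"account": account, "amount": float(amount), "country": country}

def correlate(event: dict) -> dict:
    """Correlate: join reference data and attach a toy model score."""
    ref = REFERENCE.get(event["account"], {"home_country": None, "avg_amount": None})
    score = event["amount"] / ref["avg_amount"] if ref["avg_amount"] else 1.0
    return {**event, **ref, "score": score}

Rule = Callable[[dict], Optional[str]]  # returns a reason code, or None if the rule does not fire

RULES: List[Rule] = [
    lambda e: "foreign_country" if e["home_country"] and e["country"] != e["home_country"] else None,
    lambda e: "amount_spike" if e["score"] > 5 else None,
]

def decide(enriched: dict) -> List[str]:
    """Event decisioning: evaluate every business rule against the enriched event."""
    reasons = []
    for rule in RULES:
        reason = rule(enriched)
        if reason is not None:
            reasons.append(reason)
    return reasons

def notify(enriched: dict, reasons: List[str], case_queue: List[dict]) -> None:
    """Notification: open a case (here, append to a list) when any rule fires."""
    if reasons:
        case_queue.append({"account": enriched["account"], "reasons": reasons})

def process(raw: str, case_queue: List[dict]) -> List[str]:
    enriched = correlate(ingest_record(raw))
    reasons = decide(enriched)
    notify(enriched, reasons, case_queue)
    return reasons

cases: List[dict] = []
print(process("a1|90.0|US", cases))   # []
print(process("a1|600.0|BR", cases))  # ['foreign_country', 'amount_spike']
```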
LigaData transforms how enterprises leverage their data using open source Big Data technologies.
WHO WE ARE
• Founded by former Yahoo executives
• Led data technology innovation at Yahoo
• Grew a $3 billion business by detecting signals in Web data
• 40+ patents in data technologies
WHAT WE DO
• Apply our deep data experience to the challenges of the financial services industry
• Implement continuous decisioning for threat detection and compliance on a robust open source technology stack
Open source technologies are causing seismic
shifts in the big data ecosystem
Big Data
Real Time Processing
Data Science
R
Python
TensorFlow