Kamanja in Action:
Driving Value through Continuous Decisioning
Customer Advisory Council
April 2016
© 2015 LigaData, Inc. All Rights Reserved.
Agenda
• Developments in Big Data | Open source in the bigger picture
• Lambda Architecture & Kamanja | Why it is important & how it applies to you
• Enabling Faster, Better Analytics | Modelling and Kamanja
• Continuous Decisioning in Action | Live Demo of Kamanja
• Future of Continuous Decisioning | Kamanja Architecture & Technology Roadmap
• Working Together | Feedback & Innovation for Kamanja Use Cases
BIG DATA LANDSCAPE (2016)
Open source technologies are central in areas of
greatest change in the big data ecosystem
The explosion of complexity creates pressure to
evolve and innovate
Racing to capitalize on three major
advancements in the data space:
• Big Data (i.e. massive,
inexpensive storage and
distributed computing)
• Real-time processing
• Data Science
Why Open Source
Open Source Benefits
Cost
Quality
• Leverage robust community
• Higher quality of code
• No vendor lock-in
Security
Freedom
• Control over data and code
Introducing Lambda Architecture
and Kamanja
Old ways of thinking are entrenched in the
traditional decisioning architecture
• Two distinct, unlinked data processing channels exist in traditional decisioning
• Processing of events through a real-time decision engine, potentially with access to an offline data store
• An asynchronous offline process where decision models are constructed and optimized
The core framework of Lambda Architecture is
powerful but has fundamental limitations
• Enables advanced, real-time analytics through batch and real-time processing of big data in parallel
• Fundamental approach to provide views of the data that optimally combine the best aspects of batch processing and real-time processing
• Limitations in input/output and model implementation inhibit direct extension to many classes of applications, including continuous decisioning
The Extended Lambda Architecture is critical to
enable continuous decisioning functionality
1. Decisioning is applied to all data immediately upon availability
2. Decisioning leverages all available data, including data stored in other layers
3. Enhancements to the decisioning process are enabled through continuous feedback of data and model updates
4. Actions may include triggering the start of other processes or sending alerts to a case management system
5. Standard case management reports and workflow are augmented with advanced data visualization, drill-through capabilities, and search
6. Models may be built and tested using all available data and a variety of tools, then quickly and easily deployed into production
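As a rough illustration of points 1 to 3 above, the following Python sketch decides on each event as it arrives, consults stored per-customer history, and feeds the event back into that history. The `Event` type, the `DecisionEngine` class, and the threshold rule are hypothetical and are not part of Kamanja's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    customer_id: str
    amount: float


class DecisionEngine:
    """Hypothetical engine: decides on each event using stored history."""

    def __init__(self, ratio_threshold: float = 3.0):
        self.ratio_threshold = ratio_threshold
        # Stored data layer: running count and total per customer.
        self.history = defaultdict(lambda: {"count": 0, "total": 0.0})

    def decide(self, event: Event) -> str:
        stats = self.history[event.customer_id]
        # Use stored data: compare the event with the customer's average.
        if stats["count"] > 0:
            average = stats["total"] / stats["count"]
            too_large = event.amount > self.ratio_threshold * average
            action = "ALERT" if too_large else "OK"
        else:
            action = "OK"  # no history yet, nothing to compare against
        # Feedback: the event updates the stored data immediately,
        # so the next decision already reflects it.
        stats["count"] += 1
        stats["total"] += event.amount
        return action


engine = DecisionEngine()
stream = [Event("c1", 20.0), Event("c1", 30.0), Event("c1", 500.0)]
decisions = [engine.decide(e) for e in stream]
print(decisions)
```

The third event is flagged because it is far above the average built from the first two, which shows how stored data and the live event combine in a single decision.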
Lambda Architecture with Kamanja
Leveraging Open Source Big Data Technologies
Why is continuous decisioning important?
Continuous decisioning is critical when…
• A decision must be made in real time
• Decisions should be based upon incoming event data and multiple sources of stored data
• Changes to stored data should immediately impact decision-making
• Model creation is complicated and requires access to many data points
• Models should adaptively evolve to optimize a decision’s performance
USE CASES
Fraud
Risk Analysis
Customer Contact
Cyber Crime
Telephony
Interception
Security & Compliance
Audit & Governance
Customer churn/retention
Marketing
Real-Time Offer
LigaData launched and champions Kamanja, an open source continuous decisioning platform that is hardened for enterprise reliability requirements, scalable to IoT-level data volumes, and able to support low-latency use cases.
QUICK STATS
Building a Best in Class Decisioning Engine
• More than 40,000 man hours invested to date
• 116,000 lines of code already written
• 18 releases
COMPLEMENTARY TECHNOLOGIES
Modelling and Kamanja
Enabling Faster, Better Analytics
Modelling Approach on Kamanja
• Data Mining Modellers use many tools
• Languages with rich, powerful libraries: R, Python
• Data Mining Software Packages: SAS Enterprise Miner, Salford Systems, Rapid Miner, KNIME, SPSS (18 produce PMML)
• Default to “combination of many models” (wisdom of jury)
• Kamanja provides one process for productionizing models
• Can go into production in hours vs. weeks (focus on training)
• Easier for team to switch between software and algorithms
• Easier to hire (not limited to being an “X shop”)
• Deal with the expected shortage of ~1 million Data Scientists
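To illustrate why PMML makes a single production process possible, here is a minimal sketch that reads a hand-written PMML regression model with Python's standard library and scores one record. The model, its field names, and the `score` helper are made up for illustration. The sketch handles only `NumericPredictor` terms, so a real deployment would rely on a full PMML evaluator instead.

```python
import xml.etree.ElementTree as ET

# Hand-written example model (hypothetical):
# risk = 0.5 + 0.1 * age + 2.0 * late_payments
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="late_payments" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="late_payments"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.1"/>
      <NumericPredictor name="late_payments" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text: str, record: dict) -> float:
    """Evaluate intercept + sum(coefficient * value ** exponent)."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for term in table.findall("pmml:NumericPredictor", NS):
        exponent = int(term.get("exponent", "1"))  # PMML default is 1
        value = record[term.get("name")]
        result += float(term.get("coefficient")) * value ** exponent
    return result


prediction = score(PMML_DOC, {"age": 40.0, "late_payments": 2.0})
print(prediction)
```

Because the scoring code reads only the PMML document, the same function would work regardless of which modelling tool exported the model.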
Kamanja Supports a Diverse Modelling Toolset
Matrix of Vendors and Algorithms for Continuous Decisioning
Modelling Approach on Kamanja:
Model Management
• Requirement: need to manage tens to tens of thousands of models
• Per data segment (customer, network section, product area)
• Per business strategy within segment (cross sell, attrition, fraud, ….)
• Per deployment within segment (risk mitigation type, media channel)
• Models have a NORMAL LIFECYCLE
• The training data captures one snapshot of the universe
• Broad behavior normally drifts over time (e.g. bull vs. bear market)
• Update by refreshing model training with more current data
• Use A / B testing to transition from old to refreshed model
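The A/B transition in the last bullet can be sketched as a deterministic traffic split. This is a minimal illustration rather than Kamanja's mechanism: the `route` function, the model names, and the 10% share are assumptions.

```python
import hashlib


def route(entity_id: str, refreshed_share: float) -> str:
    """Assign an entity to the old or refreshed model, stably across calls."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "refreshed_model" if bucket < refreshed_share else "old_model"


customers = [f"customer-{i}" for i in range(10_000)]
assignments = [route(c, refreshed_share=0.10) for c in customers]
observed_share = assignments.count("refreshed_model") / len(customers)
print(f"share routed to refreshed model: {observed_share:.3f}")
```

Hashing the identifier keeps each customer on the same model for the duration of the test, and raising `refreshed_share` step by step toward 1.0 completes the transition.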
Kamanja Demo
Continuous Decisioning in Action:
Live Kamanja Demo
To address these challenges, LigaData is creating a user interface for Kamanja that allows:
• Easy deployment of new models
• Intuitive, flexible monitoring of throughput and performance
• Easy filtering and drill-downs
Kamanja Architecture & Roadmap
Kamanja Architecture
Kamanja | Current Feature Set
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non admin)
Performance & Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Models
• Languages - Java, Scala, PMML, JSON
• Data mining tools – 2 PMML Producers validated – R, KNIME
Integrations & Interoperability
• Databases
• NoSQL DBs
• Logs
• Applications
• Flat files
• Reporting tools
• Social Media data such as Twitter
• HDFS
• Streaming data sources - Kafka, MQ
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for evolving Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype
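The feature list mentions a DAG and parallel processing of models without elaborating. One plausible reading, stated here as an assumption, is that models form a directed acyclic graph in which one model's output feeds others, and models with no pending dependencies can run in parallel. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group hypothetical models into such stages.

```python
from graphlib import TopologicalSorter

# Each key lists the models whose output it depends on (hypothetical names).
dependencies = {
    "enrich_transaction": set(),
    "fraud_score": {"enrich_transaction"},
    "credit_risk_score": {"enrich_transaction"},
    "final_decision": {"fraud_score", "credit_risk_score"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

stages = []
while sorter.is_active():
    # Every model returned here has all of its inputs available,
    # so the models within one stage could be executed in parallel.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

In this example the two scoring models land in the same stage, which is where parallel execution would pay off.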
The Future of Continuous Decisioning |
Near Term Priorities for Kamanja
• Enterprise Readiness | Help clients meet their security, audit (compliance), and resource efficiency needs
• Increased Performance & Scalability | Enable SLA compliance for applications
• Extending Model Support | Support models from popular data mining tools as well as models developed in Python
• Expanded Integrations and Interoperability | Support standard transports such as Flume, UDP and HTTP as well as standard data formats such as Avro
• Increased Ease of Use | Enable more efficient development and testing of models, and develop an intuitive web UI to support efficient model management, development, and administration
The Future of Continuous Decisioning |
Focus of Development for Enterprise Readiness
Enterprise Readiness
Existing Features
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non admin)
Planned Features
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Integration with popular monitoring tools
• Integrate with Resource Managers (ex. YARN)
• Enable resource sharing across Kamanja installations as well as other Big Data installations
• Meet enterprise security, audit and monitoring requirements
• Meet uptime requirements of Kamanja based mission critical applications
Focus of development to enhance performance
and scalability
Key Developments
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA aware
Rationale
• Enable dynamic scaling by decoupling parallel processing from physical partitioning
• Support SLA critical applications by providing priority based executions
• Dynamically adjust execution pipelines to optimize performance
• Optimize performance of Kamanja on Hadoop storage
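A sketch of the first rationale point: if each message key maps to one of a fixed number of logical partitions, and logical partitions are then assigned to however many workers are running, workers can be added or removed without changing the key-to-partition mapping. The partition count, the function names, and the modulo assignment are illustrative assumptions, not Kamanja's design.

```python
import zlib

NUM_LOGICAL_PARTITIONS = 64  # assumed to stay fixed for the deployment


def logical_partition(key: str) -> int:
    """Stable mapping from a message key to a logical partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL_PARTITIONS


def worker_for(partition: int, num_workers: int) -> int:
    """Assign a logical partition to one of the currently running workers."""
    return partition % num_workers


key = "customer-42"
partition = logical_partition(key)
# Scaling from 4 to 8 workers changes only the partition-to-worker step;
# the key still belongs to the same logical partition.
print(partition, worker_for(partition, 4), worker_for(partition, 8))
```

Keeping per-key state attached to the logical partition rather than to a physical node is what lets the second step change freely as the cluster scales.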
Focus of development to expand the range of
integration and interoperability
Key Developments
• Inputs - HTTP/UDP end points, Flume, AVRO Format
• Outputs - AVRO Format, Graph DB, Elastic Search
Rationale
• Support wider range of data sources and transports used in the enterprise
• Consume and produce standard formats to enable better interoperability across systems
• Provide ability to integrate with advanced analytical tools to populate data/decisions in real time
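As an illustration of what a UDP input end point involves, the sketch below sends and receives one JSON-encoded event over UDP on the loopback interface using only Python's standard library. The event fields and the single-datagram flow are assumptions for demonstration; a production adapter would add batching, error handling, and back-pressure.

```python
import json
import socket

# Receiver: bind to an ephemeral port on localhost.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()

# Sender: emit one JSON event as a single datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
event = {"customer_id": "c1", "amount": 42.5}
sender.sendto(json.dumps(event).encode("utf-8"), address)

# Receive, decode, and hand the event to downstream processing.
payload, _ = receiver.recvfrom(65535)
received = json.loads(payload.decode("utf-8"))
print(received)

sender.close()
receiver.close()
```

UDP gives no delivery guarantee, so an end point like this suits high-volume telemetry where occasional loss is acceptable; HTTP or a queue such as Kafka fits events that must not be dropped.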
Focus of development to improve enterprise
readiness
Key Developments
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Integration with popular monitoring tools
• Integrate with Resource Managers (ex. YARN)
Rationale
• Enable resource sharing across Kamanja installations as well as other Big Data installations
• Meet enterprise security, audit and monitoring requirements
• Meet uptime requirements of Kamanja based mission critical applications
Customer Feedback
Supplemental Materials
Kamanja Roadmap – Current State
Currently available features
Models
• Languages - Java, Scala, PMML, JSON
• Data mining tools – 2 PMML Producers validated – R, KNIME
Performance/Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non admin)
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social Media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype
Kamanja Roadmap – Future State
Features targeted over the next 6 months
Models
• Languages - Java, Scala, PMML, JSON, Python
• Data mining tools – 9 PMML Producers validated – R, KNIME, Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, Tibco, Angoss
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social Media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
• AVRO format
• HTTP/UDP end points
• Flume
• Graph DB
• Elastic Search
Performance/Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA aware
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype complete
• Model Management
• Administration/Monitoring
• Rule/Model Development
• IDE Integration
• Model testing & validation
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non admin)
• Multi-tenancy
• Encryption & tokenization
• Security
• Auditing/Data Lineage
• Monitoring
• Integration with popular monitoring tools
• Integrate with Resource Managers (ex. YARN)
• No shutdown upgrades
• Support multiple storages (ex. Cassandra & HBase; HBase & Oracle)
The Future of Continuous Decisioning |
Focus of Development for Models
Models
Existing Features
• Languages - Java, Scala, PMML, JSON
• Data mining tools – 2 PMML Producers validated – R, KNIME
Planned Features
• Additional Languages - Python
• Additional Data mining tools – 9 PMML Producers validated – Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, Tibco, Angoss
• Support commonly used language for developing custom data mining models
• Consume models produced by widely used data mining tools to reduce adoption barriers
The Future of Continuous Decisioning |
Focus of Development for Ease of Use
Ease of Use
Existing Features
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype complete
Planned Features
• IDE Integration
• Model testing & validation
• Model Management
• Administration/Monitoring
• Rule/Model Development
• Simplify model change management process for system admins and power users
• Enable easier management and monitoring of production systems
• Reduce model development complexity by allowing developers to utilize standard tools, and reduce time to market
The Future of Continuous Decisioning | Focus of
Development for Integrations & Interoperability
Integrations & Interoperability
Existing Features
• Databases
• Logs
• Data Warehouses
• Flat files
• Social Media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Files
• Applications
• Reporting tools
• HDFS
Planned Features
• AVRO format
• HTTP/UDP end points
• Flume
• Graph DB
• Elastic Search
• Support wider range of data sources and transports used in the enterprise
• Consume and produce standard formats to enable better interoperability across systems
• Provide ability to integrate with advanced analytical tools to populate data/decisions in real time
The Future of Continuous Decisioning | Focus of
Development for Performance & Scalability
Performance & Scalability
Existing Features
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG
Planned Features
• Optimized DAG
• Distributed and hierarchical cache utilization
• Support for logical partitions
• SLA aware
• Enable dynamic scaling by decoupling parallel processing from physical partitioning
• Support SLA critical applications by providing priority based executions
• Dynamically adjust execution pipelines to optimize performance
• Optimize performance of Kamanja on Hadoop storage
Focus of development to increase ease of use
Key Developments
Production
• UI for Model Management
• UI for Administration/Monitoring
Development
• UI for Rule/Model Development
• IDE Integration
• Model testing & validation
Rationale
• Simplify model change management process for system admins and power users
• Enable easier management and monitoring of production systems
• Reduce model development complexity by allowing developers to utilize standard tools, and reduce time to market
Focus of development to expand model support
Key Developments
• Languages - Python
• Data mining tools – Additional PMML Producers validated – Rapid Miner, SAS Enterprise Miner, Spark MLlib, IBM SPSS, Salford Systems, Tibco, Angoss
Rationale
• Support commonly used language for developing custom data mining models
• Consume models produced by widely used data mining tools to reduce adoption barriers
The Journey to Continuous Decisioning
Big Data for Detection and Investigation
Taking action at the soonest possible moment,
based on all incoming events and historical data,
leveraging the most sophisticated predictive models
Data Analysis
Data Management
Retain the logs
Meet compliance requirements
Retrospective analysis
Canned reports
Near Time Alerting
Continuous Decisioning
Real time alerts
Streamlined data pipeline
Fully integrated w/workflow
Custom alerts
Ability to learn, iterate models easily
Continuous Decisioning: Kamanja
Real-time Ingestion (Structured / Unstructured)
Correlate Historical Reference Data Lake, Scoring Models
Event Decisioning Business Rules, Pattern Analysis
Notification (Visualization, Alert, Case Mgmt and Action)
TECH
LigaData transforms how enterprises leverage their data using open source Big Data technologies.
WHO WE ARE
Founded by former Yahoo Executives
Led data technology innovation at Yahoo
Grew a $3 billion business by detecting signals in Web data
More than 40 patents in data technologies
WHAT WE DO
Take our deep data experience and focus on the challenges of the financial services industry
Implement continuous decisioning for threat detection and compliance on a robust open source technology stack
OPTION 2
Kamanja | Current Feature Set
Ease of Use
• Simplified installation process
• Auto migration from older versions
• Support for changing Hadoop stack
• Developer utilities (PMML test, clean utility, JSON validation)
• UI Prototype
Integrations & Interoperability
• Databases
• Logs
• Flat files
• Social Media data such as Twitter
• Streaming data sources - Kafka, MQ
• NoSQL DBs
• Applications
• Reporting tools
• HDFS
Models
• Languages - Java, Scala, PMML, JSON
• Data mining tools – 2 PMML Producers validated – R, KNIME
Enterprise Readiness
• Basic statistics
• Metadata change audits
• Dual role security (admin vs. non admin)
Performance & Scalability
• Leverage Big Data stack
• Parallel processing of models & messages
• Compiling to JVM
• DAG