SUPPORTING A MODELING CONTINUUM IN SCALATION
John A. Miller, Michael E. Cotterell
University of Georgia
Stephen J. Buckley
IBM Thomas J. Watson Research Center
Outline
● Introduction
● Big Data Analytics
● Relationship to Simulation Modeling
● Modeling Continuum
● Application to Supply Chain Management
● Conclusions and Future Work
Introduction
● Related Disciplines
– Analytics
– Data Mining
– Machine Learning
– Simulation Modeling
● So What's New?
– Massive Amounts of Data
– Web Accessible Data
– Meta-data and Semantics
– Availability of Multi-core Clusters
– High-level Programming Environments
Era of Big Data
● Sources of Big Data
– Scientific Experiments: Large Hadron Collider
– Business Transactions: IBM Analytics
– Wireless Sensor Networks: Environment
– Social Networks: twitter-2010
– Public: www.google.com/publicdata, www.bigdata-startups.com/public-data, www.kdnuggets.com/datasets
● 3Vs of Big Data
– Volume (TB+), Variety, Velocity (Streams)
Era of Big Data
● Distributed Data
– Distributed Databases (e.g., HP Vertica)
– Distributed File Systems (e.g., HDFS)
– Large Matrices, Sparse Matrices and Graphs
● Computational Models for Clusters
– Map-Reduce (e.g., Hadoop)
– Bulk Synchronous Parallel (BSP)
– Asynchronous Parallel
– Message Passing (e.g., MPI, Akka)
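The Map-Reduce model listed above can be illustrated on a single machine with plain Scala collections. This is only a sketch of the map, shuffle, and reduce phases; it does not use Hadoop or any ScalaTion API, and the names `MapReduceSketch` and `wordCount` are hypothetical, chosen for illustration.

```scala
// Single-machine sketch of the Map-Reduce pattern (hypothetical names, no cluster involved).
object MapReduceSketch {

  /** Count word occurrences across a collection of text chunks. */
  def wordCount(chunks: Seq[String]): Map[String, Int] = {
    // Map phase: each chunk independently emits (word, 1) pairs.
    val pairs: Seq[(String, Int)] =
      chunks.flatMap(c => c.toLowerCase.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toSeq)

    // Shuffle phase: group the emitted pairs by key (the word).
    val grouped: Map[String, Seq[(String, Int)]] = pairs.groupBy(_._1)

    // Reduce phase: sum the counts collected for each key.
    grouped.map { case (word, ps) => word -> ps.map(_._2).sum }
  }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("big data analytics", "big simulation models", "data make models"))
    counts.toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w: $c") }
  }
}
```

On a cluster, the map phase would run on the nodes holding each chunk and the shuffle would move pairs across the network; the collection operations here only mimic that data flow.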
Big Data Analytics in ScalaTion
● Scala
– Object-Oriented Functional Language
– Java-based, but 3x more concise
– Support for
• Parallel Computing (ParArray, .par)
• Distributed Computing (Akka)
● ScalaTion
– Multi-paradigm Modeling using Scala
• Simulation, Analytics, Optimization
– High-Level and concise like MATLAB and R
Big Data Analytics in ScalaTion
● Prediction: y = f(x, t; b)
– Regression (REG)
– Nonlinear Regression (NRG)
– Neural Nets (NN), ARMA Models
● Classification: c = f(x; b)
– Logistic Regression (LRG)+
– k-Nearest Neighbors (kNN)
– Naive Bayes (NB), Bayesian Networks (BN)
– Support Vector Machines (SVM)
– Decision Trees (DT)
+ also used for prediction
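To make the two forms on this slide concrete, here is a minimal sketch in plain Scala: a least-squares line fit as one instance of prediction, where the parameter vector b is (b0, b1), and a k-nearest-neighbors vote as one instance of classification. It is not the ScalaTion implementation; the object and method names (`AnalyticsSketch`, `fitLine`, `predict`, `knnClassify`) are hypothetical.

```scala
// Illustrative sketch only; names are hypothetical and not taken from ScalaTion.
object AnalyticsSketch {

  /** Prediction: fit y = b0 + b1 * x by ordinary least squares; returns (b0, b1). */
  def fitLine(x: Array[Double], y: Array[Double]): (Double, Double) = {
    require(x.length == y.length && x.length >= 2, "need at least two paired observations")
    val n = x.length
    val xBar = x.sum / n
    val yBar = y.sum / n
    val sxy = x.indices.map(i => (x(i) - xBar) * (y(i) - yBar)).sum
    val sxx = x.map(xi => (xi - xBar) * (xi - xBar)).sum
    require(sxx > 0.0, "x values must not all be equal")
    val b1 = sxy / sxx          // slope
    val b0 = yBar - b1 * xBar   // intercept
    (b0, b1)
  }

  /** Evaluate the fitted line at a new x. */
  def predict(b: (Double, Double), x: Double): Double = b._1 + b._2 * x

  /** Classification: majority label among the k training points nearest to the query. */
  def knnClassify(train: Seq[(Array[Double], Int)], query: Array[Double], k: Int): Int = {
    require(k >= 1 && train.nonEmpty, "need k >= 1 and a non-empty training set")
    // Squared Euclidean distance is enough for ranking neighbors.
    def dist2(a: Array[Double], b: Array[Double]): Double =
      a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum
    val nearest = train.sortBy { case (features, _) => dist2(features, query) }.take(k)
    // Ties between labels are broken arbitrarily; an odd k avoids them for two classes.
    nearest.groupBy(_._2).maxBy(_._2.size)._1
  }
}
```

For example, `AnalyticsSketch.fitLine(Array(1.0, 2.0, 3.0, 4.0), Array(3.0, 5.0, 7.0, 9.0))` recovers an intercept of 1 and a slope of 2, since the data lie exactly on y = 1 + 2x.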
Simulation in ScalaTion
● Event-Scheduling (ES)
● Process-Interaction (PI)
● Activity Models (AM)
● State-Transition Models (ST)
● System Dynamics (SD)
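As an illustration of the first paradigm on this slide, the sketch below runs an event-scheduling simulation of a single-server FIFO queue using a future event list. It is written in plain Scala and does not use ScalaTion's simulation classes; `EventSchedulingSketch`, `Arrival`, `Departure`, and `simulate` are hypothetical names, and the constant service time is an assumption made to keep the example deterministic.

```scala
import scala.collection.mutable

// Illustrative event-scheduling sketch; names and the constant service time are assumptions.
object EventSchedulingSketch {
  sealed trait Event { def time: Double }
  final case class Arrival(time: Double) extends Event
  final case class Departure(time: Double) extends Event

  /** Simulate a single-server FIFO queue.
    * Returns (number of customers served, time of the last event processed). */
  def simulate(arrivalTimes: Seq[Double], serviceTime: Double): (Int, Double) = {
    // Future event list: PriorityQueue is a max-heap, so order by negated time
    // to dequeue the earliest event first.
    val fel = mutable.PriorityQueue.empty[Event](Ordering.by[Event, Double](e => -e.time))
    arrivalTimes.foreach(t => fel.enqueue(Arrival(t)))

    var clock = 0.0     // simulation clock
    var waiting = 0     // customers waiting in line
    var busy = false    // whether the server is occupied
    var served = 0      // completed services

    while (fel.nonEmpty) {
      val event = fel.dequeue()
      clock = event.time            // advance the clock to the event time
      event match {
        case Arrival(_) =>
          if (busy) waiting += 1
          else { busy = true; fel.enqueue(Departure(clock + serviceTime)) }
        case Departure(_) =>
          served += 1
          if (waiting > 0) { waiting -= 1; fel.enqueue(Departure(clock + serviceTime)) }
          else busy = false
      }
    }
    (served, clock)
  }
}
```

The event logic (what happens on an arrival or a departure) is all the modeler specifies; the loop that pops the earliest event and advances the clock is the generic part that a simulation engine would normally provide.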
Big Data and Simulation
● Relationships
– Simulation models make data, data make better simulation models
– Analytics: more data rich
– Simulation: more knowledge rich
● Building Simulation Models
– Determination of Components
– Analysis of Components
• “Small Data Analytics”
– How will “Big Data” impact this process?
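Modeling Continuum: Structural Richness
[Figure: modeling techniques placed along an axis of structural richness running from low to high. Labels in the figure: Gen Linear Mod; NB, kNN, REG, ARMA; Prob Graph Models; NN; BN; Hierarchical Models; Simulation Models: ES, ST, SD, AM, PI.]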
Analytics and Simulation
[Figure: diagram relating a Complex System or Process to Analytics Techniques (low fidelity approx) and Simulation Models (high fidelity approx). Other labels in the figure: Data extraction, Statistics, Optimizers, Induction, Calibration, Output, Knowledge, Ontologies, Model building.]
Application to Supply Chain Management
● Forecasting
– Time-dependent predictive analytics techniques
– Forecasts feed supply chain process
– Satisfy demand on a continuing basis
● Simulation
– Simulate various scenarios (changes in Supply/Demand, etc.) to determine effects
– Use both forecasting and simulation to make decisions
● Three Case Studies
– To illustrate the point
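The combination of forecasting and simulation described on this slide can be sketched in a few lines of plain Scala. The example below is an assumption-laden toy, not one of the IBM case studies: it uses simple exponential smoothing as the time-dependent forecasting technique, orders exactly the forecast quantity each period with zero lead time, and replays a demand scenario to measure unmet demand. The names `SupplyChainSketch`, `smooth`, and `simulate` are hypothetical.

```scala
// Toy forecasting-plus-simulation loop; the policy, lead time, and names are assumptions.
object SupplyChainSketch {

  /** One-step-ahead forecasts by simple exponential smoothing.
    * forecasts(i) is the forecast for period i; the first forecast is
    * initialized to the first observation (an initialization assumption). */
  def smooth(demand: Seq[Double], alpha: Double): Seq[Double] = {
    require(demand.nonEmpty && alpha > 0.0 && alpha <= 1.0, "need data and 0 < alpha <= 1")
    demand.init.scanLeft(demand.head)((f, d) => alpha * d + (1 - alpha) * f)
  }

  /** Replay a demand scenario, ordering the forecast quantity at the start of each period.
    * Returns (ending stock, total unmet demand). */
  def simulate(demand: Seq[Double], forecasts: Seq[Double], initialStock: Double): (Double, Double) = {
    require(demand.length == forecasts.length, "one forecast per period is required")
    demand.zip(forecasts).foldLeft((initialStock, 0.0)) { case ((stock, unmet), (d, f)) =>
      val available = stock + f            // the order arrives immediately (zero lead time)
      val sold = math.min(available, d)    // cannot sell more than is on hand
      (available - sold, unmet + (d - sold))
    }
  }

  def main(args: Array[String]): Unit = {
    val demand = Seq(10.0, 20.0, 30.0)
    val forecasts = smooth(demand, 0.5)
    // Compare two scenarios that differ only in the starting inventory.
    println(simulate(demand, forecasts, 0.0))
    println(simulate(demand, forecasts, 30.0))
  }
}
```

Running several scenarios against the same forecasts, as `main` does with two starting inventories, is the kind of what-if comparison the slide refers to when it pairs forecasting with simulation for decision making.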
IBM Europe PC Study
● Item
IBM Asset Management Tool
● Item
IBM Pandemic Business Impact Modeler
● Item
Conclusions
● Impact of Big Data
– Must effectively handle and utilize massive data
● Role of Simulation in Big Data
– Organizing data
– Generating/evaluating scenarios
– Supporting better decision making
● Role of Big Data in Simulation
– Increasing model richness/fidelity
– Better model calibration
– Hybrid systems
● Emerging Discipline of Data Science
Future Work
● Featured Minitrack at WSC 2014
– Big Data Analytics and Decision Making
– Leverage the 3Vs to make better decisions
– Application areas:
• Atomic physics, weather, power grids, traffic networks, urban populations, etc.
Questions