Big Data Analytics and Machine Learning in Aerospace


Manjula Ambur, Lin Chen, Charles Liles,
Robert Milletich, Daniel Sammons, Ted Sidehamer, and Jeremy Yagle
NASA Langley Research Center
March 17, 2016
Outline
• NASA Overview
• Big Data Analytics and Machine Learning Background
• Vision of Virtual Experts/Assistants
• Data Intensive Scientific Discovery
• Knowledge Analytics/Cognitive Computing
• Projects towards Virtual Assistants
• Aerospace Data Assistants and Algorithms
• Aerospace Knowledge Assistants and Software
• Progress, Insights, and Challenges
• Team Acknowledgements
NASA Vision and Centers
NASA Vision: “To reach for new heights and reveal the unknown so that what we do and learn will benefit all humankind”
NASA Langley Technical Areas
Mission: NASA Langley is a research, science, technology and development center that provides game-changing innovations to enable
NASA to make significant contributions to the nation
Aerosciences
Measurement Systems
Systems Analysis & Concepts
Entry, Descent & Landing
Atmospheric Characterization
Intelligent Flight Systems
Advanced Materials & Structural Systems
What is Big Data Analytics and Machine Intelligence?
Big Data Analytics
Data whose scale, diversity and
complexity requires new techniques and
algorithms to manage it and extract value
and knowledge
Four V’s of Big Data Analytics
• Volume: scale of data
• Velocity: analysis of streaming data
• Variety: numerous forms of data
• Veracity: mitigating uncertainty of data
Machine Learning
Algorithms capable of learning from both
data and human interaction to enable
insights and make predictions
Machine Intelligence
An autonomous entity that can observe
and act upon its environment and make
decisions like humans
Why Big Data is a Big Deal
Convergence of Factors: Data, Technology, and Thinking
• Machine Learning
• Deluge of Data
• Compute Power
• Data-based decisions
Why Big Data is a Big Deal: Huge Investments
• Federal Research: ~ $1 Billion
• Universities: Data Analytics Programs
• Transforming Medical Diagnosis and Research
• Google, Microsoft, IBM, Facebook: Big Investments, ~ Many Billions
• Brain Initiatives: EU and US
• Being used at Boeing, GE, Lockheed Martin, DOE, NASA…
Why Big Data is a Big Deal: Real World Examples
• GE Maintenance
• Air Traffic Management
• Aerodynamics in Formula One Racing
• Oncology advisor
• Recommendations: Google, Amazon, Netflix
• CERN
• Asteroids and Stars
NASA Langley Comprehensive Digital Transformation
Vision: Catalyst to Enable Transformative Solutions to NASA Mission Challenges
Goals:
• Accelerated Scientific Innovations and Discoveries
• Focused, Relevant Research and Technology Development
• Intelligent and Rapid Engineering and System designs
• Virtual Analysis, Design, and Verification of Aerospace Systems and Science Instruments
Core Capabilities - Emphasis Areas
Modeling & Simulation (M&S)
• Integrated analysis and design of complex systems
• Facilitate improved physics-based discipline tools
• Optimally combine testing and M&S
High Performance Computing (HPC)
• Next generation software development
• Rapid Compute power for M&S and BDA&MI
• Architecture for real-time analysis and design
Big Data Analytics & Machine Intelligence (BDA&MI)
• Rapid synthesis of global scientific info. for new insights
• Data intensive scientific discoveries for advanced designs
• Virtual Experts: Human-machine symbiosis
Advanced Information Technology
• Open, secure collaboration for synergy
• Networks handle burgeoning data
• Data governance, architecture, and management
Collaboration and Partnership are Paramount: NASA, OGA, Industry, Universities
Big Data Analytics & Machine Intelligence Capability
Vision: Virtual Research and Design Partner
Enable NASA employees to achieve greater scientific discoveries and systems innovations
LaRC 2035
Vision: Virtual Research and Design Partner
Goal: Productivity x3 by 2035
Enable NASA employees to achieve significantly greater scientific and engineering
discoveries, and systems innovation and optimization
• Able to quickly digest the latest research innovations and leverage insights
• Deep analysis of world-wide multimedia scientific information and data enabling
discovery of trends, unobvious relationships, and possible paths with evidence
• Ability to ask engineering design-related questions and get reliable answers
• Process modeling and simulation data in real time for effective/efficient testing
• Accelerated ideation & design; increase research productivity
Research & Design
Faster and Smarter with
Minimal Time/Effort
Two Key Areas for Virtual Partner/Assistants
• The Fourth Paradigm: Deriving new insights, correlations, and discoveries not otherwise possible from diverse experimental and computational data
• Cognitive Computing: Obtaining insights, identifying trends, aiding in discovery, and finding answers to specific questions by mining knowledge from scholarly, web, and multimedia content
Data Intensive Scientific Discovery Projects
Aerospace Data Assistants
Data Intensive Scientific Discovery
Emergence of a Fourth Paradigm…
• A thousand years ago: Experimental Science. Description of natural phenomena; Galileo
• Last few hundred years: Theoretical Science. Newton’s Laws, Maxwell’s Equations
• Last few decades: Computational Science. Simulation of complex phenomena
• Today: Data Intensive Science. Discoveries and insights from data
Source: The Fourth Paradigm: Data-Intensive Scientific Discovery by Microsoft Research
Aerospace Data Assistants: Projects/Pilots
Data Intensive Scientific Discovery (DISD)
Deriving new insights, correlations, and discoveries from diverse experimental and computational data sets
• Anomaly Detection in the Non-Destructive Evaluation of Materials
• Predicting Flutter from Aeroelasticity Data
• Rapid Exploration of Aerospace Designs
• Cognitive Assessment of Crew State Monitoring
• Climate Data Fusion and Analysis
• Entry Descent Landing Trajectory Analysis
Aerospace Data Assistants Projects – 1 of 2
Anomaly Detection in Non-Destructive Evaluation of Materials Images
Develop techniques and algorithms to automatically detect anomalies during the non-destructive evaluation of materials
Goals
• Significantly reduce SME analysis time and assist experts in discovering additional anomalies
• Help to design better material compositions and structures
Techniques
• Two-Dimensional Regression designed to detect anomalous pixels
• Convolutional Neural Networks to classify the image data
Accomplishments & Next Steps
• Algorithms are validated with real data sets and further enhanced
• Deliver a tool with a good UI for SMEs to use as an ‘Assistant’ for anomaly detection of composite materials analysis
Predicting Flutter from Aeroelasticity Data
Develop methods to automatically detect the onset of flutter during wind tunnel testing
Goals
• Find new ways of predicting flutter in the time domain
• Identify non-traditional predictor variables and unseen patterns
• Better understand precursors to flutter and improve configurations
Techniques
• Piecewise Regression to locate peaks, track coalescence of structural modes
• Time Series Motifs to identify signatures in the data that could represent precursors to flutter
Accomplishments & Next Steps
• Peak detection tested with multiple datasets
• Several significant time series motifs detected
• Testing with synthetic data for validation of algorithms
Aerospace Data Assistants Projects – 2 of 2
Cognitive Assessment of Crew State Monitoring
Build classification models for predicting cognitive state using physiological data collected during flight simulations
Goals
• Identify unsafe cognitive states in aircrew in real time
• Apply results for more effective pilot training
Techniques
• Ensemble of machine learning tools (deep neural network, gradient boosting, random forest, support vector machine, decision tree)
• Data pre-processing using detrending and power spectral density
Accomplishments & Next Steps
• Initial data mapping, statistical analysis, and signal processing
• Explore combining multiple signal models using ensembling
• Developing models from test subjects' data from different days
Rapid Exploration of Aerospace Designs
Develop a generalized machine learning platform to be used for analyzing mod-sim data for design optimization
Goals
• Provide surrogate modeling to explore the trade space of aerospace vehicle designs with an easy-to-use web interface
• Use fast machine learning models instead of computationally intensive code for rapid exploration and optimization
Techniques
• Supervised machine learning algorithms, SVM, and Neural Networks trained on labeled data
Accomplishments & Next Steps
• Python 2.7 with scikit-learn algorithms is being used
• Web interface using PHP is being developed for SME use
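As a rough illustration of the surrogate-modeling idea for rapid design exploration, the sketch below trains a scikit-learn support vector regressor on samples of a stand-in "expensive" function and then sweeps the design space with the cheap model. The function `expensive_analysis`, the two design variables, and all hyperparameters are hypothetical placeholders chosen for the example; this is not the READ implementation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def expensive_analysis(x):
    """Hypothetical stand-in for a slow physics code: 2 design variables -> 1 output."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1] ** 2


rng = np.random.default_rng(0)

# A small set of "simulation runs" used as labeled training data.
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
y_train = expensive_analysis(X_train)

# Surrogate: standardize inputs, then fit an RBF-kernel support vector regressor.
surrogate = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
surrogate.fit(X_train, y_train)

# Rapid exploration: evaluate the cheap surrogate on a dense grid of candidate designs.
g = np.linspace(-1.0, 1.0, 50)
grid = np.array([[a, b] for a in g for b in g])
pred = surrogate.predict(grid)
best_design = grid[np.argmin(pred)]  # candidate with the lowest predicted output

# Check surrogate accuracy against the true function on held-out points.
X_test = rng.uniform(-1.0, 1.0, size=(100, 2))
errors = surrogate.predict(X_test) - expensive_analysis(X_test)
rmse = float(np.sqrt(np.mean(errors ** 2)))
```

In practice the held-out error would be checked before trusting any design picked from the surrogate, and promising candidates would be re-run through the original code.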
Aerospace Data Analytics: Challenge of Physics-Based Algorithms
Current State
• SME-defined subset of data, selected from all data: SME relies on traditional methods to pre-select data; requires expertise and is time-consuming
Being Developed (Data Analytics Team and SMEs working together)
• Algorithms that mimic SME knowledge: validate the algorithm; save SME time
• Application of algorithms to data, and to other legacy datasets, yields new insights
Being Developed
• Data mining techniques to detect patterns and correlations, which will be validated by SMEs
Long Term Vision
• Virtual Expert: autonomous assistant to SME that analyzes all possible data and augments decision making
Data Intensive Scientific Discovery Projects
Aerospace Data Assistants
Algorithms and Techniques
Linear regression
Application 1: Non-Destructive Evaluation (NDE) Image analysis
Goal: Automate delamination detection
Method: Fit data with linear regression and detect outlier regions. Regression performed on 1D and 2D signals; uses C++ and R
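A minimal sketch of the "fit with linear regression, then detect outlier regions" idea for a 1D signal follows. The robust threshold based on the median absolute deviation, the threshold value, and the synthetic scan line are assumptions made for illustration; the slides do not state how outliers were scored.

```python
import numpy as np


def flag_outliers(signal, threshold=3.5):
    """Fit a straight line to a 1D signal and flag samples with large residuals.

    Residuals are scaled by a robust spread estimate (median absolute deviation)
    so that the anomalies themselves do not inflate the threshold.
    """
    signal = np.asarray(signal, dtype=float)
    x = np.arange(signal.size)
    slope, intercept = np.polyfit(x, signal, deg=1)  # least-squares line
    residuals = signal - (slope * x + intercept)
    mad = np.median(np.abs(residuals - np.median(residuals)))
    robust_sigma = 1.4826 * mad  # MAD -> std-dev equivalent for Gaussian noise
    return np.abs(residuals) > threshold * robust_sigma


# Synthetic scan line: linear trend plus noise, with a simulated defect at samples 120-129.
rng = np.random.default_rng(1)
scan = 0.02 * np.arange(300) + rng.normal(0.0, 0.1, 300)
scan[120:130] += 2.0
mask = flag_outliers(scan)
anomalous_indices = np.flatnonzero(mask)
```

The same residual test extends to 2D data by fitting a plane (or a higher-order surface) to the image and thresholding per-pixel residuals.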
Application 2: Aeroelastic Flutter Data Analytics
Goal: Detect precursors and onset of aeroelastic flutter
Method: Fit best quadratics between structural modes to detect mode coalescence; uses MATLAB
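One possible reading of the quadratic-fit method is sketched below: fit a quadratic to each structural mode's frequency as a function of a test parameter, then solve for where the two fitted curves are predicted to meet. The parameter name `q`, the synthetic frequencies, and the extrapolation step are assumptions for illustration only, and the sketch is in Python rather than the MATLAB used by the team.

```python
import numpy as np


def predicted_coalescence(q, f_low, f_high):
    """Fit a quadratic to each mode's frequency vs. q and return the smallest q
    beyond the measured range where the fitted curves meet, or None."""
    p_low = np.polyfit(q, f_low, deg=2)
    p_high = np.polyfit(q, f_high, deg=2)
    roots = np.roots(p_high - p_low)  # where the fitted frequency gap is zero
    real = roots[np.abs(roots.imag) < 1e-9].real
    ahead = real[real > np.max(q)]
    return float(ahead.min()) if ahead.size else None


# Synthetic data: two modal frequencies (Hz) drifting toward each other as q increases.
q = np.linspace(0.0, 60.0, 13)
f_low = 5.0 + 0.0005 * q ** 2
f_high = 12.0 - 0.0009 * q ** 2
q_star = predicted_coalescence(q, f_low, f_high)
```

For this synthetic case the gap is 7 - 0.0014 q^2, so the fitted curves meet near q = 70.7, beyond the last measured point at q = 60.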
Top: Linear regression of 1D-signals for anomaly detection in carbon
fiber; Bottom: Mode identification in flutter time-series data using linear
regression
Time Series Motifs
Application: Pattern mining of time domain Aeroelastic Flutter Data
Goal: Identify flutter precursors to:
• Create a dictionary of motifs for a given configuration
• Classify data for use with machine learning algorithms that will support a real-time ‘Flutter Assistant’
Method: Application of the Motif Enumeration (MOEN) algorithm created by Dr. Abdullah Mueen; open-source framework
Justification: MOEN has been successfully applied to research problems in other scientific domains including robotics, biology, and seismology
In order to detect motifs across the various sensor signals, a given sensor’s output (Signal A) is compared to another sensor (Signal B) by creating a composite signal (Signal A/B). The algorithm is then applied to the composite signal to detect the motifs (figure, above right) common to both sensors. Significant motifs are identified by a physics-based selection process and then validated by SMEs.
Source: Scott, Robert C., et al. "Aeroservoelastic Wind-Tunnel Test of the SUGAR Truss Braced Wing Wind-Tunnel Model." 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference. 2015.
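To make the notion of a time series motif concrete, here is a brute-force sketch that finds the closest pair of non-overlapping, fixed-length windows under z-normalized Euclidean distance. This is deliberately not MOEN, which enumerates motifs over a range of lengths far more efficiently; the window length, the synthetic signal, and the helper names are assumptions for the example.

```python
import numpy as np


def znorm(w):
    """Z-normalize a window so matches depend on shape, not offset or scale."""
    s = w.std()
    return (w - w.mean()) / s if s > 0 else np.zeros_like(w)


def best_motif_pair(series, m):
    """Return (i, j, dist) for the two non-overlapping length-m windows that are
    closest in z-normalized Euclidean distance. Brute force, O(n^2 * m)."""
    series = np.asarray(series, dtype=float)
    windows = np.array([znorm(series[k:k + m]) for k in range(series.size - m + 1)])
    best = (-1, -1, np.inf)
    for i in range(len(windows)):
        for j in range(i + m, len(windows)):  # skip trivial overlapping matches
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best


# Synthetic signal: noise with the same short burst planted at samples 50 and 200.
rng = np.random.default_rng(2)
signal = rng.normal(0.0, 1.0, 300)
burst = 5.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 25))
signal[50:75] += burst
signal[200:225] += burst
i, j, dist = best_motif_pair(signal, m=25)
```

The returned start indices should land at or near the two planted bursts; on real sensor data, candidate motifs would still need the physics-based screening and SME validation described above.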
Deep learning: convolutional neural network (CNN)
Application: NDE Image Analysis to segment delaminations
Method: Convolutional encoder/decoder neural network; end-to-end training to map raw data to segmentation; using Caffe
Justification: Very successful in medical image analysis such as wound segmentation (figure, top right)
Figure source: Wang, Changhan, et al. "A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks." Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE. IEEE, 2015.
Figure panels: Results on Experimental Data; Results on Simulated Data
Artificial neural networks
Application 1: Crew State Monitoring
Goal: Build classification models capable of accurate, real-time prediction of aircrew cognitive state using physiological data collected during flight simulations
Method: ANN trained to classify cognitive state
Application 2: Rapid Exploration of Aerospace Designs (READ)
Goal: Build classification / regression models on user-uploaded data for aerospace designs
Method: Train ANNs on labeled data; use trained models for prediction and visualization
Figure (crew state network): EEG, ECG, Respiration Rate, Galvanic Skin Response, and Eye Tracking signals feed Feature Generation, then the Input Layer, Hidden Layer, and Output Layer / Classification
Output classes: “Normal” State, Channelized Attention, Diverted Attention, Startle / Surprise
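The feature generation step is not detailed in the slides, but the crew state project lists detrending and power spectral density as pre-processing. A minimal sketch of that combination is given below, assuming a single-channel signal, a Hann-windowed periodogram, and conventional EEG band edges; the function name, sampling rate, and band choices are all illustrative assumptions.

```python
import numpy as np


def band_power_features(x, fs, bands):
    """Linearly detrend a signal, estimate its power spectral density with a
    windowed periodogram, and return the summed PSD in each frequency band."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, deg=1), t)  # remove linear drift
    window = np.hanning(x.size)
    spectrum = np.fft.rfft(x * window)
    psd = (np.abs(spectrum) ** 2) / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in bands.items()}


# Synthetic "EEG-like" trace: 10 Hz oscillation plus slow drift and noise, sampled at 256 Hz.
fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)
rng = np.random.default_rng(3)
trace = np.sin(2 * np.pi * 10.0 * t) + 0.5 * t + rng.normal(0.0, 0.2, t.size)

# Conventional EEG band edges in Hz (illustrative choice, not taken from the slides).
bands = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}
features = band_power_features(trace, fs, bands)
```

Band powers like these, computed per channel and per time window, are one plausible form for the feature vector fed to the network's input layer.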
Ensemble of Machine Learning Techniques
Application 1: Non-Destructive Evaluation (NDE) Image analysis
Goal: Automate delamination detection
Method: Combine several machine learning models into an overall prediction using regression to determine whether a sample contains a delamination; using MATLAB and Python scikit-learn
Application 2: Crew State Monitoring
Goal: Build classification models capable of accurate, real-time prediction of aircrew cognitive state using physiological data collected during flight simulations
Method: Utilize a 2-level meta model combining multiple classification algorithms to improve classification accuracy; using Theano and Python
• Random forests: Fully grow k independent classification trees and combine predictions
• Extremely random forests: Similar to random forests, but the split per node is also randomized
• AdaBoost: Fit consecutive weak learners based on classification tree stumps and combine predictions
• Gradient boosting: Similar to AdaBoost with a different loss function
• k-nearest neighbors: Identify the k closest points to the test sample based on a distance metric
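Of the methods listed, k-nearest neighbors is the simplest to write out in full. The sketch below assumes Euclidean distance and a majority vote; the function name and the synthetic two-class data are made up for the example.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=5):
    """Classify one sample by majority vote among its k nearest training points
    (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Two synthetic, well-separated clusters standing in for two classes.
rng = np.random.default_rng(4)
X_train = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(3.0, 0.5, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

pred_a = knn_predict(X_train, y_train, np.array([0.2, -0.1]))
pred_b = knn_predict(X_train, y_train, np.array([2.9, 3.2]))
```

The distance metric and k are the main tuning choices; for features on different scales, standardizing the inputs first usually matters more than either.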
Inputs: EEG, ECG, Respiration Rate, Galvanic Skin Response, Eye Tracking
Feature Generation
Level 1 Models: Artificial Neural Network, Gradient Boost Classifier, Random Forest
Level 2 Meta Model: Artificial Neural Network
Output classes: “Normal” State, Channelized Attention, Diverted Attention, Startle / Surprise
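The two-level arrangement above can be approximated with scikit-learn's `StackingClassifier`, as in the sketch below. It uses synthetic four-class data in place of physiological features and scikit-learn's `MLPClassifier` in place of the team's Theano networks; model sizes and other settings are arbitrary assumptions, so this shows the structure rather than the actual models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 4-class data standing in for features derived from physiological signals.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Level 1: heterogeneous base classifiers.
level1 = [
    ("ann", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Level 2: a small neural network trained on cross-validated level-1 predictions.
meta = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
stack = StackingClassifier(estimators=level1, final_estimator=meta, cv=3)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Training the meta model on cross-validated predictions (the `cv` argument) keeps the level-2 network from simply learning which level-1 model overfit the training set.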
Machine Learning Languages and Libraries
Many robust, open-source tools are available for most languages and are being used
We have an enterprise license for MATLAB, and our scientists and engineers use it
Initially, use the languages team members are comfortable with
• Solution to the problem is more important than language/library
• Allows for efficient exploration of solutions
Once solutions are found, re-implement in a single language
• Initial investment leads to significant time savings (e.g. debugging) down the road
Language | Neural Networks and Deep Learning | Regression | Ensemble Learning | Time Series Motifs
Python | Theano/Keras | - | Scikit-Learn, XGBoost | -
MATLAB | - | MATLAB Functions | - | Mex C# Wrapper
Lua | Torch | - | - | -
C/C++/C# | Caffe | Developed in-house | - | Open source code
Data Intensive Scientific Discovery Projects
Aerospace Knowledge Assistants
Two Key Areas for Virtual Partner/Assistants
• The Fourth Paradigm: Deriving new insights, correlations, and discoveries not otherwise possible from diverse experimental and computational data
• Knowledge Assistants/Cognitive Computing: Obtaining insights, identifying trends, aiding in discovery, and finding answers to specific questions by mining knowledge from scholarly, web, and multimedia content
Knowledge Assistants Using Watson Content Analytics
Key Capabilities
• Digest and analyze thousands of articles without reading
• Rapidly identify groupings, trends, connections, and experts
• Explore technology gaps that could be leveraged
• Identify cross-domain leverages and research
Carbon Nanotubes Research: Analysis of 130,000 articles from a 20-year time span
Autonomous Flight Research: Analysis of 4,000 articles integrating scholarly and web content
Space Radiation Research: Analysis of 1,000 articles of NASA research from the Human Research Program
Human Machine Teaming, Uncertainty Quantification, and Vehicle Design are being worked
• Successfully demonstrated value and developed robust expertise and capability
• Buying licensed content is a challenge; working with NASA content, open content, and individual researchers' collections
Knowledge Assistants: Deep Analytics Examples
Automated Document Clustering and
Trend Analysis
Expert Networks
Software: Utilizing IBM Watson Content Analytics algorithms. K-means is a scalable means of clustering large datasets; statistical techniques combined with semantic techniques and visualization provide the analytics power
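For readers unfamiliar with k-means, the following sketch runs plain Lloyd's algorithm on a handful of toy document vectors. It is a generic illustration only: the vocabulary, the term counts, and the function are invented for the example and say nothing about how Watson Content Analytics implements clustering internally.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy "document vectors": term counts over a hypothetical 4-word vocabulary
# (nanotube, composite, autonomy, flight) for six short documents.
docs = np.array([
    [5, 3, 0, 0], [4, 4, 0, 1], [6, 2, 1, 0],   # materials-themed
    [0, 0, 5, 4], [1, 0, 4, 5], [0, 1, 6, 3],   # autonomy-themed
], dtype=float)
labels, centroids = kmeans(docs, k=2)
```

On a real corpus the vectors would typically be weighted term frequencies over thousands of terms, and the number of clusters would be chosen by inspection or a validity measure rather than fixed in advance.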
Cognitive Technologies for Aerospace
Using Watson Discovery Advisor
1. Understand scientific and domain language
2. Adapt and learn quickly from inquiries, results, and iteration
3. Generate new hypotheses and discoveries
4. Compose and visualize information at large
Goal: Accelerate the discovery of new insights by synthesizing information in seconds, and providing answers with evidence
Develop and demonstrate two proof-of-concept applications in our aerospace domains:
• Aerospace Innovation Advisor Proof of Concept
• Program Linkage: CAS
• Example Topics: Hybrid Electric Propulsion; On Demand Mobility
• Pilot Advisor Proof of Concept
• Program Linkage: SASO
• Flight deck expert system for root cause analysis and advice
ARC, AFRC, and JSC are also investigating use; LaRC is connected with those efforts
Partnerships and Education
Partnerships
Universities
• GA Tech: Data Analytics and ML for systems integration
• MIT: Computer Science and AI Lab: ML algorithms
• University of New Mexico: ML algorithms
• ODU: Machine Learning
• University of Michigan: Confluence of Mod Sim, HPC, & Big Data
• Carnegie Mellon: Machine Learning
• University of Washington: Big Data in Aerospace Program
Agencies and Industry
• NASA – Ames, Glenn, JSC, HQ
• OGA - IARPA; DOE; ...
• IBM: Cognitive Computing Technologies
• Boeing, Lockheed, …
Education & Outreach
• ~ 14 Seminars & ~ 5 Workshops
• Machine Learning and Analytics Courses – MATLAB; Deep Learning..
• Websites:
o Big Data
o Machine Learning Toolbox
o Knowledge Analytics
• Lunch & Learn Sessions
Progress and Challenges
• Technical community sees value; strong collaboration with SMEs
• Motivated and multi-skilled team
• Diversified Portfolio
• Mission-Focused Use Cases
• Leveraging Collaborations & Open Source Tools
• Lot of hype and misperceptions
• Understanding Problem Domain and Data
• Applications in Aerospace in early stages – Treading path..
• Mix of Research, Experimentation, Iteration, and Persistence needed
Next Steps
• Build Stronger Linkages/advocacy to Missions
• Synergy with Mod-Sim, HPC & Adv. IT
• Formal Collaborations with Universities, Industry and OGAs
• Productize algorithms for SMEs' Use
• Broad Buy-in and Understanding of Value
• Frameworks and Methodology to help Expand the capability; Suite of ML tools for broad use
Acknowledgements – Big Data Analytics Team
Data Analytics and Machine Learning Expertise:
Manjula Ambur, Lin Chen, Christina Heinich, Charles Liles, Robert Milletich,
Daniel Sammons, Ted Sidehamer, and Jeremy Yagle
Subject Matter Expertise:
Danette Allen, Damodar Ambur, Dale Arney, Trey Arthur, Randy Bailey, Eric Burke,
Jeff Cerro, Kyle Ellis, Christie Funk, Dana Hammond, Angela Harrivel, Jeff Herath,
Jon Holbrook, Patty Howell, Lisa Le Vie, Constantine Lukashin, Alan Pope, Brandi
Quam, Cheryl Rose, Jamshid Samareh, Mark Sanetrik, Rob Scott, Lisa Scott-Carnell,
Steve Scotti, Walt Silva, Mia Siochi, Chad Stephens, Scott Striepe, Marty
Waszak, Bill Winfree, and Kristopher Wise