e-Science in Engineering and Industry

Pattern Matching in DAME using AURA technology
Jim Austin, Robert Davis, Bojian Liang, Andy Pasley
University of York
Overview
• Context
• AURA technology
• DAME pattern matching problem
• AURA solution
• Search performance
• Next steps
Distributed Aircraft Maintenance Environment - DAME
Context
• Vibration data from all engines in flight
• Detection of unusual vibration patterns
– Novelties, anomalies
– Automatic or manual
• Search for similar vibration behaviour
– Need to search large volumes of historical vibration data
• Investigate search results and associated data
– Service data records
– CBR tools: Sheffield
AURA technology
• AURA
– Proven technology for searching large data sets
– Ability to scale and maintain performance
– Easily parallelised
• Examples
– Address matcher
– Molecular matcher
• Operation
– Vectors compared to stored examples
– Uses bit level comparison methods
– Correlation Matrix Memory operations
AURA architecture
[Figure: AURA architecture. Labels: input pattern (search, store), Data Adaptor, binary, AURA SearchEngine, output pattern (store, result), Indexer, Candidate Selector, Candidate Engine, indexes or data (back check), results.]
AURA storage & recall
[Figure: AURA storage and recall. Labels: input pattern, binary, AURA SearchEngine, output pattern, Correlation Matrix Memories; example vector 2 1 0 2 0 0 0.]
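Storage and recall in a binary correlation matrix memory can be illustrated with a small sketch. This shows the general CMM idea only and is not the AURA library's API: the class name `BinaryCMM`, its methods, the pattern sizes and the threshold level are all assumptions made for illustration.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (illustrative, not the AURA API)."""

    def __init__(self, n_inputs, n_outputs):
        # One row per input bit, one column per output bit, all weights 0.
        self.weights = [[0] * n_outputs for _ in range(n_inputs)]

    def store(self, input_bits, output_bits):
        # Set a weight wherever an input bit and an output bit are both 1
        # (the outer product is OR-ed into the matrix).
        for i, x in enumerate(input_bits):
            if x:
                for j, y in enumerate(output_bits):
                    if y:
                        self.weights[i][j] = 1

    def recall(self, input_bits):
        # Sum the rows selected by the set input bits: one count per column.
        n_outputs = len(self.weights[0])
        sums = [0] * n_outputs
        for i, x in enumerate(input_bits):
            if x:
                for j in range(n_outputs):
                    sums[j] += self.weights[i][j]
        return sums

    @staticmethod
    def threshold(sums, level):
        # Binarise the summed output: 1 where the count reaches the level.
        return [1 if s >= level else 0 for s in sums]


cmm = BinaryCMM(n_inputs=4, n_outputs=3)
cmm.store([1, 1, 0, 0], [1, 0, 0])   # pattern A -> output slot 0
cmm.store([0, 1, 1, 0], [0, 1, 0])   # pattern B -> output slot 1
cmm.store([0, 0, 1, 1], [0, 0, 1])   # pattern C -> output slot 2

raw = cmm.recall([1, 1, 0, 0])               # query identical to pattern A
matches = BinaryCMM.threshold(raw, level=2)  # keep slots matching both set bits
```

Recall produces a vector of small integer counts, one per stored output slot, and thresholding it selects the stored patterns that share enough set bits with the query. Lowering `level` makes the match more tolerant of missing bits.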
AURA software
• AURA re-designed
– To improve the performance of the AURA library in terms of both memory usage and search time
• 3-fold reduction in memory
• 3-fold reduction in search time
– To make the library easy to use
• Simple API
• Typically only 4 or 5 API calls used
• Enable implementation as an OGSI GT3 service
– To engineer the library to commercial software standards
• Comprehensive user guide and reference manual
Pattern matching problem
• Vibration data from sensors forms Z-mod data.
• Tracked orders extracted from Z-mod data
[Figure: two plots. Axis labels: Amplitude, Frequency, Time; Tracked order, Time.]
Pattern matching problem
• Novelty or anomaly identified in tracked order data by feature detectors
– Forms the query sub-sequence
Pattern matching problem
• Search for sub-sequences similar to the query in a large volume of tracked order data.
– Need to investigate all possible alignments
– Benchmark method is sequential scan
– Noisy data: imprecise matching required
– Various possible similarity measures
• Euclidean distance
• Correlation
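The benchmark method can be sketched as follows: slide the query across the stored series, score every alignment with Euclidean distance, and keep those within a tolerance. The function names, the toy data and the `max_distance` value are illustrative assumptions, not taken from the DAME code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_scan(series, query, max_distance):
    """Check every alignment of the query; return (offset, distance) hits."""
    m = len(query)
    hits = []
    for offset in range(len(series) - m + 1):
        window = series[offset:offset + m]
        d = euclidean(window, query)
        if d <= max_distance:          # imprecise matching: allow some noise
            hits.append((offset, d))
    return sorted(hits, key=lambda h: h[1])   # best match first


# Toy tracked-order values containing two noisy copies of the query shape.
series = [0.0, 0.1, 0.0, 1.0, 2.1, 0.9, 0.0, 0.1, 1.1, 2.0, 1.0, 0.0]
query = [1.0, 2.0, 1.0]
hits = sequential_scan(series, query, max_distance=0.5)
offsets = [offset for offset, _ in hits]
```

The cost is proportional to the number of alignments times the query length, which is why this method serves as the baseline that any faster search has to beat.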
AURA solution
[Figure: AURA solution data flow. Labels: query time series, encoded query, stored time series, encoded time series, AURA search engine, candidate matches, AURA backcheck, results.]
AURA solution
• Encoding: reduction in dimensionality
– e.g. from 100 data points to 10 values.
• Approximate search
– From millions of alignments down to thousands of candidate matches
• Backcheck
– From thousands of candidate matches to 100 or fewer results
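The three stages above follow a filter-and-refine pattern, sketched below. The approximate stage here is a plain comparison of integer codes standing in for the AURA search, which this sketch does not reproduce; the function names, the `step` and `tolerance` parameters and the toy data are all assumptions for illustration.

```python
import math


def encode(window, n_segments, step):
    """Mean of each equal-width segment, quantised to integer bins of width step."""
    width = len(window) // n_segments
    return [math.floor(sum(window[i:i + width]) / width / step)
            for i in range(0, width * n_segments, width)]


def approximate_search(series, query, n_segments, step, tolerance):
    """Cheap filter: keep offsets whose codes are within tolerance bins of the query's."""
    m = len(query)
    q_code = encode(query, n_segments, step)
    candidates = []
    for offset in range(len(series) - m + 1):
        code = encode(series[offset:offset + m], n_segments, step)
        if all(abs(a - b) <= tolerance for a, b in zip(code, q_code)):
            candidates.append(offset)
    return candidates


def back_check(series, query, candidates, max_results):
    """Refine: rank the candidates by exact Euclidean distance on the raw data."""
    m = len(query)
    scored = []
    for offset in candidates:
        window = series[offset:offset + m]
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(window, query)))
        scored.append((d, offset))
    return [offset for _, offset in sorted(scored)[:max_results]]


series = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 1.0, 1.0, 0.0, 0.0,
          1.2, 1.0, 3.2, 3.1, 1.0, 1.1, 0.0, 0.0]
query = [1.0, 1.0, 3.0, 3.0, 1.0, 1.0]

candidates = approximate_search(series, query, n_segments=3, step=1.0, tolerance=1)
results = back_check(series, query, candidates, max_results=2)
```

In this toy run the filter keeps six of the thirteen alignments and the back check reduces them to the two genuine occurrences. The real system relies on the filter being much cheaper per alignment than the exact comparison.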
Encoding technique
• Piecewise Aggregate Approximation
• Values encoded using integer bins
[Figure: encoded time series; axes labelled X-Axis and Y-Axis.]
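A minimal sketch of this encoding: Piecewise Aggregate Approximation replaces each equal-width segment of a window with its mean, and each mean is then mapped to an integer bin. The bin range (`low`, `high`), the number of bins and the requirement that the window length divides evenly are assumptions of this sketch; the slides do not give the values used in DAME.

```python
def paa(values, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    if len(values) % n_segments != 0:
        raise ValueError("this sketch assumes the length divides evenly")
    width = len(values) // n_segments
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values), width)]


def to_bins(means, low, high, n_bins):
    """Map each segment mean to an integer bin in the range 0 .. n_bins - 1."""
    step = (high - low) / n_bins
    bins = []
    for m in means:
        b = int((m - low) / step)
        bins.append(min(max(b, 0), n_bins - 1))   # clamp out-of-range values
    return bins


window = [0.0, 0.2, 1.0, 1.2, 2.0, 2.2, 3.0, 3.4]
means = paa(window, n_segments=4)                       # 8 points -> 4 values
codes = to_bins(means, low=0.0, high=4.0, n_bins=8)     # 4 values -> 4 integers
```

The same two steps applied to a 100-point window with 10 segments give the 100-to-10 reduction mentioned earlier in the talk.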
Search efficiency
• Approximate search using AURA
– Fast method of discarding poor matches
– AURA search typically an order of magnitude or more faster than sequential scan.
– Candidate matches typically <1% of total.
– Back check stage very efficient due to reduction in volume of data
• typically 1% or less of processing time for full sequential scan.
Data size
• Assume
– Fleet of 100 aircraft, 4 engines each
– Flying 10 hours per day
– 5 data points per tracked order per second
– 4 bytes per data point
• Totals
– approx. 100 GigaBytes per year per tracked order
– Roughly 10 tracked orders of interest so…
• Total approx. 1 TeraByte per year
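The totals can be checked with a few lines of arithmetic. The only figure not stated on the slide is `days_per_year = 365`, which assumes every aircraft flies every day.

```python
aircraft = 100
engines_per_aircraft = 4
flight_hours_per_day = 10
points_per_second = 5          # per tracked order, per engine
bytes_per_point = 4
tracked_orders = 10
days_per_year = 365            # assumption: every aircraft flies every day

# Data points recorded per year for one tracked order across the fleet.
points_per_year = (aircraft * engines_per_aircraft * flight_hours_per_day
                   * 3600 * points_per_second * days_per_year)

bytes_per_year = points_per_year * bytes_per_point
total_bytes_per_year = bytes_per_year * tracked_orders

print(f"{points_per_year:,} points per year per tracked order")
print(f"{bytes_per_year / 1e9:.0f} GB per year per tracked order")
print(f"{total_bytes_per_year / 1e12:.2f} TB per year in total")
```

This gives about 26 billion points and 105 GB per year per tracked order, and about 1.05 TB per year for ten tracked orders, in line with the rounded figures above.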
Search performance
• Deployed system assumptions
– 100 CPUs at 2 GHz, each with 1 GByte of RAM.
• One per aircraft
– Each search needs to check 25,000,000,000 alignments of the query per year of tracked order data.
• Sequential scan
– Measured at approx. 2 seconds for 5,000,000 alignments of a 100 data point query (one CPU).
– Extrapolates to approx. 500 seconds to search 5 years of data assuming 1 CPU per aircraft
– This is too slow! Need to support multiple searches and searches on more than one tracked order.
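The extrapolation can be reproduced from the stated figures. The sketch assumes the work divides evenly across the 100 CPUs with no communication overhead, and the final line applies the order-of-magnitude speed-up quoted earlier for AURA search as a round factor of 10.

```python
alignments_per_year = 25_000_000_000   # whole fleet, one tracked order
years = 5
cpus = 100                             # one per aircraft
scan_rate = 5_000_000 / 2.0            # measured alignments per second per CPU

# Each CPU scans only its own aircraft's share of the data.
alignments_per_cpu = alignments_per_year * years / cpus
sequential_seconds = alignments_per_cpu / scan_rate

# Assumed speed-up: roughly one order of magnitude for the AURA-based search.
aura_seconds = sequential_seconds / 10

print(f"sequential scan: {sequential_seconds:.0f} s")
print(f"AURA-based search: {aura_seconds:.0f} s")
```

Each CPU has 1.25 billion alignments to check at 2.5 million per second, which gives 500 seconds for the sequential scan and about 50 seconds with a tenfold speed-up.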
Search performance
• Using AURA and PAA based approach
– Search time reduced by approx. an order of magnitude.
– Can search 5 years of data for 100 aircraft in approx. 50 seconds
– Believe this to be a workable solution
– But response times potentially slower than this
• Need to handle a number of searches in parallel
• Communications and other overheads
Next steps
• Technology
– Refine similarity measures and encoding methods.
• Architecture
– Develop additional services to distribute and organise the search
– Support multiple searches in parallel
• Measurement
– Perform scaling trials on engine data
– Obtain better estimates of overall performance
• Multiple searches
• Overheads