157 Finding Classes in Flight Data
INNOVATIVE RESEARCH
Characteristics in Flight Data
Estimation with Logistic Regression and Support Vector Machines
ICRAT 2004
Claus Gwiggner, LIX, Ecole Polytechnique Palaiseau
Gert Lanckriet, EECS, University of California, Berkeley
EUROCONTROL EXPERIMENTAL CENTRE
Flow Management and Planning Differences
• Time slots are distributed among aircraft to avoid congestion
• In reality, delays, re-routings, etc. lead to missed time slots
• The number of aircraft arriving in a sector differs from the number planned:
• safety, lost capacity
Planning Differences
Related Work
• Factors/Causes [ATFM Study, PRR]: slot adherence, flight plan quality, in-flight change of route, ...
• Simulations [Ky, Stortz]: random noise on departure times
• Reactionary Delay [Toulouse Study]: microscopic model of departure times
Unknown
• Real situation at sector entries
• interplay of factors
• compensations of delays
• ...
Program
Problem Formulation
Simple Characteristics
Binary Classification
Conclusion
Future Work
Planning Differences
Planning Differences = Regulated Demand – Real Demand
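The definition above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the hourly granularity (taken from the 24-variable representation used later in the talk), and the example counts are all made up for the sketch and are not from the study data.

```python
from typing import List

HOURS_PER_DAY = 24  # one value per hour, as in the 24-variable representation


def planning_differences(regulated: List[int], real: List[int]) -> List[int]:
    """Hourly planning differences: regulated demand minus real demand."""
    if len(regulated) != HOURS_PER_DAY or len(real) != HOURS_PER_DAY:
        raise ValueError("expected one count per hour (24 values)")
    return [planned - actual for planned, actual in zip(regulated, real)]


# Made-up day: 30 aircraft planned every hour, only 28 entered at hour 12.
regulated_demand = [30] * HOURS_PER_DAY
real_demand = [30] * HOURS_PER_DAY
real_demand[12] = 28
diff = planning_differences(regulated_demand, real_demand)
```

A positive value means fewer aircraft entered the sector than were planned for that hour; a negative value means more.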
General Problem Formulation
• Find 'regularities' of planning differences, useful to improve the current planning procedure
Why?
• Safety, suboptimally used capacity
How?
• MACRO approach: relations between flows, not single deviations from flight plans
• Daily basis, not extreme situations
How?
• Data analysis
• 141 days of week-day data
Today's Question
• Are planning differences of different sectors the 'same'?
• if yes: any model can be greatly simplified
• if no: what are the differences?
Difficulty
• 24 dimensions: one variable for each hour
Comparison of Planning Differences
No visible regularities in either sector ...
Mean and Standard Deviation
...but similar mean and standard deviation over time
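A minimal sketch of how such per-hour summaries could be computed, assuming each day is stored as a list of hourly planning differences. The function name and the toy numbers are illustrative only; the real data would be 141 days by 24 hours per sector.

```python
from statistics import mean, stdev
from typing import List, Tuple


def hourly_mean_std(days: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-hour mean and sample standard deviation across days."""
    by_hour = list(zip(*days))  # transpose: one tuple of daily values per hour
    means = [mean(values) for values in by_hour]
    stds = [stdev(values) for values in by_hour]
    return means, stds


# Toy data: 3 days with 2 hourly values each (not real measurements).
toy_days = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
means, stds = hourly_mean_std(toy_days)
```

Comparing the two resulting curves for two sectors is what the slide summarises as "similar"; it says nothing yet about whether the full distributions agree.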
Hypothesis Tests
• H0: same underlying distribution ...
• rejected at the 1% level
• assumes that statistical properties do not vary over time
.... but what are the characteristics?
• e.g. 'if high peaks at noon => sector 1'?
• Find a rule that tells whether a sequence of values belongs to sector 1
• Classification problem
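To make the idea of a "rule" concrete, here is a hand-written version of the 'high peaks at noon' example. It is a sketch only: the hour index used for noon, the threshold, and the function name are assumptions, and the labels 1 and -1 follow the class notation used elsewhere in the talk.

```python
from typing import Sequence

NOON_HOUR = 12  # index of the hour treated as "noon" (assumption)


def noon_peak_rule(instance: Sequence[float], threshold: float) -> int:
    """Return 1 (sector 1) if the noon value exceeds the threshold, else -1."""
    return 1 if instance[NOON_HOUR] > threshold else -1


# Made-up days with 24 hourly values each.
peaky_day = [0.0] * 24
peaky_day[NOON_HOUR] = 5.0
flat_day = [0.0] * 24

label_peaky = noon_peak_rule(peaky_day, threshold=3.0)
label_flat = noon_peak_rule(flat_day, threshold=3.0)
```

A classification method replaces the hand-picked hour and threshold with a rule estimated from labelled data.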
(Binary) Classification
• Probabilistic: 'what is the probability that a new item belongs to sector 1?' (Logistic Regression)
• Geometric: 'on which side of the boundary does the new item lie?' (Support Vector Machines)
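The two views can be contrasted on the same linear score. The sketch below assumes a weight vector `w` and offset `b` that have already been estimated somehow; the values are made up, and only the linear case is shown (an SVM with a non-linear kernel would replace the linear score with a kernel expansion).

```python
import math
from typing import Sequence


def linear_score(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Signed score w.x + b; zero on the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def logistic_probability(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Probabilistic view: estimated P(class = 1 | x) under a logistic model."""
    return 1.0 / (1.0 + math.exp(-linear_score(w, x, b)))


def side_of_boundary(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Geometric view: which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if linear_score(w, x, b) >= 0 else -1


# Hypothetical 2-dimensional parameters and a new item.
w, b = [1.0, -1.0], 0.0
x_new = [2.0, 1.0]
p = logistic_probability(w, x_new, b)   # a probability between 0 and 1
side = side_of_boundary(w, x_new, b)    # either 1 or -1
```

Both answers come from the same score here, which is why a probability above 0.5 and a positive side coincide in the linear case.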
Comparison
Linear Logistic Regression vs SVMs
• linear vs non-linear
• simple vs mathematically sophisticated
• traditional vs state-of-the-art
• probabilistic vs geometric
Common points [Hastie et al. 2003], [Friedman 2003]
• SVM: estimator of class probabilities
• logistic regression induces linear boundaries
Experiments on ...
• Data from 4 sectors in Upper Berlin airspace
• Raw data (random permutations)
• Data where the number of instances in both classes is balanced
• In total, 8 experiments conducted
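One way the balanced, randomly permuted data sets could be prepared is sketched below. The slides do not say how balancing was done; undersampling the larger class is an assumption made for this example, as are the function name and the seed.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (instance, label in {1, -1})


def balance_and_permute(class_pos: Sequence[Sequence[float]],
                        class_neg: Sequence[Sequence[float]],
                        seed: int = 0) -> List[Sample]:
    """Undersample the larger class, label both classes, then shuffle."""
    rng = random.Random(seed)           # fixed seed keeps the sketch reproducible
    n = min(len(class_pos), len(class_neg))
    pos = rng.sample(list(class_pos), n)
    neg = rng.sample(list(class_neg), n)
    labelled = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    rng.shuffle(labelled)               # random permutation of the instances
    return labelled


# Toy input: 5 instances from one sector, 3 from the other.
sector_a = [[float(i)] for i in range(5)]
sector_b = [[float(i) + 10.0] for i in range(3)]
balanced = balance_and_permute(sector_a, sector_b)
```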
Model Selection
• Report Estimated Prediction Error (EPE)
• Model selection: Cross-Validation [Stone 1974]
• Wilcoxon-Mann-Whitney Test
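A rough sketch of estimating the prediction error by k-fold cross-validation follows. The nearest-centroid learner is only a placeholder so that the example runs on its own; it is not one of the methods compared in the talk, and the number of folds and the toy data are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]            # (instance, label in {1, -1})
Classifier = Callable[[Sequence[float]], int]


def cross_validated_error(data: List[Sample],
                          train: Callable[[List[Sample]], Classifier],
                          k: int = 5) -> float:
    """Average misclassification rate over k held-out folds (needs len(data) >= k)."""
    folds = [data[i::k] for i in range(k)]      # every k-th sample goes to fold i
    errors = []
    for i, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        classify = train(training)
        wrong = sum(1 for x, y in held_out if classify(x) != y)
        errors.append(wrong / len(held_out))
    return sum(errors) / k


def train_nearest_centroid(training: List[Sample]) -> Classifier:
    """Placeholder learner: assign the label of the closer class mean."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in training if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(-1)

    def classify(x: Sequence[float]) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos <= d_neg else -1

    return classify


# Toy, well-separated data: 10 instances per class, 1 dimension.
toy = [([float(i)], 1) for i in range(10)] + [([float(i) + 100.0], -1) for i in range(10)]
toy_error = cross_validated_error(toy, train_nearest_centroid, k=5)
```

In practice the data would be shuffled before splitting, and the same loop would be run once per candidate model so that the model with the lowest estimated error can be selected.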
Parameters of SVMs
• Kernel functions: Linear, Gauss, Poly, Linear CN, Gauss CN, Poly CN
• Kernel parameters
• Cost function: 1-Norm, 2-Norm
• In total over 800 combinations possible; best one estimated by cross-validation
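The search space can be pictured as a grid over kernel, kernel parameter, and cost norm. The sketch below is illustrative only: the parameter values are invented, the Gauss and Poly formulas are common textbook parametrisations rather than necessarily the ones used in the study, and the "CN" variants are left out because the slides do not define them. The toy grid is therefore far smaller than the 800+ combinations mentioned.

```python
import math
from typing import List, Optional, Sequence, Tuple


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Plain inner product."""
    return sum(a * b for a, b in zip(x, z))


def gauss_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x: Sequence[float], z: Sequence[float], degree: int) -> float:
    """Polynomial kernel (1 + x.z)^degree."""
    return (1.0 + linear_kernel(x, z)) ** degree


# Hypothetical grid: (kernel name, kernel parameter, norm of the cost function).
Config = Tuple[str, Optional[float], int]
grid: List[Config] = []
for norm in (1, 2):
    grid.append(("linear", None, norm))
    for sigma in (0.1, 1.0, 10.0):      # made-up Gauss widths
        grid.append(("gauss", sigma, norm))
    for degree in (2, 3, 4):            # made-up polynomial degrees
        grid.append(("poly", float(degree), norm))

n_configs = len(grid)  # size of this toy grid
```

Each entry of such a grid would be scored by cross-validation, and the configuration with the lowest estimated error kept.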
Results
Summary
• characteristics in high-dimensional data
• comparison of a very simple and a very complicated classification method
Conclusions
• There are systematic differences between different sectors
• SVMs do not promise major improvement: no more than 4% better than logistic regression
• Linear prediction is possible
• Expected prediction errors around 15%
Future Work
• (black box) prediction not satisfactory
• Better understanding of the underlying processes: reasons for the differences
• model of the probability distribution of planned traffic and realized traffic
Questions ?
• Thanks for your attention!
Results
[Table residue: sectors UR1, UR2, UR3, UR4; labels Raw, Bal+Perm, Variable Sel, Random, 'Is Week End?']
Known: Causes for Planning Differences
Departure Slot adherence
Inconsistent profile
# over-deliveries
Regulations too late
Weekday, Season
Weather
CASA implementation
time
Slot tolerance window
Missing flight plans
Incorrect flight plan
In flight change of route
information
Source: Independent Study for the Improvement of ATFM, Final Report, 2000
Priorities:
Very High
High
Medium
Unknown
Little known: Dynamics of Planning Differences
[Figure: 'error' propagation across Sector 1, Sector 2, ..., Sector n; X: time, Y: number of planning differences]
Related Work: Simulation studies, reactionary delay studies
Summary Motivation
Are planning differences unpredictable?
Or are there hidden 'regularities'?
Possible Research Questions
Propagation over the network
Dependence on traffic density, sector complexity, ...
...
Characteristics
Comparison of different sectors
Notation
• A sector is represented as a vector of 24 variables, one for each hour
• An instance is a value of this vector
• An instance belongs to class 1 or -1, depending on the sector from which it was drawn
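The notation maps naturally onto a small data type. The class and field names below are assumptions chosen for illustration; only the 24 hourly variables and the labels 1 and -1 come from the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instance:
    """One observed day of a sector: 24 hourly values plus a class label."""
    values: Tuple[float, ...]  # one planning difference per hour
    label: int                 # 1 or -1, depending on the sector of origin

    def __post_init__(self) -> None:
        if len(self.values) != 24:
            raise ValueError("an instance needs exactly 24 hourly values")
        if self.label not in (1, -1):
            raise ValueError("label must be 1 or -1")


# Made-up instance: a day with no planning differences, drawn from class 1.
example = Instance(values=tuple(0.0 for _ in range(24)), label=1)
```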
Binary Classification
● Given: instances from sectors 1 and -1
● Question: a rule to decide, for a new instance, to which sector it might belong
● Example: if 'high peaks at noon' then class 1
Decision trees
Geometric and Probabilistic Approaches
Geometric
• Instances are points in Euclidean space
• Rules are class boundaries
• Problem: overlapping classes
Probabilistic
• Classes have an underlying probability distribution
• Rules are class-probabilities
• Problem: which distribution?
Example: instances are 2-dimensional