157 Finding Classes in Flight Data


INNOVATIVE RESEARCH
Characteristics in Flight Data:
Estimation with Logistic Regression and Support Vector Machines
ICRAT 2004
Claus Gwiggner, LIX, Ecole Polytechnique Palaiseau
Gert Lanckriet, EECS, University of California, Berkeley
EUROCONTROL EXPERIMENTAL CENTRE
Flow Management and Planning Differences
• Time slots are distributed among aircraft to avoid congestion
• In reality, delays, re-routings, etc. lead to missed time slots
• The number of aircraft arriving in a sector differs from the number planned:
  • safety, lost capacity
Planning Differences
Related Work

• Factors/Causes [ATFM Study, PRR]
  • Slot adherence, flight plan quality, in-flight change of route, ...
• Simulations [Ky, Stortz]
  • Random noise on departure times
• Reactionary Delay [Toulouse Study]
  • Microscopic model of departure times
Unknown
• Real situation at sector entries
  • interplay of factors
  • compensations of delays
  • ...
Program
• Problem Formulation
• Simple Characteristics
• Binary Classification
• Conclusion
• Future Work

Planning Differences
Planning Differences = Regulated Demand – Real Demand
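The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `planning_differences`, the hourly layout of the counts, and all numbers are made up here and are not taken from the talk's data.

```python
from typing import List

HOURS = 24  # one value per hour of the day, as in the talk's problem setting


def planning_differences(regulated: List[int], real: List[int]) -> List[int]:
    """Hourly planning differences: regulated demand minus real demand."""
    if len(regulated) != HOURS or len(real) != HOURS:
        raise ValueError("expected one count per hour (24 values)")
    return [planned - actual for planned, actual in zip(regulated, real)]


# Made-up counts for one sector on one day (illustrative only, not real data).
regulated_demand = [10] * HOURS
real_demand = [10] * HOURS
real_demand[12] = 13  # three more aircraft than planned at noon
real_demand[18] = 8   # two fewer aircraft than planned at 18:00

diff = planning_differences(regulated_demand, real_demand)
print(diff[12], diff[18])  # -3 2
```

With this sign convention a negative value means that more aircraft entered the sector than planned.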
General Problem Formulation

• Find 'regularities' of planning differences that are useful for improving the current planning procedure
• Why?
  • Safety, suboptimally used capacity
• How?
  • MACRO approach: relations between flows, not single deviations from flight plans
  • Daily basis, not extreme situations
• How?
  • Data analysis
  • 141 days of weekday data
Today's Question

• Are planning differences of different sectors the 'same'?
  • If yes: any model can be greatly simplified
  • If no: what are the differences?
• Difficulty
  • 24 dimensions: one variable for each hour
Comparison of Planning Differences

No visible regularities in either sector ...
Mean and Standard Deviation
... but similar mean and standard deviation over time
Hypothesis Tests

• H0: same underlying distribution ...
  • rejected at the 1% level
  • assumes that statistical properties do not vary over time
• ... but what are the characteristics?
  • e.g. 'if high peaks at noon => sector 1'?
  • Find a rule that tells whether a sequence of values belongs to sector 1
  • Classification problem
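The slide does not say which test was used to reject H0. As a stand-in, the sketch below runs a simple permutation test on the difference of sample means for a single hour. The function name `permutation_test`, the choice of test statistic, and the sample values are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_perm: int = 2000, seed: int = 0) -> float:
    """Estimated p-value for H0 'a and b come from the same distribution'.

    Test statistic: absolute difference of the two sample means.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the pooled values at random
        stat = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if stat >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up planning differences at one hour for two sectors (not real data).
sector_1 = [3, 4, 5, 4, 6, 5, 4, 5, 6, 4]
sector_2 = [-1, 0, 1, 0, -2, 1, 0, -1, 0, 1]

p_value = permutation_test(sector_1, sector_2)
print(p_value < 0.01)  # True: H0 would be rejected at the 1% level here
```

Like the test on the slide, this treats the days as exchangeable, that is, it assumes the statistical properties do not vary over time.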
(Binary) Classification

• Probabilistic
  • 'What is the probability that a new item belongs to sector 1?'
  • Logistic Regression
• Geometric
  • 'On which side of the boundary does the new item lie?'
  • Support Vector Machines
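The two questions can be illustrated on toy data. The sketch below fits a linear logistic regression by plain gradient descent and then answers both questions for a new item. It is not an SVM: the geometric answer simply reuses the sign of the logistic score. The data, the learning rate, and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w.x + b; its sign tells on which side of the boundary x lies."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the log-loss. Labels are 1 (sector 1) or 0."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up 2-dimensional instances (the talk uses 24 dimensions, one per hour).
X = [[2, 1], [3, 2], [2, 2], [3, 1], [-1, 0], [0, -1], [-2, -1], [-1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = sector 1, 0 = the other sector

w, b = fit_logistic(X, y)
new_item = [2.5, 1.5]
prob_sector_1 = sigmoid(score(w, b, new_item))   # probabilistic question
side = 1 if score(w, b, new_item) >= 0 else -1   # geometric question
print(round(prob_sector_1, 3), side)
```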
Comparison

• Linear Logistic Regression vs SVMs
  • linear vs non-linear
  • simple vs mathematically sophisticated
  • traditional vs state-of-the-art
  • probabilistic vs geometric
• Common points [Hastie et al. 2003], [Friedman 2003]
  • SVM: estimator of class probabilities
  • logistic regression induces linear boundaries
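A short derivation may make the last common point concrete. Assuming the usual linear logistic model with a weight vector $w$ and offset $b$ (symbols not given on the slides), the class probability and the resulting boundary are:

```latex
P(y = 1 \mid x) = \frac{1}{1 + \exp\bigl(-(w^\top x + b)\bigr)}, \qquad
P(y = 1 \mid x) = \tfrac{1}{2} \iff w^\top x + b = 0
```

The set of points where both classes are equally likely is therefore a hyperplane, which is the sense in which linear logistic regression induces linear boundaries.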

Experiments on ...
• Data from 4 sectors in Upper Berlin airspace
  • Raw data (random permutations)
  • Data where the numbers of instances in both classes are balanced
• In total, 8 experiments conducted
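The slides do not say how the classes were balanced. One common option, shown here purely as an assumption, is to downsample the larger class to the size of the smaller one and then apply a random permutation. The name `balance_and_permute` and the toy data are hypothetical.

```python
import random
from typing import List, Tuple

Instance = Tuple[List[float], int]  # (24 hourly values, class label +1 or -1)


def balance_and_permute(data: List[Instance], seed: int = 0) -> List[Instance]:
    """Downsample the larger class to the smaller one's size, then shuffle."""
    rng = random.Random(seed)
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == -1]
    k = min(len(pos), len(neg))
    balanced = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(balanced)  # random permutation of the balanced set
    return balanced


# Made-up data: 6 days from one sector, 3 days from another (not real data).
data: List[Instance] = [([float(i)] * 24, 1) for i in range(6)]
data += [([float(-i)] * 24, -1) for i in range(1, 4)]

balanced = balance_and_permute(data)
labels = [label for _, label in balanced]
print(len(balanced), labels.count(1), labels.count(-1))  # 6 3 3
```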
Model Selection
• Report Estimated Prediction Error (EPE)
• Model Selection:
  • Cross-Validation [Stone 1974]
  • Wilcoxon-Mann-Whitney Test
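As a rough sketch of how cross-validation yields an estimated prediction error, the code below computes the misclassification rate over k folds. The nearest-centroid classifier is only a stand-in to keep the example self-contained; it is not one of the methods used in the talk. The fold layout, function names, and data are assumptions.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in classifier (not the talk's method): class of the nearer centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(-1)

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def predict(x: Vector) -> int:
        return 1 if sq_dist(x, c_pos) <= sq_dist(x, c_neg) else -1

    return predict


def cross_validated_error(X: List[Vector], y: List[int],
                          fit: Callable[[List[Vector], List[int]], Predictor],
                          k: int = 5) -> float:
    """Misclassification rate, with every instance held out exactly once."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th item; assumes shuffled data
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        predict = fit(X_train, y_train)
        errors += sum(1 for i in test_idx if predict(X[i]) != y[i])
    return errors / n


# Made-up, well-separated 2-dimensional data (not real data).
X: List[Vector] = []
y: List[int] = []
for i in range(5):
    X.append([3.0 + 0.1 * i, 3.0])
    y.append(1)
    X.append([-3.0 - 0.1 * i, -3.0])
    y.append(-1)

epe = cross_validated_error(X, y, fit_nearest_centroid, k=5)
print(epe)  # 0.0 on this trivially separable toy set
```

Passing the fitting routine as a parameter lets the same loop compare different models, which is how cross-validation supports model selection.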
Parameters of SVMs

• Kernel functions
  • Linear, Gauss, Poly, Linear CN, Gauss CN, Poly CN
• Kernel parameters
• Cost function
  • 1-norm, 2-norm
• In total, over 800 combinations possible
  • best one estimated by cross-validation
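The three basic kernels named above have standard textbook forms, sketched below together with a small parameter grid of the kind that cross-validation would search. The grid values, the cost values, and the parameter names `sigma`, `degree`, and `c` are hypothetical. The "CN" variants are left out because the slides do not define them.

```python
import math
from itertools import product
from typing import Sequence


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x: Sequence[float], z: Sequence[float],
                    sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 2, c: float = 1.0) -> float:
    return (linear_kernel(x, z) + c) ** degree


# Hypothetical search grid: kernel settings x cost norm x cost value.
kernel_configs = [("linear", {})]
kernel_configs += [("gauss", {"sigma": s}) for s in (0.1, 1.0, 10.0)]
kernel_configs += [("poly", {"degree": d}) for d in (2, 3)]
cost_norms = ("1-norm", "2-norm")
cost_values = (0.1, 1.0, 10.0)  # assumed trade-off values, not from the talk

grid = list(product(kernel_configs, cost_norms, cost_values))
print(len(grid))  # 36 combinations in this small illustrative grid
```

Each grid entry would be scored by cross-validation and the combination with the lowest estimated prediction error kept.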
Results
Summary

• characteristics in high-dimensional data
• comparison of a very simple and a very complicated classification method
Conclusions
• There are systematic differences between different sectors
• SVMs do not promise major improvement
  • no more than 4% better than logistic regression
• Linear prediction is possible
• Expected prediction errors around 15%
Future Work
• (Black box) prediction is not satisfactory
• Better understanding of the underlying processes
  • reasons for the differences
  • model of the probability distribution of planned traffic and realized traffic
Questions?
• Thanks for your attention!

Results: Is Week End?
[Table: rows are sectors UR1, UR2, UR3, UR4; columns are Raw, Bal+Perm, Variable Sel, Random; cell marks not recoverable]
Known: Causes for Planning Differences
• Departure slot adherence
• Inconsistent profile
• # over-deliveries
• Regulations too late
• Weekday, season
• Weather
• CASA implementation time
• Slot tolerance window
• Missing flight plans
• Incorrect flight plan information
• In-flight change of route
Source: Independent Study for the Improvement of ATFM, Final Report, 2000
Priorities: Very High, High, Medium, Unknown
Little known: Dynamics of Planning Differences
[Figure: 'error' propagation across Sector 1, Sector 2, ..., Sector n; x-axis: time; y-axis: number of planning differences]
Related Work: simulation studies, reactionary delay studies
Summary Motivation
• Are planning differences unpredictable?
• Or are there hidden 'regularities'?

Possible Research Questions
• Propagation over the network
• Dependence on traffic density, sector complexity, ...
• ...
• Characteristics
• Comparison of different sectors

Notation
• A sector is represented as a vector of 24 variables, one for each hour
• An instance is a value of this vector
• An instance belongs to class 1 or -1, depending on the sector from which it was drawn
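The notation above can be written compactly. The symbols $x$, $y$, $n$ and the index $i$ are assumptions introduced here for illustration; the slides do not name them.

```latex
x = (x_1, \dots, x_{24}) \in \mathbb{R}^{24}, \qquad
y \in \{+1, -1\}, \qquad
\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}
```

Here $x_h$ would be the planning difference in hour $h$ of one day, and $y$ marks which of the two sectors the day's values came from.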

Binary Classification
• Given: instances from sectors 1 and -1
• Question: a rule to decide to which sector a new instance might belong
• Example: if 'high peaks at noon' then class 1
  • Decision trees
Geometric and Probabilistic Approaches

• Geometric
  • Instances are points in Euclidean space
  • Rules are class boundaries
  • Problem: overlapping classes
• Probabilistic
  • Classes have an underlying probability distribution
  • Rules are class probabilities
  • Problem: which distribution?
[Figure: example in which instances are 2-dimensional]