An adaptive modular approach to the mining of sensor network data
G. Bontempi, Y. Le Borgne (1)
{gbonte,yleborgn}@ulb.ac.be
Machine Learning Group
Université Libre de Bruxelles – Belgium
(1) Supported by the COMP2SYS project, sponsored by the HRM program of the European Community
(MEST-CT-2004-505079)
Outline
Wireless sensor networks: Overview
Machine learning in WSN
An adaptive two-layer architecture
Simulation and results
Conclusion and perspective
Sensor networks: Overview
Goal: allow for a sensing task over an environment
Desiderata for the nodes:
Autonomous power
Wireless communication
Computing capabilities
Smart dust project
Smart dust: Get mote size down to 1mm³
Berkeley - Deputy dust (2001)
6mm³
Solar powered
Acceleration and light sensors
Optical communication
Low cost in large quantities
Currently available sensors
Crossbow: Mica / Mica dot
µProc: 4 MHz, 8-bit Atmel RISC
Radio: 40 kbit/s at 900/450/300 MHz, or 250 kbit/s at 2.4 GHz (MicaZ, 802.15.4)
Memory: 4 KB RAM / 128 KB program flash / 512 KB data flash
Power: 2 x AA or coin cell
Intel: iMote
µProc: 12 MHz, 16-bit ARM
Radio: Bluetooth
Memory: 64 KB SRAM / 512 KB data flash
Power: 2 x AA
MoteIV: Telos
µProc: 8 MHz, 16-bit TI RISC
Radio: 250 kbit/s at 2.4 GHz (802.15.4)
Memory: 2 KB RAM / 60 KB program flash / 512 KB data flash
Power: 2 x AA
Applications
Wildfire monitoring
Ecosystem monitoring
Earthquake monitoring
Precision agriculture
Object tracking
Intrusion detection
…
Challenges for…
Electronics
Networking
Systems
Databases
Statistics
Signal processing
…
Machine learning and WSN
Local scale
Spatio-temporal correlations
Local predictive model identification
Can be used to:
Reduce sensor communication activity
Predict values for malfunctioning sensors
Machine learning and WSN
Global scale
The network as a whole can achieve high-level tasks
Sensor network <-> Image
Supervised learning and WSN
Classification (Traffic type classification)
Prediction (Pollution forecast)
Regression (Wave intensity, population
density)
A supervised learning scenario
S: a network of S sensors
x(t) = {s1(t), s2(t), …, sS(t)}: the network snapshot at time t
y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε standing for noise)
Let DN be a set of N observations (x(t), y(t))
Goal: find a model that predicts y for any new x
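A minimal NumPy sketch of this setup (the sensor signals and the function f below are invented placeholders, not the wave simulation used later in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

S = 20        # number of sensors in the network
N = 1500      # number of observed snapshots

# x(t) = (s_1(t), ..., s_S(t)): one snapshot of all sensor readings at time t.
# Here the readings are correlated random signals used only as placeholders.
base = rng.normal(size=(N, 1))
X = base + 0.1 * rng.normal(size=(N, S))

# y(t) = f(x(t)) + eps(t): the value associated with the network at time t,
# with eps an additive noise term.  The mean is an arbitrary example of f.
y = X.mean(axis=1) + 0.05 * rng.normal(size=N)

# D_N: the set of N observations (x(t), y(t)) used to learn the model.
D_N = list(zip(X, y))
```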
Centralized approach
High transmission overhead
Two-layer approach
Use of compression
Reduce transmission overhead
Spatial correlation induces low loss in compression
Reduction of learning problem dimensionality
Two-layer adaptive approach
PAST: Online compression
Lazy learning: Online learning
Compression: PCA
PCA: transform the set of n input variables x = (x1, …, xn) into a set of m variables z = (z1, …, zm), with m < n
Linear transformation: z = Wᵀx, where W is an n×m matrix
Criterion: variance preserving maximization of E[‖Wᵀx‖²] under WᵀW = I
Solution: the columns of W are the m first eigenvectors of the correlation matrix of x, or equivalently the minimization of the reconstruction error E[‖x − WWᵀx‖²]
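A minimal NumPy sketch of this batch PCA compression (function and variable names are illustrative, not part of the original slides):

```python
import numpy as np

def pca_compress(X, m):
    """Project n-dimensional snapshots X (N x n) onto the m first
    principal components; return the m-dimensional codes Z and the
    projection matrix W (n x m)."""
    Xc = X - X.mean(axis=0)                 # center the snapshots
    C = (Xc.T @ Xc) / len(Xc)               # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(C)      # eigh: eigenvalues in ascending order
    W = eigvec[:, ::-1][:, :m]              # keep the m leading eigenvectors
    Z = Xc @ W                              # compressed representation, N x m
    return Z, W

# Reconstruction: X_hat = Z @ W.T + X.mean(axis=0).  With spatially
# correlated sensors, a small m already preserves most of the variance.
```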
PAST – Recursive PCA
Projection Approximation Subspace Tracking [YAN95]
Online formulation of the PCA criterion (restated below)
Low memory requirement and computational complexity: O(n·m) + O(m²)
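For completeness, a restatement of the PAST criterion from [YAN95] (the slide's own formula is not in the transcript, so this is a reconstruction from the reference):

```latex
% Exponentially weighted subspace criterion tracked by PAST:
J\bigl(W(t)\bigr)=\sum_{i=1}^{t}\beta^{\,t-i}
  \bigl\|x(i)-W(t)\,W(t)^{T}x(i)\bigr\|^{2},\qquad 0<\beta\le 1 .
% The projection approximation replaces W(t)^T x(i) by
% y(i)=W(i-1)^T x(i), turning the minimization into an exponentially
% weighted least-squares problem that can be solved recursively.
```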
PAST Algorithm
Recursive formulation from [HYV01] (a sketch of the recursion follows below)
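A Python sketch of the PAST recursion as given in [YAN95]/[HYV01] (the class structure and the initialization of W and P are illustrative choices):

```python
import numpy as np

class PAST:
    """Projection Approximation Subspace Tracking: recursively tracks the
    m-dimensional principal subspace of n-dimensional snapshots.
    Cost per update is O(n*m) + O(m^2), as stated on the slide."""

    def __init__(self, n, m, beta=0.99):
        self.beta = beta                        # forgetting factor
        self.W = np.eye(n, m)                   # subspace estimate, n x m
        self.P = np.eye(m)                      # inverse correlation of the codes

    def update(self, x):
        y = self.W.T @ x                        # compressed code, length m
        h = self.P @ y
        g = h / (self.beta + y @ h)             # gain vector
        self.P = (self.P - np.outer(g, h)) / self.beta
        e = x - self.W @ y                      # reconstruction error
        self.W = self.W + np.outer(e, g)        # subspace update
        return y                                # code transmitted to the gateway
```

In the two-layer architecture, each supernode would keep one such tracker for its cluster and forward only the low-dimensional code y.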
Learning algorithm
Lazy learning: k-NN approach
Storage of the observation set DN = {(x(i), y(i)), i = 1, …, N}
When a query q is asked, take the k nearest neighbours of q in DN
Build a local linear model β by least squares on these k neighbours
Compute the output at q by applying the local model to q
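A minimal Python sketch of this lazy prediction step for a fixed k (names are illustrative; the intercept is handled by prepending a constant 1 to the inputs):

```python
import numpy as np

def lazy_predict(X, y, q, k):
    """Predict the output at query q from the k nearest stored
    observations, using a local linear model fitted on them."""
    d = np.linalg.norm(X - q, axis=1)           # distances to the query
    idx = np.argsort(d)[:k]                     # indices of the k neighbours
    Xk = np.column_stack([np.ones(k), X[idx]])  # local design matrix [1, x]
    beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)  # local least squares
    return np.concatenate(([1.0], q)) @ beta    # apply the local model at q
```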
How many neighbours?
•y = sin(x) + e
•e: Gaussian noise with σ = 0.1
•What is the y value at x = 1.5?
How many neighbours?
•K=2: Overfitting
•K=3: Overfitting
•K=4: Overfitting
•K=5: Good
•K=6: Underfitting
Automatic model selection ([BIR99], [BON99], [BON00])
Starting with a low k, local models are identified
Their quality is assessed by a leave-one-out procedure
The best model(s) are kept for computing the prediction
Low computational cost:
PRESS statistic [ALL74]
Recursive least squares [GOO84]
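A simple Python sketch of this selection step, scoring each candidate k by its leave-one-out error computed from the PRESS residuals e_i/(1 − h_ii) of the local least-squares fit [ALL74]; the recursive least-squares machinery of [BIR99]/[GOO84] is omitted here for brevity:

```python
import numpy as np

def lazy_predict_select(X, y, q, k_min=3, k_max=10):
    """Lazy prediction with automatic selection of the number of
    neighbours by the leave-one-out (PRESS) criterion."""
    d = np.linalg.norm(X - q, axis=1)
    order = np.argsort(d)
    best_press, best_pred = np.inf, None
    for k in range(k_min, min(k_max, len(order)) + 1):
        idx = order[:k]
        Xk = np.column_stack([np.ones(k), X[idx]])
        beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
        H = Xk @ np.linalg.pinv(Xk.T @ Xk) @ Xk.T        # hat matrix of the local fit
        e_loo = (y[idx] - Xk @ beta) / (1.0 - np.diag(H))  # PRESS residuals
        press = np.mean(e_loo ** 2)                      # leave-one-out MSE
        if press < best_press:
            best_press, best_pred = press, np.concatenate(([1.0], q)) @ beta
    return best_pred

# Toy usage in the spirit of the sine slides: y = sin(x) + Gaussian noise,
# query at x = 1.5.
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
print(lazy_predict_select(X, y, np.array([1.5])))
```

On an example like the sine data of the previous slides, this procedure keeps the number of neighbours with the lowest estimated leave-one-out error rather than relying on a fixed choice of k.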
Advantages of PAST and lazy learning
No assumption on the process underlying the data
Online learning capability
Adaptive to non-stationarity
Low computational and memory costs
Simulation
Modeling a wave propagation phenomenon
Helmholtz equation: ∇²u + k²u = 0, where k is the wave number
•2372 sensors
•30 values of k between 1 and 146; 50 time instants
•1500 observations
•The output k is noisy
Test procedure
Prediction error measurement: Normalized Mean Square Error (NMSE)
10-fold cross-validation (1350 training / 150 test observations)
Example of a learning curve:
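For reference, the NMSE is assumed here to follow the usual convention of normalizing the squared prediction error by the variance of the target (the slide does not spell the formula out):

```latex
\mathrm{NMSE}=\frac{\sum_{t}\bigl(y(t)-\hat{y}(t)\bigr)^{2}}
                   {\sum_{t}\bigl(y(t)-\bar{y}\bigr)^{2}},
\qquad \bar{y}=\frac{1}{N}\sum_{t}y(t)
```

so that NMSE = 1 corresponds to always predicting the mean of the observed outputs.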
Experiment 1
Centralized configuration
Comparison of PCA and PAST for 1 to 16 first principal components
Results
m           1      2      3      4      5      6      8      12     16
NMSE PCA    0.621  0.266  0.181  0.144  0.138  0.134  0.133  0.124  0.116
NMSE PAST   0.782  0.363  0.257  0.223  0.183  0.196  0.132  0.124  0.115
•Prediction accuracy is similar when the number of principal components is sufficient
Clustering
The number of clusters involves a tradeoff between:
The routing costs between clusters and the gateway
The final prediction accuracy
The robustness of the architecture
Experiment 2
Partitioning into geographical clusters
P varies from P(2) to P(7)
2 principal components for each cluster
Ten-fold cross-validation on 1500 observations
Example of the P(2) partitioning:
Results
        P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
NMSE    0.140  0.118  0.118  0.118  0.116  0.114
•Comparison of P(2) (top) and P(5) (bottom) error curves
•As the number of clusters increases:
•Better accuracy
•Faster convergence
Experiment 3
Simulation: at each time instant
Probability 10% of a sensor failure
Probability 1% of a supernode failure
Recursive PCA and lazy learning deal efficiently with variations of the input space dimension
Robust to random sensor malfunctions
Results
        P(2)   P(3)   P(4)   P(5)   P(6)   P(7)
NMSE    0.501  0.132  0.119  0.116  0.116  0.117
•Comparison of P(2) (top) and P(5) (bottom) error curves
•Increasing the number of clusters increases the robustness
Experiment 4
Time-varying changes in the sensor measurements
2700 time instants
The sensor response decreases linearly from a factor 1 to a factor 0.4
A temporal window: only the last 1500 measurements are kept
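A minimal sketch of such a temporal window at the gateway (a fixed-length buffer of the most recent observations; the figure of 1500 comes from the slide):

```python
from collections import deque

WINDOW = 1500                      # number of most recent observations kept

window = deque(maxlen=WINDOW)      # older observations are dropped automatically

def observe(x_t, y_t):
    """Store the latest (snapshot, output) pair; lazy learning then answers
    queries using only the observations currently in the window."""
    window.append((x_t, y_t))
```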
Results
•Due to the concept drift, the fixed model (in black) becomes outdated
•The lazy character of the proposed architecture deals with this drift very easily
Conclusion
Architecture:
Yields good results compared to its batch equivalent
Computationally efficient
Adaptive to appearing and disappearing units
Handles non-stationarity easily
Future work
Extension of the tests to real-world data
Improvement of the clustering strategy
Taking costs (routing/accuracy) into consideration
Making use of the ad hoc features of the network
Test of other compression procedures
Robust PCA
ICA
References
Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive/users/warneke-brett/SmartDust/
Crossbow: http://www.xbow.com/
[BON99] G. Bontempi. Local Techniques for Modeling, Prediction and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles, 1999.
[YAN95] B. Yang. Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 43(1):95-107, 1995.
[ALL74] D. M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125-127, 1974.
[GOO84] G. C. Goodwin and K. S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
[HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.
References on lazy learning
[BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
[BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann.
[BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.
Thanks for your attention!