IoT and Machine Learning


International Workshop on Big Data Applications and Principles
Madrid
By Ajit Jaokar
Sep 2014
@ajitjaokar
[email protected]
Copyright : Futuretext Ltd. London
www.opengardensblog.futuretext.com
World Economic Forum
Spoken at MWC (5 times), CEBIT, CTIA, Web 2.0,
CNN, BBC, Oxford Uni, Uni St Gallen, European
Parliament. @feynlabs – teaching kids Computer
Science. Advisory – Connected Liverpool
Machine Learning for IoT and Telecoms
futuretext applies machine learning techniques to complex problems in the
IoT (Internet of Things) and Telecoms domains.
We aim to provide a distinct competitive advantage to our customers through
application of machine learning techniques
Philosophy:
Think of NEST. NEST has no interface. Its
interface is based on ‘machine learning’, i.e. it
learns and becomes better with use. This will be
common with ALL products and will determine
the competitive advantage of companies. It's a
winner-takes-all game! Every product will have a
‘self learning’ interface/component and the
product which learns best will win!
• IoT
• Machine Learning
• IoT and Machine Learning
• Case studies and applications
www.futuretext.com
Image source: Guardian
IOT - THE INDUSTRY- STATE OF PLAY
State of play - 2014
• Our industry is exciting but mature. It is now a two-horse race for
devices, with Samsung at around 70% of Android
• Spectrum allocations and ‘G’ cycles are predictable: 5G around 2020
• 50 billion connected devices by 2020
• ITU World Radiocommunication Conference, November 2015
• IOT has taken off, not because of EU and corporate efforts, but because of
mobile, Kickstarter, health apps, iBeacon and of course NEST (acquired
by Google)
Stage One: Early innovation 1999 - 2007
Regulatory innovation – net neutrality - Device innovation (Nokia
7110 and Ericsson t68i) - Operator innovation (pricing, bundling,
Enterprise) - Connectivity innovation (SMS, BBM)
Content innovation (ringtones, games, EMS, MMS) - Ecosystem
innovation (iPhone)
Stage two: Ecosystem innovation - iPhone and
Android (2007 – 2010)
Social innovation - Platform innovation - Community
innovation - Long tail innovation - Application
innovation
Phase three: Market consolidation – 2010 - 2013
And then there were two ...
Platform innovation and consolidation
Security innovation
App innovation
Phase four – three dimensions – 2014 ..
Horizontal apps (iPhone and Android)
Vertical (across the stack) – hardware, security, Data
Network – 5G and pricing
Many of the consumer IOT cases will happen with iBeacon in the next
two years
And 5G will provide the WAN connectivity (source: Ericsson)
Samsung Gear Fit named “Best Mobile Device” of Mobile World
Congress
Notification or Quantification? – Displays (LED, e-paper,
Mirasol, OLED and LCD) - Touchscreen or hardware controls? Battery life and charging
Hotspot 2.0
Three parallel ecosystems
IOT is connecting things to the Internet – which is not the same as
connecting things to the cellular network!
The difference is money .. and customers realise it
IOT local/personal (iBeacon, Kickstarter, Health apps)
M2M – Machine to Machine
IOT – pervasive(5G, Hotspot 2.0)
Perspectives
• 2014 – 2015(radio conf) – 2020(5G, 2020)
• 2014 – iBeacon (motivate retailers to open WiFi)
• Hotspot 2.0 – connect cellular and wifi worlds
• Default wifi and local world?
• Operator world – (Big)Data, Corporate, pervasive apps – really happen
beyond 2020
• So 5G will be timed well. The ecosystems will develop and they will be
connected by 5G
IOT – INTERNET OF THINGS
As the term Internet of Things (IOT) implies, IOT is about smart
objects
For an object (say a chair) to be ‘smart’ it must have three things:
- An identity (to be uniquely identifiable, via IPv6)
- A communication mechanism (i.e. a radio) and
- A set of sensors / actuators
For example,
the chair may have a pressure sensor indicating that it is occupied
Now, if it is able to know who is sitting, it could correlate more data by
connecting to the person’s profile
If it is in a cafe, whole new data sets can be correlated (about the venue,
about who else is there etc)
Thus, IOT is all about data.
IoT != M2M (M2M is a subset of IoT)
Sensors lead to a LOT of data (relative to mobile) (source: David
Wood's blog)
By 2020, we are expected to have 50 billion connected devices
To put in context:
The first commercial citywide cellular network was launched in Japan by NTT
in 1979
The milestone of 1 billion mobile phone connections was reached in 2002
The 2 billion mobile phone connections milestone was reached in 2005
The 3 billion mobile phone connections milestone was reached in 2007
The 4 billion mobile phone connections milestone was reached in February
2009.
Gartner: IoT will unearth more than $1.9 trillion in revenue before 2020; Cisco thinks
there will be upwards of 50 billion connected devices by the same date; IDC estimates
technology and services revenue will grow worldwide to $7.3 trillion by 2017 (up
from $4.8 trillion in 2012).
So, 50 billion by 2020 is a large number
Smart cities can be seen as an application domain of IOT
In 2008, for the first time in history, more than half of the world’s
population was living in towns and cities.
By 2030 this number will swell to almost 5 billion, with urban growth
concentrated in Africa and Asia with many mega-cities(10 million +
inhabitants).
By 2050, 70% of humanity will live in cities.
That’s a profound change and will lead to a different management approach
than what is possible today
Also, economic wealth of a nation could be seen as – Energy +
Entrepreneurship + Connectivity (sensor level + network level +
application level)
Hence, if IOT is seen as a part of a network, then it is a core component of
GDP.
Machine Learning
What is Machine Learning?
Mitchell's Machine Learning
Tom Mitchell, in his book Machine Learning: “The field of machine learning is
concerned with the question of how to construct computer
programs that automatically improve with experience.”
Formally:
“A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.”
Think of it as a design tool where we need to understand:
What data to collect for the experience (E)
What decisions the software needs to make (T) and
How we will evaluate its results (P).
A programmer's perspective:
Machine Learning involves:
a) Training a model from data
b) Predicting / extrapolating a decision
c) Evaluating it against a performance measure.
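As a minimal sketch of the E / T / P framing above, the toy example below learns a temperature cut-off from labelled readings (experience E), uses it to decide whether a new reading means "overheating" (task T), and scores the decisions by accuracy (performance measure P). The data, the function names and the choice of a midpoint threshold are all illustrative assumptions, not something taken from the talk.

```python
# Hypothetical toy data: (temperature reading, label), where 1 = "overheating".
train = [(20.0, 0), (22.5, 0), (25.0, 0), (31.0, 1), (33.5, 1), (36.0, 1)]
test = [(21.0, 0), (24.0, 0), (32.0, 1), (35.0, 1)]


def train_threshold(examples):
    """Experience E: learn a cut-off halfway between the two class means."""
    lows = [x for x, y in examples if y == 0]
    highs = [x for x, y in examples if y == 1]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


def predict(threshold, x):
    """Task T: decide the label for a new reading."""
    return 1 if x > threshold else 0


def accuracy(threshold, examples):
    """Performance measure P: fraction of correct decisions."""
    hits = sum(predict(threshold, x) == y for x, y in examples)
    return hits / len(examples)


threshold = train_threshold(train)
test_accuracy = accuracy(threshold, test)
print(threshold, test_accuracy)
```

Scoring on readings that were not used for training is what makes P a measure of generalization rather than of memorization.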
What Problems Can Machine Learning Address? (source: Jason
Brownlee)
• Spam Detection
• Credit Card Fraud Detection
• Digit Recognition
• Speech Understanding
• Face Detection
• Product Recommendation
• Medical Diagnosis
• Stock Trading
• Customer Segmentation
• Shape Detection
Types of Problems
• Classification: Data is labelled, meaning it is assigned a class, for example
spam/non-spam or fraud/non-fraud. The decision being modelled is to
assign labels to new unlabelled pieces of data. This can be thought of as a
discrimination problem, modelling the differences or similarities between groups.
• Regression: Data is labelled with a real value rather
than a label. Examples that are easy to understand are time series data like the price of
a stock over time. The decision being modelled is the relationship between
inputs and outputs.
• Clustering: Data is not labelled, but can be divided into groups based on
similarity and other measures of natural structure in the data.
An example from the above list would be organising pictures by faces without
names, where the human user has to assign names to groups, like iPhoto on the Mac.
• Rule Extraction: Data is used as the basis for the extraction of
propositional rules (antecedent/consequent or if-then).
It is often necessary to work backwards from a problem to the algorithm and then work with
the data. Hence, you need depth of domain experience as well as algorithm experience.
What Algorithms Does Machine Learning Provide?
Regression
Instance-based Methods
Decision Tree Learning
Bayesian
Kernel Methods
Clustering methods
Association Rule Learning
Artificial Neural Networks
Deep Learning
Dimensionality Reduction
Ensemble Methods
An Algorithmic Perspective
Marsland adopts the Mitchell definition of Machine Learning in his book
Machine Learning: An Algorithmic Perspective.
“One of the most interesting features of machine learning is that it lies on the
boundary of several different academic disciplines, principally computer
science, statistics, mathematics, and engineering (multidisciplinary).
…machine learning is usually studied as part of artificial intelligence, which
puts it firmly into computer science …understanding why these algorithms
work requires a certain amount of statistical and mathematical sophistication
that is often missing from computer science undergraduates.”
Definition of Machine Learning
A one-sentence definition is:
“Machine Learning is the training of a model from data that generalizes a
decision against a performance measure.”
1) Training a model suggests training examples.
2) A model suggests state acquired through experience.
3) Generalizes a decision suggests the capability to
make a decision based on inputs and anticipating unseen inputs in the future
for which a decision will be required.
4) Finally, against a performance measure suggests a targeted need and
directed quality to the model being prepared.
Key concepts
Data
Instance: A single row of data is called an instance. It is an observation
from the domain.
Feature: A single column of data is called a feature. It is a component of an observation
and is also called an attribute of a data instance. Some features may be inputs to a model (the
predictors) and others may be outputs or the features to be predicted.
Data Type: Features have a data type. They may be real or integer valued
or may have a categorical or ordinal value. You can have strings, dates,
times, and more complex types, but typically they are reduced to real or categorical values
when working with traditional machine learning methods.
Datasets: A collection of instances is a dataset and when working with
machine learning methods we typically need a few datasets for different purposes.
Training Dataset: A dataset that we feed into our machine learning
algorithm to train our model.
Testing Dataset: A dataset that we use to validate the accuracy
of our model but is not used to train the model. It may be called the validation dataset.
Learning
Machine learning is indeed about automated learning with algorithms.
In this section we will consider a few high-level concepts about learning.
Induction: Machine learning algorithms learn through a process called
induction or inductive learning. Induction is a reasoning process that makes
generalizations (a model) from specific information (training data).
Generalization: Generalization is required because the model that is
prepared by a machine learning algorithm needs to make predictions or
decisions based on specific data instances that were not seen during training.
Over-learning: When a model learns the training data too closely and does not generalize, this
is called over-learning. The result is poor performance on data other than the training dataset. This is also
called overfitting.
Under-learning: When a model has not learned enough structure from the database because the
learning process was terminated early, this is called under-learning. The result is good generalization
but poor performance on all data, including the training dataset.
Online Learning: Online learning is when a method is updated with data
instances from the domain as they become available.
Online learning requires methods that are robust to noisy data but can
produce models that are more in tune with the current state of the domain.
Offline Learning: Offline learning is when a method is created on
pre-prepared data and is then used operationally on unobserved data.
The training process can be controlled and tuned carefully because the
scope of the training data is known.
The model is not updated after it has been prepared and performance may
decrease if the domain changes.
Supervised Learning: This is a learning process for generalizing on
problems where a prediction is required. A "teaching process" compares predictions by
the model to known answers and makes corrections in the model.
Unsupervised Learning: This is a learning process for generalizing the
structure in the data where no prediction is required. Natural structures are
identified and exploited for relating instances to each other.
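To make the online / offline distinction concrete, here is a small sketch in which the "model" is simply an estimate of the average sensor reading. The offline version is fitted once on a prepared batch and then frozen; the online version is updated with every new instance. The class name, function name and readings are hypothetical and chosen only for illustration.

```python
class OnlineMean:
    """Online learner: the estimate is updated one instance at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental mean: no need to store past instances.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


def offline_mean(batch):
    """Offline learner: fitted once on pre-prepared data, then left unchanged."""
    return sum(batch) / len(batch)


history = [10.0, 12.0, 11.0, 13.0]       # data available at training time
frozen = offline_mean(history)            # never updated again

online = OnlineMean()
for reading in history:
    online.update(reading)

# The domain changes: readings jump to a new level.
for reading in [20.0, 20.0, 20.0, 20.0]:
    online.update(reading)

print(frozen, online.mean)
```

The frozen estimate stays at the old level after the domain shifts, while the online estimate moves towards the new one, at the price of reacting to every noisy reading as well.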
Technique: Classification
Applicability: Most commonly used technique for predicting a specific outcome such as
response / no-response, high / medium / low-value customer, likely to buy / not buy.
Algorithms:
• Logistic Regression: classic statistical technique but now available inside the Oracle
Database and supports text and transactional data
• Naive Bayes: fast, simple, commonly applicable
• Support Vector Machine: next generation, supports text and wide data
• Decision Tree: popular, provides human-readable rules
Source: Oracle
Technique: Regression
Applicability: Technique for predicting a continuous numerical outcome such as customer
lifetime value, house value, process yield rates.
Algorithms:
• Multiple Regression: classic statistical technique but now available inside the
Oracle Database and supports text and transactional data
• Support Vector Machine: next generation, supports text and wide data

Technique: Attribute Importance
Applicability: Ranks attributes according to strength of relationship with target
attribute. Use cases include finding factors most associated with customers who respond to
an offer, factors most associated with healthy patients.
Algorithms:
• Minimum Description Length: considers each attribute as a simple predictive model of the
target class
Source: Oracle
Technique: Anomaly Detection
Applicability: Identifies unusual or suspicious cases based on deviation from the norm.
Common examples include health care fraud, expense report fraud, and tax compliance.
Algorithms:
• One-Class Support Vector Machine: trains on "normal" cases to flag unusual cases

Technique: Clustering
Applicability: Useful for exploring data and finding natural groupings. Members of a cluster
are more like each other than they are like members of a different cluster. Common
examples include finding new customer segments, and life sciences discovery.
Algorithms:
• Enhanced K-Means: supports text mining, hierarchical clustering, distance based
• Orthogonal Partitioning Clustering: hierarchical clustering, density based
• Expectation Maximization: clustering technique that performs well in mixed
(dense and sparse) data mining problems
Source: Oracle
Technique: Association
Applicability: Finds rules associated with frequently co-occurring items, used for market
basket analysis, cross-sell, root cause analysis. Useful for product bundling, in-store
placement, and defect analysis.
Algorithms:
• Apriori: industry standard for market basket analysis

Technique: Feature Selection and Extraction
Applicability: Produces new attributes as linear combination of existing attributes.
Applicable for text data, latent semantic analysis, data compression, data
decomposition and projection, and pattern recognition.
Algorithms:
• Non-negative Matrix Factorization: next generation, maps the original data into the new
set of attributes
• Principal Components Analysis (PCA): creates new, fewer composite attributes that represent
all the attributes
• Singular Value Decomposition: established feature extraction method that has
a wide range of applications
Source: Oracle
Recap machine learning with IoT
Supervised Learning
In supervised learning, a labeled training set (i.e., predefined inputs and
known outputs) is used to build the system model. This model is used to
represent the learned relation between the input, output and system
parameters
K-nearest neighbor (k-NN): This supervised learning algorithm classifies a data
sample (called a query point) based on the labels (i.e., the output values) of the nearby
data samples. For example, missing readings of a sensor node can be predicted
using the average measurements of neighboring sensors within specific
diameter limits. There are several functions to determine the nearest set of nodes. A
simple method is to use the Euclidean distance between different sensors.
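The missing-reading example can be sketched as follows: take the sensors within a distance limit of the failed node, keep the k nearest by Euclidean distance, and average their readings. The function name `knn_impute`, the parameters `k` and `max_dist`, and the sensor positions are assumptions made for this illustration.

```python
import math


def knn_impute(target_pos, sensors, k=3, max_dist=10.0):
    """Estimate a missing reading as the mean of the k nearest sensors.

    sensors is a list of ((x, y), reading) pairs; sensors further away
    than max_dist are ignored. Returns None if no sensor is in range.
    """
    in_range = []
    for pos, reading in sensors:
        d = math.dist(target_pos, pos)  # Euclidean distance
        if d <= max_dist:
            in_range.append((d, reading))
    in_range.sort(key=lambda pair: pair[0])
    nearest = in_range[:k]
    if not nearest:
        return None
    return sum(reading for _, reading in nearest) / len(nearest)


# Hypothetical deployment: three nearby sensors and one far away.
sensors = [
    ((0.0, 1.0), 20.0),
    ((1.0, 0.0), 22.0),
    ((2.0, 2.0), 24.0),
    ((50.0, 50.0), 90.0),
]
estimate = knn_impute((0.0, 0.0), sensors, k=3, max_dist=10.0)
print(estimate)
```

The distant sensor is excluded by the distance limit, so its very different reading does not distort the estimate.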
Decision tree (DT): A classification method for predicting labels of data by iterating
the input data through a learning tree. During this process, the feature properties are
compared relative to decision conditions to reach a specific category. For example, DT
provides a simple but efficient method to identify link reliability in WSNs by
identifying a few critical features such as loss rate, corruption rate, mean time to
failure (MTTF) and mean time to restore (MTTR).
Neural networks (NNs): This learning algorithm could be constructed by
cascading chains of decision units (e.g., perceptrons or radial basis
functions) used to recognize non-linear and complex functions. In WSNs,
using neural networks in a distributed manner is still not pervasive due to
the high computational requirements for learning the network weights, as
well as the high management overhead. However, in centralized solutions,
neural networks can learn multiple outputs and decision boundaries at once,
which makes them suitable for solving several network challenges using the
same model.
Support vector machines (SVMs): A machine learning algorithm that learns to
classify data points using labeled training samples. For example, one approach for
detecting malicious behavior of a node is by using SVM to investigate temporal and
spatial correlations of data. To illustrate, given a WSN's observations as points in the
feature space, SVM divides the space into parts. These parts are separated by margins
that are as wide as possible (i.e., separation gaps), and new readings will be classified
based on which side of the gaps they fall on.
Bayesian statistics: Unlike most machine learning algorithms, Bayesian
inference requires a relatively small number of training samples. One
application of Bayesian inference in WSNs is assessing event consistency
(θ) using incomplete data sets (D) by investigating prior knowledge about
the environment.
Unsupervised Learning
Unsupervised learners are not provided with labels (i.e., there is no output vector).
Basically, the goal of an unsupervised learning algorithm is to classify the sample set
into different groups by investigating the similarity between them. This family of learning
algorithms is widely used in node clustering and data aggregation problems.
K-means clustering: The k-means algorithm is used to group data into
different classes (known as clusters). This unsupervised learning algorithm
is widely used in the sensor node clustering problem due to its linear complexity
and simple implementation. The k-means steps to resolve such a node
clustering problem are (a) randomly choose k nodes to be the initial
centroids for different clusters; (b) label each node with the closest centroid
using a distance function; (c) re-compute the centroids using the current
node memberships and (d) stop if the convergence condition is valid (e.g., a
predefined threshold for the sum of distances between nodes and their
respective centroids), otherwise go back to step (b).
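Steps (a) to (d) can be sketched in a few lines of plain Python. This is an illustrative version for 2-D node positions, not a reference implementation: the function name, the optional `init` argument (added so the demo is reproducible), and the stopping rule "stop when the summed node-to-centroid distance no longer improves by more than `tol`" are assumptions of this sketch.

```python
import math
import random


def kmeans(nodes, k, init=None, seed=0, max_iter=100, tol=1e-9):
    """Cluster 2-D node positions; returns (centroids, labels)."""
    rng = random.Random(seed)
    # (a) choose k nodes as the initial centroids (random unless init is given)
    centroids = list(init) if init is not None else rng.sample(nodes, k)
    prev_cost = float("inf")
    labels = []
    for _ in range(max_iter):
        # (b) label each node with its closest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in nodes
        ]
        # (c) re-compute each centroid from its current members
        for j in range(k):
            members = [p for p, lab in zip(nodes, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # (d) stop once the summed distance stops improving
        cost = sum(math.dist(p, centroids[lab]) for p, lab in zip(nodes, labels))
        if prev_cost - cost < tol:
            break
        prev_cost = cost
    return centroids, labels


# Hypothetical deployment: two well-separated groups of sensor nodes.
nodes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(nodes, k=2, init=[nodes[0], nodes[3]])
print(labels)
```

Each pass touches every node once per centroid, which is the linear cost per iteration that makes the method attractive for resource-constrained sensor networks.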
Principal component analysis (PCA): A multivariate method for data
compression and dimensionality reduction that aims to extract important
information from data and present it as a set of new orthogonal
variables called principal components. For example, PCA reduces the
amount of transmitted data among sensor nodes by finding a small set of
uncorrelated linear combinations of original readings. Furthermore, the PCA
method simplifies problem solving by considering only a few conditions in
very large variable problems (i.e., turning big data into a tiny data
representation).
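As a rough sketch of the data-reduction idea, the code below finds the first principal component of two correlated sensor readings and transmits one score per sample instead of two values. It uses power iteration on the sample covariance matrix, which is one simple way to obtain the leading component; the function names and the toy readings are assumptions, and the sketch assumes the data are not constant (otherwise the normalization would divide by zero).

```python
import math


def first_component(rows, iters=100):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the readings.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Power iteration: repeatedly apply the covariance and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def compress(row, mean, v):
    """Project one reading vector onto the component: one number per sample."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))


def reconstruct(score, mean, v):
    """Approximate the original readings from the transmitted score."""
    return [m + score * c for m, c in zip(mean, v)]


# Hypothetical readings from two sensors where the second tracks twice the first.
readings = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, v = first_component(readings)
score = compress((3.0, 6.0), mean, v)
approx = reconstruct(score, mean, v)
print(v, score, approx)
```

Because these toy readings are perfectly correlated, one component reconstructs them exactly; with real sensor data the reconstruction is approximate and the discarded components are the accuracy cost.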
Reinforcement Learning: Reinforcement learning enables an agent (e.g., a sensor
node) to learn by interacting with its environment. The agent will learn to take the best
actions that maximize its long-term rewards by using its own experience. The most
well-known reinforcement learning technique is Q-learning, in which an agent regularly
updates its achieved rewards based on the taken action at a given state.
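The Q-learning update can be written as Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The sketch below applies that standard update to a table stored in a dictionary; the state and action names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Best value the agent believes it can get from the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical sensor node choosing between sleeping and transmitting.
actions = ["sleep", "transmit"]
q = {}
first = q_update(q, "idle", "transmit", 1.0, "idle", actions)
second = q_update(q, "idle", "transmit", 1.0, "idle", actions)
print(first, second)
```

Unseen state-action pairs default to 0.0 here, which is a common but arbitrary initialization choice.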
IoT and Machine Learning
The basic idea of machine learning is to build a mathematical model based on
training data (learning stage), predict results for new data (prediction
stage) and tweak the model based on new conditions
What type of model? Predictive, Classification, Clustering, Decision
Oriented, Associative
IoT and Machine Learning
On one hand, IoT creates a lot of contextual data which complements existing
processes
On the other hand, the sheer scale of IoT calls for unique solutions
Types of problems:
• Apply existing Machine Learning algorithms to IoT data
• Use IoT data to complement existing processes
• Use the scale of IoT data to gain new insights
• Consider some unique characteristics of IoT data (e.g. streaming)
IoT: from traditional computing to ..
We have gone from making smart things smarter (traditional computing) to
a) making dumb things smarter .. and
b) making living things more robust
3 Domains:
Consumer, Enterprise, Public infrastructure
1) Consumer – bio sensors (real time tracking), Quantified self – focussing on
benefits
2) Enterprise – Complex machinery (preventative maintenance), asset efficiency –
reducing assets, increasing efficiency of existing assets. Move from transactions to
relationships (real time context awareness).
3) Public infrastructure (dynamically adjust traffic lights). Dis-economies of
scale (bad things also scale in cities) – Thanks John Hagel III
Three key areas:
a) Move from exception handling to patterns of exceptions over time (are
some exceptions occurring repeatedly? Do I need to redesign my product? Is that a
new product?)
b) Move from optimization to disruption – ownership to rental (where are all
these dynamic assets?)
c) Move to self learning: Robotics: from assembly line to self-learning
robots (Boston Dynamics), autonomous helicopters
Four examples of differences:
Sensor fusion - Deep Learning - Real time - Streaming
Sensor fusion

Sensor fusion is the combining of sensory data or data derived from
sensory data from disparate sources such that the resulting information
is in some sense better than would be possible when these sources were
used individually. The data sources for a fusion process are not specified
to originate from identical sensors. Sensor fusion is a term that covers a
number of methods and algorithms, including: Central Limit Theorem,
Kalman filter, Bayesian networks, Dempster-Shafer
Example: http://www.camgian.com/ http://www.egburt.com/
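One simple way to see why fused information can be "better than would be possible when these sources were used individually" is inverse-variance weighting: each sensor's estimate is weighted by how precise it is, and the fused variance is smaller than any single sensor's. This is the static, scalar form of the measurement update used in a Kalman filter. The sketch assumes independent, unbiased sensors with known noise variances; the function name and the numbers are hypothetical.

```python
def fuse(estimates):
    """Fuse independent (value, variance) estimates of the same quantity.

    Returns (fused value, fused variance) using inverse-variance weights.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical: a noisy thermometer (variance 4.0) and a better one (variance 1.0).
fused_value, fused_variance = fuse([(20.0, 4.0), (22.0, 1.0)])
print(fused_value, fused_variance)
```

The fused value sits closer to the more precise sensor, and the fused variance is below that of the better sensor alone.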
Deep learning

Google's acquisition of DeepMind Technologies

In 2011, Stanford computer science professor Andrew Ng founded
Google’s Google Brain project, which created a neural network trained
with deep learning algorithms, which famously proved capable
of recognizing high level concepts, such as cats, after watching just
YouTube videos--and without ever having been told what a “cat” is.

A smart-object recognition algorithm that doesn’t need humans
http://www.kurzweilai.net/a-smart-object-recognition-algorithm-that-doesnt-needhumans
A feature construction method for general object recognition (Kirt Lillywhite,
Dah-Jye Lee, Beau Tippetts, James Archibald)
Real time:
Beyond ‘Hadoop’ (non hadoopable) the BDAS stack
BDAS Berkeley data analytics stack
Spark – an open source, in-memory, cluster computing framework.
Integrated with Hadoop(can work with files stored in HDFS)
Written in Scala
Real time (Stream processing)
Spark comes with tools: Interactive query analysis (Shark), Graph
processing and analysis (Bagel) and Real-time analysis (Spark Streaming).
RDDs (Resilient Distributed Datasets) are the fundamental data
objects used in Spark. RDDs are distributed objects that can be cached
in memory across a cluster of compute nodes.
Scales to 100s of nodes. Can
achieve second-scale latencies
Source: Tathagata Das (TD) UC Berkeley
Survey paper: Machine Learning in Wireless Sensor Networks: Algorithms, Strategies,
and Applications. Mohammad Abu Alsheikh et al., School of Computer Engineering,
Nanyang Technological University, Singapore 639798
Event detection and Query processing
Monitoring can be classified as: event-driven, continuous, or query-driven
Fundamentally, machine learning offers solutions to restrict query areas
and assess event validity for efficient event detection and query
processing mechanisms.
Advantages:
• Optimize limited resources, e.g. storage and processing
• Assess accuracy using simple classifiers
• Narrow down the search region (avoid flooding the network)
• More than a threshold detection (simplest case)
Event recognition through Bayesian algorithms: Krishnamachari and Iyengar
Use of WSNs for detecting environmental phenomena in a distributed manner.
Readings are considered faulty if their values exceed a specific threshold.
This study employs decentralized Bayesian learning that detects up to 95 percent
of the faults, which in turn allows the event region to be recognized.
Zappi et al.
A real-time approach for activity recognition using WSNs that
accurately detects body gesture and motion.
• Initially, the nodes, which are spread throughout the body, detect the
organ motion using an accelerometer sensor with three axis measurements
(positive, negative and null), where these measurements are used by a
hidden Markov model (HMM) to predict the activity at each sensor.
• Sensor activation and selection rely on the sensor's potential
contributions to classifier accuracy (i.e., select the sensors that provide the
most informative description of the gesture).
• To generate a final gesture decision, a naive Bayes classifier is used to
combine the independent node predictions so as to maximize the posterior
probability of the Bayes theorem.
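The final fusion step can be sketched as follows: treat each node's prediction as independent given the true gesture, multiply the prior by each node's likelihood of making that prediction, and pick the gesture with the highest posterior. This is a generic naive Bayes combination, not the exact model from Zappi et al.; the gesture names, node names, priors and per-node confusion probabilities are all made up for illustration, and the sketch assumes no probability is exactly zero.

```python
import math


def fuse_predictions(node_predictions, priors, confusion):
    """Return the gesture with the highest naive Bayes posterior.

    node_predictions: {node: predicted gesture}
    priors:           {gesture: P(gesture)}
    confusion:        confusion[node][gesture][predicted] =
                      P(node predicts `predicted` | true gesture)
    """
    scores = {}
    for gesture, prior in priors.items():
        # Work in log space to avoid underflow with many nodes.
        log_p = math.log(prior)
        for node, predicted in node_predictions.items():
            log_p += math.log(confusion[node][gesture][predicted])
        scores[gesture] = log_p
    return max(scores, key=scores.get)


# Hypothetical setup: the wrist node is more reliable than the elbow node.
priors = {"wave": 0.5, "point": 0.5}
confusion = {
    "wrist": {"wave": {"wave": 0.9, "point": 0.1},
              "point": {"wave": 0.2, "point": 0.8}},
    "elbow": {"wave": {"wave": 0.6, "point": 0.4},
              "point": {"wave": 0.3, "point": 0.7}},
}
decision = fuse_predictions({"wrist": "wave", "elbow": "point"}, priors, confusion)
print(decision)
```

When the two nodes disagree, the more reliable node dominates the decision, which is the behaviour a plain majority vote cannot provide.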
Forest fire detection through neural network:
WSNs were actively used in fire detection and rescue systems
Yu et al. presented a real-time forest fire detection scheme based on a
neural network method.
Data processing will be distributed to cluster heads, and only important
information will be aggregated to a final decision maker.
Although the idea is creative and beneficial to the environment, the
classification task and system core are hardly interpretable when
introducing such systems to decision makers.
Query processing through k-nearest neighbors:
K-nearest neighbor query is considered a highly effective query
processing technique in WSNs. Winter et al. developed an in-network query
processing solution using the k-nearest neighbor algorithm, namely the “K-NN
Boundary Tree” (KBT) algorithm. Each node that is aware of its location will determine
its k-NN search region whenever a query is received from the application manager.
Jayaraman et al. extended the query processing design. “3D-KNN” is a
query processing scheme for WSNs that adopts the k-nearest neighbor
algorithm. This approach restricts the query region to bound at least the
k nearest nodes deployed within a 3D space. In addition, signal-to-noise ratio
(SNR) and distance measurements are used to refine the k-nearest
neighbor.
The primary concerns of such k-NN-based algorithms for query processing
are the large memory footprint required to store every collected
sample and the high processing delay in large scale sensor networks.
Distributed event detection for disaster management using decision
tree: Bahrepour et al. developed decision tree-based event detection and
recognition for sensor network disaster prevention systems. The main
application of this decentralized mechanism is the fire detection in
residential areas. The final event detection decision is made by using a
simple vote from the highest reputation nodes.
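The final voting step might look like the sketch below: keep only the reports from the most reputable nodes and take a majority vote among them. How reputations are computed is not described on the slide, so the reputation scores, the `top_n` cut-off and the function name are purely hypothetical.

```python
from collections import Counter


def reputation_vote(reports, top_n=3):
    """Majority vote among the top_n nodes with the highest reputation.

    reports is a list of (reputation, decision) pairs, one per node.
    """
    trusted = sorted(reports, key=lambda r: r[0], reverse=True)[:top_n]
    votes = Counter(decision for _, decision in trusted)
    return votes.most_common(1)[0][0]


# Hypothetical reports: the low-reputation nodes disagree with the trusted ones.
reports = [
    (0.9, "fire"),
    (0.8, "fire"),
    (0.7, "no fire"),
    (0.2, "no fire"),
    (0.1, "no fire"),
]
decision = reputation_vote(reports, top_n=3)
print(decision)
```

A plain majority over all five nodes would say "no fire"; restricting the vote to the three most reputable nodes flips the outcome.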
Query optimization using principal component analysis (PCA):
Malik et al. optimized traditional query processing in WSNs using data
attributes and PCA, thus reducing the overhead of such a process. PCA has
been used to dynamically detect important attributes (i.e., dominant
principal components) among the whole correlated data set.
The proposed algorithm works in four fundamental steps, including:
• SQL request, which contains the human intelligible attributes, is sent to
the database management and optimization system. Here, the original
query is optimized where the high-variance components are
extracted from historical data using the PCA algorithm
• The optimized query is diffused to the wireless sensor network to extract
the sensory data. Later, the original attributes (i.e., human intelligible
attributes) can be extracted from the optimized attributes by reversing
the process of PCA.
Supposedly, the algorithm guarantees 25 percent improvement in energy
saving of the network nodes while achieving 93 percent of accuracy rates.
However, this enhancement is at the cost of accuracy of the collected data
(as some of the data components will be ignored). Therefore, this solution
may not be ideal for the applications with high accuracy and precision
requirements.
Localization and Objects Targeting
Localization is the process of determining the geographic coordinates of
the network's nodes and components.
Position awareness of sensor nodes is an important capability, since most
sensor network operations are based on location.
In most large-scale systems, it is financially infeasible to use global
positioning system (GPS) hardware in each node for this purpose. Moreover,
GPS service may not be available in the observed environment (e.g.,
indoors). Relative location measurement is sufficient for certain uses.
Where absolute positions are needed, the absolute locations of a small
group of nodes are enough to transform relative locations into absolute ones.
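A small sketch of the relative-to-absolute conversion, assuming a 2-D layout in which the relative map differs from the absolute one only by rotation, uniform scaling and translation (no mirroring). Under that assumption two anchor nodes are enough; node names and coordinates are hypothetical. Points are stored as complex numbers so the transform is a single multiply and add.

```python
def fit_similarity(rel_a, rel_b, abs_a, abs_b):
    """Solve absolute = scale_rot * relative + shift from two anchor nodes."""
    scale_rot = (abs_b - abs_a) / (rel_b - rel_a)
    shift = abs_a - scale_rot * rel_a
    return scale_rot, shift

# Relative coordinates produced by some relative localization step (assumed).
relative = {
    "a1": complex(0, 0),
    "a2": complex(1, 0),
    "n3": complex(0, 1),
    "n4": complex(1, 1),
}
# Known absolute positions of the two anchor nodes (assumed).
anchors_abs = {"a1": complex(10, 20), "a2": complex(10, 22)}

scale_rot, shift = fit_similarity(
    relative["a1"], relative["a2"], anchors_abs["a1"], anchors_abs["a2"]
)
absolute = {node: scale_rot * z + shift for node, z in relative.items()}
```

With more than two anchors, or with noisy measurements, a least-squares fit over all anchors would be the more robust choice.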
Sensor nodes may encounter changes in their location after deployment
(e.g., due to movement).
The benefits of using machine learning algorithms in the sensor node
localization process can be summarized as follows:
• Converting the relative locations of nodes to absolute ones using a
few anchor points. This eliminates the need for range measurement
hardware to obtain distance estimations.
• In surveillance and object targeting systems, machine learning can be
used to divide the monitored sites into a number of clusters,
where each cluster represents a specific location indicator.
Bayesian node localization: Morelande et al. [21] used a Bayesian
algorithm to develop a localization scheme for WSNs using only a few
anchor points. This study focuses on the enhancement of progressive
correction, which is a method for predicting samples from likelihoods to
get closer to the posterior likelihood. The idea of using the Bayesian
algorithm for localization is appealing as it can handle incomplete data
sets by drawing on prior knowledge and probabilities.
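To make the Bayesian idea concrete, here is a plain grid-based Bayes update. It is not the progressive correction method of Morelande et al. It assumes a uniform prior over a square area, independent Gaussian range errors, and three hypothetical anchors with noise-free range measurements.

```python
import math

def locate(anchors, ranges, sigma=0.5, size=10.0, step=0.5):
    """Return the grid cell with the highest posterior probability.

    Assumes a uniform prior over a size x size area and independent
    Gaussian range errors with standard deviation sigma.
    """
    steps = int(size / step) + 1
    cells = [(i * step, j * step) for i in range(steps) for j in range(steps)]
    weights = []
    for x, y in cells:
        log_like = 0.0
        for (ax, ay), measured in zip(anchors, ranges):
            predicted = math.hypot(x - ax, y - ay)
            log_like -= (predicted - measured) ** 2 / (2 * sigma ** 2)
        # Uniform prior, so the posterior is proportional to the likelihood.
        weights.append(math.exp(log_like))
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(len(cells)), key=posterior.__getitem__)
    return cells[best], posterior[best]

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)  # used only to simulate the range measurements
ranges = [math.hypot(true_position[0] - ax, true_position[1] - ay)
          for ax, ay in anchors]

estimate, probability = locate(anchors, ranges)
```

If one anchor's measurement were missing, the same code would still run on the remaining ranges and simply return a broader posterior, which is the sense in which the Bayesian approach tolerates incomplete data.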
Robust location-aware activity recognition: Lu and Fu addressed
the problem of sensor and activity localization in smart homes without
direct sensing.
The activities of interest include using the phone, listening to music,
using the refrigerator, studying, etc.
The proposed framework, named “Ambient Intelligence Compliant
Object” (AICO), uses multiple naive Bayes classifiers to determine
the resident's current location and evaluate the reliability of the
system by detecting any malfunctioning sensors.
The designers must predefine the set of supported activities in advance.
There are also unsupervised machine learning algorithms for automatic
feature extraction, such as deep learning methods and the
non-negative matrix factorization algorithm, where activities are
determined without prior training.
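As an illustration of the naive Bayes step mentioned for AICO, the sketch below trains a single classifier that maps binary sensor states to a room. It is not the AICO framework itself, which combines multiple classifiers. The sensors (fridge door, phone off-hook, desk lamp), rooms and training samples are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tuple of 0/1 sensor states, room label)."""
    n_sensors = len(samples[0][0])
    room_counts = Counter(room for _, room in samples)
    on_counts = defaultdict(lambda: [0] * n_sensors)
    for states, room in samples:
        for i, state in enumerate(states):
            on_counts[room][i] += state
    return room_counts, on_counts

def predict(states, room_counts, on_counts):
    """Pick the room with the highest log posterior."""
    total = sum(room_counts.values())
    best_room, best_score = None, -math.inf
    for room, count in room_counts.items():
        score = math.log(count / total)
        for i, state in enumerate(states):
            p_on = (on_counts[room][i] + 1) / (count + 2)  # Laplace smoothing
            score += math.log(p_on if state else 1.0 - p_on)
        if score > best_score:
            best_room, best_score = room, score
    return best_room

# Sensor order: (fridge_door, phone_offhook, desk_lamp); assumed data.
samples = [
    ((1, 0, 0), "kitchen"), ((1, 0, 0), "kitchen"), ((1, 1, 0), "kitchen"),
    ((0, 0, 1), "study"), ((0, 1, 1), "study"), ((0, 0, 1), "study"),
]
room_counts, on_counts = train(samples)
location = predict((1, 0, 0), room_counts, on_counts)
```

The supervised nature of this step is visible in the code: the room labels must be supplied up front, which mirrors the requirement that designers predefine the supported activities.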
Localization based on neural networks: Shareef et al.
compared three localization schemes that are based on different types of
neural networks. In particular, this study considers WSN localization
using multi-layer perceptron (MLP), radial basis function (RBF),
and recurrent neural networks (RNN). In summary, the RBF neural
network results in the minimum error at the cost of high resource
requirements. In contrast, MLP consumes the minimum computational
and memory resources.
Security and Anomaly Intrusion Detection
• Save nodes' energy and significantly extend WSN lifetime by preventing
the transmission of outlying, misleading data.
• Enhance network reliability by eliminating faulty and malicious readings.
In the same way, avoid the discovery of spurious knowledge that would be
converted into important, and often critical, actions.
• Learn and prevent malicious attacks and vulnerabilities online
(without human intervention).
Figure: sensor measurement (e.g., temperature, pressure) plotted against
the sensor's location indicator, showing expected readings and anomalous
readings.
Outlier detection using Bayesian belief networks: Janakiram et al.
used Bayesian belief networks (BBNs) to develop an outlier detection
scheme. Given that the majority of a node's neighbors will have similar
readings (i.e., temporal and spatial correlations), it is reasonable to use
this phenomenon to build conditional dependencies among nodes'
readings. BBNs infer the conditional relationships among the
observations to discover any potential outliers in the collected data.
Furthermore, this method can be used to estimate missing values.
Outlier detection using k-nearest neighbors: Branch et al.
developed an in-network outlier detection method for WSNs using
k-nearest neighbors. Moreover, any missing node readings are
replaced by the average value of the k-nearest nodes. However, such a
non-parametric, k-NN-based algorithm requires a large memory to store
every reading collected from the monitored environment.
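The k-NN idea above can be sketched as follows. This is a simplified, centralized illustration and not Branch et al.'s in-network algorithm. As assumptions of this sketch, a reading is flagged when it differs from the median of its k nearest reporting neighbors by more than a threshold, and flagged nodes are excluded before missing values are averaged. Positions, readings, k and the threshold are hypothetical.

```python
import math
from statistics import mean, median

def nearest(node, positions, pool, k):
    """The k nodes in pool that are spatially closest to node."""
    def dist(other):
        (x1, y1), (x2, y2) = positions[node], positions[other]
        return math.hypot(x1 - x2, y1 - y2)
    return sorted((n for n in pool if n != node), key=dist)[:k]

def detect_and_impute(positions, readings, k=3, threshold=5.0):
    reporting = [n for n, value in readings.items() if value is not None]
    outliers = []
    for node in reporting:
        neighbours = nearest(node, positions, reporting, k)
        typical = median(readings[m] for m in neighbours)
        if abs(readings[node] - typical) > threshold:
            outliers.append(node)
    trusted = [n for n in reporting if n not in outliers]
    cleaned = dict(readings)
    for node, value in readings.items():
        if value is None:
            # Replace a missing reading by the average of the k nearest nodes.
            neighbours = nearest(node, positions, trusted, k)
            cleaned[node] = mean(readings[m] for m in neighbours)
    return outliers, cleaned

# Six nodes on a line; n3 reports a suspicious value, n4 reports nothing.
positions = {"n1": (0, 0), "n2": (1, 0), "n3": (2, 0),
             "n4": (3, 0), "n5": (4, 0), "n6": (5, 0)}
readings = {"n1": 21.0, "n2": 21.5, "n3": 58.0,
            "n4": None, "n5": 22.0, "n6": 22.5}

outliers, cleaned = detect_and_impute(positions, readings)
```

Note that every stored reading is consulted for each decision, which is the memory cost the text attributes to non-parametric k-NN methods.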
Quality of Service, Data Integrity and Fault Detection
Advantages:
• Different machine learning classifiers are used to recognize
different types of streams, thus eliminating the need for flow-aware
management techniques.
• The requirements for QoS guarantee, data integrity and fault detection
depend on the network service and application. Machine learning
methods are able to handle much of this while ensuring efficient
resource utilization, mainly bandwidth and power utilization.
QoS estimation using neural network:
Snow et al. introduced a method to estimate a sensor network
dependability metric using a neural network method. Dependability is a
metric that represents availability, reliability, maintainability, and
survivability of a sensor network. Several attributes are used to estimate
such a metric including mean time between failure (MTBF) and mean time
to repair (MTTR).
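The slide does not give Snow et al.'s neural network model, and none is reproduced here. As background on how the two named inputs relate to one of the listed attributes, the standard steady-state availability formula is MTBF / (MTBF + MTTR). The figures below are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time a node is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Assumed figures: a node fails every 1000 hours on average
# and takes 10 hours to repair.
node_availability = availability(mtbf_hours=1000.0, mttr_hours=10.0)
```

With these figures the node is available roughly 99 percent of the time.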
Moustapha and Selmic introduced a dynamic fault detection model for
WSNs. This model captures the nodes' dynamic behavior and their effects
on other nodes. In addition, a neural network trained with the
back-propagation method was used for node identification and
fault detection. This study results in an effective nonlinear sensor model
that suits applications with fault detection requirements.
Air quality monitoring using neural networks: Postolache et al.
proposed a neural networks-based method for measuring air pollution levels
using inexpensive gas sensor nodes, while eliminating the effects of
temperature and humidity on sensor readings. This solution detects the air
quality and gas concentration using neural networks implemented using
JavaScript (JS). As a result, the solution is able to distribute processing
between web server and end user computers (i.e., a combination of client
and server side scripts).
Intelligent lighting control using neural networks: Gao et al. introduced a new
standard for lighting control in smart buildings using a neural network. A
radial basis function (RBF) neural network is used to extract a new mathematical
expression, called the “Illuminance Matrix” (I-matrix), to measure the degree of
illuminance in the lighted area. Fundamentally, in the field of lighting control,
converting the data collected from the photosensors to a form that is suitable for
digital signal processing is a crucial issue and can strongly affect the performance
of the developed system. The article shows that using the I-matrix scheme can achieve
about 60% more accuracy compared to the standard methods.