Slides (ppt file) - Sensys

Download Report

Transcript Slides (ppt file) - Sensys

Multi-Terminal Information Theory Problems in
Sensor Networks
Gregory J Pottie
Professor, Electrical Engineering Department
Associate Dean, Research and Physical Resources
Deputy Director, NSF Center for Embedded Networked Sensing (CENS)
UCLA Henry Samueli School of Engineering and Applied Science
[email protected]
G. Pottie, Sensys, November 7, 2003
Outline
•
•
•
•
Context and general issues
Basic tools of information theory
Multi-terminal information theory
Research domains
–
–
–
–
–
Data fusion
Cooperative communication
Sensor network scalability
Network synchronization
Distributed large-scale systems
G. Pottie, Sensys, November 7, 2003
Sensor Network Operation
Cooperative communication
Data fusion
Routing
Basic goal: detection/identification of point or
distributed sources subject to distortion constraints,
and timely notification of end user
G. Pottie, Sensys, November 7, 2003
Basic Information Theoretic Concepts
• Typical Sets (of sufficiently long sequences of i.i.d. variables):
• Has probability nearly 1
• The elements are equally probable
• The number of elements is nearly 2nH
Xn
W
source
channel
encoder
channel
p(y|x)
Yn
• Aim of communications system:
• Minimize errors due to noise in channel
decoder

• Maximize data rate
• Minimize bandwidth and power (the resources)
• Shannon Capacity establishes the fundamental limits
G. Pottie, Sensys, November 7, 2003
Wˆ
Jointly Typical Sequences
Xn
Yn
X1n
X2n
Output set in general larger due to additive noise;
Output images of inputs may overlap due to noise
G. Pottie, Sensys, November 7, 2003
Basic Information Theoretic Concepts
Xn
W
source
channel
encoder
channel
p(y|x)
Yn
decoder
Wˆ
• Capacity C is the max mutual information I(X;Y) wrt
p(x); that is, choose
the set X leading to largest mutual information.
• Capacity C is the largest rate at which information can be transmitted
without error
• Jointly typical set: from among the typical input and output sequences,
choose the ones for which 1/n log p(xn,yn) close to H(X,Y)
• Size of jointly typical set is about 2nI(X,Y), thus there are about this number
of distinguishable signals (codewords) in Xn
• These codewords necessarily contain redundancy--size of set is smaller
than the alphabet would imply; sequences provide better performance
than isolated symbols if properly chosen.
G. Pottie, Sensys, November 7, 2003
Gaussian Channel Capacity
• Discrete inputs to channel, and channel adds noise with Gaussian
distribution (zero mean, variance N)
• Input sequence (codeword) power set to P
• Capacity is maximum I(X;Y) over p(x) such that EX2 satisfies power
constraint
• C = 1/2 log(1+P/N) bits per transmission.
• The more usual form is to consider a channel of bandwidth W and noise
power spectral density No. Then C = W log(1+P/NoW) bits per second.
G. Pottie, Sensys, November 7, 2003
Rate Distortion: Lossy Source Coding
• Rate distortion function R(D) can be interpreted as
• The minimum rate at which a source can be represented subject to
a distortion D=d(X,Y)
• The minimum distortion that can be achieved given a maximum rate
constraint R
• Interesting dual results to Capacity
• Spend coding effort on distortion-typical set; rest are don’t cares
• Applies to compression of real-valued sequences
R
Achievable region
D
G. Pottie, Sensys, November 7, 2003
Universal Source Coding
• Divide sequence into
distortion-typical (interesting)
and distortion-atypical
(uninteresting) sets
Distortion typical
set
• Index for distortion typical set
of small length--consumes our
coding effort; atypical set is
large, but coding scheme not
critical
Atypical set
• Require systematic means of
classifying sequences as
typical (promotion mechanism
and distance measure)
• Gold washing algorithm:
typical set, plus candidates
G. Pottie, Sensys, November 7, 2003
Source/Channel Coding Separation
• For single link, separately performing source and channel coding
achieves optimal rates
• Separate optimization greatly reduces theoretical complexity
• Classes of codes have been identified that get very close to
respective Shannon limits
• Joint source/channel coding can reduce latency or overall complexity, but
infrequently used since application-specific
G. Pottie, Sensys, November 7, 2003
Multi-Terminal Information Theory
• The preceding discussion assumed a single transmitter and receiver
• Multi-terminal information theory considers maximization of mutual
information for the following possibilities:
• Multiple senders and one receiver (the multiple access channel)
• One sender and multiple receivers (the broadcast channel)
• One sender and one receiver, but intervening transducers that can
assist (the relay channel)
• Composite combinations of these basic types
• Bayes estimation also aims to maximize mutual information, except the
senders do not cooperate and usually there is a fidelity constraint:
• One sender and multiple receivers (the data fusion problem)
• Multiple senders and receivers (the source separation problem)
• Delay and resource usage may also be included
G. Pottie, Sensys, November 7, 2003
Gaussian Multiple Access Channel
• m transmitters with power P sharing the same noisy channel
•
C(P/N)=1/2 log(1+P/N) bits per channel use for isolated sender
• then the achievable rate region is
Ri  C(P /N)
Ri  R j  C(2P /N)
m
 R  C(mP /N)
i
i1
• The last inequality dominates when rates are the same
• Capacity increases with more users (there is more power)
• Result is dual to Slepian-Wolf encoding of correlated sources
G. Pottie, Sensys, November 7, 2003
Gaussian Broadcast Channel
• One sender of power P and two receivers, one with noise N1 and one with
noise N2, N1 < N2
R1  C(P /N1 )
R2  C((1  )P /(P  N 2 ))
where 0    1
• The two codebooks are coordinated to exploit commonality of information
transmitted, otherwise capacity does not exceed simple multiplexing
G. Pottie, Sensys, November 7, 2003

Relay Channel
• One sender, one relay, and one receiver; relay transmits X1 based only on
its observations Y1
Y1:X1
Y
X
C  sup min{ I(X, X1;Y),I(X;Y,Y1 | X1)}
p(x, x1 )
• Combines a broadcast channel and a multiple access channel
• Networks are comprised of multiple relay channels that may further
induce delay
G. Pottie, Sensys, November 7, 2003
General Multi-Terminal Networks
• m nodes, with node j with associated transmission variable X(j), and
receive variable Y(j)
• Node 1 transmits to node m; what is the maximum achievable rate?
(X1,Y1)
(Xm,Ym)
• Bounds derived from information flow across multiple cut sets
• generally not achievable
G. Pottie, Sensys, November 7, 2003
Costs of Source-Channel Separation
• Source-channel coding separation theorem fails because capacity of
multiple access channels increases with correlation, while source
encoding eliminates correlation
• Greatly complicates search for optimal codes; raises question of
whether joint coding would be worth it
• Gastpar has considered asymptotic cost of separate rate-distortion and
channel coding
• Compare:
• Network rate-distortion coding, followed by cooperative transmission
• Joint rate-distortion and channel coding
• Potentially exponentially better performance for joint source and channel
coding, in limit the number of nodes n observing a Gaussian source with
comparable SNR goes to infinity.
• Bound, not a prescription for how to do this!
G. Pottie, Sensys, November 7, 2003
Now let it move…
• Nodes move within bounded region according to some random
distribution; what is capacity subject to energy constraint on messages?
Node 1
Node m
Time 2
Time 1
• Answer depends on delay constraint; eventually they will collide implying
near-zero path loss and thus unbounded capacity
• Other questions:
• Probability the nodes have connecting path of required rate
• Probability of message arriving in required delay
G. Pottie, Sensys, November 7, 2003
Some Recent Research for Sensor
Networks
• Data fusion in sensor networks
• N-helper problem
• Cooperative communications in sensor networks
• Scalability of sensor networks
• Sensing for distributed sources
• Network synchronization and rate distortion
• Systems design
G. Pottie, Sensys, November 7, 2003
General Assumptions
• Objective of network is to solve some (multi-) hypothesis problem, subject
to a set of fidelity criteria, and convey the result to some end-user, subject
to resource constraints
• Consequence: fidelity criteria and resource constraints allow
meaningful optimization questions to be posed
• Communications is more costly than signal processing
• Consequence: long distance communication is to be avoided, if
possible
• Justification: Shannon capacity and Maxwell’s equations are
fundamental; SP power cost follows Moore’s Law
• Signals decay with distance of propagation
• Consequence: local distributed algorithms become feasible
• Justification: true for all natural propagation media
G. Pottie, Sensys, November 7, 2003
Rate Distortion and Data Fusion
• Can identify resource use (energy/number of bits transmitted) with rate,
decision reliability (false alarm rate, missed detection prob) with distortion
• Operate at different points on rate distortion curve depending on values
of cost function
• Location of fusion center, numerical resolution, number of sensors,
length of records, routing, distribution of processing all affect R(D)
G. Pottie, Sensys, November 7, 2003
A Simple Algorithm
• Nodes activated to send requests for information from other nodes
based on SNR
• If above threshold T, decision is reliable, and suppress activity by
neighbors
• Otherwise, increase likelihood of requesting help based on
proximity to T
• In likelihood, higher SNR nodes form the cluster
• Bits of resolution related to SNR (e.g., for use in maximal ratio
combining)
3
2
1
3
G. Pottie, Sensys, November 7, 2003
1: high SNR; initiates
2: activated, and requests
further information
3: SNR too low to
respond
Optimal Fusion and Information Theory
• Bayes estimator maximizes the likelihood FX(x|z) where x is the state of
nature and z is the set of observables.
• Define Zr={z(1),z(2),…,z(r)}=set of observations to time r, then recursive
form of the estimator is
FX (x | Z r ) 
FZ r (z(r) | x)FX (x | Z r )
r1
FZ (z(r) | Z )
• A variety of classical estimators then maximize the likelihoods based on
particular assumptions regarding the priors
• Fusion: typically weighted combinations of likelihoods to produce
 decision; as sensors may be very different, question of optimal
weighting scheme
G. Pottie, Sensys, November 7, 2003
Likelihood Opinion Pool
Sensor 1
F(Z1r|x)
Sensor 2
F(Z2r|x)
.
.
.
P
F(x|Zr)
F(x)
Prior information
Sensor n
F(Znr|x)
The hard part: determination of the various likelihoods
G. Pottie, Sensys, November 7, 2003
Likelihood Opinion Pool
• Combine using the recursive rule:
F(x |{Z r })  F(x |{Z r1}) F(z j (r) | x)
j
• Taking logarithms on each side, followed by expectations one obtains:

 F(z j (r) | x) 


r
r1
E{ln[ F(x | Z )]}  E{ln{ F(x | Z )]}   E ln 

r1 

F(z
(r)
|
Z
)



j
j



• Which can be interpreted as posterior information=prior
information+observation information; thus can deal in summations of
mutual information obtained from different sensor types (e.g., video plus
audio).
G. Pottie, Sensys, November 7, 2003
Designing for Detection
• In digital communications, choose modulation for ease of estimation of
decision variables and subsequent selection of most likely signal
(hypothesis); we design signals for separability
• In sensor networks, have no control over nature, but we can control:
• Density and locations of sensors
• Sensor types
• These can be manipulated in same way, given a fusion strategy, to ease
signal separability or achieve Nyquist sampling of source features.
• This can also be done adaptively as we learn more about the sources
and the propagation environment (in general, reduce model uncertainty):
• Add sensors, and/or change types (e.g., new deployment)
• Move sensors
• Articulate directional elements
G. Pottie, Sensys, November 7, 2003
Networked Info-Mechanical Systems
NIMS Adaptable Infrastructure
¥3-Dim ensional Environm ent Access
¥Mechanically Reconfigurable
¥Energy Delivery
¥Netw orking
Instrum ented
Cablew ay
NIMS Low Energy
Modular Shuttle
Payload
ÒPacketÓ
Sw itching
Mass-Balanced
Vertical Payload
Transport
Articulated
NIMS Sensor
Payload
Sensing
And
Sam pling
Deploym ent
and
Harvesting
G. Pottie, Sensys, November 7, 2003
The n-helper Gaussian Scenario
X
Y1
Y2
Gateway/Fusion center
Y3
…
Yn
• Multiple sensors observe event and generate correlated
Gaussian data. One data node (X) is the main data source (e.g.
closest to phenomenon), and the n additional nodes (Y1 - Yn) are
the ‘helpers’.
• The Problem: What codes and data rates so that gateway/datafusion center can reproduce the data from the main node using
the remaining nodes as sources of partial side information,
subject to some distortion criterion.
G. Pottie, Sensys, November 7, 2003

Main Result
• We do not care about reproduction of the Y variables; rather they act as
helpers to reproduce X
• This problem was previously solved for the 2-node case
• Key to extension: treat Yk|Yk-1,..X as single new helper Pk.
• Our solution: for an admissable rate (Rx,R1,…,Rn), and for some Di’s>0,
the n-helper system data rates can be fused to yield an effective data
rate (wrt source X) satisfying the following rate distortion bound:

1  x2 n
2
2
2R k
Rx (Dx )  log   1 XPk  XPk  2 
2 Dx k1

• where 2 is the variance and  is the correlation (straightforward but
tedious to calculate as n increases).
G. Pottie, Sensys, November 7, 2003
Comments
• Other source distributions analytically difficult, but many are likely to be
convex optimizations
• Generalization would consider instances of relay/broadcast channels in
conveying information to fusion center with minimum energy
• Many sensor network detection problems are inherently local: even
though expression may be complicated, the number of helpers will
usually be small due to decay of signals as power of distance
• Numerical results for Gaussian sources indicate a small number of
helpers lead to significant improvement; rapidly diminishing returns
after four or so for typical propagation conditions.
• Suggests that source/channel coding separation might in fact be
good enough for many practical situations (especially above the
local interaction)
G. Pottie, Sensys, November 7, 2003
Problem Definition of Cooperative
Communication
• Many low-power and low-cost wireless sensors cooperate with each
other to achieve more reliable and higher rate communications
• The dominant constraint is the peak power, the bandwidth is not the
main concern
• Multiplexing (FDMA, TDMA, CDMA, OFDM) is the standard approach.
Each sensor has an unique channel
• We focus on schemes where multiple sensors occupy the same
channel
G. Pottie, Sensys, November 7, 2003
Example: Space-Time Coding
• N transmit antennas and N receive antennas
• Channel transition matrix displays independent Rayleigh (complex
Gaussian) fading in each component
• With properly designed codes, capacity is N times that of single
Rayleigh channel
• Note this implicitly assumes synchronization among Tx and Rx array
elements--requires special effort in sensor networks
• A coordinated transmission, not a multiple access situation.
G. Pottie, Sensys, November 7, 2003
Context
• Cooperative reception problem very similar to multi-node fusion
problem; same initiation procedure required to create the cluster,
however we can choose channel code.
• Cooperative transmission and reception similar to multi-target multinode fusion, but more can be done: beacons, space-time coding
• Use to overcome gaps in network, communicate with devices
outside of sensor network (e.g. UAV)
G. Pottie, Sensys, November 7, 2003
Channel Capacity
• Channel state information:
known at transmitter side, and at both sides
• If channel state information is known at the transmit side,
RF synchronization can be achieved
• Channels:
AWGN and fading channels with unequal path loss
• General formula
det As  det Ar
C  log 2
det Au
Ar  E(rr )  No  InR  G  As (G  As ) H
H
As is the transmitted signal covariance matrix
Ar is the received signal covariance matrix
G. Pottie, Sensys, November 7, 2003
1
2
1
2
Channel Capacity(cont’d)
A
det Au  det E(uu H )  det s
GAs
AsH G H
Ar
 det As  det( Ar  GAs  As 1  As G H )  det( No  InR )
• Receive diversity:
C  log 2 1  SNR  Gn1
2
,
where SNR = the ratio of transmit power to the receiver noise power
• Transmit diversity:

2
C  log 2 1   SNRi  Gi 
i


• Combined transmit-receive diversity C 
nT
 log
k  nT ( nR 1)
• RF synchronization
  nT
C  log 2 1    SNRi  Gi
  k 1
G. Pottie, Sensys, November 7, 2003
2



2



2
1  SNR   22 k
Comments
• Capacity is much higher if phase synchronization within transmitter and
receiver clusters can be achieved
• Have investigated practical methods for satellite/ground sensors
synchronization
• Beacons (e.g. GPS) can greatly simplify the synchronization problem for
ground/ground cooperative communications
• Recent network capacity results do not take into account possibilities for
cooperation by nodes as transmitter/receiver clusters
G. Pottie, Sensys, November 7, 2003
Capacity in Ad Hoc Networks
• Received signal power decays with distance, and
transmission power is limited
• Frequency re-use is possible; sophisticated
antenna/MIMO systems improve the constant
• Nodes generate traffic, and can relay traffic from
other nodes
– If did not generate traffic, then higher node density implies
greater network capability (improved re-use)
• All nodes alike
– We will also relax this later
G. Pottie, Sensys, November 7, 2003
Transport Capacity of Wireless Networks
• n Nodes within some fixed
region A, with max radio
range R, bandwidth W,
generating data.
• Source-destination pairs
random; per node
transport capacity is then
O(W / n ) bits - m/s - node
G. Pottie, Sensys, November 7, 2003
Transport Capacity of Wireless Networks II
• Note this is achieved by using simple relay strategy:
one link at a time without cooperation in transmission
or reception (Gupta-Kumar); but bad news continues
even with optimal cooperation (Gastpar-Vetterli)
• The inverse square root of n behavior can be roughly
explained by average number of links increasing in a
path of a given length, each of which must deal with
more traffic to be carried, with the same bandwidth.
G. Pottie, Sensys, November 7, 2003
Scaling in Ad Hoc Networks
• The only solution when everyone generates traffic is to
add more resources as n increases
• Traditional approach: communication hierarchy where
we add new resources at each layer
– Each level is limited in numbers
– Traffic is aggregated and carried on set of trunks of increasing
bandwidth and thus capacity
– Higher levels are longer distance, also limiting latency
G. Pottie, Sensys, November 7, 2003
Scaling in Sensor Networks
• Elements not only generate traffic but can process
data
• Do not necessarily want or need to send raw
information to distant users with same probability as
near neighbors
• Key to scalability is to change the source-destination
pair distribution to local communication (in limit, most
nodes in fact send nothing)
• Key to proof is to separately consider densities of
sources, sensors and communication relays, and pose
problem as extraction of information to within particular
fidelity (rate distortion)
G. Pottie, Sensys, November 7, 2003
Scalability for Point Sources in Sensor
Networks
• Cooperative rate distortion coding
results in most communication
being local; more nodes do not
necessarily result in more traffic
under distortion criterion
• More relays reduce frequency reuse distance and thus
interference; capacity can
increase without bound
• Thus more nodes increase
likelihood of extracting information
at desired fidelity
G. Pottie, Sensys, November 7, 2003
Comments
• Number of bits a sensor reports is a complicated function of
density
– Low density: report nothing if SNR too low
– High density: may need to report only decision
– Moderate density: many nodes may need to locally cooperate with
mix of raw data and decision likelihoods
• Far away powerful sources will activate many nodes with similar
SNR, but a small subset of nodes will be sufficient to make
decisions
– Design objective will be to minimize resources required to suppress
node activity
G. Pottie, Sensys, November 7, 2003
Scalability for Distributed Sources
• To estimate parameters of a field
(e.g., to get isotherm map)
information increases until
achieve desired spatial sampling
• After this extra nodes contribute
no additional information, but can
increase communication
resource
– Image processing analogy: specify
pixel size
• Parameters to describe local
field can be compact compared
to raw data, for given level of
distortion
G. Pottie, Sensys, November 7, 2003
Practical Implementation
• Dense network; in neighborhood have mix of nodes
with different ranges, operating in separate bands
• Locally route towards the longer range links; they act
as traffic attractors, causing number of hops at any
given layer to be small
• Cooperative communication among nodes would serve
mainly to assure reliability of paths towards next level
of hierarchy
• Result is a (largely) standard overlay hierarchical
network
• Any cross-layer optimization (e.g., joint source-channel
coding) is confined to the local neighborhood, since
this is where most of the resources are consumed in
any scalable solution.
G. Pottie, Sensys, November 7, 2003
Network Synchronization
• Synchronism is needed for wide set of purposes in
sensor networks:
– Coordination of power down/up for energy savings
– Time stamping of data
– Coherent combining in communication or sensing (cooperative
comm., fusion, position location)
• Traditional approaches assume receivers/processors
always on, and provide same precision everywhere by
locking oscillators
• Sensor networks are different
– Do not need same level of synchronism at all times and
everywhere
– Do need to save energy
G. Pottie, Sensys, November 7, 2003
Synchronization and Rate Distortion
• Clocks are not explicitly locked; rather record
differences of time scales to allow explicit conversion.
• References are passed either on a schedule or on
demand for post facto synchronization
• Frequency and precision of updates (the rate) depends
on local accuracy requirement Dtj (the time distortion)
• Would like to bound rate subject to accuracy
requirements and acceptable delays in achieving
synchronism
• Very similar issues for position localization
G. Pottie, Sensys, November 7, 2003
Implications of Signal Locality
• Severe decay of signals with distance (second to fourth power)
• Mutual information to source dominated by small set of nodes
• Cooperative communication clusters for ground to ground
transmission will likely be small
• Implications:
• Local processing is good enough for many situations; do not need
to convey raw data over long distances very frequently
• Consequently, lowest layers of processing/network formation, etc.
are the most important, since most frequently invoked (“typical”)
• Practical example:
• Specialized local transmission schemes (e.g., for forming ad hoc
clusters), but long range might use conventional methods such as
TCP/IP
G. Pottie, Sensys, November 7, 2003
Hierarchy in Sensor Networks
• For dealing with the network as a whole, number of
variations of topology are immense
• Distributed algorithms exploiting locality of events
• Use of ensembles for deriving bounds
• In between, considers layers of hierarchy, each of which
may be amenable to a conventional optimization technique
G. Pottie, Sensys, November 7, 2003
Information Processing Hierarchy
Note difficulty of fully separating networking, database and
signal processing problems
transmit decision
beamforming
human
observer
query for more base station
high resolution
information processing
cue
high power
low false alarm rate
low duty cycle
low power
high false alarm rate
high duty cycle
G. Pottie, Sensys, November 7, 2003
Some Research Challenges
• Minimal energy to obtain reliable decision in a distributed
network
• Minimal (average) delay in conveying information
through network
• Density and source separability trades
• Model uncertainty and methods for reducing its effects
• how do we know that we don’t know?
• Role of hierarchy; how much leads to what kinds of
changes in information theoretic optimal behavior
• At small scale can use brute force, at large scale can
use ensembles; what can we do in between?
• Exploitation of signal locality: what is the spatial domain
over which cross-layer optimization is useful
G. Pottie, Sensys, November 7, 2003
References
• T. Cover and J. Thomas, Elements of Information
Theory. Wiley: 1991.
• G. Pottie and W. Kaiser, “Wireless Integrated Network
Sensors,” Commun. ACM, May 2000
• M. Ahmed, Y-S. Tu, and G. Pottie, “Cooperative
detection and communication in wireless sensor
networks,” 38th Allerton Conf. On Comm., Control and
Computing, Oct. 2000.
• Visit www.cens.ucla.edu technical reports section for a
variety of related papers and theses
G. Pottie, Sensys, November 7, 2003