Sensors, Databases and Flash Storage

Download Report

Transcript Sensors, Databases and Flash Storage

Social Network Analysis
&
Network Optimization
@ Dept. of Computer & Communication Engineering, University of Thessaly
@ Dept. of Informatics, Aristotle University
Koblenz, February 18th, 2008
Dimitrios Katsaros, Ph.D.
Outline of the talk
•A few words about my research
•My latest results: Social Network Analysis for
Network Optimization
•Web (2
nd
round review @ IEEE Transactions on Knowledge & Data Engineering)
•PRIMITIVE: Community Identification
•PROTOCOL: Content Outsourcing
•GOAL: Latency Reduction
•Wireless Multimedia Sensor Nets (2
nd
round review @ ACM Mobile Networks & Applications)
•PRIMITIVE: Important Sensor Nodes Identification
•PROTOCOL: Cooperative Caching
•GOAL: Latency Reduction
•Why Collective Intelligence?
2
Research areas: Ultimately  ???
Mobile/Pervasive Computing
Web
Overlay Nets
Caching &
Air-Indexing
Ad Hoc
Pervasive
Peer-to-Peer
Networks
Content-Based MIR
Web
Broadcasting &
Data Dissemination
Cooperative Caching &
Sensor Node Clustering &
Distributed Indexing &
Coverage/Connectivity &
Flash storage &
Sensors
Web Ranking &
Search Engines
Social Network Analysis
Information Retrieval
4
Social Network Analysis
• A social network is a social structure to describe
social relations (wikipedia)
• The history of Social Network is older than
everybody who is here
• More than 100 years (Cooley 1909, Durkheim
1893)
• Focusing on small groups
• Information Techniques give it a new life
[book: Stanley Wasserman & Katherine Faust]
1. Mathematical Representation
2. Structural & Locational Properties
3. Roles & Positions
4. Dyadic & Triadic Methods
5
Social Network Analysis
[Stanley Wasserman & Katherine Faust]
1. Mathematical Representation
2. Structural & Locational Properties
1. Centrality
1. Betweenness Centrality
3. Roles & Positions
4. Dyadic & Triadic Methods
6
Betweenness Centrality
• Let σuw= σwu denote the number of shortest paths
from u  V to w  V (by definition, σuu= 0)
• Let σuw(v) denote the number of shortest paths from
u to w that some vertex v  V lies on
• The Betweenness Centrality index NI(v) of a
vertex v is defined as:
• Large values for the NI index of a node v indicate
that this node can reach others on relatively short
paths, or that v lies on considerable fractions of
shortest paths connecting others
7
The NI index in sample graphs
In parenthesis, the NI index
of the respective node; i.e.,
7(156): node with ID 7 has
NI equal to 156.
Nodes with large NI:
Articulation nodes (in
bridges), e.g., 3, 4, 7, 16, 18
With large fanout, e.g., 14,
8, U
Therefore: geodesic nodes
8
The NI index in a localized algorithm
• For any node v, the NI indexes of the nodes in N12(v)
calculated only for the subgraph of the 2-hop (in general,
k-hop) neighborhood reveal the relative importance of the
nodes in N12
• For a node u (of the 2-hop neighbourhood of a node v), the
NI index of u will be denoted as NIv(u)
9
Betweenness Centrality in …
• [WEB] Performing graph clustering and
recognizing communities in Web site
graphs
• [WIRELESS MULTIMEDIA SENSOR
NETWORKS] Recognizing (in a
distributed fashion) important sensor
nodes, the mediators, that coordinate
cooperative caching decisions
10
Community Identification
& Content Outsourcing for
the Web
The need for content outsourcing
12
CiBC Method
• Target:
d out (C )
s
d in (C )
is true
• CiBC method:
• Building cliques and clusters around representative (pole)
nodes (with low CB)
• Earlier methods have
• Defined “hard communities”: node deg(inCom)>deg(outCom)
• exploited “edge betweenness” to perform hierarchical
agglomerative clustering
13
CiBC Method
Phase 1:
NI Computation -O(nm)
Phase 2:
Initialization of cliques
O(n)
8
9
7
5
6
10
1
2
11
3
0
4
ID
NI index
10
20.68
2
19.61
6
11.38
1
10.28
7
2.06
0
1.73
9
0.99
8
0.99
4
0.75
5
0.00
11
0.00
14
CiBC Method
8
9
Phase 2:
Initialization of cliques
O(n)
7
5
6
10
1
2
11
3
0
4
ID
NI index
10
20.68
2
19.61
6
11.38
1
10.28
7
2.06
0
1.73
9
0.99
8
0.99
4
0.75
5
0.00
11
0.00
15
CiBC Method
8
9
Phase 2:
Initialization of cliques
O(n)
7
5
6
10
1
2
11
3
0
4
ID
NI index
10
20.68
2
19.61
6
11.38
1
10.28
7
2.06
0
1.73
9
0.99
8
0.99
4
0.75
5
0.00
11
0.00
16
CiBC Method
8
9
Phase 2:
Initialization of cliques
O(n)
7
5
6
10
1
2
11
3
0
4
ID
NI index
10
20.68
2
19.61
6
11.38
1
10.28
7
2.06
0
1.73
9
0.99
8
0.99
4
0.75
5
0.00
11
0.00
17
CiBC Method
8
9
Phase 2:
Initialization of cliques
O(n)
7
5
6
10
1
2
11
3
0
4
ID
NI index
10
20.68
2
19.61
6
11.38
1
10.28
7
2.06
0
1.73
9
0.99
8
0.99
4
0.75
5
0.00
11
0.00
18
CiBC Method
A
8
9
7
Phase 3:
Clique Merging &
Creation of Communities
B
5
6
10
A
B
C
D
A
3
3
0
0
B
3
3
1
1
C
0
1
3
4
D
0
1
4
3
1
2
11
C
D
3
0
Complexity: O(l2)
l is the number
of cliques
4
19
CiBC Method
A
8
9
7
Phase 3:
Clique Merging &
Creation of Communities
B
5
6
10
A
B
C
D
A
3
3
0
0
B
3
3
1
1
C
0
1
3
4
D
0
1
4
3
1
2
11
C
D
3
0
4
3
4
20
CiBC Method
A
8
9
7
Phase 3:
Clique Merging &
Creation of Communities
B
5
A
B
C
A
3
3
0
B
3
3
2
C
0
2
10
6
10
1
2
11
C
3
0
4
21
CiBC Method
A
8
9
7
Phase 3:
Clique Merging &
Creation of Communities
B
5
A
B
C
A
3
3
0
B
3
3
2
C
0
2
10
6
10
1
2
11
C
3
0
4
22
CiBC Method
A
8
9
7
Phase 3:
Clique Merging &
Creation of Communities
C
A
9
2
C
2
10
5
6
10
Phase 4:
Check constraints
A
1
2
11
C
3
0
4
23
CiBC vs. Clique Percolation Method, LRU
24
Cooperative Caching in
Wireless Multimedia Sensor
Networks
The NICoCa protocol
• Each node is aware of its 2-hop neighborhood
• Uses NI to characterize some neighbors as mediators
• A node can be either a mediator or an ordinary node
• Each sensor node stores
• the dataID, and the actual multimedia datum
• the data size, TTL interval
• for each cached item, the timestamps of the K most
recent accesses
• each cached item is characterized either as O (i.e., own)
or H (i.e., hosted)
26
The cache discovery protocol (1/2)
A sensor node issues a request for a multimedia
item
• Searches its local cache and if it is found (local
cache hit) then the K most recent access
timestamps are updated
• Otherwise (local cache miss), the request is
broadcasted and received by the mediators
• These check the 2-hop neighbors of the requesting
node whether they cache the datum (proximity
hit)
• If none of them responds (proximity cache miss),
then the request is directed to the Data Center
27
The cache discovery protocol (2/2)
When a mediator receives a request, searches its
cache
• If it deduces that the request can be satisfied by a
neighboring node (remote cache hit), forwards the
request to the neighboring node with the largest
residual energy
• If the request can not be satisfied by this mediator
node, then it does not forward it recursively to its own
mediators, since this will be done by the routing
protocol, e.g., AODV
• If none of the nodes can help, then requested datum is
served by the Data Center (global hit )
28
Cache vs. hits (MB files & uniform
access) in a sparse WMSN (d = 4)
HYBRID: appears at:
L. Yin and G. Cao, “Supporting cooperative
caching in ad hoc networks”, IEEE Transactions
on Mobile Computing, 5(1):77-89, 2006
29
Cache vs. hits (MB files & uniform
access) in a dense WMSN (d = 7)
HYBRID: appears at:
L. Yin and G. Cao, “Supporting cooperative
caching in ad hoc networks”, IEEE Transactions
on Mobile Computing, 5(1):77-89, 2006
30
Evolution of cyberspace …
Collective
Intelligence Net
Semantic Web
WWW
Semantic Web + Pervasive Computing
WWW + Broadband + WIFI + grid computing
Unicode + XML + RDF + Ontologies
Internet + Multimedia + URL + HTTP + HTML
Internet
Servers + Telecom Networks + PCs + TCP-IP + e-mail + FTP
PC
Computers + Micro-chips + Application Software + WYSIWYG
Interfaces
Computer
Transistors+Formal Logic+Digital Coding+ Program. Languages
31
Why Collective Intelligence?
• Users/ devices generate data at an unprecedented rate
•
•
•
•
•
Blogs
Tags
Sensor measurements
Web pages
Rankings by search engines
• They are “opinions” or “votes”
• Under some conditions: group IQ > individual IQ
• [So far] Opinion/Vote fusion:
• PageRank (i.e., collective linking preferences)
• Metasearching (ranked list merging)
• Collaborative filtering (what is interesting from what other people
say, what people like you say)
• …..
32
Collective Intelligence challenges
• Statistical analysis of social networks
• Identification of influential opinions
and/or producers (e.g., bloggers)
• Discover social context to provide
personalization searching
• Opinion spam
• Bias filtering
33
Collective Intelligence challenges
•
•
•
•
•
Finding high-quality content
Opinion mining
Dealing with controversies (e.g., in Wikipedia)
Metadata from data analysis
Storage of metadata
MOST IMPORTANTLY
• In Centralized and/or Distributed settings
34
Thank you
for your attention!
Questions?
References
Our work
• D. Katsaros , G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y.
Manolopoulos. CDNs Content Outsourcing via Generalized
Communities. IEEE Transactions on Knowledge and Data
Engineering, (under second round review), December, 2007.
• N. Dimokas, D. Katsaros, and Y. Manolopoulos, “Cooperative
Caching in Wireless Multimedia Sensor Networks” ACM
Mobile Networks and Applications, (under second round review),
February, 2008.
Competing methods
• [CPM community identification method] G. Palla, I.Derenyi,
I.Farkas, and T.Vicsek. Uncovering the overlapping community
structure of complex networks in nature and society. Nature,
435(7043):814–818, 2005.
• [Hybrid cooperative caching method] L. Yin and G. Cao.
Supporting cooperative caching in ad hoc networks. IEEE
Transactions on Mobile Computing, 5(1):77–89, 2006.
36