Database describing Complex Networks, Internet and WWW

COevolution and Self-organization
In dynamical Networks
COSIN
Database describing
Complex Networks,
Internet and WWW
CR4 – Ecole Polytechnique Fédérale de Lausanne (EPFL)
ex-Université de Lausanne (UNIL)
Fabrizio Coccetti
Centro Studi e Ricerche e Museo Storico della Fisica “Enrico Fermi”
Compendio Viminale – Via Panisperna
Rome
14 May 2004
Fabrizio Coccetti - Centro Fermi
Agenda
• CR4 node presentation, funding and affiliation
• Overview of CR4 tasks and collaboration with other WPs
• The new COSIN Web Site
• Database of collected data (WWW and Internet)
• CR4 contributions to other work packages
CR4 - Structure
Paolo De Los Rios
Assistant Prof. - Tenure Track
Thomas Petermann
David Gfeller
Ph.D. Student (May 2002 - due March 2005)
6 months visitor (February 2004 – July 2004)
Claudio Valerio
Diploma Student (due February 2005)
Fabrizio Coccetti
Researcher
Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi” – Roma
Source of Funding
Since the COSIN contract was signed before January 1st 2004, the source of funding
is not the European Commission but the Swiss Confederation, through the Federal
Office for Education and Science (OFES) under contract 02.0234.
Due to internal Swiss delays, the 24th month of COSIN actually corresponds to the
21st month for CR4.
Change of affiliation
CR4 sits in the Institute of Theoretical Physics of the EPFL.
On October 1st 2003 the entire Physics, Chemistry and Mathematics
departments of the University of Lausanne switched affiliation to
the Ecole Polytechnique Fédérale de Lausanne (EPFL).
This change of affiliation is the subject of a forthcoming contract amendment within COSIN.
COSIN accounts at UNIL were closed on September 30th 2003.
COSIN funds were transferred from UNIL to EPFL on January 6th 2004.
The resulting three-month gap in paying personnel was filled “somehow” (mainly with loans from EPFL).
CR4 Tasks
D12 – Database describing complex networks, internet and www
During the 2nd year CR4 has also contributed to
WP1: Mathematical Tools for Complex Systems
WP4: Dynamics of social networks
WP5: Models for communication networks
Re-design of the COSIN Web Site
Re-designing the COSIN Web Site
• Coherent links from all the partner nodes
• Proper structure of the website
Contents !!!!
Usable !!!!
Keywords for lay visitors
Specific links for specialists or interested people
Nice look
Keep it updated
Starting point to:
•reach all the
nodes
•main results
•understand the
project
•news
Work
Packages
point directly
to Web Pages
maintained by
partner nodes
Remote pages
have coherent
structure and
appearance
All the deliverables can be downloaded directly from the main site
Publications are organized by year; most of them link to a PDF version.
Still missing:
•Better check of
the publications
(duplicates)
•Improve the
structure
D12 – Database of Collected Data
The database currently holds a varied but small amount of data,
some collected locally, some by other consortia.
• Internet
• World-Wide-Web
• Protein Networks
• Miscellaneous: Food Webs, Social Networks, U.S. patents, …
Data available at www.cosin.org/data.html
The data acquisition problem
In 2001 the data collection community was already
growing but still relied on small efforts by a few groups.
It has now developed into large consortia dedicated to the
task.
It has been shown
(by CR4 and CR8: T. Petermann and P. De Los Rios,
Exploration of Scale-Free Networks, Eur. Phys. J. B, in
press (2004); A. Barrat et al. 2004, in preparation)
that measurements taken from one or a few network nodes can
skew the data.
Many different measurements must be overlapped
to recover the correct network structure.
This is beyond COSIN capabilities.
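A toy illustration of this sampling bias, under simplifying assumptions that are not taken from the cited papers: the network is a small random graph (not a scale-free one), and a traceroute-like probe is modelled as a breadth-first shortest-path tree rooted at the monitoring node. All function names are hypothetical.

```python
import random
from collections import deque

def random_graph(n, p, rng):
    """Undirected random graph stored as an adjacency dict (toy stand-in)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def probe_from(adj, source):
    """Edges of one BFS tree rooted at `source`.

    Assumption: a probe reports a single shortest path from the
    source to every reachable node, as a traceroute sweep roughly does.
    """
    seen = {source}
    tree_edges = set()
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                tree_edges.add(frozenset((u, v)))
                queue.append(v)
    return tree_edges

def all_edges(adj):
    """Every edge of the true graph."""
    return {frozenset((u, v)) for u in adj for v in adj[u]}

rng = random.Random(0)
graph = random_graph(200, 0.05, rng)
true_edges = all_edges(graph)

single = probe_from(graph, 0)      # one monitoring node
merged = set()
for source in range(20):           # overlap of 20 monitoring nodes
    merged |= probe_from(graph, source)

print(len(single), len(merged), len(true_edges))
```

A single probe can see at most n - 1 edges, whatever the true density of the graph; merging probes from several sources recovers more of it, which is the point made above.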
Solutions
• Large consortia (CAIDA, NLANR) overcome these problems and are giving public access to their data.
• More generally, the database will also develop into a collection of useful links.
• We will devote more effort to context-oriented WWW data (see the sets in the database), which have not yet attracted much attention from the data-collection community.
• Collaboration with other consortia or institutions
Possible collaboration
• PingER, BW to the World (SLAC)
• Gloperf (Globus Alliance)
• TTM (RIPE)
• AMP (NLANR)
• Skitter (CAIDA)
• Evergrow
World Wide Web Data
We are collecting data using a robotic interface
to Google (available to the public) and a
crawler (it will be made available to the public after
we have published some results).
The data in our database represent portions of
the WWW where connected pages are related
by the same words in their contents.
We believe these data are relevant to people
interested in detecting cyber communities.
Obtain a list of URLs from Google by searching for a word (phrase)
Check if the page contains the word (phrase)
Count links
Follow the links
Repeat
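The loop above can be sketched as follows. This is a minimal illustration, not the COSIN crawler: the seed list is assumed to have been obtained already from a search engine, `fetch` is a hypothetical callable that returns page text (or None), and link extraction uses a deliberately naive regular expression.

```python
import re
from collections import deque

# Naive link extractor; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')

def crawl(seed_urls, phrase, fetch, max_depth=1):
    """Breadth-first crawl keeping only pages that contain `phrase`.

    `fetch(url)` is an assumed helper returning the page text or None.
    Returns {url: number of outgoing links} for every page kept.
    """
    link_counts = {}
    visited = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    while queue:
        url, depth = queue.popleft()
        text = fetch(url)
        if text is None or phrase.lower() not in text.lower():
            continue  # the page does not contain the word (phrase)
        links = LINK_RE.findall(text)
        link_counts[url] = len(links)  # count links
        if depth < max_depth:
            for target in links:  # follow the links, then repeat
                if target not in visited:
                    visited.add(target)
                    queue.append((target, depth + 1))
    return link_counts

# Toy in-memory "web" standing in for real HTTP requests.
toy_web = {
    "a": 'complex networks <a href="b">b</a> <a href="c">c</a>',
    "b": 'more on complex networks <a href="a">a</a>',
    "c": 'unrelated page <a href="d">d</a>',
    "d": 'complex networks again',
}
result = crawl(["a"], "complex networks", toy_web.get, max_depth=1)
print(result)
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library; page "c" is dropped because it lacks the phrase, so "d" is never reached.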
1 level depth
Internet Data
Some data have been collected locally with the
traceroute command.
Some data have been collected by a machine in
Milan (GARR) using the PingER engine.
Ping Data
• The PingER engine was used to collect data from Milan (GARR) to the world
• Every 30 minutes: 11 ping packets, two sizes (100 and 1000 bytes); from these one can estimate the capacity of paths (variable packet size technique)
• One possible development: merge the PingER engine with a traceroute engine to obtain weighted graphs
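One way such a merge could produce a weighted graph is sketched below. The input format, the host names, and the choice of weight (mean increase in round-trip time across a link) are all assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict

def build_weighted_graph(paths):
    """Average per-link delay from traceroute-style paths.

    Each path is a list of (node, rtt_ms) pairs ordered from the
    monitoring host outward. The weight of a link is the mean
    increase in round-trip time across it, with negative
    differences (measurement noise) clipped to zero.
    """
    samples = defaultdict(list)
    for path in paths:
        for (u, rtt_u), (v, rtt_v) in zip(path, path[1:]):
            samples[(u, v)].append(max(rtt_v - rtt_u, 0.0))
    return {edge: sum(vals) / len(vals) for edge, vals in samples.items()}

# Hypothetical measurements from one monitoring host.
paths = [
    [("milan", 0.0), ("r1", 2.0), ("r2", 10.0)],
    [("milan", 0.0), ("r1", 4.0), ("r3", 9.0)],
]
weighted = build_weighted_graph(paths)
print(weighted)
```

The topology comes from the traceroute hops and the weights from the timing data, so each link carries a delay estimate rather than a bare connection.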
Variable Size Packets
$$T_1 = 2\,\frac{d}{v} + B + 2\,\frac{Q_1}{C_{\mathrm{link}}}$$
$$T_2 = 2\,\frac{d}{v} + B + 2\,\frac{Q_2}{C_{\mathrm{link}}}$$
$$C_{\mathrm{link}} = \frac{2\,(Q_2 - Q_1)}{T_2 - T_1}$$
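A small numerical check of the variable packet size estimate, assuming the slide's relations are T = 2d/v + B + 2Q/C_link for each packet size and C_link = 2(Q_2 - Q_1)/(T_2 - T_1). The reading of the symbols (T round-trip time, Q packet size, 2d/v + B a size-independent delay) and the function names are assumptions.

```python
def round_trip_time(q_bits, capacity_bps, fixed_delay_s):
    """Model RTT: size-independent delay plus two transmission times."""
    return fixed_delay_s + 2.0 * q_bits / capacity_bps

def link_capacity(q1_bits, t1_s, q2_bits, t2_s):
    """Capacity from two packet sizes and their round-trip times."""
    if t2_s == t1_s:
        raise ValueError("round-trip times must differ")
    return 2.0 * (q2_bits - q1_bits) / (t2_s - t1_s)

# Synthetic example: 100-byte and 1000-byte pings over a 1 Mbit/s link
# with 20 ms of size-independent delay (all values are made up).
true_capacity = 1_000_000.0
q1, q2 = 100 * 8, 1000 * 8
t1 = round_trip_time(q1, true_capacity, 0.020)
t2 = round_trip_time(q2, true_capacity, 0.020)
estimate = link_capacity(q1, t1, q2, t2)
print(estimate)
```

The size-independent terms cancel in the difference T_2 - T_1, which is why only the two packet sizes and the two timings are needed. With real pings one would probably have to filter out queuing noise first, for instance by taking the minimum round-trip time per size.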
PingER
PingER dimensions
(beginning of 2004)
• 36 monitoring sites in 12 countries
• 822 remote sites in 80 countries
Collaboration for PingER 2
(Perl module written
by F. Coccetti)
Needs database support
Project started at SLAC (1995)
by the IEPM (Internet End-to-end Performance Monitoring) group
IEPM-BW to the World
IEPM-BW dimensions
(beginning of 2004)
• 7 monitoring networks
• SLAC, FNAL, NIKHEF, Internet2, Manchester UK, Univ. Michigan, INFN Mi
Project started at SLAC (2001)
(BABAR)
Authors: C. Logg, L. Cottrell, J. Williams, M. Bhargava, F. Coccetti, I-Heng Mei, Maxim Grigoriev
Protein Networks
Protein-protein interaction networks are another
domain where network tools are intensively
used to detect relevant protein modules.
The data in our database represent a small
portion of the data at the Database
of Interacting Proteins (DIP), which is the
most complete and up-to-date repository
of protein interaction data, covering various
organisms.
Data at DIP are free to download and use.
Miscellaneous Data
Some more data are available in our database
concerning Food Webs and Social Networks
(actor collaboration network).
This section is kept to display:
- data collected for COSIN publications
- links to databases
CR4 contributions to other Work Packages (1)
WP4: Dynamics of social networks
Stimulated by the observation that the sizes of the email folders of a few
unrelated people show the same statistical (algebraic) distribution, we
have developed a model where social relations are reinforced over time by
establishing preferential pairs of exchange partners, giving a rationale
for the observed distributions.
G. Caldarelli, F. Coccetti and P. De Los Rios
Preferential Exchange: Strengthening connection in complex networks
Phys. Rev. E, submitted.
CR4 contributions to other Work Packages (2)
WP1: Mathematical Tools for Complex Systems
We have developed new approximation schemes to better take into
account spatial and temporal correlations on regular lattices and networks,
based on techniques borrowed from equilibrium statistical physics (such
as the Cluster Variation Method).
T. Petermann and P. De Los Rios
Cluster approximations for epidemic processes:
a systematic description of correlations beyond the pair level.
Journal of Theoretical Biology, in press (2004)
T. Petermann and P. De Los Rios
Role of clustering and grid-like ordering in epidemic spreading
Physical Review E, in press (2004)
CR4 contributions to other Work Packages (3)
WP1: Mathematical Tools for Complex Systems (continued)
We have rigorously shown that when applying a dichotomy-based
method to identify communities and sub-communities in networks, just
as in classifying species and sub-species in habitats (usual taxonomy),
the method itself imposes an inverse square power-law behaviour for the
community-size distribution
G. Caldarelli, C. Caretta Cartozo, P. De Los Rios and V.D.P. Servedio
The widespread occurrence of the inverse square-law distribution in
social sciences and taxonomy
Phys. Rev. E, 69 035101 (2004).
CR4 contributions to other Work Packages (4)
WP5: Models for communication networks
We have worked toward a better characterization of real networks, with
special attention to the Internet, to develop models that are at the same
time simple enough to be analytically tractable, but rich enough to take
into account important features such as the intrinsic relevance of nodes
and the rewiring of network links.
G. Caldarelli, A. Capocci and P. De Los Rios
Quantitative Description and Modeling of Real Networks
Phys. Rev. E 68, 047101 (2003)
G. Caldarelli, P. De Los Rios and L. Pietronero
Generalized Network Growth: from Microscopic Strategies
to the Real Internet
Phys. Rev. E, submitted
D13 – Library of software tools
We have collected and developed a number of software tools to
analyze the Internet at AS and IP levels
MRTGv6: a Multi Router Traffic Grapher for IPv6 (Linux only, for now)
Hermes: a tool to visualize relationships between Internet Service Providers
BGPlay: a Java applet for monitoring inter-AS routing instabilities
Netkit: an open source virtual networking lab
Torque: a toolkit for investigating changes in the relationships between ASes
NetML: an XML-based language to interface with Netkit
NetHunter: discovery and visualization of the Internet topology at IP level
Tools available at www.dia.uniroma3.it/~cosin/Tools.htm with
full documentation (thanks to CR2)