Transcript CENTRE

CENTRE
Cellular Network’s Positioning Data Generator
Fosca Giannotti, KDD-Lab
Andrea Mazzoni, KDD-Lab
Puntoni Simone, KDD-Lab
Chiara Renso, KDD-Lab
Why generate data?

Trouble in finding real data
Due to the reticence of ICT companies
…and for legal and privacy reasons

Need to have ad-hoc datasets
To improve algorithm development
To have a tool for the validation and testing phases
CENTRE:

CEllular Network Trajectory Reconstruction Environment

A positioning data (LOG) generation environment aimed at mobile technology
GSM technology
Developed as a tool of the GeoPKDD project
GeoPKDD: Geographic Privacy-Aware Knowledge Discovery & Delivery
The Idea

To generate positional mobile data (LOG) by simulating the events deriving from:

Trajectories of hypothetical mobile network users that travel across the territory
The resulting survey of these movements using a synthetic ad-hoc GSM coverage (the set of BTSs)

So we can analyze the set of LOGs and reconstruct the trajectories of the mobile network users
Motivation

With this model we want to achieve:

A more rigorous and realistic semantics for the generated data.
The possibility to compare synthetic trajectories with reconstructed ones.
The chance to validate the results of mining and knowledge discovery algorithms against synthetic trajectories.
CENTRE architecture
What CENTRE does…

First of all we generate a sequence of spatio-temporal points that represent a trajectory. We can customize:

Starting point
Velocity
Agility
Direction
Groups of behavior
Infrastructures, etc.

Then we overlap a set of antennas, represented by the circles of their coverage areas.
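
A minimal sketch of how a sequence of spatio-temporal points could be produced from a starting point, a velocity, a direction and an agility value. It assumes a simple random-walk model: the function name `generate_trajectory`, the constant speed, and the reading of agility as the probability of picking a new direction at each step are all illustrative assumptions, not CENTRE's actual generator.

```python
import math
import random

def generate_trajectory(start, speed, heading, agility, steps, dt=1.0, seed=None):
    """Return a list of (t, x, y) points for one hypothetical moving object.

    Assumptions (not taken from CENTRE): speed is constant, and `agility`
    is the probability that the object picks a new random heading at a step.
    """
    rng = random.Random(seed)
    x, y = start
    t = 0.0
    points = [(t, x, y)]
    for _ in range(steps):
        if rng.random() < agility:
            # The object changes direction: draw a new heading in radians.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
        t += dt
        points.append((t, x, y))
    return points

# An object with agility 0 never turns, so it moves along a straight line.
straight = generate_trajectory(start=(0.0, 0.0), speed=2.0, heading=0.0,
                               agility=0.0, steps=3)
```

With `agility=0.0` the object keeps its initial heading; values closer to 1 give more erratic paths.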
LOG extraction

So a LOG is represented by a tuple:
(Obj_ID, BTS_ID, TimeStamp, d)

Where:
1. Obj_ID is the identifier of the observed object
2. BTS_ID is the identifier of the antenna that made this survey
3. TimeStamp is the time of the survey
4. d is an evaluation of the distance of the object from the center of the BTS

Result of extraction:

LOG at time t2 (P2)
{Cell1, BTS1, t2, d12}

LOG at time t3 (P3)
{Cell1, BTS1, t3, d13},
{Cell1, BTS2, t3, d23},
{Cell1, BTS3, t3, d33}

LOG at time t4 (P4)
{Cell1, BTS2, t4, d24}
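
The extraction step can be sketched as follows, assuming each antenna covers a circle and emits one LOG whenever a trajectory point falls inside it. The names `BTS`, `Log` and `extract_logs`, and the coordinates used in the example, are hypothetical and chosen only for illustration.

```python
import math
from typing import NamedTuple

class BTS(NamedTuple):
    bts_id: str
    x: float
    y: float
    radius: float  # coverage radius, same length unit as x and y

class Log(NamedTuple):
    obj_id: str
    bts_id: str
    timestamp: float
    d: float  # distance of the object from the BTS center

def extract_logs(obj_id, trajectory, antennas):
    """Emit one LOG for every antenna whose coverage circle contains a point.

    `trajectory` is a list of (t, x, y) points.
    """
    logs = []
    for t, x, y in trajectory:
        for bts in antennas:
            d = math.hypot(x - bts.x, y - bts.y)
            if d <= bts.radius:
                logs.append(Log(obj_id, bts.bts_id, t, d))
    return logs

# Made-up coverage and trajectory: the first point is outside every circle.
antennas = [BTS("BTS1", 0.0, 0.0, 5.0), BTS("BTS2", 8.0, 0.0, 5.0)]
trajectory = [(1, -10.0, 0.0), (2, 1.0, 0.0), (3, 4.0, 0.0)]
logs = extract_logs("Cell1", trajectory, antennas)
```

In this example the point at time 1 produces no LOG, the point at time 2 is seen only by BTS1, and the point at time 3 is seen by both antennas.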
Dataset
Trajectories reconstruction

Once the LOGs are produced and stored, we forget about the synthetic trajectories and try to reconstruct them only from:

The LOG collection
The synthetic coverage
Information types

Reconstruction is performed considering all the LOGs produced at a single temporal instant for a single trajectory

The number of LOGs with the same time and the same device identifier (id_cell) represents the number of simultaneous detections

Figure labels: 1 LOG, 2 LOGs, 3 LOGs
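
The grouping described above can be sketched as a dictionary keyed by device identifier and time. The function name `group_simultaneous` is hypothetical, and the numeric distances in the example are made-up placeholders, since the slides only give them symbolically.

```python
from collections import defaultdict

def group_simultaneous(logs):
    """Group LOG tuples (obj_id, bts_id, timestamp, d) by (obj_id, timestamp)."""
    groups = defaultdict(list)
    for obj_id, bts_id, timestamp, d in logs:
        groups[(obj_id, timestamp)].append((bts_id, d))
    return dict(groups)

# Same shape as the extraction result: one LOG, then three, then one.
logs = [
    ("Cell1", "BTS1", 2, 120.0),
    ("Cell1", "BTS1", 3, 310.0),
    ("Cell1", "BTS2", 3, 250.0),
    ("Cell1", "BTS3", 3, 400.0),
    ("Cell1", "BTS2", 4, 90.0),
]
groups = group_simultaneous(logs)
# Number of simultaneous detections for each (device, time) pair.
counts = {key: len(seen_by) for key, seen_by in groups.items()}
```

Each entry of `counts` tells how many antennas observed the same device at the same instant, which is the information the reconstruction step works from.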
Reconstruction method

When we have:

Only one detection: the point may be anywhere inside the area covered by the antenna, so we take the antenna center as the point position
Two or more detections: the point can only be inside the intersection of the coverage areas, so we take the centroid of this area as the point position
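
A minimal sketch of this rule, assuming circular coverage areas. The centroid of the intersection is only approximated here by averaging grid points that fall inside every circle; that numerical shortcut, the function name `reconstruct_position`, and the fallback for circles that do not overlap are assumptions, not necessarily what CENTRE does.

```python
def reconstruct_position(circles, grid=200):
    """Estimate an object's position from the coverage circles that saw it.

    `circles` is a list of (cx, cy, radius). With one circle the antenna
    center is returned. With several, the centroid of their intersection is
    approximated by averaging the grid points that fall inside all circles.
    """
    if not circles:
        raise ValueError("at least one detection is required")
    if len(circles) == 1:
        cx, cy, _ = circles[0]
        return (cx, cy)
    # The intersection lies inside the overlap of the bounding boxes.
    x_min = max(cx - r for cx, cy, r in circles)
    x_max = min(cx + r for cx, cy, r in circles)
    y_min = max(cy - r for cx, cy, r in circles)
    y_max = min(cy + r for cx, cy, r in circles)
    sum_x = sum_y = 0.0
    count = 0
    if x_min < x_max and y_min < y_max:
        dx = (x_max - x_min) / grid
        dy = (y_max - y_min) / grid
        for i in range(grid):
            x = x_min + (i + 0.5) * dx
            for j in range(grid):
                y = y_min + (j + 0.5) * dy
                if all((x - cx) ** 2 + (y - cy) ** 2 <= r * r
                       for cx, cy, r in circles):
                    sum_x += x
                    sum_y += y
                    count += 1
    if count == 0:
        # Assumption: circles that do not overlap fall back to the mean center.
        n = len(circles)
        return (sum(c[0] for c in circles) / n, sum(c[1] for c in circles) / n)
    return (sum_x / count, sum_y / count)

single = reconstruct_position([(3.0, 4.0, 10.0)])
double = reconstruct_position([(0.0, 0.0, 1.5), (2.0, 0.0, 1.5)])
```

For two equal circles the estimate lies midway between the two antennas, as symmetry requires; a finer `grid` gives a closer approximation at a higher cost.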
Reconstructed trajectories dataset
And now …examples! 
Now we are working on…

Making new extensions to the main generation engine
In order to test and validate spatial KD algorithms with more efficiency and accuracy.

Changing the old code (which was derived from the GSTD code)
Introducing improvements to the class structures
Introducing new data characterization, especially on spatial and temporal aspects
Multiple generation engines

The idea is to develop extensions to the main engine every time we need new features to test and validate KD algorithms.
And to use each time the best implementation of the synthetic trajectory production engine, depending on the type of data we need to obtain
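
One way to picture interchangeable engines is a common interface plus a lookup table keyed by the kind of data required. Everything in this sketch (`GenerationEngine`, the two toy engines, `ENGINES`, `make_engine`) is hypothetical and only illustrates the selection idea, not CENTRE's class structure.

```python
import random
from abc import ABC, abstractmethod

class GenerationEngine(ABC):
    """Hypothetical common interface for trajectory production engines."""

    @abstractmethod
    def generate(self, steps, seed=None):
        """Return a list of (t, x, y) points."""

class RandomWalkEngine(GenerationEngine):
    def generate(self, steps, seed=None):
        rng = random.Random(seed)
        x = y = 0.0
        points = [(0, x, y)]
        for t in range(1, steps + 1):
            x += rng.uniform(-1.0, 1.0)
            y += rng.uniform(-1.0, 1.0)
            points.append((t, x, y))
        return points

class StraightLineEngine(GenerationEngine):
    def generate(self, steps, seed=None):
        return [(t, float(t), 0.0) for t in range(steps + 1)]

# Registry mapping the kind of data we need to the engine that produces it.
ENGINES = {"random_walk": RandomWalkEngine, "straight_line": StraightLineEngine}

def make_engine(data_type):
    """Pick the engine registered for the requested kind of data."""
    try:
        return ENGINES[data_type]()
    except KeyError:
        raise ValueError(f"no engine registered for {data_type!r}") from None

line = make_engine("straight_line").generate(2)
```

Adding a new extension then amounts to writing one more subclass and registering it, without touching the code that consumes the trajectories.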
Density based clustering

We have seen that, for best results with this algorithm, it is useful to have a simple method to:

create clusters and
identify the relation between objects and clusters.
Attraction engine

For this particular type of algorithm we are developing a new engine extension that uses an attraction-like mechanism.
Each object chooses and tries to reach its next attraction area.
When it reaches its destination area it chooses another one, and so on…
Cluster construction

A cluster is formed by a set of objects that are forced to pass through a sequence of areas.
…a simple example

In this scenario we can see one object that each time chooses a region in a completely random order.
Once a region and a point in it are chosen, the object tries to reach this point.
…and so on
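
The behavior in this example can be sketched as a loop in which the object picks a random region, picks a random point in it, and walks toward that point at constant speed. The rectangular regions, the constant speed and the name `attraction_walk` are assumptions made for illustration.

```python
import math
import random

def attraction_walk(start, regions, speed, steps, dt=1.0, seed=None):
    """Move one object toward randomly chosen points in randomly chosen regions.

    `regions` is a list of rectangles (x_min, y_min, x_max, y_max).
    Returns a list of (t, x, y) points.
    """
    rng = random.Random(seed)
    x, y = start
    t = 0.0
    points = [(t, x, y)]

    def pick_target():
        x_min, y_min, x_max, y_max = rng.choice(regions)
        return rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)

    tx, ty = pick_target()
    for _ in range(steps):
        dist = math.hypot(tx - x, ty - y)
        step = speed * dt
        if dist <= step:
            # Target reached within this step: land on it and pick the next one.
            x, y = tx, ty
            tx, ty = pick_target()
        else:
            x += step * (tx - x) / dist
            y += step * (ty - y) / dist
        t += dt
        points.append((t, x, y))
    return points

# Two made-up attraction regions; the object keeps hopping between them.
regions = [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
path = attraction_walk(start=(0.0, 0.0), regions=regions, speed=4.0,
                       steps=40, seed=3)
```

Forcing several objects through the same ordered list of regions, instead of a random choice, would be one way to obtain the clusters described earlier.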
Other improvements

Formalization of some concepts (at code level):

Spatio-temporal data
Spatio-temporal object
Trajectory

and real measures in data values:

Positions are expressed in meters
Velocities are expressed in meters/second
Times are expressed in seconds
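
As an illustration of what such a formalization might look like, the sketch below models a spatio-temporal point and a trajectory with the units stated above. The class names, fields and the `average_speed` helper are hypothetical and do not come from the CENTRE code.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class SpatioTemporalPoint:
    x: float  # meters
    y: float  # meters
    t: float  # seconds

@dataclass
class Trajectory:
    obj_id: str
    points: List[SpatioTemporalPoint] = field(default_factory=list)

    def average_speed(self) -> float:
        """Path length divided by elapsed time, in meters/second."""
        if len(self.points) < 2:
            return 0.0
        length = sum(math.hypot(b.x - a.x, b.y - a.y)
                     for a, b in zip(self.points, self.points[1:]))
        elapsed = self.points[-1].t - self.points[0].t
        return length / elapsed if elapsed > 0 else 0.0

# An object that covers 50 meters in 10 seconds.
trip = Trajectory("obj1", [SpatioTemporalPoint(0.0, 0.0, 0.0),
                           SpatioTemporalPoint(30.0, 40.0, 10.0)])
speed = trip.average_speed()
```

Fixing the units at this level means derived quantities such as speed come out directly in meters/second, with no conversion step.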
Conclusions

The work is currently in progress, and we hope to test a Density Based Algorithm on this new generation engine as soon as possible
At the same time we are also working on an engine for testing Temporal and Sequential Frequent Pattern Algorithms
And also on making the generator easier to use, through simplification of the number and form of parameters, a graphical interface, etc.