iMAP: Indirect Measurement of Air Pollution with Cellphones
Murat Ali Bayır
Research Assistant
Ubiquitous Computing Laboratory
Department of Computer Science and Engineering
University at Buffalo
Murat Ali Bayir, Apr. 08
OUTLINE
Motivation and Air Pollution Exposure Estimation Problem
Mobility Profiler Framework and The Data Set
Mobility Path Construction
Air Pollution Estimation
Experimental Results
Air Pollution Exposure Estimation Problem
Why Is the Air Pollution Exposure Estimation Problem Important?
• Researchers state that two million premature deaths annually are attributable to air pollutants. The death rate is high even in more developed countries [Brundtland 02].
• Acute and chronic air pollutant exposures increase the risks of cardiovascular and respiratory diseases [Brook 07], exacerbate asthma among children [Sarnat 07], and increase the risks of neonatal death and low birth weight [Sarnat 07 and Sorensen 99].
Air Pollution Exposure Estimation Problem
Previous Approaches
• To estimate air pollution exposure, the previous approach [Adar 07] uses residential information. To illustrate, if an individual works 9 hours per day, these approaches assume that the individual stays at the work address for 9 hours and at the home address for the remaining 15 hours. Under this assumption, the average air pollution to which the person is exposed is estimated using air pollution data from the Department of Environmental Conservation for the areas containing the work and home addresses.
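The residential assumption amounts to a time-weighted average over two locations. The following is a minimal Python sketch of that reading. The function name, its parameters, and the pollutant levels are illustrative assumptions and are not taken from [Adar 07] or from the study data.

```python
def residential_exposure(work_level, home_level, work_hours=9.0, day_hours=24.0):
    """Time-weighted daily average exposure under the residential assumption.

    The person is assumed to spend `work_hours` at the work address and the
    rest of the day at the home address (hypothetical helper for illustration).
    """
    home_hours = day_hours - work_hours
    return (work_hours * work_level + home_hours * home_level) / day_hours


# Made-up pollutant levels in arbitrary units, for illustration only.
average = residential_exposure(work_level=30.0, home_level=10.0)
print(average)  # (9 * 30 + 15 * 10) / 24 = 17.5
```

Any time spent away from these two addresses is invisible to this estimate.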
Our Motivation
Problems of Previous Approaches and Our Motivation
• Because the previous approaches use residential information, they do not consider the time-activity patterns of an individual. In real life, it is very common for a person to move between several locations, for example going shopping, visiting a friend's house, or going out for lunch. Because the previous approaches do not consider these situations, their error in air pollution estimation increases.
• The aim of this project is to use the mobility paths of individuals, collected via cell phones, to increase the accuracy of air pollution estimation and remove the deficiency of the residential approach. We use the Mobility Profiler Framework [Bayir 08] to extract the mobility paths of individuals.
Mobility Profiler Framework and The Data Set
Mobility Profiler Framework [Bayir 08]
[Figure: framework diagram. Processing stages: Topology Construction, Path Construction, Pattern Discovery, Post Processing. Data shown: Mobility Database, Cell Tower Topology, Mobility Paths, Rules and Patterns, Interesting Knowledge.]
Mobility Profiler Framework and The Data Set
The Data Set
• The data set was collected by the MIT Reality Mining Group in an experimental study involving 100 people.
• Each person used a Nokia N60 series cell phone running software that records data about cell phone usage.
• All of the data is kept in a database spanning 350K hours of data, with a total size of about 1 GB.
• The software on the cell phones is written so that it can log data without interrupting user activities such as voice calls.
Mobility Profiler Framework and The Data Set
The Database Structure
• All of the usage data is stored in the Reality database, which contains 10 tables. From this data set, the following tables are used for mining cell phone user mobility.
[Figure: full schema of the tables used.] The core table for mining is cellspan.
Mobility Profiler Framework and The Data Set
Example CellSpan Log

oid  start time              end time                person_oid  celltower_oid
1    [25/Apr/2007:03:04:41]  [25/Apr/2007:03:24:48]  12          86
2    [25/Apr/2007:03:27:43]  [25/Apr/2007:03:33:28]  12          87
3    [25/Apr/2007:03:36:11]  [25/Apr/2007:03:39:52]  12          95

Cell transition time: the time elapsed between two contiguous records of the same user (00:02:43 between records 2 and 3).
Duration time: the time spent in the area of a cell tower (00:03:41 for record 3).
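Both quantities can be derived directly from the start and end timestamps. The sketch below is a minimal Python illustration using the three example records. The tuple layout and the helper name `parse` are assumptions made for this example, not part of the Reality Mining schema.

```python
from datetime import datetime

FMT = "[%d/%b/%Y:%H:%M:%S]"  # timestamp layout used in the example log

# (oid, start time, end time, person_oid, celltower_oid) from the example log
records = [
    (1, "[25/Apr/2007:03:04:41]", "[25/Apr/2007:03:24:48]", 12, 86),
    (2, "[25/Apr/2007:03:27:43]", "[25/Apr/2007:03:33:28]", 12, 87),
    (3, "[25/Apr/2007:03:36:11]", "[25/Apr/2007:03:39:52]", 12, 95),
]


def parse(ts):
    """Parse one log timestamp into a datetime (hypothetical helper)."""
    return datetime.strptime(ts, FMT)


# Duration time: end minus start of each record.
durations = [parse(end) - parse(start) for _, start, end, _, _ in records]

# Cell transition time: next start minus current end, same user only.
transitions = [
    parse(nxt[1]) - parse(cur[2])
    for cur, nxt in zip(records, records[1:])
    if cur[3] == nxt[3]
]

print([str(d) for d in durations])    # ['0:20:07', '0:05:45', '0:03:41']
print([str(t) for t in transitions])  # ['0:02:55', '0:02:43']
```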
Mobility Path Construction
• Why do we need mobility paths?
• Using the raw data in the cellspan table is difficult for most applications, since related cell tower connection records are not grouped together in a set.
• What does "related cell tower records" mean?
• The answer lies in the semantics of the data set, which describes human mobility: the mobility data is collected during individuals' trips from one location to another.
• We therefore need to construct sets, called mobility paths, each of which corresponds to an individual's trip from one location to another.
Mobility Path Construction
Return to Our Raw Data

oid  start time              end time                person_oid  celltower_oid
1    [25/Apr/2007:03:04:41]  [25/Apr/2007:03:24:48]  12          86
2    [25/Apr/2007:03:27:43]  [25/Apr/2007:03:33:28]  12          87
3    [25/Apr/2007:03:36:11]  [25/Apr/2007:03:39:52]  12          95

Cell transition time: the time elapsed between two contiguous records of the same user (00:02:43 between records 2 and 3).
Duration time: the time spent in the area of a cell tower (00:03:41 for record 3).

The cell transition time between two contiguous records, or the duration time of a single record, may be very long, which corresponds to a static state of the cell phone user. Therefore, we need to cut mobility paths at these records, which correspond to the departure or arrival point of a particular trip.
Mobility Path Construction
• Definition (Mobility Path): A mobility path $C = [C_1, C_2, C_3, \ldots, C_n]$ is an ordered sequence of cell tower ids corresponding to the cells (the active area of a cell tower, represented by a Voronoi diagram) that an individual passed through while traveling from one location to another.
• Each mobility path must satisfy the following constraints, where $L_k$ denotes the log record of $C_k$:
Static Location Rule (for observed static locations):
$\forall C_k \in C \text{ with } L_k^{\mathrm{duration}} > \delta_{\mathrm{duration}} \Rightarrow k = 1 \text{ or } k = |C|$
Transition Time Rule (for hidden static locations):
$\forall C_k, C_{k+1} \in C \Rightarrow L_{k+1}^{\mathrm{start}} - L_k^{\mathrm{end}} \le \delta_{\mathrm{transition}}$
Mobility Path Construction
Global variables
  userSessionSet, tempSessionSet

Procedure CreateNewSession(person_oid, cell, start, end)
  cellSequence := [(cell, start, end)]
  tempSessionSet := tempSessionSet ∪ {(person_oid, cellSequence)}
End Procedure

Procedure SessionConstruction(L, δ_duration, δ_transition)
  userSessionSet := {}
  tempSessionSet := {}
  For each L_i of L
    duration_i := end_i - start_i
    If duration_i ≤ δ_duration then
      If ∃ userSession_k ∈ tempSessionSet with person_oid_k = person_oid_i then
        If (start_i - lastEndTime(userSession_k)) ≤ δ_transition then
          userSession_k := (person_oid_k, cellSequence_k ∪ (C_i, start_i, end_i))
        Else
          userSessionSet := userSessionSet ∪ {userSession_k}
          tempSessionSet := tempSessionSet – {userSession_k}
          CreateNewSession(person_oid_i, C_i, start_i, end_i)
        End If
      Else
        CreateNewSession(person_oid_i, C_i, start_i, end_i)
      End If
Mobility Path Construction
    Else (duration_i > δ_duration: observed static location)
      If ∃ userSession_k ∈ tempSessionSet with person_oid_k = person_oid_i then
        If (start_i - lastEndTime(userSession_k)) ≤ δ_transition then
          userSession_k := (person_oid_k, cellSequence_k ∪ (C_i, start_i, end_i))
          userSessionSet := userSessionSet ∪ {userSession_k}
          tempSessionSet := tempSessionSet – {userSession_k}
          CreateNewSession(person_oid_i, C_i, start_i, end_i)
        Else
          userSessionSet := userSessionSet ∪ {userSession_k}
          tempSessionSet := tempSessionSet – {userSession_k}
          CreateNewSession(person_oid_i, C_i, start_i, end_i)
        End If
      Else
        CreateNewSession(person_oid_i, C_i, start_i, end_i)
      End If
    End If
  End For
End Procedure
Mobility Path Construction
Example with δ_duration = 5 and δ_transition = 3

oid  Person_oid  Tstart  Tend  Tduration  Ttransition  Celltower_oid
1    1           0       4     4          -1           12
2    1           6       9     3          2            67
3    1           9       13    4          0            123
4    1           15      22    7          2            87
5    1           23      27    4          1            98
6    1           27      30    3          0            12
7    1           43      47    4          13           67
8    1           49      52    3          2            11

Resulting mobility paths:
• [12, 67, 123, 87]
• [87, 98, 12], followed by a gap (Ttransition = 13 > δ_transition)
• [67, 11]
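The session construction procedure can be sketched in Python and checked against the worked example above. This is a minimal illustration, not the framework's implementation. The names `CellSpan` and `build_mobility_paths` are hypothetical. It assumes the records are already sorted by time, and it assumes that paths still open at the end of the input are emitted. The pseudocode does not state this last step, but the example output [67, 11] implies it.

```python
from dataclasses import dataclass


@dataclass
class CellSpan:
    person: int
    cell: int
    start: int
    end: int


def build_mobility_paths(records, max_duration, max_transition):
    """Split time-ordered cellspan records into mobility paths per person."""
    finished = []    # completed paths as (person, [cell, ...])
    open_paths = {}  # person -> (cells of the open path, end time of last record)
    for r in records:
        is_static = (r.end - r.start) > max_duration  # observed static location
        current = open_paths.get(r.person)
        if current is None:
            open_paths[r.person] = ([r.cell], r.end)
            continue
        cells, last_end = current
        if r.start - last_end <= max_transition:
            cells.append(r.cell)
            if is_static:
                # A static cell closes the path and also starts the next one.
                finished.append((r.person, cells))
                open_paths[r.person] = ([r.cell], r.end)
            else:
                open_paths[r.person] = (cells, r.end)
        else:
            # Hidden static location: the gap between records is too long.
            finished.append((r.person, cells))
            open_paths[r.person] = ([r.cell], r.end)
    # Assumption: emit whatever is still open once the log is exhausted.
    for person, (cells, _) in open_paths.items():
        finished.append((person, cells))
    return finished


# Rows of the worked example: (person, cell, Tstart, Tend)
example = [
    CellSpan(1, 12, 0, 4), CellSpan(1, 67, 6, 9), CellSpan(1, 123, 9, 13),
    CellSpan(1, 87, 15, 22), CellSpan(1, 98, 23, 27), CellSpan(1, 12, 27, 30),
    CellSpan(1, 67, 43, 47), CellSpan(1, 11, 49, 52),
]
paths = [cells for _, cells in build_mobility_paths(example, 5, 3)]
print(paths)  # [[12, 67, 123, 87], [87, 98, 12], [67, 11]]
```

Keeping one open path per person in a dictionary plays the role of tempSessionSet, and the `finished` list plays the role of userSessionSet.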
Air Pollution Estimation
• Easy process after geographical mapping
• We map each cell tower to a geographical region in the air pollution database of the Department of Environmental Conservation. To illustrate, if the mobility path is P = [C1, C2, C3]:
Pollution Exposure = T1 * P<C1-T1> + T2 * P<C2-T2> + T3 * P<C3-T3>
P<Cn-Tn>: the average air pollution estimated for the region containing cell tower Cn during time interval Tn
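The exposure formula is a duration-weighted sum over the cells of a path. The sketch below is a minimal Python illustration of that sum. The cell-to-region mapping, the lookup function, and all numeric levels are made-up stand-ins for the pollution database, which is not described in detail here.

```python
def path_exposure(visits, cell_to_region, region_level):
    """Sum duration-weighted pollution levels over the cells of one mobility path.

    visits:         list of (cell, start_hour, end_hour) tuples
    cell_to_region: dict mapping a cell tower id to a region id (hypothetical)
    region_level:   function (region, start, end) -> average pollution level
    """
    total = 0.0
    for cell, start, end in visits:
        region = cell_to_region[cell]
        total += (end - start) * region_level(region, start, end)
    return total


# Made-up mapping and levels in arbitrary units, for illustration only.
cell_to_region = {"C1": "north", "C2": "center", "C3": "center"}
levels = {"north": 10.0, "center": 20.0}


def constant_level(region, start, end):
    # Stand-in for a database lookup; this version ignores the time interval.
    return levels[region]


visits = [("C1", 0.0, 2.0), ("C2", 2.0, 2.5), ("C3", 2.5, 3.5)]
print(path_exposure(visits, cell_to_region, constant_level))  # 2*10 + 0.5*20 + 1*20 = 50.0
```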
Experimental Results
Remember the Data Set
• More than 2M cellspan records
• 350K hours of cellspan data
• Cellspan records of 100 mobile users
Experimental Results
Determining δduration and δtransition for Mobility Path Construction
[Figure: Log Coverage Ratio vs Duration Threshold. x-axis: Duration Threshold (min), 1 to 30. y-axis: Log Coverage Ratio, 0.5 to 1.]
The duration time of 94% of all logs is smaller than 10 minutes.
Experimental Results
Determining δduration and δtransition for Mobility Path Construction
Unlike the analysis of δ_duration, the transition data still has a visibility problem if we analyze it without filtering out the regular handoffs, which take 0 seconds. In the Reality Mining data set, nearly 99.2% of contiguous cellspan records have the regular handoff value of 0 seconds. Obviously, the user cannot be in a hidden static location during such a handoff. Therefore, we filter out regular handoff times when analyzing δ_transition.
[Figure: Log Coverage Ratio vs Transition Threshold. x-axis: Transition Threshold (min), 1 to 60. y-axis: Log Coverage Ratio, 0 to 0.7.]
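The coverage analysis used to pick the two thresholds can be sketched as follows. This is a minimal Python illustration under one assumption: the log coverage ratio is read here as the fraction of records whose value does not exceed the threshold. The function name `coverage_ratio` is hypothetical, and the sample values are the Tduration and Ttransition columns of the small worked example in the path construction section, not the Reality Mining data.

```python
def coverage_ratio(values, threshold, drop_zero=False):
    """Fraction of values that are <= threshold.

    With drop_zero=True, zero values (regular handoffs) are removed first,
    as described for the transition-time analysis.
    """
    if drop_zero:
        values = [v for v in values if v != 0]
    if not values:
        return 0.0
    return sum(1 for v in values if v <= threshold) / len(values)


# Columns of the worked example (the first record has no transition time).
durations = [4, 3, 4, 7, 4, 3, 4, 3]
transitions = [2, 0, 2, 1, 0, 13, 2]

print(coverage_ratio(durations, 5))                    # 7 of 8 records -> 0.875
print(coverage_ratio(transitions, 3, drop_zero=True))  # 4 of 5 records -> 0.8
```

Sweeping `threshold` over a range of minutes and plotting the result would give curves of the kind summarized in the two figures.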
Experimental Results
• Taking δ_duration = 10 min and δ_transition = 10 min, the framework constructs 120K mobility paths.
• The number of unique cell towers is 32K.
• We gave the mobility paths of two case studies to our domain expert from the Department of Social and Preventive Medicine at UB in order to estimate their air pollution exposure.
References
[Bayir 08] Murat Ali Bayir, Murat Demirbas, Nathan Eagle. Mobility Profiler: A Framework for Discovering Mobile User Profiles, 2008 (under submission).
[Demirbas 08] Murat Demirbas, Carole Rudra, Atri Rudra, Murat Ali Bayir. IMAP: An Indirect Measurement of Air Pollution via Cell Phone, 2008 (under submission).
[Brook 07] R. D. Brook. Is air pollution a cause of cardiovascular disease? Updated review and controversies. Rev. Environ. Health, 22(2):115–137, 2007.
[Brundtland 02] G. H. Brundtland. Reducing risks to health, promoting healthy life. JAMA, 288(16):1974, 2002. From the World Health Organization.
[Sarnat 07] J. A. Sarnat and F. Holguin. Asthma and air quality. Curr. Opin. Pulm. Med., 13(1):63–66, 2007.
[Sorensen 99] N. Sorensen, K. Murata, E. Budtz-Jorgensen, P. Weihe, and P. Grandjean. Prenatal methylmercury exposure as a cardiovascular risk factor at seven years of age. Epidemiology, 10(4):370–375, 1999.
[Adar 07] S. D. Adar and J. D. Kaufman. Cardiovascular disease and air pollutants: evaluating and improving epidemiological data implicating traffic exposure. Inhal. Toxicol., 19(1):135–149, 2007.
[Barnes 05] B. Barnes, A. Mathee, and K. Moiloa. Assessing child time-activity patterns in relation to indoor cooking fires in developing countries: a methodological comparison. Int. J. Hyg. Environ. Health, 208(3):219–225, 2005.
Conclusion
Any questions?