On the Enrichment of a RDF Repository of
City Points of Interest based on Social Data
Zied Sellami*, Gianluca Quercini**, Chantal
Reynaud*
*IASI Team, Université Paris-Sud 11, France
{sellami, reynaud}@lri.fr
**E3S Team, Supélec, France
[email protected]
WOD’2013 - Paris - 3rd June, 2013
Outline
1. Introduction and Related Issues
2. Reconciliation of POI Data and Social Data
3. Enrichment based on Opinion Mining
4. Experiments and Results
5. Conclusion and Future Work
Introduction and Related Issues
Points of interest (POI) : geographic locations
Formalized as an RDF repository in the context of the
DataBridges project (Quercini et al., 2012)
Restaurants, museums, hotels, theatres, landmarks, etc.
A POI is described by facets (or attributes) : name, type,
category, address, longitude and latitude.
Example of POI : Louvre Museum
Introduction and Related Issues
POIs are obtained automatically by extracting data from
Google Fusion Tables (GFT) (Quercini et al., 2012)
Some extracted POIs contain few attributes
Some extracted POIs lack precise attributes (incomplete
address, imprecise geographic location)
Lack of valuable indications in the extracted POIs (user reviews, official
website, e-mail, etc.)
Enrich and Correct POIs
Additional elements : Phone number, e-mail, official web site…
Useful indications to potential visitors (good and bad aspects of
the place)
Enrich POI using what?
Using Social Networking Systems (Social Data)
Matching POIs Across Social Networks
Accessing and searching social Web pages concerning POIs
1. Yelp (http://www.yelp.com/): social networking site for retrieving and reviewing POIs
2. Foursquare (https://foursquare.com/): application combining geolocation and social guidance
Similar searching method
Input: name and geographic position
Output: list of Web pages of POIs related to the geographic position and the words included in the query
Filtering the list to select only pertinent Web pages
Matching POIs Across Social Networks
Selecting the appropriate Web Pages for a POI
Several parameters can be used: name, address, category, longitude and latitude
Computing a similarity value
Definition of a similarity formula
Matching POIs Across Social Networks: Similarity Measure
2 parameters used: name; longitude and latitude
Different social data: different ways of describing the category
Eiffel Tower (Monument, Garden, etc.; Landmarks, Historical Building)
Uncontrolled social data: address strings incomplete or wrong
O Pelicano (Portugal); Restaurante O Requinte (Portugal); etc.
String techniques for name pruning and name comparison
Stemming with the Porter algorithm; stop-word lists
Levenshtein distance and Jaccard distance
Filtering results using geographic proximity
Computing the geographic distance between the POI and the Web page
using longitude and latitude
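
The distance computation can be illustrated with a minimal sketch. The slides do not say which distance formula is used; the haversine great-circle distance below is an assumption, as are the function name `geo_distance_m` and the example coordinates (approximate positions of the Louvre Museum and the Eiffel Tower).

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def geo_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
louvre = (48.8606, 2.3376)
eiffel = (48.8584, 2.2945)
print(f"{geo_distance_m(*louvre, *eiffel):.0f} m")
```

The result is directly comparable with a threshold expressed in metres.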
Matching POIs Across Social Networks: Similarity Measure
Similarity measure
WP(x).name : name of an entity x in a Web Page
p.name : name of a POI
Combination of Levenshtein and Jaccard
Boosts the similarity score between names that use the same words,
even in a different order
Example : Museum of Louvre; Louvre Museum
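
The similarity formula itself is not reproduced in this transcript. The sketch below only illustrates why combining a character-level score (Levenshtein) with a word-level score (Jaccard) helps with reordered names. Taking the maximum of the two scores is an assumption, not the authors' formula; the stop-word list and all function names are hypothetical, and Porter stemming is omitted for brevity.

```python
import re

# Illustrative stop-word list, not the one used by the authors.
STOP_WORDS = {"of", "the", "a", "an", "de", "du", "la", "le"}


def tokens(name):
    """Lowercase a name, split it into words and drop stop words."""
    return [w for w in re.findall(r"\w+", name.lower()) if w not in STOP_WORDS]


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    """Edit distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def jaccard_sim(words_a, words_b):
    """Share of distinct words common to both names; word order is ignored."""
    sa, sb = set(words_a), set(words_b)
    return 1.0 if not (sa | sb) else len(sa & sb) / len(sa | sb)


def name_similarity(poi_name, page_name):
    """Assumed combination: the better of the character-level and word-level scores."""
    ta, tb = tokens(poi_name), tokens(page_name)
    return max(levenshtein_sim(" ".join(ta), " ".join(tb)), jaccard_sim(ta, tb))


print(name_similarity("Museum of Louvre", "Louvre Museum"))
```

On the example from the slide, the Levenshtein score alone is penalised by the word order, while the Jaccard score is not.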
Matching POIs Across Social Networks: Filtering Measure
Filtering measure
δ1 and δ2: name similarity thresholds
distmax: distance threshold
Threshold values fixed after some experiments
δ1 = 0.9 and δ2 = 0.7
distmax = 1000 meters
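
The filtering formula is not reproduced in this transcript. One plausible reading of the three thresholds, given here purely as an assumption, is that a Web page is kept when its name similarity reaches δ1, or when it reaches δ2 and the page lies within distmax of the POI. The function name `keep_page` and the candidate values are hypothetical.

```python
DELTA_1 = 0.9      # high name-similarity threshold (value from the slide)
DELTA_2 = 0.7      # lower name-similarity threshold (value from the slide)
DIST_MAX_M = 1000  # distance threshold in metres (value from the slide)


def keep_page(name_sim, distance_m):
    """Assumed rule, not the authors' formula.

    Keep a page on a very close name match alone, or on a weaker
    name match that is confirmed by geographic proximity.
    """
    if name_sim >= DELTA_1:
        return True
    return name_sim >= DELTA_2 and distance_m <= DIST_MAX_M


# Hypothetical candidates: (name similarity, distance to the POI in metres).
candidates = [(0.95, 2500.0), (0.80, 300.0), (0.80, 1500.0), (0.50, 10.0)]
print([keep_page(sim, dist) for sim, dist in candidates])
```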
Opinion Mining
Evaluation of the POI from reviews and comments
Notation: Good, Very Good, Bad, Very Bad, etc.
Useful information for a potential visitor:
What is interesting? (food, ambiance, place, etc.)
What is to be avoided? (drink, person, etc.)
Going further than conventional sentiment analysis
Tweet classification (positive, negative or undetermined) (Pak
and Paroubek, 2010)
http://smm.streamcrab.com/
http://www.sentiment140.com/
Linguistic approach for opinion mining
Opinion Mining: Principle
Identification of positive and negative expressions
Using Verbs and adjectives (Chesley et al., 2006) (Moghaddam
and Popowich, 2010) (Li et al., 2012)
Example : Great food, not good place, I like the place, etc.
Generating a lexicon of positive and negative verbs and
adjectives
Processing a lexicon of positive and negative words with
TreeTagger
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
Positive adjectives (1467 adj) / Negative adjectives (1609 adj)
Positive verbs (421 verb) / Negative Verbs (1243 verb)
Opinion Mining: Phrase Extraction
Definition of lexico-syntactic patterns to identify pertinent
expressions
Expressions describing objects
1. (NOT)* ADJ OBJECT (Great food, not interesting place, etc.)
2. OBJECT BE ADJ (Sandwich is good, restaurant is nice, etc.)
Expressions describing sentiments or advice
1. ITS ADJ (it’s interesting, it’s happy, etc.)
2. I FEEL OR SUGGEST OBJECT (I like this place, I advise you to
test the hotel, etc.)
3. I FEEL (NOT)* ADJ (I feel happy, I feel very hungry, etc.)
Implementation with Java Regex
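
The two object patterns can be sketched with regular expressions. The slides state that the authors used Java regular expressions together with TreeTagger; the Python sketch below is only an illustration working on raw text, and the tiny word lists, the pattern names and the `extract` function are all hypothetical.

```python
import re

# Tiny illustrative word lists; the lexicon described in the slides is far larger.
ADJECTIVES = ["great", "good", "nice", "interesting"]
OBJECTS = ["food", "place", "sandwich", "restaurant"]

ADJ = "|".join(ADJECTIVES)
OBJ = "|".join(OBJECTS)

# (NOT)* ADJ OBJECT, e.g. "great food", "not interesting place"
ADJ_OBJECT = re.compile(rf"\b((?:not\s+)*)({ADJ})\s+({OBJ})\b", re.IGNORECASE)
# OBJECT BE ADJ, e.g. "sandwich is good"
OBJECT_BE_ADJ = re.compile(rf"\b({OBJ})\s+(?:is|are|was|were)\s+({ADJ})\b", re.IGNORECASE)


def extract(text):
    """Return (negated, adjective, object) triples matched by the two patterns."""
    found = []
    for m in ADJ_OBJECT.finditer(text):
        negated = bool(m.group(1))  # at least one leading "not"
        found.append((negated, m.group(2).lower(), m.group(3).lower()))
    for m in OBJECT_BE_ADJ.finditer(text):
        found.append((False, m.group(2).lower(), m.group(1).lower()))
    return found


for triple in extract("Great food but not interesting place. The sandwich is good."):
    print(triple)
```

Matching on part-of-speech tags rather than fixed word lists, as the use of TreeTagger suggests, would let the same patterns cover objects that are not known in advance.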
Repository Enrichment: Notation of a POI
Notation measure
Scale for giving appreciation to POI
Labels: Very bad, Bad, Medium, Undetermined, Fairly, Good, Very Good
Values: -10, -6.6, -3.3, 0, 3.3, 6.6, 10
Repository Enrichment: Identifying Useful Information
1. General assessment
Expressions describing sentiments
Expressions describing objects concerning the place, the name
of the POI, or one of the POI categories
2. Tips
Expressions describing advice
3. Specific ideas
Expressions describing objects other than the place, name or
category of the POI
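
The three groups above can be sketched as a small classification function. This is an illustration under stated assumptions, not the authors' implementation: the `kind` labels, the function name `categorize`, and the choice to compare the object against each word of the POI name are all hypothetical.

```python
def categorize(kind, obj, poi_name, poi_categories):
    """Assign an extracted expression to one of the three groups.

    kind: "sentiment", "advice" or "object" (assumed labels for the pattern that matched)
    obj:  the noun the expression is about, or None for sentiment and advice expressions
    """
    if kind == "advice":
        return "tips"
    if kind == "sentiment":
        return "general assessment"
    # Objects that refer to the place itself, to its name or to one of its categories.
    general_terms = {"place"}
    general_terms |= {word.lower() for word in poi_name.split()}
    general_terms |= {category.lower() for category in poi_categories}
    return "general assessment" if obj.lower() in general_terms else "specific ideas"


poi = ("Louvre Museum", ["Museum"])
print(categorize("object", "museum", *poi))
print(categorize("object", "crowd", *poi))
print(categorize("advice", None, *poi))
```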
Evaluation of the Similarity Measure
Dataset: 600 POIs compared with Foursquare data
Comparing our formula with Levenshtein and Jaccard
Formula              Precision  Recall  F-measure
Name + Levenshtein   0.84       0.68    0.75
Name + Jaccard       0.85       0.66    0.74
Our formula          0.86       0.66    0.75
The combination of Levenshtein and Jaccard improves the
similarity precision
Our formula and Levenshtein have the same F-measure
Precision is the more important parameter
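
The F-measure values in the table are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes them from the reported precision and recall; the function and variable names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the evaluation table.
reported = {
    "Name + Levenshtein": (0.84, 0.68),
    "Name + Jaccard": (0.85, 0.66),
    "Our formula": (0.86, 0.66),
}

for formula, (precision, recall) in reported.items():
    print(f"{formula}: {f_measure(precision, recall):.2f}")
```

Rounded to two decimals this gives 0.75, 0.74 and 0.75, matching the table.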
Evaluation of the Opinion Mining Approach
40 Yelp reviews of Louvre Museum and Eiffel Tower
Louvre Museum notation: Very Good (7.23)
Eiffel Tower notation: Very Good (8.58)
Louvre Museum
General assessment:
Positive [magnificent place, beautiful place, good museum, prestigious museum]
Negative [crowded place, hard museum, uncomfortable museum]
Tips:
go basement, visit basement, not use pyramid entrance
Specific ideas:
Positive [contemporary art, contemporary sculpture, original decor, real mummy]
Negative [sketchy people, strange marble sculpture, massive crowd, grumpy folk]
Eiffel Tower
General assessment:
Positive [great place, funny place, beautiful monument]
Negative []
Tips:
Go top
Specific ideas:
Positive [good view, panoramic view, light show]
Negative [slow elevator, crazy line, illegal Eiffel tower souvenir]
Evaluation of the Opinion Mining Approach
Comparison with sentiment140 (statistical approach)
Analysis of 20 tweets concerning Louvre Museum and 14 tweets
concerning Eiffel Tower
Polarity of Louvre Museum tweet   sentiment140   Our approach
Positive                          13             10
Negative                          2              0
Undetermined                      5              10
Polarity of Eiffel Tower tweet    sentiment140   Our approach
Positive                          11             10
Negative                          1              1
Undetermined                      2              3
No contradictory results
Our approach identified 3 sentiments that were not
identified by sentiment140
2 tweets analyzed differently
Our approach identified the correct polarity
Conclusion
Original approach for POI data enrichment
Definition of a similarity formula to compare POI data
Linguistic approach to analyze reviews and comments
Complete tool implemented in Java
Experiments show promising results
About 86% similarity precision
Linguistic approach able to identify precisely the positive and negative
aspects of the POI
Future Work
1. Similarity measure optimisation
Compare selected Web Pages for the POI
2. Filtering positive and negative expressions
Using metrics like frequency
3. Learning new positive and negative verbs and adjectives
Using SentiWordNet (Baccianella et al., 2010)
4. Using adverbs in the opinion mining approach (Benamara et
al., 2007)
"Very good food" is stronger than "Good food"
Thank you!