Transcript slides

On the Enrichment of an RDF Repository of
City Points of Interest based on Social Data
Zied Sellami*, Gianluca Quercini**, Chantal
Reynaud*
*IASI Team, Université Paris-Sud 11, France
{sellami, reynaud}@lri.fr
**E3S Team, Supélec, France
[email protected]
WOD'2013 - Paris - 3rd June 2013
Outline
1. Introduction and Related Issues
2. Reconciliation of POI Data and Social Data
3. Enrichment based on Opinion Mining
4. Experiments and Results
5. Conclusion and Future Work
Introduction and Related Issues

- Points of interest (POI): geographic locations
  - Restaurants, museums, hotels, theatres, landmarks, etc.
  - A POI is described by facets (or attributes): name, type, category, address, longitude and latitude
- Formalized as an RDF repository in the context of the DataBridges project (Quercini et al., 2012)
- Example of a POI: the Louvre Museum
Introduction and Related Issues

- POIs are automatically obtained by extracting data from Google Fusion Tables (GFT) (Quercini et al., 2012)
  - Some extracted POIs contain few attributes
  - Some extracted POIs do not contain precise attributes (incomplete address, imprecise geographic location)
  - Lack of valuable indications in the extracted POIs (user reviews, official Web site, e-mail, etc.)
- Enrich and correct POIs
  - Additional elements: phone number, e-mail, official Web site…
  - Useful indications for potential visitors (good and bad aspects of the place)
- Enrich POIs using what?
  - Using social networking systems (social data)
Matching POIs Across Social Networks

- Accessing and searching social Web pages concerning a POI
  1. Yelp (http://www.yelp.com/): social networking site for retrieving and reviewing POIs
  2. Foursquare (https://foursquare.com/): application combining geolocation and social guidance
- Similar search method for both sources (see the hedged sketch below)
  - Input: name and geographic position
  - Output: list of Web pages of POIs related to the geographic position and to the words included in the query
  → Filtering the list to select only pertinent Web pages
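The actual Yelp and Foursquare API calls are not detailed on the slide; the interface below is a hypothetical sketch of the shared search method (POI name and position in, candidate pages out). All type and method names here are assumptions introduced for illustration, not actual Yelp or Foursquare client code.

```java
import java.util.List;

/**
 * Hypothetical sketch of the search method shared by both sources. The type
 * and method names are assumptions for illustration; one implementation per
 * source would wrap the corresponding Web service.
 */
public interface SocialPoiSearch {

    /** A candidate social Web page returned by a source. */
    class CandidatePage {
        public final String name;       // name of the entity on the page, WP(x).name
        public final String url;        // address of the social Web page
        public final double latitude;   // position reported by the source
        public final double longitude;

        public CandidatePage(String name, String url, double latitude, double longitude) {
            this.name = name;
            this.url = url;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /**
     * Input: POI name and geographic position.
     * Output: Web pages of POIs close to the position and matching the query words.
     */
    List<CandidatePage> search(String poiName, double latitude, double longitude);
}
```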
Matching POIs Across Social Networks

- Selecting the appropriate Web pages for a POI
  - Computing a similarity value
- Several parameters can be used:
  - Name
  - Address
  - Category
  - Longitude and latitude
- Definition of a similarity formula
Matching POIs Across Social Networks: Similarity Measure

- 2 parameters used: name; longitude and latitude
  - Different social data → different manners of describing the category
    - Eiffel Tower (Monument, Garden, etc.; Landmarks, Historical Building)
  - Uncontrolled social data → address string incomplete or wrong
    - O Pelicano (Portugal); Restaurante O Requinte (Portugal); etc.
- String techniques for name pruning and name comparison
  - Stemming with the Porter algorithm; stop-word lists
  - Levenshtein distance and Jaccard distance
- Filtering results using distance proximity
  - Computing the geographic distance between the POI and the Web page from their longitude and latitude (see the sketch below)
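The slide only states that the geographic distance is computed from longitude and latitude; the minimal sketch below uses the haversine formula, which is an assumption about the exact distance computation.

```java
/**
 * Hedged sketch: geographic distance (in meters) between a POI and a candidate
 * Web page from their latitude/longitude, using the haversine formula (an
 * assumption; the slides do not name the formula used).
 */
public class GeoDistance {

    private static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius

    /** Haversine distance in meters between two (lat, lon) points in degrees. */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Louvre Museum vs Eiffel Tower (approximate coordinates).
        System.out.printf("%.0f m%n", distanceMeters(48.8606, 2.3376, 48.8584, 2.2945));
    }
}
```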
Matching POIs Across Social Networks: Similarity Measure

- Similarity measure (see the hedged sketch below)
  - WP(x).name: name of an entity x in a Web page
  - p.name: name of a POI
- Combination of Levenshtein and Jaccard
  - Boosts the similarity score between names that use the same words, even in a different order
  - Example: Museum of Louvre; Louvre Museum
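The similarity formula itself is not reproduced in this transcript, so the sketch below shows only one plausible combination of a normalized Levenshtein similarity with a word-level Jaccard similarity (taking their maximum). The max-based combination is an assumption, not the paper's exact formula, but it illustrates how the Jaccard term boosts names that share the same words in a different order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hedged sketch of a Levenshtein/Jaccard name-similarity combination. */
public class NameSimilarity {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    /** Levenshtein distance normalized to a similarity in [0, 1]. */
    static double levenshteinSim(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Jaccard similarity on word sets: insensitive to word order. */
    static double jaccardSim(String a, String b) {
        Set<String> sa = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> sb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    /** Assumed combination: take the best of the two views of the names. */
    static double similarity(String wpName, String poiName) {
        return Math.max(levenshteinSim(wpName, poiName), jaccardSim(wpName, poiName));
    }

    public static void main(String[] args) {
        // "Museum of Louvre" vs "Louvre Museum": the Jaccard term lifts the
        // score because the same words appear in a different order.
        System.out.println(similarity("Museum of Louvre", "Louvre Museum"));
    }
}
```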
Matching POIs Across Social Networks: Filtering Measure

- Filtering measure (see the hedged sketch below)
  - δ1 and δ2: name similarity thresholds
  - distmax: maximum distance threshold
- Threshold values fixed after some experiments
  - δ1 = 0.9 and δ2 = 0.7
  - distmax = 1000 meters
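The filtering measure itself is not reproduced in this transcript. The rule below is one reading consistent with the reported thresholds (keep a candidate page whose name similarity reaches δ1, or reaches δ2 while lying within distmax of the POI); it should be taken as an assumption, not the paper's exact definition.

```java
/** Hedged sketch of the filtering step with the thresholds from the slide. */
public class CandidateFilter {

    static final double DELTA_1 = 0.9;      // high name-similarity threshold
    static final double DELTA_2 = 0.7;      // lower name-similarity threshold
    static final double DIST_MAX = 1000.0;  // maximum distance in meters

    /**
     * @param nameSim        similarity between WP(x).name and p.name, in [0, 1]
     * @param distanceMeters geographic distance between the page and the POI
     */
    static boolean keep(double nameSim, double distanceMeters) {
        if (nameSim >= DELTA_1) return true;                      // near-identical names
        return nameSim >= DELTA_2 && distanceMeters <= DIST_MAX;  // similar names nearby
    }

    public static void main(String[] args) {
        System.out.println(keep(0.95, 5000)); // true: very similar name
        System.out.println(keep(0.75, 300));  // true: similar name, close by
        System.out.println(keep(0.75, 5000)); // false: similar name but too far
    }
}
```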
Opinion Mining

- Evaluation of the POI from reviews and comments
  - Notation (rating): Good, Very Good, Bad, Very Bad, etc.
  - Useful information for a potential visitor:
    - What is interesting? (food, ambiance, place, etc.)
    - What is to be avoided? (drink, person, etc.)
- Going further than conventional sentiment analysis
  - Tweet classification (positive, negative or undetermined) (Pak and Paroubek, 2010)
  - http://smm.streamcrab.com/
  - http://www.sentiment140.com/
  → Linguistic approach for opinion mining
Opinion Mining: Principle

- Identification of positive and negative expressions
  - Using verbs and adjectives (Chesley et al., 2006) (Moghaddam and Popowich, 2010) (Li et al., 2012)
  - Examples: great food, not good place, I like the place, etc.
- Generating a lexicon of positive and negative verbs and adjectives (see the sketch below)
  - Processing, with TreeTagger, a lexicon of positive and negative words
  - http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  - Positive adjectives (1467) / negative adjectives (1609)
  - Positive verbs (421) / negative verbs (1243)
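A minimal sketch of how the generated lexicons could be loaded for look-up. The file names and the one-word-per-line format are assumptions; the slide only names TreeTagger and the opinion lexicon distributed at the URL above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Hedged sketch: loading the positive/negative lexicons into in-memory sets. */
public class OpinionLexicon {

    final Set<String> positive = new HashSet<>();
    final Set<String> negative = new HashSet<>();

    /** Reads one word per line, ignoring blank lines and ';' comment lines. */
    private static Set<String> load(String path) throws IOException {
        return Files.readAllLines(Paths.get(path)).stream()
                .map(String::trim)
                .filter(w -> !w.isEmpty() && !w.startsWith(";"))
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    OpinionLexicon(String posAdj, String negAdj, String posVerbs, String negVerbs)
            throws IOException {
        positive.addAll(load(posAdj));
        positive.addAll(load(posVerbs));
        negative.addAll(load(negAdj));
        negative.addAll(load(negVerbs));
    }

    boolean isPositive(String word) { return positive.contains(word.toLowerCase()); }
    boolean isNegative(String word) { return negative.contains(word.toLowerCase()); }
}
```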
Opinion Mining: Phrase Extraction

- Definition of lexico-syntactic patterns to identify pertinent expressions
  - Expressions describing objects
    1. (NOT)* ADJ OBJECT (great food, not interesting place, etc.)
    2. OBJECT BE ADJ (sandwich is good, restaurant is nice, etc.)
  - Expressions describing sentiments or advice
    1. ITS ADJ (it's interesting, it's happy, etc.)
    2. I FEEL OR SUGGEST OBJECT (I like this place, I advise you to test the hotel, etc.)
    3. I FEEL (NOT)* ADJ (I feel happy, I feel very hungry, etc.)
- Implementation with Java regex (see the sketch below)
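The slides say the patterns are implemented with Java regular expressions but do not show the code. The sketch below is a simplified illustration of pattern 1, "(NOT)* ADJ OBJECT": ADJ is approximated by a tiny inline sample of the adjective lexicon and OBJECT by the word that follows, whereas the real system works on TreeTagger part-of-speech output and the full lexicons.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified sketch of pattern 1, "(NOT)* ADJ OBJECT", with Java regex. */
public class PatternSketch {

    // Tiny stand-in for the adjective lexicon (the real lists hold thousands).
    private static final String[] ADJECTIVES =
            {"great", "good", "interesting", "nice", "crowded", "slow"};

    private static final Pattern NOT_ADJ_OBJECT = Pattern.compile(
            "\\b(?:(not)\\s+)?(" + String.join("|", ADJECTIVES) + ")\\s+(\\w+)",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String review = "Great food but not interesting place near the museum.";
        Matcher m = NOT_ADJ_OBJECT.matcher(review);
        while (m.find()) {
            boolean negated = m.group(1) != null;  // optional NOT
            String adjective = m.group(2);         // ADJ from the lexicon
            String object = m.group(3);            // OBJECT: the following word
            System.out.println((negated ? "not " : "") + adjective + " -> " + object);
        }
    }
}
```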
Repository Enrichment: Notation of a POI

- Notation measure (see the hedged sketch below)
- Scale for giving an appreciation of a POI:

  Very Bad   Bad    Medium   Undetermined   Fairly   Good   Very Good
  -10        -6.6   -3.3     0              3.3      6.6    10
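The slide gives the scale but not the bucketing rule. The sketch below assumes that a score falls under the label that closes its interval (for example, a score in (6.6, 10] is "Very Good"); this is an assumption, chosen because it matches the ratings reported later (7.23 and 8.58 are both labelled "Very Good").

```java
/** Hedged sketch: mapping a numeric notation in [-10, 10] to a scale label. */
public class NotationScale {

    private static final double[] BOUNDS = {-10, -6.6, -3.3, 0, 3.3, 6.6, 10};
    private static final String[] LABELS =
            {"Very Bad", "Bad", "Medium", "Undetermined", "Fairly", "Good", "Very Good"};

    /** Returns the label whose interval upper bound first covers the score. */
    static String label(double score) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (score <= BOUNDS[i]) {
                return LABELS[i];
            }
        }
        return LABELS[LABELS.length - 1]; // scores above 10 clamp to "Very Good"
    }

    public static void main(String[] args) {
        System.out.println(label(7.23));  // Very Good (Louvre Museum, later slide)
        System.out.println(label(8.58));  // Very Good (Eiffel Tower, later slide)
        System.out.println(label(-7.0));  // Bad
    }
}
```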
Repository Enrichment: Identifying Useful Information

1. General assessment
   - Expressions describing sentiments
   - Expressions describing objects concerning the place, the name of the POI, or one of the POI categories
2. Tips
   - Expressions describing advice
3. Specific ideas
   - Expressions describing objects other than the place, the name or a category of the POI (see the classification sketch below)
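A hedged sketch of this three-way classification. The enum names and the simple string tests used to decide whether an expression concerns the place, the POI name or one of its categories are illustrative assumptions, not the system's actual implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hedged sketch of the general assessment / tip / specific idea split. */
public class UsefulInformationClassifier {

    enum ExpressionKind { SENTIMENT, ADVICE, OBJECT }
    enum Target { GENERAL_ASSESSMENT, TIP, SPECIFIC_IDEA }

    // Words taken to stand for "the place" in general (assumption).
    private static final Set<String> PLACE_WORDS =
            new HashSet<>(Arrays.asList("place", "spot", "location"));

    static Target classify(ExpressionKind kind, String object,
                           String poiName, Set<String> poiCategories) {
        if (kind == ExpressionKind.SENTIMENT) return Target.GENERAL_ASSESSMENT;
        if (kind == ExpressionKind.ADVICE) return Target.TIP;
        String obj = object.toLowerCase(Locale.ROOT);
        boolean aboutPoiItself = PLACE_WORDS.contains(obj)
                || poiName.toLowerCase(Locale.ROOT).contains(obj)
                || poiCategories.stream().anyMatch(c -> c.equalsIgnoreCase(obj));
        return aboutPoiItself ? Target.GENERAL_ASSESSMENT : Target.SPECIFIC_IDEA;
    }

    public static void main(String[] args) {
        Set<String> categories = new HashSet<>(Arrays.asList("museum", "landmark"));
        // "beautiful place" -> general assessment; "slow elevator" -> specific idea
        System.out.println(classify(ExpressionKind.OBJECT, "place", "Louvre Museum", categories));
        System.out.println(classify(ExpressionKind.OBJECT, "elevator", "Louvre Museum", categories));
    }
}
```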
Evaluation of the Similarity Measure

- Dataset: 600 POIs compared with Foursquare data
- Comparing our formula with Levenshtein and Jaccard:

  Formula              Precision   Recall   F-measure
  Name + Levenshtein   0.84        0.68     0.75
  Name + Jaccard       0.85        0.66     0.74
  Our formula          0.86        0.66     0.75

- The combination of Levenshtein and Jaccard improves the similarity precision
- Our formula and Levenshtein have the same F-measure
  - The precision parameter is the more important one
Evaluation of the Opinion Mining Approach

- 40 Yelp reviews of the Louvre Museum and the Eiffel Tower
  - Louvre Museum notation: Very Good (7.23)
  - Eiffel Tower notation: Very Good (8.58)

Louvre Museum
  General assessment:
    Positive [magnificent place, beautiful place, good museum, prestigious museum]
    Negative [crowded place, hard museum, uncomfortable museum]
  Tips:
    go basement, visit basement, not use pyramid entrance
  Specific ideas:
    Positive [contemporary art, contemporary sculpture, original decor, real mummy]
    Negative [sketchy people, strange marble sculpture, massive crowd, grumpy folk]

Eiffel Tower
  General assessment:
    Positive [great place, funny place, beautiful monument]
    Negative []
  Tips:
    go top
  Specific ideas:
    Positive [good view, panoramic view, light show]
    Negative [slow elevator, crazy line, illegal Eiffel tower souvenir]
Evaluation of the Opinion Mining Approach

- Comparison with sentiment140 (statistical approach)
- Analysis of 20 tweets concerning the Louvre Museum and 14 tweets concerning the Eiffel Tower

  Polarity of Louvre Museum tweets   sentiment140   Our approach
  Positive                           13             10
  Negative                           2              0
  Undetermined                       5              10

  Polarity of Eiffel Tower tweets    sentiment140   Our approach
  Positive                           11             10
  Negative                           1              1
  Undetermined                       2              3

- The results are not contradictory
- Our approach identified 3 sentiments that were not identified by sentiment140
- 2 tweets were analyzed differently; our approach identified the correct polarity
Conclusion

- Original approach for POI data enrichment
  - Definition of a similarity formula to compare POI data
  - Linguistic approach to analyze reviews and comments
  - Complete tool implemented in Java
- Experiments show promising results
  - About 86% similarity precision
  - The linguistic approach is able to identify precisely the positive and negative aspects of a POI
Future Work

1. Similarity measure optimisation
   - Compare the Web pages selected for the same POI
2. Filtering positive and negative expressions
   - Using metrics like frequency
3. Learning new positive and negative verbs and adjectives
   - Using SentiWordNet (Baccianella et al., 2010)
4. Using adverbs in the opinion mining approach (Benamara et al., 2007)
   - "Very good food" is stronger than "Good food"
Thank you!