Transcript SC17

Freebase: A Collaboratively Created Graph
Database For Structuring Human Knowledge
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor
Metaweb Technologies, Inc.
San Francisco
International Conference on Management of Data (2008)
2008. 11. 12.
Center for E-Business Technology
Seoul National University
Seoul, Korea
Summarized & presented by Babar Tareen, IDS Lab., Seoul National University
Motivation – Wikipedia
• Free multilingual encyclopedia
• Supports 264 languages
• 854 volumes of English articles
Copyright © 2008 by CEBT
Motivation – English Wikipedia Growth
Introduction
• A public repository of the world's knowledge
• Inspired by the Semantic Web and Wikipedia
• Supports highly diverse and heterogeneous data
• Tries to merge the scalability of structured databases with the diversity of collaborative wikis into a practical, scalable database of structured general human knowledge
• The information contained in Freebase is open to anyone
• However, the Freebase backend database is not open
Data Sources
• User contributions
• Metaweb bots
• Incorporates facts from many large, publicly available information sources
Data Model
• Freebase is a graph database
  ◦ A set of nodes and a set of links that establish relationships between the nodes
• Key Concepts
  ◦ Domains
    – Bases: collections of topics created by users
    – Commons: similar to bases but more general
    – Examples: Film, Religion, Computers
  ◦ Types
    – Analogous to classes
    – Examples: Film Actor, Film Festival, Film Distribution, Film Rating, Film Format
  ◦ Properties
    – Specific information elements within a type
    – Examples: Film Performances, Film Dubbing Performances, IMDb Entry
  ◦ Topics
    – Analogous to objects
    – Instances of a type
    – Topics can be linked to other domains or other topics
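
The node-and-link model above can be sketched in a few lines of Python. This is only an illustration, not Freebase's implementation: the class names, the `t1`-style identifiers, and the `/example/...` type ids are assumptions made for the sketch; only the `/fictional_universe/fictional_character` type and the `character_created_by` property come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node: one real-world thing. A topic may carry several types."""
    guid: str                                  # hypothetical id format
    name: str
    types: set = field(default_factory=set)


@dataclass(frozen=True)
class Link:
    """A directed, labelled edge between two nodes."""
    source: str   # guid of the node the link starts from
    prop: str     # property name (properties belong to a type)
    target: str   # guid of the node the link points to


class Graph:
    def __init__(self):
        self.topics = {}   # guid -> Topic
        self.links = []    # list of Link

    def add_topic(self, guid, name, *types):
        self.topics[guid] = Topic(guid, name, set(types))

    def link(self, source, prop, target):
        self.links.append(Link(source, prop, target))

    def neighbors(self, guid, prop):
        """Names of all topics reachable from `guid` over `prop` links."""
        return [self.topics[l.target].name
                for l in self.links
                if l.source == guid and l.prop == prop]


g = Graph()
g.add_topic("t1", "Spider-Man", "/fictional_universe/fictional_character")
# The two /example/... type ids below are made up for this sketch.
g.add_topic("t2", "Steve Ditko", "/example/person")
g.add_topic("t3", "Stan Lee", "/example/person", "/example/author")
g.link("t1", "character_created_by", "t2")
g.link("t1", "character_created_by", "t3")

print(g.neighbors("t1", "character_created_by"))
```

Keeping types as a set on the topic, rather than as a single class, is what lets one topic be an instance of several types at once.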
Data Model (2)
Key Components
• A scalable tuple store
• An HTTP/JSON-based API
  ◦ MQL for read / write operations
• A lightweight, collaborative typing system
  ◦ Loose collection of structuring mechanisms and conventions
• A large, diverse data set
  ◦ 100 million assertions
  ◦ 4000 types
• A philosophy of "complete normalization"
  ◦ Only one GUID for a real-world object
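
As a rough sketch of how a tuple store and "complete normalization" fit together, the toy store below mints exactly one GUID per real-world object and lets several external keys point at it, so facts asserted under any key end up on the same node. The class, method names, `guid:N` format, and the `wiki` / `user` namespaces are all assumptions for illustration; the slides do not describe the actual storage layout.

```python
import itertools


class TupleStore:
    """Toy in-memory store of (subject, property, value) tuples."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._guid_by_key = {}   # (namespace, key) -> guid
        self.tuples = []         # append-only list of assertions

    def resolve(self, namespace, key):
        """Return the single GUID for a key, minting one on first use."""
        k = (namespace, key)
        if k not in self._guid_by_key:
            self._guid_by_key[k] = f"guid:{next(self._next_id)}"
        return self._guid_by_key[k]

    def alias(self, namespace, key, guid):
        """Point another key at an existing GUID instead of duplicating it."""
        self._guid_by_key[(namespace, key)] = guid

    def assert_fact(self, subject, prop, value):
        self.tuples.append((subject, prop, value))

    def facts_about(self, guid):
        return [(p, v) for s, p, v in self.tuples if s == guid]


store = TupleStore()
spidey = store.resolve("user", "spider-man")
store.alias("wiki", "Spider-Man", spidey)      # same object, second key

store.assert_fact(spidey, "name", "Spider-Man")
store.assert_fact(store.resolve("wiki", "Spider-Man"),
                  "type", "/fictional_universe/fictional_character")

print(store.facts_about(spidey))
```

Both assertions land on the same GUID even though they were made through different keys, which is the point of keeping one identifier per object.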
Data Entry
Schema Creation
Data Evaluation
Metaweb Query Language
Who created the comic character Spider-Man?

QUERY
[
  {
    "character_created_by" : null,
    "name" : "Spider-Man",
    "type" : "/fictional_universe/fictional_character"
  }
]

RESPONSE
{
  "code" : "/api/status/ok",
  "q1" : {
    "code" : "/api/status/error",
    "messages" : [
      {
        "code" : "/api/status/error/mql/result",
        "info" : {
          "count" : 2,
          "result" : [
            "Steve Ditko",
            "Stan Lee"
          ]
        },
        "message" : "Unique query may have at most one result. Got 2",
        "path" : "character_created_by",
        "query" : [
          {
            "character_created_by" : null,
            "error_inside" : "character_created_by",
            "name" : "Spider-Man",
            "type" : "/fictional_universe/fictional_character"
          }
        ]
      }
    ]
  },
  "status" : "200 OK",
  "transaction_id" : "cache;cache01.p01.sjc1:8101;2008-11-11T05:54:45Z;0021"
}
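
The response above carries two status codes: one for the HTTP request as a whole and one for the individual query `q1`. A client therefore has to check both. The sketch below shows one way to do that; the `unwrap` function and `MqlError` class are hypothetical names, the sample is a trimmed copy of the response on this slide, and the assumption that a successful query returns its data under a `result` key is not shown in the slides.

```python
import json


class MqlError(Exception):
    """Raised when the envelope or the inner query reports an error."""


def unwrap(response_text, query_name="q1"):
    """Return the query result, or raise MqlError with the server message."""
    envelope = json.loads(response_text)
    if envelope.get("code") != "/api/status/ok":
        raise MqlError(f"request failed: {envelope.get('code')}")
    inner = envelope[query_name]
    if inner.get("code") != "/api/status/ok":
        details = "; ".join(m.get("message", "")
                            for m in inner.get("messages", []))
        raise MqlError(details or inner.get("code"))
    return inner.get("result")   # assumed key for successful queries


# Trimmed copy of the response shown on the slide.
SAMPLE = """
{
  "code": "/api/status/ok",
  "q1": {
    "code": "/api/status/error",
    "messages": [
      {
        "code": "/api/status/error/mql/result",
        "info": {"count": 2, "result": ["Steve Ditko", "Stan Lee"]},
        "message": "Unique query may have at most one result. Got 2",
        "path": "character_created_by"
      }
    ]
  },
  "status": "200 OK"
}
"""

try:
    unwrap(SAMPLE)
except MqlError as err:
    print("query failed:", err)
```

Note that the outer code is `/api/status/ok` even though the query failed, so checking only the outer code (or only the HTTP status) would miss the error.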
MQL Queries
• Characters created by Stan Lee
• Foreign donations to 2008 US political candidates
• Nikon cameras in order of resolution
• Tropical storms in the 1990s
• Mountains of the Himalayas
• African American authors and their books
• Web browsers that run on the Mac
• US cities named Canton
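
Queries like these follow a query-by-example pattern: the query is a template in which concrete values act as constraints and `null` marks the fields to be filled in. The toy matcher below illustrates the idea for "Characters created by Stan Lee". It is a simplified stand-in, not the MQL engine: the `match` function is hypothetical, it skips MQL's uniqueness check on `null`, and the second data row is made up so the filter has something to exclude.

```python
def match(query, topics):
    """Return one result dict per topic that satisfies `query`.

    None in the query means "fill this field in"; any other value
    is a constraint the topic must satisfy.
    """
    results = []
    for topic in topics:
        row = {}
        for key, wanted in query.items():
            have = topic.get(key)
            if wanted is None:
                row[key] = have              # field to be filled in
                continue
            values = have if isinstance(have, list) else [have]
            if wanted not in values:
                break                        # constraint failed
            row[key] = wanted
        else:
            results.append(row)              # every constraint held
    return results


TOPICS = [
    {"name": "Spider-Man",
     "type": "/fictional_universe/fictional_character",
     "character_created_by": ["Steve Ditko", "Stan Lee"]},
    # Made-up row, present only so the filter has something to reject.
    {"name": "Example Character",
     "type": "/fictional_universe/fictional_character",
     "character_created_by": ["Someone Else"]},
]

query = {
    "type": "/fictional_universe/fictional_character",
    "character_created_by": "Stan Lee",
    "name": None,
}

names = [row["name"] for row in match(query, TOPICS)]
print(names)
```

Compared with the Spider-Man query, the known and unknown fields are simply swapped: the creator is now the constraint and the character name is the blank to fill in.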
Applications
• Parallax: Freebase Browser
  http://mqlx.com/~david/parallax/index.html
• Powerset: Semantic Search Engine
  http://www.powerset.com/
• ArchiPortal
  http://dev.mqlx.com/~zak/arch/
• Dipity Timelines
  http://www.dipity.com/
Discussion
• Simple architecture
• Topics can be associated with multiple types
• Analogous to having a database of knowledge
• But now there are two knowledge bases to maintain
  ◦ Wikipedia
  ◦ Freebase
References
• Freebase
  http://www.freebase.com
• The Semantic Edge (Web 2.0 Summit 2007)
  http://www.web2summit.com/cs/web2007/view/e_sess/15043
• MQL Query Editor
  http://www.freebase.com/tools/queryeditor/
• Freebase Blog
  http://blog.freebase.com/
• Freebase Sample Queries
  http://www.freebase.com/view/freebase/freebase_query