Transcript SC17
Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor
Metaweb Technologies, Inc.
San Francisco
International Conference on Management of Data (2008)
November 12, 2008
Center for E-Business Technology
Seoul National University
Seoul, Korea
Summarized & presented by Babar Tareen, IDS Lab., Seoul National University
Motivation – Wikipedia
Free multilingual encyclopedia
Supports 264 languages
English articles would fill about 854 printed volumes
Copyright 2008 by CEBT
Motivation – English Wikipedia Growth
Introduction
A public repository of the world’s knowledge
Inspired by the Semantic Web and Wikipedia
Supports highly diverse and heterogeneous data
Tries to merge the scalability of structured databases with the diversity of collaborative wikis into a practical, scalable database of structured general human knowledge
The information contained in Freebase is open to anyone
However, the Freebase back-end database itself is not open
Data Sources
User contributions
Metaweb bots
Facts incorporated from many large, publicly available information sources
Data Model
Freebase is a graph database
A set of nodes and a set of links that establish relationships between the nodes
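To make the node-and-link model concrete, here is a minimal Python sketch of a directed, labelled graph. It is not Metaweb's implementation; the class name Graph, the node IDs n1 to n3 and the link label created_by are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Hypothetical toy graph: nodes plus directed, labelled links."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}  # node id -> display name
        self.links: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, name: str) -> None:
        self.nodes[node_id] = name

    def link(self, source: str, prop: str, target: str) -> None:
        # A link establishes a relationship between two existing nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        self.links[source].append((prop, target))

    def related(self, source: str, prop: str) -> list[str]:
        # Follow every link labelled `prop` out of `source`.
        return [self.nodes[t] for p, t in self.links.get(source, []) if p == prop]


g = Graph()
for node_id, name in [("n1", "Spider-Man"), ("n2", "Steve Ditko"), ("n3", "Stan Lee")]:
    g.add_node(node_id, name)
g.link("n1", "created_by", "n2")
g.link("n1", "created_by", "n3")
print(g.related("n1", "created_by"))  # ['Steve Ditko', 'Stan Lee']
```

Because links are just labelled edges, a node can have any number of relationships without a fixed table schema.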
Key Concepts
Domains
– Bases: collections of topics created by users
– Commons: similar to bases but more general
– Examples: Film, Religion, Computers
Types
– Analogous to classes
– Examples: Film Actor, Film Festival, Film Distribution, Film Rating, Film Format
Properties
– Specific information elements within a type
– Examples: Film Performances, Film Dubbing Performances, IMDb Entry
Topics
– Analogous to objects
– Instances of a type
– Can be linked to other domains or other topics
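A minimal Python sketch of how these concepts relate: a type belongs to a domain and declares properties, and a topic is an instance of one or more types. The class names FreebaseType and Topic, the Film Director type, its Films Directed property and the example topic are hypothetical; only the Film Actor type and its properties come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FreebaseType:
    """Analogous to a class: lives in a domain and declares properties."""
    domain: str
    name: str
    properties: frozenset[str]


@dataclass
class Topic:
    """Analogous to an object: an instance of one or more types."""
    name: str
    types: list[FreebaseType] = field(default_factory=list)
    values: dict[str, list[str]] = field(default_factory=dict)

    def add_type(self, t: FreebaseType) -> None:
        # Adding the same type twice has no effect.
        if t not in self.types:
            self.types.append(t)

    def set_value(self, prop: str, value: str) -> None:
        # A topic may only use properties declared by one of its types.
        if not any(prop in t.properties for t in self.types):
            raise ValueError(f"{prop!r} is not a property of any type of {self.name!r}")
        self.values.setdefault(prop, []).append(value)


film_actor = FreebaseType(
    "Film", "Film Actor",
    frozenset({"Film Performances", "Film Dubbing Performances", "IMDb Entry"}),
)
film_director = FreebaseType("Film", "Film Director", frozenset({"Films Directed"}))  # hypothetical

topic = Topic("Example Person")  # hypothetical topic
topic.add_type(film_actor)
topic.add_type(film_director)
topic.set_value("Films Directed", "Example Film")
print([t.name for t in topic.types])  # ['Film Actor', 'Film Director']
```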
Data Model (2)
Key Components
A Scalable Tuple Store
An HTTP/JSON-Based API
– MQL for read/write operations
A Lightweight, Collaborative Typing System
– A loose collection of structuring mechanisms and conventions
A Large, Diverse Data Set
– 100 million assertions
– 4000 types
A Philosophy of “Complete Normalization”
– Only one GUID per real-world object
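A minimal Python sketch of a tuple store combined with “complete normalization”: each assertion is a (subject GUID, predicate, object) tuple, and every fact about the same real-world object is attached to a single GUID. The class TupleStore and its methods are hypothetical; the plain-string key stands in for whatever entity reconciliation the real system performed, and objects are kept as literal strings for brevity.

```python
import uuid


class TupleStore:
    """Hypothetical toy store of (subject_guid, predicate, object) assertions."""

    def __init__(self) -> None:
        self.tuples: list[tuple[str, str, str]] = []
        self.guid_by_key: dict[str, str] = {}

    def guid_for(self, key: str) -> str:
        # Normalization: reuse the existing GUID for a known object, else mint one.
        if key not in self.guid_by_key:
            self.guid_by_key[key] = uuid.uuid4().hex
        return self.guid_by_key[key]

    def assert_fact(self, subject_key: str, predicate: str, obj: str) -> None:
        self.tuples.append((self.guid_for(subject_key), predicate, obj))

    def facts_about(self, subject_key: str) -> list[tuple[str, str]]:
        # Unknown keys have no GUID, so they match no tuples.
        guid = self.guid_by_key.get(subject_key)
        return [(p, o) for s, p, o in self.tuples if s == guid]


store = TupleStore()
store.assert_fact("Spider-Man", "name", "Spider-Man")                   # e.g. a user edit
store.assert_fact("Spider-Man", "character_created_by", "Stan Lee")     # e.g. a bot import
store.assert_fact("Spider-Man", "character_created_by", "Steve Ditko")
print(len({s for s, _, _ in store.tuples}))  # 1
```

Whichever source contributes a fact, it lands on the same GUID, so there is never a second record for Spider-Man to reconcile later.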
Data Entry
Schema Creation
Data Evaluation
Metaweb Query Language
Who created the comic character Spider-Man?
QUERY
[{
  "character_created_by" : null,
  "name" : "Spider-Man",
  "type" : "/fictional_universe/fictional_character"
}]
RESPONSE
{
  "code" : "/api/status/ok",
  "q1" : {
    "code" : "/api/status/error",
    "messages" : [
      {
        "code" : "/api/status/error/mql/result",
        "info" : {
          "count" : 2,
          "result" : [
            "Steve Ditko",
            "Stan Lee"
          ]
        },
        "message" : "Unique query may have at most one result. Got 2",
        "path" : "character_created_by",
        "query" : [
          {
            "character_created_by" : null,
            "error_inside" : "character_created_by",
            "name" : "Spider-Man",
            "type" : "/fictional_universe/fictional_character"
          }
        ]
      }
    ]
  },
  "status" : "200 OK",
  "transaction_id" : "cache;cache01.p01.sjc1:8101;2008-11-11T05:54:45Z;0021"
}
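The error in this response follows from MQL's query-by-example rules: a null slot asks for exactly one value, while [] asks for a list of values, and Spider-Man has two creators. The following Python sketch simulates that rule locally over hypothetical in-memory data. It does not call the Freebase service, and run_query and UniqueQueryError are illustrative names only.

```python
# Hypothetical in-memory data standing in for Freebase topics.
# Every property holds a list, because in a graph a property may have many values.
TOPICS = [
    {
        "type": ["/fictional_universe/fictional_character"],
        "name": ["Spider-Man"],
        "character_created_by": ["Steve Ditko", "Stan Lee"],
    },
]


class UniqueQueryError(Exception):
    """Raised when a null (single-valued) slot matches more than one value."""


def run_query(pattern: dict) -> list[dict]:
    """Match one query-by-example pattern against every topic.

    Concrete values are constraints, None asks for exactly one value and
    [] asks for all values. Returning a list mirrors the outer [ ] of the query.
    """
    results = []
    for topic in TOPICS:
        # Skip topics that violate any concrete constraint.
        if any(v is not None and v != [] and v not in topic.get(k, [])
               for k, v in pattern.items()):
            continue
        row = {}
        for key, wanted in pattern.items():
            values = topic.get(key, [])
            if wanted is None:
                if len(values) > 1:
                    raise UniqueQueryError(
                        f"Unique query may have at most one result. Got {len(values)}")
                row[key] = values[0] if values else None
            elif wanted == []:
                row[key] = list(values)
            else:
                row[key] = wanted
        results.append(row)
    return results


query = {
    "character_created_by": None,
    "name": "Spider-Man",
    "type": "/fictional_universe/fictional_character",
}
try:
    run_query(query)
except UniqueQueryError as err:
    print(err)  # Unique query may have at most one result. Got 2

# Asking for a list instead returns both creators.
query["character_created_by"] = []
print(run_query(query))
```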
MQL Queries
Characters created by Stan Lee
Foreign donations to 2008 US Political Candidates
Nikon Cameras in order of Resolution
Tropical Storms in the 1990s
Mountains of the Himalayas
African American authors and their books
Web Browsers that run on the Mac
US cities named Canton
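A query such as “Characters created by Stan Lee” runs in the opposite direction from the Spider-Man example: the known value becomes a constraint. Here is a minimal Python sketch, assuming a small hand-written sample rather than Freebase data. The MQL-style JSON is parsed with the standard json module, non-null fields act as constraints and null fields are filled in. The function run and the sample records are hypothetical, and reusing the character_created_by property for this query is an assumption carried over from the Spider-Man example.

```python
import json

CHAR = "/fictional_universe/fictional_character"

# Hypothetical hand-written sample; on Freebase the data would come from the graph.
DATA = [
    {"type": [CHAR], "name": ["Spider-Man"], "character_created_by": ["Steve Ditko", "Stan Lee"]},
    {"type": [CHAR], "name": ["Hulk"], "character_created_by": ["Stan Lee", "Jack Kirby"]},
    {"type": [CHAR], "name": ["Superman"], "character_created_by": ["Jerry Siegel", "Joe Shuster"]},
]

# The creator is a constraint; "name": null asks for each matching character's name.
QUERY_TEXT = """
[{
  "type": "/fictional_universe/fictional_character",
  "character_created_by": "Stan Lee",
  "name": null
}]
"""


def run(query_text: str, data: list[dict]) -> list[dict]:
    (pattern,) = json.loads(query_text)  # JSON null arrives as Python None
    results = []
    for topic in data:
        # Non-null fields are constraints the topic must satisfy.
        if all(value in topic.get(key, []) for key, value in pattern.items()
               if value is not None):
            # Null fields are filled in from the topic (assumed single-valued here).
            results.append({key: topic.get(key, [None])[0] if value is None else value
                            for key, value in pattern.items()})
    return results


print([row["name"] for row in run(QUERY_TEXT, DATA)])  # ['Spider-Man', 'Hulk']
```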
Applications
Parallax: Freebase Browser
http://mqlx.com/~david/parallax/index.html
Powerset: Semantic Search Engine
http://www.powerset.com/
ArchiPortal
http://dev.mqlx.com/~zak/arch/
Dipity Timelines
http://www.dipity.com/
Discussion
Simple architecture
Topics can be associated with multiple types
Analogous to having a database of knowledge
But now there are two knowledge bases to maintain:
– Wikipedia
– Freebase