The presentation.

The Web of Data
emerging industries
Michalis Vafopoulos,
vafopoulos.org
2014
Creative Commons License
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
Contents
① The Web of Documents vs. the Web of Data
– Some technology
– Some economics
– …and action
② PSNET project
③ and more…
The Data trilogy
① Open: access
free for everyone to use and republish
② Big: scale
high volume, velocity and variety
③ Linked: use
publish once, use many times
The Web of Documents
• Simple, big and unstructured
• Organized in Silos
But humans:
• are interested in Things,
not documents,
and these Things might be in docs or
elsewhere
• have limited capacity to extract
meaning...
The Web of Data
• Analogy: a global file system --> a global database
• Designed for: human consumption --> machines first, humans later
• Primary objects: documents --> things (or descriptions of things)
• Links between: documents --> things
• Degree of structure in objects: fairly low --> high
• Semantics of content and links: implicit --> explicit
(Tom Heath)
The Web of Data: why?
• encourages reuse
• reduces redundancy
• maximizes its (real and potential) interconnectedness
• enables network effects to add value to
data
The Web of Data: how?
– current state on the Web
• Relational Databases
• APIs
• XML
• CSV
• XLS
Computers can’t consume data because:
• Different formats & models
• Not inter-connected
The Web of Data: how?
– we need to create a standard way of
publishing Data on the Web (like HTML for
docs)
This is the Resource Description
Framework (RDF)
(a simple example follows, from Juan F. Sequeda; more next
semester!)
Resource Description Framework (RDF)
• A data model
– A way to model data
– Inspired by Relational databases and Logic
• RDF is a triple data model
• Labeled Graph (semantic networks)
• Subject, Predicate, Object
<Isidoro> <was born in> <Chios>
<Chios> <is part of> <Greece>
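As a minimal sketch of the triple model, the two statements above can be held as (subject, predicate, object) tuples in plain Python. The tuple layout and the helper name `objects_of` are illustrative assumptions, not part of RDF itself or of any RDF library.

```python
# Each RDF statement is one (subject, predicate, object) triple.
triples = {
    ("Isidoro", "was born in", "Chios"),
    ("Chios", "is part of", "Greece"),
}


def objects_of(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate` (hypothetical helper)."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow two edges of the labeled graph: the birthplace, then what it is part of.
birthplace = next(iter(objects_of(triples, "Isidoro", "was born in")))
country = objects_of(triples, birthplace, "is part of")
print(birthplace, country)
```

Because every statement has the same three-part shape, adding a new fact never requires a schema change; it is just one more tuple in the set.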
Example: Document on the Web
Databases back up documents
THINGS have PROPERTIES:
A Book has a Title, an Author, …
Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …
This is a THING:
A book titled “Programming the
Semantic Web” by Toby Segaran,
…
PublisherID | PublisherName
1 | O’Reilly Media
… | …
Data representation in RDF
<book> <title> "Programming the Semantic Web"
<book> <author> "Toby Segaran"
<book> <isbn> "978-0-596-15381-6"
<book> <publisher> <Publisher>
<Publisher> <name> "O’Reilly"
Everything on the web is
identified by a URI!
link the data to other data
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
consider the data from Revyu.com
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
start to link data
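<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/isbn978> <sameAs> <http://…/isbn978>
(the review site’s URI for the book sameAs the book dataset’s URI)
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"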
Juan Sequeda publishes data too
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
<http://juansequeda.com/id> <name> "Juan Sequeda"
Let’s link more data
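<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
<http://juansequeda.com/id> <name> "Juan Sequeda"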
Linked data = internet + http + RDF
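<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
<http://juansequeda.com/id> <name> "Juan Sequeda"
<http://…/isbn978> <sameAs> <http://…/isbn978>
(the review site’s URI for the book sameAs the book dataset’s URI)
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"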
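The linking steps in the book, review and reviewer example can be sketched in plain Python: three independently published sets of triples are merged by set union, and a sameAs link lets a question cross dataset boundaries. This is only an illustration under stated assumptions. The elided URIs are kept exactly as they appear on the slides and treated as opaque strings, and the helper names (`objects`, `aliases`, `reviewer_cities`) are hypothetical.

```python
SAME_AS = "sameAs"

# Three datasets published by different parties (URIs copied as shown, elisions included).
book_data = {
    ("http://…/isbn978", "title", "Programming the Semantic Web"),
    ("http://…/isbn978", "author", "Toby Segaran"),
}
review_data = {
    ("http://…/isbn978", "hasReview", "http://…/review1"),
    ("http://…/review1", "hasReviewer", "http://…/reviewer"),
    ("http://…/reviewer", SAME_AS, "http://juansequeda.com/id"),
}
person_data = {
    ("http://juansequeda.com/id", "name", "Juan Sequeda"),
    ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
}

# Merging graphs is a plain set union: triples need no shared schema.
graph = book_data | review_data | person_data


def objects(graph, subject, predicate):
    """All objects reachable from `subject` over `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def aliases(graph, node):
    """`node` plus every URI connected to it by sameAs, in either direction."""
    seen, todo = set(), [node]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if p == SAME_AS and s == current:
                todo.append(o)
            elif p == SAME_AS and o == current:
                todo.append(s)
    return seen


def reviewer_cities(graph, book):
    """Where do the reviewers of `book` live? Needs data from two datasets."""
    cities = set()
    for review in objects(graph, book, "hasReview"):
        for reviewer in objects(graph, review, "hasReviewer"):
            for alias in aliases(graph, reviewer):
                cities |= objects(graph, alias, "livesIn")
    return cities


print(reviewer_cities(graph, "http://…/isbn978"))
```

Without the sameAs triple the answer would be empty, because the review dataset knows the reviewer only by its own URI and has no livesIn statement for it.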
Linked Data Principles
1. Use URIs as names for things
2. Use URIs so that people can look up
(dereference) those names.
3. When someone looks up a URI,
provide useful information.
4. Include links to other URIs so that
they can discover more things.
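Principles 2 and 3 amount to an ordinary HTTP GET in which the client asks for data rather than an HTML page. The sketch below only prepares such a request with Python's standard library and does not send it. The function name `build_lookup` is hypothetical, and whether a given server answers `text/turtle` (the Turtle media type) is an assumption about that server.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET that asks the server for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# The URI is the one used in the slides' example.
req = build_lookup("http://dbpedia.org/Austin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; the links found in the response are what principle 4 invites a client to follow next.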
Web as a database
Linked Data makes the web exploitable
as ONE GIANT HUGE GLOBAL
DATABASE!
Is there any query language like SQL?
SPARQL…
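The core idea of a SPARQL query is pattern matching: triples with variables are matched against the graph, and shared variables join the patterns. The following is a toy matcher written for illustration only; it is not SPARQL and not any library's API, and the `?name` convention for variables in plain strings is an assumption of this sketch.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(graph, patterns):
    """Join all patterns and return every consistent set of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


graph = {
    ("Isidoro", "was born in", "Chios"),
    ("Chios", "is part of", "Greece"),
}
rows = query(graph, [
    ("Isidoro", "was born in", "?place"),
    ("?place", "is part of", "?country"),
])
print(rows)
```

The variable `?place` appears in both patterns, so it plays the role a join column plays in SQL.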
Is it working?
• Current Employee Names, Salaries, and
Position Titles
• The Open Database Of The Corporate
World
• Crime map
• NHS efficiency savings: the role of
prescribing analytics
• where public money goes worldwide
Examples
Can you find the famous persons born in
Beirut before 1900?
Or if the Greek Government buys sperm?
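The Beirut question could be phrased as a SPARQL query along the following lines. This is a sketch under assumptions: it supposes the data is in DBpedia, that `dbo:birthPlace` and `dbo:birthDate` are the relevant properties, and that the public endpoint is `https://dbpedia.org/sparql`. "Famous" has no direct counterpart in the data, so the query simply returns everyone recorded as born there before 1900. The code only builds the request URL and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed vocabulary: DBpedia ontology (dbo:) and resource (dbr:) namespaces.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?birth WHERE {
  ?person dbo:birthPlace dbr:Beirut ;
          dbo:birthDate ?birth .
  FILTER (?birth < "1900-01-01"^^xsd:date)
}
ORDER BY ?birth
"""


def sparql_url(endpoint, query):
    """Encode a query as an HTTP GET URL, using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_url("https://dbpedia.org/sparql", QUERY)
# Decode the URL again to confirm the query survives encoding unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```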
Examples
#anoixtigenia, @vafopoulos
May 2007
What is a Linked Data
application/service?
Software system that makes use of
data on the Web from multiple
datasets and that benefits from
links between the datasets
Characteristics of Linked Data
Applications
• Consume data that is published on the web
following the Linked Data principles: an application
should be able to request, retrieve and process the
accessed data
• Discover further information by following the links
between different data sources: the fourth principle
enables this.
• Combine the consumed linked data with data from
other sources (not necessarily Linked Data)
• Expose the combined data back to the web
following the Linked Data principles
• Offer value to end-users
The 5 stars of Linked Open Data
★ make your stuff available on the Web (whatever
format)
★★ make it available as structured data (e.g. Excel
instead of an image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of
Excel)
★★★★ use URLs to identify things, so that people
can point at your stuff
★★★★★ link your data to other people’s data to
provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
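The star scheme is cumulative: each star presupposes the ones before it. A small sketch of that rule follows; the `Dataset` fields and the `stars` function are hypothetical names chosen for this example and do not come from the scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web: bool        # 1 star: available on the Web in any format
    structured: bool    # 2 stars: machine-readable structured data
    open_format: bool   # 3 stars: non-proprietary format
    uses_uris: bool     # 4 stars: things are identified so others can point at them
    linked: bool        # 5 stars: links to other people's data


def stars(dataset):
    """Count stars in order, stopping at the first requirement that is not met."""
    count = 0
    for met in (dataset.on_web, dataset.structured, dataset.open_format,
                dataset.uses_uris, dataset.linked):
        if not met:
            break
        count += 1
    return count


csv_on_portal = Dataset(on_web=True, structured=True, open_format=True,
                        uses_uris=False, linked=False)
scanned_pdf = Dataset(on_web=True, structured=False, open_format=True,
                      uses_uris=True, linked=True)
print(stars(csv_on_portal), stars(scanned_pdf))
```

The second dataset shows why the order matters: later properties do not count while an earlier requirement is still unmet.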
Two magics of Web Science:
the case of Linked Data
The (practical) question
contextualized & hands-on experience in
Semantic Web & Business 3.0 on
a unique, fast evolving and semantified
dataset
PSNET project: the answer
The first attempt to generate, curate,
interlink and distribute daily updated public
spending data in LOD formats that can be
useful to both expert (i.e. scientists and
professionals) and naïve users.
The context first…
Research question
Web economy: from potential to
actual
Enable new virtuous cycles
in the economy
through Linked Open Data
EU Unification: the institutions
Best in theory – poor in practice
a (complicated) market example
• monetary policy, currency, eurozone
• European Single Market
• fiscal policy (forthcoming)
EU Unification: the technology
Linked Data or Web of data
• “publish once, use many times”.
• different consumers extract different
slices of the data for different
purposes
• publish in context:
value & “meaning”
EU Unification: the technology
• Linked Data (LD) + Open Data = LOD
• Economic LOD as “data currency”
Why LOD?
• Transparency & innovation
Network effects: enabling users to
• bidirectional & massively processable
interconnections among data
• re-using the existing infrastructure in
the government and business spheres
Economic LOD: the story so far
• Isolated/fragmented behind
technological & institutional barriers
• General statistics: Eurostat etc.
• LOD2 case
• Some isolated projects
Follow public money all the way
(Diagram labels: budget, tenders, spending, prices, business information, LOD graph; users analyze and remix)
Economic LOD: use cases
• Business applications on top
• Users: citizens, gov., EU, business
• track the life-cycle of every financial flow:
evaluate budget allocation, tenders,
spending and their efficiency
• pre-allocate resources on provisional
public works
• receive & submit information in real-time
Economic LOD: engineering
Government Budget
• heterogeneous repositories & methods (mainly PDF)
Tenders
• Closed data in HTML
• Public Contracts Ontology (PCO), e.g.
– pco:Contract and pco:AwardCriterion
• Common Procurement Vocabulary
• now working on linking our ontology to:
– Payments Ontology
– GoodRelations
– FOAF
Spending
• most dynamic & open part
• increasing number of countries/cities
• raw & structured data
• leader: the Greek Clarity project
• spending decisions published ex-ante (before execution)
• actually, every decision
Business Information
• Registries: mainly closed
• Key standards
– Classification of Products by Activity (CPA)
– eXtensible Business Reporting Language (XBRL)
• See also: Open Data Barometer, Open Data Index
The Transparency program in
Greece (2010-2014)
o A revolution in open government
o ex-ante reporting of every state
decision
o paradigm shift for 40K public servants
The Transparency program in
Greece
o manifests the value of
procrastination principle (again)
o strong rival to the Clientelistic state
o The new version under beta testing
(delivery: in 10 days!)
publicspending.net
2011: I believed that the Transparency
program is the open data “gold” (&
persuaded 7 more people)
2012: …with some dust and rocks in a
deep goldmine
2013: time to chisel some jewelry
2014: open data everywhere
Why public spending LOD
o more & better information
o objective and processable information for
economic/political “dialogue”
• to promote competition
• to decrease cost
• to judge the efficiency of policy mixtures
• to enable participation
LOD in Greece
• in its infancy – few Apps yet
• 2-3 stars
• Open, not Linked
• limited public awareness
LOD in Greece: why it is important
• quality of information during economic
crisis
• transparency & efficiency in funding
development
Issues
o how can we initiate the virtuous cycle
of creation?
demonstrate LOD’s added value
o how to get the most out of data?
local & global interconnections
In a few words,
Apps, Apps, Apps…
Indexing, searching, global
comparisons
Interlinking at global scale
The future of the Web
• Data.gov: a paradigm shift
• Policy challenges are related to data
• Freedom, Privacy, Creativity
Policy framework
Personal grid
workspace (g-work)
for every citizen
① Processing power
② Storage
③ Network access
④ Online data & services
⑤ Privacy
New analysis: Web science
• a trans-disciplinary field
– Web as its primary object of study
– Web= techno-social artifact
• positive or negative?
Transformative!
Web science
The envelope question
what technological and other
changes need to be made in order
for the Web to work better for
more people?
The Web as a social machine
Being protected by
digitizing
…challenges the basic aspects
of human nature:
o Technology
o Body
o Moral Values
o Sociality
o Generations
o Economy
Humanizing the Web
Webizing Humanity
Successful business &
science facilitate this
dialogue
Not only answers but
make the questions more
concrete
Global initiatives
• OGP: how it works
• GIFT
• IBP - OBS Tracker
• Web index
Let us talk about projects
References
• Weaving the Economic Linked Open Data
• The Web Economy: Goods, Users, Models, and
Policies
• Public Spending: Interconnecting and
Visualizing Greek Public Expenditure Following
Linked Open Data Directives
• A Framework for Linked Data Business Models