IPTC
Semantic Web
Working Group
Stuart Myles
Associated Press
7th March 2011
IPTC SemWeb Working Group
• Making use of Semantic Web technologies for news
– Leverage technologies developed by others
• The work falls into two main areas:
– Models
• Determine what aspects of news to represent
– Formats
• The details of how to represent the news
• Using RDFa, microformats, Linked Data
© 2011 IPTC (www.iptc.org)
All rights reserved
IPTC’s Semantic Web Agenda
At this meeting we will discuss
• News Metadata in HTML
• Introducing the rNews RDFa news vocabulary
• hNews Microformat
• For vote: rNews Draft 0.1
• Linked Data
• IPTC discussion with MINDS
• News Ontology
• Automated extraction of metadata
News Metadata in HTML
• One goal
– Promote the creation of better, more accurate tools for working
with news on the web
– By adding news metadata to HTML
• Two approaches
– rNews (RDFa) – draft to be voted on at this meeting
– hNews (microformats) – draft agreed late 2009
• RDFa
– alter your HTML document to use non-HTML attributes
– to insert RDF triples to indicate the meaning of markup
• Microformats
– conventions for using standard HTML
– to add labels to indicate the meaning of markup
An Example
rNews (RDFa) ByLine
hNews (Microformats) ByLine
The same meaning, different mechanisms
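The byline markup shown on this slide did not survive in the transcript. As a rough sketch of "same meaning, different mechanisms", the Python below extracts a byline from two illustrative snippets: one labels the author with microformat-style class names, the other with an RDFa-style property attribute. The markup strings, the author name "Jane Doe", and the `rnews:creator` property name are assumptions made for illustration; they are not taken from the original slide or from rNews Draft 0.1.

```python
from html.parser import HTMLParser

# Illustrative markup only (assumed, not from the slide).
# hNews style: meaning is carried by conventional class names on standard HTML.
HNEWS_BYLINE = '<span class="author vcard"><span class="fn">Jane Doe</span></span>'
# RDFa style: meaning is carried by a non-HTML4 attribute (@property).
# The "rnews:creator" name is a hypothetical placeholder.
RNEWS_BYLINE = '<span property="rnews:creator">Jane Doe</span>'

# Elements that never have an end tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BylineParser(HTMLParser):
    """Collect the text inside elements whose attributes satisfy `matches`."""

    def __init__(self, matches):
        super().__init__()
        self.matches = matches
        self.depth = 0   # greater than 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth or self.matches(dict(attrs)):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_byline(html, matches):
    """Return the text of the first-level matching elements in `html`."""
    parser = BylineParser(matches)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def is_hnews_author(attrs):
    # Microformats: look for the "author" token in the class attribute.
    return "author" in (attrs.get("class") or "").split()


def is_rnews_creator(attrs):
    # RDFa: look for the (assumed) property name in @property.
    return attrs.get("property") == "rnews:creator"


if __name__ == "__main__":
    print(extract_byline(HNEWS_BYLINE, is_hnews_author))
    print(extract_byline(RNEWS_BYLINE, is_rnews_creator))
```

Both calls recover the same byline text; only the predicate that recognises the markup differs, which is the point the slide makes.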
Introducing rNews
Presentation from Evan Sandhaus
hNews Metadata
hNews lets you add some machine-readable news-specific
metadata to display-ready HTML
• title
• source-org
• author
• dateline
• published
• geo
• updated
• item-license
• content / summary
• principles
• rel-tag
Large degree of overlap with rNews (not surprisingly!)
News Metadata in HTML
• The rNews / hNews differences are mainly technical
• As a microformat, hNews uses POSH (plain old semantic HTML)
– No need to alter the doctype
– No need to add XML namespaces
– No non-HTML4 attributes: @about, @property, @typeof, @content
• RDFa keeps vocabulary and the expression mechanism
distinct
• Microformats require decisions about how to use existing
HTML to express any new piece of vocabulary
• Which to use is largely a matter of technical preferences
hNews Uptake
• hNews draft agreed in late 2009
http://microformats.org/wiki/hnews
• As of October 2010
– About 1,200 sites that AP is aware of using hNews
– No one needs to ask permission to use hNews
– Roughly half being processed by AP News Registry
– About 14,000 unique hNews items per day
Discussion with MINDS
• Presentation to MINDS about IPTC and Linked Data
• Questions about
– SemWeb technology details
– Creation and maintenance of taxonomies
– Autocategorization and entity extraction
• Discussion of potential inter-agency use cases
– Olympics London 2012
News Ontology
• News ontology = formal model for news
• Deciding what aspects of news to represent
– Finished publication?
– Internal workflow?
– Post publication information, such as reader comments?
• Expressing NewsML-G2 (the NAR) in OWL
• Various (somewhat overlapping) efforts, e.g. AFP, EBU
• Coordinating RDFa, news / sport ontology work?
Motion for a Vote
Motion to the Standards Committee:
To approve rNews Draft 0.1
as distributed in 20110221-DRAFT-rNews_0.1.zip
and
to launch the rNews Experimental Phase
Discussion of Next Steps
• Next draft / final draft
• Outreach for feedback and experimentation
• Relationship with W3C
• Marketing of rNews
– In comparison to NITF, NewsML 1, NewsML-G2, hNews…
Next Meeting
Time and Place of Next Meeting
2011 AGM
6th – 9th June 2011
Berlin, Germany
Thank you and goodbye