New Ways of Querying the Web
by Eliahu Brodsky and Alina Blizhovsky
1
Simple Querying
• A query is a word (or several words) that a
document should contain.
• A search engine looks for the Web documents
which contain the word(s).
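A minimal sketch of this kind of keyword lookup, in Python. The document names and texts are made up for illustration, and the linear scan is only an assumption for brevity; a real search engine would consult a prebuilt index.

```python
# Hypothetical mini "Web": document name -> text (illustrative data only).
documents = {
    "a.html": "Database Systems is a classic textbook",
    "b.html": "MATLAB is published by The Math Works",
    "c.html": "Notes on database design",
}

def search(word, docs):
    """Return the names of the documents whose text contains the word."""
    needle = word.lower()
    return sorted(
        name for name, text in docs.items()
        if needle in text.lower().split()
    )

print(search("database", documents))  # ['a.html', 'c.html']
```

The query can only ask whether a word occurs; it cannot ask, for example, for books by a given publisher.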
2
Querying structured data
• Data on the Web may be structured (e.g. a
book catalog).
• “Structure” means a schema.
• The schema may not be rigid (semistructured data).
• More complex queries may be executed.
3
CGI
• Advantage
– Uses the existing DBMS (e.g. relational).
• Disadvantage
– Problems integrating data from
different Web sources.
4
XML
(Extensible Markup Language)
• A subset of SGML
• Benefits
– Arbitrary extension of a document’s tags and
attributes.
– Support for documents with complex structure.
– Validation of document structure (with respect
to an optional Document Type Definition).
5
Example of XML data
<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
6
Example of Document Type
Definition (DTD)
<!ELEMENT book (title, author?, publisher)>
<!ATTLIST book year CDATA #IMPLIED>
<!ELEMENT author (name)>
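The content model (title, author?, publisher) can be read as a regular expression over the sequence of child tags. The Python sketch below is not a DTD validator (the standard library does not include one); it only hand-translates that one content model and applies it to the two book elements from the XML example. The wrapper element `catalog` is an assumption, added so the fragment parses as a single document.

```python
import re
import xml.etree.ElementTree as ET

# The two books from the example, wrapped in a hypothetical <catalog> root.
DATA = """
<catalog>
  <book year="1995">
    <title>Database Systems</title>
    <author><name>Date</name></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998">
    <publisher>The Math Works</publisher>
    <title>MATLAB</title>
  </book>
</catalog>
"""

# Hand translation of the content model (title, author?, publisher).
BOOK_MODEL = re.compile(r"title,(author,)?publisher")

def matches_book_model(book):
    """True if the child tags of <book> follow the content model in order."""
    child_tags = ",".join(child.tag for child in book)
    return BOOK_MODEL.fullmatch(child_tags) is not None

books = ET.fromstring(DATA).findall("book")
results = [matches_book_model(b) for b in books]
print(results)  # [True, False]
```

Under this strict reading the second book does not fit, because it lists publisher before title. Irregularities of this kind are what a non-rigid schema has to tolerate.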
7
Semi-structured Data Model
• Non-rigid schema
• Object Exchange Model (OEM)
• Data represented by a graph.
8
Example of XML data
<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
9
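[Figure: the XML data drawn as a graph. There are two
book nodes, (year="1995") and (year="1998"). The first
has the edges title → Database Systems, author → name →
Date, and publisher → Addison-Wesley. The second has
the edges title → MATLAB and publisher → The Math Works.]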
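The graph view of the XML data can be sketched in Python as a list of labeled edges. This is only an illustration of the idea, not an OEM implementation: the node ids (n0, n1, ...), the `@` prefix for attributes, and the `catalog` wrapper element are all assumptions made for this example.

```python
import itertools
import xml.etree.ElementTree as ET

DATA = """
<catalog>
  <book year="1995">
    <title>Database Systems</title>
    <author><name>Date</name></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998">
    <publisher>The Math Works</publisher>
    <title>MATLAB</title>
  </book>
</catalog>
"""

def to_edges(root):
    """Flatten an element tree into (source, label, target) edges.

    Inner nodes get generated ids; leaf edges point at the text value.
    Attributes of leaf elements are ignored in this sketch.
    """
    counter = itertools.count()
    edges = []

    def visit(elem, node_id):
        for key, value in elem.attrib.items():
            edges.append((node_id, "@" + key, value))
        for child in elem:
            if len(child) == 0:  # leaf: edge points at the text
                edges.append((node_id, child.tag, (child.text or "").strip()))
            else:
                child_id = f"n{next(counter)}"
                edges.append((node_id, child.tag, child_id))
                visit(child, child_id)

    visit(root, "root")
    return edges

edges = to_edges(ET.fromstring(DATA))
for edge in edges:
    print(edge)
```

Each book becomes a node with outgoing edges labeled by tag name, so the two books can have different edge sets without any schema change.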
10
Example of XML data
<book year="1995" id="o100">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998" related="o100">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
11
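[Figure: the same graph with one extra edge. There are
two book nodes, (year="1995") and (year="1998"), with
the same title, author, name and publisher edges as
before. A related edge leads from the second book
(MATLAB) to the first book (Database Systems).]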
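The related edge comes from matching the `related` attribute of one element against the `id` attribute of another. A minimal Python sketch of that lookup follows; the `catalog` wrapper element and the helper name `related_titles` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DATA = """
<catalog>
  <book year="1995" id="o100">
    <title>Database Systems</title>
    <author><name>Date</name></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998" related="o100">
    <publisher>The Math Works</publisher>
    <title>MATLAB</title>
  </book>
</catalog>
"""

root = ET.fromstring(DATA)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

def related_titles(root):
    """Return (title, title of the related book) pairs."""
    pairs = []
    for book in root.iter("book"):
        ref = book.get("related")
        if ref in by_id:  # follow the reference, turning the tree into a graph
            target = by_id[ref]
            pairs.append((book.findtext("title"), target.findtext("title")))
    return pairs

print(related_titles(root))  # [('MATLAB', 'Database Systems')]
```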
12
XML-QL
• Extracts data from large XML documents.
• Integrates XML data from multiple sources.
• Translates XML data between different
DTDs.
• Processes a request by
– sending queries to XML sources, or by
– transporting large amounts of XML data to
clients.
13
Example of XML-QL
WHERE
  <book>
    <publisher> Addison-Wesley </>
    <title> $t </title>
  </book> IN "www.a.b.c/books.xml"
CONSTRUCT <result><title> $t </></>
14
Example of XML data
<book year="1995" id="o100">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998" related="o100">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
15
Result of the query
<result>
<title> Database Systems </title>
</result>
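To make the WHERE / CONSTRUCT steps concrete, here is a rough Python emulation of the query using the standard library. It is not an XML-QL engine: the string `BOOKS` stands in for the file at www.a.b.c/books.xml, and the `catalog` wrapper and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of "www.a.b.c/books.xml" (assumed wrapper element).
BOOKS = """
<catalog>
  <book year="1995" id="o100">
    <title>Database Systems</title>
    <author><name>Date</name></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998" related="o100">
    <publisher>The Math Works</publisher>
    <title>MATLAB</title>
  </book>
</catalog>
"""

def titles_by_publisher(source, publisher):
    """Emulate: WHERE <book> with the given publisher, bind $t to the title;
    CONSTRUCT one <result><title>$t</title></result> per binding."""
    root = ET.fromstring(source)
    results = []
    for book in root.iter("book"):
        if (book.findtext("publisher") or "").strip() == publisher:
            for title in book.findall("title"):
                result = ET.Element("result")
                ET.SubElement(result, "title").text = (title.text or "").strip()
                results.append(result)
    return results

output = [
    ET.tostring(r, encoding="unicode")
    for r in titles_by_publisher(BOOKS, "Addison-Wesley")
]
print(output)  # ['<result><title>Database Systems</title></result>']
```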
16
WHERE
  <book>
    <publisher> Addison-Wesley </>
    <author> $a1 </>
  </> IN "www.a.b.c/books1.xml",
  <book>
    <publisher> <name> The Math Works </> </>
    <author> $a2 </>
  </> IN "www.d.e.f/books2.xml",
  $a1 = $a2
CONSTRUCT <author> $a1 </>
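This query joins two sources on the author. A hedged Python sketch of the same join is given below; the contents of both sources are invented for illustration (the slides do not show them), and authors are simplified to plain text values.

```python
import xml.etree.ElementTree as ET

# Invented stand-ins for books1.xml and books2.xml.
BOOKS1 = """
<catalog>
  <book><publisher>Addison-Wesley</publisher><author>Smith</author></book>
  <book><publisher>Addison-Wesley</publisher><author>Jones</author></book>
</catalog>
"""
BOOKS2 = """
<catalog>
  <book><publisher><name>The Math Works</name></publisher><author>Jones</author></book>
  <book><publisher><name>Other Press</name></publisher><author>Smith</author></book>
</catalog>
"""

def authors(source, publisher_path, publisher):
    """Collect the authors of all books whose publisher matches."""
    root = ET.fromstring(source)
    return {
        (author.text or "").strip()
        for book in root.iter("book")
        if (book.findtext(publisher_path) or "").strip() == publisher
        for author in book.findall("author")
    }

a1 = authors(BOOKS1, "publisher", "Addison-Wesley")       # bindings of $a1
a2 = authors(BOOKS2, "publisher/name", "The Math Works")  # bindings of $a2
common = sorted(a1 & a2)                                  # $a1 = $a2
print(common)  # ['Jones']
```

The two sources use different publisher structures (plain text versus a nested name element), which is why each pattern names its own path.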
17
Regular Path Expressions
• Permitted wherever XML permits an
element.
• Provide:
– alternation ( | )
– concatenation ( . )
– Kleene-star operators ( * )
18
Example of a regular path expression
WHERE
  <part+.(subpart | component.piece)>
    $r
  </> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r </>
19
Patterns matched by part+.(subpart | component.piece):
<part><subpart> $r </></>
<part><part><component><piece> $r </></></></>
<part><part><subpart> $r </></></>
...
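One way to see how such a path expression selects elements is to translate it into an ordinary regular expression over tag paths. The Python sketch below does this by hand for part+.(subpart | component.piece); the sample data is invented, and anchoring the path at the document root is a simplifying assumption.

```python
import re
import xml.etree.ElementTree as ET

# Invented parts data for illustration.
PARTS = """
<part>
  <subpart>s1</subpart>
  <part>
    <subpart>s2</subpart>
    <component><piece>p1</piece><bolt>b1</bolt></component>
  </part>
  <component><piece>p2</piece></component>
</part>
"""

# part+.(subpart | component.piece), with "/" as the path separator.
PATH = re.compile(r"(part/)+(subpart|component/piece)")

def match_paths(root):
    """Return the text of every element whose tag path matches PATH."""
    hits = []

    def walk(elem, path):
        here = path + [elem.tag]
        if PATH.fullmatch("/".join(here)):
            hits.append((elem.text or "").strip())
        for child in elem:
            walk(child, here)

    walk(root, [])
    return hits

hits = match_paths(ET.fromstring(PARTS))
print(hits)  # ['s1', 's2', 'p1', 'p2']
```

The bolt element is skipped because component.bolt is not one of the alternatives.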
20
XQL
• Is designed specifically for XML
documents.
• Provides a simple syntax (patterns modeled
after directory notation).
• Expressed in strings that can be embedded
in programs, scripts, and XML or HTML
attributes.
21
The Result of an XQL Query
• Depends on the implementation. One of the
following:
– An XML document.
– A tree that can be fed back into XQL.
– A different type of structure (e.g. a set of
pointers to nodes).
22
Search Context
• Is the set of nodes against which a query
operates.
• There are two contexts, the “root context” and
the “current context”:
– / uses the “root context”
– ./ uses the “current context” explicitly
23
Example of an XQL query
./book[@style = /bookstore/@specialty]
book[@style = /bookstore/@specialty]
Both forms find all books whose style attribute is
equal to the specialty attribute of the bookstore
element at the root of the XML document.
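The comparison between a value in the current context and a value reached from the root can be sketched in Python. The standard library's path support cannot compare two attributes directly, so the filter is written out by hand; the bookstore data is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented bookstore data.
STORE = """
<bookstore specialty="novel">
  <book style="novel"><title>A</title></book>
  <book style="textbook"><title>B</title></book>
  <book style="novel"><title>C</title></book>
</bookstore>
"""

root = ET.fromstring(STORE)

# /bookstore/@specialty: evaluated from the root context.
specialty = root.get("specialty")

# book[@style = ...]: children of the current context (here, the bookstore).
matching = [b for b in root.findall("book") if b.get("style") == specialty]
titles = [b.findtext("title") for b in matching]
print(titles)  # ['A', 'C']
```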
24
Additional examples
author[lastname = 'Bob']
Find all author elements whose lastname
subelement is Bob.
author[. = 'Bob']
Find all author elements whose value is Bob.
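Python's `xml.etree.ElementTree` happens to support two similar predicates in its limited path syntax (`[tag='text']`, and `[.='text']` from Python 3.7 on), which makes the difference easy to try out. This is an analogy rather than XQL itself, and the author data and the `n` attribute are invented.

```python
import xml.etree.ElementTree as ET

# Invented data; the n attribute only labels the elements for the output.
AUTHORS = (
    "<authors>"
    '<author n="1"><firstname>Jim</firstname><lastname>Bob</lastname></author>'
    '<author n="2">Bob</author>'
    '<author n="3"><firstname>Bob</firstname><lastname>Smith</lastname></author>'
    "</authors>"
)

root = ET.fromstring(AUTHORS)

# Authors with a lastname child whose text is Bob.
by_lastname = [a.get("n") for a in root.findall("author[lastname='Bob']")]

# Authors whose own complete text content is Bob.
by_value = [a.get("n") for a in root.findall("author[.='Bob']")]

print(by_lastname)  # ['1']
print(by_value)     # ['2']
```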
25
Regular path expressions in XQL
• bookstore//title
Find all title elements, one or more levels
deep in the bookstore.
• bookstore/*/title
Find all title elements that are grandchildren
of bookstore elements.
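The difference between the two operators can be tried with the comparable `.//` and `/*/` steps of Python's `xml.etree.ElementTree`, evaluated relative to the bookstore element. The data below is invented, and the standard library syntax is only an analogue of XQL.

```python
import xml.etree.ElementTree as ET

# Invented bookstore data with titles at three different depths.
STORE = """
<bookstore>
  <title>Store</title>
  <book><title>T1</title></book>
  <section><book><title>T2</title></book></section>
</bookstore>
"""

bookstore = ET.fromstring(STORE)

# bookstore//title: titles at any depth below the bookstore.
any_depth = [t.text for t in bookstore.findall(".//title")]

# bookstore/*/title: titles that are grandchildren of the bookstore.
grandchildren = [t.text for t in bookstore.findall("./*/title")]

print(any_depth)      # ['Store', 'T1', 'T2']
print(grandchildren)  # ['T1']
```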
26
Indices in XQL
<x>
  <y> Text1 </y>
  <y> Text2 </y>
</x>
<x>
  <y> Text3 </y>
  <y> Text4 </y>
</x>

Query        Result
x/y[0]       Text1, Text3
(x/y)[3]     Text4
x[1]/y[0]    Text3
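The three index queries differ in what the index is applied to: the children of each x, the combined list of all y elements, or the x elements themselves. A small Python sketch with zero-based list indexing reproduces the results; the `doc` wrapper element is an assumption. Plain list indexing is used instead of ElementTree's own position predicates because those are one-based.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<doc>"
    "<x><y>Text1</y><y>Text2</y></x>"
    "<x><y>Text3</y><y>Text4</y></x>"
    "</doc>"
)

root = ET.fromstring(DOC)
xs = root.findall("x")

# x/y[0]: the first y within each x.
first_y_of_each_x = [x.findall("y")[0].text for x in xs]

# (x/y)[3]: the fourth element of the combined list of all y elements.
all_y = [y for x in xs for y in x.findall("y")]
fourth_y_overall = all_y[3].text

# x[1]/y[0]: the first y of the second x.
first_y_of_second_x = xs[1].findall("y")[0].text

print(first_y_of_each_x)    # ['Text1', 'Text3']
print(fourth_y_overall)     # Text4
print(first_y_of_second_x)  # Text3
```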
27
XML-QL vs. XQL
• XQL may easily be embedded into
programs, scripts, XML and HTML tags.
• XQL assumes the user understands an XML
document as a graph.
• XML-QL supports the construction of new,
complex XML documents.
• XML-QL provides XML-like patterns.
28