Google Spelling Suggestion - OpenSiteSearch

Implementing Search Spelling Suggestions using
the Google Web Services API
Dave Costakos
Software Developer, Systems Engineering Division
May 2nd, 2002
Agenda
• Overview of the Google Web Service API
• Overview of SOAP and related
technologies
• Description of Google’s Web Service
API
• Description of how Google’s Spelling
Suggestion service was integrated.
• Explanation of how it all works
• Example integration walkthrough
Google’s Web Service API
• A BETA web program that enables developers to
access Google services via SOAP.
• Allows clients to connect to Google and use their
search, cached page and spelling suggestion
software inside the client application
• Available for non-commercial use from:
http://www.google.com/apis/
• It can be used commercially with written
permission from Google.
• Limited to 1,000 accesses per day. Google may
allow more accesses at a later date for a
commercial fee.
• Provides a client API in Java and .NET but any
language could be used to access the services.
Web Services Overview
• A Web Service is a piece of business
logic located somewhere on the
internet accessible through a
standardized XML messaging system.
• Because Web Services use XML, they
are not tied to any specific platform or
Operating System.
• Very similar to how the library
community standardized and shares
information via the Z39.50 protocol.
• The main differences between Z39.50
and Web Services are cross-industry
support and the use of XML.
SOAP Overview
• Simple Object Access Protocol (SOAP)
• Provides a standard packaging
structure for transporting XML
messages over many standard
protocols like HTTP, SMTP or FTP.
• This standard transport mechanism
allows heterogeneous clients and
servers to interoperate.
• The cornerstone protocol of Web
Services.
SOAP Overview
Example Envelope
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestion xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <key xsi:type="xsd:string">00000000000000000000000000000000</key>
      <phrase xsi:type="xsd:string">britney speers</phrase>
    </ns1:doSpellingSuggestion>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
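The envelope above can be assembled programmatically. The following is a minimal sketch, not the code used in the SiteSearch integration: the class name SpellingEnvelope and its methods are hypothetical, and it only builds the request text (it does not send it). It escapes the key and phrase so that user input containing characters such as & or < cannot break the XML.

```java
// Hypothetical helper: builds a doSpellingSuggestion request envelope as a string.
class SpellingEnvelope {

    /** Escapes the five XML special characters in user-supplied text. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns the envelope shown on the slide, with key and phrase filled in. */
    static String build(String key, String phrase) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
            + "    xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\"\n"
            + "    xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\">\n"
            + "  <SOAP-ENV:Body>\n"
            + "    <ns1:doSpellingSuggestion xmlns:ns1=\"urn:GoogleSearch\"\n"
            + "        SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
            + "      <key xsi:type=\"xsd:string\">" + escapeXml(key) + "</key>\n"
            + "      <phrase xsi:type=\"xsd:string\">" + escapeXml(phrase) + "</phrase>\n"
            + "    </ns1:doSpellingSuggestion>\n"
            + "  </SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>\n";
    }

    public static void main(String[] args) {
        // Placeholder key, as on the slide; a real key comes from Google.
        System.out.println(build("00000000000000000000000000000000", "britney speers"));
    }
}
```

In practice a SOAP toolkit generates this text for you; building it by hand is shown only to make the structure of the message concrete.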
Web Services Description Language
• WSDL (Web Services Description
Language) – describes the interface to a
web service in a standardized way.
• Tools are available that take a WSDL file
and turn it into a Web Service client or
take Web Service code and turn it into a
WSDL file.
• Google’s WSDL:
http://api.google.com/GoogleSearch.wsdl
Universal Description, Discovery,
Integration: UDDI
• UDDI (Universal Description,
Discovery and Integration) – provides
a registry for web services for
advertisement, discovery and
integration
• WSDL files can be published in UDDI
registries, making them available to
the general public for discovery: in
effect, a shopping mall for computers
• ebXML is on the horizon
Web Service Tools
• Most programming languages now provide
tools to make using and programming Web
Services simple and easy.
• Some SOAP tools:
• IBM SOAP4J / Web Services Toolkit
• Sun JWSDP (Java Web Services
Developer Pack)
• Apache SOAP / Axis
• SOAP::Lite (Perl)
• Microsoft SOAP Toolkit (.NET)
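Web Service Work Flow
[Diagram. Components: Web Service Client, Web Service Server, UDDI Registry. Flow labels: Publish WSDL (server to registry), Lookup Service (client to registry), SOAP Request and SOAP Response (between client and server).]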
Integrating the Google API with
SiteSearch
Requirements
• A license key obtained from
Google
• Conformance to the Google
license agreement
• Limited to 1,000 accesses per day
• The googleapi.jar file
• Java 2 (1.2.2 or better)
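Because of the 1,000-accesses-per-day limit, a client may want to count its own calls and stop before the limit is reached. The following is a minimal sketch of such a guard. The class DailyQuota is hypothetical and not part of googleapi.jar, it uses a newer Java than the 1.2.2 listed above, and it assumes the count resets at the start of each calendar day, which may not match how Google counts.

```java
import java.time.LocalDate;

// Hypothetical client-side guard for a per-day access limit.
class DailyQuota {
    private final int limit;
    private LocalDate day;   // the day the current count belongs to
    private int used;

    DailyQuota(int limit) {
        this.limit = limit;
    }

    /**
     * Counts one access for the given day and returns true, or returns
     * false without counting if the day's limit is already used up.
     * The count restarts whenever a new day is seen (an assumption).
     */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {
            day = today;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }

    /** Accesses still available for the day most recently seen. */
    synchronized int remaining() {
        return limit - used;
    }

    public static void main(String[] args) {
        DailyQuota quota = new DailyQuota(1000);
        if (quota.tryAcquire(LocalDate.now())) {
            System.out.println("OK to call the service; remaining today: " + quota.remaining());
        } else {
            System.out.println("Daily limit reached; skip the spelling suggestion.");
        }
    }
}
```

Passing the date in explicitly keeps the class easy to test; when the guard refuses, the search itself can still proceed without a suggestion.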
Integrating the Google API with
SiteSearch
Files Touched
• Java Files:
GoogleSpellingSuggestion.java,
QUERY.java, ZServer.java
• ini/servers/ZBase.ini (ZBase_rb.ini if
you have record builder)
• HTML: resultsnav.html, nfsort.html,
nfbrief.html, nffull.html, nfrefine.html
• Scripts: ssmgr.HOSTNAME file (added
googleapi.jar file to CLASSPATH)
Integrating the Google API with
SiteSearch
How it Works
• GoogleSpellingSuggestion handles the
details of obtaining the spelling
suggestions from Google and caching
results (we have a limited number of
accesses available)
• QUERY takes the client query, extracts the
entered term, obtains the suggestion and
loads it into the user data for display
• ZServer: Loads the key into memory from
the configuration file
• HTML: displays search suggestion stored
in user data
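The caching step described above can be sketched as follows. This is an illustration under stated assumptions, not the actual GoogleSpellingSuggestion source: the class CachingSuggester, the user-data key "spellingSuggestion", and the pluggable lookup function are all hypothetical, and the code uses a newer Java than the 1.2.2 required by the integration. The lookup function stands in for the remote SOAP call, so that repeated terms do not consume the limited daily accesses.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a bounded cache in front of a remote suggestion lookup.
class CachingSuggester {
    private final Function<String, String> source; // stands in for the SOAP call
    private final Map<String, String> cache;

    CachingSuggester(Function<String, String> source, final int maxEntries) {
        this.source = source;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the suggestion for a term, calling the source only on a cache miss. */
    synchronized String suggest(String term) {
        String key = term.trim().toLowerCase(Locale.ROOT);
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        String suggestion = source.apply(key); // may be null: no suggestion
        cache.put(key, suggestion);            // "no suggestion" is cached too
        return suggestion;
    }

    public static void main(String[] args) {
        // Fake source for demonstration; the real one would call the web service.
        CachingSuggester suggester =
            new CachingSuggester(t -> t.equals("britney speers") ? "britney spears" : null, 500);

        // Mimics the QUERY step: obtain the suggestion and store it for display.
        Map<String, String> userData = new HashMap<>();
        String suggestion = suggester.suggest("Britney Speers");
        if (suggestion != null) {
            userData.put("spellingSuggestion", suggestion);
        }
        System.out.println(userData);
    }
}
```

Normalizing the term before the lookup means that differences in case or surrounding whitespace share one cache entry and cost one access.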
Integrating the Google API with
SiteSearch
[Diagram, built up across seven slides. Components, in the order they are added: User, WebZ, JaSSI, ZBase, Google. Flow labels, in the order they are added: Search, Z39.50 Request, Search, SOAP Request, SOAP Response, Spelling Suggestion, HTML.]
Example Screen Implementation
Where do I get this Enhancement?
The SiteSearch Open Source Server
http://www.sitesearch.oclc.org/projects/spelling/
Possible Improvements?
• Caching: The
GoogleSpellingSuggestion object
maintains an internal cache of
suggestions. However, the cache was
never performance tested and it is
unclear how much good it will do in a
production environment.
• API Usage: The
GoogleSpellingSuggestion object uses
the standard Google classes provided
to make SOAP calls. Users may be
able to improve upon these APIs in
terms of performance.
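One way to start evaluating the cache is to replay a log of past query terms through a cache of a given size and measure the hit rate. The following is a minimal sketch: the class CacheHitRate is hypothetical, it assumes a least-recently-used eviction policy (the policy of the actual cache is not stated here), and the query list in main is made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: estimate how often a bounded LRU cache would be hit.
class CacheHitRate {

    /** Replays the queries through an LRU cache and returns hits / total. */
    static double hitRate(List<String> queries, final int capacity) {
        Map<String, Boolean> lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
        int hits = 0;
        for (String q : queries) {
            if (lru.get(q) != null) {   // get() also marks the entry as recently used
                hits++;
            } else {
                lru.put(q, Boolean.TRUE);
            }
        }
        return queries.isEmpty() ? 0.0 : (double) hits / queries.size();
    }

    public static void main(String[] args) {
        // Made-up sample; a real test would replay terms from server logs.
        List<String> sample = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println("capacity 2:  " + hitRate(sample, 2));
        System.out.println("capacity 10: " + hitRate(sample, 10));
    }
}
```

Every miss corresponds to one access against the daily limit, so the hit rate at a given capacity gives a rough idea of how many accesses the cache would save.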
Where to Get Software
Apache SOAP: http://xml.apache.org/soap/
Apache Axis: http://xml.apache.org/axis/
JWSDP (Java Web Services Developer Pack):
http://java.sun.com/webservices/webservicespack.html
SOAP::Lite for Perl: http://www.soaplite.com/
IBM WSTK (Web Services Toolkit):
http://www.alphaworks.ibm.com/tech/webservicestoolkit
PHP Soap Toolkit: http://sourceforge.net/projects/phpxmlp/
MS SOAP: Buy it from Microsoft
Questions?