A Web Services Search Engine
CS 8803 [AIA] - Spring 2008
Roland Krystian Alberciak
Piotr Kozikowski
Sudnya Padalikar
Tushar Sugandhi
Outline
• Project Overview
• Searching Web-services
o Tools / APIs
o How to figure out what information to show
• Results: Working prototype
o Locate, classify, rank, and present web-services
• System Integration
o Diversity!
- Languages (no joke): Python, Ruby on Rails, PHP, C#, Java, Perl.
- Databases: MySQL, MSSQL
Project Overview
Step 1 - There are web-services available on the web
Step 2 - (Challenges)
Web services are harder to find than web pages because:
- Registering a service takes effort
- Directories are disconnected
- No clustering available
- No ranking available
Step 3 - Profit
- Should be beneficial for web developers
- Should be beneficial for us
What is out there?
• Swoogle - “10,000 ontologies” (more concerned with the
“semantic web” and “metadata” than with web services)
• ProgrammableWeb - 726 (only APIs)
• "Yellow pages" - 5000 web-services
• XMethods - 500 web-services
• UDDI - Discontinued, but many web services used it to
advertise themselves.
Survey of the Market
We found solutions for Step 2!
Step 1. Have web-services available on the web
Step 2. (Solutions)
Crawler, database, web application and a
bunch of clustering algorithms and lots of "glue"
Step 3. Our proposed solution - Web Slogger!
- for us: content based advertising
- for users: easy way to search for web-services
System Architecture
Crawling
Yahoo!
Why not Google?
Restricted extraction: Could not extract many results
What about Alexa?
Couldn't afford it! :-)
What did we crawl for?
.wsdl and .asmx files
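A minimal sketch of the extension filter described above, assuming the search engine returns plain URL strings. The function names are hypothetical and are not taken from the project's code; only the two extensions come from the slide.

```python
from urllib.parse import urlparse

# The two file types the slide says were crawled for.
TARGET_EXTENSIONS = (".wsdl", ".asmx")

def is_service_candidate(url: str) -> bool:
    """Return True if the URL path ends in .wsdl or .asmx (case-insensitive)."""
    path = urlparse(url).path.lower()
    return path.endswith(TARGET_EXTENSIONS)

def filter_candidates(urls):
    """Keep candidate URLs, dropping exact duplicates while preserving order."""
    seen = set()
    kept = []
    for url in urls:
        if is_service_candidate(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Because the check looks only at the URL path, a query string such as `?WSDL` after an `.asmx` file does not stop the match.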
How is Webslogger different from the Yellow
Pages project (last year's class project)?
• Multiple Language support
Categorization and Clustering
Glossaries
• Hierarchical Categorization (27 Categories)
• List of keywords for each category (2800 keywords)
Web Service Partitioning By Importance
Some sections of a web service are more important than others,
e.g. Service Name / Operation Name is more important than
message type name.
Affinity Vector
• Weight assigned to each term in the web service based on its
mapping to the glossary
• Determines which web service belongs to which category
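The glossary, section weighting, and affinity vector could fit together roughly as follows. This is only a sketch: the two-category glossary, the numeric section weights, and all function names are illustrative assumptions, not the project's actual 27 categories, 2800 keywords, or weighting scheme.

```python
import re
from collections import defaultdict

# Hypothetical miniature glossary: category -> keywords.
GLOSSARY = {
    "finance": {"stock", "quote", "currency"},
    "weather": {"forecast", "temperature"},
}

# Hypothetical weights: service and operation names count more than message names.
SECTION_WEIGHTS = {"service": 3.0, "operation": 2.0, "message": 1.0}

def tokenize(name):
    """Split an identifier such as 'StockQuote' into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]

def affinity_vector(sections):
    """Map each category to the summed weight of glossary terms found.

    `sections` maps a section label to the list of names found in it.
    """
    scores = defaultdict(float)
    for section, names in sections.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for name in names:
            for term in tokenize(name):
                for category, keywords in GLOSSARY.items():
                    if term in keywords:
                        scores[category] += weight
    return dict(scores)

def best_category(sections):
    """Pick the category with the highest affinity, or None if nothing matched."""
    vector = affinity_vector(sections)
    return max(vector, key=vector.get) if vector else None

example = {
    "service": ["StockQuote"],
    "operation": ["GetForecast"],
    "message": ["TemperatureReading"],
}
print(affinity_vector(example))  # {'finance': 6.0, 'weather': 3.0}
print(best_category(example))    # finance
```

Terms are matched exactly here, so "quotes" would not match the keyword "quote"; a real system would likely need stemming or a larger glossary.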
Ranking Insight
Fundamental Difference: Web page ranking is based on
inlinks and outlinks. Web service ranking should be based on
objects and web methods.
Recall: Our results are extracts from search engines.
Therefore:
• We don't know how many pages link to a particular WSDL file.
• Search engine algorithms (e.g., PageRank) have this data and
can assert the 'popularity' and 'credibility' of hubs that
point to sources.
Resolution: We must find alternate ways to rank
content
Ranking Options
1. Community Level: Collaborative Ranking:
• users can leave comments,
• Likert scale ranking
• rank good users / bad users in the community: experts
2. User Level: Usage statistic ranking:
• how long you view a wsdl
• do you go back to look at it again [since it is like an API...]
• inquire about what wsdl files they used to achieve a goal
Ranking Options (contd.)
3. Use Page Ranking provided by Google / Yahoo
4. File Level: Quality of file:
• "Do You Care if Your WSDL is W3C Compliant?"
o Good format, thoroughness. Heuristics on model files.
5. Generate referral chain from WSDL
o Understand citation network in order to determine
valuable web services
o Web services often use methods / objects from other web
services. Use this linking to rank web services.
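One simple way to turn option 5 into a score is to count how many other services reference each service. The slides do not specify an algorithm, so the in-degree count below is a stand-in, and the input format (a dict of service ID to the IDs it uses) is an assumption.

```python
from collections import Counter

def rank_by_references(references):
    """Rank services by how many other services reference them.

    `references` maps a service ID to the collection of service IDs
    whose methods or objects it uses (hypothetical input format).
    Returns (service, inbound_count) pairs, most-referenced first.
    """
    inbound = Counter()
    for source, targets in references.items():
        inbound[source] += 0  # make sure every known service appears
        for target in set(targets):
            if target != source:  # ignore self-references
                inbound[target] += 1
    # Sort by descending count, then by ID for a stable order.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))

citations = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(rank_by_references(citations))  # [('C', 2), ('B', 1), ('A', 0)]
```

A PageRank-style iteration over the same graph would weight references from highly ranked services more heavily; the plain count is just the simplest starting point.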
<?xml version="1.0"?>
<definitions name="StockQuote"
targetNamespace="http://example.com/stockquote.wsdl"
xmlns:tns="http://example.com/stockquote.wsdl"
xmlns:xsd1="http://example.com/stockquote.xsd"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns="http://schemas.xmlsoap.org/wsdl/">
<message name="SubscribeToQuotes">
...
element="xsd1:SubscriptionHeader"/>
</message>
<portType name="StockQuotePortType">
<operation name="SubscribeToQuotes">
...
</operation>
</portType>
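The names that feed classification (definitions, port type, operation, and message names) can be pulled out of a WSDL file with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the sample document is a shortened, well-formed stand-in modeled on the excerpt above, not the original file, and `extract_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Shortened, well-formed stand-in for the excerpt (elided parts left out).
SAMPLE_WSDL = """<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="SubscribeToQuotes"/>
  <portType name="StockQuotePortType">
    <operation name="SubscribeToQuotes"/>
  </portType>
</definitions>"""

def qualified(tag):
    """Build the namespace-qualified tag name ElementTree expects."""
    return f"{{{WSDL_NS}}}{tag}"

def extract_names(wsdl_text):
    """Collect the names that matter for categorization and ranking."""
    root = ET.fromstring(wsdl_text)
    port_types = list(root.iter(qualified("portType")))
    return {
        "definitions": root.get("name"),
        "port_types": [pt.get("name") for pt in port_types],
        "operations": [
            op.get("name")
            for pt in port_types
            for op in pt.iter(qualified("operation"))
        ],
        "messages": [m.get("name") for m in root.findall(qualified("message"))],
    }

print(extract_names(SAMPLE_WSDL))
```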
www.wbslogger.com
Future work
• Develop our own crawler
• Further improve clustering
(there is always room for that!)
• Figure out an innovative (&& effective) way for
ranking
• Location based clustering
Questions?