Matthew Helmbrecht

http://mjhelmb.appspot.com
Crawler
ThreadPool
Search
Servlet
Map parsed content to Hashmaps
Aggregated reduce over all threads
Per thread content parser
Database
Create Flatfiles for use by GAE
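
The "map parsed content to Hashmaps" and "aggregated reduce over all threads" steps could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (CrawlerMapReduceSketch, parseContent, aggregate, countAll) are hypothetical, and word counting stands in for whatever the per-thread content parser actually extracts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrawlerMapReduceSketch {

    // Map step (assumed behavior): each task parses one page's text
    // into its own HashMap of word -> occurrence count.
    static Map<String, Integer> parseContent(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce step: fold every per-task map into one aggregate map.
    static Map<String, Integer> aggregate(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // Runs the map step on a fixed-size thread pool, then reduces on the caller's thread.
    static Map<String, Integer> countAll(List<String> pages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String page : pages) {
                futures.add(pool.submit(() -> parseContent(page)));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get()); // blocks until that task has finished
            }
            return aggregate(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = Arrays.asList(
                "the crawler parses pages",
                "the servlet answers the query");
        System.out.println(countAll(pages, 2).get("the")); // prints 3
    }
}
```

Keeping one map per task means the parsing threads never share mutable state; only the final merge touches the aggregate map.
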
Web Crawler
Response to search query
WelcomeJSP: Enter link, search term (if needed), depth, and Single/Multithreaded run.
Servlet Response: Give information on crawler runtime, db / file time.
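
One way the servlet response could collect the crawler runtime and the db / file time is to time each phase separately. The sketch below is an assumption, not the original implementation: RunTimingSketch, timeMillis, report, and the two map keys are hypothetical names, and the two Runnable arguments stand in for the real crawl and storage phases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RunTimingSketch {

    // Measures the wall-clock duration of one phase in milliseconds.
    static long timeMillis(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Times the crawl phase and the storage (db / flat file) phase one after the other.
    static Map<String, Long> report(Runnable crawl, Runnable store) {
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("crawlerMillis", timeMillis(crawl));
        timings.put("storageMillis", timeMillis(store));
        return timings;
    }

    public static void main(String[] args) {
        // Placeholder phases; a real run would crawl pages and write the database or flat files.
        Map<String, Long> timings = report(() -> { }, () -> { });
        System.out.println(timings.keySet()); // prints [crawlerMillis, storageMillis]
    }
}
```

Running the same report once for a single-threaded crawl and once for a multithreaded crawl would give the two figures to compare.
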
Word Count from localhost
Servlet
Map parsed content to Hashmaps
Aggregated reduce over all threads
Response to search query
WelcomeJSP: Enter Search Term
GetWCCount Class: computes the word count
Flat Files: generated by WebCrawler
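
A word-count lookup over the crawler's flat files might look roughly like the sketch below. The source does not describe the flat-file layout, so the one "word<TAB>count" record per line used here is purely an assumption, and FlatFileWordCountSketch and countFor are hypothetical names rather than the actual GetWCCount class.

```java
import java.util.Arrays;
import java.util.List;

public class FlatFileWordCountSketch {

    // Sums the counts recorded for one search term.
    // Assumed layout (not from the source): each line is "word<TAB>count".
    static int countFor(String term, List<String> lines) {
        String wanted = term.trim().toLowerCase();
        int total = 0;
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length == 2 && fields[0].trim().toLowerCase().equals(wanted)) {
                total += Integer.parseInt(fields[1].trim());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // In a real run the lines could come from java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList("crawler\t4", "servlet\t2", "crawler\t1");
        System.out.println(countFor("Crawler", lines)); // prints 5
    }
}
```

Lines that do not match the assumed two-field layout are skipped rather than treated as errors.
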