Screen Scraping
MIS 424
Professor Sandvig
Today
What is Screen Scraping
When to use it
How
Legal Issues
What is Screen Scraping
Programmatically “scraping” information from a web page
Two steps:
1. Retrieve the page
2. Scrape the desired information (regular expressions)
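The two steps above can be sketched in Python as follows. This is a minimal illustration, not the handout's ScreenScrape.aspx code: the function names `retrieve_page` and `scrape_links` and the link pattern are assumptions made for this example, and the regular expression only handles simple, double-quoted `<a href="...">` tags.

```python
import re
import urllib.request


def retrieve_page(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical pattern: captures the href value and the link text of
# simple <a> tags whose href is wrapped in double quotes.
LINK_PATTERN = re.compile(
    r'<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def scrape_links(html: str) -> list[tuple[str, str]]:
    """Step 2: extract (href, text) pairs with a regular expression."""
    return [(href, text.strip()) for href, text in LINK_PATTERN.findall(html)]


# Usage on an in-memory sample, so no network access is needed.
sample_html = '<p><a href="/a">A</a> and <a class="x" href="/b"> B </a></p>'
for href, text in scrape_links(sample_html):
    print(href, text)
```

For a live page, the output of `retrieve_page(url)` would be passed to `scrape_links`. Regular expressions are brittle against markup changes, so the pattern usually needs adjusting whenever the target page is redesigned.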
When to Use
Data not available via more direct methods:
APIs
web services
RESTful
RSS
database
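For contrast, here is a rough sketch of what a "more direct method" looks like when one exists: reading an RSS feed with Python's standard XML parser instead of scraping HTML. The function name `parse_rss_items` and the sample feed are made up for this illustration, and the sketch assumes a plain RSS 2.0 feed with `<item>`, `<title>`, and `<link>` elements.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


# Hypothetical feed used only to demonstrate the call.
sample_feed = (
    '<rss version="2.0"><channel><title>News</title>'
    "<item><title>First story</title><link>http://example.com/1</link></item>"
    "</channel></rss>"
)
print(parse_rss_items(sample_feed))
```

Because the feed is structured data, no pattern matching against page layout is needed. Scraping is the fallback when no such feed or API is offered.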
When to Use
Examples
Search engines: Google, Bing, Yahoo, …
News sites: Google news, Yahoo news, …
PadMapper, MapCraigs: scrape Craigslist
Interface with legacy systems: no support for web services, RSS, etc.
How
Handout:
ScreenScrape.aspx (source)
Scrape WWU Departmental Directory
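The handout itself is an ASP.NET page and is not reproduced here. As a rough Python sketch of the same idea, the code below scrapes rows out of a directory-style HTML table. The table layout (three `<td>` cells per row: department, office, phone), the field names, and the sample data are all assumptions for illustration. The real WWU directory markup may differ.

```python
import html as html_lib
import re

# Hypothetical row layout: <tr><td>department</td><td>office</td><td>phone</td></tr>
# Header rows built from <th> cells do not match and are skipped.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<department>.*?)</td>\s*"
    r"<td>(?P<office>.*?)</td>\s*"
    r"<td>(?P<phone>.*?)</td>\s*</tr>",
    re.IGNORECASE | re.DOTALL,
)


def scrape_directory(page_html: str) -> list[dict[str, str]]:
    """Return one dict per table row, with HTML entities decoded."""
    return [
        {
            field: html_lib.unescape(value.strip())
            for field, value in match.groupdict().items()
        }
        for match in ROW_PATTERN.finditer(page_html)
    ]


# Placeholder markup standing in for a retrieved directory page.
sample_page = """<table>
<tr><th>Department</th><th>Office</th><th>Phone</th></tr>
<tr><td>Dept A</td><td>Room 1</td><td>555-0100</td></tr>
</table>"""
for row in scrape_directory(sample_page):
    print(row["department"], row["office"], row["phone"])
```

Named groups keep the pattern readable and let each scraped row be handled as a small record rather than as positional tuples.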
Legal Issues
Potential to violate copyright laws
History of lawsuits
Meta shopping sites
Google
Legal Issues
MapCraigs.com
Scraped Craigslist real estate
Displayed on Google maps
Blocked IP
PadMapper vs. Craigslist lawsuit
Use cautiously