Screen Scraping

MIS 424
Professor Sandvig

Today
 What is Screen Scraping
 When to use it
 How
 Legal Issues

What is Screen Scraping

Programmatically “scraping” information from a web page

Two steps:
1. Retrieve Page
2. Scrape desired information
 Regular Expressions
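
The two steps above can be sketched in a few lines. This is only an illustration, not the course handout: it assumes Python's standard library, and the function names `retrieve_page` and `scrape_title`, the User-Agent string, and the choice of the page title as the "desired information" are all made up for the example.

```python
import re
from urllib.request import Request, urlopen


def retrieve_page(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the page and return its HTML as text."""
    # The User-Agent value is an arbitrary placeholder; some servers
    # reject requests that do not send one.
    request = Request(url, headers={"User-Agent": "scrape-demo"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Matches the text between <title> and </title>, ignoring case and
# allowing the title to span several lines.
TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def scrape_title(html: str):
    """Step 2: pull the desired information out with a regular expression."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None
```

Calling `scrape_title(retrieve_page(url))` chains the two steps. Keeping them separate means the scraping step can be tried on a saved copy of the page without hitting the site again.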

When to Use

Data not available via more direct methods:
 APIs
 web services
 RESTful
 RSS
 database
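
For contrast, here is roughly what a "more direct method" looks like when a site does offer RSS. The sketch assumes an RSS 2.0 feed and Python's standard `xml.etree.ElementTree`; the feed text and the function name `read_feed_items` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def read_feed_items(feed_xml: str):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # The feed is structured data, so fields are read by name
        # rather than matched with a pattern.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title.strip(), link.strip()))
    return items
```

Because the fields are addressed by name, this kind of code tends to survive a page redesign, which is one reason to scrape only when no such source exists.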

When to Use

Examples
Search engines
 Google, Bing, Yahoo, …
News sites
 Google News, Yahoo News, …
PadMapper, MapCraigs
 Scrape Craigslist
Interface with Legacy Systems
 No support for web services, RSS, etc.

How
Handout:
 ScreenScrape.aspx (source)
 Scrape WWU Departmental Directory
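
The handout itself is not reproduced in this transcript. As a rough idea of what scraping a directory page involves, the sketch below runs a regular expression over a small block of hypothetical directory markup. The HTML, the names and addresses in it, and the function name `scrape_directory` are all assumptions; the real WWU page will have a different structure and need a different pattern.

```python
import re

# Hypothetical directory markup. This is not the real WWU page.
SAMPLE_DIRECTORY = """
<table>
  <tr><td>Ada Example</td><td><a href="mailto:ada@example.edu">ada@example.edu</a></td></tr>
  <tr><td>Bob Sample</td><td><a href="mailto:bob@example.edu">bob@example.edu</a></td></tr>
</table>
"""

# One match per table row: the first cell holds the name, the second
# cell holds a mailto: link with the e-mail address.
ROW_PATTERN = re.compile(
    r'<tr>\s*<td>(?P<name>[^<]+)</td>\s*<td>\s*<a href="mailto:(?P<email>[^"]+)">',
    re.IGNORECASE,
)


def scrape_directory(html: str):
    """Return (name, email) pairs found in the directory table."""
    return [
        (match.group("name").strip(), match.group("email"))
        for match in ROW_PATTERN.finditer(html)
    ]
```

A pattern like this is tied to the exact markup it was written against, so it has to be rechecked whenever the page layout changes.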

Legal Issues
Potential to violate copyright laws
 History of lawsuits
Meta shopping sites
 Google

Legal Issues

MapCraigs.com
 Scraped Craigslist real estate
 Displayed on Google Maps
 Blocked IP

PadMapper vs. Craigslist lawsuit
 Use cautiously
