Master Thesis in Web Archiving


iCrawl – Master Thesis and Hiwi Jobs
Context
- iCrawl Project – A novel approach for the creation of high-quality
Web Archives
- Easy to use and extensible Web
archive crawler framework
- Usable also by non-technicians
User Interface
- Key Component to interact with
the crawler
- Setting up crawls
- Maintaining and monitoring crawls
- Quality assurance of crawls
Thomas Risse
21/07/15
Master Thesis: Crawl Specification Wizard
Problem Statement
- Quality of a Web Archive depends on the quality of the Crawl specification
- Crawl specifications for focused crawls are complex and hard to define (initial starting
points, good descriptions of terms, entities, etc.)
- Crawl specifications are similar to search engine queries but more complex
Aim of the Master Thesis
- Development of a semi-automatic tool that learns the intention of a crawl
- Based on a set of reference pages or on search engine results
- Iterative and interactive process
- Requires analysis and extraction of information from Web pages
Requirements
- Interest in doing cool things in the context of a research project
- A “feeling” for good design and user friendliness
- Programming skills in Java
Contact: Thomas Risse (L3S), [email protected]
Master Thesis: Entity-centric Linked Data Crawler
Topic
- Development of an entity-centric Linked Data crawler
- Automatic collection of metadata for Linked Data sources to
enable crawler prioritization
- Integration of the crawler with the iCrawl platform for
integrated crawling of Web pages and Linked Data
Requirements
- Good grades in the IR-related courses
- Good programming skills in Java
- Interest in research-oriented projects
Contact: Elena Demidova, [email protected]
Hiwi Job in the context of Web Archiving
Topic
- User Interface development for setting up, maintaining, and
monitoring crawls
- Easy to use (also for non-computer scientists)
- Near-real-time information
Requirements
- Interest in doing cool things in the context
of a research project
- A “feeling” for good design and
user friendliness
- Programming skills in Java
Contact: Thomas Risse (L3S), [email protected]