DEiXTo
 Powerful web data extraction tool
 Freeware GUI tool (built with Turbo Delphi, Windows-only)
 Free, cross-platform Command Line Executor (in Perl)
 DEiXToBot agent (implemented in Perl)
 Based on the W3C Document Object Model (DOM)
 DOM-based extraction rules (wrappers).
 Extracted data can be exported to a wide variety of formats (tab-delimited, XML, RSS, etc.).
 Command Line Executor:
 has database support via DBI, the Database independent interface for Perl
 supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
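
To make the idea of a DOM-based extraction rule concrete, here is a minimal sketch in Python. It is not DEiXTo's algorithm or its wrapper format: the rule tuple layout, the `match` and `extract` helpers and the sample page are all assumptions made for illustration. A rule is a small tree of tags, and every subtree of the page that has the same shape yields one record.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule layout: (tag, field name or None, list of child rules).
RULE = ("li", None, [("a", "title", []), ("span", "price", [])])

def match(node, rule):
    """Return the fields captured if `node` has the shape of `rule`, else None."""
    tag, field, child_rules = rule
    if node.tag != tag:
        return None
    record = {}
    if field is not None:
        record[field] = (node.text or "").strip()
    children = list(node)
    pos = 0
    for child_rule in child_rules:
        # Each child rule must match some remaining child, in document order.
        found = None
        while pos < len(children) and found is None:
            found = match(children[pos], child_rule)
            pos += 1
        if found is None:
            return None
        record.update(found)
    return record

def extract(root, rule):
    """Try the rule on every element whose tag equals the rule's root tag."""
    records = []
    for node in root.iter(rule[0]):
        record = match(node, rule)
        if record is not None:
            records.append(record)
    return records

# Sample page (an assumption); it is well-formed so ElementTree can parse it.
PAGE = """<html><body><ul>
<li><a href="/p/1">Red mug</a> <span>4.50</span></li>
<li><a href="/p/2">Blue mug</a> <span>5.00</span></li>
<li><em>no link here</em></li>
</ul></body></html>"""

records = extract(ET.fromstring(PAGE), RULE)
```

The third list item has no link or price, so it does not match the rule and produces no record. Real pages are rarely well-formed XML, so a practical tool would need a tolerant HTML parser in place of `ET.fromstring`.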
GUI DEiXTo
 user-friendly graphical interface
 enhanced, tree-based extraction rules
 HTML tag filtering
 fast, flexible, high-performance tree pattern matching algorithm
 regular expression support
 can follow "Next Page" links and submit simple forms
 can export results to XML and tab-delimited formats and create RSS feeds
 XML-encoded wrapper project files (.wpf) that can be executed at will
 last but not least, it's freeware!
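
As an illustration of exporting the same records to tab-delimited text and to XML, here is a short Python sketch. The function names, the `records`/`record` element names and the sample data are assumptions; the actual layout of DEiXTo's output files is not described in these slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

def to_tab_delimited(records, fields):
    """Render records as tab-separated lines with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for record in records:
        writer.writerow([record.get(name, "") for name in fields])
    return buffer.getvalue()

def to_xml(records, fields):
    """Render records as one <record> element per row (hypothetical schema)."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for name in fields:
            ET.SubElement(item, name).text = record.get(name, "")
    return ET.tostring(root, encoding="unicode")

# Sample data (an assumption, standing in for extracted results).
sample = [{"title": "Red mug", "price": "4.50"}]
tab_text = to_tab_delimited(sample, ["title", "price"])
xml_text = to_xml(sample, ["title", "price"])
```

Keeping the records as plain dictionaries until the last step means that adding another output format only requires one more rendering function.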
DEiXTo Command Line Executor (CLE)
 portable, efficient and fast command line executor of wrappers generated with GUI DEiXTo
 provides options and flexibility that you cannot get with GUI DEiXTo
 supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet
 provides database support via DBI (the Database independent interface for Perl)
 supports HTML output using an HTML template processor and an editable template file
 overwrite, append and prepend output modes for all supported formats
 can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)
 it is free and open source, distributed under the GNU General Public License (GPL) version 3!
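
The three output modes can be pictured with a small Python sketch for a plain text file. The mode names follow the slide, but `write_output` and its parameters are hypothetical and are not CLE options.

```python
import os
import tempfile

MODES = ("overwrite", "append", "prepend")

def write_output(path, text, mode="overwrite"):
    """Write `text` to `path`; `mode` decides what happens to existing content."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    old = ""
    if mode != "overwrite" and os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            old = handle.read()
    # Append puts new text after the old content; prepend puts it before.
    new = old + text if mode == "append" else text + old
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new)

# Three simulated runs against the same file, one per mode.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "results.txt")
    write_output(path, "b\n")
    write_output(path, "c\n", mode="append")
    write_output(path, "a\n", mode="prepend")
    with open(path, encoding="utf-8") as handle:
        final_text = handle.read()
```

After the three calls the file holds the lines a, b, c in that order. Structured formats such as XML or spreadsheets cannot be merged by simple concatenation, so appending or prepending there would need format-aware handling.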
DEiXToBot
 A Mechanize agent (essentially a browser emulator) capable of
extracting data of interest.
 Flexible and efficient.
 Allows extensive customization.
 Supports multiple patterns on a single page and lets you combine
their results.
 Allows post-processing of the extracted data and enables you to
transform it to any format you wish.
 Requires programming skills to use, though.
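
The following Python sketch shows what combining the results of two patterns and post-processing them might look like. It does not use DEiXToBot or its API: the two result lists, `combine` and `clean` are assumptions standing in for whatever two patterns would return for the same page.

```python
import re

# Hypothetical results of two separate patterns run on one page.
titles = [{"title": " Red mug "}, {"title": "Blue mug"}]
prices = [{"price": "EUR 4,50"}, {"price": "EUR 5,00"}]

def combine(*result_sets):
    """Merge the n-th record of every result set into a single record."""
    merged = []
    for rows in zip(*result_sets):
        record = {}
        for row in rows:
            record.update(row)
        merged.append(record)
    return merged

def clean(record):
    """Post-process one record: trim the title, parse the price as a float."""
    number = re.search(r"(\d+),(\d+)", record["price"])
    return {
        "title": record["title"].strip(),
        "price": float(f"{number.group(1)}.{number.group(2)}"),
    }

products = [clean(record) for record in combine(titles, prices)]
```

Pairing by position only works when both patterns return the same number of matches in the same order; `zip` silently stops at the shorter list, so a real script should check the lengths first.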
Corgialenios Library use case
From unstructured HTML data
To ESE format!
DEiXTo Services
 We can definitely help you to:
 transform the contents of your digital library into
OAI-PMH or another suitable format
 quickly populate product catalogues with full
specifications
 search various web resources in real time and extract
the results returned
 prepare large, focused datasets for scientific tasks
(e.g. data mining)
 monitor prices of the competition
 <your extraction task goes here!>
Happy DEiXTo users!
For further information, please visit http://deixto.com