DEiXTo
Powerful web data extraction tool
Freeware GUI tool (built with Turbo Delphi, Windows-only)
Free, cross-platform Command Line Executor (in Perl)
DEiXToBot agent (implemented in Perl)
W3C Document Object Model (DOM)
DOM-based extraction rules (wrappers).
Extracted data can be exported to a wide variety of formats (tab-delimited, XML, RSS, etc.).
Command Line Executor:
has database support via DBI (the Database independent interface for Perl)
supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
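To make the export step concrete, here is a minimal sketch in Python of serializing extracted records as tab-delimited text and as XML. It is an illustration only, not DEiXTo's own code: the record fields (`title`, `url`), the element names (`records`, `record`) and both helper functions are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracted records; the field names are illustrative only.
records = [
    {"title": "First item", "url": "http://example.com/1"},
    {"title": "Second item", "url": "http://example.com/2"},
]


def to_tab_delimited(rows, fields):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows, fields):
    """Serialize rows as a flat XML document, one <record> per row."""
    root = ET.Element("records")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for field in fields:
            ET.SubElement(rec, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_tab_delimited(records, ["title", "url"]))
    print(to_xml(records, ["title", "url"]))
```

Keeping the records as plain dictionaries means each additional output format only needs one more small serializer.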
GUI DEiXTo
user friendly graphical interface
enhanced, tree-based extraction rules
HTML tag filtering
fast, flexible and high-performance tree pattern matching algorithm
regular expression support
can follow "Next Page" links and submit simple forms
can export results to XML and tab-delimited formats and create RSS feeds
XML-encoded wrapper project files (.wpf) that can be executed at will
last but not least, it's freeware!
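The idea of a tree-based extraction rule with a regular expression constraint can be sketched as follows. This is not DEiXTo's algorithm or its .wpf rule format. It is a simplified illustration in Python, assuming well-nested HTML without void tags, where a rule is a hypothetical nested tuple `(tag, capture_name, regex, child_rules)` and child rules must match child elements in document order.

```python
import re
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a small element tree; assumes well-nested HTML, no void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def match(node, pattern):
    """Return captured fields if pattern matches at node, else None."""
    tag, name, regex, kids = pattern
    if node.tag != tag:
        return None
    if regex and not re.search(regex, node.text):
        return None
    captured = {}
    pos = 0
    for kid in kids:
        found = None
        # Scan forward for the next child that satisfies this sub-rule.
        while pos < len(node.children) and found is None:
            found = match(node.children[pos], kid)
            pos += 1
        if found is None:
            return None
        captured.update(found)
    if name:
        captured[name] = node.text
    return captured


def find_all(node, pattern):
    """Collect one record per subtree that matches the pattern."""
    results = []
    hit = match(node, pattern)
    if hit is not None:
        results.append(hit)
    for child in node.children:
        results.extend(find_all(child, pattern))
    return results


# Hypothetical page fragment and rule, for illustration only.
HTML = """
<ul>
  <li><a>Widget</a><span>12.50 EUR</span></li>
  <li><a>Gadget</a><span>call us</span></li>
  <li><a>Gizmo</a><span>7.00 EUR</span></li>
</ul>
"""
PATTERN = ("li", None, None, [
    ("a", "name", None, []),
    ("span", "price", r"^\d+\.\d{2} EUR$", []),
])

builder = TreeBuilder()
builder.feed(HTML)
rows = find_all(builder.root, PATTERN)

if __name__ == "__main__":
    for row in rows:
        print(row)
```

In this sketch the second list item is skipped because its `span` text fails the price regex, which shows how a regular expression can narrow an otherwise purely structural rule.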
DEiXTo Command Line Executor (CLE)
portable, efficient and fast command line executor of wrappers generated with GUI DEiXTo
provides options and flexibility that you cannot get with GUI DEiXTo
supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet
provides database support via DBI (the Database independent interface for Perl)
supports HTML output using an HTML template processor and an editable template file
overwrite, append and prepend output modes for all supported formats
can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)
it is free and open source, distributed under the GNU General Public License (GPL) version 3
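The three output modes listed above can be illustrated with a short Python sketch. It does not reproduce CLE's implementation or its command line options. The function name `write_output` and its `mode` parameter are hypothetical, and the example assumes a plain text output file.

```python
import os
import tempfile


def write_output(path, lines, mode="overwrite"):
    """Write lines to path, replacing, appending to or prepending to it."""
    new_text = "".join(line + "\n" for line in lines)
    old_text = ""
    if mode in ("append", "prepend") and os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            old_text = fh.read()
    if mode == "overwrite":
        text = new_text
    elif mode == "append":
        text = old_text + new_text
    elif mode == "prepend":
        text = new_text + old_text
    else:
        raise ValueError("unknown mode: " + mode)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "results.txt")
    write_output(target, ["run 2"])             # fresh file
    write_output(target, ["run 3"], "append")   # newest last
    write_output(target, ["run 1"], "prepend")  # newest first
    with open(target, encoding="utf-8") as fh:
        print(fh.read())
```

Prepend mode is the natural fit for feed-like outputs where the newest entries should appear first, while append mode suits logs that grow with each scheduled run.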
DEiXToBot
A Mechanize agent (essentially a browser emulator) capable of extracting data of interest.
Flexible and efficient.
Allows extensive customization.
Supports multiple patterns on a single page and combination of their results.
Allows post-processing of the extracted data and enables you to transform it to any format you wish.
Requires programming skills to use, though.
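Combining the results of several patterns and post-processing them could look roughly like the Python sketch below. DEiXToBot itself is written in Perl and its API is not shown here. The record layout, the shared `id` key and both helper functions are assumptions made purely for illustration.

```python
import json

# Hypothetical results of two patterns run against the same page.
titles = [{"id": "p1", "title": " Widget "}, {"id": "p2", "title": "Gadget"}]
prices = [{"id": "p1", "price": "12,50 €"}, {"id": "p2", "price": "7,00 €"}]


def combine(*result_sets, key="id"):
    """Merge records from several patterns that share the same key value."""
    merged = {}
    for result_set in result_sets:
        for record in result_set:
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


def post_process(record):
    """Trim the title and convert a price like '12,50 €' to a float."""
    out = dict(record)
    out["title"] = out["title"].strip()
    out["price"] = float(out["price"].replace("€", "").strip().replace(",", "."))
    return out


products = [post_process(rec) for rec in combine(titles, prices)]

if __name__ == "__main__":
    # Any target format works from here; JSON is just one option.
    print(json.dumps(products, ensure_ascii=False, indent=2))
```

Once the records are merged and normalized, transforming them to another format is a matter of swapping the final serialization step.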
Corgialenios Library use case
From unstructured HTML data
To ESE (Europeana Semantic Elements) format!
DEiXTo Services
We can definitely help you to:
transform the contents of your digital library into OAI-PMH or another suitable format
quickly populate product catalogues with full specifications
search various web resources in real time and extract the results returned
prepare large, focused datasets for scientific tasks (e.g. data mining)
monitor prices of the competition
<your extraction task goes here!>
Happy DEiXTo users!
For further information, please visit http://deixto.com