The Caught and Coloured website

The Caught and Coloured
website:
its EMu origins
Alex Chubaty – Collection Information Systems
Craig Churchill – IT Software Development
Museum Victoria
www.museum.vic.gov.au/caughtandcoloured
2
Background
• Documents the illustration of fauna in colonial
Victoria in The Prodromus of the Zoology of
Victoria
• Collection of artwork and manuscripts held in
MV Archives
• Website managed by MV Online Publishing
team
3
Basic Data structure
• Data used in website collected initially for
purposes of collection management
• Two kinds of items catalogued
• Parent/Child structure of records
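The parent/child structure can be pictured as a small tree. The following is a minimal sketch in Python, not the EMu schema: the class name `CatalogueRecord`, its fields and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class CatalogueRecord:
    """Hypothetical stand-in for a catalogue record; not an EMu structure."""
    irn: int                      # record identifier (illustrative value)
    title: str
    parent: Optional["CatalogueRecord"] = field(default=None, repr=False)
    children: List["CatalogueRecord"] = field(default_factory=list)

    def add_child(self, child: "CatalogueRecord") -> None:
        # Link in both directions so either record can reach the other.
        child.parent = self
        self.children.append(child)


parent = CatalogueRecord(1, "Parent record")
parent.add_child(CatalogueRecord(2, "Child record A"))
parent.add_child(CatalogueRecord(3, "Child record B"))
```

Keeping the link in both directions means a child can be displayed with its parent's context, and a parent can list its children, without a separate lookup.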
4
Parent record
Child records
5
Data collection
• Complex data set capturing information relating
to science and art
• Used Catalogue, Parties, Bibliography,
Taxonomy, Collection Events & Sites, Multimedia
(MMR) and later, Narratives modules
• Partitioning/tab switching
• Early data recorded first in spreadsheet then
transferred to EMu
6
7
8
EMu records and relationship with
website
• Data and images collected in EMu used in
‘Collection’ section of website
• Searchable under headings or groupings of
types of fauna
• Once a faunal group is selected individual
species as represented in drawings, prints and
notes can be browsed
9
10
11
Additional data linked to Catalogue
• Some data added to MMR records used
in website:
1. Title field = Caption
2. Metadata tab = alt tag
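As an illustration of how those two values might reach a page, here is a minimal Python sketch. The function name, parameter names and the `<figure>` markup are assumptions; the live site was built in VB.NET and its actual markup is not shown in these slides.

```python
from html import escape


def render_image(src: str, title: str, metadata_alt: str) -> str:
    """Build an HTML figure from two MMR values.

    `title` stands for the MMR Title field (used as the caption) and
    `metadata_alt` for the value on the Metadata tab (used as the alt text).
    Both names are hypothetical.
    """
    return (
        "<figure>"
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(metadata_alt, quote=True)}">'
        f"<figcaption>{escape(title)}</figcaption>"
        "</figure>"
    )


html_out = render_image("image.jpg", "Sample caption", "Sample alt text")
```

Escaping both values guards against quotes or angle brackets in the catalogued text breaking the page.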
12
13
Additional data linked to Catalogue
• Other types of data added to Narratives
module and linked to Catalogue records:
1. Narrative about the faunal group
2. McCoy’s description of species in the Prodromus
3. Kate Phillips’ description of species from Melbourne’s
Wildlife
• Numbers 2 & 3 flagged in Narratives Identifier
field
• Number 1 has relevant Catalogue records
attached
14
15
16
Other sections of website using
Narratives
1. McCoy’s Zoology of Victoria
2. Natural Observations
3. Stories from Nature
• Each section a Master Narrative with several
sub Narratives
• Each sub Narrative may have its own sub
Narrative
• Associated images also entered into MMR and
linked to the Narratives records
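The Master Narrative / sub Narrative arrangement is a tree of arbitrary depth. A minimal Python sketch of walking such a tree follows; the `Narrative` class, its field names and the sub Narrative titles are hypothetical, and only the section title comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Narrative:
    """Hypothetical narrative node; field names are not EMu column names."""
    title: str
    subs: List["Narrative"] = field(default_factory=list)


def walk(node: Narrative, depth: int = 0) -> Iterator[Tuple[int, str]]:
    # Depth-first traversal: yield a node, then each of its sub Narratives.
    yield depth, node.title
    for sub in node.subs:
        yield from walk(sub, depth + 1)


master = Narrative("Stories from Nature", [
    Narrative("Sub Narrative A", [Narrative("Sub Narrative A.1")]),
    Narrative("Sub Narrative B"),
])
outline = list(walk(master))
```

The depth value returned with each title could drive the indentation of a navigation menu.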
17
Master Narrative
Sub Narratives
18
Getting Data out of EMu
• EMu reports created using select data
• Separate reports for Catalogue, Narratives
and MMR records
• Reports exported in Excel format
19
Into SQL Server
• Perl script reads Excel reports and loads data including
images into SQL Server
– Creates a table for each module and necessary relationship
tables
• Ecatalogue
• EcatalogueMultimedia
– Captures values in labelled text fields and loads into separate
fields
– Attempts to identify Scientific names and surround with <sn> tags
• Not a fully automated process; updating the data takes
approximately 30 minutes
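The load step can be sketched roughly as follows. This is not the project's Perl script: it uses Python, with an in-memory SQLite database standing in for SQL Server. The `Ecatalogue` and `EcatalogueMultimedia` table names come from the slide; the `Emultimedia` table, every column name, the "Label: value" format and the sample data are assumptions.

```python
import re
import sqlite3

# Hypothetical labelled text as it might appear in one exported field.
RAW = "Artist: A. Example; Medium: Watercolour"


def split_labelled(text: str) -> dict:
    """Capture 'Label: value' pairs separated by semicolons."""
    pairs = re.findall(r"\s*([^:;]+):\s*([^;]*)", text)
    return {label.strip(): value.strip() for label, value in pairs}


fields = split_labelled(RAW)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ecatalogue (irn INTEGER PRIMARY KEY, Artist TEXT, Medium TEXT);
    CREATE TABLE Emultimedia (irn INTEGER PRIMARY KEY, Title TEXT);
    -- Relationship table linking catalogue records to multimedia records.
    CREATE TABLE EcatalogueMultimedia (
        catalogue_irn INTEGER REFERENCES Ecatalogue(irn),
        multimedia_irn INTEGER REFERENCES Emultimedia(irn)
    );
""")
conn.execute("INSERT INTO Ecatalogue VALUES (?, ?, ?)",
             (1, fields["Artist"], fields["Medium"]))
conn.execute("INSERT INTO Emultimedia VALUES (?, ?)", (10, "Sample caption"))
conn.execute("INSERT INTO EcatalogueMultimedia VALUES (?, ?)", (1, 10))

# Follow the relationship table from a catalogue record to its image.
row = conn.execute("""
    SELECT c.Artist, c.Medium, m.Title
    FROM Ecatalogue c
    JOIN EcatalogueMultimedia cm ON cm.catalogue_irn = c.irn
    JOIN Emultimedia m ON m.irn = cm.multimedia_irn
""").fetchone()
```

Splitting labelled text into its own columns at load time lets the web layer query each value directly instead of parsing text on every request.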
20
Out to the Web
• ASP.NET environment using VB.NET
• Images served directly from database and
resized dynamically (thumbnails)
• XML tags in data converted to HTML or
used as processing instructions
– e.g. <hst> converted to
<div class="historic-text"></div>
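A minimal sketch of this kind of tag conversion, written in Python rather than the site's VB.NET. The `<hst>` mapping is taken from the slide; rendering `<sn>` as `<em>` is an assumption based on the italics requirement, and the function and table names are hypothetical.

```python
import re

# Mapping from source tags to HTML. <hst> comes from the slide; rendering
# <sn> as <em> is an assumption made here for illustration.
TAG_MAP = {
    "hst": ('<div class="historic-text">', "</div>"),
    "sn": ("<em>", "</em>"),
}


def to_html(text: str) -> str:
    def replace(match):
        closing, name = match.group(1), match.group(2)
        if name not in TAG_MAP:
            return match.group(0)  # leave unknown tags untouched
        open_html, close_html = TAG_MAP[name]
        return close_html if closing else open_html

    # Matches simple tags without attributes, such as <hst> or </sn>.
    return re.sub(r"<(/?)(\w+)>", replace, text)


result = to_html("<hst>Described as <sn>Genus species</sn>.</hst>")
```

A regular expression is enough for flat, attribute-free tags like these; content with attributes or deeper nesting rules would be better served by an XML parser or XSLT.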
21
Marking Up the Content
• Storing HTML in EMu
– Is this a good thing to do?
– What are the alternatives?
- a less intrusive mark-up like WikiWikiWeb
c2.com/cgi/wiki
- store HTML in EMu but don’t display
- use XML instead
• Storing XML in EMu
– What Schema should we use?
• Should we create our own?
• Investigate existing Schemas
– Text Encoding Initiative
http://www.tei-c.org/
– Use XSLT PageView to preview
22
Scientific Names
Requirement
“All scientific names should be italicised when
displayed on the web.”
Problem
“How do we identify scientific names contained
within a text field if they haven’t been tagged?”
23
Scientific Names (cont)
Possible solutions
• Cross reference/link text against taxonomy module
• Check text against a pre-built list
• TaxonGrab – Natural Language Processing solution written in
PHP:
• http://sourceforge.net/projects/taxongrab
• FindIT - parses freetext and identifies scientific names and
author combinations:
• http://names.mbl.edu/tools/recognize.php
• Currently testing this technology; initial results are promising
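The simplest of these options, checking text against a pre-built list, might look like the following Python sketch. It is not TaxonGrab or FindIT, and the list contents are sample names chosen for illustration rather than project data; the function name is hypothetical.

```python
import re

# Hypothetical pre-built list; a real one might be exported from the
# Taxonomy module. These are sample names, not project data.
KNOWN_NAMES = ["Phascolarctos cinereus", "Ornithorhynchus anatinus"]


def tag_scientific_names(text: str, names=KNOWN_NAMES) -> str:
    # Try longer names first so a binomial wins over any shorter entry
    # that happens to be a prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b")
    return pattern.sub(r"<sn>\1</sn>", text)


tagged = tag_scientific_names(
    "Phascolarctos cinereus is a marsupial; Ornithorhynchus anatinus lays eggs."
)
```

A list lookup only finds names it already knows and would tag a name a second time if run twice over the same text, which is part of the appeal of the NLP-based tools above.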
24
Future Possibilities
• Hit EMu directly, no more exporting data to
SQL Server
– Use KE PHP web and web services libraries
• record extractor object
• xml, xslt and xpath
• Investigate professional XML authoring
tools to allow authors to create narratives
that are valid and well formed
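Until such tools are in place, well-formedness at least can be checked with a few lines of code. A minimal Python sketch, assuming narratives are stored as XML fragments; the function name and sample fragments are hypothetical, and this does not test validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root."""
    try:
        # Wrap in a dummy root so mixed text and tags parse as one document.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


good = is_well_formed("<hst>Text with <sn>Genus species</sn></hst>")
bad = is_well_formed("<hst>Text with <sn>Genus species</hst>")
```

Here `bad` is false because the `<sn>` element is never closed. Checking validity would additionally need a schema and a validating parser.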
25