Database-Web Portal project - University of California Davis


Integration of the UC Davis Biological
Collections Data via a Web Portal
[A Pilot Project]
Project Goals
• Develop a Web Portal allowing better & wider use of the
information stored in our biological datasets.
• Use current & developing computer technologies to enable this
access & data integration.
• Develop methods and Web tools to simplify integration
among our biological datasets.
• Meet the needs of our user community for access
to our datasets.
How will this work?
[Diagram: a "Go Get species: Lycopersicon esculentum" request queries the
UDDI registry for Web services; the registry returns the matching Web
service; the query then goes to the Web services (DiGIR, Darwin Core
providers), which return a data/Web stream. The UC Davis Biological
Collections Web service is one such Darwin Core provider, serving
complete datasets.]
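The lookup flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory REGISTRY list stands in for the UDDI registry, the Provider class and the functions find_providers and go_get_species are hypothetical names, and the sample records are made up. It does not use the real UDDI or DiGIR protocols.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass
class Provider:
    name: str
    protocol: str                           # e.g. "DiGIR"
    search: Callable[[str], list[Record]]   # query by scientific name


def make_provider(name: str, records: list[Record]) -> Provider:
    """Build a toy provider that searches its own in-memory records."""
    def search(scientific_name: str) -> list[Record]:
        return [r for r in records if r["ScientificName"] == scientific_name]
    return Provider(name, "DiGIR", search)


# Hypothetical stand-in for the UDDI registry; sample records are invented.
REGISTRY = [
    make_provider("UC Davis Biological Collections", [
        {"ScientificName": "Lycopersicon esculentum", "CollectionCode": "TGRC"},
        {"ScientificName": "Quercus lobata", "CollectionCode": "Herbarium"},
    ]),
    make_provider("Another Darwin Core provider", [
        {"ScientificName": "Lycopersicon esculentum", "CollectionCode": "Example"},
    ]),
]


def find_providers(protocol: str) -> list[Provider]:
    """Step 1: the registry returns the matching Web services."""
    return [p for p in REGISTRY if p.protocol == protocol]


def go_get_species(scientific_name: str) -> list[Record]:
    """Step 2: query every matching provider and merge the results."""
    results = []
    for provider in find_providers("DiGIR"):
        for record in provider.search(scientific_name):
            results.append({"Provider": provider.name, **record})
    return results


hits = go_get_species("Lycopersicon esculentum")
```

The portal never needs to know in advance which providers exist; it asks the registry first and then sends the same species query to each provider it gets back.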
Concepts for integrating
the museum datasets
1. Use the Dublin Core Metadata & Darwin Core Profile as the
integrating elements for all the museum datasets.
2. Develop dynamic methods to “mine” these above “core”
data from the individual museum datasets.
3. Use an enterprise database system as a repository for the
integrated museum datasets and to “serve out” the data.
4. Develop a Web service for distributing the dataset queries in
response to Web requests.
5. Develop a Web Portal interface using the Web service.
6. Develop methods to “Serve out” the data attributes that are
not part of Dublin/Darwin core elements.
Biological Collection summaries
Biological Collection                   Approx. holdings                                Database system   Approx. records
Herbarium                               250,000+ (200+ Types)                           ACCESS            15,000
TGRC (Tomato Genetics Resource Center)  1,200+ Wild species (4,500 genetic accessions)  ACCESS            4,500
Phaff (Yeast collection)                6,000+ yeast strains                            FileMaker         6,500
Wildlife and Fisheries                  12,000+                                         ACCESS            12,000
Anthropology                            500+ (bones/fossils mainly)                     ACCESS            300
Conservatory (Botanical)                3,000+                                          ACCESS/FileMaker  5,600
* Other database systems yet to work with:
Bohart (Entomology)    6,000,000+
Arboretum              4,000+
Nematology             11,000+ Types (53,000 general)
...others (Vet/Med)
Database system and record entries for these collections, in the order
listed: -, FileMaker, FileMaker, FileMaker, -, -
Darwin profile of Datasets
Objective:
To export the elements from the UC Davis
biological collection datasets into the DiGIR part of
the UC Davis Biological Collections "Web service".
The next slides show the beginnings of the Web
Service prototype…
Museum datasets matched to the Darwin Profile
The Darwin Profile is composed of 48 elements, in groups:
An ACCESS prototype of the integrated databases
Links to the independent database tables listed here are prefixed by their
collection names, e.g. "Herbarium_Labels". Imported data (Darwin Core)
created from these table links are prefixed by "Darwin",
e.g. "Darwin_Herbarium":
An ACCESS prototype of the integrated databases
Queries were created to parse out only the Darwin Core
elements from the tables.
e.g. qryDarwin_Herbarium
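A query of this kind might look like the following sketch. It uses SQLite from Python in place of ACCESS so that it can run anywhere. The table name Herbarium_Labels and the query name qryDarwin_Herbarium come from the slides; the source column names (AccessionNo, Taxon, and so on), the sample row, and the choice of Darwin Core elements are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout of the linked Herbarium table.
CREATE TABLE Herbarium_Labels (
    AccessionNo   TEXT,
    Taxon         TEXT,
    CollectorName TEXT,
    CollYear      INTEGER,
    CountryName   TEXT,
    ShelfLocation TEXT     -- local field, not a Darwin Core element
);
INSERT INTO Herbarium_Labels VALUES
    ('H-0001', 'Lycopersicon esculentum', 'A. Collector', 1998, 'USA', 'Cabinet 12');

-- The query keeps only Darwin Core elements and renames them.
CREATE VIEW qryDarwin_Herbarium AS
SELECT 'Herbarium'   AS CollectionCode,
       AccessionNo   AS CatalogNumber,
       Taxon         AS ScientificName,
       CollectorName AS Collector,
       CollYear      AS YearCollected,
       CountryName   AS Country
FROM Herbarium_Labels;
""")

cur = conn.execute("SELECT * FROM qryDarwin_Herbarium")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
```

Fields with no Darwin Core counterpart, such as the hypothetical ShelfLocation, simply do not appear in the query output.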
Museum test database
(Example)
The above shows the Phaff Yeast dataset fields matched to the
Darwin Profile elements.
…
Similarly the other datasets have been matched to the Darwin
Profile.
Museum test database
Next Step: To continue our Prototype development:
Develop queries into all these Darwin Profile datasets. All
contain the same set of data elements, which will allow for
a “UC Davis Museums” combined dataset query.
Complete the implementation of DiGIR as the priority and,
secondarily, develop methods to serve out the
additional data attributes residing in the individual
museum/biological collections datasets.
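Because every Darwin Profile dataset shares the same elements, a combined query can be little more than a UNION over the per-collection tables. The sketch below again uses SQLite from Python rather than ACCESS. Darwin_Herbarium is named in the slides; Darwin_TGRC, Darwin_Phaff, the view name Darwin_UCDavisMuseums, the three-column subset, and the sample rows are assumptions that follow the same naming convention.

```python
import sqlite3

# A small subset of Darwin Core elements, shared by every table.
DARWIN_COLUMNS = "CollectionCode, CatalogNumber, ScientificName"
# Darwin_Herbarium is from the slides; the other names are hypothetical.
DARWIN_TABLES = ["Darwin_Herbarium", "Darwin_TGRC", "Darwin_Phaff"]

conn = sqlite3.connect(":memory:")
for table in DARWIN_TABLES:
    conn.execute(
        f"CREATE TABLE {table} "
        "(CollectionCode TEXT, CatalogNumber TEXT, ScientificName TEXT)"
    )

# Made-up sample rows.
conn.execute("INSERT INTO Darwin_Herbarium VALUES ('Herbarium', 'H-0001', 'Lycopersicon esculentum')")
conn.execute("INSERT INTO Darwin_TGRC VALUES ('TGRC', 'T-0001', 'Lycopersicon esculentum')")
conn.execute("INSERT INTO Darwin_Phaff VALUES ('Phaff', 'P-0001', 'Saccharomyces cerevisiae')")

# One combined view over all collections.
union_sql = " UNION ALL ".join(
    f"SELECT {DARWIN_COLUMNS} FROM {table}" for table in DARWIN_TABLES
)
conn.execute(f"CREATE VIEW Darwin_UCDavisMuseums AS {union_sql}")


def search_all(scientific_name: str) -> list[tuple]:
    """Search every collection at once by scientific name."""
    sql = (
        "SELECT CollectionCode, CatalogNumber FROM Darwin_UCDavisMuseums "
        "WHERE ScientificName = ? ORDER BY CollectionCode"
    )
    return conn.execute(sql, (scientific_name,)).fetchall()


matches = search_all("Lycopersicon esculentum")
```

Adding another collection then only means adding its Darwin table to the list, provided it exposes the same columns.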
Integration of the UC Davis Biological
Collections Data via a Web Portal
[What Next?]
Next on the agenda:
• Discuss & demonstrate how the Wildlife & Fisheries
database system will serve as the “prototype” database
system for our combined biological datasets.
• Discuss collaborative efforts with other groups on the UC
Davis campus and other institutions.
• Discuss software researched to develop our Web Service;
e.g. XML, Open source development tools, Microsoft .NET
development tools, SRB (Storage Resource Broker).
How to integrate? Decisions to make.
Integrated database prototype, our approach:
Queries made against the linked tables can generate the
output for the Darwin core elements. However, we are not
certain how to present DiGIR with queries rather than
tables; thus we created intermediate tables from the
linked tables.
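The intermediate-table step can be sketched as materializing each query into a real table. The example uses SQLite from Python rather than ACCESS; the helper refresh_intermediate_table, the column names, and the sample row are assumptions, while the query and table names follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical linked table and the query over it.
CREATE TABLE Herbarium_Labels (AccessionNo TEXT, Taxon TEXT, ShelfLocation TEXT);
INSERT INTO Herbarium_Labels VALUES ('H-0001', 'Lycopersicon esculentum', 'Cabinet 12');
CREATE VIEW qryDarwin_Herbarium AS
SELECT AccessionNo AS CatalogNumber, Taxon AS ScientificName
FROM Herbarium_Labels;
""")


def refresh_intermediate_table(conn, query_name: str, table_name: str) -> None:
    """Rebuild a plain table holding the current output of a query."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")
        conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM {query_name}")


refresh_intermediate_table(conn, "qryDarwin_Herbarium", "Darwin_Herbarium")

# The query stays a view; the intermediate copy is an ordinary table.
kinds = dict(conn.execute("SELECT name, type FROM sqlite_master"))
snapshot = conn.execute("SELECT * FROM Darwin_Herbarium").fetchall()
```

The cost of this approach is that the intermediate table is a snapshot: it has to be refreshed whenever the linked table changes, whereas a query always reflects the current data.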
-Is our approach of combining all our biological collection
datasets into 1 repository to “serve out” the Darwin core
elements as done in this prototype a good idea?
What is DiGIR’s preferred method?
Integrating Dataset Issues
Directors, Curators, faculty & staff have
concerns/issues about the Web Portal:
-Security (If their respective databases are made directly
available via the Web Portal system).
-Corruption possibilities.
-Data validation/quality control (once their datasets
are served out via the Web Portal).
-They cannot assure that their datasets are up-to-date &
complete. How will the Web Portal address this?
Addressing these Issues
Security:
- Small datasets will be uploaded to the Web Portal server in their
entirety. All steps needed to prepare the datasets & make them
available will take place on the Web Portal server.
- Larger datasets (or those incompatible with the Web Portal server) will
require programming tools to be developed on the data server that
stores them, before they are exported to the Web Portal server.
Quality Assurance:
- The individual dataset holders will have control over uploading their
datasets and/or running the procedures that export them to the Web
Portal server. Metadata will accompany the data, defining the
dataset limitations, etc. [e.g. DiGIR provides this sort of
metadata through the resource descriptive files]
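An export procedure run by a dataset holder could pair the data with a small metadata record, as in the sketch below. Everything here is an assumption for illustration: the function export_dataset, the metadata field names, the sample row, the limitation text, and the date are hypothetical, and the JSON layout is not the DiGIR resource file format.

```python
import csv
import io
import json
from datetime import date


def export_dataset(rows, columns, collection, known_limitations, export_date):
    """Return the data as CSV text plus a JSON metadata record describing it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)

    # Hypothetical metadata fields; not the DiGIR resource file format.
    metadata = {
        "collection": collection,
        "exported_on": export_date.isoformat(),
        "record_count": len(rows),
        "known_limitations": known_limitations,
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2)


# Made-up sample row, limitation text, and date.
data_csv, metadata_json = export_dataset(
    rows=[("H-0001", "Lycopersicon esculentum")],
    columns=["CatalogNumber", "ScientificName"],
    collection="Herbarium",
    known_limitations="Partial export; not all specimens are databased.",
    export_date=date(2004, 1, 15),
)
```

Because the curator runs the export, the export date and the stated limitations travel with the data, so the portal can show users how current and complete each dataset is.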