Graduate Curriculum for Biological Information Specialists:
A Key to Integration of Scale in Biology
P. Bryan Heidorn
Carole L. Palmer
Dan Wright
Melissa H. Cragin
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
2nd International Digital Curation Conference
Digital Data Curation in Practice
22 November 2006
Outline
● Scientific Communication Initiative
● Information challenges of biological data
● Research foundation of our program
● Biological Information Specialists
● Data Curation Concentration in MSLIS
● Partners and internships
● Integration of research and practice
Scientific Communication Initiative (SCI)
Given:
The ever-growing universe of information resources, informatics
tools, and scholarly communication options that need to be
understood, assessed, and coordinated
SCI Aims:
Improve information transfer and integration, technology
development and sustainability, and collaboration in the practice
of science through
● basic and applied research
● education to train information specialists to work cooperatively
with research scientists
Complement, not duplicate, the expertise of natural and computational
scientists.
Information challenges in biology
● Emergent complexity is a hallmark of modern biology.
  ● Complexity of the data is of greater consequence for scientific
  discovery than the volume of the data
● New data practices must make sure that data at many scales is
interoperable to support this kind of research
● Data is active and part of the scientific process
in a changing producer-consumer economy
● Scientific inquiry requires integration of lab data, procedures, literature,
and reference work
● A critical shortage of personnel trained to manage biological
information and data
Research foundation
Modeling and computational neuroscience
• Information and Discovery in Neuroscience
Palmer, NSF IIS-0222848
Automated metadata extraction and inference
• Automatic Museum Label Metadata Extraction
Heidorn, NSF DBI-9982849, NSF DBI-0345387
• Georeferencing Museum Specimen Sources
Heidorn, Moore 2005-2929-00
Terminology, schema, and ontology development
• Plant Description Standards
Heidorn, IMLS NR-00-01-0017-01
Collaborative data collection
• BioDiversity Survey Collaboration and Verification
Heidorn and Palmer, NSF BDI-0113918
Modeling and computational neuroscience
● Large user group for experimental biological data, yet rarely
(if ever) generate their own data sets
● One of the communities making the most use of shared data
repositories
● Difficulties in re-purposing data collected under a specific set of
experimental circumstances and constraints
  ● Metadata difficult to gather
● Needs generally not taken into account in planning, collection,
and storage of experimental data
Automated metadata extraction and inference
● Historical collections in botany, zoology, and entomology have been
curated for centuries, along with rich metadata
(~2 billion specimen labels)
● Manual extraction is making collections globally accessible and usable
(Darwin Core/ABCD, DiGIR, TCS)
● Automated metadata extraction: HERBIS and BioGeomancer approach
● Implication: Predictive ecological modeling under climate change
Terminology, schema, and ontology development
Serving goals of knowledge representation, discovery, and data
integration
● Biodiversity: Informatics Core Ontology
  ● Taxonomic Databases Working Group standards
● Neuroscience: ontology and vocabulary development work
aimed at integrating animal and human imaging data.
● Text mining to build ontologies from texts
Biological Information Specialists
At present:
● Biologists at all degree levels self-trained in information technology
● Information technologists at all degree levels self-trained in biology
(both with gaps in knowledge for many months, years)
● Differing roles of BIS in large and small
Master of Science in Biological Informatics
Part of the campus-wide bioinformatics master's program
Curriculum development funded by
NSF/CISE/IIS, Education Research and Curriculum
Development, 0534567
(Palmer, PI)
Degree program began September 2006
Combines a Biology, Bioinformatics, and Computer Science core
with LIS courses from GSLIS's long-standing, top-ranked program.
What does a BIS need to know?
Biological training
and interest in solving biological research problems
Information skills
● Evaluation and implementation of information systems:
user-based assessment and continual quality improvement for the
development of tools that work and are used.
● Information acquisition, management, and dissemination: development
of digital libraries, data archives, institutional repositories, and related
tools.
● Information organization and integration:
ontology development, structuring information for optimal use and
sharing, and standards development.
LIS orientation
● LIS is the only field concerned with the full landscape of scientific
information and the interactions among fields
● Focus on information needs of users, rather than internal criteria such
as technical elegance
● Tradition of training scientific information professionals as
informationists
“The informationist concept meets a critical need for an intermediary
between the expanding information universe and practitioners and
researchers. Successful informationists may come from a variety of
backgrounds and perform a variety of roles,
but must have knowledge about both a subject domain and the process of
locating, analyzing, and synthesizing information.”
(Giuse et al. 2005, p. 2) emphasis added.
UIUC bioinformatics core coursework
Cross-disciplinary course distribution requirement
Example courses include:
Bioinformatics:
Computing in Molecular Biology
Algorithms in Bioinformatics
Principles of Systematics
Computer Science:
Algorithms
Database Systems
Biology:
Human Genetics
Introductory Biochemistry
Macromolecular Modeling
Sample existing LIS courses
Representing and Organizing Information
Interfaces to Information Systems
Building Digital Libraries
Indexing and Abstracting
Information Modeling
Architecture of Networked Information Systems
Information Sources and Services in the Sciences
Implementation of Information Retrieval Systems
Use and Users of Information
Electronic Publishing
Health Sciences Information Services and Resources
New and proposed courses
Ontologies in the Natural Sciences (Renear)
Biodiversity Informatics (Heidorn)
Information Transfer and Collaboration in Science (Palmer)
Metadata in Theory and Practice (McDonough)
Scientific Data and Procedure Standards
Scientific Classification and Vocabulary
Discovery Informatics and Data Mining
Scientific Literatures and Bibliometrics
Bioinformatics Resources and Tools
Open Access Repositories
MSLIS Data Curation Concentration
Data Curation Educational Program (DCEP)
IMLS – Laura Bush 21st Century Librarian Program,
RE-05-06-0036-06 (Heidorn, PI)
Students with the DC concentration will be trained to add value
to data and promote sharing across labs and disciplinary
specializations
Integration of research and practice
● Cooperating institutions:
  ● Biomedical Informatics Research Network (UCSD)
  ● Arrowsmith literature mining project (University of Illinois at Chicago,
  Neuroscience Dept.)
  ● Smithsonian Institution
  ● American Museum of Natural History
  ● Missouri Botanical Garden
  ● U.S. Army Strategic Environmental Management Program
  ● MIT Data Services Librarian
● Identify information problems and collect best practices from our partners to
provide a broad understanding of information and data techniques, issues, and
needs
● Place students in internships with our partners at biological science
institutions to gain real-world biological research experience
● Cultivate new partners and new collaborative research
New research directions
Focus on integration and scale
Informatics infrastructure as competitive edge
Sample areas of development
  ● Landinformatics Group
  Atmospheric science, hydrology, nutrient balance, carbon cycle,
  ecology, agronomy
  ● Critical Zones Observatory
  Focus on data integration problems across a larger range of sciences
References
Giuse, N.B., Sathe, N., and Jerome, R. (2005). Envisioning the
Information Specialist in Context (ISIC): A Multi-Center Study to
Articulate Roles and Training Models. Medical Library Association.
Palmer, C.L., Cragin, M.H., and Hogan, T.P. (2004). Information at the
Intersections of Discovery: Case Studies in Neuroscience.
Proceedings of the American Society for Information Science and
Technology annual meeting 41: 448-455.
Greenberg, J., Heidorn, P.B., and Seiberling, S. (2005).
Growing Vocabularies for Plant Identification and Scientific Learning.
International Conference on Dublin Core and Metadata Applications
(DC-2005, Sept 15, 2005), Madrid, Spain.
Acknowledgements
● Research grants:
IIS 0222848, DBI 0345387
● GSLIS Research Writing Group
_____________________
● Scientific Communication Initiative: http://sci.lis.uiuc.edu/
(under development)