Challenges of Developing a Global Alerting System
American Chemical Society National Meeting - Chicago
Symposium Honoring Gary Wiggins
March 25th, 2007
Leah Sandvoss
Information Scientist
Honors
Chemical Informatics program initiation
Mentorship
Independent study on Chemical Information
Outline
Definitions
Overview of Global Alerting System
Business Analysis – pre-project
Information Retrieval
Metadata
Users
Project limitations
Lessons Learned
Definitions
Alerts / Selective Dissemination of Information (SDI) / Current Awareness – a
stored search strategy that is run periodically against a database to return any
newly added results to the end-user
Information Retrieval – the systematic storage and recovery of data, as from a
file or database
Knowledge Management – refers to a range of practices used by organizations
to identify, create, represent, and distribute knowledge for reuse, awareness and
learning across the organization.
Business Analysis – act of gathering and translating business issues and needs
into a form that can be given to appropriate people to form solutions
Metadata – data about data
“Zip Code” is the metadata for the piece of data “92121”
“Abstract” is the metadata for the actual abstract text
Unstructured data – data whose structure is not readily machine-readable
Controlled vocabulary – carefully selected list of words and phrases which are
used to tag units of information to make them more retrievable by a search
Example: cancer vs. neoplasm
Sources: Chicago Manual of Style (CMS) citation: information retrieval. Dictionary.com. Dictionary.com
Unabridged (v 1.1). Random House, Inc. http://dictionary.reference.com/browse/information retrieval
(accessed: February 21, 2007); “Consulting Skills for Business Analysis” course by
Watermark Learning
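The alert/SDI definition above can be illustrated with a minimal sketch: a stored strategy is re-run, and only records not delivered before are returned. The class name, the `search` callable, and the toy in-memory database are all assumptions made for illustration; they do not describe any vendor system or the system discussed in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class StoredAlert:
    """A saved search strategy plus the record IDs it has already delivered."""
    name: str
    query: str
    seen_ids: set = field(default_factory=set)


def run_alert(alert, search):
    """Run the stored strategy and return only records not delivered before.

    `search` is any callable that takes a query string and returns a list of
    dicts with an "id" key (a stand-in for a real database search).
    """
    new_records = [r for r in search(alert.query) if r["id"] not in alert.seen_ids]
    alert.seen_ids.update(r["id"] for r in new_records)
    return new_records


# Toy in-memory "database" used only for illustration.
database = [
    {"id": "A1", "title": "Neoplasm therapy review"},
    {"id": "A2", "title": "Cluster headache syndrome"},
]


def toy_search(query):
    """Case-insensitive substring match on the title."""
    return [r for r in database if query.lower() in r["title"].lower()]


alert = StoredAlert(name="headache watch", query="headache")
first_run = run_alert(alert, toy_search)       # delivers A2
database.append({"id": "A3", "title": "Headache prophylaxis"})
second_run = run_alert(alert, toy_search)      # delivers only the new record A3
```

Tracking delivered IDs is one simple way to get "newly added results"; a date-added filter on the database side would be another.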
Knowledge Management
Each company in the healthcare and pharmaceutical sector has spent
an average of US$274,000 per annum on knowledge management over
the past three years (ref 2001)
Source: Dyer, G. and McDonough, B. (2001). Vertical targets for knowledge management vendors. International Data Corporation, Document No. 25535
State of Biomedical Literature Mining
Source: Jensen, L.J., Saric, J., Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics, vol. 7, February 2006
Current Awareness System
Common platform to deliver many types of information, providing a common process for
inserting the information
Compiles results from multiple information retrieval systems
Allows for the collection, review, analysis, and summarization of information types
Export capabilities
Uses controlled vocabularies
Provides structured, actionable information
[Figure: diagram labels: News; Patents; Books; Literature; Internal; TOCs; Key Opinion Leaders; Integration layer; Portal]
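As a rough sketch of "compiles results from multiple information retrieval systems", the snippet below merges several result lists and drops overlapping records. The matching key (normalized title plus year) and the record layout are assumptions for illustration only; the slides do not say how overlap was detected.

```python
def compile_results(*result_sets):
    """Merge result lists from several systems, keeping the first copy of each record.

    Records are matched on a normalized title plus year, which is a crude,
    illustrative key; a stable identifier would be preferable where one exists.
    """
    merged, seen = [], set()
    for results in result_sets:
        for rec in results:
            # Lowercase and collapse whitespace so trivial differences still match.
            key = (" ".join(rec["title"].lower().split()), rec.get("year"))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


literature = [{"title": "Cluster headache syndrome", "year": 1992, "source": "literature"}]
news = [
    {"title": "cluster  headache syndrome", "year": 1992, "source": "news"},
    {"title": "New patent granted", "year": 2006, "source": "news"},
]
combined = compile_results(literature, news)
```

Keeping the first copy means the order of the arguments decides which source "wins" when two systems return the same item.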
Business Analysis
Business Analysis
A team was formed in 2002 to look at key information products
available to the end-user, coined “value-added products”
First focus was on the products providing alerts/SDIs
Used online alert survey from an existing internal system to
identify user needs. Approximately 230 total respondents.
Held discussions among team members about workflow based
on their experience with customer needs
Conducted a per-month cost comparison of various alerting
services
Several products provided overlapping information, resulting in
duplication of effort among the information scientists
Recommendations were made for a future system and workflow for
managing alerts
Business Analysis
In late 2004, a project team was formed to develop a tool to manage alerts
as well as search results
First goal was to provide a repository to manage content. Tool would:
Allow information scientists to contribute, manage, and disseminate
content
Replace existing current awareness products
Provide automation where possible to save time on the part of
information scientists
Second goal was to develop a tool to display content to the end-user in one
interface with a common format
An environmental scan was performed, but the team decided to develop the product in-house
In-house development would facilitate incorporating internal content
Focus of this talk is on the repository development
Information Retrieval
Information Retrieval – Licensing Issues
Needed to determine what was “in scope” for existing contracts
Copyright restriction questions
Are vendor-produced abstracts subject to copyright restrictions?
Does it violate copyright to redistribute results?
Can the full text of an article be used and classified?
Can a screen scrape be performed on an HTML page?
Can complete citation(s) be stored? If not, what fields can be stored (e.g., a
unique identifier so that the user can get back to the complete citation)?
For stored content, is there an expiration date?
Results Options
Format - XML, plain text, HTML, etc
Transfer type - sFTP, e-mail, HTML view
Unstructured or Semi-structured Data – BRS/Tagged
UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off attacks. [Review]
AU Marks DR. Rapoport AM
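A minimal sketch of how a tagged record like the one above might be parsed into fields. It assumes every field starts with a two-letter uppercase tag followed by a space, and that any other line continues the previous field; real vendor formats vary, so this is an illustration rather than the parser used in the project.

```python
def parse_tagged_record(text):
    """Parse lines of the form '<TAG> <value>' into a dict of tag -> value.

    Assumes each field starts with a two-letter uppercase tag followed by a
    space; lines that do not match are treated as continuations of the
    previous field.
    """
    record = {}
    last_tag = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        if len(tag) == 2 and tag.isalpha() and tag.isupper():
            record[tag] = value.strip()
            last_tag = tag
        elif last_tag is not None:
            # Continuation line: append to the field that was open.
            record[last_tag] += " " + line.strip()
    return record


sample = """UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off attacks. [Review]
AU Marks DR. Rapoport AM"""

record = parse_tagged_record(sample)
```

A continuation line that happens to begin with a two-letter uppercase word would be misread as a new tag, which is one reason semi-structured formats like this need source-specific rules.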
Unstructured or Semi-structured data - HTML
Unstructured or Semi-structured data - TOC
Information Retrieval – Process
[Figure: process diagram labels: Supported Commercial Databases; Supported News Sources; Other Sources; Rules applied for strategy setup and delivery; E-mail inbox; Parsers applied; Repository; Information Scientists]
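One way to picture the "parsers applied" step is a registry that picks a parser by source before loading the repository. The order of steps, the source names, and the message layout below are assumptions; the slide only preserves the diagram labels, not the exact flow.

```python
import json

PARSERS = {}


def parser_for(source):
    """Register a parser function for a named source (names are hypothetical)."""
    def register(func):
        PARSERS[source] = func
        return func
    return register


@parser_for("tagged-db")
def parse_tagged(body):
    # One 'TAG value' pair per line.
    return dict(line.split(" ", 1) for line in body.splitlines() if " " in line)


@parser_for("json-news")
def parse_json(body):
    return json.loads(body)


def load_inbox(messages, repository):
    """Parse each (source, body) message and append the result to the repository.

    Messages from sources without a registered parser are returned for
    manual review rather than dropped.
    """
    unparsed = []
    for source, body in messages:
        parser = PARSERS.get(source)
        if parser is None:
            unparsed.append((source, body))
            continue
        repository.append({"source": source, "fields": parser(body)})
    return unparsed


repository = []
inbox = [
    ("tagged-db", "UI 92158846\nAU Marks DR. Rapoport AM"),
    ("json-news", '{"title": "Example headline"}'),
    ("unknown-feed", "free text"),
]
leftover = load_inbox(inbox, repository)
```

Returning unparsed messages keeps a person in the loop for sources that have no parser yet.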
Metadata
Source System Metadata
Source system - defined set of metadata to which the vendor tags would map
To define core fields, looked at range of fields provided by all the databases of interest from the different vendors
Used Dublin Core fields where applicable
Abstract
Author
Classification
Date Granted
Device Manufacturer
Device Trade Name
Diseases
Edition Subset
External Reference ID
Gene Info
Keywords
Language
Literature Title
Literature Type
Location
Methods & Equipment
Miscellaneous
Molecular Sequence
Molecular Source Number
Open URL issn
Open URL issue
Organism Info
Patent Assignee
Patent Class
Patent Country
Patent Number
Personal Name as subject
Publish Date
Publisher
Sequence Data
Space Flight Mission
Subject Heading
Title
TOC Categories
Metadata Mapping
Minimum set of fields / database standpoint
Exclude fields not used for search or retrieval (ex: Item URL, Locally Held, Local
Messages, Record Owner, Update Code, Notes, Order Number, Price, Abbreviated
Source, Reprint Address, etc.)
Manual process by subject matter experts (information scientists)
Database Name | Database Tag Name | Target Metadata Field Name
Biosis Previews | Concept Code | Classification
Derwent World Patents Index | Title Index Terms and Additional Words | Keywords
Derwent World Patents Index | Derwent Accession Number | External Reference ID
SciSearch | Cited Work | Cited Reference
CAB Abstracts | Organism Descriptors | Organism Info
Medline | Country of Publication | Location
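The mapping rows above can be expressed as a lookup table. The sketch below is illustrative only: the function name and return shape are assumptions, and the "Mesh Date" tag in the example is a made-up unmapped field used to show what happens when no mapping exists.

```python
# (database name, vendor tag name) -> target metadata field, taken from the
# example rows on the slide.
FIELD_MAP = {
    ("Biosis Previews", "Concept Code"): "Classification",
    ("Derwent World Patents Index", "Title Index Terms and Additional Words"): "Keywords",
    ("Derwent World Patents Index", "Derwent Accession Number"): "External Reference ID",
    ("SciSearch", "Cited Work"): "Cited Reference",
    ("CAB Abstracts", "Organism Descriptors"): "Organism Info",
    ("Medline", "Country of Publication"): "Location",
}

# A few of the fields the slide lists as excluded from search and retrieval.
EXCLUDED_TAGS = {"Item URL", "Locally Held", "Price", "Order Number"}


def map_record(database, vendor_record):
    """Translate one vendor record into target metadata fields.

    Returns (mapped, unmapped): excluded tags are dropped silently, while tags
    with no mapping are reported so a subject matter expert can review them.
    """
    mapped, unmapped = {}, []
    for tag, value in vendor_record.items():
        if tag in EXCLUDED_TAGS:
            continue
        target = FIELD_MAP.get((database, tag))
        if target is None:
            unmapped.append(tag)
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# "Mesh Date" is a hypothetical tag with no mapping, included for illustration.
mapped, unmapped = map_record(
    "Medline",
    {"Country of Publication": "United States", "Price": "n/a", "Mesh Date": "1992"},
)
```

Keying on the (database, tag) pair matters because the same tag name can mean different things in different vendor files.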
Metadata - Content Objects
Content objects defined to differentiate content types on the backend
Contained unique metadata as well as overlapping metadata
Choices for end-user interface
Content Objects:
Metadata - Controlled Vocabulary
Controlled terms enhance search and retrieval capability
Terms are selected by user (information scientist) for tagging content items
Use preferred term, then list of synonyms
Standard terminology lists as pick lists (ex: Therapeutic area, disease)
Authoritative sources were used to determine appropriate values
Internal vocabularies
National Library of Medicine Medical Subject Headings (MeSH)
Medical Dictionary for Regulatory Activities (MedDRA)
[Figure: diagram labels: Authoritative Classifications; MeSH; MedDRA; Internal; Metathesaurus; Repository; example terms: Cyclohexatriene, Benzene, Benzene]
Figure source: DATAFUSION, Inc copyright 1999
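The "preferred term, then list of synonyms" idea can be sketched as a small lookup class. Which term is treated as preferred below is an arbitrary choice for illustration and does not reflect how MeSH, MedDRA, or the internal vocabularies actually rank these terms.

```python
class ControlledVocabulary:
    """Maps synonyms (case-insensitively) to a single preferred term."""

    def __init__(self):
        self._preferred = {}

    def add_term(self, preferred, synonyms=()):
        """Register a preferred term and any synonyms that should resolve to it."""
        for term in (preferred, *synonyms):
            self._preferred[term.lower()] = preferred

    def normalize(self, term):
        """Return the preferred term, or None if the term is not in the list."""
        return self._preferred.get(term.strip().lower())


vocab = ControlledVocabulary()
# The choice of preferred term here is an assumption made for the example.
vocab.add_term("Benzene", synonyms=["Cyclohexatriene"])
vocab.add_term("Neoplasm", synonyms=["Cancer"])

# Three differently worded tags collapse to two preferred terms.
tags = {vocab.normalize(t) for t in ["cancer", "Neoplasm", "cyclohexatriene"]}
```

Tagging with the preferred term means a search for "neoplasm" also retrieves items a user would have described as "cancer".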
Users
Users
Information Scientists
Set up alert strategies in vendor databases as well as the source system repository
Involved in interactive sessions with the tool to discuss content needs and find bugs in the system
End-Users
Used the portal, which displayed content
Involved early on in the initial requirements gathering, then engaged by the information scientists to test the tool
Project Limitations
Project Limitations – Source System
For every new vendor file/database that needed to be added to the system, a manual
mapping from the vendor database fields to the target metadata had to be performed
Repository interface was cumbersome
Setting up a strategy was quite time-consuming as there was no auto-population
of data
Opening new windows within the system was quite slow
New version of source system arrived mid-project
An approver role was required to allow an alert strategy to be set up
System did not provide robust Boolean searching at the time
Only had one expert on the source system
Project Limitations - Organizational
Key reasons why projects fail:
Inattentiveness to organizational change
Sponsorship is lost or changes
Lack of budget/resources
Other Factors
Project team leaders and members changed several times throughout
life of project
Other applications identified to integrate into the solution were also
“new” or in development
IT resources not well supported
NO-GO decision was made near production
Lessons Learned
For a multi-year project:
Manage change
– Knowledge transfer
– Sustain momentum
Sustain business sponsorship
Plan the budget carefully
Involve influencing parties (vendors/publishers) early
Current awareness system:
Portal concept well-supported by end-users
– Flexibility on their part to manage alerts
– Integrated several different content types
Common workflow supported by information scientists
Summary
Knowledge Management is a continuous challenge
A need still exists for a global current awareness system
Follow-up plans
Currently evaluating commercially available products
Internal efforts to filter, consolidate, and analyze content for
customers
Acknowledgements
Ajit Acharya
Amy Tellez-Karsten
Andrew Horgan
Angela Liu
Angelika Wendler-Awasthi
Ann Young
Barb Miller
Barbara Breen
Beverly Kucharski
Bill Gillick
Bob Berger
Bryon Tilley
Cara Evans
Chandra Aitha
Chris Duhl, West Pole
Christina Carr
Christina Keil
Christine Ng
Claire Hogikyan
Clare Challenger
Cleazoe Malek
Dan Cooney, West Pole
David Walsh
Ed Pelic
Elaine Logan
Emory Emrich
Fradwin Marmol
Francis Di Bella
Getu Diro
Hennie Oswald
Ian Parsons
Iradj Reza
Jan Carr
Janet Smith
Jill Maddox
Julie Grannis
Karen Erani
Karl Royer
Kathy Cornish
Kathy VanLeeuwen
Ken Drake
Kevin Ogborne
Kim Johnson
Kirsten Kliwinski
Leah Sandvoss
Maheshkar Porandla
Mark Mitchell
Mary Skousen
Michele Wang
Michele Wolfe
Murali Nandula
Nathaniel Dunford
Nicola Cooper
Pam Kubiak
Pat Burke
Penny Miller
Peter Dresslar, Metamatics
Pragati Mithal
Raj Dandamudi
Ravneesh Sachdev
Rich Steel
Richard Nicholas
Rob Exposito
Rob Purdue
Robert Linde
Shuntai Wang
Simona Hendl
Srilekha Komma, Keane, Inc
Susan Suchetta
Suzan Quick, West Pole
Thomas Knowles
Veronica Trimble
Vishal Kumar
Thanks
Backup Slides – Requirements from VAP team
Developed Requirements. Key documents included:
A proposed “alert service model” which included questions regarding alert gadget data entry,
working with clients and ROI metrics.
A list of roles and responsibilities of stakeholders involved in “Global alerting process”,
including IM Colleagues and Pfizer Colleagues.
A detailed description of the requirements needed for a future alerting system. It includes
requirements for processing and managing alerts, archiving/distribution/retention issues, and
delivery and service to clients.
A list of the various types of alerts currently used within IM and the location at
which each is run.
A process model that describes how an end-user might look for/subscribe to alerts. Also
included is a process that would be used by the IM colleague when setting up alerts.
A “client need” summary, provided from the IM perspective
Backup - Dublin Core Metadata Elements
Contributor
Coverage
Creator
Date
Description
Format
Resource Identifier
Language
Publisher
Relation
Rights Management
Source
Subject and Keywords
Title
Resource Type
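For reference, the labels above correspond to the fifteen element names of the Dublin Core element set. The sketch below maps each label to its element name and serializes a small record as XML. The `record` wrapper element and the sample values are assumptions for illustration; the slides do not show how the project stored Dublin Core fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Slide label -> Dublin Core element name.
DC_ELEMENTS = {
    "Contributor": "contributor",
    "Coverage": "coverage",
    "Creator": "creator",
    "Date": "date",
    "Description": "description",
    "Format": "format",
    "Resource Identifier": "identifier",
    "Language": "language",
    "Publisher": "publisher",
    "Relation": "relation",
    "Rights Management": "rights",
    "Source": "source",
    "Subject and Keywords": "subject",
    "Title": "title",
    "Resource Type": "type",
}


def to_dublin_core(fields):
    """Build an XML element holding one dc:* child per supplied label."""
    root = ET.Element("record")  # wrapper element name is an arbitrary choice
    for label, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{DC_ELEMENTS[label]}")
        child.text = value
    return root


record = to_dublin_core({
    "Title": "Cluster headache syndrome",
    "Creator": "Marks DR",
    "Resource Identifier": "92158846",
})
xml_text = ET.tostring(record, encoding="unicode")
```

An unknown label raises `KeyError`, which is a deliberate choice here so that unmapped fields are noticed rather than silently skipped.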