Challenges of Developing a Global Alerting System
American Chemical Society National Meeting - Chicago
Symposium Honoring Gary Wiggins
March 25th, 2007
Leah Sandvoss
Information Scientist
Honors
• Chemical Informatics program initiation
• Mentorship
• Independent study on Chemical Information
Outline
• Definitions
• Overview of Global Alerting System
• Business Analysis – pre-project
• Information Retrieval
• Metadata
• Users
• Project limitations
• Lessons Learned
Definitions
• Alerts / Selective Dissemination of Information (SDI) / Current Awareness – a stored search strategy that is run periodically against a database to return newly added results to the end-user
• Information Retrieval – the systematic storage and recovery of data, as from a file or database
• Knowledge Management – a range of practices used by organizations to identify, create, represent, and distribute knowledge for reuse, awareness, and learning across the organization
• Business Analysis – the act of gathering business issues and needs and translating them into a form that can be given to the appropriate people to develop solutions
• Metadata – data about data
  • “Zip Code” is the metadata for the piece of data “92121”
  • “Abstract” is the metadata for the actual abstract text
• Unstructured data – data whose structure is not readily machine-readable
• Controlled vocabulary – a carefully selected list of words and phrases used to tag units of information so that they are more easily retrieved by a search
  • Example: cancer vs. neoplasm
Sources: Chicago Manual Style (CMS): information retrieval. Dictionary.com. Dictionary.com Unabridged (v 1.1). Random House, Inc. http://dictionary.reference.com/browse/information retrieval (accessed: February 21, 2007). “Consulting Skills for Business Analysis” course by Watermark Learning.
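As a rough illustration of the alert/SDI definition above, the following minimal Python sketch stores a search strategy, runs it against a database, and returns only the records that were not delivered on an earlier run. The `Alert` class, the record fields (`id`, `title`), and the simple substring match are assumptions made for illustration; they are not taken from the system described in this talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Alert:
    """A stored search strategy plus the record IDs already delivered."""
    name: str
    terms: list[str]
    seen_ids: set[str] = field(default_factory=set)

    def run(self, database: list[dict]) -> list[dict]:
        """Return matching records that have not been delivered before."""
        hits = [
            rec for rec in database
            if rec["id"] not in self.seen_ids
            and any(term.lower() in rec["title"].lower() for term in self.terms)
        ]
        # Remember what was delivered so the next run returns only new items.
        self.seen_ids.update(rec["id"] for rec in hits)
        return hits


# Hypothetical database contents, for illustration only.
database = [{"id": "1", "title": "Cluster headache syndrome"}]
alert = Alert(name="headache", terms=["headache"])

first_run = alert.run(database)       # delivers record 1
database.append({"id": "2", "title": "Migraine and headache review"})
database.append({"id": "3", "title": "Benzene toxicity"})
second_run = alert.run(database)      # delivers only the newly added match
```

Running the same alert a third time without adding records returns an empty list, which is the "newly added results only" behavior in the definition.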
Knowledge Management
• Each company in the healthcare and pharmaceutical sector has spent an average of US$274,000 per annum on knowledge management over the past three years (ref 2001)
Source: Dyer, G. and McDonough, B. (2001) Vertical targets for knowledge management vendors. International Data Corporation. Document No. 25535
State of Biomedical Literature Mining
Source: Jensen, L.J., Saric, J., Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics, vol. 7, February 2006
Current Awareness System
• Common platform to deliver many types of information, providing a common process for inserting the information
• Compiles results from multiple information retrieval systems
• Allows for the collection, review, analysis, and summarization of information types
• Export capabilities
• Uses controlled vocabularies
• Provides structured, actionable information
[Diagram. Labels: News; Patents; Books; Literature; Internal; TOCs; Key Op Leaders; Integration layer; Portal]
Business Analysis
• A team was formed in 2002 to look at key information products available to the end-user, coined “value-added products”
• First focus was on the products providing alerts/SDIs
• Used an online alert survey from an existing internal system to identify user needs. Approximately 230 total respondents.
• Held discussions among team members about workflow based on their experience with customer needs
• Conducted a per-month cost comparison of various alerting services
• Several products provided overlapping information, resulting in duplication of effort among the information scientists
• Recommendations were made for a future system and workflow for managing alerts
Business Analysis
• In late 2004, a project team was formed to develop a tool to manage alerts as well as search results
• First goal was to provide a repository to manage content. The tool would:
  • Allow information scientists to contribute, manage, and disseminate content
  • Replace existing current awareness products
  • Provide automation where possible to save time on the part of information scientists
• Second goal was to develop a tool to display content to the end-user in one interface with a common format
• An environmental scan was performed, but the decision was made to develop the product in-house
  • Facilitate incorporating internal content
• Focus of this talk is on the repository development
Information Retrieval
Information Retrieval – Licensing Issues
• Needed to determine what was “in scope” for existing contracts
• Copyright restriction questions
  • Are vendor-produced abstracts subject to copyright restrictions?
  • Does it violate copyright to redistribute results?
  • Can the full text of an article be used and classified?
  • Can a screen scrape be performed on an HTML page?
  • Can complete citation(s) be stored? If not, what fields can be stored (i.e., a unique identifier so that the user can get back to the complete citation)?
  • For stored content, is there an expiration date?
• Results options
  • Format – XML, plain text, HTML, etc.
  • Transfer type – sFTP, e-mail, HTML view
Unstructured or Semi-structured Data – BRS/Tagged
UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off attacks. [Review]
AU Marks DR. Rapoport AM
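A tagged record like the one above can be turned into field/value pairs with a few lines of code. The sketch below is only an illustration: it assumes each field starts with a two-letter uppercase tag followed by whitespace, and that any other non-empty line continues the previous field. Those rules are assumptions about the layout, not a vendor specification, and the function name `parse_tagged_record` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed layout: two uppercase letters, whitespace, then the field value.
TAG_LINE = re.compile(r"^([A-Z]{2})\s+(.*)$")


def parse_tagged_record(text: str) -> dict[str, str]:
    """Split a tagged record into a {tag: value} dictionary."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            current, value = match.groups()
            fields[current] = value.strip()
        elif current and line.strip():
            # Assumed continuation line: append it to the most recent field.
            fields[current] += " " + line.strip()
    return fields


record = """UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off attacks. [Review]
AU Marks DR. Rapoport AM"""

parsed = parse_tagged_record(record)
```

Here `parsed["UI"]` holds the unique identifier, `parsed["TI"]` the title, and `parsed["AU"]` the author string, ready to be mapped to target metadata fields.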
Unstructured or Semi-structured data - HTML
Unstructured or Semi-structured data - TOC
Information Retrieval – Process
[Flow diagram. Labels: Supported Commercial Databases; Supported News Sources; Other Sources; Rules applied for strategy setup and delivery; E-mail inbox; Parsers applied; Repository; Information Scientists]
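One possible reading of the inbox, parser, and repository steps in the process diagram is sketched below. It is a hypothetical illustration: the message layout, the `vendor_a` source name, the "Field: value" delivery format, and the idea of setting aside unparsed messages for information scientists to review are all assumptions, not details from the talk.

```python
from __future__ import annotations

from typing import Callable

Parser = Callable[[str], dict]


def parse_key_value(body: str) -> dict:
    """Parse 'Field: value' lines into a dict (hypothetical delivery format)."""
    item = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            item[key.strip()] = value.strip()
    return item


# One parser per supported source; the source name is made up.
PARSERS: dict[str, Parser] = {"vendor_a": parse_key_value}


def process_inbox(messages: list[dict], parsers: dict[str, Parser]):
    """Apply the matching parser to each message and collect the results."""
    repository, needs_review = [], []
    for msg in messages:
        parser = parsers.get(msg["source"])
        if parser is None:
            # No parser for this source: set aside for manual handling.
            needs_review.append(msg)
            continue
        item = parser(msg["body"])
        item["source"] = msg["source"]
        repository.append(item)
    return repository, needs_review


messages = [
    {"source": "vendor_a", "body": "Title: Cluster headache syndrome\nAuthor: Marks DR"},
    {"source": "unknown_feed", "body": "<html>...</html>"},
]
repository, needs_review = process_inbox(messages, PARSERS)
```

Keeping the parsers in a lookup table means that supporting a new source is a matter of registering one more function, which mirrors the per-source parser step in the diagram.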
Metadata
Source System Metadata
• Source system – defined set of metadata to which the vendor tags would map
• To define core fields, looked at range of fields provided by all the databases of interest from the different vendors
• Used Dublin Core fields where applicable
Core fields: Abstract, Author, Classification, Date Granted, Device Manufacturer, Device Trade Name, Diseases, Edition Subset, External Reference ID, Gene Info, Keywords, Language, Literature Title, Literature Type, Location, Methods & Equipment, Miscellaneous, Molecular Sequence, Molecular Source Number, Open URL issn, Open URL issue, Organism Info, Patent Assignee, Patent Class, Patent Country, Patent Number, Personal Name as subject, Publish Date, Publisher, Sequence Data, Space Flight Mission, Subject Heading, Title, TOC Categories
Metadata Mapping
• Minimum set of fields / database standpoint
• Exclude fields not used for search or retrieval (ex: Item URL, Locally Held, Local Messages, Record Owner, Update Code, Notes, Order Number, Price, Abbreviated Source, Reprint Address, etc.)
• Manual process by subject matter experts (information scientists)

Database Name | Database Tag Name | Target Metadata Field Name
Biosis Previews | Concept Code | Classification
Derwent World Patents Index | Title Index Terms and Additional Words | Keywords
Derwent World Patents Index | Derwent Accession Number | External Reference ID
SciSearch | Cited Work | Cited Reference
CAB Abstracts | Organism Descriptors | Organism Info
Medline | Country of Publication | Location
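The mapping rows above lend themselves to a simple lookup table. The following sketch is an assumption about how such a mapping could be applied in code (the talk describes the mapping itself as a manual process by subject matter experts); the function name `map_record` and the sample record values are hypothetical. Vendor tags with no mapping, such as Price, are dropped, in line with the exclusion rule on this slide.

```python
from __future__ import annotations

# (database name, database tag name) -> target metadata field name,
# taken from the mapping rows on this slide.
FIELD_MAP = {
    ("Biosis Previews", "Concept Code"): "Classification",
    ("Derwent World Patents Index", "Title Index Terms and Additional Words"): "Keywords",
    ("Derwent World Patents Index", "Derwent Accession Number"): "External Reference ID",
    ("SciSearch", "Cited Work"): "Cited Reference",
    ("CAB Abstracts", "Organism Descriptors"): "Organism Info",
    ("Medline", "Country of Publication"): "Location",
}


def map_record(database: str, record: dict[str, str]) -> dict[str, str]:
    """Rename vendor tags to target metadata fields, dropping unmapped tags."""
    mapped = {}
    for tag, value in record.items():
        target = FIELD_MAP.get((database, tag))
        if target is not None:
            mapped[target] = value
    return mapped


# Hypothetical vendor record; the values are made up for illustration.
example = map_record(
    "Medline",
    {"Country of Publication": "United States", "Price": "n/a"},
)
```

Keying the table on both database and tag name allows the same tag name to map differently for different vendors.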
Metadata - Content Objects
• Content objects defined to differentiate content types on the backend
  • Contained unique metadata as well as overlapping metadata
• Choices for end-user interface
• Content Objects:
Metadata - Controlled Vocabulary
• Controlled terms enhance search and retrieval capability
  • Terms are selected by user (information scientist) for tagging content items
  • Use preferred term, then list of synonyms
  • Standard terminology lists as pick lists (ex: Therapeutic area, disease)
• Authoritative sources were used to determine appropriate values
  • Internal vocabularies
  • National Library of Medicine Medical Subject Headings (MeSH)
  • Medical Dictionary for Regulatory Activities (MedDRA)
[Figure. Labels: Authoritative Classifications; MeSH; Cyclohexatriene; Metathesaurus; MedDRA; Benzene; Internal; Benzene; Repository]
Figure source: DATAFUSION, Inc copyright 1999
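The "preferred term, then list of synonyms" idea can be sketched as a small lookup that normalizes free-text terms to a preferred tag. This is an illustrative assumption, not the system's implementation: the vocabulary entries reuse the benzene/cyclohexatriene and cancer/neoplasm examples from the slides, and the choice of which term is preferred is made up for the example.

```python
from __future__ import annotations

# Preferred term -> synonyms. Which term is "preferred" is an assumption here.
VOCABULARY = {
    "Benzene": ["cyclohexatriene"],
    "Neoplasm": ["cancer"],
}


def build_lookup(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def tag_terms(free_terms: list[str], lookup: dict[str, str]) -> list[str]:
    """Return the preferred terms for the given free-text terms, without repeats."""
    tags = []
    for term in free_terms:
        preferred = lookup.get(term.strip().lower())
        if preferred is not None and preferred not in tags:
            tags.append(preferred)
    return tags


LOOKUP = build_lookup(VOCABULARY)
tags = tag_terms(["Cancer", "neoplasm", "cyclohexatriene", "aspirin"], LOOKUP)
```

Terms that are not in the vocabulary (here, "aspirin") produce no tag, so a search on the preferred term retrieves items regardless of which synonym the source used.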
Users
• Information Scientists
  • Set up alert strategies in vendor databases as well as the source system repository
  • Involved in interactive sessions with the tool to discuss content needs and find bugs in the system
• End-users
  • Used the portal which displayed content
  • Involved early on in the initial requirements gathering, then engaged by the information scientists to test the tool
Project Limitations
Project Limitations – Source System
• For every new vendor file/database that needed to be added to the system, a manual mapping from the vendor database fields to the target metadata had to be performed
• Repository interface was cumbersome
  • Setting up a strategy was quite time-consuming as there was no auto-population of data
  • Opening new windows within the system was quite slow
• New version of source system arrived mid-project
• An approver role was required to allow an alert strategy to be set up
• System did not provide robust Boolean searching at the time
• Only had one expert on the source system
Project Limitations - Organizational
• Key reasons why projects fail:
  • Inattentiveness to organizational change
  • Sponsorship is lost or changes
  • Lack of budget/resources
Other Factors
• Project team leaders and members changed several times throughout life of project
• Other applications identified to integrate into the solution were also “new” or in development
• IT resources not well supported
NO-GO decision was made near production
Lessons Learned
• For a multi-year project:
  • Manage change
    – Knowledge transfer
    – Sustain momentum
  • Sustain business sponsorship
  • Plan the budget carefully
  • Involve influencing parties (vendors/publishers) early
• Current awareness system:
  • Portal concept well-supported by end-users
    – Flexibility on their part to manage alerts
    – Integrated several different content types
  • Common workflow supported by information scientists
Summary
• Knowledge Management is a continuous challenge
• A need still exists for a global current awareness system
Follow-up plans
• Currently evaluating commercially available products
• Internal efforts to filter, consolidate, and analyze content for customers
Acknowledgements
Ajit Acharya
Amy Tellez-Karsten
Andrew Horgan
Angela Liu
Angelika Wendler-Awasthi
Ann Young
Barb Miller
Barbara Breen
Beverly Kucharski
Bill Gillick
Bob Berger
Bryon Tilley
Cara Evans
Chandra Aitha
Chris Duhl, West Pole
Christina Carr
Christina Keil
Christine Ng
Claire Hogikyan
Clare Challenger
Cleazoe Malek
Dan Cooney, West Pole
David Walsh
Ed Pelic
Elaine Logan
Emory Emrich
Fradwin Marmol
Francis Di Bella
Getu Diro
Hennie Oswald
Ian Parsons
Iradj Reza
Jan Carr
Janet Smith
Jill Maddox
Julie Grannis
Karen Erani
Karl Royer
Kathy Cornish
Kathy VanLeeuwen
Ken Drake
Kevin Ogborne
Kim Johnson
Kirsten Kliwinski
Leah Sandvoss
Maheshkar Porandla
Mark Mitchell
Mary Skousen
Michele Wang
Michele Wolfe
Murali Nandula
Nathaniel Dunford
Nicola Cooper
Pam Kubiak
Pat Burke
Penny Miller
Peter Dresslar, Metamatics
Pragati Mithal
Raj Dandamudi
Ravneesh Sachdev
Rich Steel
Richard Nicholas
Rob Exposito
Rob Purdue
Robert Linde
Shuntai Wang
Simona Hendl
Srilekha Komma, Keane, Inc
Susan Suchetta
Suzan Quick, West Pole
Thomas Knowles
Veronica Trimble
Vishal Kumar
Thanks
Backup Slides – Requirements from VAP team
• Developed requirements. Key documents included:
  • A proposed “alert service model”, which included questions regarding alert gadget data entry, working with clients, and ROI metrics
  • A list of roles and responsibilities of stakeholders involved in the “Global alerting process”, including IM Colleagues and Pfizer Colleagues
  • A detailed description of the requirements for a future alerting system, including requirements for processing and managing alerts, archiving/distribution/retention issues, and delivery and service to clients
  • A list of the various types of alerts currently used within IM and the location at which each is run
  • A process model describing how an end-user might look for or subscribe to alerts, together with the process an IM colleague would use when setting up alerts
  • A “client need” summary, provided from the IM perspective
Backup - Dublin Core Metadata Elements
• Contributor
• Coverage
• Creator
• Date
• Description
• Format
• Resource Identifier
• Language
• Publisher
• Relation
• Rights Management
• Source
• Subject and Keywords
• Title
• Resource Type