Veteris Technologies Solution Validation of Researcher


IBM Content Analytics Helping to Drive Down Cost in Pharmaceutical R&D
Session Number 1413A
John Kamins, CEO
Veteris Technologies
Keyur Dalal, Solution Architect
IBM jStart Emerging Technologies
Topics to Address
• Today’s Life Sciences Industry Pain Points
• Veteris Technologies Addressing Industry Needs
• Why IBM Technology/Expertise?
• Veteris Approach for Success
– Research Profile Automation Proof of Concept (POC)
• IBM jStart Methodology
• Veteris POC Goals
• Solution Architecture
• POC Measurement Goals and Results
• POC Lessons Learned
• Veteris Next Steps
Life Science Industry Need
Veteris Technologies was founded to address significant unmet needs in the Life Science
industry and others for data-driven decision-making through real-time access to business
and technical knowledge.
Pharma Research Pain Points
• Increasing cost of drug development, rising to over $1B per drug
• Heightened competition from new market entrants
• Lapsing patents
• Need to contain research costs and consolidate vendors
• Limited understanding on the part of procurement as to how consumables and technologies from suppliers can best be bundled and/or consolidated relative to product portfolios and researcher preference
• Need for better business intelligence tools that leverage the growing abundance of unstructured web-based data around Life Science tools and research trends
Life Science “tools” Supplier Pain Points
• Vendor sales and marketing tools and channels are becoming stale, ineffective, and costly
• Cost-savings initiatives within pharma are driving vendor consolidation
• NIH budget cuts are reducing research spend in government and academic research institutions
• Limited understanding on the part of vendors as to how their product portfolios can best be bundled for competitive advantage and halo effect
• Need for better business intelligence tools that leverage the growing abundance of unstructured web-based data around Life Science tools and research trends
Addressing Industry Pain Points
Veteris’ web-based software harnesses unstructured market data and deploys it in predictive
tools that help reveal relationships between the people, companies, and technologies that use,
buy, and sell Life Science research products, allowing for better purchasing decisions.
Pharma, Biotech, Academic & Government Procurement
Researchers
Interface between groups is HARD due to:
• Constantly growing and changing supplier technology portfolios
• Organizational changes in pharmaceutical research
Supplier Sales & Marketing
Veteris Technologies Solution
Predictive Software for Vendor Consolidation
Overlap Analysis of Researcher Profile with Vendor Product Profile
[Diagram: the overlap of a Researcher Profile with a Consumable/Technology Product Profile identifies Optimal Product Sourcing]
Veteris Technologies Solution
Predictive Software for Vendor Consolidation
Overlap Analysis of Researcher Profile with Vendor Product Portfolios
[Diagram: 1,000s of researcher workflow profiles are overlapped with 1,000s of research products from 1,000s of vendors (Vendor A, Vendor B, Vendor C) to identify optimal vendor sourcing]
Predict and leverage volume through the vendors with the best product match to researcher needs
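One way to picture the overlap analysis is as set similarity between a researcher's profile terms and each vendor's product terms. The sketch below uses Jaccard similarity, which is an assumption (the slides do not say how Veteris scores overlap); the vendor names and terms are illustrative only.

```python
# Hypothetical sketch: score each vendor's product portfolio against a
# researcher's profile terms. Jaccard similarity is an assumed stand-in;
# the slides do not describe Veteris' actual scoring method.


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of all terms that appear in both sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_vendors(
    researcher_terms: set[str], vendor_portfolios: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Return (vendor, score) pairs ordered from best to worst product match."""
    scores = [
        (vendor, jaccard(researcher_terms, terms))
        for vendor, terms in vendor_portfolios.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative terms only, not Veteris data.
    researcher = {"biosensor", "polyclonal antibody", "gel electrophoresis"}
    vendors = {
        "Vendor A": {"polyclonal antibody", "gel electrophoresis", "pcr"},
        "Vendor B": {"microchip", "biosensor"},
        "Vendor C": {"cell culture media"},
    }
    for vendor, score in rank_vendors(researcher, vendors):
        print(f"{vendor}: {score:.2f}")
```

Run as a script, it prints `Vendor A: 0.50`, `Vendor B: 0.25`, `Vendor C: 0.00`. Comparing every researcher with every vendor grows with the product of the two counts, so at the 1,000s-by-1,000s scale on the slide an inverted index from term to vendors would be a natural refinement.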
Veteris Technologies Solution
Maintain Researcher Profiles
Key Profile Terms
• Discipline
• Profile Type
• Job Role
• Science Area
• Applications
• Techniques
• Stage
• Therapeutic Area
• Target Class
• Drug Class
• Bioprocessing
• Species
• Cell Type
• Domain Specific Descriptors
Terms are organized into Tier 1, Tier 2, and Tier 3.
• Translational/Clinical Research
• Drug Discovery
• Drug Development
• Diagnostics
• Public Health
• Testing
• Forensics/Human ID
• Manufacturing, Production & Process Development
• QA/QC
Veteris Technologies Solution
Automated Discovery & Validation of Researcher Profiles
Stages: Discovery → Verification → Validation
• 10s of papers = Basic Profile
• 100s of papers = Enhanced Profile, Basic I.D. Validation
• 1,000s of papers from multiple sources = Advanced Profile, High-Level I.D. Validation (fully profiled and validated I.D.)
Example: Dr. Jane Researcher
• Biosensor, Polyclonal antibody, Molecular biology, …
• Respiratory disease, Microchip, Gel electrophoresis, …
• Veterinary medicine, Pathobiology, Infectious disease, …
• Auburn University, Department of Sciences, Tel: (333) 222-1111, [email protected]
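The slide ties profile depth to how much literature backs it. The function below is a minimal sketch of that mapping; the cut-offs (10, 100, and 1,000 papers, plus two or more sources) are read off the slide's "10s / 100s / 1,000s" labels and are assumptions, not Veteris' actual thresholds.

```python
# Hypothetical cut-offs read off the slide's "10s / 100s / 1,000s of papers"
# labels; Veteris' real thresholds are not given in the deck.
BASIC_MIN_PAPERS = 10
ENHANCED_MIN_PAPERS = 100
ADVANCED_MIN_PAPERS = 1000
ADVANCED_MIN_SOURCES = 2  # "multiple sources"


def profile_level(paper_count: int, source_count: int) -> str:
    """Map the amount of supporting literature to a profile/validation level."""
    if paper_count >= ADVANCED_MIN_PAPERS and source_count >= ADVANCED_MIN_SOURCES:
        return "Advanced Profile, High-Level I.D. Validation"
    if paper_count >= ENHANCED_MIN_PAPERS:
        return "Enhanced Profile, Basic I.D. Validation"
    if paper_count >= BASIC_MIN_PAPERS:
        return "Basic Profile"
    return "Insufficient data"  # hypothetical label for fewer than 10 papers
```

For example, `profile_level(3000, 1)` stays at the enhanced level, because in this sketch the advanced level also requires multiple sources.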
Veteris Technologies Solution
Validation of Researcher Profiles: Multi-Source, Iterative Process
• PubMed: profile data from multiple papers
• Institution/Lab Websites: profile data & multiple journal references
• Social Media: profile data from public profiles
• Client Documents: profile data from e-mails, tech notes, etc.
Validate I.D. through overlap of profile data from multiple sources
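The multi-source validation idea can be sketched as voting: a field value is treated as confirmed only when enough independent sources report it. The function name, the lower-casing, and the two-source threshold are hypothetical choices for illustration, not the POC's documented logic.

```python
from collections import defaultdict


def corroborated_fields(
    source_records: dict[str, dict[str, str]], min_sources: int = 2
) -> dict[str, str]:
    """Return field values (lower-cased) that at least `min_sources` sources agree on.

    If several values for one field qualify, the value with the most
    supporting sources wins.
    """
    # (field, normalized value) -> set of sources reporting it
    votes: dict[tuple[str, str], set[str]] = defaultdict(set)
    for source, record in source_records.items():
        for field_name, value in record.items():
            votes[(field_name, value.strip().lower())].add(source)

    best: dict[str, tuple[str, int]] = {}
    for (field_name, value), sources in votes.items():
        support = len(sources)
        if support >= min_sources and support > best.get(field_name, ("", 0))[1]:
            best[field_name] = (value, support)
    return {field_name: value for field_name, (value, _) in best.items()}


if __name__ == "__main__":
    # Illustrative records only.
    records = {
        "PubMed": {"name": "Jane Researcher", "institution": "Auburn University"},
        "Lab website": {
            "name": "Jane Researcher",
            "institution": "Auburn University",
            "phone": "(333) 222-1111",
        },
        "Social media": {"name": "Jane Researcher", "institution": "Other University"},
    }
    print(corroborated_fields(records))
```

Here the name and the Auburn affiliation are confirmed, while the phone number (one source) and the conflicting affiliation (one source) are not.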
Veteris Technologies Solution
IBM Content Analytics Technology
• PubMed: profile data from multiple papers
• Institution/Lab Websites: profile data & multiple journal references
• Social Media: profile data from public profiles
• Client Documents: profile data from e-mails, tech notes, etc.
Content Analytics: validate I.D. through overlap of profile data from multiple sources
Veteris Technologies Solution
Why IBM?
• IBM Has a 50+ Year History in Text Analysis and Discovery
• World-Class Natural Language Processing (NLP) Technology
• Leading-Edge NLP-Based Solutions (Watson)
• IBM jStart Text Analytics Expertise
Research Profile Automation POC
IBM jStart Engagement Process
Iterative Development, Continuous Testing
Constant feedback on Business & Technology
Solution Drivers & Boundaries
• Clear understanding of business problem to be solved
• Business and technical management commitment
• Funding in place
• Right skills identified and committed to project
• Decision-making context
Requirements & Solution Scope
• Solution definition
• Small team
• Define scope
• Map business needs and technology
• Deliverables
• Use cases
• Preliminary design
• Tentative schedule
• Initial sizing
Detailed Design
• Detailed schedule
• Finalize scope
• Final technology selections
• Deliverables
• Design documents
• Project schedule
• Funding approved for pilot implementation
Iterative Development
• Early prototyping
• Regular code drops
• Testing throughout cycle
• Constant feedback from users
• Modifications via change request
Deployment & Skills Transfer
• Solution deployment
• Customer self-sufficiency
• Reusable assets
• Other business areas or technology
Research Profile Automation POC
Purpose, Goals, and Structure
• Purpose
– Automate the creation and maintenance of researcher profiles from unstructured public data
sources, such as PubMed and university department pages.
• Key Goals
– Track accuracy, completeness, and currency of crawled data for 1600 researchers
– Discover additional researchers not currently tracked by Veteris
– Document strategy for expanding the list of data sources
– Evaluate feasibility of solution for millions of researchers
• Structure
– The pilot will be created and deployed at IBM, where it can be demonstrated to Veteris; as
such, it will not be accessible outside the IBM network
– The project is structured in two phases, designed to give Veteris an early understanding
of the value and usage of the technology and to reduce risk
Research Profile Automation POC
Researcher Profile Fields
• A researcher’s profile is made up of three types of data:
– Key data, which comprises the name
– Contact fields, such as
• Address
• Email
• Phone
– Profile fields, such as
• Profile type
• Tier 1/2/3
• Scientific Area
• Discipline
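The three data types map naturally onto a simple record type. This is a hypothetical layout (the field names are assumptions, and the slide does not say how tiers are stored), shown only to make the key/contact/profile split concrete.

```python
from dataclasses import dataclass, field
from typing import Optional

CONTACT_FIELDS = ("address", "email", "phone")


@dataclass
class ResearcherProfile:
    """Hypothetical record layout mirroring the three data types on the slide."""

    # Key data
    name: str
    # Contact fields
    address: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    # Profile fields
    profile_type: Optional[str] = None
    tier1: list[str] = field(default_factory=list)
    tier2: list[str] = field(default_factory=list)
    tier3: list[str] = field(default_factory=list)
    scientific_area: Optional[str] = None
    discipline: Optional[str] = None

    def missing_contact_fields(self) -> list[str]:
        """Contact fields the crawler has not yet filled in."""
        return [contact for contact in CONTACT_FIELDS if getattr(self, contact) is None]
```

For instance, a profile created with only a name and phone number reports `["address", "email"]` as still missing, which is the kind of gap the completeness measurements below track.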
Research Profile Automation POC
Solution Architecture
Diagram components:
• Seed PubMed Search (Researcher Name) via E-Utilities
• File System Crawler
• IBM Content Analytics Document Server (UIMA Pipeline), running the Research Profile Model
• ICA-LanguageWare Resource Workbench with the Veteris Taxonomy (Deploy Model)
• JDBC UIMA CAS Consumer (Create/Update Profiles)
• Research Profile and Consumable/Technology Product Profile, feeding the Predictive Software for Vendor Consolidation
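The architecture's core loop (annotate crawled documents with taxonomy terms, then create or update profile rows through a JDBC CAS consumer) can be imitated in a few lines. This is not UIMA or IBM Content Analytics: a regular-expression dictionary lookup stands in for the annotators, Python's `sqlite3` stands in for the JDBC database, and the taxonomy entries and table layout are hypothetical.

```python
import re
import sqlite3

# Hypothetical mini-taxonomy: surface term -> (profile field, canonical value).
# The real Veteris taxonomy and ICA annotators are not reproduced here.
TAXONOMY = {
    "gel electrophoresis": ("technique", "Gel electrophoresis"),
    "polyclonal antibody": ("application", "Polyclonal antibody"),
    "respiratory disease": ("therapeutic_area", "Respiratory disease"),
}


def annotate(text: str) -> set[tuple[str, str]]:
    """Dictionary lookup standing in for the UIMA annotators in the pipeline."""
    lowered = text.lower()
    return {
        entry
        for term, entry in TAXONOMY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def update_profile(conn: sqlite3.Connection, researcher: str, text: str) -> None:
    """Stand-in for the JDBC CAS consumer: create or update profile term rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profile_terms ("
        " researcher TEXT, field TEXT, value TEXT, hits INTEGER,"
        " PRIMARY KEY (researcher, field, value))"
    )
    for field_name, value in annotate(text):
        # Create the row the first time a term is seen, then count each
        # document that mentions it.
        conn.execute(
            "INSERT OR IGNORE INTO profile_terms VALUES (?, ?, ?, 0)",
            (researcher, field_name, value),
        )
        conn.execute(
            "UPDATE profile_terms SET hits = hits + 1"
            " WHERE researcher = ? AND field = ? AND value = ?",
            (researcher, field_name, value),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    update_profile(conn, "Jane Researcher", "Polyclonal antibody detection by gel electrophoresis.")
    update_profile(conn, "Jane Researcher", "Gel electrophoresis of respiratory samples.")
    for row in conn.execute("SELECT field, value, hits FROM profile_terms ORDER BY field"):
        print(row)
```

The insert-then-update pair keeps the "create/update" behavior explicit and works on any SQLite version; a real deployment would key rows on a validated researcher I.D. rather than a raw name.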
Research Profile Automation POC
Measurement Goals
• Maintain profile data for 1600 researchers
• Track accuracy of data crawled
– % of names correctly identified:
• Name + institution
• Name + at least one contact field
• Name + overlap of some profile data
• Track completeness of data crawled
– % of contact fields discovered per researcher
– % of fields verified (identical to base data)
– # of rows per researcher
• Track currency of data
– % of proposed updates that are newer information
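The three accuracy checks can be expressed as predicates over a base record and a crawled candidate. The record keys (`name`, `institution`, `terms`, and the contact fields) and the case-insensitive comparison are assumptions made for this sketch.

```python
CONTACT_FIELDS = ("address", "email", "phone")
CRITERIA = ("name+institution", "name+contact", "name+profile_overlap")


def _norm(value) -> str:
    """Case- and whitespace-insensitive form used for comparisons."""
    return str(value or "").strip().lower()


def match_criteria(base: dict, candidate: dict) -> set[str]:
    """Which accuracy checks a crawled candidate passes against its base record."""
    met: set[str] = set()
    name = _norm(base.get("name"))
    if not name or name != _norm(candidate.get("name")):
        return met  # every criterion requires a name match
    institution = _norm(base.get("institution"))
    if institution and institution == _norm(candidate.get("institution")):
        met.add("name+institution")
    if any(
        _norm(base.get(f)) and _norm(base.get(f)) == _norm(candidate.get(f))
        for f in CONTACT_FIELDS
    ):
        met.add("name+contact")
    # Any shared profile term counts as "overlap of some profile data".
    if set(base.get("terms", ())) & set(candidate.get("terms", ())):
        met.add("name+profile_overlap")
    return met


def accuracy_report(pairs: list[tuple[dict, dict]]) -> dict[str, float]:
    """Percentage of (base, candidate) pairs that meet each criterion."""
    counts = dict.fromkeys(CRITERIA, 0)
    for base, candidate in pairs:
        for criterion in match_criteria(base, candidate):
            counts[criterion] += 1
    total = len(pairs)
    return {c: (100.0 * n / total if total else 0.0) for c, n in counts.items()}
```

A candidate can satisfy several criteria at once, so the three percentages are not expected to sum to 100%.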
Research Profile Automation POC
Measurement Goal Results
• Maintain profile data for 1600 researchers
– Known researchers' names found (Strong, Medium, and Weak candidates): 1044 (65%)
– Known researchers' names found (Strong and Medium candidates): 925 (57%)
• Track accuracy of data crawled (based on Strong and Medium candidates)
– % of names correctly identified:
• Name + institution: 20.3%
• Name + at least one contact field: 8.6%
• Name + overlap of some profile data: 47%
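The 65% and 57% figures are shares of the 1,600 tracked researchers. A one-line check reproduces them: 1044/1600 is 65.25%, while 925/1600 is 57.81%, so the slide's 57% appears to be truncated rather than rounded.

```python
TRACKED_RESEARCHERS = 1600  # size of the researcher list in the POC


def coverage(found: int, tracked: int = TRACKED_RESEARCHERS) -> float:
    """Percentage of tracked researchers whose names were found."""
    return 100.0 * found / tracked


if __name__ == "__main__":
    print(f"Strong + Medium + Weak: {coverage(1044):.2f}%")  # 65.25%
    print(f"Strong + Medium:        {coverage(925):.2f}%")   # 57.81%
```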
Research Profile Automation POC
Measurement Goal Results (continued)
• Track completeness of data crawled (based on Strong, Medium, and Weak candidates)
– % of contact fields discovered per researcher:
• 418 emails were discovered across all known researchers
• 212 phones were discovered across all known researchers
• 681 addresses were discovered across all known researchers
– % of fields verified (identical to base data):
• Email: 8.6% (90 emails)
• Address: 0%
• Phone: 0%
• Profile type: 53.3% (557 profile types)
• Tier 1: 20% (216 Tier 1 values)
• Tier 2: 3.9% (41 Tier 2 values)
– # of rows per researcher: 23 (based on Strong and Medium candidates)
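The completeness goal counts a field as verified when it is identical to the base data. Exact string equality treats formatting-only differences (for example, `(333) 222-1111` versus `333.222.1111`) as mismatches; the sketch below shows a normalized comparison as one possible alternative. The normalization rules are assumptions, not the POC's method.

```python
import re


def normalize(field_name: str, value: str) -> str:
    """Hypothetical normalization applied before comparing crawled and base values."""
    value = value.strip().lower()
    if field_name == "phone":
        return re.sub(r"\D", "", value)  # keep digits only
    if field_name == "address":
        return re.sub(r"\W+", " ", value).strip()  # collapse punctuation and spacing
    return value


def is_verified(field_name: str, crawled: str, base: str) -> bool:
    """True when a non-empty base value matches the crawled value after normalization."""
    return bool(base.strip()) and normalize(field_name, crawled) == normalize(field_name, base)
```

Looser matching raises the verified share but also the risk of false matches, so the rules would need checking against a labeled sample.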
Research Profile Automation POC
Measurement Goal Results (continued)
• Track currency of data (based on Strong and Medium candidates)
– % of proposed updates that are newer information:
• Email: 29% (270 emails)
• Address: 65% (605 addresses)
• Phone: 1.83% (17 phones)
• Organization: 69% (641 organizations)
• Profile type: 84.4% (871 profile types)
• Tier 1: 96% (890 Tier 1 values)
• Tier 2: 45% (423 Tier 2 values)
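Deciding whether a proposed update is "newer information" needs some notion of when each value was observed. The rule below (the value changes and the proposing source is dated later than the base record) is a hypothetical sketch; the slides do not say how the POC judged currency.

```python
from datetime import date


def is_newer_update(base_value: str, base_as_of: date,
                    proposed_value: str, proposed_as_of: date) -> bool:
    """Hypothetical rule: the proposal changes the value and comes from a more recent source."""
    return proposed_value != base_value and proposed_as_of > base_as_of


def currency_rate(updates: list[tuple[str, date, str, date]]) -> float:
    """Percentage of proposed updates that count as newer information."""
    if not updates:
        return 0.0
    newer = sum(is_newer_update(*update) for update in updates)
    return 100.0 * newer / len(updates)
```

Each update tuple is `(base_value, base_as_of, proposed_value, proposed_as_of)`; for publication-derived data, the paper's publication date is one plausible choice for `proposed_as_of`.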
Research Profile Automation POC
Lessons Learned
• Validated the Viability of Research Profile Management Automation Using Unstructured Library Sources
– Proved Out an Automated and Scalable Process for Research Profile Management
– Can Leverage Industry Taxonomies of Research Terms to Build Robust Profiles
– Built a Solution Architecture That Is Extendable to Include Additional Library Sources
– Discovered That the Process Yielded New, Previously Unknown Relationships between Researchers with Similar Profiles (~50,000 in this case)
• PubMed / PubMed Central Content Source
– Abstract Limitations / Full Publication Copyright Complexities
– Good Source for Domain Specific Descriptor Information
– Not an Optimal Source of Researcher Contact Information
– Searching Approach: Retrieve PubMed and PubMed Central documents by Domain Descriptors instead of name
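The suggested searching approach, retrieving documents by domain descriptors rather than by researcher name, maps onto an NCBI E-utilities ESearch query. The sketch only builds the request URL; the choice of MeSH descriptors, the `[MeSH Terms]` field tag, and `retmax=100` are illustrative assumptions. The same `db` parameter accepts `pmc` for PubMed Central, though field tags should be checked against PMC's own search fields before reuse.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def descriptor_query(descriptors: list[str]) -> str:
    """OR together MeSH descriptors using PubMed's [MeSH Terms] field tag."""
    return " OR ".join(f'"{d}"[MeSH Terms]' for d in descriptors)


def esearch_url(descriptors: list[str], db: str = "pubmed", retmax: int = 100) -> str:
    """Build an ESearch request URL for documents tagged with any of the descriptors."""
    params = {
        "db": db,
        "term": descriptor_query(descriptors),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url(["Biosensing Techniques", "Respiratory Tract Diseases"]))
```

The returned PMIDs could then be passed to EFetch to download records. NCBI asks clients to limit request rates and to identify themselves (`tool`/`email` parameters or an API key), so a production crawler would need throttling.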
Research Profile Automation POC
What is Next?
• Optimize and Expand Library Sources to Enhance Profile Content
– Optimize PubMed Content Retrieval
– Tune Extraction Model to Improve Recognition of Entities and Relationships
– Leverage Additional Library Sources to Round Out Profiles
• Retrieve PubMed and PubMed Central documents by Domain Specific Descriptors instead of name
• Mirror the automated profile process for research “tool” products, for use in automated alignment with researcher profiles in Veteris predictive software
Thank You
John Kamins, CEO Veteris Technologies
[email protected]
Keyur Dalal, IBM jStart Emerging Technologies
[email protected]