A Process Ontology for Cell Biology
Stuart Aitken
Artificial Intelligence Applications Institute
Centre for Intelligent Systems and their Applications
Outline
• Rapid Knowledge Formation (RKF) Project
– RKF Project goals and domain
– The Cyc knowledge-based system
– RKF Tools
• Process Ontology
– General approach
– Formalisation
– Example
Rapid Knowledge Formation
• The RKF project aims to develop
tools which will allow domain
experts to enter knowledge directly
into the KBS.
• DARPA-funded, two teams:
– CYCORP
– SRI
• Organised around ‘Challenge
Problems’ – Cell Biology
RKF
Aim: To enable biologists to construct an
ontology/KB from a textbook source
[Figure: textbook (Alberts et al., Essential Cell Biology, 1998) is formalised into an Ontology]
Rapid Knowledge Formation
Key techniques:
• The KBS has knowledge of the KA
process
– Knowledge of salience
– Knowledge of the requirements of an
adequate formalisation
• There is a dialogue between expert
and system, which clarifies the
concept being defined.
Rapid Knowledge Formation
Evaluation:
After a period of tool development:
• trials are organised,
• both expert performance and KE performance are measured,
• and the results are assessed independently.
The evaluation is extensive, running over a
period of 2 weeks
The Cyc KBS
• Cyc (Doug Lenat) is a knowledge-based
system, under development
since ~1984, aiming to represent
common sense knowledge.
• Cyc uses a large upper-level
ontology
• Uses a logical language based on
first-order logic
The Cyc KBS
Concepts in the Upper Ontology:
– Thing, Agent, Event
– TangibleThing, InformationBearingObject
– …. Dog, Book
– subclass(genls), instance-of(isa)
– parts, subevent, role predicates
– 1600 concepts in total in the public
release (1998), a small percentage of Cyc
Classification:
– Stuff-like vs Object-like
– Individual vs Set
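The two taxonomic relations named on this slide, genls (subclass) and isa (instance-of), can be illustrated with a minimal Python sketch. The tiny hierarchy below is an assumption made for illustration and is not taken from the Cyc knowledge base (in particular, the parents given to Dog and Book are guesses), and the helper names all_genls and isa are hypothetical.

```python
# Toy fragment of a taxonomy; the links are illustrative assumptions,
# not taken from the Cyc knowledge base.
GENLS = {
    "Agent": {"Thing"},
    "Event": {"Thing"},
    "TangibleThing": {"Thing"},
    "InformationBearingObject": {"Thing"},
    "Dog": {"TangibleThing", "Agent"},
    "Book": {"InformationBearingObject"},
}

# Direct instance-of links for two hypothetical individuals.
ISA = {"Rex": {"Dog"}, "MyCopyOfAlberts": {"Book"}}


def all_genls(concept):
    """Return the concept plus every superclass reachable via genls."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(GENLS.get(c, ()))
    return seen


def isa(individual, concept):
    """True if individual is an instance of concept, directly or by inheritance."""
    return any(concept in all_genls(c) for c in ISA.get(individual, ()))


print(isa("Rex", "Thing"))  # True: Dog -> TangibleThing -> Thing
print(isa("Rex", "Book"))   # False: no genls path from Dog to Book
```

The point of the sketch is only that isa facts are stated once against the most specific concept and inherited upwards along genls.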
The Cyc KBS
• The upper-ontology supports
application development:
[Figure: ontology layers beneath Thing: Upper-level, Intermediate-level, Application-level]
The Cyc KBS
Cyc includes:
• An inference engine,
• GUI,
• tools for ontology development.
• Until the RKF project, ontology
development was by trained
knowledge engineers, working with
domain experts.
RKF
New tools in Cyc:
• Define a new concept, and place it
correctly in the ontology
• Refine a concept definition
• Define a new predicate
• Assert a new fact
• Define a new rule
• State an analogy
• Construct a new process
RKF
User interaction:
• Selection of items in the interface
– Choice determined ‘intelligently’: the KBS
has knowledge of salience and of the KA
process; this knowledge must be
authored
• Browsing of the ontology
• Search
• Natural language dialogue
Process Models
[Figure: the RNA Transcription process, with BindsTogether and Move events]
Process Descriptor
Q: Name the process
A: [ RNA Transcription ]
Q: Select the type of Process that describes
the category best
• event localised
• creation or destruction event…
• ‘say this:’ [ _ _ _ _ _ _ ]
Q: Define:
• affected object: [ _ _ _ _ _ ]
• location: [ _ _ _ _ _ ]
• actor: [ _ _ _ _ _ ]
Process Models
Describing Processes:
• Complex expressions at the instance level
• Simpler to describe in terms of types
Upper-level
subevent(Event,Event)
doneBy(Event,Agent)
Intermediate-level
Application-level
?
ForAll ?E ?F ?G
implies(
(subevent(?E,?G) and isa(?E,BindsTogether) and
subevent(?F,?G) and isa(?F,Move)),
before(startOf(?E),startOf(?F)))
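The instance-level rule can be read as a constraint over concrete events, which is one way to see why it is cumbersome to write by hand. The Python sketch below checks that constraint; the Event class, the numeric start times and the event names are all assumptions introduced for illustration, not part of Cyc.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    name: str
    type: str     # e.g. "BindsTogether" or "Move"
    start: float  # hypothetical start time
    parent: str   # name of the event this one is a subevent of


def violations(events):
    """Pairs (e, f) that break the rule: a BindsTogether e and a Move f that
    are subevents of the same g must satisfy before(startOf(e), startOf(f))."""
    bad = []
    for e, f in product(events, repeat=2):
        if (e.type == "BindsTogether" and f.type == "Move"
                and e.parent == f.parent and not e.start < f.start):
            bad.append((e.name, f.name))
    return bad


events = [
    Event("bind1", "BindsTogether", 0.0, "transcription1"),
    Event("move1", "Move", 1.0, "transcription1"),
    Event("move2", "Move", 0.0, "transcription1"),  # does not start after bind1
]
print(violations(events))  # [('bind1', 'move2')]
```

Every new ordering constraint would need another quantified rule of this shape, which motivates describing processes in terms of types instead.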
Script Vocabulary
The Script theory defines the
semantics of Type-Level assertions
(typePlaysRoleInScene RNATranscription
DNAMolecule BindsTogether
objectActedOn)
• Requires rules for identity
– Can require complex reasoning
• Good for user input
• Can be extended to cover pre- and
postconditions of actions
Scripts
[Figure: RNA Transcription (instance t) has subevents BindsTogether (instance e) and Move (instance f); the subevent types are linked by startsAfterStartingOfInScript]
For all subevents f of t of type Move,
and all subevents e of t of type BindsTogether,
(startsAfterStartingOf f e), where t is of type RNATranscription
Scripts
Type playing role
Types: BindsTogether, Nucleotide
Instances: e (of BindsTogether), N (set of instances of Nucleotide)
Role: objectActedOn
For some n in N, (objectActedOn e n)
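The existential reading given on this slide can be sketched in Python as follows. The instance names and role facts are hypothetical data, the function type_plays_role is an illustrative helper rather than a Cyc API, and the sketch ignores the script argument (RNATranscription) that typePlaysRoleInScene also takes.

```python
# Instance-level role facts as (role, event, object) triples; hypothetical data.
ROLE_FACTS = {
    ("objectActedOn", "bind1", "nuc7"),
}

# Instances of each type; hypothetical data.
INSTANCES = {
    "BindsTogether": {"bind1", "bind2"},
    "Nucleotide": {"nuc7", "nuc8"},
}


def type_plays_role(scene_type, object_type, role):
    """Map each scene instance e to whether some n of object_type has (role e n)."""
    return {
        e: any((role, e, n) in ROLE_FACTS for n in INSTANCES[object_type])
        for e in INSTANCES[scene_type]
    }


result = type_plays_role("BindsTogether", "Nucleotide", "objectActedOn")
print(sorted(result.items()))  # [('bind1', True), ('bind2', False)]
```

Here bind2 fails the check because no Nucleotide instance is recorded as its objectActedOn, so the type-level assertion would not yet be satisfied for it.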
New Script Vocabulary
• Pre- and postconditions
(preconditionOfScene-negated
BindsTogether touchingDirectly
<Ribonucleotide Nucleotide>)
(postconditionOfScene
BindsTogether connectedTo
<Ribonucleotide Nucleotide>)
[Figure: before BindsTogether, N and R are not touchingDirectly; after it, N and R are connectedTo]
New Script Vocabulary
Types: BindsTogether, Nucleotide, Ribonucleotide
Instances: e (of BindsTogether); sets of instances N (of Nucleotide) and R (of Ribonucleotide), each playing a role in e
Precondition:
Some ?n in N, some ?r in R
(not
(touchingDirectly ?n ?r))
Postcondition:
Some ?n in N, some ?r in R
(connectedTo ?n ?r)
[Figure label: identity]
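The precondition and postcondition readings can be sketched as checks on the state of the world before and after one BindsTogether event. This is a minimal Python illustration under stated assumptions: states are modelled as sets of (predicate, n, r) facts, the instance names are invented, and holds_for_some is a hypothetical helper. It does not enforce identity, i.e. that the same ?n and ?r are meant in both conditions.

```python
def holds_for_some(state, predicate, ns, rs, negated=False):
    """True if some n in ns and some r in rs satisfy predicate(n, r) in state
    (or fail to satisfy it, when negated=True)."""
    return any(((predicate, n, r) in state) != negated for n in ns for r in rs)


# Hypothetical instance sets for the two types.
N = {"nuc1"}
R = {"ribo1"}

# Hypothetical world states before and after one BindsTogether event.
before = set()  # nothing is touching yet
after = {("connectedTo", "nuc1", "ribo1")}

pre_ok = holds_for_some(before, "touchingDirectly", N, R, negated=True)
post_ok = holds_for_some(after, "connectedTo", N, R)
print(pre_ok, post_ok)  # True True
```

Tying the two existentials to the same pair of individuals is where the rules for identity become necessary.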
Script Vocabulary
• The Script vocabulary forms an
‘intermediate level’, which
lies behind the Process Descriptor
GUI (i.e. the textboxes)
• Not, in itself, a taxonomy of
processes, but allows processes to
be described in detail.
• Defining the subclass relation is
just one task.
Vaccinia Virus Life Cycle
• The vaccinia virus life cycle was
selected as an example of a complex
model to formalise as a set of Scripts.
• The model includes actions,
decomposition, ordering, objects-playing-roles
and pre/postconditions
• It is a good test for the Script
vocabulary
Vaccinia Virus Life Cycle
Temporal:
mRNATranscription-Early
ViralGeneTranslation-Early
MovementOfProtein
Participants:
– mRNATranscription-Early: Outputs: messengerRNA
– ViralGeneTranslation-Early: Inputs: messengerRNA
– MovementOfProtein
Conditions:
– mRNATranscription-Early: Pre: spatiallySubsumes Cell VirusCore
– ViralGeneTranslation-Early
– MovementOfProtein: Post: spatiallySubsumes CellCytoplasm Vitf2
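The three facets on this slide (temporal order, participants, conditions) can be gathered into one data structure and checked for consistency. The Python sketch below is an illustration only: the Step class and the unsupplied_inputs helper are hypothetical, and attaching the Pre condition to mRNATranscription-Early and the Post condition to MovementOfProtein is an assumption based on the order in which they appear on the slide.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    inputs: tuple = ()
    outputs: tuple = ()
    pre: tuple = ()   # (predicate, arg1, arg2) triples
    post: tuple = ()


# Steps listed in the temporal order given on the slide.
SCRIPT = [
    Step("mRNATranscription-Early",
         outputs=("messengerRNA",),
         pre=(("spatiallySubsumes", "Cell", "VirusCore"),)),
    Step("ViralGeneTranslation-Early", inputs=("messengerRNA",)),
    Step("MovementOfProtein",
         post=(("spatiallySubsumes", "CellCytoplasm", "Vitf2"),)),
]


def unsupplied_inputs(script):
    """Inputs that no earlier step in the script produces as an output."""
    produced, missing = set(), []
    for step in script:
        missing += [(step.name, i) for i in step.inputs if i not in produced]
        produced.update(step.outputs)
    return missing


print(unsupplied_inputs(SCRIPT))                  # []
print(unsupplied_inputs(list(reversed(SCRIPT))))  # translation lacks its mRNA
```

With the steps in the stated order every input is produced by an earlier step; reversing the order leaves ViralGeneTranslation-Early without its messengerRNA, which shows how ordering and participant information interact.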
Evaluation
• 8 biologists were selected, and trained
in the tools, 4 per team
• The knowledge to be formalised was
selected (chapter 7 in Alberts)
• The knowledge base was allowed to
contain ‘pump-priming’ knowledge
• The biologists entered knowledge
using the tools, then tested it against a
set of questions
• The ontology/KB was revised
Evaluation
Results (outline)
• A huge amount of data was collected,
but analysis is complex (IET Inc)
• Domain experts were able to develop
ontologies after ‘light’ training
• Knowledge engineers out-perform
domain experts in ontology
construction
Summary
‘Power Tools’ for ontology
development are being implemented
and tested in the RKF project.
• A Script/Process vocabulary has
been developed and applied to
processes in cell biology, covering:
– Temporal order
– Participants
– Pre/postconditions
– Repetition