NLP using Active Structure
From Words to Knowledge
ORION
Active Structure
Two Approaches
We could separate the process of turning
words into knowledge into its components, or
we could adopt a more holistic approach.
A Sequence of Activities
(Diagram: Words → POS Tags → Grammar → Semantics → Knowledge)
This approach segments the process into separate
parts, each of which is blind to all the others. This
seems easier conceptually, but is obviously not what
people do in reading text.
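A minimal sketch of the sequential approach may make the contrast concrete. It assumes a hypothetical four-word lexicon and deliberately trivial stages (`tag`, `chunk` and `interpret` are illustrative names, not ORION components). Each stage sees only the output of the stage before it, so nothing downstream can correct an upstream decision.

```python
# A toy sequential pipeline: Words -> POS Tags -> Grammar -> Semantics -> Knowledge.
# Each stage receives only the previous stage's output and is blind to the others.
# The lexicon and the rules are hypothetical illustrations.

LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN", "binding": "VERB"}

def tag(words):
    # POS stage: no access to grammar or semantics, so it commits immediately.
    return [(w, LEXICON.get(w, "UNK")) for w in words]

def chunk(tagged):
    # Grammar stage: group runs of DET/ADJ/NOUN into noun phrases;
    # every other word becomes a phrase on its own.
    phrases, current = [], []
    for word, pos in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append((word, pos))
            continue
        if current:
            phrases.append(current)
            current = []
        phrases.append([(word, pos)])
    if current:
        phrases.append(current)
    return phrases

def interpret(phrases):
    # Semantics stage: sees only phrases and cannot revisit the tags.
    facts = []
    for phrase in phrases:
        nouns = [w for w, p in phrase if p == "NOUN"]
        adjs = [w for w, p in phrase if p == "ADJ"]
        facts.extend(("ATTRIBUTE", n, a) for n in nouns for a in adjs)
    return facts

def pipeline(text):
    return interpret(chunk(tag(text.lower().split())))

print(pipeline("The red car"))  # [('ATTRIBUTE', 'car', 'red')]
```

Because the tagger commits to reading "binding" as a verb and no later stage can revisit that choice, `pipeline("serine binding")` yields no facts at all.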
The Holistic Approach
(Diagram: Active Structure at the centre, connecting Words, Grammar, Semantics, Local Knowledge and Global Knowledge)
The lexing, grammar, semantic and structure-building processes proceed simultaneously and synergistically, opportunistically using any information coming from any direction.
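For contrast, the sketch below shows one generic way to let several processes work on shared structure: a blackboard-style loop in which each hypothetical process (`lexical`, `grammar`, `semantics`) fires whenever it can add a fact, in any order, until nothing changes. It is an illustration of the idea under stated assumptions, not ORION's mechanism, and the domain facts are invented. Note that the lexical process learns that "binding" is a noun from the domain model rather than from a tagger.

```python
from itertools import permutations

# Facts are 3-tuples in one shared store. The domain entries are invented.
INITIAL = {
    ("word", 0, "serine"), ("word", 1, "binding"),
    ("entity", "serine", "AminoAcid"),
    ("process", "binding", "Binding"),
}

def lexical(store):
    # A word the domain model knows as an entity or a process is a noun.
    return {("pos", w, "NOUN") for (k, w, _) in store if k in ("entity", "process")}

def grammar(store):
    # Two adjacent nouns form a noun phrase.
    words = {i: w for (k, i, w) in store if k == "word"}
    nouns = {w for (k, w, t) in store if k == "pos" and t == "NOUN"}
    return {("np", words[i], words[i + 1]) for i in words
            if i + 1 in words and words[i] in nouns and words[i + 1] in nouns}

def semantics(store):
    # "<entity> <process>" means the entity takes part in the process.
    entities = {w for (k, w, _) in store if k == "entity"}
    processes = {w for (k, w, _) in store if k == "process"}
    return {("participant", b, a) for (k, a, b) in store
            if k == "np" and a in entities and b in processes}

def run(store, processes):
    # Let every process fire opportunistically until a fixed point is reached.
    store = set(store)
    changed = True
    while changed:
        changed = False
        for process in processes:
            new = process(store) - store
            if new:
                store |= new
                changed = True
    return store

results = [run(INITIAL, order) for order in permutations([lexical, grammar, semantics])]
```

Because each rule only ever adds facts, every ordering of the three processes reaches the same final store, a small-scale version of information arriving from any direction. The loop is sequential; true simultaneity is not modelled here.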
The Basic Elements
These are the basic elements of Active Structure - variables, operators, links and values flowing in the structure.
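To give these elements a concrete shape, here is a minimal sketch in the style of a propagation network. It assumes that a variable holds a value and is linked to operators, and that an operator fires only when all its inputs have values. The `Variable` and `Operator` classes and the arithmetic operators are hypothetical stand-ins chosen for brevity; ORION's operators act on linguistic and domain structure, not numbers.

```python
class Variable:
    """Holds a value and the links to the operators that watch it."""
    def __init__(self, name):
        self.name = name
        self.value = None
        self.links = []

    def set(self, value):
        if value != self.value:
            self.value = value
            for op in self.links:      # a change flows along every link
                op.activate()

class Operator:
    """Computes an output variable from input variables (hypothetical)."""
    def __init__(self, fn, inputs, output):
        self.fn, self.inputs, self.output = fn, inputs, output
        for var in inputs:
            var.links.append(self)

    def activate(self):
        # Fire only when every input has a value; otherwise keep waiting.
        if all(v.value is not None for v in self.inputs):
            self.output.set(self.fn(*(v.value for v in self.inputs)))

a, b = Variable("a"), Variable("b")
total, doubled = Variable("total"), Variable("doubled")
Operator(lambda x, y: x + y, [a, b], total)
Operator(lambda x: 2 * x, [total], doubled)

a.set(3)                              # nothing fires yet: b has no value
b.set(4)                              # total becomes 7, which flows on to doubled
print(total.value, doubled.value)     # 7 14
```

Setting `a` alone does nothing, because its operator is still waiting on `b`; once `b` arrives, the new value flows along the links through both operators.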
A Common Substrate
(Diagram: PARSE)
The basic elements of Active Structure can also be seen as Entities, Relations and States.
These three elements are adequate to model everything - including the grammar of language and the world of objects.
The Reading Process
A document is read paragraph by paragraph, sentence by sentence, word by word.
As the words are read, they are turned into objects that can be manipulated - objects that have the properties both of words and of the objects they represent, such as a ligand or a gene.
The word objects are assembled through grammar into larger objects, such as a receptor or gene structure, and then into larger structures, using the relations between the objects provided by nouns and verbs.
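The first step of the reading process can be sketched as follows, assuming a hypothetical `DOMAIN` table that maps word forms to the kind of object they denote, and naive regex-based splitting into paragraphs, sentences and words. Each `WordObject` carries both word properties (its text and position) and object properties (its domain class). Assembly into larger structures is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical domain model: word forms and the class of object they denote.
DOMAIN = {"serine": "Ligand", "tsr": "Receptor", "abca1": "Gene"}

@dataclass
class WordObject:
    text: str                        # property of the word
    position: Tuple[int, int, int]   # (paragraph, sentence, word) indices
    denotes: Optional[str] = None    # property of the object it represents

def read(document):
    """Read paragraph by paragraph, sentence by sentence, word by word."""
    objects = []
    for p, paragraph in enumerate(document.split("\n\n")):
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        for s, sentence in enumerate(sentences):
            for w, token in enumerate(re.findall(r"[A-Za-z0-9]+", sentence)):
                objects.append(WordObject(token, (p, s, w), DOMAIN.get(token.lower())))
    return objects

objects = read("Serine binds Tsr. ABCA1 is a gene.\n\nTsr forms dimers.")
```

Words absent from the toy model, such as "binds", still become word objects; they simply carry no domain class yet.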
Transformation
Example: “changes in the conformation of the Tsr dimer induced by serine binding improve methylation efficiency”
Building Structure
(Diagram: a Four Word Noun Phrase structure matched against Start, The, Red, Car)
When a single possible structure match is found, an invocation of the structure is built, leaving a new BRIDGE operator to look for higher-level matches.
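The rule that structure is built only when exactly one match exists can be sketched as a bottom-up loop. The three templates, the `Node` class and the `bridge` function are hypothetical. Each successful match replaces its parts with an invocation node, and the loop then looks again from that node for higher-level matches, standing in for the new BRIDGE operator.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    category: str
    word: str = ""
    children: list = field(default_factory=list)

TEMPLATES = [  # hypothetical structure patterns: parts -> invoked structure
    (("DET", "ADJ", "NOUN"), "NounPhrase"),
    (("DET", "NOUN"), "NounPhrase"),
    (("NounPhrase", "VERB", "NounPhrase"), "Clause"),
]

def bridge(nodes, pos):
    """One BRIDGE step at `pos`: build an invocation only for a unique match."""
    cats = [n.category for n in nodes[pos:]]
    matches = [(pat, head) for pat, head in TEMPLATES
               if tuple(cats[:len(pat)]) == pat]
    if len(matches) != 1:
        return False                     # no match or ambiguous: wait
    pat, head = matches[0]
    invocation = Node(head, children=nodes[pos:pos + len(pat)])
    nodes[pos:pos + len(pat)] = [invocation]   # the invocation replaces its parts
    return True                          # look again from the new node

def build(tagged_words):
    nodes = [Node(cat, word) for word, cat in tagged_words]
    while any(bridge(nodes, i) for i in range(len(nodes))):
        pass
    return nodes

noun_phrase = build([("The", "DET"), ("red", "ADJ"), ("car", "NOUN")])
clause = build([("the", "DET"), ("red", "ADJ"), ("car", "NOUN"),
                ("hit", "VERB"), ("the", "DET"), ("dog", "NOUN")])
```

With these templates, "the red car hit the dog" is built first into two noun phrases and then into a single `Clause` invocation.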
Next Symbol
(Diagram: structure built over the words The, Hat, Was, On, Head, Of, Man, with Noun Phrase, Verb Phrase, Prep Phrase and Preposition nodes)
The Next Symbol depends on the local structure: run down from the current symbol, then run up again if other structure exists; otherwise, jump across a PARSE.
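One reading of this rule, sketched under assumptions: symbols form trees, a PARSE is the root of a tree not yet joined to its neighbours, and the next symbol is found by running down to the leftmost leaf, running up until structure to the right exists, or else jumping across to the next PARSE. The `Sym` class and the two-PARSE bracketing of the words The, Hat, Was, On, Head, Of, Man are hypothetical.

```python
class Sym:
    """A symbol in a tree of structure; a PARSE is a root."""
    def __init__(self, label, *children):
        self.label, self.children, self.parent = label, list(children), None
        for child in self.children:
            child.parent = self

def run_down(sym):
    # Follow first children down to a leaf.
    while sym.children:
        sym = sym.children[0]
    return sym

def next_symbol(sym, parses):
    if sym.children:                        # run down from the current symbol
        return run_down(sym)
    while sym.parent is not None:           # run up while other structure exists
        siblings = sym.parent.children
        i = siblings.index(sym)
        if i + 1 < len(siblings):
            return run_down(siblings[i + 1])
        sym = sym.parent
    k = parses.index(sym)                   # at the top: jump across to the next PARSE
    return run_down(parses[k + 1]) if k + 1 < len(parses) else None

# Two PARSEs that have not yet been joined (hypothetical bracketing).
parses = [
    Sym("PARSE", Sym("NP", Sym("The"), Sym("Hat")),
                 Sym("VP", Sym("Was"), Sym("PP", Sym("On"), Sym("NP", Sym("Head"))))),
    Sym("PARSE", Sym("PP", Sym("Of"), Sym("NP", Sym("Man")))),
]

def walk(parses):
    sym, labels = run_down(parses[0]), []
    while sym is not None:
        labels.append(sym.label)
        sym = next_symbol(sym, parses)
    return labels
```

The walk visits the words in reading order, crossing from the first PARSE to the second after "Head".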
Harpooning the Model
When the noun phrase is recognised, the objects it joins are searched for a connection. One is found for animal and colour through ATTRIBUTE, so the same relation joining the objects is searched for in the model, and a unique match is “harpooned” for use.
With relations, the type of object changes the grammar.
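A minimal sketch of harpooning, assuming the model is an is-a hierarchy plus a list of typed relations (all entries hypothetical). The two objects joined by the noun phrase are generalised up the hierarchy, relations linking an ancestor of each are collected, and a relation is harpooned only if exactly one is found.

```python
# Hypothetical model: an is-a hierarchy plus relations between classes.
ISA = {"dog": "animal", "cat": "animal", "animal": "entity",
       "car": "vehicle", "vehicle": "entity",
       "brown": "colour", "red": "colour", "colour": "property"}
RELATIONS = [("ATTRIBUTE", "animal", "colour"),
             ("ATTRIBUTE", "vehicle", "colour"),
             ("OWNS", "person", "vehicle")]

def ancestors(term):
    # The term itself followed by every class above it.
    chain = [term]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def harpoon(head, modifier):
    """Return the single model relation joining the two objects, else None."""
    heads, mods = ancestors(head), ancestors(modifier)
    found = [rel for rel in RELATIONS if rel[1] in heads and rel[2] in mods]
    return found[0] if len(found) == 1 else None
```

No relation links a dog to a car in this toy model, so nothing is harpooned for that pair.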
Automatic Phasing
(Diagram: a Four Word Noun Phrase structure matched against Start, The, Red, Car)
A BRIDGE operator doing a long match may find that not all the information is available.
If so, it puts a connection on the missing information and waits to be reactivated.
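The wait-and-reactivate behaviour can be sketched with slots that remember which operators are waiting on them. `Slot` and `LongMatch` are hypothetical names, and the match is a plain comparison of categories; the point is only that a partial match suspends itself on the missing piece instead of failing.

```python
class Slot:
    """A place in the structure whose value may not have been read yet."""
    def __init__(self):
        self.value = None
        self.waiting = []              # operators to re-activate on arrival

    def fill(self, value):
        self.value = value
        waiting, self.waiting = self.waiting, []
        for op in waiting:
            op.activate()

class LongMatch:
    """Hypothetical BRIDGE-like operator matching a pattern across slots."""
    def __init__(self, pattern, slots):
        self.pattern, self.slots = pattern, slots
        self.result = None             # None means "still waiting"

    def activate(self):
        for expected, slot in zip(self.pattern, self.slots):
            if slot.value is None:
                slot.waiting.append(self)   # connect to the missing information
                return                      # and wait to be re-activated
            if slot.value != expected:
                self.result = False
                return
        self.result = True

slots = [Slot() for _ in range(3)]
match = LongMatch(["DET", "ADJ", "NOUN"], slots)
slots[0].fill("DET")
slots[1].fill("ADJ")
match.activate()
history = [match.result]      # still waiting on the third slot
slots[2].fill("NOUN")         # the arrival re-activates the operator
history.append(match.result)
```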
In the Process of Building
Part of a sentence under construction - hundreds of different active structures cooperate in the process, building up, cutting out and reversing connections.
Tight Integration
The structure combines lexical information, grammar and semantics - we pick up the fact that a word is a noun because it is an Entity, and we know something isn’t a Material because the verb rules it out.
This tight and immediate interweaving of
lexical, grammatical and semantic analysis
allows us to do things that are not possible
with a static sequential approach.
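Both inferences can be sketched with a toy domain model, assuming (hypothetically) that each word lists its candidate classes, that some classes are Entities and therefore nouns, and that each verb lists the classes it rules out for its object. The words, classes and verbs are invented for illustration.

```python
# Hypothetical domain knowledge.
CLASSES = {"solution": {"Material", "Idea"}, "lead": {"Material", "Action"}}
ENTITY_CLASSES = {"Material", "Idea"}                  # realised as nouns
VERB_EXCLUDES = {"proposed": {"Material"}, "poured": {"Idea"}}

def part_of_speech(word):
    # A word is a noun because every reading of it is an Entity.
    classes = CLASSES.get(word, set())
    return "NOUN" if classes and classes <= ENTITY_CLASSES else "UNKNOWN"

def object_readings(verb, obj):
    # The verb says which readings its object cannot have.
    return CLASSES.get(obj, set()) - VERB_EXCLUDES.get(verb, set())
```

Here "solution" is known to be a noun before any tagging, and "proposed" removes its Material reading while "poured" removes its Idea reading.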
Scientific Sentences Are Complex
The synergistic effect of serine and CheW binding to Tsr is attributed to
distinct influences on receptor structure; changes in the conformation of the
Tsr dimer induced by serine binding improve methylation efficiency, and
CheW binding changes the arrangement among Tsr dimers, which
increases access to methylation sites.
Grammar Is Not Enough
Grammar alone would turn meaningful scientific text into sludge - a participial phrase such as “induced by...” has to be anchored on the right object, and a relative pronoun such as “which” has to be anchored on the relation.
The reading process demands that domain knowledge be available at every turn - knowledge that is held in object hierarchies and relations, and that is seamlessly intermingled with grammatical knowledge during parsing.
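The anchoring of “induced by...” in the example sentence can be sketched as follows. The domain types and the rule that only a Change can be induced are hypothetical simplifications: grammar supplies the candidate nouns nearest-first, and domain knowledge picks the one the participle can modify.

```python
# Hypothetical domain types for the candidate nouns in
# "changes in the conformation of the Tsr dimer induced by serine binding".
TYPE = {"changes": "Change", "conformation": "State", "Tsr dimer": "Molecule"}

# Hypothetical knowledge: the kinds of thing each participle can modify.
CAN_MODIFY = {"induced": {"Change"}}

def anchor(participle, candidates):
    """Grammar offers candidates nearest-first; domain knowledge chooses."""
    for noun in reversed(candidates):
        if TYPE[noun] in CAN_MODIFY[participle]:
            return noun
    return None

candidates = ["changes", "conformation", "Tsr dimer"]
nearest = candidates[-1]       # what nearest-attachment grammar alone would pick
chosen = anchor("induced", candidates)
```

A purely grammatical nearest-attachment rule would pick "Tsr dimer"; the domain type steers the phrase to "changes", which is what the sentence means.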
What Does It Rely On
The paradigm relies on dynamic construction and
destruction of active structure, where operators in the
structure respond to their local environment by
changing the local topology, and then respond to the
changed environment, and so on.
Each operator can only transmit information through its
links, change its connections, add structure or destroy
itself.
Their interaction suffices to cause all the necessary
processes to proceed in parallel, in an opportunistic
and synergistic manner.
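The restriction to four primitives can be made concrete with a small sketch. The class names, the two-rule grammar and the message format are hypothetical. Each `Joiner` sits between two symbols and looks only at its neighbours: it either moves up onto a phrase that has absorbed its left neighbour (changing its connections) or builds a new phrase, sends it a message and destroys itself.

```python
PHRASE_RULES = {("DET", "NOUN"): "NP", ("NP", "VERB"): "CLAUSE"}  # hypothetical

class Node:
    def __init__(self, label, *links):
        self.label, self.links, self.inbox = label, list(links), []
        for n in links:                   # links are symmetric
            n.links.append(self)

class Operator(Node):
    """An operator limited to four primitives (names are hypothetical)."""
    def __init__(self, world, *links):
        super().__init__("op", *links)
        self.world = world
        world.append(self)

    def send(self, node, message):        # 1. transmit information through a link
        if node in self.links:
            node.inbox.append(message)

    def reconnect(self, old, new):        # 2. change its connections
        self.links[self.links.index(old)] = new
        old.links.remove(self)
        new.links.append(self)

    def add(self, label, *links):         # 3. add structure next to itself
        return Node(label, *links, self)

    def destroy(self):                    # 4. destroy itself
        for n in self.links:
            n.links.remove(self)
        self.links = []
        self.world.remove(self)

class Joiner(Operator):
    """Sits between two symbols and reacts only to its local environment."""
    def step(self):
        left, right = self.links
        # If the left symbol was absorbed into a phrase, move up onto the phrase.
        for n in left.links:
            if not isinstance(n, Operator) and n.label in PHRASE_RULES.values():
                self.reconnect(left, n)
                return True
        phrase = PHRASE_RULES.get((left.label, right.label))
        if phrase is None:
            return False                  # nothing to do yet
        node = self.add(phrase, left, right)
        self.send(node, left.label + "+" + right.label)
        self.destroy()
        return True

world = []
the, car, stopped = Node("DET"), Node("NOUN"), Node("VERB")
Joiner(world, the, car)
Joiner(world, car, stopped)
while any([op.step() for op in list(world)]):   # each pass lets every operator respond
    pass
clause = next(n for n in stopped.links if n.label == "CLAUSE")
print(world, [n.label for n in clause.links], clause.inbox)  # [] ['NP', 'VERB'] ['NP+VERB']
```

Neither operator knows about the other; the second reaches the noun phrase only because the first changed the local topology around it.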
Typical Domain Knowledge Model
The model is built out of the same variables, operators and links as the grammatical and semantic structures, so it can interact with them.
(Diagram: an earthquake model with the following components)
Earthquake Event
Greece Info (GIS): find distance between site and epicentre, local conditions, etc.
Attenuation: acceleration attenuation based on magnitude, distance and local site conditions
Frequency/Amplification: relations between magnitude and frequency, building type, number of floors and natural frequency
Intensity/Damage: relations between acceleration, intensity and damage ratio
Genetic Knowledge
(Diagram: a genetic knowledge model with nodes for Anatomy, Genes, Diseases and Proteins, including Liver, Brain, Cortex, Kidney, Eye, ABC, HKT, PFIC, PKC, NP_0001 and NP_0047, linked to the gene record below)
ABCA1: ATP-binding cassette, sub-family A (ABC1), member 1
LocusID: 19
Chromosome: 9; Cytogenetic: 9q31.1
Overview
Family: ABC (transporter across membranes)
Subfamily: ABC1 (one of ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White)
Gene: ABCA1
Protein: NP_005493
Substrate: Cholesterol
Function: cholesterol efflux pump in the cellular lipid removal pathway
Mutation: causes Tangier’s disease and familial high-density lipoprotein deficiency
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier's disease and familial high-density lipoprotein deficiency.
The structure is used to understand the text, and then the text is used to extend the structure.
Why Do This
The automated process of Information Extraction needs to be in the same state as a knowledgeable human reader at every point in the text, so that inferences about alternatives and anaphora are made on the same basis - the basis on which the writer expects them to be made.
The automated process also needs the ability to
backtrack when reading more text refutes assumptions
already built into any part of the structure.
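Backtracking of this kind can be sketched as dependency-directed retraction, a technique borrowed from truth-maintenance systems rather than taken from ORION: each piece of structure records the assumptions it rests on, and refuting one assumption removes everything built on it. The `Structure` class and the piece names are hypothetical.

```python
class Structure:
    """Records each built piece together with the pieces it depends on."""
    def __init__(self):
        self.pieces = {}               # name -> set of names it depends on

    def build(self, name, depends_on=()):
        self.pieces[name] = set(depends_on)

    def refute(self, name):
        # Remove the refuted piece and, transitively, everything built on it.
        doomed, frontier = set(), [name]
        while frontier:
            n = frontier.pop()
            if n in self.pieces and n not in doomed:
                doomed.add(n)
                frontier.extend(p for p, deps in self.pieces.items() if n in deps)
        for n in doomed:
            del self.pieces[n]
        return doomed

s = Structure()
s.build("anchor(induced, Tsr dimer)")                  # nearest-noun guess
s.build("induced_by(Tsr dimer, serine binding)", ["anchor(induced, Tsr dimer)"])
s.build("subject(improve, changes)")
removed = s.refute("anchor(induced, Tsr dimer)")       # later text refutes the guess
s.build("anchor(induced, changes)")
```

Only the structure resting on the refuted guess is undone; the independent subject analysis survives and reading can continue from the new anchor.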
Is It Really So Different
We are asserting that knowledge can only be captured in
active structure - structure that is capable of adapting
itself to its environment.
Efforts at capturing knowledge in static structure founder on two reefs - the pieces of structure will not fit together statically, and an algorithm that could manage their combination would be more complex than the combination of the pieces, and would thus be unmanageable.
Active Structure avoids both problems - the pieces adapt
to each other, and the behavior of the combination is
managed by the interaction of the pieces.