Beespace Prototype Design Meeting Entity Recognition

Beespace Prototype Design Meeting
Entity Recognition
Jing Jiang
09/28/2005
Entity Recognition in Prototype V1
• Target entities: gene names
• Supervised learning: LingPipe (word trigram and tag bigram model)
• Training data:
  - BioCreative (manually annotated)
  - Drosophila (generated from gene lists)
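
The slides do not say how the Drosophila training data was generated from gene lists. One plausible approach, sketched here purely as an assumption, is longest-match dictionary tagging: every token sequence that matches a gene-list entry is labeled as a gene mention and everything else is labeled as outside. The function name `tag_with_gene_list`, the B-GENE/I-GENE/O tag set, and whitespace tokenization are illustrative choices, not details from the meeting.

```python
def tag_with_gene_list(tokens, gene_names):
    """Label tokens as B-GENE / I-GENE / O by longest match against a gene list."""
    # Index gene names as lowercase token tuples for case-insensitive matching.
    entries = {tuple(name.lower().split()) for name in gene_names}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-GENE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-GENE"
            i += matched
        else:
            i += 1
    return tags


# Hypothetical sentence and gene list, for illustration only.
tokens = "Mutations in glutathione S-transferase and npr-1 were tested".split()
tags = tag_with_gene_list(tokens, ["glutathione S-transferase", "npr-1"])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Data produced this way inherits the gaps and ambiguities of the gene list, which is one reason it is worth contrasting with the manually annotated BioCreative data.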
Sample Results
• http://sifaka.cs.uiuc.edu/jiang4/Beespace
Performance
• Some gene names without explicit mention of “gene” can be captured
  - E.g., “glutathione S-transferase”
• Problems
  - Gene-like phrases, e.g., “China 2”, “13.8”
  - Mismatch of gene name boundaries and noun phrase boundaries, e.g., “nicotinic” in “nicotinic pathway”
V2 -- Entity Types
• Annotation guideline for BioCreative
  - Guideline for Beespace?
  - Ontology? (GENIA ontology)
• What to tag?
  - Genes and proteins
  - Family of genes
  - Gene descriptions
• Entity boundaries and noun phrase boundaries
  - Tag only noun phrases that refer to genes, or tag any occurrence of a gene name inside a noun phrase?
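
The boundary question can be made concrete with a small sketch. It assumes candidate gene-name spans and noun phrase spans are already available as token offsets, and it approximates "the noun phrase refers to a gene" by requiring the gene name to end where the noun phrase ends (a crude stand-in for the head noun). The policy names `any_occurrence` and `np_head` and the function `select_mentions` are hypothetical, introduced only for illustration.

```python
def select_mentions(matches, noun_phrases, policy):
    """Filter candidate gene-name spans under one of two annotation policies.

    Spans are (start, end) token offsets with end exclusive.
    """
    def inside(match, np):
        return np[0] <= match[0] and match[1] <= np[1]

    if policy == "any_occurrence":
        # Keep every gene-name match that lies inside some noun phrase.
        return [m for m in matches
                if any(inside(m, np) for np in noun_phrases)]
    if policy == "np_head":
        # Keep a match only if it ends at the end of its noun phrase,
        # used here as a rough proxy for "the phrase refers to a gene".
        return [m for m in matches
                if any(inside(m, np) and m[1] == np[1] for np in noun_phrases)]
    raise ValueError(f"unknown policy: {policy}")


# "The involvement of nicotinic pathways": the candidate is "nicotinic".
tokens = ["The", "involvement", "of", "nicotinic", "pathways"]
noun_phrases = [(0, 2), (3, 5)]
matches = [(3, 4)]

print(select_mentions(matches, noun_phrases, "any_occurrence"))  # [(3, 4)]
print(select_mentions(matches, noun_phrases, "np_head"))         # []
```

Under the first policy "nicotinic" is tagged even though the phrase denotes a pathway; under the second it is dropped, while a standalone mention such as "npr-1" survives both.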
Sample Sentences
• A dose-dependent transactivation of human hARE-mediated chloramphenicol acetyltransferase (cat) gene expression was observed upon treatments of the Hepa-1 transfectants with TPA, a known inducer, as well as with CAPE.
• In the present study, we identified its preferred binding sequence as 5'-CCCTATCGATCGATCTCTACCT-3' and characterized its DNA binding properties using truncated Mblk-1 mutants.
Sample Sentences (cont.)
• At least two kinds of nicotinic receptors seem to be involved in honeybee memory, an alpha-bungarotoxin-sensitive and an alpha-bungarotoxin-insensitive receptor.
• The involvement of nicotinic pathways in memory formation and retrieval processes was tested by injecting…
Sample Sentences (cont.)
• We report the cloning of a honeybee CSP gene called ASP3c, as well as the structural and functional characterization of the encoded protein.
• Naturally occurring variation in npr-1, a gene encoding a putative receptor for an NPY-like molecule, causes variation in feeding behaviour.
Sample Sentences (cont.)
• The gene encoding ZENK, an EARLY IMMEDIATE GENE well known in other learning and memory contexts, has figured prominently in molecular songbird research thus far.
• This is because frequent contacts of these types cause an increase in the expression of the gene encoding a glucocorticoid receptor in the hippocampus, and…
Training Data
• Dictionary
• Rules/guidelines
• Bootstrapping
• Cross-domain training
  - Can training data in other domains (fly, human, etc.) still be useful?
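
The slide names bootstrapping without describing a procedure. One common reading, shown here as a toy sketch under that assumption, starts from a seed dictionary, collects the word contexts in which known names occur, and then treats other tokens seen in the same contexts as new candidate names. The function `bootstrap_gene_names`, the one-word left/right context, and the restriction to single-token names are all simplifications introduced for illustration.

```python
def bootstrap_gene_names(sentences, seeds, rounds=2):
    """Grow a set of gene names from seeds using shared left/right contexts."""
    known = set(seeds)
    for _ in range(rounds):
        # Step 1: collect (previous word, next word) contexts of known names.
        contexts = set()
        for toks in sentences:
            for i in range(1, len(toks) - 1):
                if toks[i] in known:
                    contexts.add((toks[i - 1], toks[i + 1]))
        # Step 2: any unknown token in a learned context becomes a candidate.
        new_names = set()
        for toks in sentences:
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in contexts and toks[i] not in known:
                    new_names.add(toks[i])
        if not new_names:
            break  # Nothing new was found, so further rounds cannot help.
        known |= new_names
    return known


# Hypothetical tokenized sentences, for illustration only.
sentences = [
    ["the", "npr-1", "gene", "is", "expressed"],
    ["the", "ASP3c", "gene", "was", "cloned"],
]
print(sorted(bootstrap_gene_names(sentences, {"npr-1"})))  # ['ASP3c', 'npr-1']
```

Without any filtering of contexts or candidates, a loop like this drifts quickly on real text, which is where the dictionary and the rules/guidelines items above would come in.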