Beespace Prototype Design Meeting Entity Recognition
Beespace Prototype Design Meeting
Entity Recognition
Jing Jiang
09/28/2005
Entity Recognition in Prototype V1
Target entities: gene names
Supervised learning: LingPipe (word
trigram and tag bigram model)
Training data:
BioCreative (manually annotated)
Drosophila (generated from gene lists)
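Sketch: one way training data could be generated from a gene list. This is a minimal illustration, not the procedure actually used for the Drosophila data. It assumes a simple regex tokenizer, case-insensitive longest-match lookup, and a B-GENE / I-GENE / O tag scheme; the names tokenize and tag_with_gene_list are hypothetical.

```python
import re


def tokenize(text):
    # Keep hyphenated tokens such as "npr-1" together; emit punctuation separately.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)


def tag_with_gene_list(tokens, gene_names):
    """Label each token B-GENE, I-GENE or O by longest match against gene_names."""
    # Index every gene name as a tuple of lowercased tokens (assumed matching rule).
    entries = {tuple(t.lower() for t in tokenize(name)) for name in gene_names}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-GENE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-GENE"
            i += matched
        else:
            i += 1
    return tags


sentence = "Variation in npr-1 affects glutathione S-transferase levels."
tokens = tokenize(sentence)
tags = tag_with_gene_list(tokens, ["npr-1", "glutathione S-transferase"])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Data produced this way is only as good as the list: ambiguous list entries will be tagged wherever they occur, whether or not they refer to a gene.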
Sample Results
http://sifaka.cs.uiuc.edu/jiang4/Beespace
Performance
Some gene names without explicit
mention of “gene” can be captured
E.g., “glutathione S-transferase”
Problems
Gene-like phrases, e.g., “China 2”, “13.8”
Mismatch of gene name boundaries and
noun phrase boundaries, e.g., “nicotinic” in
“nicotinic pathway”
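Sketch: a possible post-filter for the simplest kind of gene-like phrase. This is an assumption for illustration, not part of Prototype V1. It drops candidates made up only of digits and number punctuation, so it would remove "13.8" but not "China 2"; the function name drop_numeric_candidates is hypothetical.

```python
import re

# A candidate is "purely numeric" if it contains only digits, dots, commas and spaces.
_NUMERIC_ONLY = re.compile(r"^[\d.,\s]+$")


def drop_numeric_candidates(candidates):
    """Return the candidate gene names that are not purely numeric strings."""
    return [c for c in candidates if not _NUMERIC_ONLY.match(c)]


candidates = ["13.8", "China 2", "glutathione S-transferase", "npr-1"]
print(drop_numeric_candidates(candidates))
```

Phrases such as "China 2" have the same surface shape as many real gene names, so a filter of this kind cannot separate them; that case likely needs context or a dictionary.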
V2 -- Entity Types
Annotation guideline for BioCreative
Guideline for Beespace?
Ontology? (GENIA ontology)
What to tag?
Genes and proteins
Family of genes
Gene descriptions
Entity boundaries and noun phrase boundaries
Tag only noun phrases that refer to genes or tag any
occurrence of a gene name inside a noun phrase?
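Sketch: the two boundary policies side by side. This assumes candidate gene spans and noun phrase spans are already available as (start, end) token offsets with end exclusive; both span lists below are written by hand for illustration. Exact span equality is used as a crude stand-in for "the noun phrase refers to a gene", which in practice would need a real judgment. The names select_mentions, np_only and any_occurrence are hypothetical.

```python
def select_mentions(gene_spans, np_spans, policy):
    """Filter candidate gene spans according to a tagging policy.

    policy="any_occurrence": keep every gene-name occurrence.
    policy="np_only": keep a candidate only if it covers a whole noun phrase.
    """
    if policy == "any_occurrence":
        return list(gene_spans)
    if policy == "np_only":
        whole_nps = set(np_spans)
        return [span for span in gene_spans if span in whole_nps]
    raise ValueError(f"unknown policy: {policy}")


tokens = ["The", "involvement", "of", "nicotinic", "pathways", "in", "memory", "formation"]
np_spans = [(0, 2), (3, 5), (6, 8)]   # hand-marked noun phrases (assumed)
gene_spans = [(3, 4)]                 # "nicotinic" proposed as a gene name

for policy in ("any_occurrence", "np_only"):
    kept = select_mentions(gene_spans, np_spans, policy)
    print(policy, [" ".join(tokens[s:e]) for s, e in kept])
```

Under any_occurrence, "nicotinic" is tagged even though the noun phrase is "nicotinic pathways"; under np_only it is dropped because it does not span the whole phrase.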
Sample Sentences
A dose-dependent transactivation of human
hARE-mediated chloramphenicol
acetyltransferase (cat) gene expression was
observed upon treatments of the Hepa-1
transfectants with TPA, a known inducer, as
well as with CAPE.
In the present study, we identified its preferred
binding sequence as 5'-CCCTATCGATCGATCTCTACCT-3' and characterized its DNA binding properties using truncated Mblk-1
mutants.
Sample Sentences (cont.)
At least two kinds of nicotinic receptors
seem to be involved in honeybee
memory, an alpha-bungarotoxin-sensitive
and an alpha-bungarotoxin-insensitive
receptor.
The involvement of nicotinic pathways in
memory formation and retrieval
processes was tested by injecting…
Sample Sentences (cont.)
We report the cloning of a honeybee
CSP gene called ASP3c, as well as the
structural and functional characterization
of the encoded protein.
Naturally occurring variation in npr-1, a
gene encoding a putative receptor for an
NPY-like molecule, causes variation in
feeding behaviour.
Sample Sentences (cont.)
The gene encoding ZENK, an EARLY
IMMEDIATE GENE well known in other
learning and memory contexts, has figured
prominently in molecular songbird research
thus far.
This is because frequent contacts of these
types cause an increase in the expression of
the gene encoding a glucocorticoid receptor in
the hippocampus, and…
Training Data
Dictionary
Rules/guidelines
Bootstrapping
Cross-domain training
Can training data in other domains (fly,
human, etc.) still be useful?