Advanced JAPE
Mark A. Greenwood
University of Sheffield NLP
Recap
• Installed and run GATE
• Understand the idea of
  - LR – Language Resources
  - PR – Processing Resources
• ANNIE
  - Understand the goals of information extraction
  - Loaded ANNIE into GATE
  - Constructed one or more gazetteer lists
• Created JAPE rules with simple RHS
Overview
• Simple RHS Limitations
• The RHS API
• Accessing Annotations and Features
• Adding New Annotations
• Hands-On
Simple RHS Limitations
• The simple RHS of a JAPE rule can only add simple annotations and features
  - Feature values are hard-coded or copied from annotations matched by the LHS
• You may need more complex processing
  - Removing temporary annotations
  - Building complex features
  - ...
• Fortunately the RHS of a rule can consist of arbitrary Java code, so the possibilities are endless!
The RHS API
• Java code provided as a RHS is used as the body of this method:
    public void doit(Document doc,
                     Map bindings,
                     AnnotationSet annotations,
                     AnnotationSet inputAS,
                     AnnotationSet outputAS,
                     Ontology ontology) throws JapeException
• This provides easy access to the document, rule bindings and annotations.
  - Do NOT use the annotations parameter: it is deprecated!
Accessing Annotations and Features
• Each labelled section of the LHS results in
an Annotation Set
• These Annotation Sets can be retrieved
from the bindings map
AnnotationSet set =
(AnnotationSet)bindings.get("labelname");
Accessing Annotations and Features
• When writing complex JAPE you will often
need to access annotation features
• All features of an annotation are stored in
a map
    FeatureMap map = annotation.getFeatures();
• Each feature is accessed by name
    Object obj = map.get("featurename");
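Because a feature value comes back as a plain Object, and is null when the feature is missing, RHS code usually needs a null check and a conversion before using it. The following is a minimal sketch in plain Java so that it runs without GATE: a java.util.HashMap stands in for the FeatureMap (gate.FeatureMap is itself a java.util.Map), and the helper name featureAsString is hypothetical, not part of the GATE API.

```java
import java.util.HashMap;
import java.util.Map;

class FeatureAccessSketch {

    /** Returns the named feature as a String, or the fallback if the feature is absent. */
    public static String featureAsString(Map<Object, Object> features,
                                         String name,
                                         String fallback) {
        Object value = features.get(name);   // same call as map.get("featurename")
        return value == null ? fallback : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for annotation.getFeatures(); a real FeatureMap behaves like this Map.
        Map<Object, Object> features = new HashMap<>();
        features.put("majorType", "change");
        features.put("minorType", "Changes-up");

        System.out.println(featureAsString(features, "minorType", "unknown")); // present
        System.out.println(featureAsString(features, "flag", "unknown"));      // absent
    }
}
```

Inside a JAPE RHS the same pattern would apply directly to the map returned by annotation.getFeatures().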
Adding New Annotations
• New annotations should always be
created in the outputAS
• To create an annotation you need
  - The annotation type (its name)
  - The start and end offset
  - A FeatureMap instance (can be empty)
    outputAS.add(start, end, type, features);
Shorthand Notation for Java RHS
• Where a Java block refers to a single left-hand-side binding, JAPE provides a shorthand notation:
    Rule: RemoveDoneFlag
    (
      {Instance.flag == "done"}
    ):inst -->
    :inst{
      Annotation theInstance =
        (Annotation)instAnnots.iterator().next();
      theInstance.getFeatures().remove("flag");
    }
Shorthand Notation for Java RHS
• A label :<label> on a Java block creates a
local variable <label>Annots within the
Java block which is the AnnotationSet
bound to the <label> label.
• The Java code in the block is only
executed if there is at least one annotation
bound to the label
Hands On: Extending the IE Example
• In the previous JAPE session you wrote a rule to annotate phrases such as
    Whitbread shares closed up 2p at 645p.
• Annotating the phrase is useful but there is lots of information which would be useful to extract as features
  - Starting price
  - Change in price
  - Closing price
Hands On: Extending the IE Example
• You will need to
  - Extract the closing price and change
      • assume they are always in pence so you can get the value by removing the trailing ‘p’
  - Get the minorType of the Lookup
  - Calculate the starting price
  - Create a new annotation with these values as features
Your turn! Feel free to refer to the User Guide and to ask for help.
Hands On: Extending the IE Example
Phase: Shares
Input: Token Organization Lookup Money
Options: control = appelt

Rule: ShareChange
(
  {Organization}
  ({Token})[0,3]
  ({Lookup.majorType == "change"}):lookup
  ({Token})[0,3]
  ({Money}):delta
  {Token.string == "at"}
  ({Money}):closing
):change -->
{
  try {
    // Annotation sets bound to the LHS labels
    AnnotationSet change = (AnnotationSet)bindings.get("change");
    Annotation delta = ((AnnotationSet)bindings.get("delta")).iterator().next();
    Annotation closing = ((AnnotationSet)bindings.get("closing")).iterator().next();

    // The minorType of the Lookup says whether the price rose or fell
    boolean rise = ((AnnotationSet)bindings.get("lookup")).iterator().next()
        .getFeatures().get("minorType").equals("Changes-up");

    // Read the text under each Money annotation; the end offset minus one drops the trailing 'p'
    int deltaValue = Integer.parseInt(doc.getContent().getContent(
        delta.getStartNode().getOffset(),
        delta.getEndNode().getOffset() - 1).toString());
    int closingValue = Integer.parseInt(doc.getContent().getContent(
        closing.getStartNode().getOffset(),
        closing.getEndNode().getOffset() - 1).toString());

    // A rise means the price started lower than it closed; a fall means it started higher
    int startValue = (rise ? closingValue - deltaValue : closingValue + deltaValue);

    // Build the features and add the new annotation over the whole matched phrase
    FeatureMap features = Factory.newFeatureMap();
    features.put("rule", "ShareChange");
    features.put("opening", startValue + "p");
    features.put("change", deltaValue + "p");
    features.put("closing", closingValue + "p");
    features.put("direction", (rise ? "up" : "down"));
    outputAS.add(change.firstNode(), change.lastNode(), "ShareChange", features);
  }
  catch (Exception e) {
    // ignore this for now: a match whose prices cannot be parsed (e.g. not in pence)
    // simply produces no ShareChange annotation
  }
}
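
The price arithmetic in the rule above can be tried outside GATE. The following is a minimal sketch in plain Java, assuming (as the exercise does) that every price is in pence and ends in 'p'; the class and method names are hypothetical and are not part of GATE or JAPE.

```java
class SharePriceSketch {

    /** Parses a price in pence such as "645p" by removing the trailing 'p'. */
    public static int parsePence(String text) {
        if (text == null || !text.endsWith("p")) {
            throw new IllegalArgumentException("Expected a price in pence, got: " + text);
        }
        return Integer.parseInt(text.substring(0, text.length() - 1));
    }

    /** Works back from the closing price to the starting price. */
    public static int startingPrice(int closing, int delta, boolean rise) {
        // If the price rose it started lower; if it fell it started higher.
        return rise ? closing - delta : closing + delta;
    }

    public static void main(String[] args) {
        // "Whitbread shares closed up 2p at 645p."
        int closing = parsePence("645p");
        int delta = parsePence("2p");
        System.out.println(startingPrice(closing, delta, true) + "p"); // prints 643p
    }
}
```

For the example sentence this gives an opening price of 643p, which is the value the rule would store in the "opening" feature.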