Advanced JAPE

Mark A. Greenwood
University of Sheffield NLP
Recap
• Installed and run GATE
• Understood the idea of
  - LR – Language Resources
  - PR – Processing Resources
• ANNIE
  - Understood the goals of information extraction
  - Loaded ANNIE into GATE
  - Constructed one or more gazetteer lists
• Created JAPE rules with simple RHS
Overview
• Simple RHS Limitations
• The RHS API
• Accessing Annotations and Features
• Adding New Annotations
• Hands-On
Simple RHS Limitations
• The simple RHS of a JAPE rule can only add simple annotations and features
  - Feature values are hard-coded or can be copied from annotations matched by the LHS
• You may need more complex processing
  - Removing temporary annotations
  - Building complex features
  - ...
• Fortunately the RHS of a rule can consist of arbitrary Java code – the possibilities are endless!
The RHS API
• Java code provided as a RHS is used as the body of this method:
    public void doit(Document doc,
                     Map bindings,
                     AnnotationSet annotations,
                     AnnotationSet inputAS,
                     AnnotationSet outputAS,
                     Ontology ontology) throws JapeException
• This provides easy access to the document, rule bindings and annotations.
Do NOT use the annotations parameter: it is deprecated!
Accessing Annotations and Features
• Each labelled section of the LHS results in an Annotation Set
• These Annotation Sets can be retrieved from the bindings map
    AnnotationSet set = (AnnotationSet)bindings.get("labelname");
Accessing Annotations and Features
• When writing complex JAPE you will often need to access annotation features
• All features of an annotation are stored in a map
    FeatureMap map = annotation.getFeatures();
• Each feature is accessed by name
    Object obj = map.get("featurename");
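Because get returns a plain Object, and returns null when the feature is absent, RHS code usually checks the value before casting it. The sketch below illustrates that pattern in plain Java, using a java.util.HashMap as a stand-in for the feature map (GATE's FeatureMap is itself a Map). The class name FeatureAccess and the helper stringFeature are hypothetical and are not part of GATE.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: a plain Map plays the role of a GATE FeatureMap.
class FeatureAccess {

    // Hypothetical helper: returns the feature as a String, or the
    // fallback when the feature is missing or is not a String.
    static String stringFeature(Map<Object, Object> features,
                                String name, String fallback) {
        Object value = features.get(name);
        return (value instanceof String) ? (String) value : fallback;
    }

    public static void main(String[] args) {
        Map<Object, Object> features = new HashMap<>();
        features.put("majorType", "change");
        features.put("length", 5); // feature values need not be Strings

        // Present String feature, missing feature, non-String feature
        System.out.println(stringFeature(features, "majorType", "unknown"));
        System.out.println(stringFeature(features, "minorType", "unknown"));
        System.out.println(stringFeature(features, "length", "unknown"));
    }
}
```

Inside a real RHS the same check could be applied to the map returned by annotation.getFeatures(), so a missing feature does not cause a NullPointerException.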
Adding New Annotations
• New annotations should always be created in the outputAS
• To create an annotation you need
  - The annotation name
  - The start and end offset
  - A FeatureMap instance (can be empty)
    outputAS.add(start, end, label, features);
Shorthand Notation for Java RHS
• Where a Java block refers to a single left-hand-side binding, JAPE provides a shorthand notation:
    Rule: RemoveDoneFlag
    (
      {Instance.flag == "done"}
    ):inst -->
    :inst{
      Annotation theInstance = (Annotation)instAnnots.iterator().next();
      theInstance.getFeatures().remove("flag");
    }
Shorthand Notation for Java RHS
• A label :<label> on a Java block creates a local variable <label>Annots within the Java block, which is the AnnotationSet bound to the <label> label.
• The Java code in the block is only executed if there is at least one annotation bound to the label
Hands On: Extending the IE Example
• In the previous JAPE session you wrote a rule to annotate phrases such as
  - Whitbread shares closed up 2p at 645p.
• Annotating the phrase is useful, but it contains further information that would be useful to extract as features
  - Starting price
  - Change in price
  - Closing price
Hands On: Extending the IE Example
• You will need to
  - Extract the closing price and change
    • Assume they are always in pence, so you can get the value by removing the trailing ‘p’
  - Get the minorType of the Lookup
  - Calculate the starting price
  - Create a new annotation with these values as features
Your turn!
Feel free to refer to the User Guide and to ask for help.
Hands On: Extending the IE Example
Phase: Shares
Input: Token Organization Lookup Money
Options: control = appelt

Rule: ShareChange
(
  {Organization}
  ({Token})[0,3]
  ({Lookup.majorType == "change"}):lookup
  ({Token})[0,3]
  ({Money}):delta
  {Token.string == "at"}
  ({Money}):closing
):change -->
{
  try {
    // Annotations bound to the labels on the LHS
    AnnotationSet change = (AnnotationSet)bindings.get("change");
    Annotation lookup = ((AnnotationSet)bindings.get("lookup")).iterator().next();
    Annotation delta = ((AnnotationSet)bindings.get("delta")).iterator().next();
    Annotation closing = ((AnnotationSet)bindings.get("closing")).iterator().next();

    // The minorType of the Lookup says whether the price rose or fell
    boolean rise = lookup.getFeatures().get("minorType").equals("Changes-up");

    // Text of each Money annotation without its trailing 'p'
    String deltaText = doc.getContent().getContent(
        delta.getStartNode().getOffset(),
        delta.getEndNode().getOffset() - 1).toString();
    String closingText = doc.getContent().getContent(
        closing.getStartNode().getOffset(),
        closing.getEndNode().getOffset() - 1).toString();
    int deltaValue = Integer.parseInt(deltaText);
    int closingValue = Integer.parseInt(closingText);

    // A rise means the shares started lower than they closed
    int startValue = (rise ? closingValue - deltaValue : closingValue + deltaValue);

    FeatureMap features = Factory.newFeatureMap();
    features.put("rule", "ShareChange");
    features.put("opening", startValue + "p");
    features.put("change", deltaValue + "p");
    features.put("closing", closingValue + "p");
    features.put("direction", (rise ? "up" : "down"));

    // The new annotation spans everything matched by the :change label
    outputAS.add(change.firstNode(), change.lastNode(), "ShareChange", features);
  }
  catch (Exception e) {
    // ignore this for now
  }
}
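
The price arithmetic used in the rule above can be tried in isolation, without GATE. The following is a minimal plain-Java sketch, assuming (as the exercise does) that prices are whole pence written with a trailing 'p'. The class name SharePrice and its two methods are hypothetical helpers for illustration only.

```java
// Plain-Java sketch of the price arithmetic; no GATE classes are used.
class SharePrice {

    // Strips the trailing 'p' and parses the rest, e.g. "645p" -> 645.
    // Assumes whole pence, as the exercise does.
    static int parsePence(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("p")) {
            throw new IllegalArgumentException("Not a pence amount: " + text);
        }
        return Integer.parseInt(trimmed.substring(0, trimmed.length() - 1));
    }

    // A rise means the shares started lower than they closed;
    // a fall means they started higher.
    static int startingPrice(int closing, int delta, boolean rise) {
        return rise ? closing - delta : closing + delta;
    }

    public static void main(String[] args) {
        // "Whitbread shares closed up 2p at 645p."
        int delta = parsePence("2p");
        int closing = parsePence("645p");
        boolean rise = true; // the Lookup's minorType was "Changes-up"
        System.out.println(startingPrice(closing, delta, rise) + "p"); // prints 643p
    }
}
```

For the example sentence, a 2p rise to a close of 645p gives a starting price of 643p. Had the shares closed down 2p at 645p, the starting price would be 647p.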