Biological Expression Language Overview

August 2012
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy
of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative
Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Contents
• BEL Statements
• BEL Statement Annotations
• BEL Terms
• BEL Functions
• BEL Relationships
• General Hints
BEL Statements
• Basic statement types:
– Term Expression, Relationship, Term Expression
• p(HGNC:CCND1) directlyIncreases kin(p(HGNC:CDK4))
– Term Expression alone
• complex(p(HGNC:CCND1), p(HGNC:CDK4))
• Anatomy of a statement: a(CHEBI:corticosteroid) -> path(MESHD:"Insulin Resistance")
– a(CHEBI:corticosteroid) (term expression): the abundance of molecules designated by the name “corticosteroid” in the CHEBI namespace
– -> (relationship): the short form of increases
– path(MESHD:"Insulin Resistance") (term expression): the pathology designated by the name “Insulin Resistance” in the MESHD namespace
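The arrow is shorthand for a named relationship. As a rough illustration, the hypothetical helper below (not part of any BEL toolkit) rewrites the short forms that appear in this overview (->, =>, -|, =|) into their long names. It assumes each short form stands alone between spaces.

```python
import re

# Short relationship forms used in this overview, mapped to their long names.
SHORT_FORMS = {
    "->": "increases",
    "=>": "directlyIncreases",
    "-|": "decreases",
    "=|": "directlyDecreases",
}

# Match a short form only when whitespace surrounds it, so characters
# inside terms are left alone.
_SHORT_RE = re.compile(r"(?<=\s)(->|=>|-\||=\|)(?=\s)")


def expand_relationships(statement: str) -> str:
    """Replace relationship short forms with their long names."""
    return _SHORT_RE.sub(lambda m: SHORT_FORMS[m.group(1)], statement)


print(expand_relationships('a(CHEBI:corticosteroid) -> path(MESHD:"Insulin Resistance")'))
# prints: a(CHEBI:corticosteroid) increases path(MESHD:"Insulin Resistance")
```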
• Complex statement type:
– A causal statement can be used as the object (target term) of another causal statement: Term Expression, Causal Relationship, (Causal Statement)
• p(HGNC:CLSPN) -> (kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P)))
– Here -> is increases, => is directlyIncreases, and the nested statement is enclosed in parentheses
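The nesting can be made explicit by parsing a statement into a (subject, relationship, object) tuple. The sketch below is a hypothetical helper, not the OpenBEL parser. It assumes the subject is a single term, the relationship is one whitespace-delimited token, and a parenthesized object is a nested statement.

```python
def parse_statement(statement):
    """Parse a BEL statement into (subject, relationship, object).

    A term-only statement returns (term, None, None). An object wrapped in
    parentheses is treated as a nested statement and parsed recursively.
    """
    statement = statement.strip()
    depth = 0
    in_quotes = False
    for i, ch in enumerate(statement):
        if ch == '"':
            in_quotes = not in_quotes  # ignore parentheses inside quoted values
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
            if depth == 0:  # the first top-level term (the subject) ends here
                subject = statement[: i + 1]
                rest = statement[i + 1 :].strip()
                if not rest:
                    return subject, None, None
                relationship, _, obj = rest.partition(" ")
                obj = obj.strip()
                if obj.startswith("(") and obj.endswith(")"):
                    obj = parse_statement(obj[1:-1])  # nested statement
                return subject, relationship, obj
    raise ValueError(f"no complete term found in {statement!r}")


print(parse_statement("p(HGNC:CLSPN) -> (kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P)))"))
```

The same function also accepts the basic statement types: a term expression alone comes back with None for the relationship and object.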
BEL Statement Annotations
• Annotations, declared with SET, provide context (citation, supporting evidence, species, tissue, disease) for one or more BEL statements
SET Citation = {"PubMed", "J Mol Med", "12682725", "200303-14","Limbourg FP|Liao JK",""}
SET Evidence = "high-dose steroid treatment decreases vascular inflammation and ischemic tissue damage after myocardial infarction and stroke through direct vascular effects involving the nontranscriptional activation of eNOS"
SET Species = "9606"
SET Tissue = "Vascular System"
SET Disease = "Stroke"
a(CHEBI:corticosteroid) -| bp(MESHD:"Inflammation")
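To show how annotations attach to the statements that follow them, here is a minimal sketch under simplifying assumptions. The helper name is hypothetical. It assumes one SET, UNSET, or statement per line, and that a SET value stays in effect until the same annotation is set again or unset. It is not a full BEL Script parser.

```python
import re

_SET_RE = re.compile(r"^SET\s+(\w+)\s*=\s*(.+)$")
_UNSET_RE = re.compile(r"^UNSET\s+(\w+)$")


def attach_annotations(lines):
    """Pair each statement line with the annotations in effect at that point.

    Values are kept as raw strings with surrounding double quotes removed.
    """
    current = {}
    annotated = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        set_match = _SET_RE.match(line)
        unset_match = _UNSET_RE.match(line)
        if set_match:
            name, value = set_match.groups()
            current[name] = value.strip().strip('"')
        elif unset_match:
            current.pop(unset_match.group(1), None)
        else:
            # Copy the dict so later SETs do not alter earlier statements' context.
            annotated.append((line, dict(current)))
    return annotated


script = [
    'SET Species = "9606"',
    'SET Disease = "Stroke"',
    'a(CHEBI:corticosteroid) -| bp(MESHD:"Inflammation")',
]
print(attach_annotations(script))
```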
BEL Terms
function(ns:value)
• BEL terms minimally have the following components:
– Function
• Required
• Can be nested to create complex terms
– Namespace Abbreviation
• Optional
– Value
• Required
• Generally found in the referenced namespace
• BEL terms that use values from different namespaces can be recognized as equivalent
• Components of two example terms:
– Function: a(CHEBI:corticosteroid) uses abundance(); path(MESHD:"Insulin Resistance") uses pathology()
– Namespace abbreviation: CHEBI and MESHD
– Namespace value: corticosteroid and "Insulin Resistance"
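The three components can be pulled apart mechanically for simple, single-argument terms. The regular expression below is an illustrative assumption, not the BEL grammar. Nested terms such as p(HGNC:AKT1, pmod(P)) are deliberately rejected.

```python
import re

# function(ns:value) or function(value); the value may be double-quoted.
# Only single-argument terms are handled; nested terms are out of scope.
_TERM_RE = re.compile(r'^(\w+)\((?:(\w+):)?("[^"]*"|[^(),"]+)\)$')


def parse_simple_term(term):
    """Split a simple BEL term into (function, namespace, value)."""
    match = _TERM_RE.match(term.strip())
    if match is None:
        raise ValueError(f"not a simple function(ns:value) term: {term!r}")
    function, namespace, value = match.groups()
    return function, namespace, value.strip('"')


print(parse_simple_term('path(MESHD:"Insulin Resistance")'))
# prints: ('path', 'MESHD', 'Insulin Resistance')
```

Because the namespace abbreviation is optional, a term without one returns None in that position.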
Equivalence of Terms
• p(EG:207): “the abundance of the protein designated by EntrezGene id 207” (human AKT1)
• p(SPAC:P31749): “the abundance of the protein designated by Swiss-Prot id P31749” (human AKT1)
• p(HGNC:AKT1): “the abundance of the protein designated by HGNC gene symbol ‘AKT1’” (human AKT1)
• All three can unify to p(HGNC:AKT1) in the KAM (Knowledge Assembly Model)
• Terms are unified during compilation using information in the BEL namespace equivalence documents
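Unification can be pictured as a lookup from each (namespace, value) pair to a preferred pair. In the sketch below, a small dictionary built from the AKT1 example stands in for the equivalence documents. This is an assumption for illustration only: the real documents and the compiler's algorithm are not modeled here.

```python
# Hypothetical stand-in for the namespace equivalence documents: each
# (namespace, value) pair maps to the preferred (namespace, value).
EQUIVALENCES = {
    ("EG", "207"): ("HGNC", "AKT1"),
    ("SPAC", "P31749"): ("HGNC", "AKT1"),
}


def unify_protein(namespace, value):
    """Return the unified p() term for a namespace value."""
    ns, val = EQUIVALENCES.get((namespace, value), (namespace, value))
    return f"p({ns}:{val})"


terms = [("EG", "207"), ("SPAC", "P31749"), ("HGNC", "AKT1")]
print({unify_protein(ns, val) for ns, val in terms})  # prints {'p(HGNC:AKT1)'}
```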
BEL Functions
• Types of functions:
– Abundances
– Processes
– Modifications of abundances
– Activities
– Transformations
– List functions
• Abundances and processes are applied directly to namespace values
• All other functions are applied to abundance functions!
BEL Functions - Abundances
• Abundances
– abundance(), a()
– geneAbundance(), g()
– rnaAbundance(), r()
– microRNAAbundance(), m()
– proteinAbundance(), p()
– complexAbundance(), complex()
– compositeAbundance(), composite()
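Each abundance function has a long and a short name, and both forms are used in the examples. A lookup table makes the pairing explicit. The sketch below is a hypothetical helper covering the abundance functions, including proteinAbundance(), p(), which the examples use throughout. Matching is case sensitive.

```python
# Long and short forms of the BEL v1.0 abundance functions.
ABUNDANCE_FUNCTIONS = {
    "abundance": "a",
    "geneAbundance": "g",
    "rnaAbundance": "r",
    "microRNAAbundance": "m",
    "proteinAbundance": "p",
    "complexAbundance": "complex",
    "compositeAbundance": "composite",
}
# Reverse lookup from short form to long form.
SHORT_TO_LONG = {short: long for long, short in ABUNDANCE_FUNCTIONS.items()}


def long_form(function_name):
    """Return the long name of an abundance function given either form.

    Raises KeyError for names that are not abundance functions.
    """
    if function_name in ABUNDANCE_FUNCTIONS:
        return function_name
    return SHORT_TO_LONG[function_name]


print(long_form("g"))  # prints geneAbundance
```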
abundance(), a()
• Use abundance() to represent any abundances that
are not represented by a more specific abundance
type, including:
– Chemicals
• a(CHEBI:corticosteroid)
– Cellular structures
• a(GOCCTERM:"astral microtubule")
• No modification functions apply to abundance terms
• Generally, activity functions do not apply to
abundance terms
geneAbundance(), g()
• Use geneAbundance terms to represent DNA
– Can be used to represent gene amplification and deletion events
– Used in "gene scaffolding"
• g(HGNC:AKT1) transcribedTo r(HGNC:AKT1)
– Use in complexes to represent binding to promoters
• complex(p(HGNC:TP53), g(HGNC:CDKN1A))
• In BEL v1.0, the only modification function that can
be applied to gene abundances is fusion()
– g(HGNC:TMPRSS2,fusion(HGNC:ERG))
• No activity functions apply to geneAbundance terms
complexAbundance(), complex()
• Use complexAbundance() to represent molecular
complexes and binding events
• complexAbundance terms can take two forms:
– complexAbundance(ns:value)
• Used for named complexes
• E.g., complexAbundance(NCH:"AP-1 Complex")
– complexAbundance(<abundance term list>)
• Use to represent binding events or to define complexes by
components
• Unordered list
• E.g., complex(p(HGNC:FOS),p(HGNC:JUN))
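Because the component list of complex() is unordered, two terms that list the same components in different orders describe the same complex. One way to compare such terms is to sort the components into a canonical form. The helper below is a hypothetical sketch. It splits on top-level commas and does not handle commas inside quoted values.

```python
def canonical_complex(term):
    """Sort the components of complex(<abundance term list>).

    Named complexes such as complex(NCH:"AP-1 Complex") have a single
    argument and come back unchanged.
    """
    prefix = "complex("
    if not (term.startswith(prefix) and term.endswith(")")):
        raise ValueError(f"not a complex() term: {term!r}")
    parts, depth, current = [], 0, ""
    for ch in term[len(prefix):-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:  # top-level comma separates components
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return prefix + ", ".join(sorted(parts)) + ")"


print(canonical_complex("complex(p(HGNC:JUN),p(HGNC:FOS))"))
# prints: complex(p(HGNC:FOS), p(HGNC:JUN))
```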
compositeAbundance(), composite()
• Use to represent cases where multiple abundances
synergize to produce an effect
– Composite terms should not be used if any of the
abundances alone are reported to cause the effect
– Use composite terms only as subjects of statements
– E.g., composite(p(HGNC:TGFB1), p(HGNC:IL6))
BEL Functions - Processes
• Processes include biological phenomena that occur
at the level of the cell or organism
– biologicalProcess(), bp()
• E.g., bp(GO:"cellular senescence")
– pathology(), path()
• E.g., path(MESHD:"Muscle Hypotonia")
BEL Functions – Abundance Modifications
• Modifications are functions used as arguments within
abundance functions
• Currently supported modification types are:
– Variants - use to represent protein sequence variants, generally
resulting from a mutation or polymorphism
• substitution(), sub(); truncation(), trunc(); fusion(), fus()
• E.g., p(HGNC:PIK3CA, sub(E, 545, K))
– PIK3CA protein with glutamic acid 545 substituted with a lysine
– Protein Modifications - use to represent post-translational
modifications of proteins
• Includes phosphorylation, ubiquitination, acetylation, glycosylation
• proteinModification(), pmod()
• E.g., p(HGNC:HIF1A, pmod(H, N, 803))
– Modification of HIF1A by hydroxylation at amino acid asparagine 803
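The positional arguments of sub() and pmod() can be read back into prose, as in the two descriptions above. The sketch below is illustrative only. Its lookup tables are deliberately partial and cover just the codes used in these examples (E, K, N for amino acids; P and H for modification types). The helper name is hypothetical.

```python
import re

# Partial lookup tables covering only the codes used in these examples.
AMINO_ACIDS = {"E": "glutamic acid", "K": "lysine", "N": "asparagine"}
PMOD_TYPES = {"P": "phosphorylation", "H": "hydroxylation"}

# sub(reference, position, variant)
_SUB_RE = re.compile(r"sub\(\s*([A-Z])\s*,\s*(\d+)\s*,\s*([A-Z])\s*\)")
# pmod(type) or pmod(type, amino acid, position)
_PMOD_RE = re.compile(r"pmod\(\s*([A-Z])\s*(?:,\s*([A-Z])\s*,\s*(\d+)\s*)?\)")


def describe_modifications(term):
    """Describe the sub() and pmod() arguments found in a protein term."""
    descriptions = []
    for ref, pos, var in _SUB_RE.findall(term):
        descriptions.append(
            f"{AMINO_ACIDS.get(ref, ref)} {pos} substituted with {AMINO_ACIDS.get(var, var)}"
        )
    for kind, residue, pos in _PMOD_RE.findall(term):
        text = PMOD_TYPES.get(kind, kind)
        if residue:  # residue and position are optional in pmod()
            text += f" at {AMINO_ACIDS.get(residue, residue)} {pos}"
        descriptions.append(text)
    return descriptions


print(describe_modifications("p(HGNC:HIF1A, pmod(H, N, 803))"))
# prints: ['hydroxylation at asparagine 803']
```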
BEL Functions - Activities
• Activity functions are applied to protein, complex, and RNA
abundances to specify the frequency of events resulting from
the molecular activity of the abundance
– E.g., tport(complex(NCH:"EnaC Complex"))
• Transporter activity of the EnaC sodium channel complex
• Distinguishing activity from abundance is particularly useful for proteins whose activities are regulated by post-translational modification
• BEL v1.0 supports 10 distinct activity functions:
– catalyticActivity, peptidaseActivity, gtpBoundActivity, transportActivity,
chaperoneActivity, transcriptionalActivity, molecularActivity,
kinaseActivity, phosphataseActivity, ribosylaseActivity
• molecularActivity() should be used to represent activities that
are not represented by a more specific function
BEL Functions - Transformations
• Transformations are events in which one class of
abundance is transformed or changed into a second
class of abundance
– Translocations
• translocation(), tloc()
• cellSecretion(), sec()
• cellSurfaceExpression(), surf()
– Reactions
• reaction(), rxn()
– Degradation
• degradation(), deg()
translocation(), tloc()
• Use translocation terms to represent the movement
of abundances from one cellular location to another
• E.g., tport(complex(NCH:"EnaC Complex")) => \
tloc(a(CHEBI:"sodium(1+)"), MESHCL:"Extracellular Space", \
MESHCL:"Intracellular Space")
– The transport activity of the EnaC Complex translocates sodium ions from the extracellular space to the intracellular space
cellSecretion(), sec()
cellSurfaceExpression(), surf()
• sec() and surf() are convenience functions for two commonly used translocations: secretion of an abundance from the cell and expression of an abundance at the cell surface
degradation(), deg()
• Generally used to indicate complete proteolysis of a
protein
• Do not use to indicate proteolysis which results in
functional cleavage products!
• During compilation Phase I, degradation nodes are
linked to the root abundance with a
directlyDecreases relationship
– E.g., deg(p(HGNC:MAPT))
– Compilation adds:
deg(p(HGNC:MAPT)) =| p(HGNC:MAPT)
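The Phase I rule above can be mimicked with a short scan. For every deg() term in a statement, emit deg(X) =| X. This hypothetical helper only illustrates the rule and is not the compiler's code. It matches the short form deg() and does not handle parentheses inside quoted values.

```python
def degradation_edges(statement):
    """Return the deg(X) =| X statements implied by each deg() term."""
    edges = []
    start = statement.find("deg(")
    while start != -1:
        depth = 0
        # Walk from the opening parenthesis of deg( to its matching close.
        for end in range(start + len("deg"), len(statement)):
            if statement[end] == "(":
                depth += 1
            elif statement[end] == ")":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError(f"unbalanced parentheses in {statement!r}")
        inner = statement[start + len("deg("):end]
        edges.append(f"deg({inner}) =| {inner}")
        start = statement.find("deg(", end)
    return edges


print(degradation_edges("deg(p(HGNC:MAPT))"))
# prints: ['deg(p(HGNC:MAPT)) =| p(HGNC:MAPT)']
```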
BEL Functions – List Functions
• List functions are used for:
– Protein family assignment
• p(PFH:"Cu-Zn SOD Family") hasMembers list(p(HGNC:SOD1), p(HGNC:SOD3))
– Complex component assignment
• complex(GOCCTERM:"gamma-secretase complex") hasComponents \
list(p(HGNC:PSEN1),p(HGNC:NCSTN),p(HGNC:APH1A),p(HGNC:PSEN2))
– Reactants and Products within a reaction term
• rxn(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide")))
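One way to read a hasMembers or hasComponents statement is as shorthand for one hasMember or hasComponent statement per list item. The sketch below assumes that reading. Both helper names are hypothetical, and commas inside quoted values are not handled.

```python
def split_list(list_term):
    """Split 'list(t1, t2, ...)' into its top-level terms."""
    if not (list_term.startswith("list(") and list_term.endswith(")")):
        raise ValueError(f"not a list() term: {list_term!r}")
    items, depth, current = [], 0, ""
    for ch in list_term[len("list("):-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:  # top-level comma separates items
            items.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        items.append(current.strip())
    return items


SINGULAR = {"hasMembers": "hasMember", "hasComponents": "hasComponent"}


def expand_group_statement(subject, relationship, list_term):
    """Rewrite a hasMembers/hasComponents statement as single-item statements."""
    single = SINGULAR[relationship]
    return [f"{subject} {single} {item}" for item in split_list(list_term)]


for s in expand_group_statement('p(PFH:"Cu-Zn SOD Family")', "hasMembers",
                                "list(p(HGNC:SOD1), p(HGNC:SOD3))"):
    print(s)
```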
BEL Relationships
• Causal relationships
– increases, directlyIncreases, decreases, directlyDecreases,
rateLimitingStepOf, causesNoChange
• Correlative relationships
– negativeCorrelation, positiveCorrelation, association
• Biomarker relationships
– biomarkerFor, prognosticBiomarkerFor
• Assignment to groups
– hasMember, hasComponent, hasMembers, hasComponents
• Other
– isA, subProcessOf
• Genomic relationships
– transcribedTo, translatedTo, orthologousTo
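The relationship groups above can be captured as a small data structure, for example to check which statements are causal. The table below mirrors the slide's grouping. The helper name is hypothetical. Lookups are case sensitive, matching the general BEL hint on case.

```python
# Relationship groups as listed above (names are case sensitive).
RELATIONSHIP_GROUPS = {
    "causal": {"increases", "directlyIncreases", "decreases",
               "directlyDecreases", "rateLimitingStepOf", "causesNoChange"},
    "correlative": {"negativeCorrelation", "positiveCorrelation", "association"},
    "biomarker": {"biomarkerFor", "prognosticBiomarkerFor"},
    "group assignment": {"hasMember", "hasComponent", "hasMembers", "hasComponents"},
    "other": {"isA", "subProcessOf"},
    "genomic": {"transcribedTo", "translatedTo", "orthologousTo"},
}


def relationship_group(name):
    """Return the group a relationship belongs to, or raise KeyError."""
    for group, names in RELATIONSHIP_GROUPS.items():
        if name in names:
            return group
    raise KeyError(name)


print(relationship_group("positiveCorrelation"))  # prints correlative
```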
BEL Relationships – Compiler Inserted Relationships
• These relationships are not needed for creating BEL statements
– Used only by the compiler
• actsIn
• hasModification
• hasProduct
• hasVariant
• reactantIn
• translocates
• includes
General BEL Hints
• BEL functions, relationships, and namespace values
are all case sensitive
• Every term must have a function
– Namespace values are always associated with an
abundance or process function
– Exception - cellular location values within a translocation
function
• Namespace values with spaces or unusual characters
require quotes
– E.g., complex(GOCCTERM:"gamma-secretase complex")
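The quoting hint can be applied automatically when generating BEL text. The sketch below is a hypothetical helper that assumes values made only of ASCII letters, digits, and underscores may stay unquoted and quotes everything else. The exact set of characters BEL accepts unquoted is not specified here, so treat that rule as an assumption.

```python
import re

# Assumption: only ASCII letters, digits, and underscores may stay unquoted;
# anything else (spaces, hyphens, parentheses, ...) triggers quoting.
_PLAIN_VALUE = re.compile(r"[A-Za-z0-9_]+")


def format_namespace_value(namespace, value):
    """Render ns:value, adding double quotes when the value needs them."""
    if '"' in value:
        raise ValueError("values containing double quotes are not handled here")
    if _PLAIN_VALUE.fullmatch(value) is None:
        value = f'"{value}"'
    return f"{namespace}:{value}"


print(format_namespace_value("GOCCTERM", "gamma-secretase complex"))
# prints: GOCCTERM:"gamma-secretase complex"
```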