Transcript Slide 1
BIOINFORMATICS I, EXERCISE 2
http://icbi.at/bioinf
mRNA processing
splicing
Spliceosome assembly
[Figure: spliceosome assembly on the pre-mRNA (5' splice site GU, branch point A, 3' splice site YAG). Labels: snRNPs U1, U2, U4, U5, U6; U2AF; hnRNP; SR proteins; kinases and phosphatases; RNA helicases; cyclophilins; + ~200 non-snRNP proteins]
Different levels of regulation
Regulation of transcription
ChIP procedure
[Figure: PPAR and RXR (domains A/B, C, E/F) bound to a PPRE, e.g. AACTAGGTCAAAGGTCA]
Farnham, Nature Rev Genetics, 2009
DNA
microRNAs
http://www.mirbase.org/
Ensembl BioMart
UCSC Table Browser
Notepad++ and regular expressions
^>.*\r\n
^     beginning of line
>     the character ">"
.     any symbol
*     0 or more times
\r    carriage return (CR)
\n    line feed (LF)
Notepad++ and regular expressions
character   meaning
\           escape; used to make special characters non-special
()          group; you can retrieve its contents, e.g. with \1 for the first group
[]          any single character listed inside is considered a match
.           matches any character
*           match the previous element 0 or more times
+           match the previous element 1 or more times
{n}         match the previous element exactly n times
^           as the first character of the regex, means “beginning of line”; as the first character inside [], means “not”
$           as the last character of the regex, means “end of line”
\s          any whitespace character (space, tab)
\t          tab (-->)
\r          carriage return (CR)
\n          line feed (LF)
Notepad++ and regular expressions
find                  replace with
^>.*\r\n              (empty)
^[ACGT].*\r\n         (empty)
^(.{20}).*\r\n        \1\r\n
\r\n                  (empty)
>                     \r\n>
repeatMasking=none    \r\n
^>.*\r\n              (empty)
.*(.{20})$            \1
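Some of the find/replace patterns above can be tried outside Notepad++ as well. The following is a minimal sketch using Python's `re` module; the FASTA headers and sequences are made up for illustration, and the behaviour of Notepad++ may differ in details (for example, in Python `.` also matches a carriage return).

```python
import re

# Toy FASTA text with Windows line endings (CR LF).
# Headers and sequences are invented for illustration only.
fasta = (
    ">toy_region_1 repeatMasking=none\r\n"
    "ACGTACGTACGTACGTACGTTTTTTGGGGG\r\n"
    ">toy_region_2 repeatMasking=none\r\n"
    "GGGGGCCCCCAAAAATTTTTACGTACGTAC\r\n"
)

# Find ^>.*\r\n and replace with nothing: removes the header lines.
# re.MULTILINE makes ^ match at the beginning of every line.
sequences_only = re.sub(r"^>.*\r\n", "", fasta, flags=re.MULTILINE)

# Find ^(.{20}).*\r\n and replace with \1\r\n:
# keeps only the first 20 characters of each line.
first_20 = re.sub(r"^(.{20}).*\r\n", r"\1\r\n", sequences_only,
                  flags=re.MULTILINE)

# Find .*(.{20})$ and replace with \1: keeps the last 20 characters.
# Because "." also matches CR in Python, the pattern is applied to lines
# that splitlines() has already stripped of their CR LF endings.
last_20 = [re.sub(r".*(.{20})$", r"\1", line)
           for line in sequences_only.splitlines()]
```

Applying the patterns line by line for the last step is a design choice of this sketch, made to avoid the carriage return being counted as one of the 20 characters.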
Sequence Logo
http://icbi.at/logo
KEGG
Protein domains
Uniprot, Prosite, Interpro, Pfam, CD, SMART
Gene Ontology
The Gene Ontology project provides a controlled vocabulary to describe gene and gene
product attributes in any organism.
3 organizing principles
• cellular component (e.g. mitochondrion)
• biological process (e.g. lipid metabolism)
• molecular function (e.g. hydrolase activity)
Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn and a term name
Evidence code   Meaning
ISS             Inferred from Sequence Similarity
IEP             Inferred from Expression Pattern
IMP             Inferred from Mutant Phenotype
IGI             Inferred from Genetic Interaction
IPI             Inferred from Physical Interaction
IDA             Inferred from Direct Assay
RCA             Inferred from Reviewed Computational Analysis
TAS             Traceable Author Statement
NAS             Non-traceable Author Statement
IC              Inferred by Curator
ND              No biological Data available
Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)
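To illustrate what the DAG structure means in practice, here is a small sketch that collects all ancestors of a term by following `is_a` and `part_of` edges. The toy graph and its term names are assumptions chosen for illustration; they are not taken from the actual GO files, and the function name `ancestors` is hypothetical.

```python
# Toy ontology: each term maps to a list of (relation, parent) pairs.
# The terms and edges are illustrative only, not real GO content.
TOY_GO = {
    "mitochondrial membrane": [("part_of", "mitochondrion"),
                               ("is_a", "organelle membrane")],
    "organelle membrane": [("part_of", "organelle")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, graph):
    """Return all terms reachable from `term` via is_a / part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:  # a term can be reached via several paths
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial membrane", TOY_GO)))
```

Unlike in a tree, a term in a DAG may have more than one parent, which is why the sketch tracks already visited terms.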
Orthologs
Protein A
Homologs: A – B – C
Orthologs: B1 – C1
Paralogs: C1 – C2 – C3
Inparalogs: C2 – C3
Outparalogs: B2 – C1
Xenologs: A1 – AB1
Ortholog prediction
Ortholog databases
• YOGY (eukarYotic OrtholoGY), a web-based resource that integrates 5 independent resources (Sanger)
• COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)
• InParanoid (Stockholm Bioinformatics Center)
• HomoloGene (NCBI)
• OrthoMCL, which uses the Markov Clustering algorithm (University of Pennsylvania)
Multiple sequence alignment (CLUSTALW)
Progressive tree alignment
Jalview
Exercise 2-1: REGULATORY GENOMICS
Pyruvate Carboxylase as example
Ensembl Biomart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene
symbol, number of exons, Ensembl transcript ID, Ensembl gene ID, 3'UTR sequence as a
FASTA file, and length of the 3'UTR
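Once the 3'UTR has been exported as a FASTA file, its length can be checked with a few lines of code. The sketch below parses FASTA text and reports sequence lengths; the function name `read_fasta` and the example record are hypothetical and stand in for the file exported from BioMart.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record.
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up record for illustration; a real export would be read from a file.
example = ">toy_transcript 3'UTR\nACGTACGTAC\nGGGTTT\n"

for name, sequence in read_fasta(example).items():
    print(name, len(sequence))
```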
microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8
of the microRNA hsa-mir-182?
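The search for a site complementary to positions 2-8 can be sketched as follows. This is a simplified illustration that looks only for perfect Watson-Crick matches; the miRNA and UTR sequences are invented placeholders (the real ones would come from miRBase and BioMart), and the function names are hypothetical.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=2, end=8):
    """Return the DNA sequence that pairs with miRNA positions start..end."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # Reverse complement of the seed, written in the DNA alphabet
    # so that it can be searched in a UTR given as DNA.
    return seed.translate(RNA_COMPLEMENT)[::-1].replace("U", "T")


def find_seed_matches(utr, mirna):
    """Return 1-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos + 1)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return positions


toy_mirna = "UACGGAUCCAGUUCAGGCUAAU"   # invented, not a real miRNA
toy_utr = "TTTTGATCCGTAAAA"            # invented, not a real 3'UTR
print(seed_site(toy_mirna), find_seed_matches(toy_utr, toy_mirna))
```

Dedicated target prediction tools apply further criteria beyond a perfect seed match, so the result of such a scan is only a first indication.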
UCSC genome browser
1.3 Find the positions of the transcription start site and transcription end of pyruvate carboxylase
(NM_000920) in the hg19 assembly
Find splicing signals
1.4 Get the sequences (+10bp/-10bp) around the intron-exon borders and exon-intron borders
of pyruvate carboxylase using the UCSC Table Browser and Notepad++
1.5 For both cases, construct a sequence logo and a frequency plot. Can you identify
(regulatory) sequence motifs?
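The frequency plot underlying a sequence logo is based on per-position base frequencies of the aligned border sequences. A minimal sketch of that calculation is shown here; the four input sequences are invented examples of equal length, and the function names are hypothetical.

```python
from collections import Counter


def position_frequencies(sequences, alphabet="ACGT"):
    """Return one dict of base frequencies per alignment column."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        matrix.append({base: counts.get(base, 0) / total for base in alphabet})
    return matrix


def consensus(matrix):
    """Most frequent base per column (ties go to the first base in the alphabet)."""
    return "".join(max(column, key=column.get) for column in matrix)


# Invented sequences standing in for the extracted border regions.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
matrix = position_frequencies(toy_sites)
for position, column in enumerate(matrix, start=1):
    print(position, column)
print(consensus(matrix))
```

Columns in which a single base has a frequency close to 1 correspond to the tall letters in a sequence logo.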
Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse
cell line that the transcription factor Pparg binds near the pyruvate carboxylase
gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as a
custom track in the UCSC genome browser and extract its sequence.
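After extracting the sequence of the binding region, one can scan it for a PPRE-like element. The sketch below searches both strands for an exact DR1 motif (two AGGTCA half-sites separated by one nucleotide), as in the PPRE example AACTAGGTCAAAGGTCA from the slides. Requiring exact half-sites is a simplifying assumption, since real binding sites tolerate mismatches; the function names are hypothetical.

```python
import re

# DR1: AGGTCA, one arbitrary base, AGGTCA. The lookahead allows
# overlapping hits to be reported.
DR1 = re.compile(r"(?=(AGGTCA[ACGT]AGGTCA))")
MOTIF_LENGTH = 13
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(DNA_COMPLEMENT)[::-1]


def find_dr1(seq):
    """Return (1-based start on the forward strand, strand, match) tuples."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in DR1.finditer(seq)]
    for m in DR1.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand to forward coordinates.
        start = len(seq) - (m.start() + MOTIF_LENGTH) + 1
        hits.append((start, "-", m.group(1)))
    return hits


print(find_dr1("AACTAGGTCAAAGGTCA"))
```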
Exercise 2-2: PROTEIN FUNCTION
Identify function/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and
processes is this enzyme involved?
Show pathway maps and find the enzyme ID (EC number) using KEGG
Identify functional domains and the Gene Ontology annotation of the protein sequence
using Uniprot, Prosite, Pfam
Find orthologs and perform multiple sequence alignment
2.2 Find orthologous protein sequences in Mus musculus, Rattus norvegicus, and
Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and
visualize it with Jalview.