Data generation - Bioinformatics
Comparative transcriptomics of fungi
Group Nicotiana
Daan van Vliet, Dou Hu, Joost de Jong, Krista Kokki
Research objective
To study differences in gene expression between related fungal species
Criteria for the studied species:
- Reference genome available
- RNA reads > 100 bp
- Preferably paired-end reads
- Related species (or at least single-celled and eukaryotic)
- Sampled under similar conditions
Data generation
- Quality control and cleaning of reads: FastQC (cut-off value: 25)
- Mapping reads: TopHat
- Assembling and quantifying transcripts: Cufflinks
- Extracting the transcripts: gffread
Data processing
Extract the top 100 expressed transcripts
- Unix command pipeline within Perl (sort, head)
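The sort-and-head step can be sketched as follows. This is a minimal Python illustration, not the group's Perl pipeline. It assumes a tab-separated Cufflinks-style tracking table with `tracking_id` and `FPKM` columns; the function name `top_expressed` is hypothetical.

```python
import csv
import io


def top_expressed(tracking_text, n=100):
    """Return the n (transcript_id, FPKM) pairs with the highest FPKM.

    Assumes a tab-separated table with a header row containing
    'tracking_id' and 'FPKM' columns (an assumption about the input).
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    rows = [(row["tracking_id"], float(row["FPKM"])) for row in reader]
    # Equivalent of `sort` (descending by FPKM) followed by `head -n`.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    example = "tracking_id\tFPKM\nT1\t5.0\nT2\t120.5\nT3\t33.1\n"
    for transcript_id, fpkm in top_expressed(example, n=2):
        print(transcript_id, fpkm)
```

Sorting in memory is adequate for a single transcriptome; for very large tables a heap-based selection would avoid sorting every row.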
Determine GC content and length of transcripts
- Perl subroutines
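What such subroutines might compute can be sketched in Python (the slides use Perl). The sketch assumes the transcripts are available as FASTA text, as gffread can produce; the names `read_fasta` and `gc_content` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    fasta = ">t1 example\nATGC\nGGCC\n>t2\nATAT\n"
    for name, seq in read_fasta(fasta).items():
        print(name, len(seq), gc_content(seq))
```

Ambiguous bases such as N count towards the length but not towards GC here; whether to exclude them from the denominator is a design choice the slides do not specify.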
Data processing
Determine intron length
- Perl: intron length = length of the transcript minus the summed length of its exons (exon coordinates as specified in the Cufflinks output)
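The subtraction above can be illustrated with a short Python sketch (not the group's Perl code). It assumes exon coordinates are 1-based and inclusive, as in GTF, and that the exons of one transcript do not overlap; `intron_length` is a hypothetical name.

```python
def intron_length(exons):
    """Total intron length of one transcript.

    exons: list of (start, end) pairs, 1-based inclusive coordinates,
    assumed not to overlap. The genomic span of the transcript runs from
    the first exon start to the last exon end.
    """
    if not exons:
        return 0
    span_start = min(start for start, _ in exons)
    span_end = max(end for _, end in exons)
    span = span_end - span_start + 1
    exonic = sum(end - start + 1 for start, end in exons)
    return span - exonic


if __name__ == "__main__":
    # Two 100 bp exons separated by a 100 bp intron.
    print(intron_length([(100, 199), (300, 399)]))
```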
Codon usage
- Perl: loop to analyse each transcript codon by codon
- Count the frequencies of the codons and store them in an array
- Calculate a codon usage index (such as the effective number of codons, Nc)
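The counting loop and one possible index can be sketched in Python (the slides use Perl). The sketch assumes the standard genetic code and in-frame coding sequences. It follows Wright's definition of Nc in a simplified form: amino acids with fewer than two observed codons are skipped, and `None` is returned when a degeneracy class has no usable amino acid, without the fallback estimates a complete implementation would add. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count codons in an in-frame coding sequence, codon by codon."""
    cds = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in CODON_TABLE:  # skips codons with ambiguous bases
            counts[codon] += 1
    return counts


def effective_number_of_codons(counts):
    """Simplified Nc: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    by_amino_acid = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_amino_acid.setdefault(aa, []).append(counts.get(codon, 0))
    homozygosity = {}  # degeneracy class -> list of F values
    for observed in by_amino_acid.values():
        k, n = len(observed), sum(observed)
        if k == 1 or n < 2:
            continue  # Met and Trp, or too few observations
        f = (n * sum((x / n) ** 2 for x in observed) - 1) / (n - 1)
        if f > 0:
            homozygosity.setdefault(k, []).append(f)
    if any(k not in homozygosity for k in (2, 3, 4, 6)):
        return None  # simplification: no fallback for missing classes
    mean = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    nc = 2 + 9 / mean[2] + 1 / mean[3] + 5 / mean[4] + 3 / mean[6]
    return min(nc, 61.0)
```

Nc ranges from 20 (one codon used per amino acid) to 61 (all synonymous codons used equally), so lower values indicate stronger codon usage bias.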
Validation
- Run scripts on small, example datasets
Statistical analysis and visualization
Use R to determine and visualize correlation coefficients
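As an illustration of the statistic involved, a correlation coefficient between expression level and a transcript feature can be computed as below. This Python sketch stands in for the R analysis and assumes a Pearson coefficient; the slides do not say which coefficient was used. `pearson_r` is a hypothetical name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: expression level versus transcript length.
    expression = [10.0, 20.0, 30.0, 40.0]
    length = [900, 1500, 1400, 2100]
    print(pearson_r(expression, length))
```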
Expression level     Species 1   Species 2
Transcript length    0.25*       -0.10*
GC content           0.00        0.02
Codon usage bias     0.30*       -0.32*