Bioinformatics Institute work with
ASAS Genomics Centre
By Dan Jones
What’s going on in the genomics scene?
Who are we?
What do we provide?
Case studies of where we’ve added value!
The genomics scene
Worldwide:
• Huge growth in sequencing and analysis capabilities (the “$1000
genome”); new technology types emerging
• NZ scientists need easy access to these capabilities
In New Zealand:
• “Big data” is becoming more common, particularly in health research
• Specialised projects are across a wide range of areas: medicine,
agriculture, horticulture, NZ flora/fauna
• Clinical sequencing is on the rise
What does all this mean?
The genomics field is changing fast!
To stay relevant, you need access to
• Excellent experimental design
• Genomics experts (like Kristine, Tim, and Liam)
• Bioinformatics experts (that’s us!)
• Computational platforms
• Assistance with turning your research into outputs
Who are we?
• The Bioinformatics Institute
o A Faculty of Science centre
• Four bioinformaticians available to help you at UoA
• We work with the ASAS Genomics Centre on
experimental design, analysis, and more
• We also work via NZGL with other genomics experts,
facilities, and bioinformaticians around NZ
What help can we provide?
• Accessible, customised genomics solutions
o Everything from design and data collection through
to analysis and training
• End-to-end service with expert help at all points,
including with research outputs
• Software and IT resources with secure data
management
• Design
• Data generation (ASAS)
• Storage / processing
• Bio-IT service
• Analysis / advice
Access any or all parts of this service as YOU need!
Bioinformatics services to help your genomics research
• Training and workshops – Introductory and specific applications
• Experimental design
• Grant writing assistance including collaborations
• Individual or group ‘coaching’ assistance – helping you work with your own
data
• Quality assessment of data from any source
• Analysis of any dataset
– Experiment-based (e.g., RNAseq, expression microarrays,
resequencing etc.)
– Project-based (e.g., simulation, annotation, network reconstruction
etc.)
Bio-IT services we can access
Infrastructure and computational environment
• An integrated mix of hardware, storage, software and
support
• Where you need it, when you need it and as much/little
as you want - accessible from anywhere
• Ideal for collaborative multi-site projects
• Software and databases updated regularly
• Direct support from IT experts and other
bioinformaticians
• “Tuned” to the needs of genomics researchers
Bio-IT software resources tailored for you
• Rich set of applications catering for a wide range of
users: command line through to web interface
• Support for collaborative work: a shareable workspace
and account for each project which you control
• Standard bioinformatics pipelines, utilities, and tools,
including key databases and Galaxy server
• Flexibility to include software you have already licensed
(e.g., Geneious)
• Access your raw data (automatically for our genomics
projects)
Why is our focus on collaborative
experimental design so valuable?
It’s really hard to talk about this in the
abstract, so… let’s look at some case
studies
Case study 1:
RNAseq and differential expression
What is an appropriate experimental design?
● What is the status of the reference genome? Coverage? Completeness? Accuracy of gene predictions? Prediction of non-genic features?
● Who published it? Are there likely to be further revisions? Is it available for use?
● Is it the same breed/strain/cultivar/cell line as the system you are working with?
● Can you get enough RNA? Is the tissue recalcitrant?
● Are some RNA extraction methods likely to result in biases? Is DNA contamination going to be a problem?
● What is known about the transcriptome in these tissues? Do you have particular genes of interest and is the design going to detect them? Are you interested in mRNA, small RNAs, or all RNA?
● Are you using appropriate controls?
What outcomes do you want?
● Do you simply want a list of differentially expressed genes? Do you want to investigate coexpression of genes? Effects of promoters? Which isoforms are dominant? Do you want in-depth investigation of a particular gene, set of genes, or pathway?
● How are you going to interact with your results? Do you have a genome browser set up? Do you want to allow time for investigation of unusual or unexpected results?
An example of a fairly standard experimental design
● Known reference genome: Eukaryotic model system
● Two tissue types / two conditions
● The biological question, at the level of the
transcriptome:
○ What is the difference between the tissue types?
○ What effect does the treatment have?
Case study 1:
experimental design
● RNA extraction method was determined to be appropriate; however, we
added ERCC spike-in controls
● Literature review of similar studies in this tissue/system allowed us to
determine an appropriate volume of sequencing on the HiSeq platform
● Similarly, the likely variability of the transcriptome was assessed in a
literature review: this has implications for the appropriate number of
biological repeats
● Total RNA kits were used; client was not specifically interested in small /
ncRNA but wanted this data available
● Numerous errors were discovered in one publicly available source of the
reference genome: it turned out that this site wasn’t being maintained. We
spotted the errors and switched to another source.
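The link between transcriptome variability and the number of biological repeats can be illustrated with a rough calculation. The sketch below is not the method used in this project. It assumes a widely used normal-approximation formula for RNAseq sample size, and the function name `replicates_per_group` and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def replicates_per_group(mean_count, cv, fold_change, alpha=0.05, power=0.8):
    """Approximate biological repeats needed per group for a single gene.

    Assumed approximation (not taken from the slides):
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mean_count + cv^2) / ln(fold_change)^2
    where mean_count is the expected read count for the gene and cv is the
    biological coefficient of variation between repeats.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Counting noise (1/mean_count) plus biological variability (cv^2).
    variance_term = 1.0 / mean_count + cv ** 2
    n = 2 * (z_alpha + z_power) ** 2 * variance_term / math.log(fold_change) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: ~20 reads per gene, CV of 0.4, detecting a 2-fold change.
    print(replicates_per_group(mean_count=20, cv=0.4, fold_change=2.0))
```

More variable tissues push the estimate up quickly, which is why the literature review of similar systems matters before any sequencing is ordered.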
Case study 1:
The process
The process:
● RNA extraction
● Total RNA library generation + multiplexing
● HiSeq sequencing
● Demultiplexing + quality control
● Bioinformatics
Where we added value:
● NZGL supplies and adds ERCC spike-in controls
● Stringent QC of the library preparation process
● Stringent QC and quality trimming of the data
● Data delivery, storage, backup (remote access)
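As an illustration of what quality trimming involves, here is a minimal sketch of one common strategy: keep the longest contiguous stretch of a read whose base qualities meet a cutoff. It is not the code of any particular QC package. The function names, the Phred+33 encoding, and the cutoff of 20 are assumptions made for the example.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_to_best_segment(sequence, quality_string, min_quality=20, offset=33):
    """Return the longest contiguous segment whose bases all have quality >= min_quality."""
    scores = phred_scores(quality_string, offset)
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(scores):
        if q >= min_quality:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
        else:
            if run_start is not None and i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    # Close a run that extends to the end of the read.
    if run_start is not None and len(scores) - run_start > best_len:
        best_start, best_len = run_start, len(scores) - run_start
    end = best_start + best_len
    return sequence[best_start:end], quality_string[best_start:end]


if __name__ == "__main__":
    # Hypothetical read: 'I' encodes Phred 40 and '#' encodes Phred 2.
    print(trim_to_best_segment("ACGTACGTAC", "II##IIIII#"))
```

Reads that come back empty or very short after trimming would normally be discarded before mapping.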
The process:
● Preprocessing + quality trimming
● Mapping of reads to reference genome
● Differential expression analysis
● Functional enrichment / pathway analysis
Where we added value:
● NZGL bioinformaticians have published the SolexaQA package, one of the most commonly used QC tools for NGS data. (New version out!)
● Ongoing “sanity checks”:
○ Checking the right reference genome is used.
○ Checking the right gene predictions are used.
● Allowing downstream analysis of ERCC spike-in controls.
● Ongoing “sanity checks”:
○ Are biological repeats behaving as expected?
○ What is the distribution of transcript lengths? Abundances? What are the implications?
○ Are controls behaving as expected?
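One way to ask whether the spike-in controls are behaving as expected is to compare observed read counts with the known input amounts on a log-log scale. The sketch below fits a straight line by least squares. A slope near 1 and a high R² suggest the library reflects input abundance. The function name and the numbers are hypothetical, and real spike-in concentrations would come from the supplier's documentation.

```python
import math


def log_log_fit(expected_conc, observed_counts, pseudocount=1.0):
    """Least-squares fit of log2(observed + pseudocount) against log2(expected).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log2(c) for c in expected_conc]
    ys = [math.log2(n + pseudocount) for n in observed_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical spike-in amounts (arbitrary units) and read counts.
    expected = [1, 2, 4, 8, 16, 32]
    observed = [12, 19, 45, 77, 170, 310]
    slope, intercept, r2 = log_log_fit(expected, observed)
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

A slope well away from 1, or a poor fit at low concentrations, is a prompt to look again at library preparation or sequencing depth before trusting expression estimates.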
Case study 1: Bioinformatics
The process:
● Preprocessing + quality trimming
● Mapping of reads to reference genome
● Differential expression analysis
● Functional enrichment / pathway analysis
Where we added value:
● Whole-transcriptome level analyses
● Analysis of individual genes, sets of genes, isoforms, “shared promoter” genes
● Publication-quality plots and graphics
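To make the idea of a differential expression result concrete, here is a deliberately simple sketch that normalises two count tables to counts per million and reports a log2 fold change per gene. It ignores biological repeats and statistical testing, which a real analysis handles with dedicated packages. The gene names, counts, and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM values, for genes present in both tables.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


if __name__ == "__main__":
    # Hypothetical counts for three genes in two libraries of different depth.
    control = {"geneA": 500, "geneB": 1500, "geneC": 8000}
    treated = {"geneA": 2000, "geneB": 3000, "geneC": 15000}
    for gene, lfc in sorted(log2_fold_changes(control, treated).items()):
        print(gene, round(lfc, 2))
```

Normalising first matters because the two libraries rarely have the same sequencing depth, so raw counts are not directly comparable.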
Case study 2:
Deliberately degraded RNA
An example of a very non-standard experimental design!
● RNA that had been deliberately degraded
● Many different tissue types of interest
● The biological question:
○ How do particular RNA species degrade over time?
○ Are particular regions of the transcript more or less
stable?
Challenges, and how we resolved them:
● RNA was guaranteed to fail every quality metric.
○ How did we resolve this? Sequencing was conducted on a “best effort” basis but with no guarantees of success.
● In this situation, some library preparation methods may result in biases in sequencing that look like differential degradation but are not!
○ How did we resolve this? Used non-standard kits (Bioo Scientific) that are less likely to result in biases.
● No standard analysis workflow
○ How did we resolve this? Modified existing workflows, focusing down on known genes of interest; custom scripts to show differential coverage across genes of interest.
● Impossible to tell the difference between differential degradation and sequencing bias
○ How did we resolve this? ERCC spike-in controls allow us to detect sequencing bias (since the controls were not degraded) and therefore discriminate between sequencing bias and differential degradation.
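The reasoning about spike-in controls can be sketched in a few lines. This is an illustration only, not the custom scripts used in the project. It assumes per-base read depth is available for each transcript in 5' to 3' order, and the function names and depth values are hypothetical.

```python
def end_bias(depth, fraction=0.25):
    """Ratio of mean depth over the 3' end to mean depth over the 5' end.

    `depth` is a list of per-base read depths in 5'->3' order; `fraction`
    sets how much of each end is averaged.
    """
    window = max(1, int(len(depth) * fraction))
    five_prime = sum(depth[:window]) / window
    three_prime = sum(depth[-window:]) / window
    if five_prime == 0:
        return float("inf")
    return three_prime / five_prime


def degradation_signal(gene_depth, spike_in_depth, fraction=0.25):
    """End bias of a gene relative to an undegraded spike-in control.

    The spike-in was not degraded, so any end bias it shows is attributed to
    library preparation or sequencing. Dividing it out leaves the part of the
    gene's bias that is more plausibly due to degradation.
    """
    return end_bias(gene_depth, fraction) / end_bias(spike_in_depth, fraction)


if __name__ == "__main__":
    # Hypothetical depth profiles along a gene of interest and a spike-in.
    gene = [10, 10, 20, 20, 40, 40, 80, 80]
    spike_in = [50, 50, 50, 50, 50, 50, 50, 50]
    print(degradation_signal(gene, spike_in))
```

A signal close to 1 means the gene looks no more skewed than the control. Values far from 1 point to uneven coverage that sequencing bias alone does not explain.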
How can you initiate an enquiry?
www.nzgenomics.co.nz
...or, just talk to anyone in the team!
Bioinformaticians in the team
Now how can we help with your projects?
Thank you!