Analyzing human variation with Galaxy

Belinda Giardine and Cathy Riemer
Feb 8, 2012
Outline
• Part 1: Filtering out SNPs found in genomes of healthy individuals
  • Uploading files
  • Using Galaxy libraries
  • Basic filtering
• Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
  • PolyPhen2
  • Gene-based analysis
• Part 3: Running new predictions for coding SNPs likely to be detrimental
  • SIFT
  • Workflows
• Part 4: Finding SNPs that fall in any given set of intervals
  • Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Fake example dataset
• SNP calls from Complete Genomics GS12880
• 5 known disease variants added for illustration
• Various genes and parts of the gene (coding, regulatory, splicing, …)
• Realistic background for search, but not a realistic SNP combination
Uploading a file
Converting file format
Shared data
Importing datasets from library
Filtering SNPs
Filter results
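The idea behind the Part 1 filter can be sketched outside Galaxy as a set difference on variant keys. This is only an illustration, not the tool shown on the slides: the tuple layout (chromosome, position, alternate allele), the function name `remove_known_snps`, and the sample records are all assumptions made up for the example.

```python
def remove_known_snps(sample_snps, healthy_snps):
    """Return the SNPs in sample_snps that do not appear in healthy_snps.

    Each SNP is a hypothetical (chrom, pos, alt_allele) tuple.
    """
    healthy = set(healthy_snps)  # set membership keeps each lookup constant-time
    return [snp for snp in sample_snps if snp not in healthy]


# Made-up records, for illustration only.
sample = [("chr1", 1000, "A"), ("chr2", 2500, "T"), ("chr7", 4200, "G")]
healthy = [("chr2", 2500, "T"), ("chr9", 100, "C")]

remaining = remove_known_snps(sample, healthy)
print(remaining)
```

Matching on the full key rather than on position alone keeps a SNP when the healthy genome carries a different allele at the same site; whether that is the right choice depends on the analysis.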
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
PolyPhen2
Filtering PolyPhen2 results
PolyPhen2 results
Linking identifiers
Identifier fields
Join identifiers to result
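Joining identifiers to a result table amounts to a keyed lookup. The sketch below shows an inner join in plain Python under stated assumptions: the identifiers (`ID_A`, `GENE1`), the row layout, and the helper name `join_on_identifier` are hypothetical and do not come from the slides.

```python
def join_on_identifier(result_rows, id_map, key_index=0):
    """Append the mapped identifier to every row whose key is in id_map.

    Rows without a match are dropped, which mirrors an inner join.
    """
    joined = []
    for row in result_rows:
        key = row[key_index]
        if key in id_map:
            joined.append(list(row) + [id_map[key]])
    return joined


# Made-up prediction rows and identifier mapping, for illustration only.
predictions = [["ID_A", "probably damaging"], ["ID_B", "probably damaging"]]
id_to_gene = {"ID_A": "GENE1"}

joined_rows = join_on_identifier(predictions, id_to_gene)
print(joined_rows)
```

Dropping unmatched rows is a design choice; keeping them with an empty identifier column would correspond to a left join instead.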
Comparative Toxicogenomics Database (CTD)
Part 3: Running new predictions for coding SNPs likely to be detrimental
SIFT inputs
Shared data
Workflow
Your workflows
Running the workflow
Running SIFT
Filter SIFT results
SIFT results
Part 4: Finding SNPs that fall in any given set of intervals
Import predicted regulatory regions
Filter with intersect tool
PRPs results
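The intersect step keeps SNPs whose position falls inside at least one interval. A minimal sketch of that test follows, assuming 0-based positions and half-open (start, end) intervals as in BED files; the function name `snps_in_intervals` and all coordinates are made up for the example and are not the Galaxy tool's implementation.

```python
def snps_in_intervals(snp_positions, intervals):
    """Return the SNPs that fall inside at least one interval.

    snp_positions: list of (chrom, pos) with 0-based positions.
    intervals: list of (chrom, start, end), half-open [start, end).
    """
    # Group intervals by chromosome so each SNP is only compared
    # against intervals on its own chromosome.
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, pos in snp_positions:
        if any(start <= pos < end for start, end in by_chrom.get(chrom, [])):
            hits.append((chrom, pos))
    return hits


# Made-up regions and SNP positions, for illustration only.
regions = [("chr1", 100, 200), ("chr2", 50, 60)]
positions = [("chr1", 150), ("chr1", 200), ("chr2", 55), ("chr3", 10)]

inside = snps_in_intervals(positions, regions)
print(inside)
```

The SNP at chr1:200 is excluded because the interval end is exclusive under the half-open convention assumed here.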
Using ENCODE data
Again filter with intersect
DNase HSS results
Conservation
Histogram of phyloP scores
Filter on phyloP greater than or equal to 0.5
phyloP results
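The phyloP filter is a threshold on one numeric column. As a hedged sketch, the snippet below keeps rows whose score is greater than or equal to 0.5; the row layout, the score column index, and the function name `filter_by_score` are assumptions for illustration, and the values are invented.

```python
def filter_by_score(rows, threshold=0.5, score_index=2):
    """Keep rows whose score column is >= threshold.

    Scores are read as text (as they would be from a tabular file)
    and converted to float before the comparison.
    """
    return [row for row in rows if float(row[score_index]) >= threshold]


# Made-up rows: (chrom, pos, phyloP score as text), for illustration only.
scored = [("chr1", 150, "0.72"), ("chr2", 55, "0.5"), ("chr3", 10, "-1.3")]

conserved = filter_by_score(scored)
print(conserved)
```

The comparison is inclusive, so a score of exactly 0.5 passes, matching the "greater than or equal to" wording of the filter.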
What we covered
• Part 1: Filtering out SNPs found in genomes of healthy individuals
  • Uploading files
  • Using Galaxy libraries
  • Basic filtering
• Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
  • PolyPhen2
  • Gene-based analysis
• Part 3: Running new predictions for coding SNPs likely to be detrimental
  • SIFT
  • Workflows
• Part 4: Finding SNPs that fall in any given set of intervals
  • Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Editing the dataset name and build