Analyzing human variation with Galaxy
Belinda Giardine and Cathy Riemer
Feb 8, 2012
Outline
Part 1: Filtering out SNPs found in genomes of healthy individuals
Uploading files
Using Galaxy libraries
Basic filtering
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
PolyPhen2
Gene-based analysis
Part 3: Running new predictions for coding SNPs likely to be detrimental
SIFT
Workflows
Part 4: Finding SNPs that fall in any given set of intervals
Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Fake example dataset
SNP calls from Complete Genomics GS12880
5 known disease variants added for illustration
Various genes and parts of the gene (coding, regulatory, splicing, …)
Realistic background for search, but not a realistic SNP combination
Uploading a file
Converting file format
Shared data
Importing datasets from library
Filtering SNPs
Filter results
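The Part 1 filtering step can also be sketched outside Galaxy. This is a minimal illustration, not the Galaxy tool itself: it assumes each SNP is described by chromosome, position, reference allele, and variant allele, and that a SNP counts as "found in a healthy genome" only when all four match. The `Snp` type and `remove_known_snps` function are hypothetical names, and the example records are made up.

```python
from typing import Iterable, List, NamedTuple


class Snp(NamedTuple):
    """One SNP call; the field layout is an assumption for illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def remove_known_snps(sample: Iterable[Snp], healthy: Iterable[Snp]) -> List[Snp]:
    """Return the SNPs in `sample` that are absent from `healthy`.

    Input order is preserved; matching is exact on all four fields.
    """
    known = set(healthy)  # set lookup keeps the filter linear in input size
    return [snp for snp in sample if snp not in known]


# Made-up records, only to show the call shape.
sample_calls = [
    Snp("chr1", 100, "A", "G"),
    Snp("chr2", 200, "C", "T"),
    Snp("chr3", 300, "G", "A"),
]
healthy_calls = [Snp("chr2", 200, "C", "T")]

remaining = remove_known_snps(sample_calls, healthy_calls)
```

Matching on alleles as well as position is a design choice: a different variant allele at the same position in a healthy genome does not remove the sample's SNP.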
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
PolyPhen2
Filtering PolyPhen2 results
PolyPhen2 results
Linking identifiers
Identifier fields
Join identifiers to result
Comparative Toxicogenomics Database (CTD)
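The "Join identifiers to result" step amounts to a table join on a shared identifier column. The sketch below assumes the prediction results and the identifier table are both lists of dictionaries; the column names (`transcript_id`, `gene`) and the identifier values are hypothetical placeholders, not the fields used in the talk.

```python
from typing import Dict, List


def join_on_identifier(
    results: List[Dict[str, str]],
    id_to_gene: Dict[str, str],
    key: str = "transcript_id",
) -> List[Dict[str, str]]:
    """Inner join: keep result rows whose identifier has a gene mapping.

    Each kept row is copied and gains a "gene" column.
    """
    joined = []
    for row in results:
        gene = id_to_gene.get(row[key])
        if gene is not None:
            joined.append({**row, "gene": gene})
    return joined


# Placeholder identifiers, only to show the call shape.
predictions = [
    {"transcript_id": "TX0001", "prediction": "probably damaging"},
    {"transcript_id": "TX0002", "prediction": "possibly damaging"},
    {"transcript_id": "TX0003", "prediction": "probably damaging"},
]
mapping = {"TX0001": "GENE_A", "TX0003": "GENE_B"}

with_genes = join_on_identifier(predictions, mapping)
```

Rows without a mapping are dropped here (an inner join). Keeping them with an empty gene column would be the alternative if unmapped predictions still matter downstream.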
Part 3: Running new predictions for coding SNPs likely to be detrimental
SIFT inputs
Shared data
Workflow
Your workflows
Running the workflow
Running SIFT
Filter SIFT results
SIFT results
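Filtering SIFT output comes down to a threshold on the score column. The slides do not state the cutoff used, so the sketch below assumes SIFT's conventional rule that a score of 0.05 or lower is predicted damaging. The row layout and the `sift_score` column name are hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

# Conventional SIFT cutoff (an assumption; not taken from the slides).
SIFT_CUTOFF = 0.05


def likely_damaging(rows: List[Row], cutoff: float = SIFT_CUTOFF) -> List[Row]:
    """Keep rows whose SIFT score is at or below the cutoff."""
    return [row for row in rows if float(row["sift_score"]) <= cutoff]


# Made-up rows, only to show the call shape.
sift_rows = [
    {"snp": "snp_1", "sift_score": 0.00},
    {"snp": "snp_2", "sift_score": 0.05},
    {"snp": "snp_3", "sift_score": 0.31},
]

damaging = likely_damaging(sift_rows)
```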
Part 4: Finding SNPs that fall in any given set of intervals
Import predicted regulatory regions
Filter with intersect tool
PRPs results
Using ENCODE data
Again filter with intersect
DNase hypersensitive site (HSS) results
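The intersect filter used for both the predicted regulatory regions and the ENCODE data keeps SNPs that fall inside any interval of a second dataset. The sketch below shows one way to do that outside Galaxy. It is not the Galaxy intersect tool. It assumes BED-style intervals (0-based start, exclusive end) and SNPs given as `(chrom, pos)` with 0-based positions. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open
Site = Tuple[str, int]         # (chrom, pos)


def build_index(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group regions by chromosome, sort them, and merge overlapping ones."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged: List[Tuple[int, int]] = []
        for start, end in spans:
            if merged and start <= merged[-1][1]:
                # Overlapping or touching: extend the previous span.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def snps_in_regions(snps: Iterable[Site], regions: Iterable[Region]) -> List[Site]:
    """Return the SNPs that lie inside at least one region."""
    index = build_index(regions)
    hits = []
    for chrom, pos in snps:
        spans = index.get(chrom, [])
        # Last merged span whose start is <= pos, if any.
        i = bisect_right(spans, (pos, float("inf"))) - 1
        if i >= 0 and spans[i][0] <= pos < spans[i][1]:
            hits.append((chrom, pos))
    return hits


# Made-up coordinates, only to show the call shape.
example_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 10)]
example_snps = [("chr1", 99), ("chr1", 100), ("chr1", 299),
                ("chr1", 300), ("chr2", 5), ("chr3", 5)]

inside = snps_in_regions(example_snps, example_regions)
```

Merging overlapping regions first means each SNP needs only one binary search per lookup, which matters when the interval set is large.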
Conservation
Histogram of phyloP scores
Filter on phyloP greater than or equal to 0.5
phyloP results
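The conservation steps (a histogram of phyloP scores, then a filter at 0.5) can be sketched as follows. The 0.5 threshold comes from the slide title. The bin width, the function names, and the `(snp_id, score)` record layout are assumptions for illustration, and the scores are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def score_histogram(scores: Iterable[float], bin_width: float = 0.5) -> Dict[float, int]:
    """Count scores per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def filter_conserved(
    scored_snps: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep SNPs whose phyloP score is greater than or equal to the threshold."""
    return [(snp_id, score) for snp_id, score in scored_snps if score >= threshold]


# Made-up scores, only to show the call shape.
scored = [("snp_1", 0.1), ("snp_2", 0.49), ("snp_3", 0.5),
          ("snp_4", 1.2), ("snp_5", -0.2)]

bins = score_histogram(score for _, score in scored)
conserved = filter_conserved(scored)
```

Looking at the histogram before choosing a cutoff shows how many SNPs a given threshold would keep.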
What we covered
Part 1: Filtering out SNPs found in genomes of healthy individuals
Uploading files
Using Galaxy libraries
Basic filtering
Part 2: Selecting known coding SNPs predicted to be damaging, then finding their genes and associated pathways
PolyPhen2
Gene-based analysis
Part 3: Running new predictions for coding SNPs likely to be detrimental
SIFT
Workflows
Part 4: Finding SNPs that fall in any given set of intervals
Predicted regulatory regions, ENCODE functional data, phyloP conserved regions
Editing the dataset name and build