Regulatory Genomics Lab
Saurabh Sinha
PowerPoint by Casey Hanson
Regulatory Genomics | Saurabh Sinha | 2016
Exercise
In this exercise, we will do the following:
1. Use Galaxy to manipulate a ChIP track for BIN in D. melanogaster.
2. Subject peak sets to the MEME suite.
3. Compare MEME motifs with Fly Factor Survey motifs for BIN.
4. Subject the peak set to a gene set enrichment test.
Step 0A: Local Files
For viewing and manipulating the files needed for this laboratory
exercise, insert your flash drive.
Denote the path to the flash drive as the following:
[course_directory]
We will use the files found in:
[course_directory]/07_Regulatory_Genomics/data/
Step 0B: Logging into Galaxy
Go to: http://biocluster.igb.illinois.edu
Click Enter
Click Login
Input your login credentials.
Click Login.
Computational Prediction of Motifs
In this exercise, we will upload a ChIP track of the transcription factor BIN in Drosophila
melanogaster to Galaxy.
After performing various file manipulations, we will use the MEME suite to identify a
motif from the top 100 ChIP regions.
Subsequently, we will compare our predicted motif with the experimentally validated
motif for BIN at Fly Factor Survey.
Step 1: Upload BIN ChIP Track to Galaxy
Click Get Data and then Upload File
Upload our ChIP file:
[course_directory]/07_Regulatory_Genomics/data/BIN_Fchip_s11_1000.gff
Set the File Format to gff.
Set Genome to dm3.
Click Execute
Step 2: Sort ChIP Track By Score
Click on Filter and Sort, then click Sort.
Under Sort Dataset, select our ChIP track.
Under on column, select c6 (column 6).
Under with flavor, select Numerical Sort.
Under everything in, select Descending order.
Click Execute.
Step 3: Obtain Top 100 ChIP Regions
Click on Text Manipulation, then click Select first.
Under Select first, enter 100 lines.
Under from, select our sorted ChIP data.
Click Execute.
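For readers who want to check Steps 2 and 3 outside Galaxy, the sort-then-select logic can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name top_regions_by_score and the three toy records are hypothetical, and the score is assumed to sit in tab-separated column 6, as selected in Step 2.

```python
def top_regions_by_score(gff_text, n=100, score_col=6):
    """Return the n highest-scoring GFF records, best first."""
    rows = []
    for line in gff_text.splitlines():
        # Skip blank lines and GFF comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    # Numerical sort on the score column, descending (Step 2).
    rows.sort(key=lambda row: float(row[score_col - 1]), reverse=True)
    # Keep the first n records (Step 3).
    return rows[:n]


# Toy records (hypothetical, not taken from the course file).
example = (
    "chr2L\tchip\tpeak\t100\t200\t3.5\t.\t.\tid=a\n"
    "chr2R\tchip\tpeak\t300\t400\t9.1\t.\t.\tid=b\n"
    "chr3L\tchip\tpeak\t500\t600\t6.0\t.\t.\tid=c\n"
)
top = top_regions_by_score(example, n=2)
print([row[8] for row in top])  # ['id=b', 'id=c']
```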
Step 4A: Import dm3 genome
Under Shared Data, click Data Libraries
Click on dm3
Check dm3.fa
Click Go
Step 4B: Import dm3 genome
You should see a green box saying the dataset was imported.
Click on the Galaxy/UIUC Button to get back to your history.
You should see 3: dm3.fa in the history pane
Step 5A: Extract DNA of Top 100 ChIP Regions
Click on Fetch Sequences.
Click on Extract Genomic DNA.
Under Fetch sequences for intervals in, select our top 100 ChIP regions.
Set Interpret features when possible to No.
Set Source for Genomic Data to History
Set Using reference file to 3: dm3.fa
Set Output data type to FASTA.
Click Execute.
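Conceptually, this step slices the genome at each interval. The Python sketch below illustrates that idea under stated assumptions: read_fasta and extract_regions are hypothetical helper names, the toy genome is invented, GFF coordinates are treated as 1-based and inclusive, and strand is ignored. It is not Galaxy's implementation.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_regions(genome, gff_rows):
    """Return (header, sequence) pairs for each GFF row.

    GFF start/end (columns 4 and 5) are 1-based and inclusive,
    so the Python slice is [start - 1:end].
    """
    records = []
    for row in gff_rows:
        chrom, start, end = row[0], int(row[3]), int(row[4])
        records.append((f"{chrom}:{start}-{end}", genome[chrom][start - 1:end]))
    return records


# Toy genome and interval (hypothetical).
toy_genome = read_fasta(">chrToy\nACGTACGTAC\nGGGTTTAAAC\n")
toy_rows = [["chrToy", "chip", "peak", "3", "8", "9.1", ".", ".", "id=b"]]
for header, seq in extract_regions(toy_genome, toy_rows):
    print(f">{header}\n{seq}")  # >chrToy:3-8 then GTACGT
```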
Step 5B: Download the Data
When finished, click on the download icon to download the file to
our desktop.
This has already been done for you.
The resulting sequence is in the following file:
[course_directory]/07_Regulatory_Genomics/data/BIN_top_100.fasta
Step 6: Submit to MEME
DO NOT RUN THIS NOW. MEME TAKES A VERY LONG TIME.
In this step, we will submit the sequences to MEME
Go to the following address:
http://meme-suite.org/tools/meme
Upload your sequences file here
Enter your email address here.
Leave other parameters as default.
Click Start Search.
Step 7A: Analyzing MEME Results
Go to the following web address:
The webpage contains a summary of MEME’s findings.
It is also available on the results directory:
[course_directory]/07_Regulatory_Genomics/results/MEME.htm
Let’s investigate the top hit.
Step 7B: Analyzing MEME Results
To the right is a LOGO of our predicted motif, showing
the per-position relative abundance of each
nucleotide.
At the bottom are the aligned regions in each of our
sequences that helped produce this motif. As the p-value
increases (becomes less significant), matches
show greater divergence from our LOGO.
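The per-position relative abundance can be computed directly from a set of aligned sites. The sketch below is a minimal illustration, assuming equal-length sites over the alphabet ACGT; the function name position_frequencies and the four toy sites are hypothetical and are not MEME output.

```python
from collections import Counter


def position_frequencies(sites, alphabet="ACGT"):
    """Return one dict per motif position mapping base -> relative frequency."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in alphabet})
    return matrix


# Toy aligned sites (hypothetical).
toy_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
matrix = position_frequencies(toy_sites)
for position, column in enumerate(matrix, start=1):
    print(position, column)
```

In this toy input, position 1 is 75% A and 25% T, while positions 2 and 3 are fully conserved. A real logo typically also scales each column by its information content, which this sketch omits.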
Step 7C: Analyzing MEME Results
Other predicted motifs do not seem as plausible.
Step 8A: Comparison with Experimentally Validated Motif for BIN
FlyFactorSurvey is a database of TF motifs in Drosophila melanogaster.
Go to the following link to view the motif for BIN:
http://pgfe.umassmed.edu/ffs/TFdetails.php?FlybaseID=FBgn0045759
Step 8B: Comparison with Experimentally Validated Motif for BIN
[Figure panels: Actual BIN Motif; Best MEME Motif; Best MEME Motif, Reverse Complemented]
There is strong agreement between the actual motif and the reverse
complement of MEME’s best motif. This indicates MEME was actually
able to find the motif from the top 100 ChIP regions for this TF.
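Comparing a predicted motif against a reference in both orientations can be sketched as follows. This is an illustration only: the helper names, the one-hot toy motifs, and the squared-distance score are assumptions chosen for simplicity, and the sketch assumes both matrices have the same width and need no shifting. It does not reproduce how MEME or FlyFactorSurvey compare motifs.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(sequence):
    """Build a toy frequency matrix in which each column is a single base."""
    return [{b: 1.0 if b == base else 0.0 for b in "ACGT"} for base in sequence]


def reverse_complement_pfm(pfm):
    """Reverse the column order and swap A<->T, C<->G within each column."""
    return [{COMPLEMENT[b]: p for b, p in column.items()} for column in reversed(pfm)]


def mean_column_distance(pfm_a, pfm_b):
    """Mean squared distance per column; 0.0 means identical matrices."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("matrices must have the same width")
    total = 0.0
    for col_a, col_b in zip(pfm_a, pfm_b):
        total += sum((col_a[b] - col_b[b]) ** 2 for b in "ACGT")
    return total / len(pfm_a)


# Toy motifs (hypothetical): the prediction is the reverse complement
# of the reference, mirroring the situation described above.
reference = one_hot("AAC")
predicted = one_hot("GTT")
forward = mean_column_distance(reference, predicted)
reverse = mean_column_distance(reference, reverse_complement_pfm(predicted))
print(forward, reverse)  # 2.0 0.0
```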
Gene Set Enrichment Analysis
In this exercise, we will extract the nearby genes for each one of the ChIP peaks for
BIN.
We will then subject the nearby genes to enrichment analysis tests on various Gene
Ontology gene sets utilizing DAVID.
Step 9A: Acquire Nearby Genes
In this step, we will acquire all genes in Drosophila melanogaster using
the UCSC Main Table Browser:
https://genome.ucsc.edu/cgi-bin/hgTables
Step 9B: Acquire Nearby Genes
Ensure the following settings are configured.
Click get output and then get BED.
Step 9C: Acquire Nearby Genes
Click Get Data and then Upload File
Upload our gene file: flygenes.bed
Set the File Format to bed.
Set Genome to dm3.
Click Execute
Step 9D: Acquire Nearby Genes
Select Operate on Genomic Intervals
Then Select Fetch Closest non-overlapping
interval feature.
Step 9E: Acquire Nearby Genes
Under For every interval feature in, select our original ChIP track.
Under Fetch closest features from, select the UCSC genes track we just
downloaded.
Click Execute
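The idea behind this tool, finding for each peak the nearest gene that does not overlap it, can be sketched in Python. The sketch makes several assumptions: closest_nonoverlapping is a hypothetical function, the gene names are invented, all coordinates are already in one 0-based half-open system (as in BED), and genes on either side of the peak are considered. Galaxy's tool has its own options and tie-breaking rules that are not reproduced here.

```python
def closest_nonoverlapping(peak, genes):
    """Return (gene_name, distance) for the nearest gene that does not overlap peak.

    peak is (chrom, start, end); genes is a list of (chrom, start, end, name).
    Intervals are 0-based and half-open, so touching intervals have distance 0.
    """
    chrom, p_start, p_end = peak
    best_name, best_dist = None, None
    for g_chrom, g_start, g_end, name in genes:
        if g_chrom != chrom:
            continue
        if g_end <= p_start:
            dist = p_start - g_end      # gene lies entirely before the peak
        elif g_start >= p_end:
            dist = g_start - p_end      # gene lies entirely after the peak
        else:
            continue                    # overlapping genes are skipped
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Toy peak and genes (hypothetical names and coordinates).
toy_peak = ("chr2L", 1000, 1200)
toy_genes = [
    ("chr2L", 100, 900, "geneA"),    # 100 bp before the peak
    ("chr2L", 1100, 1500, "geneB"),  # overlaps the peak, so it is skipped
    ("chr2L", 1250, 2000, "geneC"),  # 50 bp after the peak
    ("chr2R", 1201, 1300, "geneD"),  # different chromosome
]
print(closest_nonoverlapping(toy_peak, toy_genes))  # ('geneC', 50)
```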
Step 10A: Cut Out Genes
The resulting file has the list of nearby genes in CG format in the 12th
column.
We are only interested in the genes, so we need to cut them out using
the CUT tool.
Under Text Manipulation click Cut
Step 10B: Cut Out Genes
For Cut Columns type c12 to denote column 12.
Under Delimited By select Tab
Under From select the track we just generated: the ChIP peaks paired
with their closest FlyBase genes.
Click Execute.
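The Cut step amounts to keeping one tab-delimited field per line. A minimal Python sketch follows, assuming tab-separated input with the gene identifier in column 12; the function name cut_column and the identifiers CG0000-RA and CG0001-RB are hypothetical placeholders, not real records from the course data.

```python
def cut_column(text, column=12, delimiter="\t"):
    """Return the requested 1-based column from each non-empty line."""
    values = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(delimiter)
        # Lines with too few fields are skipped rather than raising an error.
        if len(fields) >= column:
            values.append(fields[column - 1])
    return values


# Two toy lines with 12 tab-separated fields each (hypothetical IDs).
prefix = "\t".join(f"f{i}" for i in range(1, 12))
toy_table = f"{prefix}\tCG0000-RA\n{prefix}\tCG0001-RB\n"
print(cut_column(toy_table))  # ['CG0000-RA', 'CG0001-RB']
```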
Step 11A: Convert IDs
Save the resulting file. Move it to the course directory and rename it:
[course_directory]/07_Regulatory_Genomics/data/cg_transcripts.txt
The enrichment tool we will use doesn’t accept genes in this format.
We will use the FlyBase ID converter to convert these transcript IDs into
FlyBase transcript IDs.
Step 11B: Convert IDs
Go to http://flybase.org/static_pages/downloads/IDConv.html
Upload our cg_transcripts.txt file and hit Go.
On the next page, click file, uniq IDs only to download the file of converted IDs.
Step 12A: Gene Set Enrichment - DAVID
Move the resulting file from the previous analysis to the course
directory and rename it:
[course_directory]/07_Regulatory_Genomics/data/fb_transcripts.txt
With our correct ids of transcripts of genes near ChIP peaks, we now
wish to perform a gene set enrichment analysis on various gene sets.
A tool that allows us to do this from a web interface is DAVID located at
the following address:
http://david.abcc.ncifcrf.gov/tools.jsp
Step 12B: Gene Set Enrichment - DAVID
We will perform a Gene Set Enrichment Analysis on our
transcript list (gene list) and see which GO categories are
significantly enriched.
Analyze the gene list with the Functional Annotation Tool.
Click Choose File and select our fb_transcripts.txt file.
Under Select Identifier select FLYBASE_TRANSCRIPT_ID.
Under Step 3: List Type check Gene List.
Click Submit List.
Step 12C: Gene Set Enrichment - DAVID
On the next page, select Functional Annotation Chart.
Our gene set seems to be enriched in the BP_FAT GO category!
This is consistent with the activity of the BIN transcription factor in the
literature.
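The statistical question behind an enrichment chart is whether a gene list contains more members of a category than chance would predict. The Python sketch below computes a plain hypergeometric upper-tail p-value as an illustration. The function name and the toy numbers are hypothetical. DAVID itself reports an EASE score, a modified Fisher exact p-value, so its values will differ from this unmodified calculation.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: number of background genes
    annotated:  background genes belonging to the category
    sample:     size of our gene list
    hits:       genes in our list that belong to the category
    """
    max_hits = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favorable / comb(population, sample)


# Toy numbers (hypothetical): 10 background genes, 3 in the category,
# a list of 4 genes of which 2 are in the category.
p_value = hypergeom_upper_tail(population=10, annotated=3, sample=4, hits=2)
print(round(p_value, 4))  # 0.3333
```

With real data, one such test is run per category, so the resulting p-values are usually corrected for multiple testing before interpretation.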