Course Handout

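Introduction to the Integrative Genomics Viewer
National Center for Genome Medicine
葉爾瞻
Course Objectives
 Install, configure, and launch IGV on your own
 Browse variants (.vcf) in IGV
 Browse alignments (.sam/.bam) in IGV
 Open external databases in IGV as references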
Overview
 http://www.broadinstitute.org/software/igv/home
 high-performance visualization tool for interactive
exploration of large, integrated genomic datasets.
 supports a wide variety of data types, including array-based and next-generation sequence data, and
genomic annotations.
Citing IGV
 Helga Thorvaldsdóttir, James T. Robinson, Jill P. Mesirov.
Integrative Genomics Viewer (IGV): high-performance
genomics data visualization and exploration. Briefings
in Bioinformatics 2012.
 James T. Robinson, Helga Thorvaldsdóttir, Wendy
Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz,
Jill P. Mesirov. Integrative Genomics Viewer. Nature
Biotechnology 29, 24–26 (2011)
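Preparing the Java Environment
 IGV is a Java application and can be launched through Java Web Start. The system needs Java 1.6 or later.
 Java Web Start requires javaws.exe on the system
 Start -> type javaws.exe in the "Search programs and files" box
 Open a command prompt and run java -version to confirm that the version is 1.6 or later
 If you need to install Java, download and install it from http://www.java.com.
 On a 64-bit operating system, installing the 64-bit version of Java is recommended.
 64-bit Java cannot be installed on a 32-bit operating system.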
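The version check above can also be scripted. The following is a minimal sketch, not part of IGV: the function names `parse_java_version` and `meets_igv_requirement` are made up for illustration, and the code only parses text that you have already captured from `java -version` (which Java prints to standard error).

```python
import re


def parse_java_version(version_output):
    """Return (major, minor) parsed from the text printed by `java -version`."""
    # Matches e.g. 'java version "1.6.0_45"' or 'openjdk version "11.0.2"'.
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def meets_igv_requirement(version_output, minimum=(1, 6)):
    """True if the reported Java version is at least `minimum` (1.6 per the slide)."""
    return parse_java_version(version_output) >= minimum


# Sample text of the kind `java -version` prints; not captured from a real run.
print(meets_igv_requirement('java version "1.6.0_45"'))  # True
print(meets_igv_requirement('java version "1.5.0_22"'))  # False
```

Tuples compare element by element, so newer numbering schemes such as "11.0.2" also pass the check.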
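Download, Installation, and Launch
 IGV download page (registration required):
http://www.broadinstitute.org/software/igv/download
 Web Start: download the file that matches the amount of memory on your machine
 2 GB or less: use the 750 MB version
(http://www.broadinstitute.org/igv/projects/current/igv.jnlp)
 More than 2 GB, 32-bit operating system: use the 1.2 GB version
(http://www.broadinstitute.org/igv/projects/current/igv_mm.jnlp)
 Larger memory settings require a 64-bit Java environment.
 Alternatively, download the Binary Distribution; it can be used as soon as it is unzipped.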
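Download, Installation, and Launch
 Configuring the Binary Distribution
 Open igv.bat in a text editor (Mac users: open igv.sh)
 Change the 750 in -Xmx750m to a size that suits your machine, then save the file.
 Run igv.bat (Mac users: run igv.sh)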
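The -Xmx edit described above can be sketched as a small text substitution. This is an illustration only: `set_max_heap` is a hypothetical helper, and the launcher line used in the example is invented, so the real igv.bat or igv.sh may contain different text around the option.

```python
import re


def set_max_heap(script_text, megabytes):
    """Replace the value of every -Xmx<N>m option in a launcher script."""
    return re.sub(r"-Xmx\d+m", "-Xmx%dm" % megabytes, script_text)


# Hypothetical launcher line; only the -Xmx750m option comes from the slide.
line = "java -Xmx750m -jar igv.jar"
print(set_max_heap(line, 1200))  # java -Xmx1200m -jar igv.jar
```

To apply it to a real file, read the script into a string, pass it through the function, and write the result back after keeping a copy of the original.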
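IGV Documentation
 Complete documentation is available from the official home page
 User guide:
http://www.broadinstitute.org/software/igv/UserGuide
 Description of the new (2.0) features:
http://www.broadinstitute.org/software/igv/Version2.0Guide
 Forum: https://groups.google.com/forum/#!forum/igvhelp
 Each supported file format has its own documentation
 Most of the feature descriptions in these slides are excerpted from the user guide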
Reference Genome
 IGV offers predefined reference genomes for many species to choose from:
http://www.broadinstitute.org/software/igv/Genomes
 Here we use the genome used by the 1000 genomes project (1kg, b37+decoy).
 If the reference genome you need is not in the list, you can create it with the import genome function.
Viewing the Reference Genome
 Note that the sequence and the arrow are only
displayed when zoomed in to a sufficiently small region.
 You can change the strand that is displayed by clicking
on the arrow in the title to the left of the track.
 You can optionally display a 3-band track that shows a
3-frame translation of the amino acid sequence for the
corresponding nucleotide sequence.
Viewing the Reference Genome
Feature Track (Genes and Transcripts)
 Collapsed
 Squished
 Expanded
 Query by gene name
Loading Data and Attributes
 Load from File
 See IGV File Formats for supported file formats
 Load from URL
 Load from Server
 Load from a Distributed Annotation System (DAS)
Removing Tracks and Attributes
 To remove all tracks and attributes:
 Click File>New Session. This is essentially the same as
restarting IGV.
 To remove specific tracks:
 Right-click a track name and select Remove Tracks in the
pop-up menu.
 Control-click track names for multiple selections
Viewing Variants (vcf)
 Open sample vcf file (from 1000 genomes project)
ALL.chrMT.phase1.20101123.vcf
 Build an index with igvtools
 Select File>Run igvtools
 Select index as Command
 Specify the location of the vcf file
 Run
 Set locus to MT
 Zoom in/ Zoom out
 Go back/Go forward
 Set Feature Visibility Windows
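The indexing step in the list above can also be run outside the IGV menu with the igvtools command-line launcher, whose `index` command takes the file to index. The sketch below only builds the command; it assumes a launcher named `igvtools` is on your PATH (on Windows it may be a `.bat` file), and `igvtools_index_command` is a hypothetical helper name.

```python
import shlex


def igvtools_index_command(vcf_path, executable="igvtools"):
    """Build the argument list for `igvtools index <file>`."""
    return [executable, "index", vcf_path]


cmd = igvtools_index_command("ALL.chrMT.phase1.20101123.vcf")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it (assumes igvtools is installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when the file path contains spaces.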
Viewing Variants (vcf)
1. Shows the allele fraction for a single locus.
2. Shows the genotypes for each locus in each sample.
 Dark blue = heterozygous
 Cyan = homozygous
 Grey = reference
 To change the color coding of the plot, select Color By>Allele
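To make the color key concrete, here is a minimal sketch that maps a VCF GT value to the colors listed above and computes a simple allele fraction from a set of genotypes. It is not IGV's code: the function names are hypothetical, the handling of no-calls is an assumption, and the allele fraction here is simply the share of called alleles that are non-reference, which may differ from how IGV derives the value it displays.

```python
def genotype_color(gt):
    """Map a VCF GT value such as "0/1" to the color named on the slide."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None           # no-call; not covered by the slide (assumption)
    if all(a == "0" for a in alleles):
        return "grey"         # reference
    if len(set(alleles)) == 1:
        return "cyan"         # homozygous, non-reference
    return "dark blue"        # heterozygous


def allele_fraction(genotypes):
    """Fraction of called alleles that are non-reference at one locus."""
    called = [a for gt in genotypes
              for a in gt.replace("|", "/").split("/") if a != "."]
    if not called:
        return 0.0
    return sum(a != "0" for a in called) / len(called)


# Made-up genotypes for four samples at one locus.
gts = ["0/0", "0/1", "1/1", "./."]
print([genotype_color(g) for g in gts])
print(allele_fraction(gts))  # 0.5
```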
About VCF format
 View the file as plain text: less {sample vcf file}
 Or try opening the vcf file with Excel
 VCF Format 4.1 Spec
 http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
 VCF Poster
 http://vcftools.sourceforge.net/VCF-poster.pdf
 Fixed Fields
 Genotype Fields (Optional)
 Meta-information lines
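The three parts named above (meta-information lines, fixed fields, and optional genotype fields) can be told apart with a few lines of code. This is a simplified sketch for reading small files, not a full VCF 4.1 parser; `parse_vcf` is a hypothetical name and the example records are invented.

```python
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf(lines):
    """Split VCF text lines into meta lines, sample names, and records."""
    meta, samples, records = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):            # meta-information lines
            meta.append(line)
        elif line.startswith("#CHROM"):      # header; samples start at column 10
            samples = line.split("\t")[9:]
        elif line:
            cols = line.split("\t")
            rec = dict(zip(FIXED, cols[:8])) # the eight fixed fields
            rec["POS"] = int(rec["POS"])
            if len(cols) > 9:                # optional genotype fields
                keys = cols[8].split(":")    # FORMAT column, e.g. GT:DP
                rec["samples"] = {
                    name: dict(zip(keys, value.split(":")))
                    for name, value in zip(samples, cols[9:])
                }
            records.append(rec)
    return meta, samples, records


# Invented example data, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "MT\t73\t.\tA\tG\t100\tPASS\tAC=1\tGT:DP\t0/1:12\t1/1:9",
]
meta, samples, records = parse_vcf(example)
print(records[0]["samples"]["S1"])  # {'GT': '0/1', 'DP': '12'}
```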
Viewing Alignments
 Remove the 1000 genomes track (or click File>New Session)
 Load sample files from URL (Ctrl-click for multi-select)
 1_149003461-149460645.bam
 1_149003461-149460645.vcf
 Set locus to 1:149022000-149023000
 View>Preferences for alignments view settings
 downsampling
 Can be Collapsed, Squished or Expanded
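The locus used above follows a chromosome:start-end pattern, and a bare name such as MT selects a whole chromosome. A minimal sketch of splitting such a string is shown below; `parse_locus` is a hypothetical helper, stripping thousands separators is an assumption about how coordinates may be typed, and reading the sample file name as a region is likewise an assumption.

```python
def parse_locus(locus):
    """Split "chr:start-end" into (chrom, start, end); bare names give None bounds."""
    chrom, _, span = locus.partition(":")
    if not span:
        return chrom, None, None
    start, _, end = span.replace(",", "").partition("-")
    start = int(start)
    end = int(end) if end else start
    return chrom, start, end


chrom, start, end = parse_locus("1:149022000-149023000")
print(chrom, start, end)

# Assumption: the sample file name 1_149003461-149460645 encodes the covered region.
print(149003461 <= start and end <= 149460645)  # True
```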
Viewing Alignments
 Control+click a read to find its mate
 Split Screen View
 Right-click over an alignment and select View mate
region in split screen from the drop-down list
 Return to Normal View
 To return to the “normal view”, double-click the name
panel at the top of one of the panes, or right-click in a
name panel and select Switch to standard view.
Interpreting Color by Insert Size
 Red for an inferred insert
size that is larger than
expected (deletion)
 Blue for an inferred insert
size that is smaller than
expected (insertion)
 Other colors for paired-end reads, coded by the chromosome on which their mates can be found (inter-chromosomal rearrangement)
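The color rules above amount to comparing each pair's inferred insert size with an expected range. The sketch below illustrates that logic only; the function name, the `lower` and `upper` thresholds, and the "default" label for normal pairs are all assumptions, since the slide does not state how IGV chooses its thresholds.

```python
def insert_size_color(insert_size, lower, upper, mate_on_same_chrom=True):
    """Classify a read pair using the color rules listed on the slide."""
    if not mate_on_same_chrom:
        return "chromosome color"   # mate maps to another chromosome
    size = abs(insert_size)         # the sign depends on read order
    if size > upper:
        return "red"                # larger than expected (deletion)
    if size < lower:
        return "blue"               # smaller than expected (insertion)
    return "default"                # within the expected range


# Hypothetical expected range of 50 to 1000 bp.
print(insert_size_color(5000, 50, 1000))  # red
print(insert_size_color(20, 50, 1000))    # blue
```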
Interpreting Color by Pair Orientation
 The orientation of paired reads can be used to detect
structural events including: inversions, duplications and
translocations
(*)Splice Junctions
 The splice junction view displays an alternative
representation of .bed files encoding splice junctions,
such as the "junctions.bed" file produced by the TopHat
program.
About SAM(BAM) format
 SAM/BAM spec
 http://samtools.sourceforge.net/SAM1.pdf
 A human-readable summary
 http://genome.sph.umich.edu/wiki/SAM
 Explain SAM flags
 http://picard.sourceforge.net/explain-flags.html
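The SAM FLAG field is a bitwise combination of properties, which is what the flag explainer linked above decodes. A minimal sketch of the same idea follows; `explain_flag` is a hypothetical name, the descriptions are paraphrased, and only the bits from 0x1 to 0x400 are covered (bits added in later revisions of the spec are left out).

```python
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "read is PCR or optical duplicate"),
]


def explain_flag(flag):
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


# 99 = 1 + 2 + 32 + 64
for name in explain_flag(99):
    print(name)
```

For a flag of 99 this lists "read paired", "read mapped in proper pair", "mate reverse strand", and "first in pair".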
End of today's course
Thank you