Transcript Course Handout
Introduction to the Integrative Genomics Viewer
National Center for Genome Medicine
葉爾瞻
Course Objectives
Install, configure, and launch IGV on your own
Browse variants (.vcf) in IGV
Browse alignments (.sam/.bam) in IGV
Load external databases in IGV for reference
Overview
http://www.broadinstitute.org/software/igv/home
high-performance visualization tool for interactive
exploration of large, integrated genomic datasets.
supports a wide variety of data types, including array-based and next-generation sequence data, and
genomic annotations.
Citing IGV
Helga Thorvaldsdóttir, James T. Robinson, Jill P. Mesirov.
Integrative Genomics Viewer (IGV): high-performance
genomics data visualization and exploration. Briefings
in Bioinformatics 2012.
James T. Robinson, Helga Thorvaldsdóttir, Wendy
Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz,
Jill P. Mesirov. Integrative Genomics Viewer. Nature
Biotechnology 29, 24–26 (2011)
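Preparing the Java Environment
IGV is a Java application and can be launched through Java Web Start. The
system needs Java 1.6 or later.
Java Web Start requires javaws.exe on the system
Start -> type javaws.exe in the "Search programs and files" box
Open a Command Prompt and run java -version to confirm that the
version is 1.6 or later
If you need to install Java, download it from http://www.java.com
and install it.
On a 64-bit operating system, installing the 64-bit version of Java is recommended.
64-bit Java cannot be installed on a 32-bit operating system.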
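The version check can also be scripted. The following is a minimal sketch, assuming `java` is on the PATH and prints a line containing `version "..."`; the function names are illustrative and not part of IGV or Java.

```python
import re
import subprocess


def parse_java_version(output):
    """Return (major, minor) from the text printed by `java -version`, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(version, minimum=(1, 6)):
    """True if the parsed version is at least the 1.6 required by the slides."""
    return version is not None and version >= minimum


def installed_java_version():
    """Run `java -version` and parse its output; None if Java is not found."""
    try:
        # `java -version` normally writes to stderr, so both streams are checked.
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_java_version(result.stderr + result.stdout)
```

Calling `meets_minimum(installed_java_version())` returns False both when Java is missing and when it is older than 1.6.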
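Download, Installation, and Launch
IGV download page (registration required):
http://www.broadinstitute.org/software/igv/download
Web Start: download the launcher that matches the amount of memory on your
machine
2 GB or less: use the 750 MB version
(http://www.broadinstitute.org/igv/projects/current/igv.jnlp)
More than 2 GB, 32-bit operating system: use the 1.2 GB version
(http://www.broadinstitute.org/igv/projects/current/igv_mm.jnlp)
Larger memory settings require a 64-bit Java environment.
Alternatively, download the Binary Distribution and simply unzip it to use it.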
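The launcher choice above can be written as a small decision function. This is only a sketch of the rule stated on this slide; the function name is hypothetical, and because the slide lists no URL for machines with more than 2 GB on a 64-bit system, that case returns None.

```python
JNLP_750MB = "http://www.broadinstitute.org/igv/projects/current/igv.jnlp"
JNLP_1200MB = "http://www.broadinstitute.org/igv/projects/current/igv_mm.jnlp"


def choose_web_start_launcher(ram_gb, os_is_64bit):
    """Pick a Web Start launcher URL following the rule on the slide."""
    if ram_gb <= 2:
        return JNLP_750MB          # 2 GB or less: 750 MB version
    if not os_is_64bit:
        return JNLP_1200MB         # more than 2 GB on a 32-bit OS: 1.2 GB version
    # More than 2 GB on a 64-bit OS: larger launchers need 64-bit Java,
    # but the slide gives no URL, so consult the download page instead.
    return None
```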
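Download, Installation, and Launch
Binary Distribution configuration
Open igv.bat in a text editor (Mac users: open igv.sh)
Change the 750 in -Xmx750m to a size (in MB) that suits your machine, then save.
Run igv.bat (Mac users: run igv.sh)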
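The same edit can be made programmatically. The sketch below only assumes that the launcher script contains an option of the form `-Xmx<number>m`, as the slide states; the helper name and the sample command line in the usage note are made up for illustration.

```python
import re


def set_max_heap(script_text, megabytes):
    """Return script_text with every -Xmx<number>m option set to the new size."""
    new_text, count = re.subn(r"-Xmx\d+[mM]", f"-Xmx{megabytes}m", script_text)
    if count == 0:
        raise ValueError("no -Xmx<number>m option found in the script")
    return new_text
```

For example, `set_max_heap("java -Xmx750m -jar igv.jar", 2000)` yields `"java -Xmx2000m -jar igv.jar"`. To apply it to a real file, read igv.bat (or igv.sh), pass its text through the function, and write the result back after keeping a backup copy.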
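IGV Documentation
The official website has complete documentation
User guide:
http://www.broadinstitute.org/software/igv/UserGuide
New features in version 2.0:
http://www.broadinstitute.org/software/igv/Version2.0Guide
Forum: https://groups.google.com/forum/#!forum/igvhelp
Each supported file format has its own documentation
Most of the feature descriptions in these slides are excerpted from the user guide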
Reference Genome
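IGV provides preset reference genomes for many species to choose from:
http://www.broadinstitute.org/software/igv/Genomes
Here we use the genome used by the 1000 Genomes Project
(1kg, b37+decoy).
If the reference genome you need is not
in the list, you can create it with the Import
Genome function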
Viewing the Reference Genome
Note that the sequence and the arrow are only
displayed when zoomed in to a sufficiently small region.
You can change the strand that is displayed by clicking
on the arrow in the title to the left of the track.
You can optionally display a 3-band track that shows a
3-frame translation of the amino acid sequence for the
corresponding nucleotide sequence.
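To make the strand switch and the 3-frame translation concrete, here is a small sketch using the standard genetic code. It illustrates the idea only and is not IGV's own implementation; the function names are hypothetical, and codons containing characters other than A, C, G, T are shown as X.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def three_frame_translation(seq):
    """Translations for reading frames starting at offsets 0, 1 and 2."""
    return [translate(seq[offset:]) for offset in range(3)]
```

For the other strand, pass `reverse_complement(seq)` to `three_frame_translation`.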
Viewing the Reference Genome
Feature Track(Genes and Transcripts)
Collapsed
Squished
Expanded
Query by gene name
Loading Data and Attributes
Load from File
see IGV File Formats for supported file formats
Load from URL
Load from Server
Load from a Distributed Annotation System (DAS)
Removing Tracks and Attributes
To remove all tracks and attributes:
Click File>New Session. This is essentially the same as
restarting IGV.
To remove specific tracks:
Right-click a track name and select Remove Tracks in the
pop-up menu.
Control-click track names for multiple selections
Viewing Variants (vcf)
Open sample vcf file (from 1000 genomes project)
ALL.chrMT.phase1.20101123.vcf
Build index with igvtools
select File>Run igvtools
select index as Command
specify the location of the vcf file
Run
Set locus to MT
Zoom in/ Zoom out
Go back/Go forward
Set Feature Visibility Windows
Viewing Variants (vcf)
1. Shows the allele fraction for a
single locus.
2. Shows the genotypes for each locus in
each sample.
Dark blue = heterozygous
Cyan = homozygous variant
Grey = reference
To change the color coding of
the plot, select Color By>Allele
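The genotype color key can be related to the GT strings stored in a VCF file. The sketch below classifies a GT value and maps it to the color named on this slide; it is an illustration rather than IGV's code, the function names are hypothetical, and since the slide names no color for missing calls, those return None.

```python
import re

GENOTYPE_COLORS = {
    "heterozygous": "dark blue",
    "homozygous": "cyan",
    "reference": "grey",
}


def genotype_class(gt):
    """Classify a VCF GT string such as '0/1', '1|1' or '0/0'."""
    alleles = re.split(r"[/|]", gt)
    if any(allele == "." for allele in alleles):
        return "no call"
    if all(allele == "0" for allele in alleles):
        return "reference"
    if len(set(alleles)) == 1:
        # All alleles identical and non-reference (also covers haploid calls).
        return "homozygous"
    return "heterozygous"


def genotype_color(gt):
    """Color from the slide's key, or None when the slide gives no color."""
    return GENOTYPE_COLORS.get(genotype_class(gt))
```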
About VCF format
less {sample vcf file}.
Or try opening the VCF file in Excel
VCF Format 4.1 Spec
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
VCF Poster
http://vcftools.sourceforge.net/VCF-poster.pdf
Fixed Fields
Genotype Fields (Optional)
Meta-information lines
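The three parts of a VCF file listed above can be seen by splitting a file into its pieces. The following is a minimal sketch for uncompressed, tab-separated VCF text; it is not a full parser (the INFO field is left as a string), and the function name is hypothetical.

```python
FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf(lines):
    """Split VCF lines into meta-information, sample names and records."""
    meta, samples, records = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information line, e.g. ##fileformat=VCFv4.1
            meta.append(line[2:])
        elif line.startswith("#"):
            # Header line: 8 fixed columns, then FORMAT and the sample names.
            samples = line[1:].split("\t")[9:]
        else:
            fields = line.split("\t")
            record = dict(zip(FIXED_FIELDS, fields[:8]))
            record["POS"] = int(record["POS"])
            if len(fields) > 9:
                # Optional genotype fields: FORMAT keys apply to every sample.
                keys = fields[8].split(":")
                record["genotypes"] = {
                    sample: dict(zip(keys, value.split(":")))
                    for sample, value in zip(samples, fields[9:])
                }
            records.append(record)
    return meta, samples, records
```

A typical call is `with open(path) as handle: meta, samples, records = parse_vcf(handle)`, where `path` points to an uncompressed VCF file.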
Viewing Alignments
Remove the 1000 Genomes track (or click File>New Session)
Load sample files from URL (Ctrl-click for multi-select)
1_149003461-149460645.bam
1_149003461-149460645.vcf
Set locus to 1:149022000-149023000
View>Preferences for alignments view settings
downsampling
Can be Collapsed, Squished or Expanded
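Locus strings such as MT and 1:149022000-149023000 follow a simple chromosome:start-end pattern. The sketch below parses that pattern, for example to check the size of a region before jumping to it. The function name is hypothetical, and accepting thousands separators (commas) is an assumption about the input, not something these slides state.

```python
import re

LOCUS_PATTERN = re.compile(r"^([^:\s]+)(?::([\d,]+)(?:-([\d,]+))?)?$")


def parse_locus(text):
    """Parse 'chrom', 'chrom:pos' or 'chrom:start-end' into a tuple.

    Returns (chrom, start, end); start and end are None for a whole chromosome.
    A bare name is treated as a chromosome, so gene names are not resolved here.
    """
    match = LOCUS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a locus string: {text!r}")
    chrom, start, end = match.groups()
    if start is None:
        return chrom, None, None
    start = int(start.replace(",", ""))
    end = int(end.replace(",", "")) if end else start
    if end < start:
        raise ValueError("end is smaller than start")
    return chrom, start, end
```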
Viewing Alignments
Control+click a read to find its mate
Split Screen View
Right-click over an alignment and select View mate
region in split screen from the drop-down list
Return to Normal View
To return to the “normal view”, double-click the name
panel at the top of one of the panes, or right-click in a
name panel and select Switch to standard view.
Interpreting Color by Insert Size
Red for an inferred insert
size that is larger than
expected (deletion)
Blue for an inferred insert
size that is smaller than
expected (insertion)
Other colors for paired
end reads that are coded
by the chromosome on
which their mates can be
found (Inter-chromosomal
Rearrangement)
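The coloring rule above can be summarized as a small decision function. This is a sketch of the rule as stated on this slide, not IGV's implementation; the function name, the "default" label for unremarkable pairs, and the expected-size bounds passed in by the caller are all assumptions.

```python
def insert_size_color(read_chrom, mate_chrom, insert_size,
                      min_expected, max_expected):
    """Return the color category for a paired-end read.

    min_expected and max_expected are caller-supplied bounds on the
    expected insert size (hypothetical values, chosen per library).
    """
    if read_chrom != mate_chrom:
        # Mate on another chromosome: colored by the mate's chromosome.
        return f"color of chromosome {mate_chrom}"
    size = abs(insert_size)  # sign only reflects which read comes first
    if size > max_expected:
        return "red"         # larger than expected: possible deletion
    if size < min_expected:
        return "blue"        # smaller than expected: possible insertion
    return "default"
```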
Interpreting Color by Pair Orientation
The orientation of paired reads can be used to detect
structural events including: inversions, duplications and
translocations
(*)Splice Junctions
The splice junction view displays an alternative
representation of .bed files encoding splice junctions,
such as the "junctions.bed" file produced by the TopHat
program.
About SAM(BAM) format
SAM/BAM spec
http://samtools.sourceforge.net/SAM1.pdf
A human-readable summary
http://genome.sph.umich.edu/wiki/SAM
Explain SAM flags
http://picard.sourceforge.net/explain-flags.html
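The FLAG column of a SAM record is a bit field, and decoding it by hand is a useful exercise. The sketch below lists the bits defined in the SAM specification and reports which ones are set in a given value; the function name and the wording of the descriptions are illustrative.

```python
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    if not 0 <= flag < 0x1000:
        raise ValueError("SAM flags defined here fit in 12 bits")
    return [name for bit, name in SAM_FLAGS if flag & bit]
```

For example, `explain_flag(99)` reports a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read in the pair (99 = 1 + 2 + 32 + 64).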
End of today's course
Thank you