
GenomePixelizer - a visualization tool for comparative genomics within and between species.
A. Kozik, E. Kochetkova, and R. Michelmore (Department of Vegetable Crops, UC Davis, CA)
GenomePixelizer main interface.
The program reads the Run Setup file by default at start-up.
The GenomePixelizer "Matrix Color Tuner" procedure allows the user to assign colors to
similarity/identity lines dynamically, based on the distance matrix file data, without
changing the input file.
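Recoloring identity lines without editing the matrix file can be pictured as deriving a new color for each gene pair from user-chosen identity rules. The Python sketch below is only an illustration under stated assumptions, not GenomePixelizer's Tcl/Tk code: the rule format (minimum identity, color), the function name `tune_line_colors`, and the choice to hide pairs that satisfy no rule are all hypothetical.

```python
from typing import List, Tuple

# One matrix-file row: (gene A, gene B, identity as a fraction, line color).
MatrixRow = Tuple[str, str, float, str]


def tune_line_colors(rows: List[MatrixRow],
                     thresholds: List[Tuple[float, str]]) -> List[MatrixRow]:
    """Recolor identity lines from hypothetical (minimum identity, color) rules.

    Rules are checked from the highest minimum downwards, so each pair takes
    the color of the strictest rule it satisfies. Pairs below every rule are
    left out (assumed to be hidden). A new list is returned; the input rows,
    like the matrix file itself, stay unchanged.
    """
    rules = sorted(thresholds, reverse=True)
    tuned: List[MatrixRow] = []
    for gene_a, gene_b, identity, _old_color in rows:
        for min_identity, color in rules:
            if identity >= min_identity:
                tuned.append((gene_a, gene_b, identity, color))
                break
    return tuned


if __name__ == "__main__":
    rows = [("At4g16890", "At4g16950", 0.901, "orange"),
            ("At4g12310", "At4g12320", 0.883, "green")]
    # Hypothetical rules: red for >= 90% identity, blue for >= 75%.
    print(tune_line_colors(rows, [(0.75, "blue"), (0.90, "red")]))
```

Sorting the rules once keeps the per-pair loop simple; any number of identity bands can be added without changing the matrix data.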
We developed a genome visualization program, GenomePixelizer, to study evolutionary
patterns of specific gene families in whole genome(s). GenomePixelizer generates
custom images of the physical or genetic positions of specified sets of genes in one or
more genomes or parts of genomes. The positions of user-selected sets of genes are
displayed along the chromosomes based on either physical or genetic distances.
Multiple sets of genes can be shown simultaneously, each with user-defined display
characteristics. The program supports the analysis of duplication events within and
between species by displaying user-adjustable levels of sequence similarity. This allows
comparison of duplication patterns between different gene families, investigation of
large-scale versus local duplications and deletions, and studies of macro- and
micro-synteny. We are using GenomePixelizer to study the evolution of
NBS-LRR encoding genes in comparison to other families of similar size such as
cytochrome P450 and receptor kinase encoding genes in Arabidopsis both at the whole
genome level and at the level of individual clusters. We are also adapting
GenomePixelizer to display homologs identified in EST libraries for comparative
studies. The program is written in Tcl/Tk and works on any computer platform that
supports the Tcl/Tk toolkit. GenomePixelizer generates HTML ImageMap tags for each
gene, allowing links to databases. GenomePixelizer is distributed under the GNU General
Public License. A detailed program description, source code, examples, and documentation are
freely available at: http://niblrrs.ucdavis.edu/GenomePixelizer/
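Displaying genes "along the chromosomes based on either physical or genetic distances" amounts to scaling each position to a pixel coordinate. The sketch below assumes a simple linear scale in which the longest chromosome spans the drawable window height; the 40-pixel margin and the function name `position_to_pixel` are hypothetical and are not taken from GenomePixelizer's source.

```python
def position_to_pixel(position: float, longest_chromosome: float,
                      window_height: int, margin: int = 40) -> int:
    """Linearly map a gene position to a vertical pixel coordinate.

    position and longest_chromosome must share units (physical or genetic).
    0 maps to the top margin; the end of the longest chromosome maps to
    window_height - margin, so all chromosomes share one scale.
    """
    if longest_chromosome <= 0:
        raise ValueError("chromosome size must be positive")
    if not 0 <= position <= longest_chromosome:
        raise ValueError("position lies outside the chromosome scale")
    drawable_height = window_height - 2 * margin
    return margin + round(position / longest_chromosome * drawable_height)


if __name__ == "__main__":
    sizes = [30, 20, 24, 18, 27]   # chromosome sizes from the example Run Setup file
    longest = max(sizes)
    # A gene at position 25.395 on a 720-pixel-high window.
    print(position_to_pixel(25.395, longest, 720))   # 582
```

Scaling every chromosome against the longest one keeps relative lengths comparable across the image, which matters when duplicated segments on different chromosomes are connected by lines.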
1. name of file containing gene coordinates: ./Trio_NBS_P450_PKLRR_Input
2. name of the distance matrix file: ./Trio_NBS_P450_PKLRR_Matrix_Color
3. number of chromosomes: 5
4. size of chromosomes: 30 20 24 18 27
5. identity upper level: 100
6. identity lower level: 75
7. window size (pixels) X: 960
8. window size (pixels) Y: 720
9. html prefix: http://mips.gsf.de/cgi-bin/proj/thal/search_gene?code=
10. Title: NBS, P450, PK-LRR clustering in Arabidopsis, 75% identity
11. Laboratory: (Michelmore lab, UCD)
########################################################
##### for experienced users below this line ########
12. W/C correction: A
13. horizontal size of gene: 9
14. vertical size of gene: 4
15. W/C coefficient: 1
16. W/C correction value: 6
17. chromosome thickness: 5
18. gene feature mode (standard [std] or extended [ext]): std
Run Setup file
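The Run Setup file uses a simple numbered "key: value" layout, with "#" lines separating the options for experienced users. A minimal Python sketch of reading it into a dictionary follows; the function name `parse_run_setup` and the choice to keep values as raw strings (splitting multi-value fields such as chromosome sizes later) are assumptions, not GenomePixelizer's own parser.

```python
import re
from typing import Dict

# Matches numbered lines such as "3. number of chromosomes: 5".
# The key stops at the first colon, so values like URLs keep their own colons.
SETUP_LINE = re.compile(r"^\s*(\d+)\.\s*([^:]+):\s*(.*)$")


def parse_run_setup(text: str) -> Dict[str, str]:
    """Return {setting name: raw value} for every numbered line of a Run Setup file."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # separator and comment lines
        match = SETUP_LINE.match(line)
        if match:
            _number, key, value = match.groups()
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = "3. number of chromosomes: 5\n4. size of chromosomes: 30 20 24 18 27"
    setup = parse_run_setup(sample)
    sizes = [float(s) for s in setup["size of chromosomes"].split()]
    print(setup["number of chromosomes"], sizes)
```

Keeping values as strings lets each caller decide how to interpret a field (integer, list of sizes, URL prefix, "std"/"ext" mode).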
The Canvas editor allows the user to add text and graphical labels to images
generated by GenomePixelizer.
The GenomePixelizer "Gene Painter" procedure allows the user to paint different sets of
genes in different colors in batch mode, dynamically, without rerunning the project.
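Batch painting can be pictured as applying a gene-to-color mapping to the color field of the gene coordinate records while leaving other genes untouched. This sketch assumes each input line holds five whitespace-separated fields (chromosome, gene ID, position, W/C orientation, color); the function name `paint_genes` and the tab-separated output are hypothetical choices.

```python
from typing import Dict, List


def paint_genes(lines: List[str], new_colors: Dict[str, str]) -> List[str]:
    """Apply {gene ID: color} to gene-coordinate lines in one batch.

    Lines are assumed to be: chromosome, gene ID, position, W/C, color.
    Genes missing from new_colors keep their color; lines that do not have
    exactly five fields (blank lines, separators) are passed through as-is.
    """
    painted: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            painted.append(line)
            continue
        chromosome, gene_id, position, strand, color = fields
        color = new_colors.get(gene_id, color)
        painted.append("\t".join([chromosome, gene_id, position, strand, color]))
    return painted


if __name__ == "__main__":
    lines = ["5 At5g66900 26.714 C orange", "1 At1g01280 0.112 W green"]
    # Hypothetical batch: paint one gene red.
    for line in paint_genes(lines, {"At1g01280": "red"}):
        print(line)
```

Because only the color field changes, the same coordinates can be redrawn with a new coloring without recomputing anything else in the project.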
The GenomePixelizer "Locus Zoomer" procedure allows the user to zoom into regions of
interest in semi-automatic mode and to generate subprojects by extracting data from the
whole dataset.
GenomePixelizer color scheme
Gene Coordinates (Input), excerpt. Columns: chromosome #, gene ID, position on chromosome
(Mb), "Watson/Crick" orientation (W or C), gene "property" (display color).
. . . . . .
5   At5g63410   25.395   C   purple
5   At5g63450   25.408   C   green
5   At5g65240   26.074   C   purple
5   At5g66900   26.714   C   orange
5   At5g66910   26.718   C   orange
5   At5g67200   26.813   C   purple
5   At5g67280   26.842   C   purple
5   At5g67310   26.855   C   green
1   At1g01280    0.112   W   green
1   At1g01600    0.219   W   green
1   At1g04210    1.114   W   purple
1   At1g05700    1.709   W   purple
1   At1g07560    2.327   W   purple
1   At1g08590    2.718   W   purple
1   At1g09970    3.252   W   purple
1   At1g10860    3.612   W   purple
1   At1g11600    3.902   W   green
1   At1g11680    3.938   W   green
. . . . . .
Color scheme: orange - NBS-LRR, green - cytochrome P450, purple - PK-LRR.
Identity Matrix File, excerpt. Columns: first gene ID, second gene ID, identity level
between the pair of genes, line color coding.
. . . . .
At4g16890   At4g16950   0.901   orange
At1g34210   At1g71830   0.900   purple
At4g16860   At4g16920   0.900   orange
At4g13290   At4g13310   0.895   green
At3g44480   At3g44670   0.894   orange
At2g30750   At2g30770   0.893   green
At1g01600   At4g00360   0.889   green
At4g31940   At4g31950   0.886   green
At1g34540   At3g56630   0.885   green
At4g31940   At4g31970   0.885   green
At1g61180   At1g61190   0.884   orange
At3g26190   At3g26200   0.883   green
At4g12310   At4g12320   0.883   green
At1g53440   At1g53430   0.883   purple
. . . . .
Program output: graphical genomic comparison of the clustering of three gene families:
Distribution of NBS-LRR (putative resistance genes), cytochrome P450, and PK-LRR
(protein kinase) genes in the Arabidopsis genome.
Color scheme: NBS-LRR - orange, P450 - green, PK-LRR - purple. Lines connect genes
with identity of 75% or higher.
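The 75% cut-off in this figure matches the identity lower level in the example Run Setup file (upper level 100). Assuming the matrix file stores identity as a fraction, as the 0.883 to 0.901 values in the excerpt suggest, choosing which pairs to connect can be sketched as below. The function names and the rule that both genes must belong to the displayed gene set are assumptions for illustration.

```python
from typing import List, Set, Tuple

# One matrix-file row: (gene A, gene B, identity as a fraction, line color).
Pair = Tuple[str, str, float, str]


def parse_matrix(text: str) -> List[Pair]:
    """Parse 'geneA geneB identity color' lines, skipping anything malformed."""
    pairs: List[Pair] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        gene_a, gene_b, identity, color = fields
        try:
            pairs.append((gene_a, gene_b, float(identity), color))
        except ValueError:
            continue  # identity field is not a number
    return pairs


def pairs_to_draw(pairs: List[Pair], genes: Set[str],
                  lower_pct: float = 75, upper_pct: float = 100) -> List[Pair]:
    """Keep pairs with identity in [lower_pct, upper_pct] percent whose two
    genes are both part of the displayed gene set."""
    low, high = lower_pct / 100, upper_pct / 100
    return [p for p in pairs
            if low <= p[2] <= high and p[0] in genes and p[1] in genes]


if __name__ == "__main__":
    text = "At4g16890 At4g16950 0.901 orange\nAt4g12310 At4g12320 0.883 green"
    shown = {"At4g16890", "At4g16950"}
    print(pairs_to_draw(parse_matrix(text), shown))
```

Filtering on percentages converted to fractions keeps the user-facing settings (75, 100) in the same units as the Run Setup file.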
Example Project: Fine Dissection of Segmental Duplications
in Arabidopsis Genome using GenomePixelizer
Color scheme:
- NBS-LRR
- cytochrome P450
- PK-LRR
Segmental Duplications in Arabidopsis Genome
Colored lines connect genes with identity of 80% or higher. The colors of the identity
lines are chosen to make the different pairs of chromosomes easy to distinguish.
Project implementation:
1. Data collection: gene coordinates and protein sequences
(predicted ORFs) from the MIPS Arabidopsis database [http://mips.gsf.de].
2. Data collection: functional categories (FUNCAT) for the set of genes
from the PEDANT database [http://pedant.gsf.de/].
3. Generation of the matrix file by processing the results
of a “genome against genome” FASTA search.
4. Running GenomePixelizer with the whole set of genes (~26,000).
5. Selection of a region of interest and data extraction for a subproject
using the “Locus Zoomer” procedure.
6. Re-running GenomePixelizer with the selected set of genes
and displaying different levels of identity (60% and 40%)
using the “Matrix Color Tuner” procedure.
7. Gene coloring according to MIPS functional categories using
the “Gene Painter” procedure.
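Step 5 can be illustrated with a sketch of subproject extraction: keep the genes that fall inside one or more chromosome windows, then keep only the matrix pairs whose two genes both survive. The record layouts, the list-of-regions input, and the function name `extract_subproject` are assumptions; the real Locus Zoomer works interactively in semi-automatic mode.

```python
from typing import List, Set, Tuple

Gene = Tuple[int, str, float, str, str]   # chromosome, gene ID, position, W/C, color
Pair = Tuple[str, str, float, str]        # gene A, gene B, identity, line color
Region = Tuple[int, float, float]         # chromosome, start, end (inclusive)


def extract_subproject(genes: List[Gene], pairs: List[Pair],
                       regions: List[Region]) -> Tuple[List[Gene], List[Pair]]:
    """Return the genes inside any region and the pairs linking two such genes."""
    def in_regions(gene: Gene) -> bool:
        return any(gene[0] == chrom and start <= gene[2] <= end
                   for chrom, start, end in regions)

    selected = [g for g in genes if in_regions(g)]
    kept_ids: Set[str] = {g[1] for g in selected}
    kept_pairs = [p for p in pairs if p[0] in kept_ids and p[1] in kept_ids]
    return selected, kept_pairs


if __name__ == "__main__":
    genes = [(5, "At5g66900", 26.714, "C", "orange"),
             (1, "At1g01280", 0.112, "W", "green")]
    # Hypothetical region: chromosome 5 between 26.0 and 27.0.
    print(extract_subproject(genes, [], [(5, 26.0, 27.0)]))
```

Accepting several regions at once suits segmental duplications, where the two copies of a segment usually lie on different chromosomes and the pairs of interest cross between them.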
GenomePixelizer automatically generates HTML ImageMap tags
for each gene, allowing Web links to databases.
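An ImageMap entry per gene can be built by pairing each gene's drawn rectangle with a link formed from the Run Setup "html prefix" plus the gene ID. The sketch assumes the gene is drawn as a rectangle of the configured horizontal and vertical size (9 x 4 pixels in the example setup) centered on its pixel position; the helper names and the centering rule are hypothetical and need not match GenomePixelizer's exact output.

```python
from html import escape
from typing import List


def gene_area_tag(gene_id: str, x: int, y: int, html_prefix: str,
                  width: int = 9, height: int = 4) -> str:
    """Return an HTML <area> tag linking a gene's rectangle to a database page."""
    left, top = x - width // 2, y - height // 2
    right, bottom = left + width, top + height
    href = escape(html_prefix + gene_id, quote=True)
    return (f'<area shape="rect" coords="{left},{top},{right},{bottom}" '
            f'href="{href}" alt="{escape(gene_id, quote=True)}">')


def image_map(name: str, areas: List[str]) -> str:
    """Wrap <area> tags in a named <map> element."""
    return "\n".join([f'<map name="{escape(name, quote=True)}">', *areas, "</map>"])


if __name__ == "__main__":
    prefix = "http://mips.gsf.de/cgi-bin/proj/thal/search_gene?code="
    print(image_map("genome", [gene_area_tag("At1g01280", 100, 200, prefix)]))
```

Escaping the URL and gene ID keeps the generated HTML valid even if a prefix contains characters such as "&".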