Instructions for using the SVARAP program
Plan

Principle of SVARAP program

Use of SVARAP:
    GDE alignment
    Formatting the GDE alignment
    Variability analysis:
        Activation of « macros »
        Pasting the GDE alignment
        Checking the GDE alignment format
        Raw data of variability analysis by nucleotide site
        Variability analysis by window of 50 nucleotides for a length of 2000 nucleotides
        Variability analysis by nucleotide site for a length of 2000 nucleotides

Program ASVARAP: study of amino acid variability
Examples
Download / References
Contact
Principle of SVARAP program




« SVARAP » (Sequence VARiability Analysis Program) analyses, highlights and
graphically represents the variability or genetic diversity of nucleotide sequences. It uses a
Microsoft Excel® file that can simultaneously analyse up to 100 sequences of up to
4000 nucleotides.
Variability is defined as the proportion of analysed sequences for which the nucleotide at a
given position is not the one most frequently found in the studied set of sequences.
The program generates graphs and calculates the mean, median, minimal and maximal values,
and the coefficient of variation, for windows of 50 nucleotides. It also performs a site-by-site analysis.
Classically, sequence alignment tools identify the sites and the nature of nucleotide differences.
A quantitative analysis of variability or diversity may add information and help to find
discriminant or conserved regions that could be targeted by PCR, or highly polymorphic
« spots ».
Thompson J. D., Gibson T. J., Plewniak F., Jeanmougin F., Higgins D. G. The CLUSTAL_X windows interface: flexible strategies
for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25(24): 4876-82.
How SVARAP works

Sequences are aligned, and the alignment in GDE format is copied and pasted into a cell of our
program, which formats the sequences to facilitate further analysis. Notably, each nucleotide is
placed in a separate cell, so that the nucleotides corresponding to the same nucleotide site are in
the same column.

The consensus nucleotide at each site (defined as the one most frequently found at this
position in the studied set of sequences) is automatically generated.

The program simultaneously calculates the absolute counts of each of the 4 nucleotides (G,
A, C, T), or of deletions or insertions, and their frequencies (in %). Diversity or variability is
defined as the proportion of sequences for which, at a given site, the nucleotide differs from the
nucleotide most frequently found in the studied set of sequences. It is calculated
with the formula: 100 - (maximal frequency in % among the four nucleotides at a
given nucleotide site). The program also calculates the number of different nucleotides
found at a given site. The results are then used to calculate, for windows of 50 nucleotides,
the median, mean, minimal and maximal values of variability. In parallel, a site-by-site
analysis is also performed and reported for a length of 2000 nucleotides.

Finally, SVARAP graphically represents the diversity/variability.
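
The per-site calculation described above can be illustrated outside Excel. The following is only a minimal sketch, not the code of the SVARAP macros: the function names are hypothetical, frequencies are assumed to be computed relative to the total number of sequences, and any character other than G, A, C or T (for example a gap) is assumed to count in that total without ever being the "most frequent nucleotide".

```python
NUCLEOTIDES = "GACT"  # the four nucleotides named in the text


def site_statistics(column):
    """Counts, frequencies (%), variability (%) and number of distinct
    nucleotides for one alignment column given as a string.

    Assumption: frequencies are relative to all sequences in the column,
    gaps included in the denominator.
    """
    column = column.upper()
    total = len(column)
    counts = {nt: column.count(nt) for nt in NUCLEOTIDES}
    freqs = {nt: 100.0 * n / total for nt, n in counts.items()}
    # Formula from the text: 100 - (maximal frequency in % among the four nucleotides)
    variability = 100.0 - max(freqs.values())
    distinct = sum(1 for n in counts.values() if n > 0)
    return counts, freqs, variability, distinct


def alignment_variability(sequences):
    """Variability (%) at each site of a list of equal-length aligned sequences."""
    return [site_statistics("".join(col))[2] for col in zip(*sequences)]


if __name__ == "__main__":
    aligned = ["ACGT", "ACGA", "ATGA", "ACG."]
    for site, value in enumerate(alignment_variability(aligned), start=1):
        print(site, value)
```

With this convention, a site where three of four sequences carry the same nucleotide has a variability of 25 %.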
Alignment of sequences in GDE format

The initial « material » is a set of sequences (maximum 100 sequences).

SVARAP uses an alignment in GDE format (Genetic Data Environment). First,
sequences are aligned with ClustalX v.1.8 [Thompson, 1997] after requesting the creation
of a GDE file in the Output Format Options. Then, the alignment is copied and
pasted into a cell of our Microsoft Excel® file named « AnaVarNuc_Pos… ».
To get an alignment in GDE format using ClustalX v1.8 (1/2)


Open ClustalX (1.8) and load the sequences in FASTA format.
Select the « Alignment » menu, then « Output Format Options... ».
To get an alignment in GDE format using ClustalX v1.8 (2/2)



Select GDE format.
Start alignment.
Locate the GDE file.
Formatting the GDE alignment using Microsoft Word®


As for most sequence analyses, the sequences must first be formatted.
Copy the alignment and paste it into Microsoft Word®, then: 1/ delete all paragraph marks; 2/ replace the « - »
characters by another character (« . » for instance) that does not cause line breaks; 3/ add a paragraph mark
before the name of each sequence. Then add a paragraph mark (<enter>) after the name of each
sequence (and before the 1st nucleotide).
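
For readers who prefer to script this step, the same result (one name line followed by one unbroken sequence line, with « - » replaced by « . ») can be obtained as in the sketch below. It is an illustration under stated assumptions, not part of SVARAP: it assumes that each sequence name in the GDE file starts with « # » on its own line, and the function names are hypothetical.

```python
def format_gde(gde_text, gap_replacement="."):
    """Return a list of (name, sequence) pairs from GDE text.

    Assumption: each record starts with a line beginning with '#'
    followed by the sequence name; the following lines hold the sequence.
    """
    records = []
    for line in gde_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            records.append([line[1:], ""])  # new record: name, empty sequence
        elif records:
            # join wrapped sequence lines and replace the dashes
            records[-1][1] += line.replace("-", gap_replacement)
    return [(name, seq) for name, seq in records]


def to_pasteable_text(records):
    """One line per name and one line per sequence, ready to be pasted."""
    return "\n".join(f"#{name}\n{seq}" for name, seq in records)


if __name__ == "__main__":
    example = "#seq1\nacg-\ntt\n#seq2\nacgt\ntt\n"
    print(to_pasteable_text(format_gde(example)))
```

Only the sequence lines are modified, so a dash inside a sequence name is left untouched, unlike a global replacement in Word.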
Activating « macros »

The Microsoft Excel® file contains « macros ». They must be activated to use the file; this
step can be suppressed.
Pasting the GDE alignment in SVARAP

Sheet « Paste the alignment »

1. Two files, analysing variability for nucleotides 1 to 2000 or 2001 to 4000, are downloadable, as
the analysis of 4000 nucleotides cannot be done simultaneously.
2. When using this program: click on column B, then press the <Suppr> (Delete) key to delete prior work.
3. Paste the GDE alignment formatted using Microsoft Word® into a single cell (the white area, cell B2).

Links: Link to final analysis; How to analyse > 4000 nucleotides or > 2000 nucleotides simultaneously.
Verifying the format of the GDE alignment (1/2)



Column A should contain only sequence names, and columns F, I, L and O only sequences. The
number of sequences should be correct.
If not: check the GDE alignment.
Sheet « Sep1000 »
Verifying the format of the GDE alignment (2/2)

Column B should contain only sequence names, and column C only sequences. The number of
sequences should be correct.
If not: check the GDE alignment.

Sheet « Nuc 1-1000 » and « Nuc 1001-2000 »
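
The same kind of check can be run on the formatted alignment before pasting it. The sketch below is only illustrative: the function name is hypothetical, the allowed character set (the four nucleotides plus « . » for gaps) is an assumption, and the limit of 100 sequences is the one stated for the program.

```python
def check_alignment(records, expected_count, max_sequences=100, allowed="ACGT."):
    """Return a list of problems found in a list of (name, sequence) pairs.

    Assumptions: sequences contain only the characters in `allowed`
    (case-insensitive) and all aligned sequences have the same length.
    """
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} sequences, found {len(records)}")
    if len(records) > max_sequences:
        problems.append(f"more than {max_sequences} sequences")
    lengths = set()
    for name, seq in records:
        if not name.strip():
            problems.append("empty sequence name")
        unexpected = set(seq.upper()) - set(allowed)
        if unexpected:
            problems.append(f"{name}: unexpected characters {sorted(unexpected)}")
        lengths.add(len(seq))
    if len(lengths) > 1:
        problems.append("sequences have different lengths")
    return problems


if __name__ == "__main__":
    print(check_alignment([("s1", "acg."), ("s2", "acgt")], expected_count=2))
```

An empty list means that nothing suspicious was found; otherwise, check the GDE alignment as indicated above.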

Analysis of variability
This sheet and its table contain the main part of the variability analysis:
1. The level of variability corresponds to the proportion of sequences for which, at a given nucleotide site, the
nucleotide differs from the nucleotide most frequently found in the studied set of sequences.
2. The positions that are defined correspond to those defined in your set of sequences.
3. The number of distinct variations corresponds to the number of different nucleotides observed at a given site.
4. This analysis is done by windows of 200 bases for reasons related to the Microsoft Excel software.
5. Analysis in absolute values.
6. Analysis in %.

Sheets « Var...»


Consensus sequence over a length of 2000 nucleotides



The consensus nucleotide is calculated for each nucleotide site over the whole length of
the studied sequences.
1. The symbol # corresponds to an indetermination, for example: two nucleotides equally
represented as the most frequent; or an insertion or deletion as the most frequent state.
Sheet « Consensus »
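
The rule for the consensus, including the « # » symbol for indeterminate sites, can be written as the short sketch below. It is an illustration only: the function name is hypothetical, and the characters treated as insertions or deletions (« . » and « - ») are an assumption.

```python
from collections import Counter


def consensus_symbol(column, gap_chars=".-"):
    """Consensus for one non-empty alignment column given as a string.

    Returns '#' when the site is indeterminate: either two symbols are
    tied for the highest count, or the most frequent symbol is a gap.
    """
    ranked = Counter(column.upper()).most_common()  # sorted by decreasing count
    top_symbol, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_symbol in gap_chars:
        return "#"
    return top_symbol


def consensus_sequence(sequences):
    """Consensus over all sites of equal-length aligned sequences."""
    return "".join(consensus_symbol("".join(col)) for col in zip(*sequences))


if __name__ == "__main__":
    print(consensus_sequence(["ACGT", "ACGA", "ATG.", "AT.."]))
```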
Raw variability data by nucleotide site over a length of 2000 nucleotides


The variability is calculated for each nucleotide position over the whole length of the
studied sequences.
Sheet « Consensus »
Analysis by window of 50 nucleotides



Variability is calculated and analysed by windows of 50 nucleotides over the whole length of the
studied sequences. The analysis is available:
in tables: sheet « Data fen 50 »
in graphs: sheet « Fig 1-2000 fen 50 »
Analysis by nucleotide site for a length of 2000 nucleotides (1/2)



A graph of the variability calculated for each nucleotide site over the whole length of the
studied sequences is systematically generated.
Sheet « Fig var par position »
1. Each window of 250 nucleotides can be printed separately, or copied and pasted into other
software. Alternatively, all 2000 nucleotides can be printed at the same time.
Analysis by nucleotide site for a length of 2000 nucleotides (2/2)


Print preview of the variability calculated for each nucleotide position over the
whole length of the studied sequences.
Sheet « Fig var par position »
How to analyse more than 4000 nucleotides
The program is not strictly limited with regard to the length of the studied sequences. It can analyse
more than 4000 nucleotides, and more than 2000 nucleotides at the same time.
To analyse more than 4000 nucleotides:

Copy the file « AnaVarNuc_Pos 1-2000 ».

Go to sheet « Paste alignment ».

Unhide all columns (<Format><Colonnes><Afficher>, i.e. Format > Columns > Unhide).

Go to cells F2 to F201 and replace 1 by the starting site to analyse in your alignment (e.g.
8000, or 10224); then, in cells G2 to G201, replace 1001 by a value 1000 higher than
the one written in column F (e.g. 9000, or 11224).

You have thus set up the analysis of nucleotides 8000 to 10000, or 10224 to 12224.
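
The arithmetic of this adjustment can be sketched as follows. This is only an illustration with hypothetical function names: the first function returns the two values to type in columns F and G for a chosen starting site, and the second shows the equivalent selection of a region in a list of (name, sequence) pairs, assuming 1-based site numbering.

```python
def column_f_g_values(start_site):
    """Values for columns F and G: the starting site and the starting site + 1000."""
    return start_site, start_site + 1000


def extract_region(records, start_site, length=2000):
    """Keep `length` sites of each aligned sequence, starting at `start_site` (1-based)."""
    begin = start_site - 1
    return [(name, seq[begin:begin + length]) for name, seq in records]


if __name__ == "__main__":
    print(column_f_g_values(8000))
    print(extract_region([("s1", "ACGTACGT")], start_site=3, length=4))
```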
How to analyse more than 2000 nucleotides at the same time
The program is not strictly limited with regard to the length of the studied sequences. It can analyse
more than 4000 nucleotides, and more than 2000 nucleotides at the same time.
To analyse more than 2000 nucleotides at the same time:

Use the variability values for 2000 nucleotide sites as stored in the sheet called
« consensus ». By copying these values, 2000 nucleotides at a time, from several files into a
new Microsoft Excel® file, you can create graphs for the appropriate length.
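
Combining the values exported from several files amounts to placing each block at its own starting site. A minimal sketch, assuming each block is given as a pair (first site, list of variability values) and using a hypothetical function name:

```python
def combine_blocks(blocks):
    """Merge blocks of per-site variability into one list of (site, value).

    `blocks` is a list of (first_site, values); blocks are ordered by
    their first site, and each value is numbered from that site onwards.
    """
    combined = []
    for first_site, values in sorted(blocks, key=lambda block: block[0]):
        combined.extend((first_site + i, value) for i, value in enumerate(values))
    return combined


if __name__ == "__main__":
    print(combine_blocks([(2001, [5.0]), (1, [0.0, 10.0])]))
```

The resulting list of (site, value) pairs can then be plotted over the full length in any graphing tool.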
Applications for SVARAP

SVARAP rapidly produces graphical representations that can be easily interpreted.

As a first step, it allows the genetic diversity in a set of sequences to be analysed by windows
of 50 nucleotides.

More precise information is also available with the site-by-site analysis.
An example of use of SVARAP
Contact

For information or questions, you can contact me at: [email protected]
Download
Download the instructions for use of SVARAP
Link to Clustal X v1.8
References

Download SVARAP to analyse nucleotide positions 1 to 2000 (Microsoft Excel® v97)
Download SVARAP to analyse nucleotide positions 2001 to 4000 (Microsoft Excel® v97)
Download ASVARAP to analyse amino acid positions 1 to 1000 (Microsoft Excel® v97)
URL: http://ifr48.free.fr/recherche/jeu_cadre/jeu_rickettsie.html
1/ Delete the paragraph marks
In Microsoft Word® v97 - French edition:

<Edition><Remplacer><Plus><Spécial><Marque de paragraphe><Remplacer tout>
(Edit > Replace > More > Special > Paragraph mark > Replace all)
2/ Replace dashes
In Microsoft Word® v97 - French edition:

<Edition><Remplacer> (Edit > Replace)

Find what (« Rechercher »): 
Replace with (« Remplacer par »): ―

<Remplacer tout> (Replace all)
3/ Add paragraph marks before and after the names of sequences.
In Microsoft Word® v97 - French edition:

<Edition> (Edit)

Find what (« Rechercher »): #

Replace with (« Remplacer par »): replace # by a paragraph mark (« Marque de paragraphe ») followed by #
Application ASVARAP

Variability can also be studied for amino acid sequences (amino acids 1 to
1000). The principle and use are the same as for SVARAP:
Download