Day 1. General aspects for genetic map construction

DAY 1. GENERAL ASPECTS FOR
GENETIC MAP CONSTRUCTION
SANGREA SHIM
INDEX
 Day 1

General aspects for genetic map construction

Genetic polymorphism and recombination frequency

Genotyping using molecular markers

Map construction (phenotype, AFLP, RFLP)

Sequencing method

Next generation sequencing

Whole genome reference sequence

Resequencing for genotyping

Retrieving sequence polymorphism

Genetic map construction (SNP, InDel)
GENETIC POLYMORPHISM &
RECOMBINATION FREQUENCY
GENOTYPING USING MOLECULAR MARKERS
An Integrated High-density Linkage Map of Soybean with RFLP, SSR, STS, and AFLP Markers Using A Single F2 Population
Xia et al. 2008
MAP CONSTRUCTION
An Integrated High-density Linkage Map of Soybean with RFLP, SSR, STS, and AFLP Markers Using A Single F2 Population
Xia et al. 2008
NEXT GENERATION SEQUENCING
 Sequencing


Sanger’s Dideoxy Termination

Using ddNTPs (dideoxynucleotide chain terminators)

Electrophoresis in capillary gel

Read dye colors one-by-one

Average read length 700~900 bp
Massively Parallel Sequencing Platforms

So-called next generation sequencing (NGS) platforms

SOLiD (sequencing by ligation), Illumina (sequencing by synthesis), 454 (pyrosequencing)

Read length: SOLiD 50+35 (or 50+50) bp, Illumina 50~300 bp, 454 ~700 bp

Reads per run: SOLiD 1200~1300 million, Illumina ~3000 million, 454 ~1 million
NEXT GENERATION SEQUENCING
Sequencing technologies – the next generation
Metzker, Nature Reviews Genetics 2010
WHOLE GENOME REFERENCE SEQUENCE
 Polymorphisms are discovered by comparison
 Comparison requires a reference
 So a reference genome is required
 Build contigs from unique sequence using paired-end (PE) or small-insert mate-pair (MP) reads
 Build scaffolds across less unique (i.e. repetitive) sequence using large-insert MP library reads
 Anchor the scaffolds using a genetic map
 But a genetic map built from several types of molecular markers cannot be translated directly into sequence information
RESEQUENCING FOR GENOTYPING
 Get polymorphisms and treat each one as a marker or locus

SNPs

Small InDels
 Align raw reads against the reference at several-fold depth

Statistics
 Many alignment programs are available

BLAST, BLAT, BWA, the Bowtie series, ...
 Aligners built on the Burrows-Wheeler transform (BWT) are widely used

Fast, efficient
RESEQUENCING FLOW CHART
[Flow chart] DNA/RNA -> NGS platform -> Raw read sequences -> Quality trimming (SolexaQA) -> Alignment (bwa, bowtie2) -> SAM -> BAM (samtools) -> Sorted BAM (samtools) -> pileup (samtools, bcftools) -> VCF -> selection -> Map construction (JoinMap4)
RETRIEVING SEQUENCE POLYMORPHISM
 Bowtie2 and BWA only align the bulk reads to the reference sequence
 The result is a SAM (Sequence Alignment/Map) or BAM (binary SAM) file
 Several statistical or inference methods can be applied to retrieve polymorphisms (Picard, GATK)
 The SAMtools package is used here to retrieve variants
 The output file is in VCF (Variant Call Format)
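The steps from raw reads to a VCF can be outlined as a shell function. This is only a sketch under assumptions: the function name `call_variants` and all file names are hypothetical, and the exact sub-commands depend on the installed versions. Recent releases use `bcftools mpileup` and `bcftools call`, while older SAMtools releases used `samtools mpileup` piped into `bcftools view`.

```shell
# Sketch only: align reads with BWA, then call variants with SAMtools/BCFtools.
# All file names are hypothetical; the function is defined here but not run.
call_variants() {
    ref=$1      # reference genome (FASTA)
    reads1=$2   # forward reads (FASTQ)
    reads2=$3   # reverse reads (FASTQ)
    out=$4      # prefix for the output files

    bwa index "$ref"                                  # build the BWT index of the reference
    bwa mem "$ref" "$reads1" "$reads2" > "$out.sam"   # align reads -> SAM
    samtools view -b -o "$out.bam" "$out.sam"         # SAM -> BAM
    samtools sort -o "$out.sorted.bam" "$out.bam"     # sort alignments by position
    samtools index "$out.sorted.bam"                  # index the sorted BAM
    bcftools mpileup -Ou -f "$ref" "$out.sorted.bam" |
        bcftools call -mv -Ov -o "$out.vcf"           # pileup, then call variants -> VCF
}
```

With the tools installed, it could be run as `call_variants ref.fa reads_1.fastq reads_2.fastq sample1`.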
GENETIC MAP CONSTRUCTION
Selection of a core set of RILs from Forrest x Williams 82 to develop a framework map in soybean
Wu et al. 2011
HURDLES ON THE ROAD TO GENETIC MAP
 Variant calling outputs a VCF file
 JoinMap takes a LOC file as input
 Is there a converter between VCF and LOC?
 Write the converter program and build a genetic map yourself
 These are the final goals of this course
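To give a feel for what such a converter has to do, here is a minimal sketch of its core step: turning VCF genotype fields into single-letter genotype codes. Several things are assumptions, not part of the course material: the tiny VCF is made up, the reference allele is taken to come from parent "a", and the codes a / b / h / - are used for the two homozygotes, the heterozygote, and missing data. It does not write the header of a LOC file, and phased genotypes such as 0|1 are simply treated as missing.

```shell
# Made-up VCF with two samples (RIL_1, RIL_2); tab-separated as in real VCF files.
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tRIL_1\tRIL_2\n' >> "$vcf"
printf 'Gm01\t1234\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t1/1\n' >> "$vcf"
printf 'Gm01\t5678\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.\n' >> "$vcf"

# Skip header lines, name each locus chrom_pos, and recode columns 10 onward.
codes=$(awk -F'\t' '
  /^#/ { next }
  {
    line = $1 "_" $2
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")            # GT is the first colon-separated sub-field
      gt = f[1]
      if (gt == "0/0") c = "a"                        # assumed: reference allele = parent a
      else if (gt == "1/1") c = "b"
      else if (gt == "0/1" || gt == "1/0") c = "h"
      else c = "-"                                    # missing or anything else
      line = line "\t" c
    }
    print line
  }' "$vcf")
printf '%s\n' "$codes"
```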
TODAY’S PRACTICE
 Connect to a remote computer
 Get used to the Linux system
 Get familiar with Python 2.7
THANK YOU
 If you have a question, please ask me.
DAY 1.
PRACTICE - BASIC LINUX COMMAND
TAEYOUNG LEE
CONNECTING
 The server is located on the Seoul National University campus
 Connect to the server using the PuTTY SSH client
 Download at http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
CONNECTING
 Run PuTTY
 Enter the IP address (147.46.250.193) in the Host Name field and click Open
CONNECTING
 ID : trainee
 PW : bogor
 You are now logged in to the server
 You will see only white characters on a black background
BASIC COMMAND IN LINUX
 ls
 List files and directories
 cd
 Change directory
 Practice) move into /data2/python
BASIC COMMAND IN LINUX
 mkdir
 Make a directory
 Usage) mkdir dir_name
 Practice) make a directory named after yourself
BASIC COMMAND IN LINUX
 vi
 Open the text editor
 Create a new text file
 Usage) vi filename_to_edit
 vi filename_to_make
 Practice) make a text file named after yourself in your directory, write something, and save it
 Keys: i (insert mode), R (replace mode), Esc (back to command mode)
 :q (quit) :w (save) :wq (save and quit) :q! (quit without saving)
BASIC COMMAND IN LINUX
 mv
 Move files or directories
 Rename files or directories
 Usage) mv present_file_path file_path_to_move
 Practice)
 Change into the parent directory
 Command) cd ..
 Make a text file with vi
 Move the text file into your directory
 Rename the text file
BASIC COMMAND IN LINUX
 cp
 Copy files or directories
 Usage) cp file_path file_path_to_copy
 cp can also rename the file while copying
 To copy a directory, you have to use the -r option
 cp -r dir_path dir_path_to_copy
 Practice)
 Make a directory inside your directory
 Copy a file into that directory, once with a new name and once keeping the name
BASIC COMMAND IN LINUX
 rm
 Remove files or directories
 Usage) rm file_name
 To remove a directory, you have to use the -r option
 rm -r dir_name
 Practice)
 Remove the directory and the file
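The mkdir, mv, cp, and rm practices can be replayed in one short session. This is a sketch, not the course's own script: it works in a throwaway directory from `mktemp -d`, and the file and directory names (note.txt, mydir, and so on) are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir mydir                            # make a directory
printf 'hello\n' > note.txt            # stand-in for a file written with vi

mv note.txt mydir/                     # move the file into mydir
mv mydir/note.txt mydir/memo.txt       # rename it

mkdir mydir/sub
cp mydir/memo.txt mydir/sub/           # copy, keeping the name
cp mydir/memo.txt mydir/sub/copy.txt   # copy under a new name
cp -r mydir/sub mydir/sub_backup       # -r is required to copy a directory

rm mydir/sub/copy.txt                  # remove one file
rm -r mydir/sub                        # -r is required to remove a directory

ls -R mydir                            # show what is left
```

Note that rm does not ask for confirmation and there is no trash can, so check the path before pressing Enter.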
BASIC COMMAND IN LINUX
 less
 Read-only text viewer
 Works well with large text files
 Usage) less file_name
 Search function
 /pattern
 Practice)
 Open a large text file with vi and with less
 /data2/python/Gmax_109_gene_exons.gff3
 Use the search function
 /Gm12
 wget ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/tair9_Assembly_gaps.gff
BASIC COMMAND IN LINUX
 cat
 Concatenate files
 Print files to the screen
 Usage) cat file_name1 file_name2 …
 Practice)
 Print a file with cat
 Print a file three times
BASIC COMMAND IN LINUX

grep

Print the lines that contain a given word

Often used with cat

Usage) cat file_name | grep 'word'



'|' (pipe) passes the output of the command on its left to the command on its right

So this usage prints the file, then keeps only the lines that contain the word
Various useful options

-v : invert the match (print the lines that do NOT contain the word)

-c : count the matching lines

'word1\|word2' = word1 or word2

grep 'word1' | grep 'word2' = word1 and word2
Practice)

Grep 'Gm12' in /data2/python/Gmax_109_gene_exons.gff3

Grep 'Gm12' or 'Gm15' in the same file

Grep 'gene' and 'mRNA'

Count the lines that contain 'Gm12'

Remove the lines that contain exon, CDS, or mRNA
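The grep options above can be tried on a small file before touching the real annotation. The sample below is made up (a few tab-separated, GFF-like lines), and the `\|` form of "or" is GNU grep syntax, which is an assumption about the server.

```shell
# Small made-up GFF-like sample (tab-separated); not taken from the real annotation.
sample=$(mktemp)
printf 'Gm12\tsrc\tgene\t100\t900\n' >  "$sample"
printf 'Gm12\tsrc\tmRNA\t100\t900\n' >> "$sample"
printf 'Gm12\tsrc\texon\t100\t400\n' >> "$sample"
printf 'Gm15\tsrc\tgene\t50\t700\n'  >> "$sample"
printf 'Gm01\tsrc\tCDS\t10\t90\n'    >> "$sample"

cat "$sample" | grep 'Gm12'                      # lines containing Gm12
cat "$sample" | grep 'Gm12\|Gm15'                # Gm12 OR Gm15 (GNU grep syntax)
both=$(cat "$sample" | grep 'Gm12' | grep 'gene')   # Gm12 AND gene
echo "$both"
n_gm12=$(cat "$sample" | grep -c 'Gm12')         # count the lines containing Gm12
echo "$n_gm12"
cat "$sample" | grep -v 'exon\|CDS\|mRNA'        # drop the exon, CDS and mRNA lines
```

grep can also read the file directly (`grep 'Gm12' file_name`); the cat form is kept here because it matches the usage on the slide.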
BASIC COMMAND IN LINUX
 sort

Sort the lines of a file

Often used with cat

Usage) cat file_name | sort

Various useful options


-k sort by column (key)

-u sort and remove duplicate lines

-n numeric sort

-r reverse order

-t set the column delimiter (-d means dictionary order, not delimiter)
Practice)

Sort /data2/python/Gmax_109_gene_exons.gff3 by start position (by column, numerically)
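A short sketch of these sort options on made-up data shows why -n matters when a column holds numbers. The sample lines are invented for illustration.

```shell
sample=$(mktemp)
printf 'Gm12\tgene\t900\nGm01\tgene\t50\nGm15\tgene\t1200\nGm01\tgene\t50\n' > "$sample"

cat "$sample" | sort                  # plain text sort, line by line
cat "$sample" | sort -k 3             # column 3 compared as text: "1200" sorts before "50"
cat "$sample" | sort -k 3,3 -n        # column 3 compared as numbers
cat "$sample" | sort -k 3,3 -n -r     # same, largest first
sorted_unique=$(cat "$sample" | sort -u)   # sorted, duplicate lines removed
echo "$sorted_unique"

# Comma-separated input needs -t to set the delimiter.
first=$(printf 'b,20\na,100\nc,3\n' | sort -t ',' -k 2,2 -n | head -n 1)
echo "$first"
```

Writing the key as `-k 3,3` limits the comparison to column 3 alone; a bare `-k 3` compares from column 3 to the end of the line.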
BASIC COMMAND IN LINUX
 cut
 Cut columns out of a file
 Often used with cat
 Usage) cat file_name | cut -f n (n : column number; columns are tab-separated by default)
 Practice)
 Retrieve the chromosome, start position, and end position from /data2/python/Gmax_109_gene_exons.gff3
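As a sketch of the practice above: in the GFF3 layout the chromosome is column 1 and the start and end positions are columns 4 and 5, so one cut call with a column list is enough. The two sample lines are made up.

```shell
sample=$(mktemp)
# Columns follow the GFF3 layout: seqid, source, type, start, end, score, strand, phase, attributes.
printf 'Gm12\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n' >  "$sample"
printf 'Gm15\tsrc\tgene\t50\t700\t.\t-\t.\tID=g2\n'  >> "$sample"

cat "$sample" | cut -f 1                  # first column only (chromosome)
coords=$(cat "$sample" | cut -f 1,4,5)    # chromosome, start, end
echo "$coords"
```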
BASIC COMMAND IN LINUX
 >
 Standard input/output vs. file input/output
 Input and output go to the screen or to a file
 > saves standard output to a file (overwriting the file if it exists)
 cat file_name | grep 'word' > output_file
 >>
 >> also saves standard output to a file
 But it appends to the end of the file instead of overwriting it
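The difference between > and >> is easiest to see by writing to the same file several times. A minimal sketch, using a temporary file:

```shell
out=$(mktemp)
echo 'first'  >  "$out"    # > creates or overwrites the file
echo 'second' >  "$out"    # overwrites again: only "second" remains
echo 'third'  >> "$out"    # >> appends to the end of the file
cat "$out"
n_lines=$(wc -l < "$out" | tr -d ' ')   # < feeds the file to wc as standard input
echo "$n_lines"
```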
HANDLE FILE
 FASTA file

/data2/python/ap2.fa
 FASTQ file

/data2/python/example.fastq
 GFF file

/data2/python/Gmax_109_gene_exons.gff3
 Python file!

/data2/python/1stday.py
 Make a new text file named new.txt
 The file contains
 Gm01,1,23
 Gm04,4,56
 Gm03,6,78
 Gm04,8,10
 Copy new.txt to new.copy
 Remove new.copy
 Using cat, print the contents of new.txt
 Using grep, print the lines of new.txt that contain Gm04
 Using cut, print the first column of new.txt and save it as a file named new.txt.cut (the file is comma-separated, so cut needs -d ',')
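One possible way through this exercise is sketched below. It runs in a throwaway directory, and printf stands in for typing the file in vi so that the steps can be replayed; both choices are assumptions for illustration only.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# In class the file is typed in vi; printf is used here so the steps can be replayed.
printf 'Gm01,1,23\nGm04,4,56\nGm03,6,78\nGm04,8,10\n' > new.txt

cp new.txt new.copy                           # copy
rm new.copy                                   # remove the copy
cat new.txt                                   # print the contents
cat new.txt | grep 'Gm04'                     # only the lines containing Gm04
cat new.txt | cut -d ',' -f 1 > new.txt.cut   # first column; -d ',' sets the comma delimiter
cat new.txt.cut
```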
THAT’S IT FOR TODAY
 Q&A