A Brand-New Sequence Classification Model - A New Kind of Science
A subsequent study of
visualized DNA sequence
comparison based on NKS
Dawei Li Ph.D
The Rockefeller University
E-mail: [email protected] or [email protected]
Basic terms about DNA sequence
1. DNA is made up of nucleotides.
2. Nucleotides include ‘A’ ‘G’ ‘C’ ‘T’.
3. For the same nucleotide composition,
different sequences show clearly
different DNA helix stability.
For example, -GC- is much more stable
than -CG-.
4. There may be some interaction
between the nucleotides.
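The point that order matters even when composition is identical can be sketched in a few lines. This is only an illustration: the two sequences below are hypothetical, and counting adjacent pairs is an assumed stand-in for "interaction between nucleotides", not a stability calculation.

```python
from collections import Counter


def composition(seq):
    """Count how often each nucleotide occurs in seq."""
    return Counter(seq)


def dinucleotide_steps(seq):
    """Count each pair of adjacent nucleotides, read left to right."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Two hypothetical sequences built from the same nucleotides in a different order.
seq_a = "GCGCAT"
seq_b = "CGCGTA"

same_composition = composition(seq_a) == composition(seq_b)
same_steps = dinucleotide_steps(seq_a) == dinucleotide_steps(seq_b)
print(same_composition, same_steps)
```

The two sequences share one composition but differ in their adjacent pairs, which is the level at which neighbor interactions would act.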
Experimental approaches to study
interaction between nucleotides
Atomic force microscope (AFM)
It is a very high-resolution type of scanning probe microscope, with
demonstrated resolution of fractions of a nanometer.
X-ray diffraction (XRD)
The techniques are based on the elastic scattering of x-rays from structures that
have long range order.
Neutron scattering
The deflection of neutrons is used as a scientific probe.
Nuclear magnetic resonance (NMR)
It is a physical phenomenon based upon the quantum mechanical
magnetic properties of an atom's nucleus.
Infrared spectroscopy (IR Spectroscopy)
It is the subset of spectroscopy that deals with the infrared (IR) region of the
electromagnetic (EM) spectrum.
Questions
Four DNA sequences:
1. CTCGGGTTATCGGCGTGGTCCGGCCGAGGGCGGCATTCCAGAAGAGGGACCCTCACGCCACCA
2. CCAGAGCGTCGCCGACCCTCTAATTGGTCTCCCCAGAAGAGGCTGAGAAGAAGGCCGAAACAG
3. AGAGTCCCAGGACACACTGTAGAAGATCAAAGCAGAAGAAGAGGGAAGAGTGGCTGAGGGAC
4. TCAGCCCATCTGCCATCCCCAAAGATAGAAGACACCCCCTTGGTTGCCCTCTAGAAGATCCAGT
Can you figure out which organisms the sequences
belong to?
Is there any “inner-difference” between the sequences?
Can you figure out what the differences and
interactions are?
These questions have not been answered satisfactorily by the
existing approaches based on traditional mathematical
rules.
Here the ‘Wolfram approach’ may help.
It may shape the current theory of DNA
sequence analysis.
Wolfram approach
The Wolfram approach, described in “A New Kind of
Science” (2002), has attracted biologists’ attention.
The Wolfram approach provides a nontraditional visualized model for
DNA sequence comparison. Compared with traditional
approaches:
1) The Wolfram approach is based on the concept that simple
rules are able to produce highly complicated behaviors.
2) It pays attention to the interaction power of regulation
between any adjacent nucleotides.
3) It can show alterations of DNA nucleotides dynamically
including transposition, insertion, deletion, and duplication.
It has become possible to study some
biological issues that have never been
successfully clarified by traditional
mathematical methods. However, there are
still very few studies of DNA comparison
with the Wolfram approach.
About our study
With the Wolfram approach, we
1. studied some simple rules;
2. analyzed some DNA sequences of
different viruses with a special rule;
3. studied the images visually.
Hypotheses
Our studies were based on four hypotheses:
1) A DNA sequence is not random; it follows a rule.
2) There is an as-yet-unknown mode of nucleotide
organization that each DNA sequence follows.
3) Simple rules can also produce complex
behaviors in living organisms.
4) The Wolfram approach can reflect the rule rooted
in a DNA sequence.
Rule in the hypotheses
The eight arrangements can produce
very complicated behaviors.
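As a minimal sketch of what the eight arrangements mean: in an elementary cellular automaton, each cell looks at itself and its two neighbors, giving 2 x 2 x 2 = 8 possible arrangements, and a rule assigns a new color to each one. The rule number 90 below is an illustrative assumption only; the slides do not say which rule was applied. The function names are hypothetical.

```python
def rule_table(rule_number):
    """Map each of the eight (left, centre, right) arrangements to a new cell value."""
    if not 0 <= rule_number <= 255:
        raise ValueError("elementary rules are numbered 0 to 255")
    table = {}
    for index in range(8):
        # index read as a 3-bit number gives the arrangement, e.g. 6 -> (1, 1, 0)
        neighbourhood = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        # bit number `index` of the rule number is the new value of the centre cell
        table[neighbourhood] = (rule_number >> index) & 1
    return table


def step(cells, table):
    """Apply the rule once; cells beyond either end are treated as white (0)."""
    padded = [0] + list(cells) + [0]
    return [table[(padded[i - 1], padded[i], padded[i + 1])]
            for i in range(1, len(padded) - 1)]


RULE = 90  # illustrative choice, not taken from the slides
table = rule_table(RULE)
next_row = step([0, 0, 0, 1, 0, 0, 0], table)
print(next_row)
```

Changing RULE to any value from 0 to 255 selects a different assignment for the same eight arrangements.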
Nested structure
The nested structure defined
by Wolfram was generated
from one single parent
cell.
A very simple example: snowflake
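A nested pattern grown from one single cell can be reproduced with a short sketch. Rule 90 is assumed here because it is a well-known elementary rule that yields nested triangles; it is not claimed to be the rule behind the slide's figure, and the names evolve and render are hypothetical.

```python
def evolve(rule_number, width, steps):
    """Return the rows of an elementary cellular automaton started from one black cell."""
    row = [0] * width
    row[width // 2] = 1  # the single starting cell
    rows = [row]
    for _ in range(steps):
        padded = [0] + row + [0]  # white cells beyond both ends
        row = [(rule_number >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
               for i in range(1, width + 1)]
        rows.append(row)
    return rows


def render(rows):
    """Draw black cells as '#' and white cells as '.'."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in rows)


rows = evolve(rule_number=90, width=33, steps=15)
picture = render(rows)
print(picture)
```

The printed picture shows triangles nested inside larger triangles, all descending from the one starting cell.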
The sequences of SARS
viruses
SARS BJ01, partial genome;
SARS BJ02, partial genome;
SARS BJ03, partial genome;
SARS BJ04, partial genome;
SARS CUHK-W1, complete genome;
SARS GZ01, partial genome;
SARS HKU-39849, complete genome;
SARS TOR2, complete genome;
SARS Urbani, complete genome;
SARS coronavirus CUHK-Su10, complete genome;
SARS coronavirus isolate SIN2774 complete genome;
SARS coronavirus TW1, complete genome;
SARS coronavirus, complete genome.
Machine
Computer
An SGI Origin 3000 (Silicon Graphics, Inc.;
64 IP35 processors at 500 MHz) was used
throughout our study. Each sequence
was run using the same programs.
Images
More than 3,000 images (200,000 Mega)
were generated.
The results of 13 SARS viruses
The images were arranged according to the color order in chromatogram
(Ref.1).
The SARS viruses behaved quite
differently from other viruses. There was
a very large nested structure across the
first 10 kb of the genome.
By comparison, we found the nested
structures mainly located in the regions
of replicases 1A and 1B. The replicase 1A
protein gene may control the activities of
the replication complex of SARS viruses.
Comparisons between SARS-CUHK and SARS-GZ
Possible origin of SARS virus (Ref.1).
Comparison of images among five different viruses
(a) and (b), SARS virus and equine rhinovirus (ER), respectively, showing the nested
structures.
(c) Another virus in which only two small nested structures were found.
(d) A typical behavior of a common virus.
(e) The behavior of HIV.
Note: Images in (a), (c), (d) are reduced 15,000-fold, and images in (b), (e) are reduced 1,600-fold
(Ref.1).
The whole genomes of equine rhinovirus (ER)
and SARS virus shared similar nested
structures.
SARS virus and human coronavirus 229E were
very different in behavior.
No nested structure was found in HIV.
To study whether the nested structures exist in
other organisms, we analyzed ten other virus
genomes with the Wolfram approach.
The ten types of viruses are as follows:
Avian infectious bronchitis virus (and avian infectious bronchitis virus messenger ribonucleic acid (mRNA)),
Bovine coronavirus,
Dengue virus type 4 strain 814669,
Human rhinovirus 1B,
Japanese encephalitis virus strain K94P05,
Murine hepatitis virus,
Pestivirus type 2,
Porcine epidemic diarrhea virus,
Porcine transmissible gastroenteritis virus minigenome,
West Nile virus.
The results showed that all the viruses can be classified
into two groups by their behaviors: Group 1 with left
bottom growth of white lines; Group 2 with right bottom
growth of black lines. No nested structure was found.
We also studied the behavior of mRNA sequence of avian
infectious bronchitis virus.
The current results may suggest that:
1) the region of the nested structure may be involved
in the reproduction of the virus;
2) the coding sequence of the virus may share some
kind of similar complex gene regulation cycle with
SARS viruses. The nested structure may contain some
special bio-information.
Results in summary
1. SARS viruses showed the nested structure
behaviors. The results suggested that the
genome sequences have a specific mode of
nucleotide organization.
2. HIV showed another type of mode of
nucleotide organization.
3. The unique characteristics found in the DNA
sequence of SARS viruses and the mRNA
sequence of the avian virus suggested the
importance of the nested structure behaviors.
Discussion
Advantages
Wolfram approach has some advantages:
1. It can magnify the tiny changes in whole genome
sequence for both overall and detailed analyses.
2. It can also be used in a single nucleotide scale, such as
DNA mutation and polymorphisms (SNPs and
microsatellites).
3. It pays more attention to the interaction network of
power and regulation among the adjacent nucleotides.
4. It is not only appropriate to DNA/RNA sequences, but
also to protein sequences.
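How a single-nucleotide change can be magnified is sketched below. Several parts are assumptions made for illustration: the two-bits-per-nucleotide encoding, the use of rule 90, and the G-to-A substitution. The input is the first 32 nucleotides of one sequence from the Questions slide.

```python
def encode(seq):
    """Binary-encode a sequence, two bits per nucleotide (assumed mapping)."""
    code = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
    return [bit for base in seq for bit in code[base]]


def evolve(cells, rule_number, steps):
    """Run an elementary cellular automaton; cells beyond the ends are white (0)."""
    rows = [list(cells)]
    for _ in range(steps):
        padded = [0] + rows[-1] + [0]
        rows.append([(rule_number >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
                     for i in range(1, len(padded) - 1)])
    return rows


def differing_cells(rows_a, rows_b):
    """Count, row by row, the cells that differ between two evolutions."""
    return [sum(a != b for a, b in zip(row_a, row_b))
            for row_a, row_b in zip(rows_a, rows_b)]


original = "CTCGGGTTATCGGCGTGGTCCGGCCGAGGGCG"
mutant = original[:16] + "A" + original[17:]  # hypothetical G -> A change at index 16

diffs = differing_cells(evolve(encode(original), 90, 8),
                        evolve(encode(mutant), 90, 8))
print(diffs)
```

Under these assumptions the two images start one cell apart, and the number of differing cells grows in later rows, so a change that is invisible in the raw sequence becomes a visible pattern.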
Disadvantages:
There are some disadvantages in our study; for example, only
one of the 256 rules was adopted.
For encoding the nucleotides, a quaternary system
should be better than a binary system. However, a quaternary
system will result in more rules. (48)
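The trade-off between the two encodings can be made concrete with a small calculation. It assumes the standard counting for a rule that looks at three adjacent cells: with k possible cell values there are k^3 arrangements, and each arrangement can map to any of the k values, giving k^(k^3) rules. The function name is hypothetical.

```python
def count_rules(states, neighbourhood_size=3):
    """Return (number of arrangements, number of rules) for the given cell-state count."""
    arrangements = states ** neighbourhood_size
    return arrangements, states ** arrangements


binary_arrangements, binary_rules = count_rules(2)
quaternary_arrangements, quaternary_rules = count_rules(4)

print(binary_arrangements, binary_rules)   # 8 arrangements, 256 rules
print(quaternary_arrangements)             # 64 arrangements
print(quaternary_rules == 4 ** 64)         # True
```

The binary case gives the 256 rules mentioned above. Under the same three-cell assumption, a quaternary system has far too many rules to examine one by one.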
A typical feature of Wolfram approach:
Each cell of the DNA sequence has interaction
with its adjacent cells.
Scores of power:
The power from the two sides is not always equal.
It is scored between 1 and 0, which represent
‘for’ and ‘against’, respectively.
Four behaviors
The behaviors of the DNA sequences can be
classified into four categories:
1 pure repetition with left growth, as most common
viruses showed;
2 pure repetition with right growth, as HIV showed;
3 nested structure, as SARS viruses showed;
4 uniformly white or black.
Nested Structure
1. The nested structure may result from the
interaction of aggregation between black and
white cells.
2. It may represent a regulation cycle.
A black line may signify the beginning of a
protein production cycle, and a white line may
signify the end by closing off the triangle.
Interaction network of power
1. The interaction between nucleotides includes
power and anti-power. Each nucleotide receives
power from adjacent nucleotides and exerts
power as well.
2. The balance can be easily broken by
sequence alteration because of its sensitivity. A
single mutation can cause death, possibly
because the original nucleotide has a key role in
the whole genome. This can also explain why some
mutations can be ignored.
Mutation & SNP
A mutation is a change to the nucleotide
sequence of DNA or RNA.
Single Nucleotide Polymorphism (SNP) is
a DNA sequence variation occurring
when a single nucleotide in DNA
sequence or the genome differs between
members of a species.
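At the level of two aligned sequences, a single-nucleotide difference can be located as sketched below. This is a simplification: a true SNP is defined across members of a species, while this sketch only compares one reference with one sample. The sample is a hypothetical variant of a fragment from the Questions slide.

```python
def single_nucleotide_differences(reference, sample):
    """List (position, reference base, sample base) wherever two aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]


reference = "AGAGTCCCAGGACACACTGT"
sample = reference[:9] + "A" + reference[10:]  # hypothetical G -> A change at index 9

differences = single_nucleotide_differences(reference, sample)
print(differences)
```

Positions are counted from 0, so the reported index 9 is the tenth nucleotide.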
An example for SNP model
In conclusion
The traditional intuition is that the behavior should be
simple if the rule is simple. This is not true based on the
data demonstrated by both Wolfram’s work and our
study. The simple rule can actually capture the essential
mechanisms responsible for complex phenomena in living
organisms.
We applied the Wolfram approach to DNA sequence
analysis. Our results suggest that the approach is
appropriate for visualized sequence comparison and
is a useful categorization tool.
The results may be preliminary but are interesting for
subsequent studies. Further systematic investigations are
necessary, and the results also need to be confirmed by
experimental work.
Reference
Ref.1: Li D, et al. Understanding SARS with Wolfram approach. Acta Biochimica et
Biophysica Sinica. 2004; 36(1):1-10.
Acknowledgement
Lin He, Zhende Huang, Jurg Ott et al.
Thank you!