Automated Searching of Polynucleotide Sequences


Michael P. Woodward
Supervisory Patent Examiner - Art Unit 1631
571 272 0722
[email protected]
John L. LeGuyader
Supervisory Patent Examiner - Art Unit 1635
571 272 0760
[email protected]
1
Standard Databases
• GenEMBL (.rge)
• N_Genseq (.rng)
• Issued_Patents_NA (.rni)
• EST (.rst)
• Published_Applications_NA (.rnpb)
2
Databases at Time of Allowability
• Pending_Patents_NA_Main (.rnpm)
• Pending_Patents_NA_New (.rnpn)
3
Types of Nucleotide Sequence Searching
• Standard (cDNA)
• Oligomer
• Length Limited Oligomer
• Score over Length
4
Types of Nucleotide Sequence Searching
• Standard (cDNA)
– useful for finding full length hits
– the query sequence is typically the full length of
the SEQ ID NO:
– the search parameters are the defaults: Gap Opening
Penalty and Gap Extension Penalty of 10
– the standard suite of NA databases is searched
– normally 45 results and the top 15 alignments
are provided; additional results and
alignments can be provided.
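The gap penalties above are parameters of a Smith-Waterman style local alignment. The following is a minimal sketch of how such penalties enter the score, not the search system's actual implementation. The match and mismatch values (+5, -4) and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend are assumptions made for illustration; only the penalty value of 10 comes from the slide.

```python
def smith_waterman_score(query, target, match=5, mismatch=-4,
                         gap_open=10, gap_extend=10):
    """Best local alignment score with affine gap penalties.

    Assumed convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0] * cols          # best scores for the previous query row
    f_prev = [neg_inf] * cols    # previous row, alignments ending in a vertical gap
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # current row, alignments ending in a horizontal gap
        for j in range(1, cols):
            # Open a new gap or extend an existing one, in each direction.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            step = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + step, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTACGT", "ACGTTCGT"))
```

With penalties this large relative to the match score, a gap only pays off when it is followed by a long run of matches, which is consistent with the slide's point that this search favors full length hits.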
5
Standard (cDNA) search
• Fragments and genomic sequences
are often difficult to find
• Fragments are buried in the hit list
• The presence of introns in the
database sequence results in low
scores.
6
Types of Nucleotide Sequence Searching
• Standard Oligomer
– finds longest matching hits
– mismatches not tolerated in region of hit
match
• Length Limited Oligomer
– returns database hits within length range
requested
– mismatches not tolerated in region of hit
match
7
Standard Oligomer Searching
• Only provides the longest oligomer
present in the sequence
• A thorough search of fragments requires
multiple searches
• Can be an effective way of finding
genomic sequences
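The idea of reporting only the longest exactly matching region can be sketched as a longest-common-substring computation. This is an illustrative sketch under that assumption; the function name is hypothetical and the actual search tool may work differently.

```python
def longest_exact_match(query, target):
    """Return (length, query_start, target_start) of the longest
    stretch of residues shared exactly by query and target."""
    best_len, best_q, best_t = 0, 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if query[i - 1] == target[j - 1]:
                # Extend the exact match that ended at the previous residues.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_q = i - cur[j]
                    best_t = j - cur[j]
            # On a mismatch cur[j] stays 0: mismatches are not tolerated.
        prev = cur
    return best_len, best_q, best_t


if __name__ == "__main__":
    print(longest_exact_match("AAACCCGGGTTT", "TTCCCGGGAA"))
```

Because only one longest region is reported per database sequence, shorter shared fragments elsewhere in the same pair go unreported, which is why a thorough fragment search needs multiple searches.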
8
Standard Oligomer Searching
• The search parameters are the defaults:
Gap Opening Penalty and Gap Extension
Penalty of 60; mismatches are not tolerated
• Consequently an inefficient means of
finding small sequences and sequences
with <100% correspondence
9
Claim 1
• An isolated polynucleotide comprising
SEQ. ID. No: 1.
10
Searching Claim 1
• A standard search looking for full length
hits is performed.
11
Standard (cDNA) search result
0001 CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGATGG 0060
2031 CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGG---CAGATGG 2090
12
Claim 2
• An isolated polynucleotide comprising at
least 15 contiguous nucleotides of SEQ.
ID. No: 1.
13
Searching Claim 2
• A standard oligomer search is performed
with an oligomer length of 15 nucleotides
set as the lower limit for a hit.
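The 15-nucleotide lower limit can be pictured as a k-mer test: a database sequence is relevant to Claim 2 only if it shares at least one exact 15-mer with SEQ ID No: 1. The sketch below assumes that reading; the function name is hypothetical and only the given strand is checked, not its complement.

```python
def shares_kmer(query, target, k=15):
    """True if target contains at least k contiguous residues of query."""
    if len(query) < k or len(target) < k:
        return False
    # Index every k-length window of the query once.
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    # Slide a window of the same size along the target.
    return any(target[j:j + k] in query_kmers
               for j in range(len(target) - k + 1))


if __name__ == "__main__":
    seq_id_1 = "ACGTACGTACGTACGTACGT"           # made-up stand-in sequence
    candidate = "TT" + seq_id_1[3:18] + "GG"    # embeds 15 contiguous residues
    print(shares_kmer(seq_id_1, candidate))
```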
14
Oligomer Search Results
Standard Oligomer
CAAATGCAGGCCCCCGGACCTCCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG
Query     CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 0060
Database  CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 2500
Length Limited Oligomer
CAAATGCAGGCCCCCGGACCTCCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG
Query     CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 0060
Database  CCCTGCTCCTGGCTTTCGCCCTGCTCTGCCTGCCCTGG 0039
15
Claim 3
• An isolated polynucleotide comprising a
polynucleotide encoding a polypeptide of
SEQ ID No: 2.
• (SEQ ID No: 2 is an Amino Acid (AA)
sequence)
16
Searching Claim 3
• SEQ ID No: 2 is searched against the
polypeptide databases. It is also “back
translated” and searched against the
polynucleotide databases.
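Back translation can be illustrated by mapping each amino acid to a degenerate codon written in IUPAC ambiguity codes. This is a sketch assuming the standard genetic code; it is not a description of how the search system performs the step. The single codons used for Arg (MGN), Leu (YTN) and Ser (WSN) are deliberately over-inclusive, since each of those residues has six codons that no single three-letter pattern covers exactly.

```python
# Standard genetic code, one IUPAC-degenerate codon per amino acid.
# R = A/G, Y = C/T, H = A/C/T, M = A/C, W = A/T, S = C/G, N = any base.
DEGENERATE_CODONS = {
    "A": "GCN", "R": "MGN", "N": "AAY", "D": "GAY", "C": "TGY",
    "Q": "CAR", "E": "GAR", "G": "GGN", "H": "CAY", "I": "ATH",
    "L": "YTN", "K": "AAR", "M": "ATG", "F": "TTY", "P": "CCN",
    "S": "WSN", "T": "ACN", "W": "TGG", "Y": "TAY", "V": "GTN",
}


def back_translate(protein):
    """Return a degenerate DNA sequence that could encode the protein."""
    codons = []
    for residue in protein.upper():
        if residue not in DEGENERATE_CODONS:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(DEGENERATE_CODONS[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKF"))
```

The result is three nucleotides per residue, with wild cards at the degenerate positions.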
17
Claim 4
• An isolated polynucleotide comprising a
polynucleotide with at least 90% identity
to SEQ ID No: 1.
18
Searching Claim 4
• A standard search looking for full length
hits is performed.
• Hits having at least 90% identity will
appear in the results.
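Whether a hit meets the 90% threshold can be checked from its alignment. The sketch below assumes identity is counted as matching columns divided by total alignment columns, gaps included; other conventions exist (for example dividing by query length), so the denominator here is an assumption. The two strings are the aligned sequences from the standard search result slide.

```python
def percent_identity(aligned_query, aligned_hit):
    """Percent of alignment columns in which both sequences carry the
    same residue. Gap characters ('-') never count as matches."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(1 for q, h in zip(aligned_query, aligned_hit)
                  if q == h and q != "-")
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    query = "CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGATGG"
    hit = "CGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGG---CAGATGG"
    identity = percent_identity(query, hit)
    print(f"{identity:.1f}% identity, meets 90% threshold: {identity >= 90.0}")
```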
19
Claim 5
• An isolated polynucleotide comprising a
polynucleotide which hybridizes under
stringent conditions to SEQ ID No: 1.
20
Searching Claim 5
• A standard oligomer search is performed
as well as a standard search.
21
Searching Small Nucleotide
Sequences
John L. LeGuyader
22
Types of Small Nucleotide
Sequences Claimed
• Fragments
• Complements/Antisense
• Primers/Probes
• Oligonucleotides/Oligomers
• Antisense/RNAi/Triplex/Ribozymes (inhibitory)
• Accessible Target/Region within Nucleic Acids
• Aptamers
• Nucleic Acid Binding Domains
• Immunostimulatory CpG Sequences
23
Small Nucleotide Sequences
Claimed as Sense or Antisense?
• What is being claimed?
– Requesting the correct sequence search starts
with interpreting what is being claimed
• Complementary Sequences
– DNA to DNA: C to G
– DNA to RNA: A to U
• Matching Sequences
– A to A
– U to U
• DNA, RNA, Chimeric
• cDNA, Message (mRNA), Genomic DNA
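The sense versus antisense distinction determines which sequence should be submitted for searching. As a minimal sketch, the helpers below (hypothetical names) produce the reverse complement of a DNA or RNA sequence and the RNA form of a DNA sequence, using the Watson-Crick pairs A-T (A-U in RNA) and C-G.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq, rna=False):
    """Complement each base, then reverse so the result reads 5' to 3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def transcribe(dna):
    """Write a DNA sequence as RNA by replacing T with U."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    sense = "ATGGCC"
    antisense = reverse_complement(sense)
    print(sense, antisense, transcribe(sense))
```

A claim to an antisense oligonucleotide would be searched with the complementary sequence, while a claim to a fragment of the sense strand would be searched with the matching sequence.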
24
Impact of Sequence Identity and Length
• Size and Identity Matter
• Complements/Matches
• 100% correspondence
• Mismatches
- Varying Degrees of Percent Identity
• Gaps
- Insertions or Deletions
- Gap Extensions
• Wild Cards
• % Query Match value approximates identity
• Adjustment of search parameters (e.g.
Smith-Waterman Gap values) influences %
Query Match value
25
Types of Nucleotide Sequence Searching
• Standard Search (cDNA)
• Oligomer
– finds database hits with longest regions of
matching residues
– mismatches not tolerated in region of hit match
• Length Limited Oligomer
– returns database hits within requested length
range
– mismatches not tolerated in region of hit match
• Score Over Length
– finds mismatched sequence database hits based
on requested length and identity range
26
Why doesn’t a standard search of the cDNA
provide an adequate search of fragments?
• Long length sequence hits with many matches
and mismatches score higher and appear first
on the hit list, compared to short sequences
having high correspondence
– lots of regional local similarity in a long sequence
scores higher than a 10-mer with 100% identity
• Consequence
– small sequences, of 100% identity or less, are
buried tens of thousands of hits down the hit list
– most small sequence hits effectively lost
– especially for hits with <100% correspondence
27
Why doesn’t a standard search of the cDNA
provide an adequate search of fragments?
• Fragments and types of sequence
searches
– Standard Search (cDNA): fragment hits
buried
– oligomer: fragment hits buried
– searching multiple fragments: millions of
hits and alignments to consider
• Each fragment of a specified sequence
and length requires a separate search
28
Standard Oligomer Searching
• Won’t provide thorough search of fragments
since longer hits score higher on hit table
• Smaller size hits lost, effectively not seen
• Does not tolerate mismatches in region of
matches
• Consequently an inefficient means of finding
small sequences and sequences with <100%
correspondence
• Better suited to finding long sequences
29
Length Limited Oligomer Searching
• Sequence request needs to set size limit
consistent with the size range being claimed
• Does not tolerate mismatches in region of
matches
• Consequently an inefficient means of finding
small sequences with <100%
correspondence
• Better suited to finding small sequences with
100% correspondence
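The length-limited behavior can be pictured as a filter: keep only database sequences whose length falls within the requested range and that match the query exactly. This is a simplified sketch under that assumption (whole database entry must occur in the query); the function name and the toy database are hypothetical.

```python
def length_limited_hits(query, database, min_len, max_len):
    """Return (name, query_start, length) for each database sequence
    within the length range that occurs exactly in the query."""
    hits = []
    for name, seq in database.items():
        if not (min_len <= len(seq) <= max_len):
            continue                      # outside the requested size range
        start = query.find(seq)
        if start != -1:                   # exact match only, no mismatches
            hits.append((name, start, len(seq)))
    return hits


if __name__ == "__main__":
    query = "AAACCCGGGTTTACGT"
    database = {
        "oligo1": "CCCGGGTT",             # 8-mer, exact match
        "oligo2": "CCCGGGTA",             # 8-mer, one mismatch
        "long": query + "AAAA",           # too long for the range
        "short": "ACG",                   # too short for the range
    }
    print(length_limited_hits(query, database, 8, 20))
```

The one-mismatch oligo is missed entirely, which illustrates why this search type suits 100% correspondence only.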
30
Score Over Length Searching
• Small oligos with <100% correspondence
– within requested length and identity (>60%) range
• Manual manipulation of first 65,000 hits
– necessitates 2+ additional hrs. of searcher’s time
– does not include computer search time
• Calculation
– Hit Score divided by Hit Length
– for first 65,000 hits of table
• Hits then sorted by Score/Length value
• First 65,000 hits likely to contain small length
sequence hits down to 60% identity
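The calculation above can be sketched as follows: take the first 65,000 rows of the hit table, divide each hit's score by its length, keep hits within the requested length and ratio range, and sort by the ratio. The tuple layout of a hit and the assumption that a perfect match scores 1 per residue (so the ratio approximates fractional identity) are illustrative choices, not details taken from the slides.

```python
def rank_by_score_over_length(hits, min_len, max_len, min_ratio,
                              max_hits=65000):
    """hits: list of (name, score, length) in original hit-table order.
    Returns (name, length, score/length) sorted by ratio, highest first."""
    ranked = []
    for name, score, length in hits[:max_hits]:   # only the first rows are processed
        if length <= 0:
            continue
        ratio = score / length
        if min_len <= length <= max_len and ratio >= min_ratio:
            ranked.append((name, length, ratio))
    ranked.sort(key=lambda hit: hit[2], reverse=True)
    return ranked


if __name__ == "__main__":
    # Made-up hit table, ordered by raw score as in a standard search.
    table = [
        ("long_cdna", 400, 650),
        ("oligoA", 18, 20),
        ("oligoB", 10, 10),
        ("oligoC", 5, 12),
    ]
    for hit in rank_by_score_over_length(table, 8, 20, 0.6):
        print(hit)
```

In the toy table the long hit has the highest raw score but is excluded by length, while the short, high-identity oligos rise to the top once sorted by score over length.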
31
Searching Small Sequences: Example
Consider the following claim:
• An oligonucleotide consisting of 8 to 20
nucleotides which specifically hybridizes
to a nucleic acid coding for mud loach
growth hormone (Seq. Id. No. X).
• The specification teaches that
oligonucleotides which specifically
hybridize need not have 100% sequence
correspondence.
32
Mud Loach Growth Hormone cDNA
• 670 nucleotides long
• 630 nucleotides in the coding region
• 210 amino acids
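These figures give a sense of scale for the example claim. A worked count, assuming only exact fragments of one strand of the 670-nucleotide cDNA are considered: a sequence of length L contains L - k + 1 windows of length k, so the number of 8-mers through 20-mers is the sum of those terms. Allowing mismatches or the complementary strand would increase the count further.

```python
def count_fragments(seq_len, min_len, max_len):
    """Number of contiguous windows of each length from min_len to
    max_len in a sequence of seq_len residues (positions counted,
    duplicates not removed)."""
    return sum(seq_len - k + 1
               for k in range(min_len, max_len + 1)
               if k <= seq_len)


if __name__ == "__main__":
    print(630 // 3)                      # codons in the coding region
    print(count_fragments(670, 8, 20))   # 8- to 20-mer windows in the cDNA
```

The coding region of 630 nucleotides corresponds to 210 codons, and the cDNA contains 8,541 windows of 8 to 20 nucleotides, which is why searching each fragment separately is impractical.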
33
[Slides 34-48 show search output for the example; the hit tables and alignments themselves are not reproduced in this transcript.]
• Standard Search GenBank Hit Table Against cDNA (slides 34-35)
• Standard Search GenBank Alignments Against cDNA (slides 36-37)
• Oligomer Search GenBank Hit Table Against cDNA (slides 38-39)
• Oligomer Search GenBank Alignments Against cDNA (slides 40-41)
• Length-Limited (8 to 20) Oligomer Search GenBank Hit Table cDNA (slides 42-43)
• Length-Limited (8 to 20) Oligomer Search GenBank Alignments cDNA (slide 44)
• Score/Length GenBank Hit Table Against cDNA: 8-20-mers down to 80% (slides 45-46)
• Score/Length Alignments Against cDNA: 8-20-mers down to 80% (slides 47-48)
QUESTIONS?
Michael P. Woodward
Supervisory Patent Examiner - Art Unit 1631
571 272 0722
[email protected]
John L. LeGuyader
Supervisory Patent Examiner - Art Unit 1635
571 272 0760
[email protected]
49