Class 38 - University of Virginia
Download
Report
Transcript Class 38 - University of Virginia
Class 38:
Fixed Points
and Biological
Computing
CS200: Computer Science
University of Virginia
Computer Science
David Evans
http://www.cs.virginia.edu/~evans
Menu
• Making Recursive Definitions without
define
• Computing with DNA
• How Biology Programs
24 April 2002
CS 200 Spring 2002
2
Lambda Calculus
term ::= variable |term term | (term)| variable . term
-reduction
(renaming)
y. M v. (M [y v])
where v does not occur in M.
-reduction
(substitution)
(x. M)N M [ x N ]
24 April 2002
CS 200 Spring 2002
3
Lambda Calculus is a Universal Computer
z z
z
z
z
z
z
z
z
z
z
z
z
z
z
z z
z
z
z
), X, L
), #, R
(, #, L
2:
look
for (
1
Start
(, X, R
#, 1, -
HALT
#, 0, -
Finite State Machine
We have this, but
we cheated using
to make recursive
definitions!
24 April 2002
• Read/Write Infinite Tape
Mutable Lists
• Finite State Machine
Numbers to keep track of state
• Processing
Way of making decisions (if)
Way to keep going
CS 200 Spring 2002
4
Fixed Point Theorem
• The fixed point of a function f, is
a value x such that f(x) = x
• If we can find the fixed point of
our Turing Machine simulator,
then we have something that
keeps going until it halts!
fixed-point TM input
result of running TM on input
24 April 2002
CS 200 Spring 2002
5
All Lambda Calculus Terms
have Fixed Points!
• For any Lambda Calculus term F,
there exists a Lambda Calculus
Term X such that FX = X
• Proof:
Let W = x.F(xx) and X = WW.
X = WW = ( x.F(xx))W
F (WW) = FX
24 April 2002
CS 200 Spring 2002
We can
make F a
parameter!
6
Why of Y?
• Y is f. WW:
Y f. ( x.f (xx)) ( x. f (xx))
• Y calculates a fixed point of any
lambda term!
• Hence: we don’t need define to do
recursion!
• Works in Scheme too - check the
“lecture” from the Adventure Game
24 April 2002
CS 200 Spring 2002
7
Lambda Calculus
is Turing Universal!
• All you need is beta-reduction and you can
compute anything
• This is just one way of representing
numbers, if, etc. – many others are
possible
• Integers, booleans, if, while, +, *, =, <,
classes, define, inheritance, etc. are for
wimps! Real programmers only use .
24 April 2002
CS 200 Spring 2002
8
Models of Computation
• Mechanical: Turing Machine
• Symbolic: Lambda Calculus
• Next: Biological
24 April 2002
CS 200 Spring 2002
9
Computing with DNA
Leonard Adleman
(Mathematical
Consultant for
Sneakers), 1995
24 April 2002
CS 200 Spring 2002
10
DNA
• Sequence of
nucleotides: adenine
(A), guanine (G),
cytosine (C), and
thymine (T)
• Two strands, A must
attach to T and G must
attach to C
24 April 2002
CS 200 Spring 2002
G
C
T
A
11
Hamiltonian Path Problem
• Input: a graph, start vertex and end vertex
• Output: either a path from start to end that
touches each vertex in the graph exactly
once, or false indicating no such path
exists
RIC
start: CHO
end: BWI
BWI
CHO
IAD
24 April 2002
CS 200 Spring 2002
Hamiltonian Path
is NP-Complete
12
Encoding The Graph
• Make up a two random 4-nucleotide
sequences for each city:
CHO:
RIC:
IAD:
BWI:
CHO1 = ACTT
RIC1 = TCGG
IAD1 = GGCT
BWI1 = GATC
CHO2 = gcag
RIC2 = actg
IAD2 = atgt
BWI2 = tcca
• If there is a link between two cities (AB),
create a nucleotide sequence: A2B1
CHORIC
RICCHO
24 April 2002
gcagTCGG
actgACTT
CS 200 Spring 2002
Based on Fred Hapgood’s notes
on Adelman’s talk
http://www.mitre.org/research/nanotech/hapgo
od_on_dna.html
13
Encoding The Problem
• Each city nucleotide sequence binds with
its complement (A T, G C) :
CHO: CHO1 = ACTT
CHO2 = gcag
CHO’:
TGAA
cgtc
RIC:
TCGGactg
RIC’: AGCCtgac
IAD:
GGCTatgt IAD’ = CCGAtaca
BWI: GATCtcca BWI’ = CTAGaggt
• Mix up all the link and complement DNA
strands – they will bind to show a path!
24 April 2002
CS 200 Spring 2002
14
Path Binding
BWI’
RIC’
IAD’
CHO’
TGAAcgtcCCGAtacaAGCCtgacCTAGaggt
gcagGGCTatgtTCGG actgGATC
CHOIAD IADRIC RICBWI
TCGGactg
RIC
BWI
GATCtcca
CHO
ACTTgcag
IAD
GGCTatgt
24 April 2002
CS 200 Spring 2002
15
Getting the Solution
• Extract DNA strands starting with CHO
and ending with BWI
– Easy way is to remove all strands that do not
start with CHO, and then remove all strands
that do not end with BWI
• Measure remaining strands to find ones
with the right weight (7 * 8 nucleotides)
• Read the sequence from one of these
strands
24 April 2002
CS 200 Spring 2002
16
Why don’t we solve NPComplete problems this way?
• Speed: shaking up the DNA strands does
1014 operations per second ($400M
supercomputer does 1010)
• Memory: we can store information in DNA
at 1 bit per cubic nanometer
• How much DNA would you need?
– Volume of DNA needed grows exponentially
with input size
– To solve ~45 vertices, you need ~20M gallons
24 April 2002
CS 200 Spring 2002
17
DNA-Enhanced PC
24 April 2002
CS 200 Spring 2002
18
How does Nature program?
24 April 2002
CS 200 Spring 2002
19
How Big is the
Make-a-Human Program?
• 3 Billion Base Pairs
– Each nucleotide is 2 bits (4 possibilities)
– 3 B pairs * 1 byte/4 pairs = 750 MB
1 CD ~ 650 MB
24 April 2002
CS 200 Spring 2002
20
Encoding is Redundant
• DNA encodes proteins
• Every sequence of 3 base pairs one of 20
amino acids (or stop codon)
– 21 possible codons, but 43 = 64 possible
values
– So, really only 750MB * (21/64) ~ 250 MB
24 April 2002
CS 200 Spring 2002
21
People are almost all the Same
• Genetic code for 2 humans differs in only
2.1 million bases
– 4 million bits = 0.5 MB
24 April 2002
CS 200 Spring 2002
22
How big is .5 MB?
• 1/3 of a floppy disk
• <1% of Windows 2000
• ~22 times the size of the PS6
adventure game code
24 April 2002
CS 200 Spring 2002
23
Is DNA Really a
Programming Language?
24 April 2002
CS 200 Spring 2002
24
Nerdy Linguist’s Definition
A description of pairs (S, M),
where S stands for sound, or any
kind of surface forms, and M
stands for meaning. A theory of
language must specify the
properties of S and M, and how
they are related.
24 April 2002
CS 200 Spring 2002
25
Programming Language
(Definition from Lecture 1)
A description of pairs (S, M),
where S stands for sound, or
any kind of surface forms, and
M stands for meaning intended
to be read and written by
humans and processed by
machines.
24 April 2002
CS 200 Spring 2002
26
Stuff Programming Languages
are Made Of
• Primitives
codons (sequence of 3 nucleotides that encodes a protein)
• Means of Combination
?? Morphogenesis? Not well understood (by anyone).
This is where most of the expressiveness comes from!
• Means of Abstraction
DNA itself – separate proteins from their encoding
Genes – group DNA by function (sort of)
Chromosomes – package Genes together
Organisms – packages for reproducing Genes
24 April 2002
CS 200 Spring 2002
27
Biology is (becoming) a
subfield of Computer Science
• Biological mechanisms are mostly
understood (proteomics still has a way
to go)
• What is not understood is how those are
combined to create meaning
24 April 2002
CS 200 Spring 2002
28
Charge
• Noon (now): President Casteen’s State of
the University in Old Cabal Hall
– Extra credit question: “Given that Computer
Science is the most liberal art, how come UVa
College students are not able to major in
Computer Science?”
• Friday: review
– Chance to ask questions about anything you
want
24 April 2002
CS 200 Spring 2002
29