CENTURY PATENTED INVENTIONS

Download Report

Transcript CENTURY PATENTED INVENTIONS

GENETIC PROGRAMMING
John R. Koza
Consulting Professor (Biomedical Informatics)
Department of Medicine
School of Medicine
Consulting Professor
Department of Electrical Engineering
School of Engineering
Stanford University
Stanford, California 94305
[email protected]
http://www.smi.stanford.edu/people/koza/
THE CHALLENGE OF
ARTIFICIAL
INTELLIGENCE
“How can computers learn to solve problems
without being explicitly programmed? In
other words, how can computers be made to
do what is needed to be done, without being
told exactly how to do it?”
 Attributed to Arthur Samuel (1959)
CRITERION FOR SUCCESS
"The aim [is] ... to get machines to exhibit
behavior, which if done by humans, would be
assumed to involve the use of intelligence.“
 Arthur Samuel (1983)
MAIN POINT No. 1
• Genetic programming now
routinely delivers high-return
human-competitive machine
intelligence
MAIN POINT No. 2
• Genetic programming is an
automated invention machine
MAIN POINT No. 3
• Genetic programming has
delivered a progression of
qualitatively more substantial
results in synchrony with five
approximately order-ofmagnitude increases in the
expenditure of computer time
MAIN POINT No. 1
• Genetic programming now
routinely delivers high-return
human-competitive machine
intelligence
“HUMAN-COMPETITIVE”
• The result is equal or better
than human-designed
solution to the same problem
NASA EVOLVED ANTENNA
X-Band Antenna for NASA's Space Technology 5
Mission in 2004
“HUMAN-COMPETITIVE”
• Previously patented, an
improvement over a patented
invention, or patentable
today
DEFINITION OF “HIGHRETURN”
The AI ratio (the “artificial-tointelligence” ratio) of a problemsolving method as the ratio of that
which is delivered by the automated
operation of the artificial method to
the amount of intelligence that is
supplied by the human applying the
method to a particular problem
DEFINITION OF “ROUTINE”
A problem solving method is routine if
it is general and relatively little
human effort is required to get the
method to successfully handle new
problems within a particular domain
and to successfully handle new
problems from a different domain.
PROGRESSION OF
QUALITATIVELY MORE
SUBSTANTIAL RESULTS
PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
REPRESENTATIONS
• Decision trees
• If-then production
rules
• Horn clauses
• Neural nets
• Bayesian networks
• Frames
• Propositional logic
• Binary decision
diagrams
• Formal grammars
• Coefficients for
polynomials
• Reinforcement
learning tables
• Conceptual clusters
• Classifier systems
A COMPUTER PROGRAM
DESIRED OUTPUT OF PROGRAM
Time
Output
0
6
1
6
2
6
3
6
4
6
5
6
6
6
7
6
8
6
9
6
10
6
11
7
12
7
PROGRAM TREE
(+ 1 2 (IF (> TIME 10) 3 4))
GENETIC PROGRAMMING
• Create initial population (random)
• Main generational loop
– Execute all programs
– Evaluate fitness of all programs
– Select single individuals or pairs of individuals
based on fitness to participate in the genetic
operations (mutation, crossover, reproduction,
architecture-altering operations)
• Termination Criterion
CREATING RANDOM PROGRAMS
CREATING RANDOM
PROGRAMS
• Function Set
F = {+, -, *, %, IFLTE}
• Terminal Set
T = {X, Y, Random-Values}
CREATING RANDOM
PROGRAMS
• The random programs are:
–Of different sizes and shapes
–Syntactically valid
–Executable
DARWINIAN SELECTION
• Selection based on fitness
• Better individual more likely to be
selected
• Probabilistic selection
- Best is not always picked
- Worst is not necessarily excluded
MUTATION OPERATION
MUTATION OPERATION
•
•
•
•
Select 1 parent probabilistically based on fitness
Pick point from 1 to NUMBER-OF-POINTS
Delete subtree at the picked point
Grow new subtree at the mutation point in same
way as generated trees for initial random
population (generation 0)
• The result is a syntactically valid executable
program
• Put the offspring into the next generation of the
population
CROSSOVER OPERATION
CROSSOVER OPERATION
• Select 2 parents probabilistically based on fitness
• Randomly pick a number from 1 to NUMBER-OFPOINTS for 1st parent
• Independently randomly pick a number for 2nd parent
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the
population
• Identify the subtrees rooted at the two picked points
REPRODUCTION
OPERATION
• Select parent probabilistically based on
fitness
• Copy it (unchanged) into the next
generation of the population
PROBABILISTIC STEPS
• The initial population is typically random
• Probabilistic selection based on fitness
- Best is not always picked
- Worst is not necessarily excluded
• Random picking of mutation and crossover
points
ILLUSTRATIVE GP RUN
SYMBOLIC REGRESSION
Independent Dependent
variable X
variable Y
-1.00
1.00
-0.80
0.84
-0.60
0.76
-0.40
0.76
-0.20
0.84
0.00
1.00
0.20
1.24
0.40
1.56
0.60
1.96
0.80
2.44
1.00
3.00
5 MAJOR PREPARATORY
STEPS OF GP
•
•
•
•
•
Determining the set of terminals
Determining the set of functions
Determining the fitness measure
Determining the parameters for the run
Determining the criterion for terminating a run
PREPARATORY STEPS
Objective:
Find a computer program with one input
(independent variable X) whose output
equals the given data
1
Terminal set:
T = {X, Random-Constants}
2
Function set:
F = {+,
3
Fitness:
The sum of the absolute value of the
differences
between
the
candidate
program’s output and the given data
(computed over numerous values of the
independent variable x from –1.0 to +1.0)
4
Parameters:
Population size M = 4
5
Termination:
An individual emerges whose sum of
absolute errors is less than 0.1
-,
*, %}
SYMBOLIC REGRESSION
POPULATION OF 4 RANDOMLY CREATED
INDIVIDUALS FOR GENERATION 0
SYMBOLIC REGRESSION x2 + x + 1
GENERATION 1
Mutant of (c)
Copy of (a)
picking “2”
as mutation
point
First offspring of
crossover of (a)
and (b)
picking “+” of
parent (a) and
left-most “x” of
parent (b) as
crossover points
Second offspring
of crossover of
(a) and (b)
picking “+” of
parent (a) and
left-most “x” of
parent (b) as
crossover points
CLASSIFICATION
GP TABLEAU – INTERTWINED
SPIRALS
Objective:
2
Create a program to classify a given
point in the x-y plane to the red or
blue spiral
Terminal set: T = {X,Y,Random-Constants}
Function set: F = {+,-,*,%,IFLTE}
3
Fitness:
4
5
Parameters: M = 10,000. G = 51
Termination: A program is 100% accurate
1
Accuracy of classification (0 – 194)
BOX MOVER – BEST OF GEN 0
BOX MOVER
GEN 45 – FITNESS CASE 1
GENETIC PROGRAMMING: ON THE
PROGRAMMING OF COMPUTERS BY
MEANS OF NATURAL SELECTION
(Koza 1992)
2 MAIN POINTS FROM 1992 BOOK
• Virtually all problems in artificial intelligence,
machine learning, adaptive systems, and
automated learning can be recast as a search
for a computer program.
• Genetic programming provides a way to
successfully conduct the search for a computer
program in the space of computer programs.
PROGRESSION OF QUALITATIVELY MORE
SUBSTANTIAL RESULTS PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code  possibly
with different instantiations of the dummy variables
(formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code
• Recursion provide a 3rd way to REUSE code
• Memory provides a 4th way to REUSE the results of
executing code
DIFFERENCE IN VOLUMES
D = L0W0H0 – L1W1H1
EVOLVED SOLUTION
(- (* (* W0 L0) H0)
(* (* W1 L1) H1))
AUTOMATICALLY DEFINED
FUNCTION volume
AUTOMATICALLY DEFINED
FUNCTION volume
(progn
(defun volume (arg0 arg1 arg2)
(values
(* arg0 (* arg1 arg2))))
(values
(- (volume L0 W0 H0)
(volume L1 W1 H1))))
AUTOMATICALLY DEFINED
FUNCTIONS (SUBROUTINES)
• ADFs provide a way to REUSE code
• Code is typically reused with different
instantiations of the dummy variables
(formal parameters)
DIVIDE AND CONQUER
DIVIDE AND CONQUER
• Decompose a problem into sub-problems
• Solve the sub-problems
• Assemble the solutions of the subproblems into a solution for the overall
problem
CHANGE OF REPRESENTATION
CHANGE OF REPRESENTATION
• Identify regularities
• Change the representation
• Solve the overall problem
GENETIC PROGRAMMING II: AUTOMATIC
DISCOVERY OF REUSABLE PROGRAMS
(Koza 1994)
MAIN POINTS OF 1994 BOOK
• Scalability is essential for solving non-trivial
problems in artificial intelligence, machine
learning, adaptive systems, and automated
learning
• Scalability can be achieved by reuse
• Genetic programming provides a way to
automatically discover and reuse subprograms
in the course of automatically creating computer
programs to solve problems
MEMORY
Settable
(named)
Indexed
vector
variables memory
Matrix
memory
Relational memory
TRANSMEMBRANE SEGMENT
IDENTIFICATION PROBLEM
(progn
(defun ADF0 ()
(ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN (ORN (ORN (Y?)
(N?)) (ORN (T?) (Q?))) (ORN (A?) (H?))))))
(defun ADF1 ()
(values (ORN (ORN (ORN (A?) (I?)) (ORN (L?) (W?))) (ORN (ORN
(T?) (L?)) (ORN (T?) (W?))))))
(defun ADF2 ()
(values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (ORN
(D?) (E?)) (ORN (ORN (T?) (W?)) (ORN (Q?) (D?)))) (ORN (K?)
(P?)))) (ORN (K?) (P?))) (ORN (T?) (W?))) (ORN (ORN (E?)
(A?)) (ORN (N?) (R?))))))
(progn (loop-over-residues
(SETM0 (+ (- (ADF1) (ADF2)) (SETM3 M0))))
(values (% (% M3 M0) (% (% (% (- L -0.53) (* M0 M0)) (+ (% (%
M3 M0) (% (+ M0 M3) (% M1 M2))) M2)) (% M3 M0))))))
ADL
ADR
HUMAN-COMPETITIVE RESULTS
(NOT RELATED TO PATENTS)
Transmembrane segment identification problem for proteins
Motifs for D–E–A–D box family and manganese superoxide dismutase family of proteins
Cellular automata rule for Gacs-Kurdyumov-Levin (GKL) problem
Quantum algorithm for the Deutsch-Jozsa “early promise” problem
Quantum algorithm for Grover’s database search problem
Quantum algorithm for the depth-two AND/OR query problem
Quantum algorithm for the depth-one OR query problem
Protocol for communicating information through a quantum gate
Quantum dense coding
Soccer-playing program that won its first two games in the 1997 Robo Cup competition
Soccer-playing program that ranked in the middle of field in 1998 Robo Cup competition
Antenna designed by NASA for use on spacecraft
Sallen-Key filter
PROGRESSION OF QUALITATIVELY MORE
SUBSTANTIAL RESULTS PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
AUTOMATIC SYNTHESIS OF
BOTH THE TOPOLOGY AND
SIZING OF ANALOG
ELECTRICAL CIRCUITS
COMPONENT-CREATING FUNCTIONS
•
•
•
•
•
Resistor R function
Capacitor C function
Inductor L function
Diode D function
Transistor Q function (3-leaded)
COMPONENT-CREATING FUNCTIONS
TOPOLOGY-MODIFYING FUNCTIONS
• SERIES division
• PARALLEL division
• VIA
• FLIP
TOPOLOGY-MODIFYING FUNCTIONS
DEVELOPMENT-CONTROLLING
FUNCTIONS
• END function
• NOP (No Operation) function
• SAFE_CUT function
DEVELOPMENTAL GP
DEVELOPMENTAL GP
(LIST (C (– 0.963 (– (– -0.875 -0.113)
0.880)) (series (flip end) (series
(flip end) (L -0.277 end) end) (L (– 0.640 0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
CAPACITOR-CREATING FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
CAPACITOR-CREATING FUNCTION
SERIES DIVISION FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
SERIES DIVISION
EVALUATION OF FITNESS
+
IN
z0
OUT
Embryonic Circuit
Program Tree
Fully Designed Circuit (NetGraph)
Circuit Netlist (ascii)
Circuit Simulator (SPICE)
Circuit Behavior (Output)
Fitness
DESIRED BEHAVIOR OF A LOWPASS
FILTER
EVOLVED CAMPBELL FILTER
U. S. patent 1,227,113
George Campbell
American Telephone and Telegraph
1917
EVOLVED ZOBEL FILTER
U. S. patent 1,538,964
Otto Zobel
American Telephone and Telegraph Company
1925
EVOLVED SALLEN-KEY FILTER
EVOLVED DARLINGTON EMITTERFOLLOWER SECTION
U. S. patent 2,663,806
Sidney Darlington
Bell Telephone Laboratories
1953
NEGATIVE FEEDBACK
HAROLD BLACK’S RIDE ON THE
LACKAWANNA FERRY
Courtesy of Lucent Technologies
20th-CENTURY PATENTS
Campbell ladder topology for filters
Zobel “M-derived half section” and “constant K” filter sections
Crossover filter
Negative feedback
Cauer (elliptic) topology for filters
PID and PID-D2 controllers
Darlington emitter-follower section and voltage gain stage
Sorting network for seven items using only 16 steps
60 and 96 decibel amplifiers
Analog computational circuits
Real-time analog circuit for time-optimal robot control
Electronic thermometer
Voltage reference circuit
Philbrick circuit
NAND circuit
Simultaneous synthesis of topology, sizing, placement, and routing
PROGRESSION OF QUALITATIVELY MORE
SUBSTANTIAL RESULTS PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
SIX POST-2000 PATENTED
INVENTIONS
EVOLVED HIGH CURRENT LOAD CIRCUIT
REGISTER-CONTROLLED CAPACITOR
CIRCUIT
LOW-VOLTAGE CUBIC CIRCUIT
VOLTAGE-CURRENT-CONVERSION
CIRCUIT
LOW-VOLTAGE BALUN CIRCUIT
TUNABLE INTEGRATED ACTIVE FILTER
21st-CENTURY PATENTED INVENTIONS
Low-voltage balun circuit
Mixed analog-digital variable capacitor circuit
High-current load circuit
Voltage-current conversion circuit
Cubic function generator
Tunable integrated active filter
PROGRESSION OF QUALITATIVELY MORE
SUBSTANTIAL RESULTS PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
NOVELTY-DRIVEN EVOLUTION
• Two factors in fitness measure
– Circuit’s behavior
– Largest number of nodes and edges (circuit
components) of a subgraph of the given
circuit that is isomorphic to a subgraph of a
template representing the prior art.
PRIOR ART TEMPLATE
NON-INFRINGING SOLUTION NO. 1
NON-INFRINGING SOLUTION NO. 5
GP AS AN INVENTION MACHINE
AUTOMATIC SYNTHESIS OF
CIRCUIT LAYOUT
INCLUDING THE PLACEMENT OF
COMPONENTS AND ROUTING OF
WIRES
ALONG WITH THE TOPOLOGY
AND SIZING
CIRCUIT LAYOUT
• Circuit placement involves the assignment of
each of the circuit's components to a particular
physical location on a printed circuit board or
silicon wafer.
• Routing involves the assignment of a particular
physical location to the wires between the leads
of the circuit's components.
LAYOUT
LAYOUT — GENERATION 0
100%-COMPLIANT LOWPASS FILTER
GENERATION 25 WITH 5 CAPACITORS AND
11 INDUCTORS  AREA OF 1775.2
100%-COMPLIANT LOWPASS FILTER
BEST-OF-RUN CIRCUIT OF GENERATION
138 WITH 4 INDUCTORS AND 4
CAPACITORS  AREA OF 359.4
REVERSE ENGINEERING OF METABOLIC
PATHWAYS
USING A TURTLE TO DRAW TWODIMENSIONAL ANTENNA
BEST-OF-RUN ANTENNA FROM
GENERATION 90
3-DIMENSIONAL ANTENNA
EVOLVED SORTING NETWORK
GENETIC NETWORK FOR lac
operon
EVOLVED NETWORK
(IF (< LACTOSE_LEVEL 9.139 ) (IF (<
REPRESSOR_LEVEL 6.270 ) (IF (> GLUCOSE_LEVEL
5.491 ) 2.02 (IF (< CAP_LEVEL 0.639 ) 2.033 (IF
(< CAP_LEVEL 4.858 ) (IF (> LACTOSE_LEVEL 2.511 )
(IF (> CAP_LEVEL 7.807 ) 5.586 (IF (>
LACTOSE_LEVEL 2.114 ) 1.978 2.137 ) ) 0.0 ) (IF
(> REPRESSOR_LEVEL 4.015 ) 0.036 (IF (<
GLUCOSE_LEVEL 5.128 ) 10.0 (IF (< REPRESSOR_LEVEL
4.268 ) 2.022 9.122 ) ) ) ) ) ) (IF (> CAP_LEVEL
0.842 ) 0.0 5.97 ) ) (IF (< CAP_LEVEL 1.769 )
2.022 (IF (< GLUCOSE_LEVEL 2.382 ) (IF (>
LACTOSE_LEVEL 1.256 ) (IF (> LACTOSE_LEVEL 1.933
) (IF (> GLUCOSE_LEVEL 2.022 ) (IF (<
GLUCOSE_LEVEL 5.183 ) 6.323 (IF (> CAP_LEVEL
1.208 ) 9.713 0.842 ) ) 10.0 ) (IF (>
GLUCOSE_LEVEL 6.270 ) 2.109 ) 1.965 ) ) 0.665 )
1.982 ) ) )
OTHER STRUCTURES
GENETIC PROGRAMMING III: DARWINIAN
INVENTION AND PROBLEM SOLVING
(Koza, Bennett, Andre, Keane 1999)
SUBROUTINE DUPLICATION
SUBROUTINE CREATION
SUBROUTINE DELETION
ARGUMENT DUPLICATION
ARGUMENT DELETION
16 ATTRIBUTES OF A SYSTEM FOR
AUTOMATICALLY CREATING COMPUTER
PROGRAMS
• Starts with "What needs
to be done"
• Tells us "How to do it"
• Produces a computer
program
• Automatic determination
of program size
• Code reuse
• Parameterized reuse
• Internal storage
• Iterations, loops, and
recursions
• Self-organization of
hierarchies
• Automatic determination of
program architecture
• Wide range of
programming constructs
• Well-defined
• Problem-independent
• Wide applicability
• Scalable
• Competitive with humanproduced results
GENETIC PROGRAMMING PROBLEM
SOLVER (GPPS)
PROGRESSION OF QUALITATIVELY MORE
SUBSTANTIAL RESULTS PRODUCED BY GP
•
•
•
•
•
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions
AUTOMATIC SYNTHESIS
OF BOTH THE TOPOLOGY
AND TUNING OF
CONTROLLERS
PARAMETERIZED TOPOLOGY FOR
GENERAL-PURPOSE CONTROLLER
EVOLVED EQUATIONS FOR GENERALPURPOSE CONTROLLER
EVOLVED EQUATIONS FOR GENERALPURPOSE CONTROLLER
PATENTABLE NEW INVENTIONS
PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
General-purpose controllers outperforming Ziegler-Nichols and Åström-Hägglund rules
PARALLELIZATION WITH SEMI-ISOLATED
SUBPOPULATIONS
GP PARALLELIZATION
• Like Hormel, Get Everything Out of the
Pig, Including the Oink
• Keep on Trucking
• It Takes a Licking and Keeps on Ticking
• The Whole is Greater than the Sum of
the Parts
PETA-OPS
• Human brain operates at 1012 neurons
operating at 103 per second = 1015 ops per
second
• 1015 ops = 1 peta-op = 1 bs (brain second)
GP 1987–2002
System
Dates
Speed-up
over first
system
HumanProblem Category
competitive
results
Serial LISP
1987–1994
1 (base)
0
toy problems
64 transputers
1994–1997
9
2
human-competitive
results not related to
patented inventions
64 PowerPC’s
1995–2000
204
12
20th-century
patented inventions
70 Alpha’s
1999–2001
1,481
2
20th-century
patented inventions
1,000 Pentium II’s
2000–2002
13,900
12
21st-century
patented inventions
4-week runs on
1,000 Pentium II’s
2002-2003
130,000
2
patentable new
inventions
PROMISING GP APPLICATION AREAS
• Problem areas involving many variables
that are interrelated in highly non-linear
ways
• Inter-relationship of variables is not well
understood
• Discovery of the size and shape of the
solution is a major part of the problem
PROMISING GP APPLICATION AREAS
(CONTINUED)
• "Black art" problems
• Areas where you simply have no idea how
to program a solution, but where you
know what you want
PROMISING GP APPLICATION AREAS
(CONTINUED)
• Problem areas where a good approximate
solution is satisfactory
 design
 control
 bioinformatics
 classification
 data mining
 system identification
 forecasting
PROMISING GP APPLICATION AREAS
(CONTINUED)
 Areas where large computerized databases are
accumulating and computerized techniques are
needed to analyze the data
 genome, protein, microarray data
 satellite image data
 astronomical data
 petroleum databases
 financial databases
 medical records
 marketing databases
PROMISING GP APPLICATION AREAS
(CONTINUED)
 Areas for which humans find it very
difficult to write good programs
 parallel computers
 cellular automata
 multi-agent strategies
 field-programmable game arrays
 digital signal processors
 swarm intelligence
DIFFERENCES BETWEEN GP AND
ARTIFICIAL INTELLIGENCE (AI)
AND MACHINE LEARNING (ML)
APPROACHES
REPRESENTATION
Genetic programming overtly conducts it
search for a solution to the given problem
in program space
POINT-TO-POINT
TRANSFORMATIONS IN THE
SEARCH
Genetic programming does not conduct its
search by transforming a single point in the
search space into another single point, but
instead transforms a set of points into
another set of points
HILL CLIMBING IN THE SEARCH
Genetic programming does not rely
exclusively on greedy hill climbing to
conduct its search, but instead allocates a
certain number of trials, in a principled
way, to choices that are known to be
inferior
DETERMINISM IN THE SEARCH
Genetic programming conducts its search
probabilistically
EXPLICIT KNOWLEDGE BASE
Genetic programming does NOT make use
of a knowledge base
ROLE OF FORMAL LOGIC IN THE
SEARCH
Genetic programming does not utilize
formal logic in it’s search strategy.
Contradictory alternatives are created
and actively maintained.
SOURCE
Genetic programming is biologically
inspired.
RESULTS
• Genetic programming now
routinely delivers high-return
human-competitive machine
intelligence