Parallel Machine Scheduling with Sequence

Download Report

Transcript Parallel Machine Scheduling with Sequence

Utilizing Condor to Support Genetic
Algorithm Design Research
Mary Beth Kurz, PhD
Assistant Professor
Department of Industrial Engineering
Clemson University
Genetic algorithms are metaheuristics
for optimization
evaluation
Feasible
solutions
Solution space
Objective space
Infeasible
solutions
Chromosome 1
decoding
Chromosome 2
Chromosome N
Chromosome space
2
My research focuses on what
chromosomes should look like
and asks whether the “solution
representation” impacts the
quality of a genetic algorithm
Let’s make this more concrete: the TSP
Feasible Solutions:
tours that visit all
cities exactly once
evaluation
Solution space
Each feasible solution’s
total travel time is that
solution’s objective
Objective space
Infeasible
Solutions:
anything else
Chromosome 1
decoding
Chromosome 2
Chromosome N
Chromosome space
3
The Traveling Salesman Problem
asks how to route the salesman
through his cities so he returns home
as quickly as possible
How do I represent the tours?
• Directly – the city list?
• Indirectly – the list of roads taken?
My Hypothesis: Solution representation
affects GA design significantly
Feasible
solutions
Solution space
No chromosomes
map to these
solutions
Optimal
objective
value
Objective space
Infeasible
solutions
Chromosome 1
Chromosome 2
This is the optimal
solution!!!
Fix or forbid these
chromosomes?
Chromosome space
4
Kurz
Genetic algorithms are motivated by an
analogy to “real” genetics
A chromosome is
composed of genes,
generally randomly
selected initially
Chromosome 1
Chromosome 2
Chromosome N
Genetic Operators
Randomness comes here
Population(t)
Selection picks
Crossover
some creates
Mutation changes a
chromosomes
new as
chromosomes
by
small number
of genes
potential parents
taking genes
in infrom
2
the entire
population
crossover parents
5
Chromosome 1
Chromosome 2
Chromosome N
Population(t+1)
Kurz
This research is empirical and requires
immense computational time
 Genetic Algorithms are inherently random
 Is it possible that some representation consistently finds better





6
solutions for a specific problem?
Most GA research currently uses 50 replications on numerous
data files
180 problem types, 10 files each, 3 representations = 5400 files
Simplest representation – 1800 files would take about 45 hours
in my Lab (a few years ago)
50 replications of 5400 files → at least 241 days of running
time!
This is simply not feasible
Kurz
Grid computing is saving me
7
Spring 2007
325,000 hrs
Summer – Fall 2007
212,000 hrs
Spring 2008
124,000 hrs
Total: about 660,000 hrs
Kurz
Since last spring, I’ve had to relearn
how to do research!
 How do I compile all this data?
 VBA and Excel!
120
100
80
 What can I actually analyze?
60
 Not pictures like this
40
 What statistics do I need to use?
20
0
1
10
19
28
37
46
55
64
73
82
91
100
109
118
127
136
145
154
163
172
181
190
199
208
217
 Reduce the data to correlations
 Needed to learn non-parametric statistics
 Needed to use SPSS for the analysis
 Used VBA to create the input files
 Reran to get different output data
 Summer – Fall 2007
8
212,000 hrs
Kurz
I don’t know about random numbers
 I started using rand() in C!
 I use up to 600,000,000 random numbers in each run
 I have 270,000 runs (5400 * 50)
 Trying to use Mersenne Twister
 Period is 219937 – 1 which is plenty big
 How do I make sure I have independent sets of random numbers?
 Use the same initial seed, then “burn” through (n-1) numbers until we
get to the nth set
 Would take over 4000 days to burn through 269,999 sets for the
last run
 Again … not feasible!
 Tried to initialize using run number
9
 Spring 2008
124,000 hrs
Kurz
I still love Condor
 But I don’t know about random numbers
 Thought about saving the random numbers in an input file of
600,000,000 numbers each
 Stopped generating the first file after it got to 3 GB
 This would mean 3*270,000 GB of random number files!
 Looking at dynamic streams from Mersenne Twister
 Just heard about SPRNG from Todd on Tuesday
 Gearing up for another set of runs … all I need is this set of
runs to get a paper out!
10
Kurz