Transcript PICO-intro

Reconnect ‘04
Introduction to PICO
Cynthia Phillips, Sandia National Laboratories
Joint work with:
Jonathan Eckstein, Rutgers
William E. Hart, Sandia National Laboratories
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,
for the United States Department of Energy under contract DE-AC04-94AL85000.
Parallel Computing Systems
• A set of processors (from 2 up to tens of thousands) working together on
a problem
• communicating by messages (even if hidden from user)
Architectures:
• Grid
• Network of workstations (LAN)
• Beowulf cluster
• Tightly-coupled system
Slide 2
Parallelism in Branch and Bound
Two sources of parallelism in B&B:
• Within subproblems
• Across subproblems
Warning:
• Can solve problems otherwise unsolvable but
– A constant-factor increase in # processors (even 10,000) cannot
overcome exponential growth.
– We still have to be clever
Slide 3
Parallelism Issues for Branch and Bound
• In the best of cases, all the processors are busy all the time doing useful,
independent work
– Overhead (coordination, exchange of data)
– Load balancing
• What to do when the tree is small?
• Tree shape depends on order of node evaluation
– Can lead to slowdown anomalies
– Try to emulate a good serial ordering
– We’d do a lot better with a single processor 1000x faster
Slide 4
Parallel Experimental Algorithmics/Engineering Issues
• Inherent nondeterminism
• Parallel random number generators
– e.g. for randomized algorithms
• Debugging
Slide 5
Solution Options for Integer Programming
• Commercial codes (ILOG’s cplex)
– Good and getting better
– Expensive
– Serial (or modest SMP)
• Free serial codes (ABACUS, MINTO, BCP)
• Modest-level parallel codes (Symphony)
• Grid parallelism (FATCOP)
• In development: ALPS/BiCePs/BLIS
• Massive parallelism: PICO (Parallel Integer and Combinatorial Optimizer)
Note: Parallel B&B for simple bounding: PUBB, BoB/BOB++, PPBB-lib,
Mallba, Zram
Slide 6
Parallel Integer and Combinatorial Optimizer (PICO)
Distributed memory (MPI), C++
• Massively parallel (scalable)
• General parallel Branch & Bound environment
• Portable, flexible
– Serial, small LAN, Cplant, ASCI Red, Red Storm
• Allows exploitation of problem-specific knowledge/structure
• Open Source release
– Always support a free LP solver
Slide 7
PICO Features for Efficient Parallel B&B
• Efficient processor use during ramp-up
• Integration of heuristics to generate good solutions early
• Efficient work storage/distribution
• Load balancing
• Non-preemptive proportional-share “thread” scheduler
• Flexible hub/worker interaction
• Subproblem states with flexible search strategy
• Correct termination
• Early output
Slide 8
What To Do With 9000 Processors and One Subproblem?
Option 1: Presplitting
Make log P branching choices and expand all ways (P problems)
P = # processors
BAD!
Expands many problems that would be fathomed in a serial solution.
Slide 9
PICO MIP Ramp-up
• Serialize tree growth
– All processors work in parallel on a single node
• Parallelize
– LP bounding
– Preprocessing
– Cutting plane generation
– Incumbent Heuristics
– Pseudocost (gradient) initialization
• Work division by processor ID/rank
• Crossover to parallel with perfect load balance
– When there are enough subproblems to keep the processors busy
– When single subproblems cannot effectively use parallelism
Slide 10
Parallel Incumbent Search
• Genetic algorithms
• Decomposition-based methods (general)
• Pivot, cut, and dive general heuristic
• Custom Methods
Slide 11
Interior-Point Method for Solving the Root Problem
• Mehrotra’s predictor-corrector (primal-dual) method
• Iterative method where the computational core of each iteration is the
solution of a linear system with constraint matrix:
AD2 AT
– A is the original LP constraint matrix.
– D is a diagonal matrix that changes each iteration.
• Direct Cholesky Solvers OK for moderate parallelism

• Iterative methods
– Preconditioning is a big issue
– Support theory can help if the matrix has network structure
Slide 12
Resolving LP on Subproblems
Original LP
Feasible region
Cutting plane
(valid inequality or branch constraint)
LP optimal solution
Integer optimal
• Dual simplex is much faster than starting over
• Need parallel dual simplex!
Slide 13
Hubs and Workers
Each hub controls some number of workers (can work itself)
• Setting parameters, can go from fully centralized to fully distributed
Subproblem pools at both the hub and workers
• Heap (best-first), stack (depth-first), queue (breadth-first), custom
• Hubs only keep tokens
Slide 14
Subproblem Movement
Hub
Worker
• When worker has low load or low-quality local pool
Worker
Hub
• Draw back when hub out of work and cluster unbalanced
• Send new subproblem tokens to hub (probabilistically) depending on load
• Probabilistically scatter tokens to a random hub. If load in cluster is high
relative to others, scatter probability increases.
Setting parameters, go from pure master-slave to local
• Tradeoffs: Communication, Processor utilization, approximation of serial search
order
Slide 15
Subproblem Movement/Data Storage
T
Hub
T
?
Worker
T
T
T
T
SP
SP
SP
SP
SP
SP
SP Server
SP
SP
SP
SP
SP Receiver
SP
SP Server
Slide 16
Load Balancing
• Hub pullback
• Random scattering
• Rendezvous
– Hubs determine load (function of quantity and quality)
– Use binary tree of hubs
– Determine donors and receivers, match them, exchange
Slide 17
Non-Preemptive Scheduler is Sufficient for PICO
• Processes are cooperating
• Control is returned voluntarily so data structures left in clean state
– No memory access conflicts, no locks
• PICO has its own “thread” scheduler
– High priority, short threads are round robin and done first
• Hub communications, incumbent broadcasts, sending subproblems
• If these are delayed by long tasks could lead to
– Idle processors
– Processors working on low-quality work
– Compute threads are proportional share (stride) scheduling
• Adjust during computation (e.g. between lower and upperbounding)
Slide 18
Slide 19
Subproblem States
Boundable
Being Bounded
Bounded
Being Separated
Separated
Dead
Handlers: lazy, eager, hybrid, build your own
Slide 20
Early Output
• Problem: If you have to abort a long run, want to know variable settings
for the incumbent
– May be good enough to stop
– Otherwise seed new search with the incumbent value
• PICO will save a new incumbent if
– It is a strict improvement over the last saved value (or is the first)
– A sufficient time has passed since the last write
• Requires a new message-triggered thread in parallel
– Hub, incumbent holder, I/O processor
Slide 21
Serial Class Structure - Inheritance
PICO Core
AMPL Interface
(optional)
Knapsack
Nonlinear
Branch&Prune
PICO MIP CORE
PICO B&C
CORE
• Branching classes - control search
• Branchsub classes - subproblems (tree nodes)
• Problem data classes - derived only
Slide 22
MIP Application
Required Methods for Derived Node Class
• bGlobal( ) - subproblem pointer to branching (search control)
• setRootComputation( ) - create the root of the search tree
• boundComputation( ) - compute subproblem lower bound (for min)
• splitComputation( ) - determine how to partition the subproblem
• makeChild( ) - create a child subproblem from a split parent
• candidateSolution( ) - determine whether a proposed solution is viable
candidate for optimality
Slide 23
Optional Customizations
• Incumbent heuristic
• Incumbent representation/update
• Solution output
• Solution validation
• Preprocessing
• Override default parameters
In MIP:
• Custom cutting planes
• Adjust branching priorities
– Plan to add more complex branching strategies
Slide 24
All PICO’s Parallelism Comes (Almost) For Free
PICO serial Core
PICO parallel Core
Serial application
Parallel application
User must
• Define serial application (debug in serial)
• Describe how to pack/unpack data (using a generic packing tool)
C++ inheritance gives parallel management
User may add threads to
• Share global data
• Exploit problem-specific parallelism
– MIP: pseudocosts
Slide 25
Utilib
• Predates STL
• Abstract data types: arrays, heaps, hash tables, balanced trees
• Random number generators
• Hash tables
– work well for doubles very close in value
• Arrays offer
– Protected access (bounds checking)
– Sharing
• PackBuffer methods facilitate parallelization
Slide 26
Pieces of PICO
PICO requires
• utilib (for data structures, math, etc)
• COIN (an IBM-sponsored optimization interface standard)
– Base interface to LP solvers
• We add more PICO-specific functionality
– Cut generation library
• An LP solver
– Currently support cplex, soplex, CLP
Slide 27
Using A Math Programming Language
• How easily can one bring up applications?
– In our world, applications are a moving target; need agility
Slide 28
A Mathematical Programming Language (AMPL)
Data
Files
AMPL
Solver:
Eg. cplex
Model
Files
• AMPL builds the matrix.
• Nice cross between programming language and LaTeX (math view)
Slide 29
AMPL-PICO Interface
Data
Files
AMPL
Solver:
PICO
Model
Files
Cutting Planes
IP
Exact
LP
Compute
Approximate
Solution
• Write cutting-plane and approximate-solution code using AMPL
variables
• Mapping transparent
Slide 30
AMPL-PICO Interface
Standard AMPL interfaces
AMPL Model
File
AMPL
Software
AMPL Problem
Specification
Files
PICO
Executable
AMPL Solver
Output
AMPL
Software
AMPL Solution
Specification
Files
Customized PICO Interface
AMPL Model
File
Slide 31
Unix
Scripts
Tailored PICO
C++ Files
C++
Compiler
PICO Output
Tailored PICO
Executable
Availability
• PICO will be free under GNU lesser public license
– MIP Requires serial LP solver
• Cplex is expensive, but many companies/universities have it
• CLP is free (through COIN)
• Part of ACRO (A Common Repository for Optimizers)
• http://software.sandia.gov/Acro/
[Need password for CVS checkout; otherwise tarballs]
Slide 32
Open Problems (Wish List)
Tools we’d like to see:
• Parallel matrix generation from a math-programming interface
• Parallel (sparse) dual simplex solver for linear programming
Open algorithms questions:
• Ramp up management: multiple subproblems in parallel
Slide 33
Development Team
Core Team
• Jonathan Eckstein (RUTCOR): PICO core
• Bill Hart (Sandia): scheduler, utilib, AMPL interface, design, etc
• Cindy Phillips (Sandia): MIP layer, MIP applications
Other Developers
• Harvey Greenberg (UCD): preprocessor design
• Vitus Leung (Sandia): preprocessor
• Tod Morrison (UCD, student): soplex interface, porting
• Mikhail Nediak (RUTCOR student, now McMaster): MIP heuristic
• Konrad Borys (RUTCOR student): core templatization, heuristic integration
• Mike Eldred (Sandia): DAKOTA optimization framework
• Ojas Parekh (ex-CMU student, Sandia): soplex interface
• Mario Alleva (Sandia): porting
Slide 34