Sources of Parallelism and Locality in Simulation

Distributed, Internet and Grid Computing
1
Distributed Computing
• Current supercomputers are too expensive
• ASCI White (#1 in the TOP500) cost more than $110 million and needed
a new building
• Few institutions or research groups can afford this level of
investment
• There are more than 500 million PCs around the world
• some as powerful as early 90s supercomputers
• they are idle most of the time (60% to 90%), even while in use
(spreadsheets, typing, printing, ...)
• corporations and institutions have hundreds or thousands of
PCs on their networks
Try to harness idle PCs on a network and use them
on computationally intensive problems
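Harnessing idle PCs in this way usually follows a work-unit (master/worker) pattern: a server splits a large problem into independent pieces, idle machines each process one piece, and the server combines the returned results. The following is a minimal single-machine sketch, assuming a toy prime-counting task and using Python threads as stand-ins for remote PCs. The function names are hypothetical and not taken from any of the projects discussed here; real systems add network transport, scheduling and fault tolerance, none of which is modeled.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_work_units(start, stop, unit_size):
    """Split the range [start, stop) into independent, fixed-size work units."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]


def count_primes(unit):
    """Client-side task (hypothetical): count the primes in one work unit."""
    lo, hi = unit
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in [2, isqrt(n)] divides it (trial division)
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def run_master(start, stop, unit_size, n_workers):
    """Hand out work units to a pool of stand-in 'idle PCs' and sum the results.

    Threads only simulate remote machines: there is no network, no
    scheduling policy and no handling of clients that never report back.
    """
    units = make_work_units(start, stop, unit_size)
    total = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(count_primes, unit) for unit in units]
        # Results may arrive in any order, as they would from real volunteers.
        for future in as_completed(futures):
            total += future.result()
    return total


print(run_master(0, 1000, 100, n_workers=8))  # 168 primes below 1000
```

Because the work units share no state, they can be processed in any order and on any machine; this independence is what lets loosely coupled, intermittently idle PCs contribute at all.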
2
Entropia network
• Born in 1997 to apply idle computers worldwide to
problems of scientific interest
• Within 2 years grew to more than 30,000 computers, with an
aggregate speed of over 1 Tflop/s
• Several scientific achievements, e.g. identification of the
largest known prime number
• Now commercial (www.entropia.com), used for
applications in:
• Life sciences
• Financial services
• Product design, etc.
3
SETI @ home project
setiathome.ssl.berkeley.edu
• SETI = Search for ExtraTerrestrial Intelligence
• Started in 1996 to enlist PCs to work on analysing data
from the Arecibo radio telescope
• Good mix of popular appeal and good technology
• Now running on more than ½ million PCs
• delivering ~1,200 CPU-years per day
• ~35 Tflop/s
• fastest (but special-purpose) computer in the world
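A back-of-envelope check of how these figures fit together, assuming a 365-day year and that one CPU-year means one processor computing continuously for a year:

```latex
\begin{align*}
1200\ \tfrac{\text{CPU-years}}{\text{day}}
  &= 1200 \times 365\ \tfrac{\text{CPU-days}}{\text{day}}
   \approx 4.4 \times 10^{5}\ \text{CPUs running full time} \\
\frac{35 \times 10^{12}\ \text{flop/s}}{4.38 \times 10^{5}\ \text{CPUs}}
  &\approx 8 \times 10^{7}\ \text{flop/s} = 80\ \text{Mflop/s per CPU}
\end{align*}
```

About 4.4 × 10^5 full-time-equivalent processors is of the same order as the ½ million participating PCs, and the implied average rate is roughly 80 Mflop/s per full-time CPU.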
4
Folding @ home project
www.stanford.edu/group/pandegroup/Cosm
• Enlists PCs to work on the protein folding problem
• most important problem in modern molecular biology
• From genome to structure:
• The genome (DNA sequence) specifies the amino acids that make up
proteins, but says little about their functions: what is needed is
how a protein folds (its 3D structure)
• Protein folding is very fast (microseconds) and complex
• Simulated timescales are of the order of nanoseconds
→ a 10^3 gap (1 µs / 1 ns = 10^3) → distributed computing
• Currently around 20,000 users
5
Great Internet Mersenne Prime Search
mersenne.org
• Started in 1996 to find large Mersenne primes
(i.e. primes of the form 2^p - 1)
• 3, 7, 31, 127, 8191, ... are Mersenne primes,
corresponding to p = 2, 3, 5, 7, 13, ...
• Currently 39 Mersenne primes are known; GIMPS found
the 5 largest, including:
• 2^6972593 - 1, found in June 1999
• 2^13466917 - 1, found in November 2001 (current largest;
more than 4 million digits)
• Are there infinitely many Mersenne primes? Not known
• Uses the Entropia network and runs at ~3.4 Tflop/s
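The standard primality check for Mersenne numbers, and the one GIMPS clients run, is the Lucas-Lehmer test. Below is a small arbitrary-precision sketch that reproduces the examples above and the digit count of the largest known prime. It is a toy, not GIMPS's optimized client (which uses FFT-based multiplication to square multi-million-digit numbers); the function names are hypothetical.

```python
import math


def is_prime(n):
    """Trial-division primality test, adequate for small exponents p."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def lucas_lehmer(p):
    """Return True if the Mersenne number 2^p - 1 is prime.

    For an odd prime p, with s_0 = 4 and s_{k+1} = s_k^2 - 2 (mod 2^p - 1),
    2^p - 1 is prime if and only if s_{p-2} == 0.
    """
    if p == 2:
        return True          # 2^2 - 1 = 3 is prime; the test needs odd p
    if not is_prime(p):
        return False         # if p is composite, so is 2^p - 1
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Decimal digits of 2^p - 1, computed as floor(p * log10(2)) + 1.

    2^p is never a power of 10, so 2^p - 1 has as many digits as 2^p;
    this avoids converting a multi-million-digit number to a string.
    """
    return math.floor(p * math.log10(2)) + 1


exponents = [p for p in range(2, 32) if lucas_lehmer(p)]
print(exponents)                                # [2, 3, 5, 7, 13, 17, 19, 31]
print([(1 << p) - 1 for p in exponents[:5]])    # [3, 7, 31, 127, 8191]
print(mersenne_digits(13466917))                # 4053946
```

Each Lucas-Lehmer run for a candidate p is independent of every other, which is why the search splits naturally across many volunteer PCs.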
6
• More Internet computing projects:
• Genome @ home genomeathome.stanford.edu
• Compute-against-Cancer www.parabon.com/cac.jsp
• Fight AIDS @ home www.fightaidsathome.org
• Climate simulation www.climate-dynamics.rl.ac.uk
• More Internet computing companies:
• Parabon www.parabon.com
• United Devices www.uniteddevices.com
• See more at www.aspenleaf.com/distributed
7
The GRID
• Internet computing is just a special case of communities
sharing resources to tackle common goals
• Grid technologies: link data, computers, devices and
other resources of teams (from different institutions,
states, countries, continents) into a single virtual
laboratory
• Needed: protocols, services, software kits for flexible
and controlled resource sharing on a large scale
• The Internet has a common protocol (TCP/IP); what is the Grid protocol?
8
• The Grid Forum is working to create a formal standard; the main
tool is the Globus Toolkit:
• an open-architecture, open-source infrastructure for Grid
applications, providing services such as security, resource
management, data access and sharing
• Mostly driven by physics and CS groups (in Europe, e.g. by the
Large Hadron Collider (LHC) at CERN, with a cost of more than
2 billion euro)
• Global Grid Forum www.gridforum.org
• Globus Project www.globus.org
• Grid Physics Network project www.griphyn.org
• European Data Grid eu-datagrid.web.cern.ch
9