16_soman_FOSS4G_2k9

Download Report

Transcript 16_soman_FOSS4G_2k9

Parallelizing GIS applications for IBM Cell Broadband engine
and x86 Multicore platforms
Bharghava R, Jyothish Soman, K S Rajan ([email protected])
International Institute of Information Technology, Hyderabad, India
(This is an ongoing project (Aug 2009 onwards) at the IIIT-H, with Hardware Support from IBM)
•The Limitations in VLSI Technology has resulted in minimal gains from increasing the compute
power of a single processor cores. Moore’s Law of 2X increase in performance every 18 months
seems unachievable.
10000
•All leading processor development companies
have shifted focus to design and development
of Multicore processors, which are aimed at tackling
the 3 walls as mentioned below.
Performance (vs. VAX-11/780)
??%/year
1000
52%/year
100
Existing Parallel Platforms
•Clusters made of Off the Shelf Components (COTS).
•This platform consists of multiple CPUs, communication
over a LAN. Supercomputer are also based on Clusters.
•Overhead for installation and maintenance is huge.
•Wide availability, and easy to design
10
25%/year
1
1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006
Uniprocessor Performance (SPECint)
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach
•Computer Architecture has hit 3 walls:
•Power Wall - Power is expensive, but transistors are “free”
•Can put more transistors on a chip than have the power to turn on
•Memory Wall - Loads and stores are slow, but multiplies fast
•200 clocks to DRAM, but even FP multiplies only 4 clocks
•ILP Wall - Diminishing returns on finding more Instruction Level Parallelism
•Power Wall + Memory Wall + ILP Wall = Brick Wall
•In the single core era, software developers took the architecture for granted and relied on efficient
compilers. The same approach cannot be followed for multicore processors, as compiler
technology has not advanced as much as its hardware counterpart. Also, existing compilers for
parallelizing code make use of inherent parallelism alone, which is not sufficient to exploit the
compute power of parallel systems.
•Coprocessors
•This provides application specific acceleration.
E.g.. General Purpose Graphics Processing Units (GPGPU)
Good performance for massively parallel algorithms.
•Best Performance/dollar for specific applications, which
are limited in number.
•Suffers from memory limitation
•Multicore Architectures
•These consists of multiple instances of simple processing cores,
wither homogeneous or heterogeneous. These can also be
part of the above two system architectures. Processors which fall
into this category are: Intel/AMD multicores, Cisco Metro,
IBM Power Series, STI Cell Broadband Engine, Sun Niagara T1/T2
Cell Processor – Advantages and Feasibility
Parallelization of GIS Applications
•Developed through joint collaboration between Sony, IBM, Toshiba
•1 Cell Processor core consists of
•1 PPE core – Control processor, compatible with Power 64b
•8 computer intensive SPE cores
•SPEs also have a DMA engine for data movement
•Element Interconnect Bus (EIB) – For communication
•Token Ring architecture, 8 concurrent rings.
•Due to suboptimal compiler based parallelization, most of the parallel algorithms are manually written,
and have to be tailor-made for the host processor architecture.
•Also, programmers are used to thinking more on terms of ILP. Developing parallel algorithms in itself
is a daunting task.
•There are parallelizing frameworks available for the 3 categories mentioned above:
•MPI for Clusters
•CUDA SDK for NVIDIA GPUs, and Stream SDK for AMD GPUs
•OpenMP for x86 based multicore processors
•Commercial Availability
•Blade Server
•2 Cell cores +16 GB RAM per blade
•High Installation & Maintenance costs
•Playstation 3
•1 Cell core, 6 usable SPEs (one is used for the GAME OS,
The other is unused to increase yield)m 256 MB RAM
•Low cost system, but limited memory.
•Graphics processor not open to programming.
•Add-on PCI-e cards
•1 Cell core + 1 GB RAM
•Easy installation. High cost.
References
[1] "The landscape of parallel computing research: A view from berkeley", Electrical Engineering and Computer Sciences, University of California at Berkeley,
Technical Report No. UCB/EECS-2006-183, December.
[2] Hazel, T., Toma, L., Vahrenhold, J., and Wickremesinghe, R. 2008. Terracost: Computing least-cost-path surfaces for massive grid terrains. J. Exp.
Algorithmics 12 (Jun. 2008), 1-31.
[3] Neteler, M. and Mitasova, H. "Open source GIS: a GRASS GIS approach", Kluwer Academic Pub. http://grass.itc.it/
According to “A Berkeley View: A New Framework & a New Platform for Parallel Research” [1], there
are 14 motifs, or parallelization patterns in majority of scientific applications. GIS Applications are
inherently data parallel, and can exploit current parallel architectures.
Algorithms pertaining to Geospatial Information Systems (GIS) focus on the following categories:
•Network Analysis / Graph Traversal
•Nearest Neighbour Searches
•Dense Linear Algebra
•Sparse Linear Algebra
•Structured Grids
•As an example, parallel Teracost [2,3] algorithm is implemented in STI’s Cell Broadband Engine
(CBE), and r.cost has been implemented on stock Intel multicore processors. r.mapcalc is used to
show the raw performance improvements of embarrassingly parallel applications.
•Preliminary Results: We achieved a speedup of ~6 for MapCalc over a single processor
implementation. The absence of linear speedup is due to the communication overhead of the Element
Interconnect Bus. This can be masked further by buffering.