An Ultrahigh Performance MPI Implementation on SGI® ccNUMA Altix® Systems
Silicon Graphics, Inc.
Received: 11 September 2005
DOI: 10.12921/cmst.2006.SI.01.67-70
OAI: oai:lib.psnc.pl:605
Abstract:
The SGI® Message Passing Toolkit (MPT) software implements algorithms that provide extremely high-performance message passing on SGI Altix® systems based on the SGI NUMAlink™ interconnect technology. Using Linux® OS infrastructure and SGI XPMEM cross-host memory-mapping software, SGI MPI delivers extremely high MPI performance both on shared-memory single-host/SMP Altix systems and on multihost superclusters. This paper outlines the Altix hardware features, OS features, and library software algorithms that were developed to provide these low-latency and high-bandwidth capabilities. We present high-performance features such as direct-copy send/receive, collectives, and the ultralow-latency SHMEM™ data transfer library. We include MPI benchmark results, including an MPI ping-pong latency that ranges from 1.2 to 2.3 microseconds on a 512-CPU Altix system with 1.5 GHz Intel® Itanium® 2 Processors.
Key words:
Altix, ccNUMA, high bandwidth, low latency, memory access, message passing, MPI, MPT, shared memory
References:
[1] http://www.mpi-forum.org/docs/docs.html
[2] intro_shmem man page at http://docs.sgi.com
[3] http://www.shmem.org
[4] R. Barriuso and A. Knies, SHMEM User's Guide for Fortran, Cray Research Inc. (June 1994)
[5] M. Woodacre, D. Robb, D. Roe and K. Feind, The SGI Altix 3000 Global Shared-Memory Architecture, http://www.sgi.com/pdfs/3474.pdf
[6] Exploiting the Scalability and Power of FLUENT: The SGI Message Passing Toolkit on the SGI Altix High-Performance Computing Platform powered by the Intel Itanium 2 Processor, whitepaper by Intel, Fluent, and SGI, http://www.sgi.com/pdfs/3807.pdf
[7] shmem_ptr() and MPI_SGI_globalptr() man pages at http://docs.sgi.com