Columbia Application Performance Tuning Case Studies
NASA Advanced Supercomputing Division
Computer Sciences Corporation
NASA Ames Research Center
Moffett Field, California 94035-1000, USA
jchang@mail.arc.nasa.gov
Received: 12 June 2006
DOI: 10.12921/cmst.2006.SI.01.13-21
OAI: oai:lib.psnc.pl:597
Abstract:
This paper describes four case studies of application performance enhancements on the Columbia supercomputer. The Columbia supercomputer is a cluster of twenty SGI Altix systems, each with 512 Itanium 2 processors and 1 terabyte of global shared memory, located at the NASA Advanced Supercomputing (NAS) facility in Moffett Field. The code optimization techniques described in the case studies include both implicit and explicit process placement to pin processes on CPUs closest to the processes’ memory, removing memory contention in OpenMP applications, eliminating unaligned memory accesses, and system profiling. These techniques enabled approximately 2- to 20-fold improvements in application performance.
Key words:
Code tuning, memory contention, OpenMP scaling, process-placement, unaligned memory access
References:
[1] Y.-T. Chang and J. Chang, Getting Good Performance on OpenMP and Hybrid MPI+OpenMP Codes on SGI Altix, SGIUG 2005 Technical Conference and Tutorials, June 13-16, 2005, Munich, Germany.
[2] J. Chang, Columbia Application Performance Tuning Case Studies, SGIUG 2006 Technical Conference and Tutorials, June 5-9, 2006, Las Vegas, Nevada.
[3] SGI Altix 3000, http://www.sgi.com/products/servers/altix/3000/
[4] October 26, 2004 press release, http://www.sgi.com/company_info/newsroom/press_releases/2004/october/worlds_fastest.html, http://news.com.com/SGI+claims+lead+in+supercomputer+race/2100-1010_3-5426813.html?tag=nl
[5] November 5, 2004 press release, http://news.com.com/IBM+set+to+take+supercomputing+crown/2100-1010_3-5439523.html
[6] Top500, http://www.top500.org
[7] B.-W. Shen, R. Atlas, J.-D. Chern, O. Reale, S.-J. Lin, T. Lee and J. Chang, The 0.125 degree finite-volume general circulation model on the NASA Columbia supercomputer: Preliminary simulations of mesoscale vortices, Geophys. Res. Lett. 33, L05801, doi:10.1029/2005GL024594 (2006). http://www.agu.org/pubs/crossref/2006/2005GL024594.shtml
[8] Bron Nelson, private communication.
[9] Art Lazanoff, private communication.
[10] MPI Standard, http://www-unix.mcs.anl.gov/mpi/
[11] Scott Emery, private communication.