A Scalability Study of SGI Clustered XFS Using an HDF-Based AMR Application
Subhash Saini, Dale Talcott, Herbert Yeung, George Myers, Robert Ciotti
Terascale Systems Group
NASA Ames Research Center
Moffett Field, California 94035-1000, USA
e-mail: {ssaini, dtalcott, hyeung, gmyers, ciotti}@mail.arc.nasa.gov
Received: 12 July 2006
DOI: 10.12921/cmst.2006.SI.01.47-54
OAI: oai:lib.psnc.pl:602
Abstract:
Many large-scale parallel scientific and engineering applications, especially climate models, run for lengthy periods and must checkpoint data periodically to save the state of the computation for a program restart. In addition, such applications need to write data to disk for post-processing, e.g., visualization. Both scenarios involve a write-only pattern using Hierarchical Data Format (HDF) files. In this paper, we study the scalability of CXFS using an HDF-based structured Adaptive Mesh Refinement (AMR) application for three different block sizes. The code used is a block-structured AMR hydrodynamics code that solves the compressible, reactive hydrodynamic equations and characterizes the physics and mathematical algorithms used in studying nuclear flashes on neutron stars and white dwarfs. The computational domain is divided into blocks distributed across the processors. Typically, a block contains 8 zones in each coordinate direction (x, y, and z) and a perimeter of guard cells (in this case, 4 zones deep) to hold information from its neighbors. We used three block sizes: 8 × 8 × 8, 16 × 16 × 16, and 32 × 32 × 32. Results of parallel I/O bandwidths (one checkpoint file and two plot files) are presented for all three block sizes over a wide range of processor counts, from 1 to 508 processors of the Columbia system.
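The block-plus-guard-cell layout described in the abstract determines how much data each block actually holds in memory versus what its interior contributes. A minimal sketch below computes this for the three block sizes studied, using the 4-zone guard-cell depth from the paper; the number of solution variables per zone (NVARS) and the assumption of double-precision storage are illustrative, not taken from the paper.

```python
# Sketch: interior vs. guard-cell storage for the three AMR block sizes
# in the study (8^3, 16^3, 32^3 zones, guard cells 4 deep on each face).
NGUARD = 4   # guard-cell depth per face (from the paper)
NVARS = 24   # hypothetical number of solution variables per zone
BYTES = 8    # assumed double-precision storage

for n in (8, 16, 32):
    interior = n ** 3                     # zones the block contributes
    total = (n + 2 * NGUARD) ** 3         # zones held in memory, with guards
    overhead = total / interior
    mib = total * NVARS * BYTES / 2**20
    print(f"{n:2d}^3 block: {interior:6d} interior, {total:6d} with guards "
          f"({overhead:.1f}x), ~{mib:.2f} MiB/block")
```

The guard-cell overhead shrinks as blocks grow (8x for 8^3 blocks versus roughly 2x for 32^3 blocks), which is one reason block size matters for the per-processor I/O volume measured in the paper.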
Key words:
adaptive mesh refinement (AMR), benchmarking, clustered file system (CXFS), HDF5, parallel I/O, performance evaluation
References:
[1] Global Modeling and Assimilation Office, http://gmao.gsfc.nasa.gov/ (2006).
[2] The Weather Research and Forecasting (WRF) Model, http://www.wrf-model.org/index.php (2006).
[3] HDF5, http://hdf.ncsa.uiuc.edu/HDF5/ (2006).
[4] The National Center for Atmospheric Research (NCAR), http://www.ncar.ucar.edu/ (2006).
[5] WRF, http://www.nsf.gov/ (2006).
[6] CXFS, http://www.sgi.com/products/storage/tech/file_systems.html.
[7] S. Saini, Hot Chips and Hot Interconnects for High End Computing Systems, M4, IEEE SC 2004, Pittsburgh (2004).
[8] S. Saini, Performance Comparison of Columbia 2048 and IBM Blue Gene/L, SGIUG 2005 Technical Conference and Tutorials, June 13-16, 2005 – Munich (2005).
[9] B. Fryxell, K. Olson, P. Ricker, F. X. Timmes, M. Zingale, D. Q. Lamb, P. MacNeice, R. Rosner, J. W. Truran and H. Tufo, FLASH: An Adaptive Mesh Hydrodynamics Code for Modeling Astrophysical Thermonuclear Flashes, The Astrophysical Journal Supplement Series, 131, 273-334 (2000).
[10] P. MacNeice, K. M. Olson, C. Mobarry, R. de Fainchtein, & C. Packer, PARAMESH: A parallel adaptive mesh refinement community toolkit, Comput. Phys. Commun. 126, 330 (2000).
[11] S. A. Jarvis, D. P. Spooner, H. N. Lim, C. Keung, J. Cao, S. Saini and G. R. Nudd, Performance Prediction and its Use in Parallel and Distributed Computing Systems, Future Generation Computer Systems special issue on System Performance Analysis and Evaluation, (in press) (2006).
[12] S. Saini, R. Ciotti, T. N. Gunney, T. E. Spelce, A. Koniges, D. Dossa, P. Adamidis, R. Rabenseifner, S. R. Tiyyagura, M. Mueller and R. Fatoohi, Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks, IPDPS 2006, PMEO, April 25-29, Rhodes, Greece (2006).
[13] S. Saini, R. Fatoohi and R. Ciotti, Interconnect Performance Evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron Cluster, and Dell PowerEdge, IPDPS 2006, PMEO, April 25-29, Rhodes, Greece (2006).
[14] S. Saini, R. Ciotti, T. N. Gunney, T. E. Spelce, A. Koniges, D. Dossa, P. Adamidis, R. Rabenseifner, S. R. Tiyyagura, M. Mueller and R. Fatoohi, Performance Comparison of Cray X1 and Cray Opteron Cluster with Other Leading Platforms Using HPCC and IMB Benchmarks, CUG 2006, May 8-11, 2006 Lugano, Switzerland (2006).
[15] S. Saini, P. Adamidis, R. Fatoohi, J. Chang and R. Ciotti, Performance Analysis of Cray X1 and Cray Opteron Cluster, CUG 2006, May 8-11, 2006 Lugano, Switzerland (2006).
[16] B. B. Karki, V. Yerraguntla, H. Kikuchi, and S. Saini, A Parallel Molecular Dynamics Algorithm for Polycrystalline Minerals, The 2005 International MultiConference in Computer Science & Computer Engineering, Las Vegas, Nevada, USA, June 27-30, 2005, MSV 2005: 201-207 (2005).
[17] R. Biswas, M. J. Djomehri, R. Hood, H. Jin, C. Kiris and S. Saini, An Application-Based Performance Characterization of the Columbia Supercluster, IEEE/ACM SC 2005: 26 (2006).
[18] S. Saini, R. Biswas, S. Gavali, H. Jin, D. C. Jespersen, M. J. Djomehri and N. Madavan, NAS Experience with the Cray X1, CUG 2005, May 16-19, Albuquerque, New Mexico, USA (2006).
[19] S. Saini and D. Talcott, unpublished (2006).