MODIFICATION OF THE WEIGHTED CHECKSUM METHOD FOR DERIVING FAULT TOLERANT VERSIONS OF THE MAIN LINEAR ALGEBRA ALGORITHMS
Department of Electronics, Technical University of Koszalin
Partyzantów 17, 75-411 Koszalin, Poland
email: oleg@ie. tu. koszalin.pl
Received:
Received 25 November, 2001
DOI: 10.12921/cmst.2002.08.01.79-96
OAI: oai:lib.psnc.pl:530
Abstract:
The modified weighted checksum method is proposed, which can be used for deriving fault tolerant versions of most linear algebra algorithms. The purpose is the detection and correction of calculation errors occurred due to transient hardware faults during algorithm execution. Using the proposed method, the fault-tolerant versions of Jordan-Gauss and Faddeeva algorithms are designed. The computational complexity of new algorithms is increased approximately on O(N2) multiply-add operations in comparison with the original algorithms. However, new algorithms enable to detect and to correct a single error in an arbitrary row or column of input data matrices at the each algorithm step. Hence, it is possible to correct up to N2 and (N2/2 + N • P) single errors during realization of whole Jordan-Gauss and Faddeeva algorithms respectively. Finally, the results of experimental verification of the proposed algorithms are represented.
Key words:
algorithm-based fault tolerance, linear algebra algorithms, weighted checksum method
References:
[1] S. Y. Kung, H. J. Whitehouse, T. Kailath, VLSI and Modern Signal Processing, Prentice-Hall,
Englewood Cliffs, New Jersey (1988).
[2] G. H. Golub, C. F. V. Loan, Matrix Computations, Baltimore: John Hopkins Univ. Press (1983).
[3] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, N.J. Prentice Hall (1988).
[4] M. Cosnard, D. Trystram, Parallel Algorithms and Architectures, International Thomson Computer Press, Boston (1995).
[5] D. K. Faddeev, V. N. Faddeeva, Computational methods of linear algebra, W. H. Freeman and Company (1963).
[6] K. H. Huang, J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., C-33, 518 (1984).
[7] J. Y. Jou, J. A. Abraham, Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures, Proc. IEEE, 5(74), 732 (1986).
[8] L. Hammond, B. Nayfeh, K. Olukotun, A Single-Chip Multiprocessor, Computer, 30(9), 79 (1997)
[9] R. Wyrzykowski, J. Kanevski, N. Maslennikova, O. Maslennikov, Fault-Tolerant Matrix Decomposition and its Implementation on Processor Arrays, Engineering Simulation, 15(6), 779 (1998), Gordon&Breach Science Publishers, England.
[10] O. Maslennikov, A. Guzinski, J. Kanevski, R. Wyrzykowski, Fault tolerant QR-Decomposition
Algorithm Based on Householder Reflections and its Parallel Implementation, Proc.4-th Int.
Workshop Parallel Numerics’97, Zakopane (Poland), 177 (1997).
[11] O. Maslennikov, J. Kanevski, R. Wyrzykowski, Fault tolerant QR-decomposition algorithm and its parallel implementation, Lecture Notes in Computer Science, D. Pritchardand, J. Reeve (Eds.), Springer, 1470, 798 (1998).
[12] D. E. Schimmel, F. T. Luk, A practical real time SVD machine with multi-level faidt tolerance, SPIE Real time signal processing IX, 698, 142 (1986).
[13] V. S. S. Nair, J. A. Abraham, Real-number codes for faidt-tolerant matrix operations on processor arrays, IEEE Trans.on Comp., 39(4), 426 (1990).
[14] F. T. Luk, H. Park, An analysis of algorithm- based tolerance techniques, SPIE Vol.696, Advanced algorithms and architectures for signal processing, pp. 222-227 (1986).
[15] R. Wyrzykowski, J. Kanevski, O. Maslennikov, Systolic-type implementation of matrix computations based on the Faddeeva algorithm, Proc. IEEE Int. Conf. Massively Parallel Computing Systems, Ischia (Italy), pp. 31-42 (1994).
[16] A. Jennings, J. J. McKeown, Matrix computations, Willey&Sons, Chichester (1992).
[17] J. M. Ortega, Introduction to parallel and vector solution of linear systems, Plenum Press,
New York (1988).
[18] M. Vijay, R. Mittal, Algorithm-based fault tolerance: a review, Microprocessors and Microsystems, 21, 151 (1997).
The modified weighted checksum method is proposed, which can be used for deriving fault tolerant versions of most linear algebra algorithms. The purpose is the detection and correction of calculation errors occurred due to transient hardware faults during algorithm execution. Using the proposed method, the fault-tolerant versions of Jordan-Gauss and Faddeeva algorithms are designed. The computational complexity of new algorithms is increased approximately on O(N2) multiply-add operations in comparison with the original algorithms. However, new algorithms enable to detect and to correct a single error in an arbitrary row or column of input data matrices at the each algorithm step. Hence, it is possible to correct up to N2 and (N2/2 + N • P) single errors during realization of whole Jordan-Gauss and Faddeeva algorithms respectively. Finally, the results of experimental verification of the proposed algorithms are represented.
Key words:
algorithm-based fault tolerance, linear algebra algorithms, weighted checksum method
References:
[1] S. Y. Kung, H. J. Whitehouse, T. Kailath, VLSI and Modern Signal Processing, Prentice-Hall,
Englewood Cliffs, New Jersey (1988).
[2] G. H. Golub, C. F. V. Loan, Matrix Computations, Baltimore: John Hopkins Univ. Press (1983).
[3] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, N.J. Prentice Hall (1988).
[4] M. Cosnard, D. Trystram, Parallel Algorithms and Architectures, International Thomson Computer Press, Boston (1995).
[5] D. K. Faddeev, V. N. Faddeeva, Computational methods of linear algebra, W. H. Freeman and Company (1963).
[6] K. H. Huang, J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., C-33, 518 (1984).
[7] J. Y. Jou, J. A. Abraham, Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures, Proc. IEEE, 5(74), 732 (1986).
[8] L. Hammond, B. Nayfeh, K. Olukotun, A Single-Chip Multiprocessor, Computer, 30(9), 79 (1997)
[9] R. Wyrzykowski, J. Kanevski, N. Maslennikova, O. Maslennikov, Fault-Tolerant Matrix Decomposition and its Implementation on Processor Arrays, Engineering Simulation, 15(6), 779 (1998), Gordon&Breach Science Publishers, England.
[10] O. Maslennikov, A. Guzinski, J. Kanevski, R. Wyrzykowski, Fault tolerant QR-Decomposition
Algorithm Based on Householder Reflections and its Parallel Implementation, Proc.4-th Int.
Workshop Parallel Numerics’97, Zakopane (Poland), 177 (1997).
[11] O. Maslennikov, J. Kanevski, R. Wyrzykowski, Fault tolerant QR-decomposition algorithm and its parallel implementation, Lecture Notes in Computer Science, D. Pritchardand, J. Reeve (Eds.), Springer, 1470, 798 (1998).
[12] D. E. Schimmel, F. T. Luk, A practical real time SVD machine with multi-level faidt tolerance, SPIE Real time signal processing IX, 698, 142 (1986).
[13] V. S. S. Nair, J. A. Abraham, Real-number codes for faidt-tolerant matrix operations on processor arrays, IEEE Trans.on Comp., 39(4), 426 (1990).
[14] F. T. Luk, H. Park, An analysis of algorithm- based tolerance techniques, SPIE Vol.696, Advanced algorithms and architectures for signal processing, pp. 222-227 (1986).
[15] R. Wyrzykowski, J. Kanevski, O. Maslennikov, Systolic-type implementation of matrix computations based on the Faddeeva algorithm, Proc. IEEE Int. Conf. Massively Parallel Computing Systems, Ischia (Italy), pp. 31-42 (1994).
[16] A. Jennings, J. J. McKeown, Matrix computations, Willey&Sons, Chichester (1992).
[17] J. M. Ortega, Introduction to parallel and vector solution of linear systems, Plenum Press,
New York (1988).
[18] M. Vijay, R. Mittal, Algorithm-based fault tolerance: a review, Microprocessors and Microsystems, 21, 151 (1997).