.TH MVMULT CMU 1/13/93
.SH NAME
mvmult \- multiply sparse matrix by dense vector on the Y-MP
.SH SYNOPSIS
.nf
Setup:
call makedesc(descriptor, pntr, nrows, scratch)
Matrix-vector multiply:
call mvmult(result, matrix, vector, indx, pntr, descriptor, nrows, scratch)
.fi
.SH DESCRIPTION
.PP
.I mvmult
multiplies a sparse matrix stored in compressed row format by a dense vector.
Storage format:
.nf
| 11 21 0 0 51 |
| 21 22 32 52 0 |
A = | 0 32 33 0 0 |
| 0 0 0 44 0 |
| 51 52 0 0 55 |
The matrix is stored using three one-dimensional arrays
MATRIX, INDX, PNTR where
MATRIX = ( 11 21 51 21 22 32 52 32 33 44 51 52 55 )
INDX = ( 1 2 5 2 2 3 5 3 3 4 5 5 5 )
PNTR = ( 1 4 8 10 11 14)
.fi
The arguments are as follows:
.IP descriptor 0.8i
Holds additional information about the internal representation of
.I
matrix.
This array is created by
.I
makedesc
and used by
.I
mvmult.
For a matrix with m rows and n nonzero values, this array
must be at least 3000 + n/64 + m/64 words.
.IP nrows 0.8i
The number of rows in the matrix
.IP scratch 0.8i
a scratch array that must be at least as big as
.I matrix
.IP result 0.8i
a dense vector of length
.I nrows
which will contain the product of
.I matrix
and
.I vector
.IP vector 0.8i
a dense vector of length
.I nrows
.IP matrix 0.8i
the nonzero values of the matrix stored row-by-row (in compressed row format).
.IP indx 0.8i
The column indices of the corresponding elements of the matrix.
.IP pntr 0.8i
An array of nrows+1 indices. PNTR(I) points to the location in MATRIX
of the first element of row I. By convention, PNTR(M+1) contains the
total number of nonzero elements.
.SH METHOD
The implementation is based on segmented scans. MORE HERE.
.SH PERFORMANCE
On the Y-MP, for a matrix with m rows and n nonzero values, the time
for
.I mvmult
is approximately 2.7n + 2.5m clock periods (6 nsec). The time is
independent of the number of elements in each row. The setup time
(for
.I makedesc)
is approximately 3n + 4m clock periods, i.e., the time for one to two
calls to
.I mvmult.
On the C90,
the time
for
.I mvmult
is approximately 1.8n + 1.7m clock periods (4.2 nsec).
.SH COMPILING AND LINKING
To compile the sorting routines, type 'make mvmult.o smvmult.o'. To
link with the routines, add 'mvmult.o smvmult.o' to the end of the
link command.
.SH SEE ALSO
SPARSE(3SCI)
.SH BUGS
Earlier versions have only been tested on a Cray Y-MP. This version
will only work on the Cray Y-MP/C90. Future versions should work
on all types of Y-MP's.
If you would like to be notified of future improvements to the code,
send e-mail to marco.zagha@cs.cmu.edu.
.SH FUTURE WORK
In the future, support may be added for multiple processors. Someday,
the routine may be generalized to support sparse matrix by dense
matrix multiply.
.SH REFERENCE
The technique used is an extension of the segmented scan algorithm in:
Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha, "Scan
Primitives for Vector Computers", in Proceedings of Supercomputing '90,
666-675.
.SH DISTRIBUTION
Copyright (c) 1992 Carnegie Mellon University, Marco Zagha, and Guy Blelloch.
All Rights Reserved.
Permission to use, copy, modify and distribute this software and its
documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
We request users of this software to return to
Marco Zagha and Guy Blelloch marco.zagha@cs.cmu.edu
School of Computer Science guy.blelloch@cs.cmu.edu
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh PA 15213-3890
any improvements or extensions that they make and grant Carnegie Mellon
the rights to redistribute these changes.