[Trilinos-Users] Calculating properties of CrsMatrices

Hoemmen, Mark mhoemme at sandia.gov
Sun Feb 8 23:37:35 MST 2015


On 2/8/15, 12:00 PM, "trilinos-users-request at software.sandia.gov"
<trilinos-users-request at software.sandia.gov> wrote:
>Date: Sat, 07 Feb 2015 16:21:28 -0700
>From: "Pate Motter" <pate.motter at colorado.edu>
>To: "" <trilinos-users at software.sandia.gov>
>Subject: [Trilinos-Users] Calculating properties of CrsMatrices
>Message-ID: <4683ce99-6a9f-4588-a5c5-f5553eb39b1f at getmailbird.com>
>Content-Type: text/plain; charset="utf-8"
>
>Hi,?
>
>I am very new to Trilinos and am trying to calculate various properties
>of sparse matrices. Currently, to calculate the maximum row variance?I
>am?reading in a Matrix Market file to a CrsMatrix object. I
>then?use?getGlobalRowCopy to view each row individually and calculate the
>variance using the given entries of that row.?
>
>My two questions are:
>1) The above does not seem efficient,?

Could you express your efficiency concerns more clearly?  For example, are
you worried about copying the entries in and out?  If so, see below.

>what would be the best way to use Trilinos?to view each row's entries and
>work with them??

You may use getGlobalRowView if the matrix does not yet have a column Map,
or getLocalRowView if the matrix does have a column Map.  The matrix has a
column Map if it was given one in its constructor, or if fillComplete was
called on that matrix.  Both of these methods return a view (not a copy)
of the row.  They do not copy any entries of the matrix.

>2) For calculating the column variance, is there a preferred manner to
>access each column? I can view each row individually, but it doesn't seem
>that an equivalent exists for columns.?

That's correct.  There is no "getGlobalColumnCopy" or anything like that.
The reason is that CrsMatrix is stored and distributed (more or less) by
rows.  Furthermore, in the typical use case of a block row distribution of
rows over MPI processes, getting all of the entries in a given column
generally requires communication.

Here is a quick but inefficient way to compute the column variance:

  1. Use Tpetra::RowMatrixTransposer to compute the explicit transpose of
the matrix
  2. Compute the row variance of the result

Here is a possibly more efficient way:

  1. Each process iterates over the rows that it owns.  If it sees an
entry (i,j) in row i, it updates the column variance for column j.
  2. Use MPI communication to compute global column variances for each
column j.

In this approach, it would help to use a numerically stable one-pass
variance algorithm (see e.g., Higham, "Accuracy and Stability of Numerical
Algorithms").  Some statistics textbooks show a one-pass variance
algorithm that can compute entirely wrong answers given reasonable data.
The two-pass approach (compute mean in first pass, and variance in second
pass) is fine.

>void calcRowVariance(RCP<MAT> &A) {
>int dim = A->getGlobalNumCols();

I would not recommend just writing 'int' here.  That makes your code
incorrect if your matrices are large (have over 2 billion columns).

>Teuchos::Array<int> indices(dim);

  1. It's a bad idea to allocate an array this large.  That's as many
entries as the sparse matrix has global columns!  Use
getNumEntriesIn{Local,Global}Row instead; loop over all locally owned rows
to find the max.  You could also use a dynamic allocation strategy (resize
array if not enough entries) but that might be slower, depending on the
matrix's sparsity pattern.

  2. Instead of 'int', it would be better to use MAT::global_ordinal_type
here.  (See above note.)

>Teuchos::Array<double> values(dim);
>std::size_t numEntries = 0;
>double mean, sum, variance, maxVariance = 0.0;
>
>for (int row = 0; row < dim; row++) {
>mean = 0.0; variance = 0.0;
>A->getGlobalRowCopy(row, indices, values, numEntries);
>for (int col = 0; col < numEntries; col++) {
>mean += values[col];
>}?
>mean /= dim;
>for (int col = 0; col < numEntries; col++) {
>variance += (values[col] - mean) * (values[col] - mean);
>}
>variance /= dim;
>if (variance > maxVariance) maxVariance = variance;
>}
>std::cout << maxVariance << std::endl;

If you are building with MPI enabled, you should only print on one MPI
process.  See also above notes.

Best,
Mark Hoemmen



More information about the Trilinos-Users mailing list