[Trilinos-Users] [EXTERNAL] Re: integrate out-of-core matrix implementation with Trilinos

Hoemmen, Mark mhoemme at sandia.gov
Wed Mar 18 10:36:52 MDT 2015


On 3/18/15, 10:04 AM, "Zheng Da" <zhengda1936 at gmail.com> wrote:

>Hello Richard and Mark,
>
>Thank you very much for your replies. It's indeed great news for us.
>I think I prefer Option 1. The less work, the better :)

This will work fine.  You have three choices for vectors:

  1. Write your matrix class so that it accepts Tpetra::MultiVector or
Epetra_MultiVector input and output, and wrap your matrix class in a
subclass of Tpetra::Operator or Epetra_Operator.  Use the existing Tpetra
or Epetra specializations of Anasazi::MultiVecTraits and
Anasazi::OperatorTraits.


  2. Write your matrix class so that it accepts Tpetra::MultiVector or
Epetra_MultiVector input and output.  Specialize Anasazi::OperatorTraits,
but use the existing Tpetra or Epetra specialization of
Anasazi::MultiVecTraits.

  3. Implement your own multivector class to work with your matrix class.
Write your own specializations of Anasazi::MultiVecTraits and
Anasazi::OperatorTraits.

I would recommend #1.  It should require the least amount of code, as long
as you plan to leave vectors in memory (not on the SSD).  If you want
vectors on the SSD, you need to do #3.
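
Here's a rough sketch (not compiled) of what #1 could look like.  The
Tpetra::Operator interface (getDomainMap, getRangeMap, apply) is the real
one; "MatrixType" and its "spmv" method are hypothetical placeholders for
your SSD-backed matrix code:

// Wrap an out-of-core matrix in a Tpetra::Operator subclass, so Anasazi
// can use it through the existing Tpetra specializations of
// MultiVecTraits and OperatorTraits.
#include <Tpetra_Operator.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_Map.hpp>
#include <Teuchos_RCP.hpp>
#include <Teuchos_ScalarTraits.hpp>

template<class MatrixType, class Scalar, class LO, class GO, class Node>
class OutOfCoreOperator : public Tpetra::Operator<Scalar, LO, GO, Node> {
public:
  typedef Tpetra::MultiVector<Scalar, LO, GO, Node> MV;
  typedef Tpetra::Map<LO, GO, Node> map_type;

  OutOfCoreOperator (const Teuchos::RCP<MatrixType>& A,
                     const Teuchos::RCP<const map_type>& rangeMap,
                     const Teuchos::RCP<const map_type>& domainMap)
    : A_ (A), rangeMap_ (rangeMap), domainMap_ (domainMap) {}

  Teuchos::RCP<const map_type> getDomainMap () const { return domainMap_; }
  Teuchos::RCP<const map_type> getRangeMap () const { return rangeMap_; }

  // Compute Y := alpha*A*X + beta*Y.  X and Y stay in memory; only the
  // matrix entries are streamed from the SSD inside A_->spmv().
  void apply (const MV& X, MV& Y,
              Teuchos::ETransp mode = Teuchos::NO_TRANS,
              Scalar alpha = Teuchos::ScalarTraits<Scalar>::one (),
              Scalar beta = Teuchos::ScalarTraits<Scalar>::zero ()) const
  {
    (void) mode; // this sketch handles only the non-transposed case
    A_->spmv (alpha, X, beta, Y); // hypothetical out-of-core mat-vec
  }

private:
  Teuchos::RCP<MatrixType> A_;
  Teuchos::RCP<const map_type> rangeMap_, domainMap_;
};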

>We are targeting the sparse matrix with billions of rows and columns,
>so we probably can't keep too many vectors in memory.
>For a matrix of this size, MultiVec will be as large as terabytes if
>we want to compute tens of eigenvalues/vectors, right?
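
Roughly, yes: a dense multivector with 10^9 rows and 50 columns of
double-precision entries already takes 10^9 * 50 * 8 bytes = 400 GB, so
several billion rows with tens of vectors does put you in the terabyte
range.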

It might help to know a little bit more about your eigenproblem.  Could we
talk offline about this?

>I'm kind of hoping to integrate with more linear algebra routines, but
>I'm worried about the performance of the memory mapping approach on
>SSDs.
>Currently, my SSD matrix/vector implementation accesses data from SSDs
>explicitly through the filesystem.
>Maybe I should seriously think about the memory mapping approach and
>see what performance we can get.

You wouldn't have to rely on the performance of memory mapping for the
computational kernels.  Here's how it would work:

  1. Write a new Kokkos memory space for a memory-mapped SSD
  2. Optionally, define Kokkos::deep_copy (equivalent of memcpy) specially
for that memory space, to avoid any memory mapping performance issues
  3. Plug in custom computational kernels for the SSD

The main advantage of this approach is that it simplifies the code that
isn't performance critical, without a massive performance cost.  We can
talk more about this offline if you are interested.  We haven't explored
plugging memory-mapped SSD storage into Kokkos, but we have done
experiments using Kokkos to interface between, for example, CPU and GPU
memory in the same computational kernel (a sparse mat-vec), where the
matrix is too large to fit in GPU memory.  Kokkos performs well for that
case.
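
In case it helps, here's a rough sketch (untested) of the allocation piece
that step 1 above would wrap.  This is plain POSIX; the actual Kokkos
memory-space plumbing (the allocate/deallocate hooks and View support) is
omitted, since its details depend on the Kokkos version, and the file path
and sizes are placeholders:

// Back an allocation with a file on the SSD via mmap.  A custom Kokkos
// memory space's allocate() / deallocate() would call helpers like these.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Map a file on the SSD into the address space and return the base pointer.
void* mmapAllocate (const std::string& path, const std::size_t numBytes,
                    int& fdOut)
{
  const int fd = ::open (path.c_str (), O_RDWR | O_CREAT, 0600);
  if (fd < 0) {
    throw std::runtime_error ("open failed for " + path);
  }
  if (::ftruncate (fd, static_cast<off_t> (numBytes)) != 0) {
    ::close (fd);
    throw std::runtime_error ("ftruncate failed for " + path);
  }
  void* ptr = ::mmap (NULL, numBytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
  if (ptr == MAP_FAILED) {
    ::close (fd);
    throw std::runtime_error ("mmap failed for " + path);
  }
  fdOut = fd;
  return ptr;
}

// Unmap and close; the memory space's deallocate hook would call this.
void mmapDeallocate (void* ptr, const std::size_t numBytes, const int fd)
{
  ::munmap (ptr, numBytes);
  ::close (fd);
}

The deep_copy specialization in step 2 could then use explicit pread/pwrite
on the same file, rather than relying on page faults, if memory-mapped
access turns out to be slow.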

>Yes, our matrix for QR factorization is also dense and has many more
>rows than columns, but I think it has more than 20 columns.
>If I want QR factorization, do I have to port Tpetra?
>Does Anasazi also use QR factorization internally? Does it mean when I
>port Anasazi, I also need to provide a QR implementation?

No, you don't need to provide a QR factorization.  Anasazi implements its
own orthogonalization routines.
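
For example, a typical Anasazi setup with Tpetra objects looks roughly like
the following (paraphrased from the standard Anasazi examples, not compiled
here; the number of eigenvalues, tolerance, and Hermitian flag are
placeholders).  Note that nothing QR-related is supplied by the user:

#include <AnasaziBasicEigenproblem.hpp>
#include <AnasaziBlockKrylovSchurSolMgr.hpp>
#include <AnasaziTpetraAdapter.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_Operator.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Teuchos_RCP.hpp>

typedef double ST;
typedef Tpetra::MultiVector<ST> MV;
typedef Tpetra::Operator<ST> OP;

// A: your operator (e.g., the wrapper sketched above);
// ivec: an in-memory Tpetra::MultiVector of initial vectors.
Anasazi::ReturnType
solveEigenproblem (const Teuchos::RCP<OP>& A, const Teuchos::RCP<MV>& ivec)
{
  Teuchos::RCP<Anasazi::BasicEigenproblem<ST, MV, OP> > problem =
    Teuchos::rcp (new Anasazi::BasicEigenproblem<ST, MV, OP> (A, ivec));
  problem->setHermitian (true); // if your matrix is symmetric
  problem->setNEV (10);         // number of eigenvalues to compute
  problem->setProblem ();

  Teuchos::ParameterList params;
  params.set ("Block Size", 1);
  params.set ("Convergence Tolerance", 1.0e-8);

  // Orthogonalization happens inside the solver manager; no user QR needed.
  Anasazi::BlockKrylovSchurSolMgr<ST, MV, OP> solMgr (problem, params);
  return solMgr.solve ();
}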

>The Anasazi TOMS paper mentions that it also uses Lapack. I assume
>Lapack is only used on very small matrices?

Anasazi uses LAPACK only to solve the projected problems, not the full
problems.  The projected problems are small dense problems whose dimension
is that of the search subspace (typically tens to a few hundred), so LAPACK
never sees anything close to the full matrix size.

mfh


