[Trilinos-Users] integrate out-of-core matrix implementation with Trilinos

Zheng Da zhengda1936 at gmail.com
Wed Mar 18 10:04:43 MDT 2015


Hello Richard and Mark,

Thank you very much for your replies. It's indeed great news for us.
I think I prefer Option 1. The less work, the better :)
We are targeting sparse matrices with billions of rows and columns,
so we probably can't keep many vectors in memory.
For a matrix of this size, a MultiVec would be terabytes in size if
we want to compute tens of eigenvalues/eigenvectors, right?
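(Back of the envelope: 10^9 rows x 50 vectors x 8 bytes per entry is
already about 400 GB for a single MultiVec, so with several billion
rows we are into terabytes.)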

I'm kind of hoping to integrate with more linear algebra routines, but
I'm worried about the performance of the memory-mapping approach on
SSDs.
Currently, my SSD matrix/vector implementation accesses data from SSDs
explicitly through the filesystem.
Maybe I should seriously think about the memory-mapping approach and
see what performance we can get.
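
If we try it, I imagine the vector side would look roughly like the
following (POSIX mmap; the raw-doubles file layout and the function
name are just for illustration):

  #include <cstddef>
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  // Map a vector stored on SSD as raw doubles, so a solver can read it
  // through an ordinary pointer instead of explicit filesystem I/O.
  double* map_vector (const char* path, std::size_t num_entries) {
    int fd = open (path, O_RDONLY);
    if (fd < 0) return nullptr;
    void* p = mmap (nullptr, num_entries * sizeof (double),
                    PROT_READ, MAP_PRIVATE, fd, 0);
    close (fd);  // the mapping stays valid after closing the fd
    return (p == MAP_FAILED) ? nullptr : static_cast<double*> (p);
  }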

Yes, our matrix for QR factorization is also dense and has many more
rows than columns, but I think it has more than 20 columns.
If I want QR factorization, do I have to port to Tpetra?
Does Anasazi also use QR factorization internally? If so, does that
mean that when I port to Anasazi, I also need to provide a QR
implementation?

The Anasazi TOMS paper mentions that it also uses LAPACK. I assume
LAPACK is only used on very small matrices?

Thanks,
Da

On Wed, Mar 18, 2015 at 10:57 AM, Hoemmen, Mark <mhoemme at sandia.gov> wrote:
>
> On 3/17/15, 5:33 PM, "trilinos-users-request at software.sandia.gov"
> <trilinos-users-request at software.sandia.gov> wrote:
>>
>>Hello,
>>
>>We are a team from Johns Hopkins University. We are implementing SSD-based
>>matrix operations and are especially interested in optimizing sparse matrix
>>multiplication on a large SSD array. Our goal is to have matrix operations
>>run at a speed comparable to in-memory implementations, as we did for
>>graph analysis using SSDs (https://github.com/icoming/FlashGraph).
>
> Hi!  This is a VERY interesting project.  As Rich mentioned, you have many
> different implementation options for using Anasazi's iterative
> eigensolvers with your matrices.  Here are some possibilities:
>
>   1. Write your own matrix and vector classes.  Implement
> Anasazi::MultiVecTraits and Anasazi::OperatorTraits for these classes.
> (This does not require Epetra or Tpetra integration; see the sketch
> below.)
>
>   2. If only the matrix lives on SSD, not the vectors, then you may
> implement Tpetra::Operator or Epetra_Operator using your matrix, and use
> Tpetra's or Epetra's vectors.  (Also sketched below.)
>
>   3. If you want to explore tighter integration of SSD storage into linear
> algebra classes, let's talk.  Trilinos' Kokkos package (which Tpetra uses)
> has some new facilities that could help you do that, as long as it is
> possible to memory-map SSD (so that we can access it using pointers,
> rather than the file system or some library).  This could be especially
> helpful if you plan to distribute your SSD matrices and vectors over
> multiple MPI processes, because it would save you the trouble of needing
> to implement MPI communication and redistribution.
>
> NONE of these options require reimplementing Epetra or Tpetra.  I have
> ranked the options in increasing order of interesting-ness, and increasing
> order of software development time.  I am especially interested in Option
> 3.
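>
> To make Options 1 and 2 concrete, here is a rough, untested skeleton.
> Class names like SsdMultiVec are placeholders for whatever you call
> your SSD-backed classes, and I only spell out a couple of the required
> methods; the Anasazi headers document the full interface.
>
>   #include "AnasaziMultiVecTraits.hpp"
>   #include "AnasaziOperatorTraits.hpp"
>   #include "Teuchos_SerialDenseMatrix.hpp"
>
>   class SsdMultiVec;   // your out-of-core multivector
>   class SsdOperator;   // your out-of-core sparse matrix
>
>   namespace Anasazi {
>
>   template<>
>   class MultiVecTraits<double, SsdMultiVec> {
>   public:
>     // Create a new multivector of the same length with numVecs columns.
>     static Teuchos::RCP<SsdMultiVec>
>     Clone (const SsdMultiVec& mv, const int numVecs);
>
>     // mv := alpha*A*B + beta*mv, where B is a small dense matrix.
>     static void
>     MvTimesMatAddMv (double alpha, const SsdMultiVec& A,
>                      const Teuchos::SerialDenseMatrix<int, double>& B,
>                      double beta, SsdMultiVec& mv);
>
>     // ... plus CloneCopy, MvTransMv, MvAddMv, MvNorm, SetBlock,
>     // MvRandom, MvInit, and a few others.
>   };
>
>   template<>
>   class OperatorTraits<double, SsdMultiVec, SsdOperator> {
>   public:
>     // y := Op * x, streaming the matrix from SSD.
>     static void
>     Apply (const SsdOperator& Op, const SsdMultiVec& x, SsdMultiVec& y);
>   };
>
>   } // namespace Anasazi
>
> For Option 2, you would instead subclass Tpetra::Operator (only the
> matrix is out-of-core; the vectors are ordinary Tpetra multivectors):
>
>   #include "Tpetra_Operator.hpp"
>
>   class SsdTpetraOperator : public Tpetra::Operator<double> {
>   public:
>     Teuchos::RCP<const Tpetra::Map<> > getDomainMap () const;
>     Teuchos::RCP<const Tpetra::Map<> > getRangeMap () const;
>
>     // Y := beta*Y + alpha*Op(A)*X, reading the matrix from SSD.
>     void apply (const Tpetra::MultiVector<double>& X,
>                 Tpetra::MultiVector<double>& Y,
>                 Teuchos::ETransp mode,
>                 double alpha, double beta) const;
>   };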
>
>
>>Another matrix decomposition we'd like to have is QR decomposition. I'm not
>>sure which Trilinos package implements it.
>
> If your matrices are dense and have many more rows than columns, we
> might be able to help.  Tpetra has a "tall skinny QR" (TSQR) subpackage
> that implements a parallel QR factorization for this case.  It is
> optimized for <= 20 columns, because that is the case that occurs most
> often in block iterative eigensolvers.  We don't provide a general dense
> parallel QR decomposition.
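>
> For reference, the basic idea of TSQR: store the tall matrix A as a
> stack of row blocks A_0, ..., A_p (one block per MPI process, or per
> chunk read from SSD), factor each block locally, then reduce the small
> n-by-n R factors pairwise up a tree:
>
>   A_i = Q_i * R_i             (local QR of each row block)
>   [R_i; R_j] = Q_ij * R_ij    (combine pairs until one R remains)
>
> Only n-by-n triangles move between processes, so the communication
> volume is independent of the number of rows.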
>
>>Another question is about maintenance. Reimplementing Epetra or Tpetra
>>will be a lot of work. If we do implement it, we also hope our
>>implementation will work with Trilinos' future releases. What are your
>>suggestions for making our work more maintainable?
>
>   1. Don't reimplement Epetra or Tpetra (see above) :-)
>   2. Test often (at least weekly) against the development branch of
> Trilinos, using the public git repository
>   3. Test-driven development (write tests before you write code)
>
> This is probably a lot less work than you might think :-)
>
> mfh
>

