[Trilinos-Users] Matvec-operation for distributed memory machines/MPI and dense matrix/vector

Tue Aug 20 05:28:45 EDT 2019

Hi,

My problem is rather simple: I have a constant dense matrix
(double-precision complex) that has to be multiplied with a vector
(unrelated to Krylov methods), i.e. similar to the . When running my
program with several MPI processes, I can choose between two options: do
the multiplication on a single process, leaving the others idle in the
meantime, and distribute the result afterwards; or distribute the full
matrix once beforehand and do the multiplication on every process
independently. The latter approach will exhaust my memory for certain
matrix sizes, though. Thus, to spread the work more evenly without
running out of memory, I intend to distribute the multiplication itself
across all MPI processes (which is what led to my question).
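
To make the intended scheme concrete, here is a minimal sketch of a
row-wise (1D) distribution. It is a serial C++ stand-in, not MPI code and
not a Trilinos API: the names row_range, local_matvec and
distributed_matvec are hypothetical, and the loop over "ranks" only mimics
what p separate MPI processes would do. The assumption is that every
process keeps the full vector x but only its own block of rows, so for an
n x n matrix of complex doubles (16 bytes per entry) each process stores
roughly 16*n*n/p bytes instead of 16*n*n.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Half-open row range [first, last) owned by `rank` when n rows are split
// as evenly as possible over p ranks (the first n % p ranks get one extra).
std::pair<std::size_t, std::size_t> row_range(std::size_t n, std::size_t p,
                                              std::size_t rank) {
    const std::size_t base = n / p;
    const std::size_t extra = n % p;
    const std::size_t first = rank * base + std::min(rank, extra);
    const std::size_t count = base + (rank < extra ? 1 : 0);
    return std::make_pair(first, first + count);
}

// Local product on one rank: `local_rows` holds only this rank's rows
// (row-major, n_cols entries per row, n_cols > 0); x is the full vector.
std::vector<cplx> local_matvec(const std::vector<cplx>& local_rows,
                               std::size_t n_cols,
                               const std::vector<cplx>& x) {
    const std::size_t n_local = local_rows.size() / n_cols;
    std::vector<cplx> y(n_local, cplx(0.0, 0.0));
    for (std::size_t i = 0; i < n_local; ++i)
        for (std::size_t j = 0; j < n_cols; ++j)
            y[i] += local_rows[i * n_cols + j] * x[j];
    return y;
}

// Serial stand-in for the parallel run: each "rank" gets its row block,
// multiplies locally, and the pieces are concatenated in rank order.
std::vector<cplx> distributed_matvec(const std::vector<cplx>& A,
                                     std::size_t n_rows, std::size_t n_cols,
                                     const std::vector<cplx>& x,
                                     std::size_t p) {
    std::vector<cplx> y;
    y.reserve(n_rows);
    for (std::size_t rank = 0; rank < p; ++rank) {
        const std::pair<std::size_t, std::size_t> r = row_range(n_rows, p, rank);
        const std::vector<cplx> block(
            A.begin() + static_cast<std::ptrdiff_t>(r.first * n_cols),
            A.begin() + static_cast<std::ptrdiff_t>(r.second * n_cols));
        const std::vector<cplx> y_local = local_matvec(block, n_cols, x);
        y.insert(y.end(), y_local.begin(), y_local.end());
    }
    return y;
}
```

In an actual MPI program the outer loop disappears: each process would
call something like local_matvec on its own block, and the concatenation
step would presumably be a collective such as MPI_Allgatherv.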

I could rewrite my matrix as a triagonal matrix, if that would help.
Unfortunately, I currently cannot use threads because of other
constraints in my program. A possible extension to CUDA (especially
multi-GPU) would be nice, even though I currently use a different
approach for that.

Thanks!

Regards,

Roland

On 12.08.2019 at 18:54, Heroux, Michael A wrote:
> Roland,
>
> What are the details of your dense matrix use?  
>
> If you are looking for a function to work with a square or nearly-square dense matrix, Trilinos doesn't have a ready-to-use approach.  For MPI-only execution (no GPU or shared memory parallel execution under MPI), we do have a package called Pliris that provides dense LU factorization for square matrices with an optimal distribution.  There is surely a function in Pliris for computing a dense matvec, but I don't think that it is exposed as a separate function.  If your matrix is square or nearly square, a sqrt(p) by sqrt(p) distribution of the matrix is typically optimal, where p is the number of MPI ranks.  We do not have easy-to-use functionality for this kind of distribution.  Outside of Pliris, which is holistically focused on dense LU, we don't expose basic kernels for sqrt(p) by sqrt(p).
>
> If you are looking for support for dense matvec in the context of block Krylov methods or something similar, the Epetra package provides this with good performance (when linked with optimized BLAS).  In this case, the dense matrix is strongly rectangular with many more rows than columns, or columns than rows.  The matrix has a 1D distribution over the number of MPI ranks, with typically m/p rows per MPI rank, where m is the number of rows and p is the number of MPI ranks (or the equivalent when the number of columns >> number of rows).
>
> The class that provides this functionality is Epetra_MultiVector and the key method is called Multiply().  If the BLAS you link to has shared memory parallel thread support, e.g., MKL BLAS from Intel, you can also see multicore performance improvements under MPI.
>
> If you want explicit support for threads or GPUs, the Trilinos package Tpetra provides a MultiVector class with this capability in combination with Kokkos, but this support requires a fairly significant ramp-up for most users.
>
> I hope this is helpful.
>
> Mike
>
> On 8/7/19, 6:09 AM, "Trilinos-Users on behalf of Roland Richter" <trilinos-users-bounces at trilinos.org on behalf of roland.richter at ntnu.no> wrote:
>
>     Hi,
>     
>     I would like to use Trilinos for matrix-vector products (i.e. A*x = b),
>     where A is a distributed dense matrix and x a distributed dense vector.
>     The whole operation should run across MPI processes. I found possible
>     approaches for sparse matrices, but none so far for dense matrices. Are
>     there options for doing that in Trilinos (and maybe some examples)?
>     
>     Thanks!
>     
>     
>
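
The sqrt(p) by sqrt(p) distribution mentioned in the reply can be
illustrated in the same hedged way. The sketch below is again a serial C++
stand-in with hypothetical names (block_start, grid_matvec); it is not how
Pliris does it and not a Trilinos interface. It assumes a square n x n
matrix and a q x q process grid (p = q*q), where "rank" (r, c) owns the
block of A with row block r and column block c and needs only the matching
slice of x. Each rank produces a partial result for its row block, and the
partial results along one grid row are summed.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Start index of block b when n items are split over q blocks as evenly as
// possible; block b covers [block_start(n,q,b), block_start(n,q,b+1)).
std::size_t block_start(std::size_t n, std::size_t q, std::size_t b) {
    return b * (n / q) + std::min(b, n % q);
}

// y = A*x for a row-major n x n matrix, computed block by block as a
// q x q grid of ranks would do it.
std::vector<cplx> grid_matvec(const std::vector<cplx>& A, std::size_t n,
                              const std::vector<cplx>& x, std::size_t q) {
    std::vector<cplx> y(n, cplx(0.0, 0.0));
    for (std::size_t r = 0; r < q; ++r) {        // grid row
        for (std::size_t c = 0; c < q; ++c) {    // grid column
            const std::size_t r0 = block_start(n, q, r);
            const std::size_t r1 = block_start(n, q, r + 1);
            const std::size_t c0 = block_start(n, q, c);
            const std::size_t c1 = block_start(n, q, c + 1);
            // "Rank" (r, c) touches only A[r0:r1, c0:c1] and x[c0:c1].
            for (std::size_t i = r0; i < r1; ++i) {
                cplx partial(0.0, 0.0);
                for (std::size_t j = c0; j < c1; ++j)
                    partial += A[i * n + j] * x[j];
                // Stands in for the sum-reduction across grid row r.
                y[i] += partial;
            }
        }
    }
    return y;
}
```

In a real MPI code the accumulation into y would presumably be a
reduction (e.g. MPI_Reduce or MPI_Allreduce) over a communicator that
contains the ranks of one grid row. Compared with the row-wise layout,
each rank then needs only about n/q entries of x rather than all n.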