[Trilinos-Users] [EXTERNAL] Re: direct inversion of a projected matrix (Amesos and Epetra_CrsMatrix?)

Denis Davydov davydden at gmail.com
Fri Mar 24 02:59:33 EDT 2017


Dear Mike,

That makes perfect sense, thank you for the detailed answer.
I will do some benchmarking on representative problem sizes and see how things compare before making a design decision.

Sincerely,
Denis.

> On 23 Mar 2017, at 09:18, Heroux, Michael A <maherou at sandia.gov> wrote:
> 
> Denis,
> 
> If your projected matrix problem will only have a few thousand equations and 100 nonzeros per row, it is probably best to redundantly solve the problem on each MPI rank.  I doubt that a distributed-memory solver will be much faster, and it will complicate your coding quite a bit.
> 
> Also, even though you will apply the inverse multiple times, it is still best to compute the factorization once and call the triangular solves multiple times.
> 
> It is also not clear that sparse will be better than dense.  With 10% nonzeros in the original matrix (call it A), the factors will likely be quite dense.
> 
> For a 1000x1000 matrix with 10% density, storage for A is 8MB dense and 1.2MB sparse (0.8MB for matrix values, 0.4MB for 32-bit column indices).  The dense factorization is done in place, so there is no extra storage.  The sparse factorization is done in a copy, so with any reasonable increase in fill you will soon be using 50% or more of the total storage of the dense approach.  The sparse solver will (probably) do fewer operations, but it will not perform as well, assuming you have an optimized BLAS and a reasonable version of LAPACK.
> 
> A dense factorization of a 1000x1000 matrix takes fewer than 1B operations.  A good version of DGETRF/DGETRS will perform the factorization and all solves in less than one second of wall time.
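> 
> For concreteness, the factor-once/solve-many pattern looks roughly like this with the LAPACKE interface (a minimal sketch, assuming column-major storage; the wrapper function name is just illustrative):
> 
>     #include <lapacke.h>
>     #include <vector>
>     
>     // Factor A = P*L*U once, then reuse the factors for every right-hand
>     // side instead of ever forming the explicit inverse.
>     void factor_once_solve_many(std::vector<double> &A, int n,
>                                 std::vector<double> &rhs, int nrhs)
>     {
>       std::vector<lapack_int> ipiv(n);
>       // In-place LU factorization: ~(2/3)*n^3 flops, no extra storage.
>       lapack_int info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n,
>                                        A.data(), n, ipiv.data());
>       if (info != 0) return;  // singular matrix or illegal argument
>       // Triangular solves: ~2*n^2 flops per right-hand side; this call
>       // can be repeated as often as the algorithm needs.
>       LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', n, nrhs,
>                      A.data(), n, ipiv.data(), rhs.data(), n);
>     }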
> 
> Solving the dense system redundantly on each rank will take the same wall-clock time as solving it once.  If your matrix size is a few thousand, the dense solve may take 10 seconds, but a highly optimized LAPACK, like the MKL version for example, will reduce that time.  If you have extra cores available for threaded execution, the time will drop substantially.
> 
> Perhaps I have missed an important detail, but hopefully this helps.
> 
> Mike
> 
> From: Denis Davydov <davydden at gmail.com>
> Sent: Wednesday, March 22, 2017 5:32 AM
> To: Rajamanickam, Sivasankaran
> Cc: Heroux, Michael A; trilinos-users
> Subject: Re: [EXTERNAL] Re: [Trilinos-Users] direct inversion of a projected matrix (Amesos and Epetra_CrsMatrix?)
>  
> Dear Mike and Siva,
> 
> Thanks for the prompt reply.
> 
> The problem sizes I have in mind range from a few hundred (a test case with full matrices, without accounting for sparsity) to
> a few thousand (the planned production runs with sparse matrices, roughly up to 100 nonzero elements per row).
> 
> I understand that the performance will not be ideal, yet I assume it should be comparable to what I would get
> by using ScaLAPACK directly with full matrices to do the LU factorization with MPI, right?
> And most importantly, it should be better than what I currently have:
> a completely sequential matrix, with its inverse computed serially on each MPI core using LAPACK.
> 
> P.S. For this application I am considering direct solvers because (i) ScaLAPACK has been used by other researchers in the same context,
> and (ii) the matrix size is still small compared to the index space of the FE problem, and I need to apply the inverse of a single matrix several times
> in different parts of the algorithm.
> 
> Regards,
> Denis.
> 
>> On 22 Mar 2017, at 01:11, Rajamanickam, Sivasankaran <srajama at sandia.gov> wrote:
>> 
>> Denis,
>>   As Mike said, the performance may not be what you want. Depending on the problem size, the number of MPI ranks, and the number of factorizations vs. the number of solves, you might be better off creating a distributed sparse matrix and still using a simple sequential sparse solver (assuming memory is not a bottleneck).  The asymptotic number of operations will come into play very quickly in this case.
>> 
>> -Siva
>>  
>> From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of Heroux, Michael A <maherou at sandia.gov>
>> Sent: Tuesday, March 21, 2017 6:01 PM
>> To: Denis Davydov; trilinos-users
>> Subject: [EXTERNAL] Re: [Trilinos-Users] direct inversion of a projected matrix (Amesos and Epetra_CrsMatrix?)
>>  
>> *Performance will be quite poor, I think.
>> From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of Heroux, Michael A <maherou at sandia.gov>
>> Sent: Tuesday, March 21, 2017 6:43 PM
>> To: Denis Davydov; trilinos-users
>> Subject: [EXTERNAL] Re: [Trilinos-Users] direct inversion of a projected matrix (Amesos and Epetra_CrsMatrix?)
>>  
>> Denis,
>> 
>> You can store a full matrix as an Epetra_CrsMatrix and use Amesos.  Performance will be quite poor, I think.
>> 
>> Can you give more details about the dimensions of the problems?
>> 
>> Also, are you using an optimized BLAS for your LAPACK computations?
>> 
>> Mike
>> 
>> From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of Denis Davydov <davydden at gmail.com>
>> Sent: Tuesday, March 21, 2017 9:38 AM
>> To: trilinos-users
>> Subject: [EXTERNAL] [Trilinos-Users] direct inversion of a projected matrix (Amesos and Epetra_CrsMatrix?)
>>  
>> Dear all,
>> 
>> I would like to have an MPI-parallel inversion of an (eventually sparse) matrix of size N in the several hundreds.
>> I put “sparse” in parentheses because for starters I need to consider it full. The matrix represents the
>> projection of an FE operator onto a vector subspace. I guess something similar is probably done in
>> Anasazi in the context of eigensolvers.
>> 
>> So the main question is whether I can use Epetra_CrsMatrix but, for now, set the sparsity pattern to be full.
>> If that is possible, I suppose I should be able to use the Amesos direct solvers to get an LU factorization
>> of the matrix (say, using ScaLAPACK) and then do whatever I want with it.
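>> 
>> For concreteness, here is a rough (untested) sketch of what I have in mind; "Amesos_Klu" below is only a placeholder for whichever direct solver is enabled in the installation (e.g. the ScaLAPACK interface):
>> 
>>     #include <Epetra_MpiComm.h>
>>     #include <Epetra_Map.h>
>>     #include <Epetra_CrsMatrix.h>
>>     #include <Epetra_Vector.h>
>>     #include <Epetra_LinearProblem.h>
>>     #include <Amesos.h>
>>     
>>     // Inside main(), after MPI_Init:
>>     Epetra_MpiComm comm(MPI_COMM_WORLD);
>>     const int N = 2000;                // projected matrix size (a few thousand)
>>     Epetra_Map map(N, 0, comm);
>>     Epetra_CrsMatrix A(Copy, map, N);  // reserve N entries per row: "full" pattern
>>     // ... fill via A.InsertGlobalValues(row, N, values, indices), then:
>>     A.FillComplete();
>>     
>>     Epetra_Vector x(map), b(map);
>>     Epetra_LinearProblem problem(&A, &x, &b);
>>     Amesos factory;
>>     Amesos_BaseSolver *solver = factory.Create("Amesos_Klu", problem);
>>     solver->SymbolicFactorization();
>>     solver->NumericFactorization();    // factor once ...
>>     solver->Solve();                   // ... then solve as many times as needed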
>> 
>> If there are other classes in Trilinos to do the job better, please let me know.
>> 
>> P.S. Currently I do the inversion in serial using LAPACK, but this takes too much time.
>> 
>> Sincerely,
>> Denis.
>> 
>> 
> 
> 
