[Trilinos-Users] SuperLU_dist, KLU and reordering

Phipps, Eric T etphipp at sandia.gov
Fri Nov 28 10:54:54 MST 2008

Hi Jonas,

I have seen good performance out of MUMPS for CG and DG discretizations of Navier-Stokes compared to SuperLU and KLU.  However, compiling MUMPS is a serious pain in the neck: it requires BLACS and ScaLAPACK, and depending on which compiler/MPI library you are using, getting BLACS to work can be a nightmare.


On 11/27/08 12:43 PM, "Boman, Erik G" <egboman at sandia.gov> wrote:

Hi Jonas,

There are several algorithmic differences between KLU and SuperLU/SuperLU_dist, so it's hard to know what you're seeing.  One feature that sometimes makes KLU 10X faster is that it will reorder the matrix into Block Triangular Form (BTF) if possible; SuperLU doesn't do this. BTF is particularly helpful if your matrix is reducible (and thus decouples).

I believe SuperLU_dist will always redistribute the matrix initially, since it uses its own 2D data distribution.  Usually this is not a bottleneck, and the savings in the factorization phase are worth it.

I know some people like MUMPS, but I have no experience with it myself. It could be worth a try.


From: trilinos-users-bounces at software.sandia.gov [trilinos-users-bounces at software.sandia.gov] On Behalf Of Jonas Thies [J.Thies at rug.nl]
Sent: Thursday, November 27, 2008 5:54 AM
Cc: Trilinos Users
Subject: [Trilinos-Users] SuperLU_dist, KLU and reordering


Can someone give me a hint on SuperLU_dist in Amesos?

One of my systems is currently solved sequentially using KLU,
which is very efficient for some reason. I'd like to solve it in parallel
now, but SuperLU seems a lot slower, even sequentially.
So I was wondering whether KLU does some reordering by default that SuperLU
doesn't. I'm using the default settings for both solvers; which flags
are worth trying? Or is there a fundamental algorithmic difference that
could make KLU about ten times faster than SuperLU for a particular
sparse nonsymmetric real matrix?
Finally, I noticed that SuperLU_dist always seems to redistribute the
matrix even if I pass in a distributed matrix; is that intended, or a
wrong parameter setting?
Is it a good idea to use SuperLU_dist, or are other parallel sparse
solvers in Amesos a better choice?


Heroux, Michael A wrote:
> Daniele,
> If I understand what you are doing, then one way to eliminate the extra
> copy of x values is to use View mode with the Epetra_Vector objects.
>  You will still have overlappedXExtMap and xExtMap, and the
> corresponding vectors overlappedXExt and xExt.  Furthermore, you would
> create overlappedXExt in the standard way.  However, when you create
> xExt, you should pass in the pointer to the first value in
> overlappedXExt using View mode.
> If you do this, Epetra's import function is smart enough to detect that
> all but the last value are the same address and it will not do a
> redundant copy.
> A further optimization would be to avoid the import entirely, since you are
> really doing a broadcast.  You could use the Broadcast() method on
> Epetra_Comm to send the value to all processors, passing a pointer to the
> last entry of the overlappedXExt vector as the target buffer for the broadcast.
> I hope this makes sense.
> Mike
> On 11/24/08 6:53 AM, "Daniele Avitabile" <d.avitabile at gmail.com> wrote:
>     Hi everybody,
>     the solution to the PDEs I am solving is stored in an Epetra_Vector
>     x, based on a certain linear Epetra_Map xMap. Now, I need to add an
>     extra ODE to my system, defined on the boundary. At the present
>     stage, whenever I want to compute the right-hand side of my PDE, I
>     need to access a new "bordered" Epetra_Vector xExt, based on a new
>     map Epetra_Map xExtMap.
>     The problem is that, from each processor, I need access to the last
>     component of xExt, the one on the boundary.
>     The way I am doing this is
>     1. define an Epetra_Map overlappedXExtMap: all the GIDs of xExtMap
>     are also contained in overlappedXExtMap, and the last component of
>     xExt is replicated on all processors.
>     2. define an importer with xExtMap as source and overlappedXExtMap
>     as target
>     3. each time I evaluate the right-hand side of my equation, use the
>     importer.
>     I rely on the importer massively, because I also use it for printing
>     purposes: I want to separate the solution in the interior of my PDE
>     from the additional scalar component on the boundary.
>     My questions:
>     a. do you have a more efficient way to do this? It seems to me that
>     I am copying a lot of unnecessary data, namely all the components
>     corresponding to the GIDs of xExtMap that are also in overlappedXExtMap.
>     b. how does the importer work? Does it really copy all the elements?
>     Thanks in advance.
>     Daniele
> ------------------------------------------------------------------------
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users

