[Trilinos-Users] Speedup for solving linear systems

Chris Jackson chris.jackson at mayahtt.com
Thu Apr 3 19:13:50 MDT 2014

Hi Mike,

Thanks for the response.  My post was related to Lucie's post - we are working together.

Lucie figured out that the Trilinos configuration being used for the tests had OMP_NUM_THREADS set to 8.  When we launched multiple processes on a quad-core machine, each process could start up to 8 OpenMP threads, which oversubscribed the cores and bogged things down.  It was an oversight.  Setting OMP_NUM_THREADS to 1 gives us the kind of scalability we were expecting.
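For anyone hitting the same issue, here is a rough sketch of the arithmetic behind the slowdown.  The function name is made up for illustration, and the count of 4 processes is an assumption (we only know it was "multiple" processes on a quad-core); the point is just that runnable threads = processes x OMP_NUM_THREADS, and that this should not exceed the number of cores.

```python
def oversubscription_factor(num_processes, threads_per_process, physical_cores):
    """Return runnable threads per core; a value above 1.0 means the cores are oversubscribed."""
    return (num_processes * threads_per_process) / physical_cores


# Assumed scenario: 4 MPI processes on a quad-core machine.
print(oversubscription_factor(4, 8, 4))  # OMP_NUM_THREADS=8 -> 8.0 threads per core
print(oversubscription_factor(4, 1, 4))  # OMP_NUM_THREADS=1 -> 1.0 thread per core
```

With OMP_NUM_THREADS=1 each process runs a single thread, so the process count alone decides how the cores are loaded.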

Thanks again,


From: Heroux, Michael A [mailto:maherou at sandia.gov]
Sent: Monday, March 31, 2014 4:28 PM
To: Chris Jackson; trilinos-users at software.sandia.gov
Subject: Re: [Trilinos-Users] Speedup for solving linear systems


Here is one older paper that is in the range of processors you requested (up to 4096):


There is a journal version of this paper:

Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling
P.T. Lin and J. N. Shadid
Journal of Computational Physics
Volume 228, Issue 17, 20 September 2009, Pages 6250-6267

In general, the AztecOO-GMRES+ML combination is very scalable and robust.

I don't have specific results for BiCGStab.  We are often dealing with very challenging linear systems for which GMRES is the only reliable method.  The AztecOO implementation of BiCGStab is very scalable, so you should see no qualitative difference between GMRES and BiCGStab.


From: "-Ing. Chris Jackson" <chris.jackson at mayahtt.com>
Date: Thursday, March 27, 2014 4:56 PM
To: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
Subject: [EXTERNAL] [Trilinos-Users] Speedup for solving linear systems


At the risk of asking a very open-ended question:

Are there any publications showing parallel speedup for solving sparse, non-symmetric linear systems using AztecOO, with BiCGStab or GMRES and ML as a preconditioner?  We are interested in problem sizes in the range of 500,000-20,000,000 unknowns on, say, 8-128 processes.  Are there any guidelines as to what kind of parallel efficiencies can be obtained?  We have noted some very poor efficiencies for a large range of problems.  (We suspect we are doing something wrong, and would like an example of how to use the package correctly.)
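To be clear about what is meant by speedup and parallel efficiency, here is a small sketch using the usual definitions: speedup is the one-process time divided by the p-process time, and efficiency is speedup divided by p.  The function names and the timings below are made up purely for illustration; they are not measurements from our runs.

```python
def speedup(t_serial, t_parallel):
    """Speedup: solve time on one process divided by solve time on p processes."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_processes):
    """Parallel efficiency: speedup divided by the number of processes (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / num_processes


# Hypothetical timings in seconds, for illustration only.
t1, t8 = 120.0, 20.0
print(speedup(t1, t8))        # 6.0
print(efficiency(t1, t8, 8))  # 0.75
```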


Chris Jackson