[Trilinos-Users] Sparse direct solver and language options for GPU

Thu Sep 18 12:50:03 MDT 2014

Tom,

>From the example you gave, the matrix looks like a “saddle point system”,

which is a natural matrix form in many problems.  The matrix will not

be semi-definite in this case but it will be indefinite.  Hence you saw the

warning about negative eigenvalue.  It will be useful to take advantage

of the structure of saddle point matrices (whether Trilinos is used or cholmod).

I’m no expert here but you might benefit from elimination of extra variables since

it looks like the constraints are very simple (v_b - v_a = V1 etc).  After elimination,

you could use a solver/preconditioner pair that works well with positive

definite matrices.  The other approach is to “solve” the m x m portion rather than

going for the full (m+n) x (m+n) matrix.  The right choice depends on the relative

sizes of m and n and the simplicity of the “constraint” block vs the simplicity

of the top-left m x m block.

Consult the following popular paper if you want to solve saddle point problems

by either of these approaches.

http://web.stanford.edu/class/msande312/restricted/benzi05saddle.pdf

Numerical solution of saddle point problems

Chetan

From: trilinos-users-bounces at software.sandia.gov [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Tom Anderson
Sent: Thursday, September 18, 2014 1:18 AM
To: trilinos-users at software.sandia.gov
Subject: Re: [Trilinos-Users] Sparse direct solver and language options for GPU

> I am a bit confused by the combination of well-conditioned and semi-definite in your description

I probably didn't use the correct matrix terminology.

My matrix is a Modified Nodal Analysis circuit solver matrix.

The only elements I have are resistors, constant current sources, and constant voltage sources,

so it really looks just like example 3 in http://www.swarthmore.edu/NatSci/echeeve1/Ref/mna/MNA2.html

except larger.

I used Anasazi.BasicEigenproblem to find the largest and smallest eigenvalues.

The condition number is the ratio of the largest to the smallest eigenvalue.

There were messages from AztecOO complaining about negative eigenvalues.

"Warning : The smallest eigenvalue of the Lanczos matrix

                is negative or zero (-6.975319e-01)"

My solution looks right, has energy balance, and agrees with Xyce for the circuit solution, so I think the input matrix is okay.

I didn't find great results with the iterative solvers. 

Using PyTrilinos and AztecOO, the best result I found was with AZ_cg_condnum:

    solver = AztecOO.AztecOO(A, x, b)

    solver.SetAztecOption(AztecOO.AZ_solver, AztecOO.AZ_cg_condnum)

    ierr = solver.Iterate(iterations, 1e-8)

This iterative solution converges slowly, and more slowly with large problems.

I can send a comparison of the direct and iterative solutions, if that would be interesting.

This is the direct approach in PyTrilinos:

    problem= Epetra.LinearProblem(A, x, b)

    solver= Amesos.Klu(problem)

    solver.SymbolicFactorization()

    solver.NumericFactorization()

    ierr = solver.Solve()

> There are very few direct sparse solvers available on a GPU

Since I was mistaken in my original question and Amesos2 doesn't use the GPU for direct sparse solutions,

I think I should try something other than a GPU.

Perhaps I can cram the problem into one big machine, or try MPI on smaller machines.

It looks like there is an Amazon EC2 instance that might be sufficient to test the idea:

r3.8xlarge 244GB

Do the performance problems with EC2 noted here:

http://www.csm.ornl.gov/workshops/SOS17/documents/HerouxDataOwnership.pdf

apply to a single large instance running problems such as an Amesos sparse direct solver?

> you could work in either real or complex arithmetic

I haven't needed a complex matrix for this project yet, although it may become interesting in the future.

Thanks for the advice, it is all helpful.

-Tom Anderson

On Wed, Sep 17, 2014 at 10:17 AM, Heroux, Mike <MHeroux at csbsju.edu> wrote:

Tom,

With such a well-conditioned matrix, it might be the case that an iterative method would be more effective, especially if your target device is a GPU.  There are very few direct sparse solvers available on a GPU right now (AFAIK).  Even those that are available on the GPU tend to do problem setup on the host and only push out portions of the computation to the GPU.

I am a bit confused by the combination of well-conditioned and semi-definite in your description, since the latter implies that some eigenvalues could be zero making the matrix singular.  Do you pre-process the matrix to filter out some equations?  Assuming you have a way to keep the condition number small, a diagonally scaled CG may be all you need.

Also, you could work in either real or complex arithmetic.  The equivalent real formulation of a Hermitian matrix is real positive definite with the same set of unique eigenvalue (multiplicity increases by a factor of two), so CG will converge (assuming equivalent preconditioning) in the same number of iterations regardless of using the complex or real version of CG.

I hope this helps.

Mike

From: Tom Anderson <tomacorp at gmail.com<mailto:tomacorp at gmail.com>>
Date: Wednesday, September 17, 2014 at 11:49 AM
To: "trilinos-users at software.sandia.gov<mailto:trilinos-users at software.sandia.gov>" <trilinos-users at software.sandia.gov<mailto:trilinos-users at software.sandia.gov>>
Subject: [Trilinos-Users] Sparse direct solver and language options for GPU

Is C++ and Amesos2 the only GPU-capable sparse direct solver option using Trilinos?
Have I missed any good alternatives?

Here is a longer version of the same question with a description of the application:

I am working on a thermal simulation problem, using a fine square mesh instead of a more complex course mesh.

I think about the thermal problem as the solution to an equivalent electronic circuit.

I have constructed a matrix using Modified Nodal Analysis as described in
http://www.swarthmore.edu/NatSci/echeeve1/Ref/mna/MNA2.html .
My PyTrilinos matrix code agrees with the Xyce circuit simulator.
I could just use Xyce instead of writing my own matrix code,
but Xyce gets slow on large problems.
My equivalent circuit only needs voltage sources, current sources, and resistors.
The solution I want is the DC operating point of the circuit.

The matrix is real, Hermitian, positive semi-definite, if I understand matrix lingo correctly.
The condition number is about 430 and I expect it to always be less than about 7000.

I am aiming for 200e6 unknowns (circuit nodes) and 1.4e9 matrix entries (1.2e9 resistors).
As of now, I can solve 1e6 nodes using Amesos/PyTrilinos on an old Mac mini, single threaded.
Solve time is reasonable and linear in the number of unknowns, and memory usage is about 1kB per unknown.
My existing code uses PyTrilinos and is 2D.  I plan to move to 2.5D (3D planar) soon.
I expect to deploy the application on Linux.

Next I would like to use GPU processing. I didn't find a way to use the GPU with PyTrilinos.
I have been avoiding Fortran because of the install process on OSX.
It seems that C++ and Amesos2 is the only GPU-capable sparse direct solver using Trilinos.
Have I missed any good alternatives?

Any advice about my project is welcome, including suggestions for different approaches, platforms, or software.

Thanks,

Tom Anderson

tom_anderson at keysight.com<mailto:tom_anderson at keysight.com>

Keysight Technologies
Santa Rosa, California

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://software.sandia.gov/pipermail/trilinos-users/attachments/20140918/341ab130/attachment.html>