[Trilinos-Users] [EXTERNAL] Re: Sparse direct solver and language options for GPU

Siva Rajamanickam srajama at sandia.gov
Thu Sep 18 09:01:04 MDT 2014


Tom,
   The solver capabilities in Amesos2 *can* include GPU support in the future. 
For example, CHOLMOD supports GPUs already and there is an interface for CHOLMOD 
in Amesos2. For a problem like yours, direct solvers have been shown to be effective.

  As Mike suggested, KLU will be good, but you really need machines with lots 
of memory for that. ShyLU has special options for circuit problems and can be 
called from Xyce. I am not sure which configuration of Xyce you tried these 
problems on, but Xyce is an MPI-parallel code with options to use ShyLU for very 
large problems. I would recommend giving that a shot.
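
To put a rough number on "lots of memory", the sketch below extrapolates the figures quoted further down in this thread: about 1 kB per unknown (measured at 1e6 unknowns in 2D), a target of 200e6 unknowns, and the 244 GB r3.8xlarge instance. Linear scaling is an assumption; fill-in in a sparse factorization can grow faster than linearly, especially after a move to 2.5D, so the result is best read as optimistic.

```python
# Back-of-the-envelope memory estimate. All inputs are figures quoted in this thread.
BYTES_PER_UNKNOWN = 1_000        # "about 1kB per unknown", measured at 1e6 unknowns in 2D
TARGET_UNKNOWNS = 200_000_000    # "200e6 unknowns"
NODE_MEMORY_GB = 244             # r3.8xlarge, as listed in the thread


def estimated_memory_gb(unknowns, bytes_per_unknown=BYTES_PER_UNKNOWN):
    """Linear extrapolation (an assumption): memory grows in proportion to unknowns."""
    return unknowns * bytes_per_unknown / 1e9


needed_gb = estimated_memory_gb(TARGET_UNKNOWNS)
headroom_gb = NODE_MEMORY_GB - needed_gb
print(f"estimated: {needed_gb:.0f} GB, headroom on one node: {headroom_gb:.0f} GB")
```

Under that assumption the estimate is 200 GB, which leaves roughly 44 GB of headroom on such a node.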

Thanks
Siva

On 09/18/2014 08:28 AM, Heroux, Mike wrote:
> Tom,
>
> You might try using a preconditioner with AztecOO.  For example, Ifpack with incomplete Cholesky would give you some idea about how responsive the problem is to preconditioning.  But if you avoid the GPU, direct solvers are probably the best way to go anyway.  KLU is very robust and designed for problems like yours.
>
> My results on EC2 are now fairly old, and we should go back and run the scalability tests again (but honestly I don’t have time right now).  EC2 is certainly competitive on single process performance and, since KLU is serial-only, you may be just fine on a big memory node.
>
> Mike
>
> From: Tom Anderson <tomacorp at gmail.com>
> Date: Thursday, September 18, 2014 at 3:17 AM
> To: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
> Subject: Re: [Trilinos-Users] Sparse direct solver and language options for GPU
>
>> I am a bit confused by the combination of well-conditioned and semi-definite in your description
> I probably didn't use the correct matrix terminology.
> My matrix is a Modified Nodal Analysis circuit solver matrix.
> The only elements I have are resistors, constant current sources, and constant voltage sources,
> so it really looks just like example 3 in http://www.swarthmore.edu/NatSci/echeeve1/Ref/mna/MNA2.html
> except larger.
>
> I used Anasazi.BasicEigenproblem to find the largest and smallest eigenvalues.
> The condition number is the ratio of the largest to the smallest eigenvalue.
> There were messages from AztecOO complaining about negative eigenvalues:
> "Warning : The smallest eigenvalue of the Lanczos matrix is negative or zero (-6.975319e-01)"
>
> My solution looks right, has energy balance, and agrees with Xyce for the circuit solution, so I think the input matrix is okay.
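
One possible reading of that warning: an MNA matrix that contains voltage-source rows has a zero diagonal block, which makes it symmetric indefinite rather than positive semi-definite, and CG assumes positive definiteness. The sketch below uses a hypothetical three-unknown circuit (a voltage source V at node 1, R1 between nodes 1 and 2, R2 from node 2 to ground; all values are made up) to show the quadratic form taking both signs, while plain Gaussian elimination still recovers the expected voltage-divider solution. It is a pure-Python illustration, not the matrix from this thread.

```python
def mna_matrix(r1, r2):
    """MNA matrix for unknowns [v1, v2, i_src] of the hypothetical divider circuit."""
    g1, g2 = 1.0 / r1, 1.0 / r2
    return [
        [g1, -g1, 1.0],       # KCL at node 1, including the source current
        [-g1, g1 + g2, 0.0],  # KCL at node 2
        [1.0, 0.0, 0.0],      # voltage-source constraint: v1 = V
    ]


def quadratic_form(a, x):
    """Return x^T A x."""
    n = len(x)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


R1, R2, V = 1000.0, 2000.0, 3.0   # made-up component values
A = mna_matrix(R1, R2)
G1 = 1.0 / R1

q_pos = quadratic_form(A, [0.0, 1.0, 0.0])   # equals g1 + g2, positive
q_neg = quadratic_form(A, [1.0, 0.0, -G1])   # equals -g1, negative

v1, v2, i_src = solve(A, [0.0, 0.0, V])
print(q_pos, q_neg, v1, v2, i_src)
```

Because the quadratic form is positive for one vector and negative for another, the matrix has eigenvalues of both signs, yet the direct solve is unaffected. That would be consistent with a correct solution from KLU alongside eigenvalue warnings from a CG-based solver.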
>
> I didn't find great results with the iterative solvers.
> Using PyTrilinos and AztecOO, the best result I found was with AZ_cg_condnum:
>      solver = AztecOO.AztecOO(A, x, b)
>      solver.SetAztecOption(AztecOO.AZ_solver, AztecOO.AZ_cg_condnum)
>      ierr = solver.Iterate(iterations, 1e-8)
> This iterative solution converges slowly, and more slowly with large problems.
> I can send a comparison of the direct and iterative solutions, if that would be interesting.
>
> This is the direct approach in PyTrilinos:
>      problem = Epetra.LinearProblem(A, x, b)
>      solver = Amesos.Klu(problem)
>      solver.SymbolicFactorization()
>      solver.NumericFactorization()
>      ierr = solver.Solve()
>
>> There are very few direct sparse solvers available on a GPU
> Since I was mistaken in my original question and Amesos2 doesn't use the GPU for direct sparse solutions,
> I think I should try something other than a GPU.
>
> Perhaps I can cram the problem into one big machine, or try MPI on smaller machines.
> It looks like there is an Amazon EC2 instance that might be sufficient to test the idea:
> r3.8xlarge 244GB
> Do the performance problems with EC2 noted here:
> http://www.csm.ornl.gov/workshops/SOS17/documents/HerouxDataOwnership.pdf
> apply to a single large instance running problems such as an Amesos sparse direct solver?
>
>> you could work in either real or complex arithmetic
> I haven't needed a complex matrix for this project yet, although it may become interesting in the future.
>
> Thanks for the advice, it is all helpful.
>
> -Tom Anderson
>
>
> On Wed, Sep 17, 2014 at 10:17 AM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
> Tom,
>
> With such a well-conditioned matrix, it might be the case that an iterative method would be more effective, especially if your target device is a GPU.  There are very few direct sparse solvers available on a GPU right now (AFAIK).  Even those that are available on the GPU tend to do problem setup on the host and only push out portions of the computation to the GPU.
>
> I am a bit confused by the combination of well-conditioned and semi-definite in your description, since the latter implies that some eigenvalues could be zero making the matrix singular.  Do you pre-process the matrix to filter out some equations?  Assuming you have a way to keep the condition number small, a diagonally scaled CG may be all you need.
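
As a minimal sketch of what "diagonally scaled CG" means, here is conjugate gradients with a Jacobi (diagonal) preconditioner in pure Python. The test matrix is a hypothetical chain of five nodes joined by resistors and tied to ground at both ends, with made-up conductances; it is symmetric positive definite, which CG requires. For real problem sizes a library solver would replace this loop.

```python
def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(a, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with M = diag(A). Returns (x, iterations used)."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0                      # zero right-hand side: x = 0 is exact
    r = b[:]                             # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def chain_matrix(g):
    """Conductance matrix of a resistor chain; g[0] and g[-1] tie the ends to ground."""
    n = len(g) - 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = g[i] + g[i + 1]
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -g[i + 1]
    return a


A = chain_matrix([1.0, 10.0, 0.1, 5.0, 2.0, 1.0])   # made-up conductances
b = [1.0, 0.0, 0.0, 0.0, 0.0]                        # 1 A injected at the first node
x, iters = jacobi_pcg(A, b)
print(x, iters)
```

The diagonal scaling only rescales each equation by its own diagonal entry, so it costs one multiply per unknown per iteration and needs no setup beyond reading the diagonal.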
>
> Also, you could work in either real or complex arithmetic.  The equivalent real formulation of a Hermitian positive definite matrix is real symmetric positive definite with the same set of unique eigenvalues (multiplicity increases by a factor of two), so CG will converge (assuming equivalent preconditioning) in the same number of iterations regardless of using the complex or real version of CG.
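
A small check of the doubled multiplicity, as a sketch: for a Hermitian H = A + iB, one common equivalent real form (assumed here) is K = [[A, -B], [B, A]]. If H v = lambda v with v = x + iy, then both [x; y] and [-y; x] are eigenvectors of K for the same lambda. The 2x2 matrix H below is a made-up example with a known eigenpair (lambda = 3, v = (i, 1)).

```python
def real_equivalent(h):
    """Build K = [[A, -B], [B, A]] from a complex Hermitian matrix H = A + iB."""
    n = len(h)
    top = [[h[i][j].real for j in range(n)] + [-h[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[h[i][j].imag for j in range(n)] + [h[i][j].real for j in range(n)]
              for i in range(n)]
    return top + bottom


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


H = [[2 + 0j, 1j],
     [-1j, 2 + 0j]]          # hypothetical Hermitian example, eigenvalues 1 and 3
lam = 3.0
v = [1j, 1 + 0j]             # complex eigenvector of H for lam

K = real_equivalent(H)
x = [c.real for c in v]
y = [c.imag for c in v]
w1 = x + y                   # [Re v; Im v]
w2 = [-c for c in y] + x     # [-Im v; Re v], orthogonal to w1

print(matvec(H, v), matvec(K, w1), matvec(K, w2))
```

Both real vectors satisfy K w = 3 w and are orthogonal to each other, so the eigenvalue 3 of H appears twice in K.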
>
> I hope this helps.
>
> Mike
>
> From: Tom Anderson <tomacorp at gmail.com>
> Date: Wednesday, September 17, 2014 at 11:49 AM
> To: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
> Subject: [Trilinos-Users] Sparse direct solver and language options for GPU
>
> Is C++ with Amesos2 the only GPU-capable sparse direct solver option using Trilinos?
> Have I missed any good alternatives?
>
> Here is a longer version of the same question with a description of the application:
>
> I am working on a thermal simulation problem, using a fine square mesh instead of a more complex coarse mesh.
>
> I think about the thermal problem as the solution to an equivalent electronic circuit.
>
> I have constructed a matrix using Modified Nodal Analysis as described in
> http://www.swarthmore.edu/NatSci/echeeve1/Ref/mna/MNA2.html .
> My PyTrilinos matrix code agrees with the Xyce circuit simulator.
> I could just use Xyce instead of writing my own matrix code,
> but Xyce gets slow on large problems.
> My equivalent circuit only needs voltage sources, current sources, and resistors.
> The solution I want is the DC operating point of the circuit.
>
> The matrix is real, Hermitian, positive semi-definite, if I understand matrix lingo correctly.
> The condition number is about 430 and I expect it to always be less than about 7000.
>
> I am aiming for 200e6 unknowns (circuit nodes) and 1.4e9 matrix entries (1.2e9 resistors).
> As of now, I can solve 1e6 nodes using Amesos/PyTrilinos on an old Mac mini, single threaded.
> Solve time is reasonable and linear in the number of unknowns, and memory usage is about 1kB per unknown.
> My existing code uses PyTrilinos and is 2D.  I plan to move to 2.5D (3D planar) soon.
> I expect to deploy the application on Linux.
>
> Next I would like to use GPU processing. I didn't find a way to use the GPU with PyTrilinos.
> I have been avoiding Fortran because of the install process on OSX.
> It seems that C++ with Amesos2 is the only GPU-capable sparse direct solver option using Trilinos.
> Have I missed any good alternatives?
>
> Any advice about my project is welcome, including suggestions for different approaches, platforms, or software.
>
> Thanks,
>
> Tom Anderson
> tom_anderson at keysight.com
> Keysight Technologies
> Santa Rosa, California


