[Trilinos-Users] [EXTERNAL] GPU Support for Sparse Direct Solvers KLU2 / Basker

Rajamanickam, Sivasankaran srajama at sandia.gov
Fri Jun 7 22:16:40 EDT 2019


Daniel

  a) Being Kokkos-based does not automatically translate into running on GPUs. We have not tried Basker on GPUs and don't expect it to do well on GPUs as it is written now. KLU2 doesn't use Kokkos; it is just a templated version of KLU.


  b) Tacho, a sparse Cholesky solver, is based on Kokkos tasking and can be run on GPUs. However, tasking on GPUs is a hard problem, and the performance in Trilinos master is not that good. I have copied Kyungjoo Kim and David Hollman, who are working on improving Kokkos tasking on GPUs and Tacho performance on GPUs. This is coming to Trilinos very soon (with the next Kokkos update). They can add more details.


  That said, sparse direct solvers on GPUs are quite hard. What is your primary use case for requiring this?
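
  As a rough illustration of why this is hard, here is a minimal C++ sketch that counts the dependency levels of a sparse lower-triangular solve. It is not taken from Basker, KLU2, or Tacho; the type LowerPattern and the functions compute_levels, count_levels, make_chain and make_diagonal are hypothetical names made up for this example. Rows in the same level could be solved concurrently, so many levels with few rows each means little parallel work for a GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: sparsity pattern of the strictly lower-triangular
// part of a matrix in CSR form. Row i depends on every column j < i listed in it.
struct LowerPattern {
    std::vector<std::size_t> row_ptr;  // size n + 1
    std::vector<std::size_t> col_idx;  // column indices, all smaller than their row
};

// level[i] is the length of the longest dependency chain ending at row i.
// Rows that share a level have no dependencies on each other.
std::vector<std::size_t> compute_levels(const LowerPattern& p) {
    if (p.row_ptr.empty()) return {};
    const std::size_t n = p.row_ptr.size() - 1;
    std::vector<std::size_t> level(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = p.row_ptr[i]; k < p.row_ptr[i + 1]; ++k) {
            level[i] = std::max(level[i], level[p.col_idx[k]] + 1);
        }
    }
    return level;
}

// Number of sequential steps a level-scheduled triangular solve would need.
std::size_t count_levels(const std::vector<std::size_t>& level) {
    if (level.empty()) return 0;
    return *std::max_element(level.begin(), level.end()) + 1;
}

// Bidiagonal pattern: row i depends only on row i - 1 (worst case, fully serial).
LowerPattern make_chain(std::size_t n) {
    LowerPattern p;
    p.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) p.col_idx.push_back(i - 1);
        p.row_ptr.push_back(p.col_idx.size());
    }
    return p;
}

// Diagonal pattern: no off-diagonal entries, so every row is independent.
LowerPattern make_diagonal(std::size_t n) {
    LowerPattern p;
    p.row_ptr.assign(n + 1, 0);
    return p;
}
```

  For an 8-row bidiagonal pattern, count_levels(compute_levels(make_chain(8))) gives 8, so no two rows can be processed at the same time, while the diagonal pattern of the same size gives a single level. Real factors fall somewhere in between, and the irregular, data-dependent structure is part of what makes GPU performance difficult.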


Thanks

Siva



________________________________
From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of Weber, Daniel <daniel.weber at igd.fraunhofer.de>
Sent: Friday, June 7, 2019 8:18 AM
To: trilinos-users at trilinos.org
Subject: [EXTERNAL] [Trilinos-Users] GPU Support for Sparse Direct Solvers KLU2 / Basker

Hi,

I'm currently trying to find out whether there are sparse direct solvers available within the Trilinos project. What I think I understood from studying the documentation, tutorials, etc.:

- There are abstractions for general sparse linear solvers (Stratimikos) and respective specializations for iterative solvers (Belos) and direct solvers (Amesos2).

- Within Amesos2, external solvers can be used (e.g. SuperLU), or one of the two internal solvers, KLU2 and Basker.

- KLU2 and Basker rely on Kokkos, which abstracts from the programming model (OpenMP, CUDA, etc.).

From this information I conclude that, in theory, KLU2 or Basker might be configured / compiled with GPU acceleration (due to the Kokkos abstraction). However, I haven't found any indication of a) whether this statement is true, b) whether it makes sense (e.g. the memory access pattern might result in poor performance), or c) how to set it up.

I would really appreciate any kind of information, i.e. simple yes / no answers to a) and b), detailed answers, or pointers to slides, documentation, videos, etc. for a)-c).

Thank you, best regards,
Daniel