[Trilinos-Users] Tpetra MultiGPU

Олег Рябков oleg.ryabkov.87 at gmail.com
Sat Sep 17 20:50:52 MDT 2011


  Hello, everyone!
I have been testing the GPU and multi-GPU capabilities of Tpetra (with
Belos solvers) and noticed that the multi-GPU variant is always much
slower than the single-GPU variant.
I investigated the code and discovered that the Import/Export classes
simply create "views" using the viewBufferNonConst method of the node
(which is ThrustGPUNode in this case).
This means that the entire vector data is copied between the GPU and
the CPU (although for many sparse matrices only the small "extra"
parts actually need to be transferred).
Do you see any solution that does not change the "Node" interface?
Perhaps one could launch some additional kernels (parallel_for<>?) to
copy only the elements that are actually needed into a temporary
buffer (GPU<=>GPU) and then create a view of that buffer?
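
To make the idea concrete, here is a minimal host-side sketch of the
pack/unpack step in plain C++. It is not Tpetra or Kokkos code: the
function names packEntries and unpackEntries and the index lists are
hypothetical, and each loop only stands in for what a device kernel
(one index per thread) would do on the GPU. The point is that only the
packed buffer, not the whole vector, would then need a host view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: gather only the entries listed in exportIndices
// into a contiguous buffer. On a GPU this loop would be the body of a
// device kernel; here it runs on the host for illustration.
std::vector<double> packEntries(const std::vector<double>& localData,
                                const std::vector<std::size_t>& exportIndices) {
  std::vector<double> packed(exportIndices.size());
  for (std::size_t i = 0; i < exportIndices.size(); ++i) {
    assert(exportIndices[i] < localData.size());
    packed[i] = localData[exportIndices[i]];
  }
  return packed;
}

// Hypothetical counterpart: scatter the received values into the slots
// listed in importIndices, leaving all other entries untouched.
void unpackEntries(const std::vector<double>& received,
                   const std::vector<std::size_t>& importIndices,
                   std::vector<double>& localData) {
  assert(received.size() == importIndices.size());
  for (std::size_t i = 0; i < importIndices.size(); ++i) {
    assert(importIndices[i] < localData.size());
    localData[importIndices[i]] = received[i];
  }
}
```

As an illustration with made-up sizes: for a vector of 1,000,000
doubles of which 1,000 are needed by neighbours, the packed buffer
holds about 8 KB, whereas the full vector is about 8 MB.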
(This is not just curiosity; I really like the Trilinos design and
wonder whether I can use it in my future projects.)

Thanks,
 Oleg


