[Trilinos-Users] Tpetra MultiGPU
oleg.ryabkov.87 at gmail.com
Sat Sep 17 20:50:52 MDT 2011
I was testing the GPU and multi-GPU capabilities of Tpetra (with Belos
solvers) and noticed that the multi-GPU variant is always much slower
than the single-GPU variant.
I investigated the code and discovered that the Import/Export classes
simply create "views" using the viewBufferNonConst method of the node
(which is ThrustGPUNode in this case),
which means that all of the vector data is copied between GPU and CPU
(although for many sparse matrices only the small "extra" parts
actually need to be transmitted).
Do you see any solution that does not change the "Node" interface? Maybe
launch some additional kernels (parallel_for<>?) to copy only the elements
that are really needed into a temporary buffer (GPU<=>GPU) and then create its
(It is not just curiosity; I really like the Trilinos design and wonder
if I can use it in my future projects.)
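
The pack-then-transfer idea above could be sketched roughly as follows. This is a host-only C++ illustration under stated assumptions, not Tpetra or Kokkos code: parallel_for here is a hypothetical serial stand-in for a node's parallel_for<>, and PackOp, UnpackOp, packForSend and unpackReceived are made-up names. On a GPU node the loop body would run as a kernel, and only the small packed buffer would need to cross the GPU/CPU boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a node's parallel_for: calls body.execute(i)
// for every i in [begin, end). A GPU node would launch a kernel instead.
template <class Body>
void parallel_for(std::size_t begin, std::size_t end, Body body) {
  for (std::size_t i = begin; i < end; ++i) body.execute(i);
}

// Gathers the entries listed in exportIdx into a contiguous send buffer.
struct PackOp {
  const double* x;
  const std::size_t* exportIdx;
  double* sendBuf;
  void execute(std::size_t i) const { sendBuf[i] = x[exportIdx[i]]; }
};

// Scatters received values into the entries listed in importIdx.
struct UnpackOp {
  const double* recvBuf;
  const std::size_t* importIdx;
  double* x;
  void execute(std::size_t i) const { x[importIdx[i]] = recvBuf[i]; }
};

// Packs only the entries that another process needs (assumed helper).
std::vector<double> packForSend(const std::vector<double>& x,
                                const std::vector<std::size_t>& exportIdx) {
  for (std::size_t idx : exportIdx) assert(idx < x.size());
  std::vector<double> sendBuf(exportIdx.size());
  parallel_for(0, exportIdx.size(),
               PackOp{x.data(), exportIdx.data(), sendBuf.data()});
  return sendBuf;  // only this small buffer would be transferred
}

// Writes received values into their slots of the local vector (assumed helper).
void unpackReceived(const std::vector<double>& recvBuf,
                    const std::vector<std::size_t>& importIdx,
                    std::vector<double>& x) {
  assert(recvBuf.size() == importIdx.size());
  for (std::size_t idx : importIdx) assert(idx < x.size());
  parallel_for(0, importIdx.size(),
               UnpackOp{recvBuf.data(), importIdx.data(), x.data()});
}
```

With this layout the amount of data moved scales with the number of exported or imported entries rather than with the full vector length, which is the saving the question is after.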