[Trilinos-Users] Tpetra MultiGPU

Олег Рябков oleg.ryabkov.87 at gmail.com
Mon Sep 19 11:15:00 MDT 2011

Yes, I understand that data communication is an expensive operation
and that it must be performed at each iteration. But it seems to me
that the current implementation copies all of the data from the local
vector to the CPU just to transmit a small part of it to other devices,
and performance could perhaps be improved if we copied only what we
really need. That was my
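
To illustrate the idea of copying only what is needed, here is a rough
host-only sketch in Python. It is not Tpetra code: the function names,
the list of exported local indices, and the 8-byte entry size are all
assumptions made for illustration. On a real GPU the gather step would
presumably run as a kernel on the device, so that only the small packed
buffer has to cross the device/host boundary.

```python
# Host-only sketch: the "device" vector is just a Python list here.
# All names below are hypothetical; none of them are Tpetra or Kokkos APIs.

def pack_export_entries(local_vector, export_lids):
    """Gather the entries that other processes need into one contiguous buffer.

    On a GPU this gather would run on the device, so that only the
    returned buffer needs to be copied to the host.
    """
    return [local_vector[lid] for lid in export_lids]


def transfer_sizes(local_vector, export_lids, bytes_per_entry=8):
    """Return (full_copy_bytes, packed_copy_bytes) for one exchange."""
    full = len(local_vector) * bytes_per_entry
    packed = len(export_lids) * bytes_per_entry
    return full, packed


# Example: 1000 local entries, of which 4 boundary entries are exported.
local_vector = [float(i) for i in range(1000)]
export_lids = [0, 1, 998, 999]  # assumed local indices needed by neighbours

send_buffer = pack_export_entries(local_vector, export_lids)
full_bytes, packed_bytes = transfer_sizes(local_vector, export_lids)
```

In this toy case a full copy would move 8000 bytes per exchange, while
the packed buffer is 32 bytes. The real ratio depends on the sparsity
structure of the matrix, as noted in the replies quoted below.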

On 19 September 2011 at 18:31, Baker, Christopher G.
<bakercg at ornl.gov> wrote:
> As Mark noted, the import/export time reflects the sparsity structure of
> the associated matrix. There are techniques that we are implementing to
> minimize some of this, but it is the nature of iterative linear solvers
> that the vectors being updated are the ones by which the matrix is
> multiplied. As a result, it is necessary to move data on and off the
> device. Our current support for GPUs is not optimal in this regard; I am
> in the process of refactoring this code. However, the ultimate solution
> will be
> a) algorithms that don't exhibit this communication pattern
> b) hardware configuration that doesn't require data movement
> Chris
> On 9/18/11 10:56 PM, "Hoemmen, Mark" <mhoemme at sandia.gov> wrote:
>>We are always working on improving the performance of Tpetra, so we
>>appreciate your observations.  It would be great if you would be willing
>>to share your benchmarks that reveal the performance issues you observed.
>>Regarding your observations about Import/Export, are you interested
>>mainly in the sparse matrix-vector multiply kernel?

Thanks for the answers!
