[Trilinos-Users] Tpetra MultiGPU
Baker, Christopher G.
bakercg at ornl.gov
Mon Sep 19 11:25:51 MDT 2011
That is correct. Future versions will try to limit the amount copied. One
open question is whether it is better to perform several transfers that
together move only the data needed, or fewer transfers that may copy more
data than necessary. This matters especially for multivectors (the general
case), where a single-copy solution would require copying almost the
entire multivector.
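
To make the trade-off concrete, here is a rough sketch that counts transfers
and entries moved under the two strategies. It is not Tpetra code:
TransferPlan, planExact, and planSingleSpan are hypothetical names, and the
sketch assumes a column-major multivector with a fixed stride between
columns and a sorted list of the local rows that must be sent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a host/device transfer strategy (not a Tpetra type).
struct TransferPlan {
    std::size_t numTransfers;  // number of separate copies issued
    std::size_t numEntries;    // total entries moved, needed or not
};

// Strategy A: one copy per contiguous run of needed rows, per column.
// Moves exactly the needed entries, at the cost of more transfers.
// Assumes `rows` is sorted and free of duplicates.
TransferPlan planExact(const std::vector<std::size_t>& rows,
                       std::size_t numVecs) {
    assert(std::is_sorted(rows.begin(), rows.end()));
    std::size_t runs = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        // A new run starts whenever the row is not adjacent to the previous one.
        if (i == 0 || rows[i] != rows[i - 1] + 1) ++runs;
    }
    return {runs * numVecs, rows.size() * numVecs};
}

// Strategy B: a single copy spanning the first needed entry in the first
// column through the last needed entry in the last column.
// Assumes column-major storage with `stride` entries between columns.
TransferPlan planSingleSpan(const std::vector<std::size_t>& rows,
                            std::size_t numVecs, std::size_t stride) {
    assert(std::is_sorted(rows.begin(), rows.end()));
    if (rows.empty() || numVecs == 0) return {0, 0};
    const std::size_t first = rows.front();                         // column 0
    const std::size_t last = (numVecs - 1) * stride + rows.back();  // last column
    return {1, last - first + 1};
}
```

For example, with needed rows {2, 3, 4, 90, 91}, 4 columns, and a stride of
100, the exact plan issues 8 transfers moving 20 entries, while the
single-span plan issues 1 transfer moving 390 of the 400 stored entries.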
On 9/19/11 1:15 PM, "Олег Рябков" <oleg.ryabkov.87 at gmail.com> wrote:
>Yes, I understand that data communication is a slow operation and that
>it must be performed at each iteration. But it seems to me that the
>current implementation copies all the data from the local vector to the
>CPU just to transmit a small part of it to other devices, and performance
>might improve if we copied only what we really need. That was my
>On 19 September 2011 at 18:31, Baker, Christopher G.
><bakercg at ornl.gov> wrote:
>> As Mark noted, the import/export time reflects the sparsity structure of
>> the associated matrix. There are techniques that we are implementing to
>> minimize some of this, but it is the nature of iterative linear solvers
>> that the vectors being updated are the ones by which the matrix is
>> multiplied. As a result, it is necessary to move data on and off the
>> device. Our current support for GPUs is not optimal in this regard; I am
>> in the process of refactoring this code. However, the ultimate solution
>> will be
>> a) algorithms that don't exhibit this communication pattern
>> b) hardware configuration that doesn't require data movement
>> On 9/18/11 10:56 PM, "Hoemmen, Mark" <mhoemme at sandia.gov> wrote:
>>>We are always working on improving the performance of Tpetra, so we
>>>appreciate your observations. It would be great if you would be willing
>>>to share your benchmarks that reveal the performance issues you
>>>Regarding your observations about Import/Export, are you interested
>>>mainly in the sparse matrix-vector multiply kernel?
>Thanks for the answers!