As Mark noted, the import/export time reflects the sparsity structure of
the associated matrix. We are implementing techniques to reduce some of
this cost, but it is the nature of iterative linear solvers
that the vectors being updated are the ones by which the matrix is
multiplied. As a result, it is necessary to move data on and off the
device. Our current support for GPUs is not optimal in this regard; I am
in the process of refactoring this code. However, the ultimate solution
will be:
a) algorithms that don't exhibit this communication pattern;
b) hardware configurations that don't require data movement.
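
To make the first point concrete, here is a rough Python sketch of why the import volume follows the sparsity structure. It is not Tpetra code: the function names are hypothetical, and it assumes a contiguous block row partition in which each process owns the matching block of the vector. For y = A*x, a process has to import every x entry whose index appears as a column in one of its rows but is owned by another process.

```python
# Hypothetical sketch (not the Tpetra API): which entries of x each process
# must import before a local sparse matrix-vector multiply y = A*x.
# Assumption: rows and vector entries are split into contiguous blocks.

def owner(index, rows_per_proc):
    """Rank that owns a global row/vector index under the block partition."""
    return index // rows_per_proc


def import_lists(col_indices_by_row, rows_per_proc):
    """Map each rank to the sorted remote x indices it needs.

    col_indices_by_row[i] lists the column indices of the nonzeros
    in global row i.
    """
    needed = {}
    for row, cols in enumerate(col_indices_by_row):
        rank = owner(row, rows_per_proc)
        remote = needed.setdefault(rank, set())
        for col in cols:
            # A column owned by another rank means that x entry is imported.
            if owner(col, rows_per_proc) != rank:
                remote.add(col)
    return {rank: sorted(cols) for rank, cols in needed.items()}


n = 8
# Tridiagonal pattern: each row touches only its immediate neighbors.
tridiag = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]
# Dense pattern: every row touches every column.
dense = [list(range(n)) for _ in range(n)]

tridiag_imports = import_lists(tridiag, rows_per_proc=4)
dense_imports = import_lists(dense, rows_per_proc=4)
print(tridiag_imports)
print(dense_imports)
```

With two processes, the tridiagonal pattern needs one imported entry per process, while the dense pattern needs the other process's entire block. The same reasoning applies to the data that has to cross between host and device.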


On 9/18/11 10:56 PM, "Hoemmen, Mark" <mhoemme at sandia.gov> wrote:

>We are always working on improving the performance of Tpetra, so we
>appreciate your observations.  It would be great if you would be willing
>to share your benchmarks that reveal the performance issues you observed.
>Regarding your observations about Import/Export, are you interested
>mainly in the sparse matrix-vector multiply kernel?

