[Trilinos-Users] Tpetra+Anasazi performance

Thu May 15 13:04:18 MDT 2014

On 5/15/14, 12:00 PM, "trilinos-users-request at software.sandia.gov"
<trilinos-users-request at software.sandia.gov> wrote:
>>David Hysom wrote:
>>> Hi,
>>>
>>> Please see the attached for an example of what we're seeing for
>>> strong scaling for LOBPCG+Tpetra, in a shared memory environment.
>>> Bottom line is, this example shows a speedup of 3.0.
>>> We've run many problems with varying parameters/matrices, and typically
>>> only see speedups between 2.0 and 3.0.
>>>
>>> Is this expected? Is there anything wrt upcoming trilinos development
>>> that might increase scalability?
>>>
>>> Stats are for trilinos-11.6.1
>>>
>>> A second question: we've tested with OpenMP, Pthreads, and TBB.
>>> We always find that OpenMP gives the best results (shortest execution
>>> time), although Pthreads and TBB are reasonably close. Do you know
>>> of circumstances (not limited to Anasazi) where Pthreads or TBB
>>> outperform OpenMP?

Would you mind sharing your use case (matrix, right-hand side, and Anasazi
settings) with me?  We're actively working on Tpetra to improve
performance with respect to threads.  Sparse mat-vec and vector operations
should be reasonably fast with threads, and generally do the right thing
with respect to NUMA.  I would recommend using the Pthreads or OpenMP
back-ends for now. 
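
For reference, here is a minimal sketch of the kind of threaded sparse
mat-vec meant above.  It is only an illustration and not Tpetra's actual
kernel: the CsrMatrix struct and the spmv function are hypothetical names,
and the storage layout (compressed sparse row) is an assumption.  Rows are
independent, so the outer loop is split across threads with OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration; not a Tpetra type.
struct CsrMatrix {
    std::size_t numRows;
    std::vector<std::size_t> rowPtr;  // size numRows + 1
    std::vector<std::size_t> colInd;  // column index of each stored entry
    std::vector<double> values;       // value of each stored entry
};

// Computes y = A * x. Each row is handled by exactly one thread.
// The pragma is ignored if the compiler is run without OpenMP enabled,
// in which case the loop simply runs serially.
void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
    assert(y.size() == A.numRows);
    const long n = static_cast<long>(A.numRows);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            sum += A.values[k] * x[A.colInd[k]];
        }
        y[i] = sum;
    }
}
```

With a static schedule each thread works on the same block of rows on
every call.  On a NUMA machine with a first-touch policy that only helps
if the arrays were also initialized by the same threads with the same
schedule, which this sketch does not do.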

As Mike Heroux mentioned, we are currently working on thread performance
improvements in Tpetra.  I would recommend using the latest release of
Trilinos, or even the current development version (available via our
public git repository), for the best threaded performance.

Thanks!
mfh