[Trilinos-Users] Why isn't BLAS used more in Epetra?

Heroux, Michael A maherou at sandia.gov
Mon Sep 21 12:12:12 EDT 2015


Sven,

Thanks for your comments.  When Epetra was originally developed, level-1
BLAS was not much faster than raw loops, and was sometimes slower,
especially for small to medium-sized problems where function call overhead
had an impact.  Also, since Update has broader functionality than daxpy,
using daxpy would have complicated the implementation without a
substantial performance improvement.
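
To make the difference in functionality concrete, here is a minimal
sketch in plain C++.  It is not Epetra source code: axpy_reference and
update_reference are hypothetical names, std::vector stands in for a
single vector column, and the axpy loop is a hand-written stand-in rather
than a call into a BLAS library.  It only assumes the documented
semantics, y = a*x + y for daxpy and this = ScalarThis*this + ScalarA*A
for the two-operand form of Update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Level-1 BLAS axpy semantics: y <- a*x + y.
// Hand-written stand-in for illustration, not a call into a BLAS library.
void axpy_reference(double a, const std::vector<double>& x,
                    std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += a * x[i];
    }
}

// Update-style semantics: y <- scalar_this*y + scalar_a*x.
// Hypothetical helper mirroring the two-operand Update; the extra scalar
// on y is what plain axpy does not provide.
void update_reference(double scalar_a, const std::vector<double>& x,
                      double scalar_this, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = scalar_this * y[i] + scalar_a * x[i];
    }
}
```

With scalar_this equal to 1.0 the two loops compute the same result.  For
any other value, a BLAS-based version would need a scaling pass (dscal)
followed by daxpy, which means two sweeps over y instead of one.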

Epetra does use BLAS extensively for level-3 operations, primarily in the
Multiply method, which is used for block Krylov solvers.  Linking with a
good BLAS implementation yields a dramatic performance improvement for
these operations.
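
As an illustration of what such a level-3 operation looks like, the
sketch below forms the small matrix C = A^T * B from two tall blocks of
vectors, the kind of block inner product a block Krylov method needs.
This is an assumption-laden example, not Epetra code: block_inner_product
is a hypothetical name, the blocks are stored column-major in
std::vector, and the naive triple loop is exactly the work an optimized
dgemm would replace with a single call.

```cpp
#include <cstddef>
#include <vector>

// Computes C (p x q) = A^T * B, where A is n x p and B is n x q.
// All matrices are column-major. Naive reference loop for illustration;
// an optimized BLAS would do this in one dgemm call with cache blocking.
std::vector<double> block_inner_product(const std::vector<double>& A,
                                        const std::vector<double>& B,
                                        std::size_t n, std::size_t p,
                                        std::size_t q) {
    std::vector<double> C(p * q, 0.0);
    for (std::size_t j = 0; j < q; ++j) {          // column of B and C
        for (std::size_t i = 0; i < p; ++i) {      // column of A, row of C
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {  // dot product over rows
                sum += A[k + i * n] * B[k + j * n];
            }
            C[i + j * p] = sum;
        }
    }
    return C;
}
```

The loop does on the order of n*p*q multiply-adds on only n*(p+q) input
values, and that data reuse is what a tuned level-3 BLAS exploits.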

Your data point of the Update vs daxpy performance is intriguing.  Thanks
for letting us know.  Is this from the MKL version of daxpy?

Regarding Tpetra, all of these operations are handled by the Kokkos
package, which has custom backends for multicore, manycore and GPUs.
These operations have regularly shown excellent and portable performance
via Kokkos on all these platforms.  So for Tpetra users there should not
be the same kind of issue.

If you have had issues with threaded Epetra, please let us know.  We will
fix these errors.

Thanks again.

Mike

On 9/21/15, 3:10 AM, "Trilinos-Users on behalf of Sven Baars"
<trilinos-users-bounces at trilinos.org on behalf of s.baars at rug.nl> wrote:

>Hey everyone,
>
>I was wondering why BLAS isn't used more often. It seems to me that
>you'd want to use this as much as possible. For instance in the Update
>method of the Epetra_MultiVector. I attached an example where I test its
>performance. Here is my output on my local machine:
>
>$ ./test
>Time with Update 1.35864
>Time with daxpy_ 1.05639
>
>and it's even threaded:
>
>$ OMP_NUM_THREADS=4 ./test
>Time with Update 1.38495
>Time with daxpy_ 0.66636
>
>Note that I don't compile Epetra with OpenMP support, because for me
>it's bugged in some places. But I can't imagine that the implementation
>in Epetra is better than the BLAS one. So why isn't BLAS used more often?
>
>Cheers,
>Sven
>
>P.S. I know Teuchos has BLAS wrappers, but I wanted to make sure I was
>actually using BLAS. I also know that Tpetra probably does this better,
>but Tpetra's API is still too unstable for me.


