[Trilinos-Users] Cache performance of Trilinos MatVec

Heroux, Michael A maherou at sandia.gov
Mon Aug 3 23:12:21 MDT 2009


Please send a bit more detail, perhaps off-list, and I can look at it.  I have been working with OSKI myself lately.

CMake support of additional features is growing but not complete.  If we don't get them all in right away, there is always the manual definition of CXXFLAGS, etc. to help us in the meantime.



On 8/3/09 6:07 PM, "James C. Sutherland" <James.Sutherland at utah.edu> wrote:

Mike et al,

I have tried building Trilinos with OSKI.  It appears that the OSKI examples are not built (at least in 9.0.2, which I am using).  Do you have local builds or regression tests that are functional with OSKI?  Is there a way to build the examples through the Trilinos build system?

I am having runtime errors when trying to use OSKI matrices rather than CRS matrices in my application.  I am trying to discern whether this is a problem in my usage of Epetra_OskiMatrix or in the Trilinos interface to OSKI.

FYI, it looks like the new CMake build system is even less aware of OSKI.


James C. Sutherland
Assistant Professor, Chemical Engineering
The University of Utah
50 S. Central Campus Dr, 3290 MEB
Salt Lake City, UT 84112-9203
(801) 585-1246
http://www.che.utah.edu/~sutherland

On Jun 10, 2009, at 8:16 AM, Heroux, Michael A wrote:


 As Alan mentioned, sparse MV is notorious for poor cache performance.  Some of the best work in addressing this issue has been done by the BeBOP project at UC-Berkeley in the OSKI library.  Epetra can use OSKI for sparse operations via Epetra_Oski* classes.  These classes rely on the OSKI library (which you download and build yourself).  The following tech report describes the interface and performance results:


 You might also consider the Epetra_JadMatrix class, which can work well on recent microprocessors with good streaming support.  Epetra_JadMatrix can be especially useful for very sparse matrices (the 3-4 nonzeros-per-row collections you have).  It is just a few-line change to your code to try these options.
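 To illustrate the "few-line change," here is a sketch (not compiled here; it assumes a Trilinos build with Epetra, and an existing Epetra_CrsMatrix A plus vectors x and y -- Epetra_JadMatrix is constructed from any Epetra_RowMatrix):

```cpp
#include "Epetra_CrsMatrix.h"
#include "Epetra_JadMatrix.h"
#include "Epetra_Vector.h"

// Sketch: replace the CRS matvec with the jagged-diagonal version.
// Before:  A.Multiply(false, x, y);   // A is an Epetra_CrsMatrix
void matvec_via_jad(const Epetra_CrsMatrix& A,
                    const Epetra_Vector& x, Epetra_Vector& y) {
  Epetra_JadMatrix Ajad(A);      // convert once, reuse for many matvecs
  Ajad.Multiply(false, x, y);    // y = A*x using jagged-diagonal storage
}
```

 The conversion cost is paid once, so this pays off when the same matrix is applied many times (e.g., inside an iterative solver).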


 On 6/10/09 9:01 AM, "Alan Williams" <william at sandia.gov> wrote:

 For matrix-vector product y = A*x, the core of the sparse matvec is a statement like this:
   y[i] += Acoefs[j]*x[Acols[j]]

 So depending on how the column-indices of A are ordered, lots of cache misses in the x vector could occur.
 A matrix reordering may help in some cases.


 > -----Original Message-----
 > From: trilinos-users-bounces at software.sandia.gov
 > [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf
 > Of James C. Sutherland
 > Sent: Tuesday, June 09, 2009 5:23 PM
 > To: trilinos-users at software.sandia.gov
 > Subject: [Trilinos-Users] Cache performance of Trilinos MatVec
 > Does anyone know if there has been a study of, or effort to
 > optimize,
 > cache performance of MatVec operations in Trilinos?
 > Specifically, I am finding that epetra_dcrsmv (sparse matvec) has
 > extremely bad cache performance (lots of cache misses) on an intel
 > chipset that I have.  This seems to be problematic for a range of
 > matrix sizes.  I have very sparse matrices (3-10 nonzero entries per
 > row), and these can range in size from O(10^2) to O(10^5) rows.
 > Any thoughts?
 > James
 > _______________________________________________
 > Trilinos-Users mailing list
 > Trilinos-Users at software.sandia.gov
 > http://software.sandia.gov/mailman/listinfo/trilinos-users
