[Trilinos-Users] [EXTERNAL] 2x2 blocks: Epetra_CrsMatrix vs. Epetra_VbrMatrix vs. Epetra_*Matrix: fill time, scalability?

Mon Dec 12 19:37:58 MST 2011

Nico,

60 is huge.  With a properly organized kernel, Vbr should always beat
Crs.  The logic for small sizes is missing in Epetra_VbrMatrix.  However,
I am very surprised you have not seen a cross-over at a much smaller size.
Part of it must be a poor BLAS implementation.  But I will look into it.
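
To illustrate what a kernel with dedicated small-block logic could look like, here is a minimal, self-contained sketch. It is not Epetra code: BlockCrs, multiply and blockTridiagonal are hypothetical names, and the row-major block layout is an assumption made for the example. The point is only that a block size known at compile time lets the inner loops be unrolled, with no BLAS call per block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Block-CRS storage with a compile-time block size B (hypothetical type,
// not part of Epetra). Each stored block holds B*B doubles, row-major.
template <int B>
struct BlockCrs {
  std::vector<std::size_t> rowPtr;  // size numBlockRows + 1
  std::vector<std::size_t> colInd;  // block-column index of each block
  std::vector<double> values;       // B*B entries per stored block
};

// y = A*x. B is a template parameter, so both inner loops have constant
// trip counts and the compiler is free to unroll them (e.g. for B == 2).
template <int B>
void multiply(const BlockCrs<B>& A, const std::vector<double>& x,
              std::vector<double>& y) {
  assert(!A.rowPtr.empty());
  const std::size_t numBlockRows = A.rowPtr.size() - 1;
  y.assign(numBlockRows * B, 0.0);
  for (std::size_t I = 0; I < numBlockRows; ++I) {
    double acc[B] = {};
    for (std::size_t k = A.rowPtr[I]; k < A.rowPtr[I + 1]; ++k) {
      const double* blk = &A.values[k * B * B];
      const double* xj = &x[A.colInd[k] * B];
      for (int r = 0; r < B; ++r)
        for (int c = 0; c < B; ++c)
          acc[r] += blk[r * B + c] * xj[c];
    }
    for (int r = 0; r < B; ++r) y[I * B + r] = acc[r];
  }
}

// Build an n-by-n block tridiagonal matrix whose B x B blocks are filled
// with the constant `diag` (diagonal blocks) or `off` (off-diagonal blocks).
template <int B>
BlockCrs<B> blockTridiagonal(std::size_t n, double diag, double off) {
  BlockCrs<B> A;
  A.rowPtr.push_back(0);
  for (std::size_t I = 0; I < n; ++I) {
    for (std::size_t J = (I == 0 ? 0 : I - 1); J <= I + 1 && J < n; ++J) {
      A.colInd.push_back(J);
      A.values.insert(A.values.end(), static_cast<std::size_t>(B * B),
                      J == I ? diag : off);
    }
    A.rowPtr.push_back(A.colInd.size());
  }
  return A;
}
```

Calling multiply on blockTridiagonal<2>(n, ...) repeatedly and timing it against a scalar CRS product of the same matrix would give a rough, BLAS-independent idea of where the cross-over might sit on a given machine.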

Mike

On 12/12/11 8:14 PM, "Nico Schlömer" <nico.schloemer at gmail.com> wrote:

>> Regarding Epetra_VbrMatrix (or Epetra_FEVbrMatrix), performance for the
>> 2x2 case was not good the last time I used it (which was some time ago).
>> If you have a simple test driver, I am interested in looking at this
>> issue.
>
>Well, I compiled a simple driver (attached) that constructs a 1e5x1e5
>tridiagonal block matrix with random values and then performs 10
>matrix-vector multiplications.
>I timed this on 1024 cores on NERSC's hopper for block sizes from 10
>to 100, and it appears that Vbr_Matrix yields a better performance
>only starting from block sizes around 60 (see attached PDF for
>timings). Maybe the BLAS implementation isn't particularly good, so a
>run on another machine could be helpful.
>Anyways, 60 seems like a pretty large number, and I'm certainly not
>tempted anymore to use Vbr_Matrices for 2x2 blocks.
>
>Cheers,
>Nico
>
>On Thu, Dec 8, 2011 at 9:22 PM, Heroux, Michael A <maherou at sandia.gov>
>wrote:
>> Hi Nico,
>>
>> I have sometimes used standard map containers to facilitate this
>>process,
>> using a few containers to catch the incoming entries, then passing them
>> into the matrix as a larger batch.
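
A minimal sketch of the batching idea described above, assuming std::map is the container used to catch incoming entries. EntryBatcher and its members are hypothetical names, not Epetra classes; the sink is any callable taking (row, numEntries, values, indices), which is the argument order of the SumIntoMyValues calls quoted further down.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Accumulates (row, col) -> value contributions locally so that each row is
// handed to the matrix once per flush instead of once per element edge.
// Hypothetical helper for illustration; not part of Epetra.
class EntryBatcher {
 public:
  // Sum a contribution into the local buffer; duplicates are merged here.
  void sumInto(int row, int col, double value) {
    assert(row >= 0 && col >= 0);  // local indices are non-negative
    rows_[row][col] += value;
  }

  // Number of distinct rows currently buffered.
  std::size_t numRows() const { return rows_.size(); }

  // Pass every buffered row to `sink` as (row, numEntries, values, indices)
  // with column indices in ascending order, then empty the buffer.
  template <typename Sink>
  void flush(Sink&& sink) {
    std::vector<double> values;
    std::vector<int> indices;
    for (const auto& row : rows_) {
      values.clear();
      indices.clear();
      for (const auto& entry : row.second) {
        indices.push_back(entry.first);
        values.push_back(entry.second);
      }
      sink(row.first, static_cast<int>(values.size()), values.data(),
           indices.data());
    }
    rows_.clear();
  }

 private:
  std::map<int, std::map<int, double>> rows_;
};
```

In an assembly loop, the sink could be a lambda that forwards its four arguments to myMatrix->SumIntoMyValues. Whether the extra map bookkeeping pays off depends on how many contributions hit the same entry, so it is worth timing against direct insertion.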
>>
>> Regarding Epetra_VbrMatrix (or Epetra_FEVbrMatrix), performance for the
>> 2x2 case was not good the last time I used it (which was some time ago).
>>
>> If you have a simple test driver, I am interested in looking at this
>>issue.
>>
>> Thanks.
>>
>> Mike
>>
>> On 12/8/11 12:31 PM, "Nico Schlömer" <nico.schloemer at gmail.com> wrote:
>>
>>>Hi all,
>>>
>>>I have this linear problem involving a 2x2-block matrix which I fill
>>>and then solve. Luckily, I have a good preconditioner for the problem
>>>which results in the matrix construction taking 50% of the runtime.
>>>I'd like to improve this, tried this and that, but I'm running out of
>>>ideas.
>>>
>>>Basically, I loop over a set of FEM elements and insert for each edge
>>>of the element a 4x4 submatrix, using
>>>
>>>// fill v
>>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>>localRowIndices[0], 4, v, localColIndices ) );
>>>// fill v
>>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>>localRowIndices[1], 4, v, localColIndices ) );
>>>// fill v
>>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>>localRowIndices[2], 4, v, localColIndices ) );
>>>// fill v
>>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>>localRowIndices[3], 4, v, localColIndices ) );
>>>
>>>I've tried FECrsMatrix where those reduce to one single call, but I've
>>>found the four calls to be somewhat faster.
>>>
>>>The matrix contains 2x2 blocks, so local{Row,Col}Indices always
>>>contain subsequent indices (as in [24,25,56,57]). I thought about
>>>using Epetra_VbrMatrix which would save me some tinkering with maps
>>>and I imagine it would also help to reduce the load of the matrix
>>>construction by filling in 2x2 blocks at once.
>>>If I remember correctly, there could be performance (scalability?)
>>>issues with Vbr_Matrices and small block sizes. Is that still true at
>>>all?
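
For a block format, scalar index lists like [24,25,56,57] would first have to be turned into block indices. A minimal sketch of that step follows, assuming a constant block size and blocks aligned to multiples of it (so scalar index 24 starts block 12 when the block size is 2). toBlockIndices is a hypothetical helper, not an Epetra function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert scalar indices such as [24, 25, 56, 57] into block indices for a
// fixed block size. Returns an empty vector if the input is not made of
// complete, aligned runs of `blockSize` consecutive indices.
std::vector<int> toBlockIndices(const std::vector<int>& scalar,
                                int blockSize) {
  assert(blockSize > 0);
  const std::size_t bs = static_cast<std::size_t>(blockSize);
  std::vector<int> blocks;
  if (scalar.size() % bs != 0) return {};
  for (std::size_t i = 0; i < scalar.size(); i += bs) {
    // Each run must start on a block boundary ...
    if (scalar[i] % blockSize != 0) return {};
    // ... and continue with consecutive indices.
    for (std::size_t k = 1; k < bs; ++k)
      if (scalar[i + k] != scalar[i] + static_cast<int>(k)) return {};
    blocks.push_back(scalar[i] / blockSize);
  }
  return blocks;
}
```

With block size 2, the example indices [24,25,56,57] map to block indices [12,28], so a 4x4 element submatrix corresponds to a 2x2 arrangement of 2x2 blocks.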
>>>
>>>Well, I'd be happy for any other suggestion.
>>>
>>>Cheers,
>>>Nico
>>>
>>>_______________________________________________
>>>Trilinos-Users mailing list
>>>Trilinos-Users at software.sandia.gov
>>>http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>