[Trilinos-Users] [EXTERNAL] 2x2 blocks: Epetra_CrsMatrix vs. Epetra_VbrMatrix vs. Epetra_*Matrix: fill time, scalability?

Nico Schlömer nico.schloemer at gmail.com
Mon Dec 12 19:14:48 MST 2011


> Regarding Epetra_VbrMatrix (or Epetra_FEVbrMatrix), performance for the
> 2x2 case was not good the last time I used it (which was some time ago).
>
> If you have a simple test driver, I am interested in looking at this issue.

Well, I put together a simple driver (attached) that constructs a 1e5x1e5
tridiagonal block matrix with random values and then performs 10
matrix-vector multiplications.
I timed this on 1024 cores of NERSC's Hopper for block sizes from 10
to 100, and it appears that Epetra_VbrMatrix only starts to outperform
Epetra_CrsMatrix at block sizes around 60 (see the attached PDF for
timings). Maybe the BLAS implementation isn't particularly good, so a
run on another machine could be helpful.
Anyway, 60 seems like a pretty large number, and I'm certainly no
longer tempted to use Epetra_VbrMatrix for 2x2 blocks.

Cheers,
Nico






On Thu, Dec 8, 2011 at 9:22 PM, Heroux, Michael A <maherou at sandia.gov> wrote:
> Hi Nico,
>
> I have sometimes used standard map containers to facilitate this process,
> using a few containers to catch the incoming entries, then passing them
> into the matrix as a larger batch.
>
> Regarding Epetra_VbrMatrix (or Epetra_FEVbrMatrix), performance for the
> 2x2 case was not good the last time I used it (which was some time ago).
>
> If you have a simple test driver, I am interested in looking at this issue.
>
> Thanks.
>
> Mike
>
> On 12/8/11 12:31 PM, "Nico Schlömer" <nico.schloemer at gmail.com> wrote:
>
>>Hi all,
>>
>>I have a linear problem involving a 2x2-block matrix which I fill and
>>then solve. Luckily, I have a good preconditioner for the problem, so
>>the solve itself is fast and the matrix construction ends up taking
>>50% of the runtime. I'd like to improve on this and have tried this
>>and that, but I'm running out of ideas.
>>
>>Basically, I loop over a set of FEM elements and insert for each edge
>>of the element a 4x4 submatrix, using
>>
>>// fill v
>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>localRowIndices[0], 4, v, localColIndices ) );
>>// fill v
>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>localRowIndices[1], 4, v, localColIndices ) );
>>// fill v
>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>localRowIndices[2], 4, v, localColIndices ) );
>>// fill v
>>TEUCHOS_ASSERT_EQUALITY( 0, myMatrix->SumIntoMyValues(
>>localRowIndices[3], 4, v, localColIndices ) );
>>
>>I've tried Epetra_FECrsMatrix, where those reduce to a single call,
>>but I found the four separate calls to be somewhat faster.
>>
>>The matrix consists of 2x2 blocks, so local{Row,Col}Indices always
>>contain pairs of consecutive indices (as in [24,25,56,57]). I thought
>>about using Epetra_VbrMatrix, which would save me some tinkering with
>>maps, and I imagine it would also reduce the cost of the matrix
>>construction by filling in 2x2 blocks at once.
>>If I remember correctly, though, there could be performance
>>(scalability?) issues with Epetra_VbrMatrix and small block sizes. Is
>>that still true?
>>
>>Well, I'd be happy about any other suggestions.
>>
>>Cheers,
>>Nico
>>
>>_______________________________________________
>>Trilinos-Users mailing list
>>Trilinos-Users at software.sandia.gov
>>http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
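For reference, here is a minimal sketch of the per-row buffering Mike
describes above; the container layout and the names addToBuffer/flushBuffer
are illustrative only, not existing Trilinos utilities:

// Buffer FEM contributions per global row in nested std::maps, then push each
// row into the matrix with a single SumIntoGlobalValues() call.
#include <map>
#include <vector>
#include <Teuchos_Assert.hpp>
#include <Epetra_CrsMatrix.h>

typedef std::map<int, std::map<int, double> > EntryBuffer; // row -> (col -> value)

// Accumulate one contribution instead of calling the matrix directly.
inline void addToBuffer( EntryBuffer & buf, int row, int col, double value )
{
    buf[row][col] += value;
}

// Flush the buffered entries into the matrix, one call per global row.
void flushBuffer( Epetra_CrsMatrix & A, EntryBuffer & buf )
{
    std::vector<int>    cols;
    std::vector<double> vals;
    for ( EntryBuffer::const_iterator row = buf.begin(); row != buf.end(); ++row )
    {
        cols.clear();
        vals.clear();
        for ( std::map<int, double>::const_iterator col = row->second.begin();
              col != row->second.end(); ++col )
        {
            cols.push_back( col->first );
            vals.push_back( col->second );
        }
        TEUCHOS_ASSERT_EQUALITY( 0, A.SumIntoGlobalValues( row->first,
                                                           (int) cols.size(),
                                                           &vals[0],
                                                           &cols[0] ) );
    }
    buf.clear();
}

Whether the std::map overhead pays off compared to the four direct
SumIntoMyValues() calls per edge would of course have to be measured.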
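And this is roughly what the 2x2-block assembly from the quoted message would
look like with Epetra_VbrMatrix, assuming the matrix lives on an
Epetra_BlockMap with block size 2 and the block entries already exist (the
sum-into path requires previously inserted entries); the helper name is
hypothetical:

// Sum a dense 2x2 contribution into an existing block entry of an
// Epetra_VbrMatrix.
#include <Teuchos_Assert.hpp>
#include <Epetra_VbrMatrix.h>
#include <Epetra_SerialDenseMatrix.h>

void sumIntoBlock( Epetra_VbrMatrix & A,
                   int blockRow,                  // global block row index
                   int blockCol,                  // global block column index
                   Epetra_SerialDenseMatrix & s ) // 2x2 contribution
{
    TEUCHOS_ASSERT_EQUALITY( 0, A.BeginSumIntoGlobalValues( blockRow, 1, &blockCol ) );
    TEUCHOS_ASSERT_EQUALITY( 0, A.SubmitBlockEntry( s ) );
    TEUCHOS_ASSERT_EQUALITY( 0, A.EndSubmitEntries() );
}

Each 4x4 edge contribution then decomposes into four such 2x2 blocks; for the
point indices [24,25,56,57] above, these are the block rows/columns 12 and 28.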
-------------- next part --------------
// Workaround for icpc's error "Include mpi.h before stdio.h"
#include <Teuchos_config.h>
#ifdef HAVE_MPI
    #include <mpi.h>
#endif

#include <Teuchos_Assert.hpp>
#include <Teuchos_RCP.hpp>
#include <Teuchos_Time.hpp>
#include <Teuchos_TimeMonitor.hpp>
#include <Teuchos_VerboseObject.hpp>
#include <Teuchos_StandardCatchMacros.hpp>
#include <Teuchos_CommandLineProcessor.hpp>
#include <Epetra_VbrMatrix.h>
#include <Epetra_CrsMatrix.h>
#include <Epetra_Vector.h>
#include <Epetra_BlockMap.h>
#include <Epetra_Map.h>
#include <Epetra_SerialDenseMatrix.h>

#include <cstdlib>
#include <vector>

#ifdef HAVE_MPI
#include <Epetra_MpiComm.h>
#else
#include <Epetra_SerialComm.h>
#endif

// =============================================================================
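// Build an n x n block-tridiagonal Epetra_VbrMatrix whose sub-, main and
// superdiagonal blocks are dense blockSize x blockSize matrices filled with
// random values.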
Teuchos::RCP<const Epetra_VbrMatrix>
buildBlockMatrix( Epetra_Comm & eComm, int blockSize, int n )
{
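    // n global block elements, each of point size blockSize, distributed
    // linearly across the processes of eComm.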
    Epetra_BlockMap map( n, blockSize, 0, eComm );

    // Create the block-tridiagonal matrix; at most three block entries
    // (sub-, main, superdiagonal) per block row.
    int maxEntriesPerRow = 3;
    Teuchos::RCP<Epetra_VbrMatrix> B =
        Teuchos::rcp( new Epetra_VbrMatrix( Copy, map, maxEntriesPerRow ) );

    Epetra_SerialDenseMatrix s(blockSize,blockSize);
    for ( int k=0; k<B->RowMap().NumMyElements(); k++ )
    {
        int kGlobal = B->RowMap().GID( k );
        TEUCHOS_ASSERT_INEQUALITY( kGlobal, >, -1 );

        if ( kGlobal>0 )
        {
            int kGlobalMinusOne = kGlobal-1;
            TEUCHOS_ASSERT_EQUALITY( 0, B->BeginInsertGlobalValues(kGlobal, 1, &kGlobalMinusOne) );
//             for ( int i=0; i<blockSize; i++ )
//                 for ( int j=0; j<blockSize; j++ )
//                     s(i,j) = -1.0;
            s.Random();
            TEUCHOS_ASSERT_EQUALITY( 0, B->SubmitBlockEntry(s) );
            TEUCHOS_ASSERT_EQUALITY( 0, B->EndSubmitEntries() );
        }

        // Can't use BeginInsertMyValues() as the matrix was constructed w/o
        // column map.
        TEUCHOS_ASSERT_EQUALITY( 0, B->BeginInsertGlobalValues(kGlobal, 1, &kGlobal) );
//         for ( int i=0; i<blockSize; i++ )
//             for ( int j=0; j<blockSize; j++ )
//                 s(i,j) = 2.0;
        s.Random();
        TEUCHOS_ASSERT_EQUALITY( 0, B->SubmitBlockEntry(s) );
        TEUCHOS_ASSERT_EQUALITY( 0, B->EndSubmitEntries() );

        if ( kGlobal<n-1 )
        {
            int kGlobalPlusOne = kGlobal+1;
            TEUCHOS_ASSERT_EQUALITY( 0, B->BeginInsertGlobalValues(kGlobal, 1, &kGlobalPlusOne) );
//             for ( int i=0; i<blockSize; i++ )
//                 for ( int j=0; j<blockSize; j++ )
//                     s(i,j) = -1.0;
            s.Random();
            TEUCHOS_ASSERT_EQUALITY( 0, B->SubmitBlockEntry(s) );
            TEUCHOS_ASSERT_EQUALITY( 0, B->EndSubmitEntries() );
        }

    }
    TEUCHOS_ASSERT_EQUALITY( 0, B->FillComplete() );

    return B;
}
// =============================================================================
// From packages/epetra/test/VbrMatrix/cxx_main.cpp
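// Copy VbrIn, point row by point row, into a newly allocated Epetra_CrsMatrix
// (CrsOut) that uses the same point row and column maps, then FillComplete()
// the result.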
void
ConvertVbrToCrs( const Teuchos::RCP<const Epetra_VbrMatrix> & VbrIn,
                 Teuchos::RCP<Epetra_CrsMatrix>             & CrsOut
               )
{
  const Epetra_Map &E_Vbr_RowMap = VbrIn->RowMatrixRowMap() ;
  const Epetra_Map &E_Vbr_ColMap = VbrIn->RowMatrixColMap() ;
  //    const Epetra_Map &E_Vbr_RowMap = VbrIn->OperatorRangeMap() ;
  //    const Epetra_Map &E_Vbr_ColMap = VbrIn->OperatorDomainMap() ;

    CrsOut = Teuchos::rcp( new Epetra_CrsMatrix( Copy, E_Vbr_RowMap, E_Vbr_ColMap, 0 ) );
    //  CrsOut = new Epetra_CrsMatrix( Copy, E_Vbr_RowMap, 0 );
    int NumMyElements = VbrIn->RowMatrixRowMap().NumMyElements() ;
    std::vector<int> MyGlobalElements( NumMyElements );
    VbrIn->RowMatrixRowMap().MyGlobalElements( &MyGlobalElements[0] ) ;

    int NumMyColumns = VbrIn->RowMatrixColMap().NumMyElements() ;
    std::vector<int> MyGlobalColumns( NumMyColumns );
    VbrIn->RowMatrixColMap().MyGlobalElements( &MyGlobalColumns[0] ) ;

    int MaxNumIndices = VbrIn->MaxNumEntries();
    int NumIndices;
    std::vector<int> LocalColumnIndices(MaxNumIndices);
    std::vector<int> GlobalColumnIndices(MaxNumIndices);
    std::vector<double> MatrixValues(MaxNumIndices);

    for( int LocalRow=0; LocalRow<NumMyElements; ++LocalRow )
    {

      TEUCHOS_ASSERT_EQUALITY( 0, VbrIn->ExtractMyRowCopy( LocalRow,
                                                           MaxNumIndices,
                                                           NumIndices,
                                                           &MatrixValues[0],
                                                           &LocalColumnIndices[0]
                                                         )
                             );

      for (int j = 0 ; j < NumIndices ; j++ )
          GlobalColumnIndices[j] = MyGlobalColumns[ LocalColumnIndices[j] ]  ;
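      // Note: GlobalColumnIndices is only used by the InsertGlobalValues()
      // variant in the disabled #if 0 branch below.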

#if 0
      if ( CrsOut->InsertGlobalValues( MyGlobalElements[LocalRow],
                                      NumIndices,
                                      &MatrixValues[0],
                                      &GlobalColumnIndices[0] )!=0)abort();
#else
      TEUCHOS_ASSERT_EQUALITY( 0, CrsOut->InsertMyValues( LocalRow,
                                                          NumIndices,
                                                          &MatrixValues[0],
                                                          &LocalColumnIndices[0] )
                             );
#endif


    }
    TEUCHOS_ASSERT_EQUALITY( 0, CrsOut->FillComplete() );

    return;
}
// =============================================================================
int main ( int argc, char *argv[] )
{
  // Initialize MPI
#ifdef HAVE_MPI
  MPI_Init( &argc, &argv );
#endif

    // Create a communicator for Epetra objects
#ifdef HAVE_MPI
    Epetra_MpiComm eComm( MPI_COMM_WORLD );
#else
    Epetra_SerialComm eComm; // note: no parentheses, or this would declare a function
#endif

    const Teuchos::RCP<Teuchos::FancyOStream> out =
        Teuchos::VerboseObjectBase::getDefaultOStream();

    bool success = true;
    try {
      // ---------------------------------------------------------------------
      // handle command line arguments
      Teuchos::CommandLineProcessor My_CLP;

      My_CLP.setDocString (
              "Timer for Epetra_CrsMatrix::Apply() and Epetra_VbrMatrix::Apply().\n"
              );

      int blockSize = 20;
      My_CLP.setOption ( "blocksize",
                         &blockSize,
                         "block size",
                         true );

      // print warning for unrecognized arguments
      My_CLP.recogniseAllOptions ( true );

      // do throw exceptions
      My_CLP.throwExceptions ( true );

      // finally, parse the command line
      My_CLP.parse ( argc, argv );
      // ---------------------------------------------------------------------

      int n = 1000;  // number of block rows/columns
      int reps = 10; // number of Apply() repetitions per matrix

      // Create a tridiagonal block matrix.
      Teuchos::RCP<const Epetra_VbrMatrix> B = buildBlockMatrix( eComm, blockSize, n );

      // Create the same matrix as CrsMatrix.
      Teuchos::RCP<Epetra_CrsMatrix> C;
      ConvertVbrToCrs( B, C );

      // Time Epetra_VbrMatrix::Apply().
      Teuchos::RCP<Teuchos::Time> bTime =
          Teuchos::TimeMonitor::getNewTimer("Epetra_VbrMatrix::Apply()" );
      Epetra_Vector xB( B->DomainMap() );
      xB.Random();
      Epetra_Vector yB( B->RangeMap() );
      {
        Teuchos::TimeMonitor tm(*bTime);
        for ( int k=0; k<reps; k++ )
          B->Apply( xB, yB );
      }

      // Time Epetra_CrsMatrix::Apply().
      Teuchos::RCP<Teuchos::Time> cTime =
          Teuchos::TimeMonitor::getNewTimer("Epetra_CrsMatrix::Apply()" );
      Epetra_Vector xC( C->DomainMap() );
      xC.Random();
      Epetra_Vector yC( C->RangeMap() );
      {
        Teuchos::TimeMonitor tm(*cTime);
        for ( int k=0; k<reps; k++ )
          C->Apply( xC, yC );
      }
    }
    TEUCHOS_STANDARD_CATCH_STATEMENTS(true, *out, success);

    // Print timing data.
    Teuchos::TimeMonitor::summarize();    

#ifdef HAVE_MPI
      MPI_Finalize();
#endif

    return success ? EXIT_SUCCESS : EXIT_FAILURE;
}
// =============================================================================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: runtime_apply.pdf
Type: application/pdf
Size: 16838 bytes
Desc: not available
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20111213/283edac0/attachment.pdf

