[Trilinos-Users] slow mv-product with FECrsMatrix

Fri Jan 21 19:21:57 MST 2011

Hi all,

I just performed some simple timings for one matrix-vector product with 
an Epetra_FECrsMatrix, distributed over 48 cores of a shared-memory 
machine. After the matrix construction, keoMatrix.GlobalAssemble() is 
called to optimize the storage.
RangeMap and DomainMap show that rows and columns are about 
evenly spread over the cores, and when performing the actual mv-product,

    M->Apply( *epetra_x, *epetra_b );

epetra_x has the DomainMap and epetra_b has the RangeMap of M.

I expected that the product would take approximately equally long on 
each of the processes, so I was surprised to see

======================================================================================
                                  TimeMonitor Results

Timer Name                      Min over procs    Avg over procs    Max over procs
--------------------------------------------------------------------------------------
Matrix-vector multiplication    0.009653 (1)      0.01869 (1)       0.03121 (1)
======================================================================================

There are cases where T_max/T_min > 5, too.

This of course destroys the parallel efficiency of the mv-products.
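
To put a number on this, here is a minimal sketch in plain C++ (no Trilinos calls; the struct and function names are made up for illustration) that computes the imbalance ratio T_max/T_min and a load-balance efficiency T_avg/T_max from the min/avg/max figures in the table above. It assumes that every process has to wait for the slowest one at the next synchronization point, which may not hold exactly in practice.

```cpp
#include <cassert>

// Hypothetical container for the three columns that TimeMonitor prints
// for one timer (times in seconds).
struct TimingSummary {
  double min;  // fastest process
  double avg;  // mean over all processes
  double max;  // slowest process
};

// Ratio of slowest to fastest process; 1.0 would mean perfect balance.
double imbalanceRatio(const TimingSummary& s) {
  assert(s.min > 0.0);
  return s.max / s.min;
}

// Fraction of the slowest process's time that an average process spends
// working, ASSUMING everybody waits for the slowest process.
double loadBalanceEfficiency(const TimingSummary& s) {
  assert(s.max > 0.0);
  return s.avg / s.max;
}
```

Filling in TimingSummary{0.009653, 0.01869, 0.03121} from the table gives T_max/T_min of about 3.2 and an efficiency of about 0.60, i.e. on average roughly 40% of the time of each mv-product would be spent waiting.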

Any hint on what may possibly cause this?

Cheers,
Nico