[Trilinos-Users] slow mv-product with FECrsMatrix

Mon Jan 24 18:21:12 MST 2011

 > But the most important factor in the load-balancing of the mat-vec,
 > is the distribution of the nonzeros of your matrix. If the nonzeros
 > are evenly distributed, then the times should be even. I would be
 > interested in seeing the 'min over procs' and 'max over procs' for
 > number-of-nonzeros in your matrix.

Thanks Alan for your suggestions.

I had the code run on a different machine (MPI with distributed 
memory) and still get similar timings.
I checked the number of nonzeros per process and indeed the figures are 
all similar.

Here's an output of the timings on 16 cores.

========================== *snip* ==========================
# 16 processes
Restart Index set, reading solution time step: 1

Process  0 has 263936 nonzeros.
Process  1 has 269796 nonzeros.
Process  2 has 266168 nonzeros.
Process  3 has 271452 nonzeros.
Process  4 has 269432 nonzeros.
Process  5 has 270224 nonzeros.
Process  6 has 267388 nonzeros.
Process  7 has 272420 nonzeros.
Process  8 has 268696 nonzeros.
Process  9 has 270256 nonzeros.
Process 10 has 267744 nonzeros.
Process 11 has 273908 nonzeros.
Process 12 has 268968 nonzeros.
Process 13 has 269748 nonzeros.
Process 14 has 272596 nonzeros.
Process 15 has 274908 nonzeros.

======================================================================================

                                  TimeMonitor Results

Timer Name      Min over procs    Avg over procs    Max over procs
--------------------------------------------------------------------------------------
2-norm calcula  3.481e-05 (1)     0.000645 (1)      0.001911 (1)
Data I/O        1.128 (1)         1.635 (1)         1.874 (1)
FVM entities    0.2606 (1)        0.5043 (1)        1.022 (1)
Graph constr    0.1538 (1)        0.1563 (1)        0.1594 (1)
MVP construc    0.01956 (1)       0.02698 (1)       0.03761 (1)
Matrix cons     0.1964 (1)        0.1987 (1)        0.2011 (1)
Matrix-vector   0.003169 (1)      0.006034 (1)      0.008662 (1)
element-wise    0.0006652 (1)     0.002725 (1)      0.009558 (1)
inner product   0.04846 (1)       0.04847 (1)       0.04848 (1)
======================================================================================
========================== *snap* ==========================

For the matrix-vector product, the slowest process takes roughly 2.7 
times as long as the fastest one (0.008662 vs. 0.003169), even though 
the nonzero counts differ by only a few percent.
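
To put numbers on that comparison, here is a minimal sketch in plain C++ (no Trilinos needed). It computes the max/min ratio for the nonzero counts and for the "Matrix-vector" timer, using only the figures from the output above. The names maxOverMin, kNnzPerProc, kMatVecMinMax and printImbalance are made up for this illustration and are not part of Epetra or Teuchos.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>

// Ratio of the largest to the smallest entry; 1.0 means perfectly even.
// (Hypothetical helper, not a Trilinos function.)
double maxOverMin(const std::vector<double>& v) {
    assert(!v.empty());
    const auto mm = std::minmax_element(v.begin(), v.end());
    return *mm.second / *mm.first;
}

// Nonzeros per process, copied from the 16-process output above.
const std::vector<double> kNnzPerProc = {
    263936, 269796, 266168, 271452, 269432, 270224, 267388, 272420,
    268696, 270256, 267744, 273908, 268968, 269748, 272596, 274908};

// "Min over procs" and "Max over procs" of the Matrix-vector timer.
const std::vector<double> kMatVecMinMax = {0.003169, 0.008662};

// Print both imbalance ratios side by side.
void printImbalance() {
    std::printf("nonzeros max/min: %.3f\n", maxOverMin(kNnzPerProc));
    std::printf("mat-vec  max/min: %.3f\n", maxOverMin(kMatVecMinMax));
}
```

Calling printImbalance() gives a ratio of about 1.04 for the nonzeros and about 2.73 for the mat-vec time, so the nonzero distribution alone does not seem to account for the spread in the timings.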

Are there other factors to pay attention to?

--Nico

On 01/24/2011 04:28 PM, Williams, Alan B wrote:
>> I just performed some simple timings for one matrix-vector product with
>> an Epetra_FECrsMatrix, distributed over 48 cores of a shared-memory
>> machine. After the matrix construction, keoMatrix.GlobalAssemble() is
>> called to optimize the storage.
>
> GlobalAssemble *probably* optimizes storage, but it depends on your usage. GlobalAssemble optionally calls FillComplete, which optionally optimizes storage...
>
>
>> RangeMap and DomainMap show that rows and columns are about
>> evenly spread over the cores, and when performing the actual
>> mv-product,
>>
>>      M->Apply( *epetra_x, *epetra_b );
>>
>> epetra_x has the DomainMap and epetra_b has the RangeMap of M.
>
> It's often difficult to figure out how the domain-map and range-map relate to the row-map and column-map, etc.
> But the most important factor in the load-balancing of the mat-vec, is the distribution of the nonzeros of your matrix. If the nonzeros are evenly distributed, then the times should be even. I would be interested in seeing the 'min over procs' and 'max over procs' for number-of-nonzeros in your matrix.
>
> Alan
>
>