[Trilinos-Users] slow mv-product with FECrsMatrix
Nico Schlömer
nico.schloemer at ua.ac.be
Sun Jan 23 12:50:37 MST 2011
> Are you using MPI for the parallelism (or OpenMP)? I am assuming MPI.
That's right.
> Can you do other MPI based computations with good speedup?
I ran some more tests for Epetra routines with the same setup and found
that several other methods scale poorly as well. The attached plot shows
that only Epetra_Vector::Multiply() performs better.
All measurements reported are with respect to "Max over procs".
Typically, however, the times vary a *lot* across processes, even though
the data is roughly evenly distributed.
======================================================================================
TimeMonitor Results
Timer Name Min over procs Avg over procs Max over procs
--------------------------------------------------------------------------------------
2-norm 0.0001609 (1) 0.0006352 (1) 0.0013 (1)
Matrix-vector 0.009701 (1) 0.02007 (1) 0.0399 (1)
element-wise 0.000134 (1) 0.0004415 (1) 0.0007088 (1)
inner product 0.001586 (1) 0.001593 (1) 0.001597 (1)
======================================================================================
This could very well be related to the hardware rather than Epetra
itself. I'll try to run the code on a larger distributed-memory machine
today to get some figures out.
Cheers,
Nico
On 01/22/2011 03:33 AM, Heroux, Michael A wrote:
> Nico,
>
> Are you using MPI for the parallelism (or OpenMP)? I am assuming MPI.
>
> There are known issues with MPI mapping to multicore nodes. I don't know if
> any of these are issues for you.
>
> Can you do other MPI based computations with good speedup? If so, then it
> might be something Epetra-specific. Is this something you can confirm?
>
> Thanks.
>
> Mike
>
>
> On 1/21/11 8:21 PM, "Nico Schlömer"<nico.schloemer at ua.ac.be> wrote:
>
>> Hi all,
>>
>> I just performed some simple timings for one matrix-vector product with
>> an Epetra_FECrsMatrix, distributed over 48 cores of a shared-memory
>> machine. After the matrix construction, keoMatrix.GlobalAssemble() is
>> called to optimize the storage.
>> The RangeMap and DomainMap show that rows and columns are spread about
>> evenly over the cores, and when performing the actual mv-product,
>>
>> M->Apply( *epetra_x, *epetra_b );
>>
>> epetra_x has the DomainMap and epetra_b has the RangeMap of M.
>>
>> I expected the product to take approximately equally long on each of
>> the processes, so I was surprised to see
>>
>> ======================================================================================
>> TimeMonitor Results
>>
>> Timer Name                     Min over procs   Avg over procs   Max over procs
>> --------------------------------------------------------------------------------------
>> Matrix-vector multiplication   0.009653 (1)     0.01869 (1)      0.03121 (1)
>> ======================================================================================
>>
>> There are cases where T_max/T_min > 5, too.
>>
>> This of course destroys the parallel efficiency of the mv-products.
>>
>> Any hint on what may possibly cause this?
>>
>> Cheers,
>> Nico
>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speedup.pdf
Type: application/pdf
Size: 18778 bytes
Desc: not available
Url : https://software.sandia.gov/pipermail/trilinos-users/attachments/20110123/401b2f1a/attachment-0001.pdf