[Trilinos-Users] slow mv-product with FECrsMatrix
Nico Schlömer
nico.schloemer at ua.ac.be
Sun Jan 23 12:50:37 MST 2011
> Are you using MPI for the parallelism (or OpenMP)? I am assuming MPI.
That's right.
> Can you do other MPI based computations with good speedup?
I ran some more tests for Epetra routines with the same setup and found
that several other methods scale poorly as well. The attached plot shows
that only Epetra_Vector::Multiply() performs better.
All measurements reported are with respect to "Max over procs".
Typically, however, the times vary a *lot* across processes, even though
the data is roughly evenly distributed.
======================================================================================
TimeMonitor Results
Timer Name Min over procs Avg over procs Max over procs
--------------------------------------------------------------------------------------
2-norm 0.0001609 (1) 0.0006352 (1) 0.0013 (1)
Matrix-vector 0.009701 (1) 0.02007 (1) 0.0399 (1)
element-wise 0.000134 (1) 0.0004415 (1) 0.0007088 (1)
inner product 0.001586 (1) 0.001593 (1) 0.001597 (1)
======================================================================================
This could very well be related to the hardware rather than Epetra
itself. I'll try to run the code on a larger distributed-memory machine
today to get some figures out.
Cheers,
Nico
On 01/22/2011 03:33 AM, Heroux, Michael A wrote:
> Nico,
>
> Are you using MPI for the parallelism (or OpenMP)? I am assuming MPI.
>
> There are known issues with MPI mapping to multicore nodes. I don't know if
> any of these are issues for you.
>
> Can you do other MPI based computations with good speedup? If so, then it
> might be something Epetra-specific. Is this something you can confirm?
>
> Thanks.
>
> Mike
>
>
> On 1/21/11 8:21 PM, "Nico Schlömer"<nico.schloemer at ua.ac.be> wrote:
>
>> Hi all,
>>
>> I just performed some simple timings for one matrix-vector product with
>> an Epetra_FECrsMatrix, distributed over 48 cores of a shared-memory
>> machine. After the matrix construction, keoMatrix.GlobalAssemble() is
>> called to optimize the storage.
>> The RangeMap and DomainMap show that rows and columns are spread about
>> evenly over the cores, and when performing the actual mv-product,
>>
>> M->Apply( *epetra_x, *epetra_b );
>>
>> epetra_x has the DomainMap and epetra_b has the RangeMap of M.
>>
>> I expected the product to take approximately equally long on each of
>> the processes, so I was surprised to see
>>
>> ======================================================================================
>> TimeMonitor Results
>>
>> Timer Name                     Min over procs   Avg over procs   Max over procs
>> --------------------------------------------------------------------------------------
>> Matrix-vector multiplication   0.009653 (1)     0.01869 (1)      0.03121 (1)
>> ======================================================================================
>>
>> There are cases where T_max/T_min > 5, too.
>>
>> This of course destroys the parallel efficiency of the mv-products.
>>
>> Any hint on what may possibly cause this?
>>
>> Cheers,
>> Nico
>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speedup.pdf
Type: application/pdf
Size: 18778 bytes
Desc: not available
Url : https://software.sandia.gov/pipermail/trilinos-users/attachments/20110123/401b2f1a/attachment-0001.pdf