[Trilinos-Users] [EXTERNAL] Re: Epetra slow down instead of speed up on local machine (OpenMPI?)

Gyorgy Matyasfalvi matyasfalvi at gmail.com
Fri Jun 20 09:01:26 MDT 2014


Thanks to everyone for the help!
Eric and Bart, you guys are great! That was the problem! Once I rebuilt
OpenBLAS to run single-threaded and added the --bind-to-core flag to the
runs, everything worked as expected.
It's interesting, however, that other MPI applications that didn't use
BLAS (for example, a particle simulator I used to run) had no scaling
issues at all. Anyway, thanks again for all the help!
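
For anyone who finds this thread later, the fix amounted to the following
(the install prefix and executable name below are placeholders from my
setup, not exact paths):

***************************
$ make clean
$ make USE_THREAD=0 NO_AFFINITY=1
$ make PREFIX=/path/to/your/installation install
$ mpirun -np 16 --bind-to-core ./solver
***************************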
Best,
George



On Thu, Jun 19, 2014 at 1:28 PM, Phipps, Eric T <etphipp at sandia.gov> wrote:

>  Are you properly binding your MPI ranks to separate cores?  It’s acting
> like all or some of your MPI ranks are running on the same core.
>
>  -Eric
>
>   From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
> Date: Thursday, June 19, 2014 at 9:59 AM
> To: "Heroux, Mike" <MHeroux at csbsju.edu>
> Cc: "trilinos-users at software.sandia.gov" <
> trilinos-users at software.sandia.gov>
> Subject: [EXTERNAL] Re: [Trilinos-Users] Epetra slow down instead of
> speed up on local machine (OpenMPI?)
>
>     Hi Mike,
>
>  Thanks for your response! The largest vector I'm dealing with in this
> particular case has dimension 277344; I call Dot() on that vector.
> SumAll() is used on a small vector of dimension 263.
>  On Stampede I get almost perfect scaling up to 128 cores, so I suspect
> I've made an error in the Trilinos configure script; maybe there is some
> additional information I have to provide so that OpenMPI and Epetra can
> work properly together. Do you know of any issues, additional packages,
> etc. that need to be installed for Epetra when using it with OpenMPI?
>  Basically the only differences between the runs on Stampede and my local
> 16-core machine are the BLAS and MPI libraries, so I assume that's where
> something goes wrong.
>
>  Thank you! Best,
> Gyorgy
>
>
>
>
> On Thu, Jun 19, 2014 at 11:43 AM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>
>> Gyorgy,
>>
>> Is your problem sufficiently large to get an advantage from parallel
>> execution?  Your runtimes are sufficiently long, but if the vector lengths
>> passed in to Dot() and SumAll() are small, the overhead of MPI might be too
>> high to see improvement.
>>
>> Mike
>>
>> From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
>> Date: Thursday, June 19, 2014 9:52 AM
>> To: Bart Janssens <bart at bartjanssens.org>
>> Cc: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] Epetra slow down instead of speed up on
>> local machine (OpenMPI?)
>>
>> Hi Bart,
>>
>> Thanks for your advice! I've rebuilt OpenBLAS, so now it's
>> single-threaded. The 1-core runtime decreased substantially: it's down
>> to 47 seconds, compared to the previous 129 seconds. The 47 seconds is a
>> realistic time; it takes 36 seconds on Stampede.
>> Unfortunately, I'm still struggling with the slowdown issue. With 2
>> cores the runtime jumps to 148 seconds, which is three times as much as
>> with a single core.
>>
>> Does anyone have an idea what the issue could be? It seems communication
>> takes far too much time. In my code the only functions that require
>> communication are Epetra's Dot() and SumAll().
>>
>> Thanks for any advice in advance!
>> Best,
>> Gyorgy
>>
>>
>>  On Wed, Jun 18, 2014 at 2:14 PM, Gyorgy Matyasfalvi <
>> matyasfalvi at gmail.com> wrote:
>> Hi Bart,
>>
>> Thanks for the quick response. I believe I didn't do that. This is what
>> I did:
>>
>> ***************************
>> $ make NO_AFFINITY=1
>> ***************************
>>
>> And I got the following output:
>>
>> ***********************************************************************************************
>> OpenBLAS build complete.
>>
>>   OS               ... Linux
>>   Architecture     ... x86_64
>>   BINARY           ... 64bit
>>   C compiler       ... GCC  (command line : gcc)
>>   Fortran compiler ... GFORTRAN  (command line : gfortran)
>>   Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded;
>> Max num-threads is 32)
>>
>> To install the library, you can run "make
>> PREFIX=/path/to/your/installation install".
>>
>> ***********************************************************************************************
>>
>> I'll try to rebuild it as you suggested.
>> Thank you! Best,
>> Gyorgy
>>
>>
>>
>>
>>  On Wed, Jun 18, 2014 at 2:03 PM, Bart Janssens <bart at bartjanssens.org>
>> wrote:
>> On Wed, Jun 18, 2014 at 7:29 PM, Gyorgy Matyasfalvi <
>> matyasfalvi at gmail.com> wrote:
>> On the local machine I'm using OpenMPI and OpenBLAS; on Stampede,
>> MVAPICH2 and Intel MKL. I wonder if this could be the problem. Does
>> anyone have experience with OpenMPI and Epetra? It seems to me there is
>> a communication issue.
>>
>>
>> Hi Gyorgy,
>>
>> Did you compile OpenBLAS without threads (i.e. set USE_THREAD = 0)? This
>> is necessary when combining it with MPI, since otherwise you may
>> overload the machine, as OpenBLAS itself can spawn multiple threads.
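>>
>> For example, rebuilding from a clean tree would look something like this
>> (the install prefix is only an illustration):
>>
>> ***************************
>> $ make clean
>> $ make USE_THREAD=0
>> $ make PREFIX=/path/to/your/installation install
>> ***************************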
>>
>> Cheers,
>>
>> Bart
>>
>>
>>
>>
>