[Trilinos-Users] [EXTERNAL] Re: Epetra slow down instead of speed up on local machine (OpenMPI?)

Phipps, Eric T etphipp at sandia.gov
Thu Jun 19 11:28:51 MDT 2014


Are you properly binding your MPI ranks to separate cores?  It’s acting like all or some of your MPI ranks are running on the same core.
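For example, with OpenMPI you can request core binding and have each rank's placement printed at startup with something like the following (this assumes OpenMPI 1.7/1.8-style options; older releases spell it --bind-to-core, and the executable name is just a placeholder):

***************************
$ mpirun -np 16 --bind-to core --report-bindings ./your_app
***************************

The --report-bindings output shows which core(s) each rank is bound to, so you can confirm the ranks are not all stacked on the same core.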

-Eric

From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
Date: Thursday, June 19, 2014 at 9:59 AM
To: "Heroux, Mike" <MHeroux at csbsju.edu>
Cc: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
Subject: [EXTERNAL] Re: [Trilinos-Users] Epetra slow down instead of speed up on local machine (OpenMPI?)

Hi Mike,

Thanks for your response! The largest vector I'm dealing with in this particular case has dimension 277344, and I call Dot() on that vector. SumAll() is used on a small vector of dimension 263.
On Stampede I get almost perfect scaling up to 128 cores, so I suspect I've made an error in the Trilinos configure script; maybe there is some additional information I have to provide so that OpenMPI and Epetra work together properly (see the configure sketch below for the kind of setup I mean). Do you know of any issues, additional packages, etc. that need to be installed when using Epetra with OpenMPI?
Basically, the only differences between the runs on Stampede and my local 16-core machine are the BLAS and MPI libraries, so I assume that's where something goes wrong.
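For reference, the MPI-related part of a Trilinos configure, as I understand it, should look roughly like the sketch below (the paths here are placeholders, not my actual ones); if there is anything beyond these options that OpenMPI needs, that may be what I'm missing:

***************************
cmake \
  -D CMAKE_BUILD_TYPE:STRING=RELEASE \
  -D TPL_ENABLE_MPI:BOOL=ON \
  -D MPI_BASE_DIR:PATH=/path/to/openmpi \
  -D Trilinos_ENABLE_Epetra:BOOL=ON \
  -D CMAKE_INSTALL_PREFIX:PATH=/path/to/trilinos-install \
  /path/to/Trilinos
***************************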

Thank you! Best,
Gyorgy




On Thu, Jun 19, 2014 at 11:43 AM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
Gyorgy,

Is your problem sufficiently large to get an advantage from parallel execution?  Your runtimes are sufficiently long, but if the vector lengths passed in to Dot() and SumAll() are small, the overhead of MPI might be too high to see improvement.
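As a rough yardstick only: a dot product of length n costs about 2n floating-point operations, so even a vector of a few hundred thousand entries is only a fraction of a millisecond of local work per call, while a single all-reduce among ranks on one node typically costs on the order of microseconds when the ranks are placed sensibly. MPI overhead should therefore only dominate if the vectors are much smaller than that, the reductions are called extremely often, or the ranks are competing for the same core.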

Mike

From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
Date: Thursday, June 19, 2014 9:52 AM
To: Bart Janssens <bart at bartjanssens.org>
Cc: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
Subject: Re: [Trilinos-Users] Epetra slow down instead of speed up on local machine (OpenMPI?)

Hi Bart,

Thanks for your advice! I've rebuilt OpenBLAS, so it is now single-threaded. The single-core runtime decreased substantially: it's down to 47 seconds from the previous 129 seconds. The 47 seconds is a realistic time; the same run takes 36 seconds on Stampede.
Unfortunately, I'm still struggling with the slowdown issue. With 2 cores the runtime jumps to 148 seconds, roughly three times as much as with a single core.

Does anyone have ideas about what the issue could be? It seems communication takes far too much time. In my code, the only functions that require communication are Epetra's Dot() and SumAll(); the simplified sketch below shows the pattern.
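If it helps, the communication in my code boils down to the following simplified sketch (the vector sizes shown are from my actual problem; the iteration count and the constant vector contents are made up for illustration): every iteration performs one global reduction inside Dot() and one inside SumAll().

***********************************************************************************************
#include <mpi.h>
#include <cstdio>
#include "Epetra_MpiComm.h"
#include "Epetra_Map.h"
#include "Epetra_Vector.h"
#include "Epetra_Time.h"

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);

  // Long distributed vector for Dot(), short local buffer for SumAll().
  Epetra_Map Map(277344, 0, Comm);
  Epetra_Vector x(Map), y(Map);
  x.PutScalar(1.0);
  y.PutScalar(2.0);

  double local[263], global[263];
  for (int i = 0; i < 263; ++i) local[i] = 1.0;

  Epetra_Time Timer(Comm);
  double dot = 0.0;
  for (int it = 0; it < 1000; ++it) {
    x.Dot(y, &dot);                  // one global reduction across ranks
    Comm.SumAll(local, global, 263); // a second global reduction per iteration
  }

  if (Comm.MyPID() == 0)
    std::printf("1000 iterations took %g s (dot = %g)\n", Timer.ElapsedTime(), dot);

  MPI_Finalize();
  return 0;
}
***********************************************************************************************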

Thanks for any advice in advance!
Best,
Gyorgy


On Wed, Jun 18, 2014 at 2:14 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
Hi Bart,

Thanks for the quick response. I don't believe I did that. This is what I did:

***************************
$ make NO_AFFINITY=1
***************************

And I got the following output:
***********************************************************************************************
OpenBLAS build complete.

  OS               ... Linux
  Architecture     ... x86_64
  BINARY           ... 64bit
  C compiler       ... GCC  (command line : gcc)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
  Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded; Max num-threads is 32)

To install the library, you can run "make PREFIX=/path/to/your/installation install".
***********************************************************************************************

I'll try to rebuild it as you suggested.
Thank you! Best,
Gyorgy




On Wed, Jun 18, 2014 at 2:03 PM, Bart Janssens <bart at bartjanssens.org> wrote:
On Wed, Jun 18, 2014 at 7:29 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
On the local machine I'm using OpenMPI and OpenBLAS; on Stampede, MVAPICH2 and Intel MKL. I wonder if this could be the problem. Does anyone have experience with OpenMPI and Epetra? It seems to me there is a communication issue.


Hi Gyorgy,

Did you compile OpenBLAS without threads (i.e., set USE_THREAD=0)? This is necessary when combining it with MPI, since otherwise you may overload the machine, as OpenBLAS itself can spawn multiple threads. A possible rebuild recipe is sketched below.
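Something along these lines should give you a single-threaded library (the install prefix is only an example):

***************************
$ make clean
$ make USE_THREAD=0 NO_AFFINITY=1
$ make PREFIX=/path/to/openblas-serial install
***************************

Alternatively, if you want to keep a threaded build around, setting the environment variable OPENBLAS_NUM_THREADS=1 when you run should also restrict OpenBLAS to one thread per process.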

Cheers,

Bart



