[Trilinos-Users] Epetra slow down instead of speed up on local machine (OpenMPI?)

Heroux, Mike MHeroux at CSBSJU.EDU
Thu Jun 19 09:43:25 MDT 2014


Gyorgy,

Is your problem sufficiently large to get an advantage from parallel execution?  Your runtimes are sufficiently long, but if the vector lengths passed to Dot() and SumAll() are small, the per-call overhead of MPI might be too high to see improvement.

Mike

From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
Date: Thursday, June 19, 2014 9:52 AM
To: Bart Janssens <bart at bartjanssens.org>
Cc: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
Subject: Re: [Trilinos-Users] Epetra slow down instead of speed up on local machine (OpenMPI?)

Hi Bart,

Thanks for your advice! I've rebuilt OpenBLAS so it's now single-threaded. The 1-core runtime decreased substantially: it's down to 47 seconds from the previous 129 seconds. The 47 seconds is realistic, since the same run takes 36 seconds on Stampede.
Unfortunately I'm still struggling with the slowdown issue. With 2 cores the runtime jumps to 148 seconds, which is three times as much as with a single core.

Does anyone have an idea what the issue could be? It seems communication takes far too much time. In my code the only functions that require communication are Epetra's Dot() and SumAll().

Thanks for any advice in advance!
Best,
Gyorgy


On Wed, Jun 18, 2014 at 2:14 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
Hi Bart,

Thanks for the quick response. I believe I didn't do that. This is what I did:

***************************
$ make NO_AFFINITY=1
***************************

And I got the following output:
***********************************************************************************************
OpenBLAS build complete.

  OS               ... Linux
  Architecture     ... x86_64
  BINARY           ... 64bit
  C compiler       ... GCC  (command line : gcc)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
  Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded; Max num-threads is 32)

To install the library, you can run "make PREFIX=/path/to/your/installation install".
***********************************************************************************************

I'll try to rebuild it as you suggested.
Thank you! Best,
Gyorgy




On Wed, Jun 18, 2014 at 2:03 PM, Bart Janssens <bart at bartjanssens.org> wrote:
On Wed, Jun 18, 2014 at 7:29 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
On the local machine I'm using OpenMPI and OpenBLAS; on Stampede, MVAPICH2 and Intel MKL. I wonder if this could be the problem. Does anyone have experience with OpenMPI and Epetra? It seems to me there is a communication issue.


Hi Gyorgy,

Did you compile OpenBLAS without threads (i.e. with USE_THREAD = 0)? This is necessary when combining it with MPI; otherwise you may overload the machine, since OpenBLAS itself can spawn multiple threads.
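Putting the flags together, a single-threaded rebuild would look like the following (the PREFIX path is a placeholder, as in the OpenBLAS build output quoted earlier in this thread):

```shell
# Rebuild OpenBLAS single-threaded; USE_THREAD=0 disables OpenBLAS's
# own thread pool, NO_AFFINITY=1 avoids pinning conflicts with MPI.
make clean
make USE_THREAD=0 NO_AFFINITY=1
make PREFIX=/path/to/your/installation install
```

As a quick check without rebuilding, setting the environment variable OPENBLAS_NUM_THREADS=1 before running should also limit a multithreaded OpenBLAS to one thread.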

Cheers,

Bart




