[Trilinos-Users] Epetra slow down instead of speed up on local machine (OpenMPI?)

Gyorgy Matyasfalvi matyasfalvi at gmail.com
Thu Jun 19 09:59:56 MDT 2014


Hi Mike,

Thanks for your response! The largest vector I'm dealing with in this
particular case has dimension 277344, and I call Dot() on that vector. The
SumAll() is used on a small vector of dimension 263.
On Stampede I get almost perfect scaling up to 128 cores, which is why I
suspect I've made an error in the Trilinos configure script; maybe there is
some additional information I have to provide so that OpenMPI and Epetra can
work properly together. Do you know of any issues, additional packages, etc.
that need to be installed when using Epetra with OpenMPI?
Basically the only difference between the runs on Stampede and my local
16-core machine is the BLAS and MPI libraries, so I assume that's where
something goes wrong.

Thank you! Best,
Gyorgy




On Thu, Jun 19, 2014 at 11:43 AM, Heroux, Mike <MHeroux at csbsju.edu> wrote:

> Gyorgy,
>
> Is your problem sufficiently large to get an advantage from parallel
> execution?  Your runtimes are sufficiently long, but if the vector lengths
> passed in to Dot() and SumAll() are small, the overhead of MPI might be too
> high to see improvement.
>
> Mike
>
> From: Gyorgy Matyasfalvi <matyasfalvi at gmail.com>
> Date: Thursday, June 19, 2014 9:52 AM
> To: Bart Janssens <bart at bartjanssens.org>
> Cc: "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
> Subject: Re: [Trilinos-Users] Epetra slow down instead of speed up on
> local machine (OpenMPI?)
>
> Hi Bart,
>
> Thanks for your advice! I've rebuilt OpenBLAS so it's now single-threaded.
> The 1-core runtime decreased substantially: it's down to 47 seconds from the
> previous 129 seconds. That 47 seconds is realistic; the same run takes 36
> seconds on Stampede.
> Unfortunately I'm still struggling with the slowdown issue. With 2 cores the
> runtime jumps to 148 seconds, which is roughly three times as much as with a
> single core.
>
> Does anyone have an idea of what the issue could be? It seems communication
> takes far too much time. In my code the only functions that require
> communication are Epetra's Dot() and SumAll().
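>
> In case it's useful, below is a simplified sketch (not my actual code; the
> repetition count is arbitrary) of how I'd isolate the cost of those two
> calls, using the same sizes as in my problem:
>
> ***********************************************************************************************
> // Sketch: time the two communicating calls (Dot and SumAll) in isolation,
> // repeating each 1000 times so the elapsed time is easy to measure.
> #include <mpi.h>
> #include <cstdio>
> #include "Epetra_MpiComm.h"
> #include "Epetra_Map.h"
> #include "Epetra_Vector.h"
>
> int main(int argc, char** argv) {
>   MPI_Init(&argc, &argv);
>   Epetra_MpiComm Comm(MPI_COMM_WORLD);
>
>   // Same sizes as in my problem: Dot() on a vector of global length 277344,
>   // SumAll() on a local array of length 263.
>   Epetra_Map Map(277344, 0, Comm);
>   Epetra_Vector x(Map), y(Map);
>   x.PutScalar(1.0);
>   y.PutScalar(2.0);
>   double local[263], global[263];
>   for (int i = 0; i < 263; ++i) local[i] = 1.0;
>
>   Comm.Barrier();                      // start all ranks together
>   double t0 = MPI_Wtime();
>   double dot = 0.0;
>   for (int k = 0; k < 1000; ++k) x.Dot(y, &dot);
>   double tDot = MPI_Wtime() - t0;
>
>   Comm.Barrier();
>   t0 = MPI_Wtime();
>   for (int k = 0; k < 1000; ++k) Comm.SumAll(local, global, 263);
>   double tSum = MPI_Wtime() - t0;
>
>   if (Comm.MyPID() == 0)
>     std::printf("1000 Dot(): %g s, 1000 SumAll(): %g s (dot = %g)\n",
>                 tDot, tSum, dot);
>
>   MPI_Finalize();
>   return 0;
> }
> ***********************************************************************************************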
>
> Thanks in advance for any advice!
> Best,
> Gyorgy
>
>
> On Wed, Jun 18, 2014 at 2:14 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
> Hi Bart,
>
> Thanks for the quick response. I believe I didn't do that. This is what I
> did:
>
> ***************************
> $ make NO_AFFINITY=1
> ***************************
>
> And I got the following output:
>
> ***********************************************************************************************
> OpenBLAS build complete.
>
>   OS               ... Linux
>   Architecture     ... x86_64
>   BINARY           ... 64bit
>   C compiler       ... GCC  (command line : gcc)
>   Fortran compiler ... GFORTRAN  (command line : gfortran)
>   Library Name     ... libopenblas_sandybridgep-r0.2.8.a (Multi threaded;
> Max num-threads is 32)
>
> To install the library, you can run "make
> PREFIX=/path/to/your/installation install".
>
> ***********************************************************************************************
>
> I'll try to rebuild it as you suggested.
> Thank you! Best,
> Gyorgy
>
>
>
>
> On Wed, Jun 18, 2014 at 2:03 PM, Bart Janssens <bart at bartjanssens.org> wrote:
> On Wed, Jun 18, 2014 at 7:29 PM, Gyorgy Matyasfalvi <matyasfalvi at gmail.com> wrote:
> On the local machine I'm using OpenMPI and OpenBLAS; on Stampede, MVAPICH2
> and Intel MKL. I wonder if this could be the problem. Does anyone have
> experience with OpenMPI and Epetra? It seems to me there is a communication
> issue.
>
>
> Hi Gyorgy,
>
> Did you compile OpenBLAS without threads (i.e. set USE_THREAD=0)? This is
> necessary when combining it with MPI, since otherwise you may overload the
> machine, as OpenBLAS itself can spawn multiple threads.
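>
> Something along these lines should do it (the exact options may vary between
> OpenBLAS versions):
>
> ***************************
> $ make USE_THREAD=0 NO_AFFINITY=1
> $ make PREFIX=/path/to/your/installation install
> ***************************
>
> The build summary should then report a single-threaded library rather than
> "Multi threaded".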
>
> Cheers,
>
> Bart
>
>
>
>