[Trilinos-Users] [EXTERNAL] Re: Using OpenMP support in Trilinos

Heroux, Michael A maherou at sandia.gov
Wed Sep 19 20:56:32 MDT 2012


Here are a few comments:

- Your problem size is certainly sufficient to realize some parallel speedup.
- ML (which I assume you are using) will not see any improvement from OpenMP parallelism.  It is not instrumented for it.
- This means the only code that can run in parallel is the sparse matrix-vector multiply (SpMV) and the vector updates.  Since ML is usually more than 50% of the total runtime, you won't see much improvement from threading, even once the other issues are resolved.
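
To put a rough number on that last point, here is a small sketch based on Amdahl's law. It assumes, purely for illustration, that a fixed fraction of the serial runtime (the SpMV and vector updates) scales perfectly with thread count while the rest (ML) does not speed up at all. The function name amdahl_speedup and the 50% figure used in the comments are assumptions, not measurements from this problem.

```cpp
#include <cassert>

// Amdahl's law: best-case overall speedup when only part of the
// runtime is threaded.
//   threaded_fraction: share of the serial runtime that is threaded (0..1)
//   num_threads:       number of threads applied to that share (>= 1)
double amdahl_speedup(double threaded_fraction, int num_threads) {
    assert(threaded_fraction >= 0.0 && threaded_fraction <= 1.0);
    assert(num_threads >= 1);
    const double serial_part = 1.0 - threaded_fraction;            // e.g. ML
    const double threaded_part = threaded_fraction / num_threads;  // e.g. SpMV, vector updates
    return 1.0 / (serial_part + threaded_part);
}

// Example: amdahl_speedup(0.5, 4) evaluates 1 / (0.5 + 0.125).
```

Under that assumption, 4 threads give at most about 1.6x, and no number of threads can do better than 2x, because the unthreaded half of the runtime is untouched.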

A few suggestions:
- Try running your problem without ML, just to see if you get any improvement in the SpMV and vector operations.
- If you are using GMRES, make sure you link with a threaded BLAS.  DGEMV is the main kernel of GMRES other than SpMV and will need to be executed in threaded mode.
- Make sure your timer is a wall-clock timer, not a CPU timer.  A reasonable choice is the one that comes with OpenMP, omp_get_wtime().
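
On the timer point, a minimal sketch of the difference, using only the C++ standard library so it builds without OpenMP. On POSIX systems std::clock() reports CPU time for the whole process, so with several busy threads it can grow faster than elapsed time and make a threaded run look slower than it really is. The names Timing and time_region are hypothetical helpers for illustration. In an OpenMP build, omp_get_wtime() from <omp.h> could stand in for the steady_clock calls.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Elapsed (wall-clock) time and process CPU time for one region of code.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Runs work() once and measures it with both kinds of timer.
template <typename Work>
Timing time_region(Work&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();  // e.g. the call into the solver

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    assert(t.wall_seconds >= 0.0);  // steady_clock never runs backwards
    return t;
}
```

When judging threading speedup, wall_seconds is the number to compare between runs. If cpu_seconds is noticeably larger than wall_seconds, that is a sign that several threads were busy at once, not that the solve took longer.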

I hope this helps.  Let me know what you find out.

Mike


On Sep 19, 2012, at 8:39 PM, "Eric Marttila" <eric.marttila at thermoanalytics.com> wrote:

> Mike,
> The problem size is 1 million unknowns. I have Trilinos compiled with MPI enabled. However, I'm launching my program with only one MPI process.
> Here is some system information:
> Processors: Dual Intel Xeon E5645 Hex-Core / 2.4 GHz / Cache: 12MB
> RAM: 96 GB
> OS: CentOS 6.2 64-bit.
> When solving for 1 million unknowns on this system, AztecOO reports the following solution times:
> Solution time: 3.3 seconds (using Trilinos with OpenMP disabled)
> Solution time: 4.0 seconds (using Trilinos with OpenMP enabled)
> I had OMP_NUM_THREADS set to 4.
> If I set OMP_NUM_THREADS to 1 then I get 3.3 seconds in both cases.
> Thanks for your help.
> --Eric
> On Wednesday, September 19, 2012 08:48:20 pm Heroux, Michael A wrote:
> > Eric,
> >
> > Can you give some details about problem size, use of MPI (or not), type of
> > system, etc.
> >
> > Thanks.
> >
> > Mike
> >
> > On Sep 19, 2012, at 3:17 PM, Eric Marttila wrote:
> > > Hello,
> > >
> > > I'm using AztecOO and ML to solve a linear system. I've been running my
> > > simulation in serial mode, but now I would like to take advantage of
> > > multiple cores by using the OpenMP support that is available in
> > > Trilinos. I realize that the packages I'm using are not fully
> > > multi-threaded with OpenMP, but I'm hoping for some performance
> > > improvement since some of the packages I'm using have at least some
> > > level of OpenMP support.
> > >
> > > I reconfigured and built Trilinos 10.12.2 with
> > >
> > > -D Trilinos_ENABLE_OpenMP:BOOL=ON
> > >
> > > ...but when I run my simulation I see that it is slower than if I have
> > > Trilinos configured without the above option. I have set the environment
> > > variable OMP_NUM_THREADS to the desired number of threads.
> > >
> > > I was also able to reproduce this behavior with one of the Trilinos
> > > example programs (attached below), so I suspect I am missing something
> > > obvious in using the OpenMP support.
> > >
> > > Does anybody have thoughts on what I might be missing?
> > >
> > > Thanks.
> > > --Eric
> --
> Eric A. Marttila
> ThermoAnalytics, Inc.
> 23440 Airpark Blvd.
> Calumet, MI 49913
> email: Eric.Marttila at ThermoAnalytics.com
> phone: 810-636-2443
> fax: 906-482-9755
> web: http://www.thermoanalytics.com


