[Trilinos-Users] [EXTERNAL] ML/AztecOO scalability

Eric Marttila eric.marttila at thermoanalytics.com
Wed Feb 1 04:47:17 MST 2012


Mike,
Thanks for the information.  I will continue with my implementation and hope 
for better speedups when I deploy on multi-node systems.
--Eric

On Tuesday, January 31, 2012 10:31:12 am Heroux, Michael A wrote:
> Eric,
> 
> It's hard to tell from the information you give whether or not you are
> getting the optimal performance from your system.  However, your results
> are within the range of possible speedups.  It is difficult to realize
> linear speedup on a multicore node since the algorithms you are using have
> heavy memory system performance demands.
> 
> Since the memory system is shared by the cores on your machine, when you
> run on a single core the memory system is dedicated to serving that single
> core.  Similarly with two cores, each has half of the memory system to
> support it.  As you add cores they start to compete for this limited
> resource.
> 
> You shouldn't use this as a harbinger of general scalability.  Most users
> of Trilinos run on multi-node systems.  In this situation, the memory
> system capabilities scale with the number of nodes.  Trilinos has been
> used to scale on the largest machines around, and ML/AztecOO is well-known
> for its scalability.
> 
> Furthermore, even on larger multicore systems, the memory system is
> segmented so you should see some improvement at larger core counts.
> 
> I hope this helps.
> 
> Mike
> 
> On 1/31/12 8:48 AM, "Eric Marttila" <eric.marttila at thermoanalytics.com>
> wrote:
> >Dear all,
> >
> >I've used ML and AztecOO for solving linear systems on a single processor.
> >Now I'm looking at running on multiple processors.  As a first step I
> >compiled and ran the ML/AztecOO example (solving Laplace equation) from:
> >
> >http://code.google.com/p/trilinos/wiki/MLAztecOO
> >
> >I used a problem size of 8 million and ran with 1, 2, 3, and 4 processors.
> >The solution times for each case are listed below:
> >
> >np=1               Solution time: 38.168500 (sec.)
> >np=2               Solution time: 26.174297 (sec.)
> >np=3               Solution time: 21.492841 (sec.)
> >np=4               Solution time: 21.108066 (sec.)
> >
> >Can anyone comment on whether or not these timing results are reasonable?
> >I was expecting that I would see close to linear speedup, but with this
> >case I see a speedup of only 1.8 with 4 processors, and based on the trend
> >there would be little or no additional speedup with more processors.  I
> >tested with smaller problem sizes and saw the same results.
> >
> >I'm using Trilinos 10.8.5, configured as release, with MPI enabled.  I'm
> >using mpich2 and the tests were run on a quad-core (Intel i7) with 8 GB of
> >memory.  (OS is 64-bit Fedora 14 Linux.)
> >
> >I would appreciate any comments and/or suggestions of what I could do to
> >improve the scalability.
> >
> >Thank you.
> >--Eric
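
For readers who want to check the speedup figure in the quoted message, here
is a minimal sketch that derives speedup and parallel efficiency from the four
solution times listed above.  It is not part of the Trilinos example; the
names solution_times and scaling_table are made up for illustration, and the
only inputs are the timings from the original post.

```python
# Wall-clock solution times (seconds) copied from the np=1..4 runs quoted above.
solution_times = {1: 38.168500, 2: 26.174297, 3: 21.492841, 4: 21.108066}


def scaling_table(times):
    """Return {np: (speedup, efficiency)} relative to the single-process run."""
    t1 = times[1]
    table = {}
    for nprocs in sorted(times):
        speedup = t1 / times[nprocs]    # S(p) = T(1) / T(p)
        efficiency = speedup / nprocs   # E(p) = S(p) / p
        table[nprocs] = (speedup, efficiency)
    return table


table = scaling_table(solution_times)
for nprocs, (speedup, efficiency) in table.items():
    print(f"np={nprocs}  speedup={speedup:.2f}  efficiency={efficiency:.0%}")
```

With these numbers the speedup at np=4 comes out at about 1.8, matching the
figure in the original message, which corresponds to a parallel efficiency
below 50%.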

-- 
Eric A. Marttila
ThermoAnalytics, Inc.
23440 Airpark Blvd.
Calumet, MI 49913

email: Eric.Marttila at ThermoAnalytics.com
phone: 810-636-2443
fax:   906-482-9755
web: http://www.thermoanalytics.com


