[Trilinos-Users] [EXTERNAL] ML/AztecOO scalability
Eric Marttila
eric.marttila at thermoanalytics.com
Wed Feb 1 04:47:17 MST 2012
Mike,
Thanks for the information. I will continue with my implementation and hope
for better speedups when I deploy on multi-node systems.
--Eric
On Tuesday, January 31, 2012 10:31:12 am Heroux, Michael A wrote:
> Eric,
>
> It's hard to tell from the information you give whether or not you are
> getting the optimal performance from your system. However, your results
> are within the range of possible speedups. It is difficult to realize
> linear speedup on a multicore node since the algorithms you are using have
> heavy memory system performance demands.
>
> Since the memory system is shared by the cores on your machine, when you
> run on a single core the memory system is dedicated to serving that single
> core. Similarly with two cores, each has half of the memory system to
> support it. As you add cores they start to compete for this limited
> resource.
>
> You shouldn't use this as a harbinger of general scalability. Most users
> of Trilinos run on multi-node systems. In this situation, the memory
> system capabilities scale with the number of nodes. Trilinos has been
> used to scale on the largest machines around, and ML/AztecOO is well-known
> for its scalability.
>
> Furthermore, even on larger multicore systems, the memory system is
> segmented so you should see some improvement at larger core counts.
>
> I hope this helps.
>
> Mike
>
> On 1/31/12 8:48 AM, "Eric Marttila" <eric.marttila at thermoanalytics.com> wrote:
> >Dear all,
> >
> >I've used ML and AztecOO for solving linear systems on a single processor.
> >Now I'm looking at running on multiple processors. As a first step I
> >compiled and ran the ML/AztecOO example (solving the Laplace equation) from:
> >
> >http://code.google.com/p/trilinos/wiki/MLAztecOO
> >
> >I used a problem size of 8 million and ran with 1, 2, 3, and 4 processors.
> >The solution times for each case are listed below:
> >
> >np=1 Solution time: 38.168500 (sec.)
> >np=2 Solution time: 26.174297 (sec.)
> >np=3 Solution time: 21.492841 (sec.)
> >np=4 Solution time: 21.108066 (sec.)
> >
> >Can anyone comment on whether or not these timing results are reasonable?
> >I was expecting to see close to linear speedup, but in this case I see a
> >speedup of only 1.8 with 4 processors, and based on the trend there would
> >be little or no additional speedup with more processors. I tested with
> >smaller problem sizes and saw the same results.
> >
> >I'm using Trilinos 10.8.5, configured as release, with MPI enabled. I'm
> >using mpich2, and the tests were run on a quad-core Intel i7 with 8 GB of
> >memory (OS is 64-bit Linux, Fedora 14).
> >
> >I would appreciate any comments and/or suggestions of what I could do to
> >improve the scalability.
> >
> >Thank you.
> >--Eric
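The speedups discussed above can be checked directly from the four timings in the original message. The sketch below computes speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p relative to the np=1 run. The helper name `speedup_table` is hypothetical, chosen for illustration; only the timing values come from the message.

```python
# Solution times (seconds) reported in the original message for the
# ML/AztecOO Laplace example, keyed by number of MPI processes.
timings = {
    1: 38.168500,
    2: 26.174297,
    3: 21.492841,
    4: 21.108066,
}


def speedup_table(times):
    """Return (np, speedup, efficiency) tuples relative to the np=1 run.

    speedup    S_p = T_1 / T_p
    efficiency E_p = S_p / p   (1.0 would be perfect linear scaling)
    """
    t1 = times[1]
    rows = []
    for p in sorted(times):
        s = t1 / times[p]
        rows.append((p, s, s / p))
    return rows


if __name__ == "__main__":
    for p, s, e in speedup_table(timings):
        print(f"np={p}  speedup={s:.2f}  efficiency={e:.0%}")
```

Running it gives speedups of roughly 1.46, 1.78, and 1.81 for np=2, 3, and 4 (efficiencies of about 73%, 59%, and 45%). Going from 3 to 4 processes improves the solve time by less than 2%, which fits the shared-memory-bandwidth explanation in Mike's reply.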
--
Eric A. Marttila
ThermoAnalytics, Inc.
23440 Airpark Blvd.
Calumet, MI 49913
email: Eric.Marttila at ThermoAnalytics.com
phone: 810-636-2443
fax: 906-482-9755
web: http://www.thermoanalytics.com