[Trilinos-Users] [Pytrilinos-users] expected Trilinos on shared memory machine.
Daniel Wheeler
daniel.wheeler2 at gmail.com
Wed Apr 9 12:16:43 MDT 2008
On Wed, Apr 9, 2008 at 12:30 PM, Heroux, Michael A <maherou at sandia.gov> wrote:
>
> Daniel,
>
> Depending on the preconditioner you are using with AztecOO, you should see
> improvement in performance running in parallel. Performance might be
> limited by the shared bandwidth on your machine. What kind of platform are
> you using?
How do I find out what the shared bandwidth is? I have three proposed
platforms with 2, 8, and 64 nodes. I include the details from meminfo
and cpuinfo below. Do the numbers below include what we're looking
for? Thanks!
2 node machine:
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) MP 2400+
stepping : 1
cpu MHz : 2000.178
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts
bogomips : 4003.93
and memory:
MemTotal: 2076444 kB
MemFree: 1177832 kB
Buffers: 916 kB
Cached: 491864 kB
SwapCached: 77132 kB
Active: 495772 kB
Inactive: 355104 kB
HighTotal: 1179072 kB
HighFree: 363188 kB
LowTotal: 897372 kB
LowFree: 814644 kB
SwapTotal: 4000144 kB
SwapFree: 3833308 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 356380 kB
Mapped: 57968 kB
Slab: 35288 kB
PageTables: 4708 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 5038364 kB
Committed_AS: 1097836 kB
VmallocTotal: 114680 kB
VmallocUsed: 3828 kB
VmallocChunk: 110804 kB
==============================================
8 node machine:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 2812.978
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm
cr8_legacy
bogomips : 5630.17
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
and memory:
MemTotal: 32964928 kB
MemFree: 13414884 kB
Buffers: 204924 kB
Cached: 13604200 kB
SwapCached: 14712 kB
Active: 12276440 kB
Inactive: 6657932 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32964928 kB
LowFree: 13414884 kB
SwapTotal: 65535992 kB
SwapFree: 65514348 kB
Dirty: 324 kB
Writeback: 0 kB
AnonPages: 5121792 kB
Mapped: 51372 kB
Slab: 541584 kB
PageTables: 19108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 82018456 kB
Committed_AS: 6887204 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 32412 kB
VmallocChunk: 34359705719 kB
==============================================================
64 node machine:
processor : 0
vendor : GenuineIntel
arch : IA-64
family : Itanium 2
model : 2
revision : 1
archrev : 0
features : branchlong
cpu number : 0
cpu regs : 4
cpu MHz : 1500.000000
itc MHz : 1500.000000
BogoMIPS : 2244.60
siblings : 1
and memory:
MemTotal: 516572480 kB
MemFree: 508559920 kB
Buffers: 5872 kB
Cached: 3928208 kB
SwapCached: 0 kB
Active: 4242128 kB
Inactive: 1685520 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 516572480 kB
LowFree: 508559920 kB
SwapTotal: 10490400 kB
SwapFree: 10490400 kB
Dirty: 0 kB
Writeback: 16 kB
Mapped: 1669488 kB
Slab: 969504 kB
CommitLimit: 268776640 kB
Committed_AS: 2103568 kB
PageTables: 5552 kB
VmallocTotal: 137372805568 kB
VmallocUsed: 1010816 kB
VmallocChunk: 137371792720 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 262144 kB
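
Note that /proc/cpuinfo and /proc/meminfo report CPU features and memory sizes, not memory bandwidth, so bandwidth has to be measured. The STREAM benchmark is the usual tool for that. As a rough stand-in, here is a minimal sketch in plain Python that times a large buffer copy. The function name and the default buffer size are illustrative assumptions, and the figure it gives is only a single-process estimate, not a STREAM result.

```python
import time


def copy_bandwidth_mb_per_s(n_bytes=64 * 1024 * 1024, repeats=5):
    """Rough single-process estimate of memory copy bandwidth in MB/s."""
    # Fill the source with non-zero bytes so its pages are really allocated.
    src = bytearray(b"\x01") * n_bytes
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)              # reads n_bytes and writes n_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)     # keep the fastest of the repeats
        del dst
    best = max(best, 1e-9)            # guard against a zero timer reading
    return 2 * n_bytes / best / 1e6   # count read plus write traffic


if __name__ == "__main__":
    print("approx. copy bandwidth: %.0f MB/s" % copy_bandwidth_mb_per_s())
```

Running one copy of this per core at the same time and comparing the per-process figure with the single-process one gives a hint of how much bandwidth the cores share. The buffer should be much larger than the cache sizes listed above, otherwise the copy measures cache speed rather than main memory.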
>
> Mike
>
>
>
> On 4/9/08 10:51 AM, "Daniel Wheeler" <daniel.wheeler2 at gmail.com> wrote:
>
>
>
> On Tue, Apr 8, 2008 at 6:22 PM, Bill Spotz <wfspotz at sandia.gov> wrote:
> > On Apr 7, 2008, at 12:52 PM, Daniel Wheeler wrote:
> >
> >
> > > In our code, for a typical problem the majority of the compute time is
> > > spent in the "AztecOO.AztecOO(A, LHS, RHS)" function.
> > >
> >
> > This is just a constructor. Doesn't most of your time get spent in
> >
> > Solver.Iterate(self.iterations, self.tolerance)
> >
> > where Solver is the result of the constructor?
>
> Yes. Sorry Bill. I pasted in the wrong line. So, let me reiterate the
> question. Given that the majority of the time is being spent in
> "Solver.Iterate(self.iterations, self.tolerance)", would you expect
> major speedups by compiling Trilinos in parallel and running on a
> shared memory machine?
>
> --
> Daniel Wheeler
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
>
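
For the question quoted above, Amdahl's law gives an upper bound on the overall gain when only the Solver.Iterate part runs faster. The sketch below is a back-of-the-envelope calculation, not a measurement. The function name is hypothetical, and the 90% solver fraction in the usage note is an assumed figure, not one taken from our code.

```python
def amdahl_speedup(solver_fraction, solver_speedup):
    """Overall speedup when a fraction of the run time is sped up.

    solver_fraction: share of total run time spent in the solver (0 to 1).
    solver_speedup:  factor by which that share gets faster (> 0).
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if solver_speedup <= 0.0:
        raise ValueError("solver_speedup must be positive")
    # The unaffected part keeps its cost; the solver part shrinks.
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


if __name__ == "__main__":
    # Assumed numbers: 90% of the time in the solver, solver 8x faster.
    print("overall speedup: %.2f" % amdahl_speedup(0.9, 8.0))
```

With those assumed numbers the overall speedup is about 4.7 rather than 8, so the time spent outside the solver limits the gain even if the solver itself scales perfectly.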
--
Daniel Wheeler