[Trilinos-Users] [EXTERNAL] Re: ML/AztecOO scalability

Heroux, Michael A maherou at sandia.gov
Wed Feb 1 14:16:27 MST 2012


Riccardo,

Epetra does have support for threading in its sparse matrix-vector multiplication, vector, and multivector kernels.

This is turned on by passing the following option to CMake:

-D Trilinos_ENABLE_OpenMP:BOOL=ON
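For context, a configure line with both MPI and OpenMP enabled might look like the sketch below. The package selections and the ${TRILINOS_HOME} path are illustrative placeholders, not a prescribed configuration:

```shell
# Hypothetical hybrid MPI+OpenMP configure sketch; adjust the
# package list and paths for your own build.
cmake \
  -D CMAKE_BUILD_TYPE:STRING=RELEASE \
  -D TPL_ENABLE_MPI:BOOL=ON \
  -D Trilinos_ENABLE_OpenMP:BOOL=ON \
  -D Trilinos_ENABLE_Epetra:BOOL=ON \
  -D Trilinos_ENABLE_AztecOO:BOOL=ON \
  -D Trilinos_ENABLE_Belos:BOOL=ON \
  -D Trilinos_ENABLE_ML:BOOL=ON \
  ${TRILINOS_HOME}
```

At run time, the number of OpenMP threads per MPI rank is then controlled with the usual OMP_NUM_THREADS environment variable.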

There are no threaded preconditioners at this time, except those that depend on sparse MV.  AztecOO uses Epetra for sparse computations but does its own vector operations; presently those are not threaded.

So you should see some performance improvement on the sparse MV, but not elsewhere.  Belos uses Epetra more fully.  If you use Belos with ML, and use Chebyshev smoothers, you should have a fully-threaded solver.

Mike

From: <rrossi at cimne.upc.edu>
Date: Wed, 1 Feb 2012 08:52:13 +0100
To: <trilinos-users at software.sandia.gov>
Subject: [EXTERNAL] Re: [Trilinos-Users] ML/AztecOO scalability


Well, I also have a question concerning scalability:

I saw that Epetra has some initial support for mixed MPI-OpenMP.

While I guess this will directly benefit AztecOO, is this expected to improve the performance of the ML preconditioners as well?

For example, do the matrix-matrix products within ML benefit from OpenMP?

Thanks in advance for any hints on this subject.

Riccardo



On Tue, 31 Jan 2012 12:00:02 -0700, trilinos-users-request at software.sandia.gov wrote:



Today's Topics:

   1. Re: [EXTERNAL] Trouble installing Trilinos on IBM-AIX
      (Devine, Karen D)
   2. ML/AztecOO scalability (Eric Marttila)
   3. Re: [EXTERNAL]  ML/AztecOO scalability (Heroux, Michael A)


----------------------------------------------------------------------

Message: 1
Date: Mon, 30 Jan 2012 20:58:01 +0000
From: "Devine, Karen D" <kddevin at sandia.gov>
Subject: Re: [Trilinos-Users] [EXTERNAL] Trouble installing Trilinos
        on IBM-AIX
To: Duk-Soon Oh <duksoon at cims.nyu.edu>, trilinos-users at software.sandia.gov
Message-ID: <CB4C518B.D483%kddevin at sandia.gov>
Content-Type: text/plain; charset="us-ascii"

Can you send a recursive listing of the contents of
/glad/home/dsoon/soft/trilinos/packages/zoltan ?  My listing after
building with a script similar to yours is below.  Library libzoltan.a is
in zoltan/src.

% ls -R zoltan
CMakeFiles/             CTestTestfile.cmake     Makefile
Makefile.export.Zoltan  ZoltanConfig.cmake      cmake_install.cmake
src/

zoltan/CMakeFiles:
CMakeDirectoryInformation.cmake   Makefile.export.Zoltan_install
ZoltanConfig_install.cmake        Zoltan_all.dir/
Zoltan_libs.dir/                  progress.marks

zoltan/CMakeFiles/Zoltan_all.dir:
DependInfo.cmake        build.make              cmake_clean.cmake
progress.make

zoltan/CMakeFiles/Zoltan_libs.dir:
DependInfo.cmake        build.make              cmake_clean.cmake
progress.make

zoltan/src:
CMakeFiles/             CTestTestfile.cmake     Makefile
Zoltan_config.h         cmake_install.cmake     libzoltan.a

zoltan/src/CMakeFiles:
CMakeDirectoryInformation.cmake   progress.marks   zoltan.dir/

zoltan/src/CMakeFiles/zoltan.dir:
C.includecache     cmake_clean.cmake          graph/     matrix/         rcb/
DependInfo.cmake   cmake_clean_target.cmake   ha/        order/          reftree/
Labels.txt         coloring/                  hier/      par/            simple/
Utilities/         depend.internal            hsfc/      params/         timer/
all/               depend.make                lb/        phg/            tpls/
build.make         flags.make                 link.txt   progress.make   zz/

zoltan/src/CMakeFiles/zoltan.dir/Utilities:
Communication/  DDirectory/     Memory/         Timer/          shared/

zoltan/src/CMakeFiles/zoltan.dir/Utilities/Communication:
comm_create.c.o                 comm_do.c.o
comm_exchange_sizes.c.o         comm_invert_map.c.o
comm_resize.c.o
comm_destroy.c.o                comm_do_reverse.c.o
comm_info.c.o                   comm_invert_plan.c.o
comm_sort_ints.c.o

zoltan/src/CMakeFiles/zoltan.dir/Utilities/DDirectory:
DD_Create.c.o                   DD_Hash2.c.o
DD_Set_Hash_Fn.c.o              DD_Set_Neighbor_Hash_Fn3.c.o
DD_Destroy.c.o                  DD_Print.c.o
DD_Set_Neighbor_Hash_Fn1.c.o    DD_Stats.c.o
DD_Find.c.o                     DD_Remove.c.o
DD_Set_Neighbor_Hash_Fn2.c.o    DD_Update.c.o

zoltan/src/CMakeFiles/zoltan.dir/Utilities/Memory:
mem.c.o

zoltan/src/CMakeFiles/zoltan.dir/Utilities/Timer:
timer.c.o               zoltan_timer.c.o

zoltan/src/CMakeFiles/zoltan.dir/Utilities/shared:
zoltan_align.c.o        zoltan_id.c.o

zoltan/src/CMakeFiles/zoltan.dir/all:
all_allo.c.o

zoltan/src/CMakeFiles/zoltan.dir/coloring:
bucket.c.o      color_test.c.o  coloring.c.o    g2l_hash.c.o

zoltan/src/CMakeFiles/zoltan.dir/graph:
graph.c.o

zoltan/src/CMakeFiles/zoltan.dir/ha:
divide_machine.c.o      get_processor_name.c.o  ha_drum.c.o
ha_ovis.c.o

zoltan/src/CMakeFiles/zoltan.dir/hier:
hier.c.o                hier_free_struct.c.o

zoltan/src/CMakeFiles/zoltan.dir/hsfc:
hsfc.c.o                hsfc_box_assign.c.o     hsfc_hilbert.c.o
hsfc_point_assign.c.o

zoltan/src/CMakeFiles/zoltan.dir/lb:
lb_balance.c.o          lb_eval.c.o             lb_invert.c.o
lb_point_assign.c.o     lb_set_method.c.o
lb_box_assign.c.o       lb_free.c.o             lb_migrate.c.o
lb_remap.c.o            lb_set_part_sizes.c.o
lb_copy.c.o             lb_init.c.o             lb_part2proc.c.o
lb_set_fn.c.o

zoltan/src/CMakeFiles/zoltan.dir/matrix:
matrix_build.c.o        matrix_distribute.c.o   matrix_operations.c.o
matrix_sym.c.o          matrix_utils.c.o

zoltan/src/CMakeFiles/zoltan.dir/order:
hsfcOrder.c.o           order.c.o               order_struct.c.o
order_tools.c.o         perm.c.o

zoltan/src/CMakeFiles/zoltan.dir/par:
par_average.c.o                 par_median.c.o
par_stats.c.o                   par_tflops_special.c.o
par_bisect.c.o                  par_median_randomized.c.o
par_sync.c.o

zoltan/src/CMakeFiles/zoltan.dir/params:
assign_param_vals.c.o   check_param.c.o         key_params.c.o
set_param.c.o
bind_param.c.o          free_params.c.o         print_params.c.o

zoltan/src/CMakeFiles/zoltan.dir/phg:
phg.c.o                         phg_comm.c.o
phg_match.c.o                   phg_rdivide.c.o
phg_two_ways.c.o
phg_Vcycle.c.o                  phg_distrib.c.o
phg_order.c.o                   phg_refinement.c.o
phg_util.c.o
phg_build.c.o                   phg_gather.c.o
phg_parkway.c.o                 phg_scale.c.o
phg_verbose.c.o
phg_build_calls.c.o             phg_hypergraph.c.o
phg_patoh.c.o                   phg_serialpartition.c.o
phg_coarse.c.o                  phg_lookup.c.o
phg_plot.c.o                    phg_tree.c.o

zoltan/src/CMakeFiles/zoltan.dir/rcb:
box_assign.c.o          inertial1d.c.o          inertial3d.c.o
rcb.c.o                 rcb_util.c.o            rib_util.c.o
create_proc_list.c.o    inertial2d.c.o          point_assign.c.o
rcb_box.c.o             rib.c.o                 shared.c.o

zoltan/src/CMakeFiles/zoltan.dir/reftree:
reftree_build.c.o               reftree_coarse_path.c.o
reftree_hash.c.o                reftree_part.c.o

zoltan/src/CMakeFiles/zoltan.dir/simple:
block.c.o       cyclic.c.o      random.c.o

zoltan/src/CMakeFiles/zoltan.dir/timer:
timer_params.c.o

zoltan/src/CMakeFiles/zoltan.dir/tpls:
build_graph.c.o         postprocessing.c.o      preprocessing.c.o
scatter_graph.c.o       third_library.c.o       verify_graph.c.o

zoltan/src/CMakeFiles/zoltan.dir/zz:
murmur3.c.o             zz_gen_files.c.o        zz_init.c.o
zz_rand.c.o             zz_struct.c.o
zz_back_trace.c.o       zz_hash.c.o             zz_map.c.o
zz_set_fn.c.o           zz_util.c.o
zz_coord.c.o            zz_heap.c.o             zz_obj_list.c.o
zz_sort.c.o




On 1/30/12 12:16 AM, "Duk-Soon Oh" <duksoon at cims.nyu.edu> wrote:


Dear all,
I am trying to install Trilinos on IBM-AIX. It seems that other packages
are alright.
However, I am getting the following error message for Zoltan package:

Scanning dependencies of target last_lib
[  0%] Building C object CMakeFiles/last_lib.dir/last_lib_dummy.c.o
Linking C static library liblast_lib.a
Target "CMakeFiles/last_lib.dir/build" is up to date.
[  0%] Built target last_lib
Target "all" is up to date.
Target "preinstall" is up to date.
Install the project...
-- Install configuration: "RELEASE"
-- Installing:
/glade/home/dsoon/soft/trilinos/lib/cmake/Trilinos/TrilinosTargets.cmake
-- Installing:
/glade/home/dsoon/soft/trilinos/lib/cmake/Trilinos/TrilinosTargets-release.cmake
-- Installing:
/glade/home/dsoon/soft/trilinos/lib/cmake/Trilinos/TrilinosConfig.cmake
-- Installing:
/glade/home/dsoon/soft/trilinos/include/Makefile.export.Trilinos
-- Installing:
/glade/home/dsoon/soft/trilinos/lib/cmake/Trilinos/TrilinosConfigVersion.cmake
-- Installing: /glade/home/dsoon/soft/trilinos/include/Trilinos_version.h
-- Installing:
/glade/home/dsoon/soft/trilinos/include/TrilinosConfig.cmake
-- Installing:
/glade/home/dsoon/soft/trilinos/lib/cmake/Zoltan/ZoltanConfig.cmake
-- Installing:
/glade/home/dsoon/soft/trilinos/include/Makefile.export.Zoltan
CMake Error at packages/zoltan/src/cmake_install.cmake:31 (FILE):
file INSTALL cannot find
"/glade/home/dsoon/soft/trilinos/packages/zoltan/src/libzoltan.a".
Call Stack (most recent call first):
packages/zoltan/cmake_install.cmake:40 (INCLUDE)
cmake_install.cmake:71 (INCLUDE)


make: 1254-004 The error code from the last command is 1.



Stop.

================================================================
Here is my cmake script file.

rm -f CMakeCache.txt

EXTRA_ARGS=$@

cmake \
-D CMAKE_C_COMPILER=mpcc \
-D CMAKE_CXX_COMPILER=mpCC \
-D CMAKE_Fortran_COMPILER=mpxlf \
-D CMAKE_C_FLAGS:STRING="-qmaxmem=-1" \
-D CMAKE_CXX_FLAGS:STRING="-qmaxmem=-1" \
-D CMAKE_INSTALL_PREFIX:PATH="${PWD}" \
-D TPL_ENABLE_MPI:BOOL=ON \
-D Trilinos_ENABLE_TESTS:BOOL=OFF \
-D Trilinos_ENABLE_Zoltan:BOOL=ON \
\
${TRILINOS_HOME}


Could you give me any idea what might be going wrong?

Best,

Duk-Soon
_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at software.sandia.gov
http://software.sandia.gov/mailman/listinfo/trilinos-users




------------------------------

Message: 2
Date: Tue, 31 Jan 2012 09:48:11 -0500
From: "Eric Marttila" <eric.marttila at thermoanalytics.com>
Subject: [Trilinos-Users] ML/AztecOO scalability
To: trilinos-users at software.sandia.gov
Message-ID: <201201310948.11894.eric.marttila at thermoanalytics.com>
Content-Type: text/plain; charset=us-ascii

Dear all,

I've used ML and AztecOO for solving linear systems on a single processor.
Now I'm looking at running on multiple processors.  As a first step I compiled
and ran the ML/AztecOO example (solving Laplace equation) from:

http://code.google.com/p/trilinos/wiki/MLAztecOO

I used a problem size of 8 million and ran with 1, 2, 3, and 4 processors.
The solution times for each case are listed below:

np=1               Solution time: 38.168500 (sec.)
np=2               Solution time: 26.174297 (sec.)
np=3               Solution time: 21.492841 (sec.)
np=4               Solution time: 21.108066 (sec.)

Can anyone comment on whether or not these timing results are reasonable?  I
was expecting that I would see close to linear speedup, but with this case I
see a speedup of only 1.8 with 4 processors, and based on the trend there
would be little or no additional speedup with more processors.  I tested with
smaller problem sizes and saw the same results.
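As a sanity check on the numbers above, the speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p can be tabulated directly (this calculation is added for illustration and was not part of the original message):

```shell
# Tabulate speedup and efficiency from the reported solution times.
awk 'BEGIN {
  t[1] = 38.168500; t[2] = 26.174297; t[3] = 21.492841; t[4] = 21.108066
  for (p = 1; p <= 4; p++)
    printf "np=%d  speedup=%.2f  efficiency=%.2f\n", p, t[1]/t[p], t[1]/(t[p]*p)
}'
```

This makes the trend explicit: efficiency falls from 1.00 on one core to about 0.45 on four.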

I'm using Trilinos 10.8.5, configured as release, with MPI enabled.  I'm using
MPICH2, and the tests were run on a quad-core Intel i7 with 8 GB of memory
(the OS is 64-bit Linux Fedora 14).

I would appreciate any comments and/or suggestions of what I could do to
improve the scalability.

Thank you.
--Eric

--
Eric A. Marttila
ThermoAnalytics, Inc.
23440 Airpark Blvd.
Calumet, MI 49913

email: Eric.Marttila at ThermoAnalytics.com
phone: 810-636-2443
fax:   906-482-9755
web: http://www.thermoanalytics.com



------------------------------

Message: 3
Date: Tue, 31 Jan 2012 15:31:12 +0000
From: "Heroux, Michael A" <maherou at sandia.gov>
Subject: Re: [Trilinos-Users] [EXTERNAL] ML/AztecOO scalability
To: Eric Marttila <eric.marttila at thermoanalytics.com>,
        trilinos-users at software.sandia.gov
Message-ID: <CB4D6368.450F7%maherou at sandia.gov>
Content-Type: text/plain; charset="us-ascii"

Eric,

It's hard to tell from the information you give whether or not you are
getting the optimal performance from your system.  However, your results
are within the range of possible speedups.  It is difficult to realize
linear speedup on a multicore node since the algorithms you are using have
heavy memory system performance demands.

Since the memory system is shared by the cores on your machine, when you
run on a single core the memory system is dedicated to serving that single
core.  Similarly with two cores, each has half of the memory system to
support it.  As you add cores they start to compete for this limited
resource.

You shouldn't use this as a harbinger of general scalability.  Most users
of Trilinos run on multi-node systems.  In this situation, the memory
system capabilities scale with the number of nodes.  Trilinos has been
used to scale on the largest machines around, and ML/AztecOO is well-known
for its scalability.

Furthermore, even on larger multicore systems, the memory system is
segmented so you should see some improvement at larger core counts.
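To make this concrete, here is a back-of-envelope Amdahl-style model (added for illustration, not from the original reply): suppose a fraction b of the single-core runtime is bound by shared memory bandwidth and does not speed up as cores are added, so S(p) = 1/(b + (1-b)/p). Fitting b to Eric's np=4 measurement then predicts the np=2 time fairly closely:

```shell
# Fit a bandwidth-bound fraction b from the np=4 speedup, then
# predict the np=2 time and compare against the measured value.
awk 'BEGIN {
  t1 = 38.168500; t4 = 21.108066; t2_meas = 26.174297
  s4 = t1 / t4                          # measured 4-core speedup
  b  = (1/s4 - 1.0/4) / (1 - 1.0/4)     # invert s4 = 1/(b + (1-b)/4)
  s2 = 1 / (b + (1 - b)/2)              # model prediction for 2 cores
  printf "bandwidth-bound fraction b = %.2f\n", b
  printf "predicted T(2) = %.1f s, measured T(2) = %.1f s\n", t1/s2, t2_meas
}'
```

Under this crude model roughly 40% of the serial runtime is bandwidth-limited, consistent with the point that adding cores on a single node yields diminishing returns.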

I hope this helps.

Mike





------------------------------



End of Trilinos-Users Digest, Vol 77, Issue 12
**********************************************




