[Trilinos-Users] [EXTERNAL] Re: Results from a scaling study of ML
John Cary
cary at colorado.edu
Tue Apr 6 05:50:48 MST 2021
Hi James,
I had forgotten that we patch trilinos to get it to build without
parmetis/metis.
We cannot include those in the build chain, as they have a commercial
license.
We use SuperLU_Dist-5.4.0.
Our patch for that is
diff -ruN trilinos-13.0.0/packages/amesos/CMakeLists.txt trilinos-13.0.0-new/packages/amesos/CMakeLists.txt
--- trilinos-13.0.0/packages/amesos/CMakeLists.txt 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/amesos/CMakeLists.txt 2020-10-31 13:03:07.394676831 -0600
@@ -10,9 +10,11 @@
# B) Set up package-specific options
#
-# if using SuperLUDist, must also link in ParMETIS for some reason
-IF(${PACKAGE_NAME}_ENABLE_SuperLUDist AND NOT ${PACKAGE_NAME}_ENABLE_ParMETIS)
- MESSAGE(FATAL_ERROR "The Amesos support for the SuperLUIDist TPL requires the ParMETIS TPL. Either disable Amesos SuperLUDist support or enable the ParMETIS TPL.")
+# One can now configure SuperLUDist without ParMETIS
+if (NOT TPL_ENABLE_SuperLUDist_Without_ParMETIS)
+ IF(${PACKAGE_NAME}_ENABLE_SuperLUDist AND NOT ${PACKAGE_NAME}_ENABLE_ParMETIS)
+ MESSAGE(FATAL_ERROR "The Amesos support for the SuperLUDist TPL requires the ParMETIS TPL. Either disable Amesos SuperLUDist support or enable the ParMETIS TPL.")
+ ENDIF()
ENDIF()
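For what it's worth, here is a rough sketch of the Trilinos configure
options that go with the patch above. Trilinos_ENABLE_Amesos,
TPL_ENABLE_SuperLUDist and TPL_ENABLE_ParMETIS are the usual Trilinos
option names; TPL_ENABLE_SuperLUDist_Without_ParMETIS only means
something with the patch applied, and the source path is a placeholder.
Compilers, MPI and TPL locations are left out, and the script prints the
cmake line rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: the options relevant to Amesos + SuperLUDist without ParMETIS.
# Compilers, MPI, and TPL include/library paths are deliberately omitted.
trilinos_opts=(
  -DTrilinos_ENABLE_Amesos:BOOL=ON
  -DTPL_ENABLE_SuperLUDist:BOOL=ON
  -DTPL_ENABLE_ParMETIS:BOOL=OFF
  # Only understood by the patched packages/amesos/CMakeLists.txt
  -DTPL_ENABLE_SuperLUDist_Without_ParMETIS:BOOL=TRUE
)

# Placeholder source directory (an assumption, adjust to your tree).
trilinos_src="${TRILINOS_SRC:-../trilinos-13.0.0-new}"

# Print the command for review instead of executing it.
echo "cmake ${trilinos_opts[*]} ${trilinos_src}"
```

On an unpatched tree, CMake just reports the last option as a
manually-specified variable that was not used, which matches what James
saw on develop.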
Our full patch is attached. It has some pretty small changes to also get
trilinos to build for us on Windows, where we use LLVM-10. I also had to
add a fix for superlu version < 5, which works for me, but I am not sure
whether it is right. I suppose I should try submitting PRs again, but
will have to reproduce the reasons for the PR.
Our SuperLU_Dist configuration includes
-Denable_parmetislib:BOOL='OFF' \
-DXSDK_ENABLE_Fortran:BOOL='OFF' \
-Denable_blaslib:BOOL='OFF' \
I attach that full configure script for your reference.
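One way to double-check that a given SuperLU_Dist install really was
configured without ParMETIS is to look at its generated
superlu_dist_config.h (the header James quotes further down shows
"#define HAVE_PARMETIS TRUE" for his build). A minimal sketch, assuming
the header sits under include/ in the install prefix; the function name
and the SUPERLU_DIST_DIR variable are made up for illustration and are
not part of SuperLU_Dist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a SuperLU_Dist config header enables
# ParMETIS. Returns 0 (true) if an active "#define HAVE_PARMETIS" is present.
superlu_has_parmetis() {
  local header="$1"
  # Match only an uncommented define; "/* #undef HAVE_PARMETIS */" does not match.
  grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_PARMETIS([[:space:]]|$)' "$header"
}

# Example usage (the path is an assumption about the install layout):
#   if superlu_has_parmetis "$SUPERLU_DIST_DIR/include/superlu_dist_config.h"; then
#     echo "built with ParMETIS"
#   else
#     echo "built without ParMETIS"
#   fi
```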
So when you run ML, is SuperLU used somehow?
Thx....John
On 4/6/21 6:28 AM, Elliott, James John wrote:
> I fat-fingered my final comment:
> So, I guess I am curious if Trilinos supports the case SuperLUDist **without** ParMETIS. Glancing at the superlu_dist.a library, I do see symbols for getting metis/parmetis. (I don't know the precise configure used for SuperLUDist when it was built)
>
> Sorry!
>
> On 4/6/21, 4:09 AM, "Trilinos-Users on behalf of Elliott, James John" <trilinos-users-bounces at trilinos.org on behalf of jjellio at sandia.gov> wrote:
>
> John,
>
> I checked on our mini Cori. A few things:
>
> I tried using the mojo that our CI toolchains use for this platform (ATDM environment with ats1-haswell-intel-relese) - the following bit is a shorthand used in some of our apps+CI - on the mini Cori (ATS1), we have TPLs built that the CI framework uses for nightly testing. (I used a slightly modified version of your CMake though - not the SNL 'atdm shortcuts')
>
> 1. On that platform, we don't support GNU - so I figured I'd just try Intel.
> 2. I then saw `-DTPL_ENABLE_SuperLUDist_Without_ParMETIS:BOOL=TRUE` in the CMake script - I do not believe that is a combo we test.
>
> 3. When I spun off a build against trilinos/develop, Amesos cries:
> ```
> Processing enabled package: Amesos (Libs, Examples)
> CMake Error at packages/amesos/CMakeLists.txt:15 (MESSAGE):
> The Amesos support for the SuperLUIDist TPL requires the ParMETIS TPL.
> Either disable Amesos SuperLUDist support or enable the ParMETIS TPL.
> ```
>
> 4. if I enable ParMETIS, I see this at the end of configure:
> Unused: Trilinos_ENABLE_SuperLU5_API (Maybe this is not needed? Or is my SuperLUDist version high/low enough to negate it?)
> My SuperLUDist is: superlu_dist-5.4.0
>
>
> 5. If I keep `-DTPL_ENABLE_SuperLUDist_Without_ParMETIS:BOOL=TRUE` and add ParMETIS, Amesos will configure:
> ```
> Processing enabled package: Amesos (Libs, Examples)
> -- Amesos_example_AmesosFactory_Tridiag: NOT added test because Amesos_ENABLE_TESTS='OFF'.
> -- Amesos_example_AmesosFactory: NOT added test because Amesos_ENABLE_TESTS='OFF'.
> -- Amesos_example_AmesosFactory_HB: NOT added test because Amesos_ENABLE_TESTS='OFF'.
> -- Amesos_compare_solvers: NOT added test because Amesos_ENABLE_TESTS='OFF'.
> -- Amesos_a_trivial_mpi_test: NOT added test because Amesos_ENABLE_TESTS='OFF'.
> ```
>
> 6. Curiously, if you configure as in (5), you will get:
> ```
> CMake Warning:
> Manually-specified variables were not used by the project:
>
> TPL_ENABLE_SuperLUDist_Without_ParMETIS
> Trilinos_ENABLE_SuperLU5_API
> ```
>
> I am on github develop (not release v13)
>
>
> So, I guess I am curious if Trilinos supports the case SuperLUDist w/ParMETIS. Glancing at the superlu_dist.a library, I do see symbols for getting metis/parmetis. (I don't know the precise configure used for SuperLUDist when it was built)
>
> include/superlu_dist_config.h:
> ```
> /* superlu_dist_config.h.in */
>
> /* Enable parmetis */
> #define HAVE_PARMETIS TRUE
>
> /* Enable CombBLAS */
> /* #undef HAVE_COMBBLAS */
>
> /* enable 64bit index mode */
> /* #undef XSDK_INDEX_SIZE */
>
> #if (XSDK_INDEX_SIZE == 64)
> #define _LONGINT 1
> #endif
> ```
>
>
>
> On 3/29/21, 3:52 PM, "Trilinos-Users on behalf of Elliott, James John" <trilinos-users-bounces at trilinos.org on behalf of jjellio at sandia.gov> wrote:
>
> John that's odd.
>
> Cori performance variations usually happen as you scale out to multiple nodes (and you end up with an allocation + other users that causes bad routing performance).
>
> It may be easier to post on github
>
> If you can give me your slurm sbatch or salloc commands/script, a list of the modules used, and your srun line (plus app name + flags you give it), I can try to reproduce this on our miniature Cori (trinity testbed at SNL). I no longer have access to NERSC (I was part of the KNL early access program on Cori).
>
> If you are somehow running the Haswell binary on KNL, this could explain a marked slowdown.
> On Cori, you usually have to salloc/sbatch with -C haswell.
>
> A Haswell binary will run on KNL, but a KNL binary will not run on Haswell.
>
> Your loaded modules can also have some impacts on performance (even though the binary may be static)
>
> Jonathan, Chris, and I did run MueLu a reasonable amount on Cori during the early access. The main culprits (then) were large scale perf variations and tracking down issues in MueLu's repartitioning routines (avoiding many to one communications).
>
> James
>
> On 3/29/21, 6:11 AM, "Trilinos-Users on behalf of John Cary" <trilinos-users-bounces at trilinos.org on behalf of cary at colorado.edu> wrote:
>
> Thanks, James. So I did
>
> srun -n 32 --distribution=block,block -c 2 \
> /global/cscratch1/sd/cary/builds-cori-gcc/vsimall-cori-gcc/trilinos-13.0.0/parcomm/packages/ml/examples/BasicExamples/ML_preconditioner.exe
>
> but I am still seeing the same single-node scaling of dropping to 25%
> parallel efficiency.
>
> I can see that it is not the fault of ML, because on my own local
> cluster, which has two AMD EPYC 7302 16-Core Processors per node, the
> single-node parallel efficiency at 32 processes is 82%.
>
> So I guess I still do not know how best to launch on cori.
>
> Thx.....John
>
>
> On 3/28/21 6:18 PM, James Elliott wrote:
> > John, I believe the usual Cori/Haswell slurm launch should look like
> > the script below. I may be off... been a while since I ran there. FYI,
> > cori was really noisy.
> >
> > # cores per proc is usually between 1 and 16 (fill up one socket)
> > cores_per_proc=1
> >
> > srun_opts=(
> > # use cores,v if you want verbosity
> > --cpu_bind=cores
> > -c $(($cores_per_proc*2))
> > # distribution puts ranks on nodes, then sockets
> > # block,block - is like aprun default, which fills
> > # a socket on a node, then the next socket on the same node
> > # then the next node...
> > # block,cyclic is/was the default on Cori
> > # that will put rank0 on socket0, rank1 on socket1 (same node)
> > # and repeat until the node is full. (it will stride your procs
> > # between the sockets on the node)
> > # This detail caused a few apps pain when Trinity swapped from
> > # aprun.
> > # Pick block,block or block,cyclic
> > --distribution=block,block
> > # the usual -n -N stuff
> > )
> >
> > srun "${srun_opts[@]}" ./app ....
> >
> > On 3/28/2021 5:23 PM, John Cary wrote:
> >> Hi All,
> >>
> >> As promised, we have done scaling studies on the haswell nodes on
> >> Cori at NERSC using ML_preconditioner.exe
> >> as compiled, so this is a weak scaling study with 65536 cells/nodes
> >> per processor. We find a parallel efficiency
> >> (speedup/expected speedup) that drops to 25% on 32 processes.
> >>
> >> Is this expected?
> >>
> >> Are there command line args to srun that might improve this? (I
> >> tried various args to --cpu-bind.)
> >>
> >> I can provide plenty more info (configuration line, how run, ...).
> >>
> >> Thx.....John
> >>
> >> On 3/24/21 9:05 AM, John Cary wrote:
> >>>
> >>>
> >>> Thanks, Chris, thanks Jonathan,
> >>>
> >>> I have found these executables, and we are doing scaling studies now.
> >>>
> >>> Will report....John
> >>>
> >>>
> >>>
> >>> On 3/23/21 9:42 PM, Siefert, Christopher wrote:
> >>>> John,
> >>>>
> >>>> There are some scaling examples in
> >>>> trilinoscouplings/examples/scaling (example_Poisson.cpp and
> >>>> example_Poisson2D.cpp) that use the old stack and might do what you
> >>>> need.
> >>>>
> >>>> -Chris
> >>>
> >>>
> >>> On 3/23/21 7:48 PM, Hu, Jonathan wrote:
> >>>> Hi John,
> >>>>
> >>>> ML has a 2D Poisson driver in
> >>>> ml/examples/BasicExamples/ml_preconditioner.cpp. The cmake target
> >>>> should be either "ML_preconditioner" or "ML_preconditioner.exe".
> >>>> There's a really similar one in ml/examples/XML/ml_XML.cpp that you
> >>>> can drive with an XML deck. Is this what you're after?
> >>>>
> >>>> Jonathan
> >>>>
> >>>> On 3/23/21, 5:47 PM, "Trilinos-Users on behalf of John Cary"
> >>>> <trilinos-users-bounces at trilinos.org on behalf of
> >>>> cary at colorado.edu> wrote:
> >>>>
> >>>> We are still using the old stack: ML, Epetra, ...
> >>>>
> >>>> When we run a simple Poisson solve on our cluster (32 cores/node),
> >>>> we see parallel efficiency drop to 4% on one node with 32 cores.
> >>>> So we naturally believe we are doing something wrong.
> >>>>
> >>>> Does trilinos come with a simple Poisson-solve executable that we
> >>>> could use to test scaling (to get around the uncertainties of our
> >>>> use of trilinos)?
> >>>>
> >>>> Thx.......John Cary
> >>>>
> >>>> _______________________________________________
> >>>> Trilinos-Users mailing list
> >>>> Trilinos-Users at trilinos.org
> >>>> http://trilinos.org/mailman/listinfo/trilinos-users_trilinos.org
> >>>>
-------------- next part --------------
diff -ruN trilinos-13.0.0/cmake/tribits/win_interface/include/gettimeofday.c trilinos-13.0.0-new/cmake/tribits/win_interface/include/gettimeofday.c
--- trilinos-13.0.0/cmake/tribits/win_interface/include/gettimeofday.c 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/cmake/tribits/win_interface/include/gettimeofday.c 2020-10-31 13:03:07.386676573 -0600
@@ -1,4 +1,4 @@
-#include < time.h >
+#include <time.h>
#include <Winsock2.h> /* to get timeval struct */
struct timezone
diff -ruN trilinos-13.0.0/packages/amesos/CMakeLists.txt trilinos-13.0.0-new/packages/amesos/CMakeLists.txt
--- trilinos-13.0.0/packages/amesos/CMakeLists.txt 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/amesos/CMakeLists.txt 2020-10-31 13:03:07.394676831 -0600
@@ -10,9 +10,11 @@
# B) Set up package-specific options
#
-# if using SuperLUDist, must also link in ParMETIS for some reason
-IF(${PACKAGE_NAME}_ENABLE_SuperLUDist AND NOT ${PACKAGE_NAME}_ENABLE_ParMETIS)
- MESSAGE(FATAL_ERROR "The Amesos support for the SuperLUIDist TPL requires the ParMETIS TPL. Either disable Amesos SuperLUDist support or enable the ParMETIS TPL.")
+# One can now configure SuperLUDist without ParMETIS
+if (NOT TPL_ENABLE_SuperLUDist_Without_ParMETIS)
+ IF(${PACKAGE_NAME}_ENABLE_SuperLUDist AND NOT ${PACKAGE_NAME}_ENABLE_ParMETIS)
+ MESSAGE(FATAL_ERROR "The Amesos support for the SuperLUDist TPL requires the ParMETIS TPL. Either disable Amesos SuperLUDist support or enable the ParMETIS TPL.")
+ ENDIF()
ENDIF()
IF(${PACKAGE_NAME}_ENABLE_PARAKLETE)
diff -ruN trilinos-13.0.0/packages/amesos2/src/Amesos2_Superlu_def.hpp trilinos-13.0.0-new/packages/amesos2/src/Amesos2_Superlu_def.hpp
--- trilinos-13.0.0/packages/amesos2/src/Amesos2_Superlu_def.hpp 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/amesos2/src/Amesos2_Superlu_def.hpp 2020-10-31 13:23:37.114035426 -0600
@@ -747,6 +747,7 @@
// ILU parameters
+#if (SUPERLU_MAJOR_VERSION < 5)
setStringToIntegralParameter<SLU::rowperm_t>("RowPerm", "LargeDiag",
"Type of row permutation strategy to use",
tuple<string>("NOROWPERM","LargeDiag","MY_PERMR"),
@@ -758,6 +759,22 @@
SLU::MY_PERMR),
pl.getRawPtr());
+#else
+ setStringToIntegralParameter<SLU::rowperm_t>("RowPerm", "NOROWPERM",
+ "Type of row permutation strategy to use",
+ tuple<string>("NOROWPERM","LargeDiag_MC64", "LargeDiag_AWPM",
+ "MY_PERMR"),
+ tuple<string>("Use natural ordering",
+ "Use weighted bipartite matching algorithm (not for serial)",
+ "Parallelizable approximate matching algorithm (not for serial)",
+ "Use the ordering given in perm_r input"),
+ tuple<SLU::rowperm_t>(SLU::NOROWPERM,
+ SLU::LargeDiag_MC64,
+ SLU::LargeDiag_AWPM,
+ SLU::MY_PERMR),
+ pl.getRawPtr());
+#endif
+
/*setStringToIntegralParameter<SLU::rule_t>("ILU_DropRule", "DROP_BASIC",
"Type of dropping strategy to use",
tuple<string>("DROP_BASIC","DROP_PROWS",
diff -ruN trilinos-13.0.0/packages/kokkos/cmake/kokkos_arch.cmake trilinos-13.0.0-new/packages/kokkos/cmake/kokkos_arch.cmake
--- trilinos-13.0.0/packages/kokkos/cmake/kokkos_arch.cmake 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/kokkos/cmake/kokkos_arch.cmake 2020-10-31 13:03:07.403677121 -0600
@@ -296,6 +296,14 @@
)
ENDIF()
+# From https://github.com/kokkos/kokkos/pull/2977/commits/9f6f9f8ecd320470d25e0094603c0255ff6afb40
+# Clang needs mcx16 option enabled for Windows atomic functions
+IF (CMAKE_CXX_COMPILER_ID STREQUAL Clang AND WIN32)
+ COMPILER_SPECIFIC_FLAGS(
+ Clang -mcx16
+ )
+ENDIF()
+
#Right now we cannot get the compiler ID when cross-compiling, so just check
#that HIP is enabled
IF (Kokkos_ENABLE_HIP)
diff -ruN trilinos-13.0.0/packages/kokkos/core/src/Kokkos_Macros.hpp trilinos-13.0.0-new/packages/kokkos/core/src/Kokkos_Macros.hpp
--- trilinos-13.0.0/packages/kokkos/core/src/Kokkos_Macros.hpp 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/kokkos/core/src/Kokkos_Macros.hpp 2020-10-31 13:03:07.411677379 -0600
@@ -633,8 +633,10 @@
#define KOKKOS_ATTRIBUTE_NODISCARD
#endif
-#if defined(KOKKOS_COMPILER_GNU) || defined(KOKKOS_COMPILER_CLANG) || \
- defined(KOKKOS_COMPILER_INTEL) || defined(KOKKOS_COMPILER_PGI)
+// From https://github.com/kokkos/kokkos/pull/2977/commits/9f6f9f8ecd320470d25e0094603c0255ff6afb40
+#if (defined(KOKKOS_COMPILER_GNU) || defined(KOKKOS_COMPILER_CLANG) || \
+ defined(KOKKOS_COMPILER_INTEL) || defined(KOKKOS_COMPILER_PGI)) && \
+ !defined(KOKKOS_COMPILER_MSVC)
#define KOKKOS_IMPL_ENABLE_STACKTRACE
#define KOKKOS_IMPL_ENABLE_CXXABI
#endif
diff -ruN trilinos-13.0.0/packages/shylu/shylu_node/hts/src/shylu_hts_impl_def.hpp trilinos-13.0.0-new/packages/shylu/shylu_node/hts/src/shylu_hts_impl_def.hpp
--- trilinos-13.0.0/packages/shylu/shylu_node/hts/src/shylu_hts_impl_def.hpp 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/shylu/shylu_node/hts/src/shylu_hts_impl_def.hpp 2020-10-31 13:03:07.422677733 -0600
@@ -104,11 +104,11 @@
T* c, blas_int ldc);
extern "C" {
- void F77_BLAS_MANGLE(sgemm,SGEMM)(
+ int F77_BLAS_MANGLE(sgemm,SGEMM)(
const char*, const char*, const blas_int*, const blas_int*, const blas_int*,
const float*, const float*, const blas_int*, const float*, const blas_int*,
const float*, float*, const blas_int*);
- void F77_BLAS_MANGLE(dgemm,DGEMM)(
+ int F77_BLAS_MANGLE(dgemm,DGEMM)(
const char*, const char*, const blas_int*, const blas_int*, const blas_int*,
const double*, const double*, const blas_int*, const double*, const blas_int*,
const double*, double*, const blas_int*);
diff -ruN trilinos-13.0.0/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp trilinos-13.0.0-new/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp
--- trilinos-13.0.0/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp 2020-08-05 19:22:40.000000000 -0600
+++ trilinos-13.0.0-new/packages/xpetra/sup/Utils/Xpetra_MatrixMatrix.hpp 2020-10-31 13:03:07.434678120 -0600
@@ -59,7 +59,7 @@
#include "Xpetra_StridedMapFactory.hpp"
#include "Xpetra_StridedMap.hpp"
-#include <execinfo.h>
+// #include <execinfo.h>
#ifdef HAVE_XPETRA_EPETRA
#include <Xpetra_EpetraCrsMatrix_fwd.hpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcloud.txcorp.com-superlu_dist-parcomm-config.sh
Type: application/x-sh
Size: 983 bytes
Desc: not available
URL: <http://trilinos.org/pipermail/trilinos-users_trilinos.org/attachments/20210406/8f3671a1/attachment.sh>