[Trilinos-Users] Trilinos/Zoltan

Sam Naboulsi Sam.Naboulsi at hotmail.com
Fri Jun 8 11:55:19 EDT 2018


Greetings,

I installed the Trilinos/Zoltan package on an SGI HPC system without any issues; only 8 of the 1100+ tests failed.
The Trilinos configure script used for the installation is attached below. I am using Open MPI 2.1.1, which I also tested with a simple parallel math
code, both on a single node and across nodes. It works and scales OK.

The issue is that I get errors from Trilinos when I run another Sandia code, Peridigm, which requires the Trilinos libraries. Peridigm runs fine
in serial, but when I use multiple nodes I get errors, mainly in Zoltan (please see the error output attached below).

Could you please help and advise on what is causing the error and how to fix it?

Thank you and best regards
Sam



********************Trilinos Config File ********************************************************


cmake -D CMAKE_INSTALL_PREFIX:PATH=/INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source_builds  \
-D Trilinos_ENABLE_ALL_PACKAGES=OFF \
-D CMAKE_CXX_COMPILER:FILEPATH=/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpicxx \
-D CMAKE_C_COMPILER:FILEPATH=/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpicc \
-D CMAKE_Fortran_COMPILER:FILEPATH=/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpif90 \
-D CMAKE_C_FLAGS:STRING="-O2 -std=c++11 -pedantic -ftrapv -Wall -Wno-long-long -lgfortran" \
-D CMAKE_CXX_FLAGS:STRING="-O2 -std=c++11 -pedantic -ftrapv -Wall -Wno-long-long -lgfortran" \
-D CMAKE_Fortran_FLAGS:STRING="-O2 -lgfortran" \
-D TPL_ENABLE_MPI:BOOL=ON \
-D MPI_BASE_DIR:PATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build" \
-D MPI_BIN_DIR:PATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin" \
-D MPI_EXEC:FILEPATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpiexec" \
-D MPI_Fortran_COMPILER:FILEPATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpif90" \
-D MPI_CXX_COMPILER:FILEPATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpic++" \
-D MPI_C_COMPILER:FILEPATH="/INST/P/TRILINOS/GAug2017/openmpi-2.1.1_build/bin/mpicc" \
-D TPL_ENABLE_BLAS:BOOL=ON \
-D BLAS_LIBRARY_DIRS:PATH=/p/home/apps/COST/lapack/3.5.0/gnu/lib \
-D TPL_ENABLE_LAPACK:BOOL=ON \
-D LAPACK_LIBRARY_DIRS:PATH=/p/home/apps/COST/lapack/3.5.0/gnu/lib \
-D TPL_ENABLE_Boost:BOOL=ON \
-D Boost_INCLUDE_DIRS:FILEPATH=/INST/P/TRILINOS/GAug2017/boost_1_64_0/build/include \
-D Boost_LIBRARY_DIRS:FILEPATH=/INST/P/TRILINOS/GAug2017/boost_1_64_0/build/lib \
-D TPL_ENABLE_HDF5:BOOL=ON \
-D HDF5_INCLUDE_DIRS:FILEPATH="/INST/P/TRILINOS/GAug2017/hdf5-1.8.16/parallel_HDF5V1p8p16/include" \
-D HDF5_LIBRARY_DIRS:FILEPATH="/INST/P/TRILINOS/GAug2017/hdf5-1.8.16/parallel_HDF5V1p8p16/lib" \
-D TPL_ENABLE_Netcdf:BOOL=ON \
-D Netcdf_INCLUDE_DIRS:FILEPATH="/INST/P/TRILINOS/GAug2017/netcdf-c-4.3.3.1/parallel_netcdf-c/include" \
-D Netcdf_LIBRARY_DIRS:FILEPATH="/INST/P/TRILINOS/GAug2017/netcdf-c-4.3.3.1/parallel_netcdf-c/lib" \
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
-D Trilinos_WARNINGS_AS_ERRORS_FLAGS:STRING="" \
-D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_Teuchos:BOOL=ON \
-D Trilinos_ENABLE_Shards:BOOL=ON \
-D Trilinos_ENABLE_Sacado:BOOL=ON \
-D Trilinos_ENABLE_Epetra:BOOL=ON \
-D Trilinos_ENABLE_EpetraExt:BOOL=ON \
-D Trilinos_ENABLE_Ifpack:BOOL=ON \
-D Trilinos_ENABLE_AztecOO:BOOL=ON \
-D Trilinos_ENABLE_Amesos:BOOL=ON \
-D Trilinos_ENABLE_Anasazi:BOOL=ON \
-D Trilinos_ENABLE_Belos:BOOL=ON \
-D Trilinos_ENABLE_ML:BOOL=ON \
-D Trilinos_ENABLE_Phalanx:BOOL=ON \
-D Trilinos_ENABLE_Intrepid:BOOL=ON \
-D Trilinos_ENABLE_NOX:BOOL=ON \
-D Trilinos_ENABLE_Stratimikos:BOOL=ON \
-D Trilinos_ENABLE_Thyra:BOOL=ON \
-D Trilinos_ENABLE_Rythmos:BOOL=ON \
-D Trilinos_ENABLE_MOOCHO:BOOL=ON \
-D Trilinos_ENABLE_TriKota:BOOL=OFF \
-D Trilinos_ENABLE_Stokhos:BOOL=ON \
-D Trilinos_ENABLE_Zoltan:BOOL=ON \
-D Trilinos_ENABLE_Piro:BOOL=ON \
-D Trilinos_ENABLE_Teko:BOOL=ON \
-D Trilinos_ENABLE_SEACASIoss:BOOL=ON \
-D Trilinos_ENABLE_SEACAS:BOOL=ON \
-D Trilinos_ENABLE_SEACASBlot:BOOL=ON \
-D Trilinos_ENABLE_Pamgen:BOOL=ON \
-D Trilinos_ENABLE_EXAMPLES:BOOL=OFF \
-D Trilinos_ENABLE_TESTS:BOOL=ON \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=OFF \
-D Trilinos_VERBOSE_CONFIGURE:BOOL=OFF \
-D TPL_ENABLE_Matio:BOOL=OFF \
/INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source

************************************************************ERRORS*******************************************************
mpirun -np 51 /INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/Peridigm_builds/bin/Peridigm fragmenting_cylinder.peridigm > sh_out_p5_40NDZ_CMC_i5cor
+ mpirun -np 51 /INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/Peridigm_builds/bin/Peridigm fragmenting_cylinder.peridigm
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 211 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_Comm_Do_Post (line 180 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/Utilities/Communication/comm_do.c):  nrecvs not zero, but recv_data = NULL
[0] Zoltan ERROR in Zoltan_Invert_Lists (line 238 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_invert.c):  Error ZOLTAN_FATAL returned from Zoltan_Comm_Do.
[0] Zoltan ERROR in Zoltan_LB (line 582 of /INST/P/TRILINOS/GAug2017/trilinos-12.10.1-Source/packages/zoltan/src/lb/lb_balance.c):  Error building return arguments; -1 returned by Zoltan_Compute_Destinations

[r7i4n13:44965] 37 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[r7i4n13:44965] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


***************************************************ERRORS in output file *******************************************************************

> cat sh_out_p5_40NDZ_CMC_i5cor
--------------------------------------------------------------------------
WARNING: Open MPI will create a shared memory backing file in a
directory that appears to be mounted on a network filesystem.
Creating the shared memory backup file on a network file system, such
as NFS or Lustre is not recommended -- it may cause excessive network
traffic to your file servers and/or cause shared memory traffic in
Open MPI to be much slower than expected.

You may want to check what the typical temporary directory is on your
node.  Possible sources of the location of this temporary directory
include the $TEMPDIR, $TEMP, and $TMP environment variables.

Note, too, that system administrators can set a list of filesystems
where Open MPI is disallowed from creating temporary files by setting
the MCA parameter "orte_no_session_dir".

  Local host: r7i4n13
  Filename:   /workspace/openmpi-sessions-1194@r7i4n13_0/16035/1/3/vader_segment.r7i4n13.3

You can set the MCA paramter shmem_mmap_enable_nfs_warning to 0 to
disable this message.
--------------------------------------------------------------------------

-- Peridigm
-- version 1.5.0 (Dev)

MPI initialized on 51 processors.

WARNING!! Peridigms text file input is deprecated and will be
 removed in a future version.  You may consider installing Trilinos
 with the optional TPL_ENABLE_yaml-cpp and convert the *.peridigm input
 to *.yaml, for a similar markup.  Otherwise use the XML input format.


/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.
/INST/P/TRILINOS/GAug2017/peridigm-master-Nov17/src/io/Peridigm_ZoltanSearchTree.cpp:122:

Throw number = 1

Throw test that evaluated to true: ierr != 0

Error in ZoltanSearchTree::ZoltanSearchTree(), call to Zoltan_LB_Partition() returned a nonzero error code.


******************************************************************************************************************
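
The Open MPI warning in the output file above points at the temporary directory and names the $TEMPDIR, $TEMP, and $TMP environment variables. It may be unrelated to the Zoltan failure, but it is easy to check on a compute node. The following is only a minimal sketch: the function name show_tmp_vars is made up for illustration, and TMPDIR is added to the list as an assumption (it is the usual POSIX variable and is not named in the warning itself).

```shell
#!/bin/sh
# Print each temporary-directory variable, or "(unset)" when it is
# empty or not defined. TEMPDIR, TEMP and TMP come from the Open MPI
# warning text; TMPDIR is an extra, assumed to be relevant as well.
show_tmp_vars() {
    for name in TEMPDIR TEMP TMP TMPDIR; do
        # Indirect lookup: read the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s=(unset)\n' "$name"
        fi
    done
}

show_tmp_vars
```

If one of these points at /workspace (the network filesystem in the warning), setting it to a node-local directory before calling mpirun might make the warning go away. Alternatively, the warning itself says the MCA parameter shmem_mmap_enable_nfs_warning can be set to 0, which should be possible with `mpirun --mca shmem_mmap_enable_nfs_warning 0 ...`, although that only hides the message.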
