[Trilinos-Users] [EXTERNAL] coupling Kokkos and Trilinos

BLOCH Helene helene.bloch at cea.fr
Wed Jun 26 08:05:01 EDT 2019

Hi Siva,

An archive file with a simple example to reproduce the bug is attached. Once the archive is unpacked, the code can be compiled and run with the following commands:
cd poisson2d
mkdir build
cd build
cmake ..
make
mpirun -np 1 ./poisson2d ../inputs.yml

If you change the resolution in the input file, you may need to update the file read_output.py accordingly.

Here is the output we get from MueLu:

number of equations = 1
smoother: type = CHEBYSHEV
multigrid algorithm = sa
verbosity = extreme
problem: type = Poisson-2D
coarse: max size = 2000   [default]
max levels = 10   [default]
rap: algorithm = galerkin   [default]

A0 size =  10000 x 10000, nnz = 50000
A0 Load balancing info
A0   # active processes: 1/1
A0   # rows per proc   : avg = 1.00e+04,  dev =   0.0%,  min =   +0.0%,  max =   +0.0%
A0   #  nnz per proc   : avg = 5.00e+04,  dev =   0.0%,  min =   +0.0%,  max =   +0.0%
Clearing old data (if any)
MueLu::Amesos2Smoother: using "Klu"
MueLu::AmesosSmoother: using "Klu"
Using default factory (MueLu::SmootherFactory{pre = MueLu::DirectSolver{type = }, post = null}) for building 'CoarseSolver'.
Using default factory (MueLu::AmalgamationFactory) for building 'UnAmalgamationInfo'.
Level 0
 Setup Smoother (MueLu::Ifpack2Smoother{type = CHEBYSHEV})
  chebyshev: ratio eigenvalue (computed) = 20
  Preconditioner init
  Preconditioner compute
  chebyshev: max eigenvalue (calculated by Ifpack2) = 1.89765
  "Ifpack2::Chebyshev": {Initialized: true, Computed: true, "Ifpack2::Details::Chebyshev":{degree: 1, lambdaMax: 1.89765, alpha: 20, lambdaMin: 0.0948825, boost factor: 1.1}, Global matrix dimensions: [10000, 10000], Global nnz: 50000}
MueLu::Amesos2Smoother: using "Klu"
MueLu::AmesosSmoother: using "Klu"
Using default factory (MueLu::SmootherFactory{pre = MueLu::DirectSolver{type = }, post = null}) for building 'CoarseSolver'.
Using default factory (MueLu::AmalgamationFactory) for building 'UnAmalgamationInfo'.
Level 1
 Prolongator smoothing (MueLu::SaPFactory_kokkos)
  Kokkos::OpenMP thread_pool_topology[ 1 x 8 x 1 ]
  Build (MueLu::CoalesceDropFactory_kokkos)
   Build (MueLu::AmalgamationFactory)
    AmalagamationFactory::Build(): found fullblocksize=1 and stridedblocksize=1 from strided maps. offset=0
   ******* WARNING *******
   lightweight wrap is deprecated
   algorithm = "classical": threshold = 0, blocksize = 1
   Detected 0 Dirichlet nodes
  Build (MueLu::TentativePFactory_kokkos)
   Build (MueLu::UncoupledAggregationFactory_kokkos)
    Algo "Phase - (Dirichlet)"
     BuildAggregates (Phase - (Dirichlet))
       aggregated : 0 (phase), 0/10000 [0.00%] (total)
       remaining  : 10000
       aggregates : 0 (phase), 0 (total)
    Algo "Phase 1 (main)"
     BuildAggregates (Phase 1 (main))
       aggregated : 7672 (phase), 7672/10000 [76.72%] (total)
       remaining  : 2328
       aggregates : 1311 (phase), 1311 (total)
    Algo "Phase 2a (secondary)"
     BuildAggregates (Phase 2a (secondary))
       aggregated : 0 (phase), 7672/10000 [76.72%] (total)
       remaining  : 2328
       aggregates : 0 (phase), 1311 (total)
    Algo "Phase 2b (expansion)"
     BuildAggregates (Phase 2b (expansion))
       aggregated : 2328 (phase), 10000/10000 [100.00%] (total)
       remaining  : 0
       aggregates : 0 (phase), 1311 (total)
    Algo "Phase 3 (cleanup)"
     BuildAggregates (Phase 3 (cleanup))
       aggregated : 0 (phase), 10000/10000 [100.00%] (total)
       remaining  : 0
       aggregates : 0 (phase), 1311 (total)
    "UC": MueLu::Aggregates_kokkos{nGlobalAggregates = 1311}
   Nullspace factory (MueLu::NullspaceFactory_kokkos)
    Generating canonical nullspace: dimension = 1
   Build (MueLu::CoarseMapFactory_kokkos)
   Get Aggregates graph
   Check good map
terminate called after throwing an instance of 'MueLu::Exceptions::RuntimeError'
  what():  /local/home/hbloch/trilinos-12-13/Trilinos/packages/muelu/src/Transfers/Smoothed-Aggregation/MueLu_TentativePFactory_kokkos_def.hpp:589:

Throw number = 1

Throw test that evaluated to true: !goodMap

MueLu: TentativePFactory_kokkos: for now works only with good maps (i.e. "matching" row and column maps)
[mdlspc114:32452] *** Process received signal ***
[mdlspc114:32452] Signal: Aborted (6)
[mdlspc114:32452] Signal code:  (-6)
[mdlspc114:32452] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f7e95e7c890]
[mdlspc114:32452] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f7e95ab7e97]
[mdlspc114:32452] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f7e95ab9801]
[mdlspc114:32452] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f7e968fa957]
[mdlspc114:32452] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ab6)[0x7f7e96900ab6]
[mdlspc114:32452] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92af1)[0x7f7e96900af1]
[mdlspc114:32452] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d24)[0x7f7e96900d24]
[mdlspc114:32452] [ 7] ./poisson2d(_ZNK5MueLu24TentativePFactory_kokkosIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE15BuildPuncoupledERNS_5LevelEN7Teuchos3RCPIN6Xpetra6MatrixIdiiS6_EEEENSB_INS_17Aggregates_kokkosIiiS6_EEEENSB_INS_16AmalgamationInfoIiiS6_EEEENSB_INSC_11MultiVectorIdiiS6_EEEENSB_IKNSC_3MapIiiS6_EEEERSF_RSO_i+0x661e)[0x55eb424ef08e]
[mdlspc114:32452] [ 8] ./poisson2d(_ZNK5MueLu24TentativePFactory_kokkosIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE6BuildPERNS_5LevelES9_+0x17e4)[0x55eb424f0994]
[mdlspc114:32452] [ 9] ./poisson2d(_ZNK5MueLu19TwoLevelFactoryBase9CallBuildERNS_5LevelE+0x1f8)[0x55eb41ecb088]
[mdlspc114:32452] [10] ./poisson2d(_ZN5MueLu5Level3GetIN7Teuchos3RCPIN6Xpetra6MatrixIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS6_6OpenMPENS6_9HostSpaceEEEEEEEEERT_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKNS_11FactoryBaseE+0x83)[0x55eb41e03023]
[mdlspc114:32452] [11] ./poisson2d(_ZNK5MueLu17SaPFactory_kokkosIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE6BuildPERNS_5LevelES9_+0x1de)[0x55eb424815be]
[mdlspc114:32452] [12] ./poisson2d(_ZNK5MueLu19TwoLevelFactoryBase9CallBuildERNS_5LevelE+0x1f8)[0x55eb41ecb088]
[mdlspc114:32452] [13] ./poisson2d(_ZN5MueLu5Level3GetIN7Teuchos3RCPIN6Xpetra8OperatorIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS6_6OpenMPENS6_9HostSpaceEEEEEEEEERT_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKNS_11FactoryBaseE+0x83)[0x55eb41e00cb3]
[mdlspc114:32452] [14] ./poisson2d(_ZNK5MueLu13TopRAPFactoryIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE5BuildERNS_5LevelES9_+0x85)[0x55eb42504205]
[mdlspc114:32452] [15] ./poisson2d(_ZN5MueLu9HierarchyIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE5SetupEiN7Teuchos3RCPIKNS_18FactoryManagerBaseEEESC_SC_+0xb7a)[0x55eb422d06ca]
[mdlspc114:32452] [16] ./poisson2d(_ZNK5MueLu16HierarchyManagerIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEE14SetupHierarchyERNS_9HierarchyIdiiS6_EE+0x97e)[0x55eb41e3658e]
[mdlspc114:32452] [17] ./poisson2d(_ZN5MueLu26CreateXpetraPreconditionerIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEEN7Teuchos3RCPINS_9HierarchyIT_T0_T1_T2_EEEENS8_IN6Xpetra6MatrixISA_SB_SC_SD_EEEERKNS7_13ParameterListESM_+0x3fb)[0x55eb41e375cb]
[mdlspc114:32452] [18] ./poisson2d(_ZN5MueLu26CreateTpetraPreconditionerIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEEN7Teuchos3RCPINS_14TpetraOperatorIT_T0_T1_T2_EEEERKNS8_IN6Tpetra7Classes8OperatorISA_SB_SC_SD_EEEERNS7_13ParameterListESO_+0x43f)[0x55eb41e37ecf]
[mdlspc114:32452] [19] ./poisson2d(_ZN5MueLu26CreateTpetraPreconditionerIdiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS1_6OpenMPENS1_9HostSpaceEEEEEN7Teuchos3RCPINS_14TpetraOperatorIT_T0_T1_T2_EEEERKNS8_IN6Tpetra7Classes8OperatorISA_SB_SC_SD_EEEERNS7_13ParameterListERKNS8_INSH_11MultiVectorIdSB_SC_SD_EEEERKNS8_INSP_ISA_SB_SC_SD_EEEE+0x1ce)[0x55eb41e38cfe]
[mdlspc114:32452] [20] ./poisson2d(main+0x2472)[0x55eb41d7b832]
[mdlspc114:32452] [21] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7e95a9ab97]
[mdlspc114:32452] [22] ./poisson2d(_start+0x2a)[0x55eb41da09ca]
[mdlspc114:32452] *** End of error message ***
mpirun noticed that process rank 0 with PID 0 on node mdlspc114 exited on signal 6 (Aborted).


From: Rajamanickam, Sivasankaran [srajama at sandia.gov]
Sent: Tuesday, June 25, 2019 20:32
To: BLOCH Helene; trilinos-users at trilinos.org
Subject: Re: [EXTERNAL] [Trilinos-Users] coupling Kokkos and Trilinos

Hi Helene,
  Your use case is common in several Trilinos applications. At a high level, this *should* work.

  It would help to see the full error output and a simple example that reproduces it. Is it possible to provide that?


From: BLOCH Helene <helene.bloch at cea.fr>
Sent: Tuesday, June 25, 2019 6:43 AM
To: Rajamanickam, Sivasankaran; trilinos-users at trilinos.org
Subject: RE:[EXTERNAL] [Trilinos-Users] coupling Kokkos and Trilinos

Hi Siva,

Thank you for your answer.

1. We want to update only the values of the matrix; its structure does not change. We create the matrix with a const CrsGraph. Instead of using "replaceGlobalValues", we would like to use "replaceValues" from KokkosSparse.

2. We can solve the linear system if we use a preconditioner from Ifpack2. If the matrix is not updated correctly, then Belos crashes because of NaN.

We are only using two maps because it is the only way we found to deal with our MPI domain decomposition, but we are open to any suggestion.


From: Rajamanickam, Sivasankaran [srajama at sandia.gov]
Sent: Monday, June 24, 2019 17:55
To: BLOCH Helene; trilinos-users at trilinos.org
Subject: Re: [EXTERNAL] [Trilinos-Users] coupling Kokkos and Trilinos

Hi Helene,
  Can you answer these questions?

  1. What do you mean by "update the matrix"? It appears that all you need to update is the values of the matrix. Is this correct?
  2. Were you able to run one solve correctly without the update?

I ask because the error appears to point to something in your maps. This may not be a device update issue.


From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of BLOCH Helene <helene.bloch at cea.fr>
Sent: Monday, June 24, 2019 2:51 AM
To: trilinos-users at trilinos.org
Subject: [EXTERNAL] [Trilinos-Users] coupling Kokkos and Trilinos


We are using Kokkos and Trilinos, especially the packages Tpetra, Belos and MueLu, and we have had difficulty updating the matrix on the device.

We are currently trying to couple Trilinos with an existing code parallelized with Kokkos and MPI. This code implements an explicit finite volume scheme on a fixed Cartesian grid, with an MPI domain decomposition with ghost cells. We are trying to add an implicit solver for the finite volume scheme, using backward Euler time integration. We solve a sparse linear system stored in a Tpetra matrix, with a linear solver from Belos and an AMG preconditioner from MueLu. At each iteration of the time loop, we have to
- call the explicit solver
- update the matrix with data coming from the explicit solver
- solve the linear system
- update the solution of the explicit solver with the solution of the implicit solver
As the data coming from the explicit solver lies on the device, it would be great to update the matrix on the device, using methods from KokkosSparse::CrsMatrix. Because we already have a domain decomposition with ghost cells, the matrix has a row map and a column map that are not the same. In a first test on the 2D Poisson equation, MueLu gives the following error:
Throw test that evaluated to true: !goodMap
MueLu: TentativePFactory_kokkos: for now works only with good maps (i.e. "matching" row and column maps)
Here are our questions:
- Did we miss some examples on the Trilinos wiki that show how to update the matrix on the device?
- Assuming the only way to deal with the domain decomposition is to use two different maps, is there a way to still use MueLu as a preconditioner?

Thank you for your help

Hélène Bloch

