[Trilinos-Users] Tpetra error during Belos solve with more than 1 MPI process

Rutherford, Joseph M jmruther at illinois.edu
Thu May 26 22:22:33 EDT 2016


Trilinos team:

I'm trying to use Tpetra and Belos to build a distributed-memory parallel iterative solver. It works well with 1 MPI process, using either the serial or the OpenMP Kokkos node. With more than 1 MPI process, however, my executable aborts with the error message below. My system has globally distributed rows and RHS, and locally replicated columns and unknowns. My reading is that something in the Belos GMRES solve expects a differently shaped system: the exception comes from Tpetra::MultiVector::update, called on two vectors whose local lengths differ (100 vs. 50). I can share example code for the trivial case that reproduces the error if needed.

Joe

<console>

jmruther at heaviside:~/build/simple_solver$ mpirun -np 2 bin/complex_case.exe 100
terminate called after throwing an instance of 'std::invalid_argument'
  what():  /home/jmruther/simple_solver/externals/trilinos/include/Tpetra_MultiVector_def.hpp:2460:

Throw number = 2

Throw test that evaluated to true: (lclNumRows != A.getLocalLength ())

Tpetra::MultiVector<complex<float>,int,int,Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>>::update: this->getLocalLength() = 100 != A.getLocalLength() = 50.
[heaviside:35734] *** Process received signal ***
[heaviside:35734] Signal: Aborted (6)
[heaviside:35734] Signal code:  (-6)
terminate called after throwing an instance of 'std::invalid_argument'
  what():  /home/jmruther/simple_solver/externals/trilinos/include/Tpetra_MultiVector_def.hpp:2460:

Throw number = 2

Throw test that evaluated to true: (lclNumRows != A.getLocalLength ())

Tpetra::MultiVector<complex<float>,int,int,Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>>::update: this->getLocalLength() = 100 != A.getLocalLength() = 50.
[heaviside:35735] *** Process received signal ***
[heaviside:35735] Signal: Aborted (6)
[heaviside:35735] Signal code:  (-6)
[heaviside:35734] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fda442e0340]
[heaviside:35734] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7fda42d08c49]
[heaviside:35734] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7fda42d0c058]
[heaviside:35734] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155) [0x7fda43613535]
[heaviside:35734] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6) [0x7fda436116d6]
[heaviside:35734] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703) [0x7fda43611703]
[heaviside:35734] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922) [0x7fda43611922]
[heaviside:35734] [ 7] bin/complex_case.exe(_ZN6Tpetra11MultiVectorISt7complexIfEiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS3_6SerialENS3_9HostSpaceEEELb0EE6updateERKS2_RKS9_SB_+0x256) [0x921c10]
[heaviside:35734] [ 8] bin/complex_case.exe(main+0x5fd) [0x8d48b4]
[heaviside:35734] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fda42cf3f45]
[heaviside:35734] [10] bin/complex_case.exe() [0x8d402f]
[heaviside:35734] *** End of error message ***
[heaviside:35735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f72ef096340]
[heaviside:35735] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f72edabec49]
[heaviside:35735] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f72edac2058]
[heaviside:35735] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155) [0x7f72ee3c9535]
[heaviside:35735] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6) [0x7f72ee3c76d6]
[heaviside:35735] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703) [0x7f72ee3c7703]
[heaviside:35735] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922) [0x7f72ee3c7922]
[heaviside:35735] [ 7] bin/complex_case.exe(_ZN6Tpetra11MultiVectorISt7complexIfEiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS3_6SerialENS3_9HostSpaceEEELb0EE6updateERKS2_RKS9_SB_+0x256) [0x921c10]
[heaviside:35735] [ 8] bin/complex_case.exe(main+0x5fd) [0x8d48b4]
[heaviside:35735] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f72edaa9f45]
[heaviside:35735] [10] bin/complex_case.exe() [0x8d402f]
[heaviside:35735] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 35734 on node heaviside exited on signal 6 (Aborted).
--------------------------------------------------------------------------
</console>