[Trilinos-Users] Tpetra error during Belos solve with more than 1 MPI process

Alicia Klinvex aklinvex at gmail.com
Fri May 27 10:12:16 EDT 2016


Hello Joe,

I suspect it's an issue with the maps you're providing to the Tpetra
objects.  I'd be happy to take a look at your driver if you like.
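
For what it's worth, the numbers in the exception look consistent with that
guess. As a rough illustration only (plain C++ arithmetic, not Tpetra calls;
the helper names `distributedLocalLength` and `replicatedLocalLength` are made
up for this sketch, and giving the remainder to the lowest ranks is an assumed
convention), here is how the local length of a vector would differ between a
layout that splits the global entries across processes and one that
replicates every entry on every process:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: local entry count on `rank` when `numGlobal` entries
// are split into contiguous, nearly equal blocks over `numProcs` processes.
// Assumption: any remainder is handed out one entry at a time to the
// lowest-numbered ranks.
std::size_t distributedLocalLength(std::size_t numGlobal, int numProcs, int rank) {
    assert(numProcs > 0 && rank >= 0 && rank < numProcs);
    const std::size_t procs = static_cast<std::size_t>(numProcs);
    const std::size_t base = numGlobal / procs;
    const std::size_t remainder = numGlobal % procs;
    return base + (static_cast<std::size_t>(rank) < remainder ? 1 : 0);
}

// Hypothetical helper: local entry count when every process holds a full
// copy of all `numGlobal` entries (a locally replicated layout).
std::size_t replicatedLocalLength(std::size_t numGlobal) {
    return numGlobal;
}

// An element-wise operation such as y = alpha*x + beta*y only makes sense
// when both operands have the same number of local entries on this process.
bool localLengthsMatch(std::size_t lhsLocal, std::size_t rhsLocal) {
    return lhsLocal == rhsLocal;
}
```

With 100 global entries on 2 processes, `distributedLocalLength(100, 2, rank)`
is 50 on both ranks while `replicatedLocalLength(100)` is 100, the same pair
of values reported by `update`. On 1 process both are 100, which would
explain why the single-process run works. If that is what is happening, a
vector built on the replicated (unknowns) map may be getting combined with
one built on the distributed (RHS) map.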

Best wishes,
Alicia

On Thu, May 26, 2016 at 8:22 PM, Rutherford, Joseph M <jmruther at illinois.edu> wrote:

> Trilinos team:
>
>
>
> I’m trying to use Tpetra & Belos to build a distributed parallel iterative
> solver. My system works well with 1 MPI process using serial or OpenMP
> Kokkos nodes. However, if I increase the MPI process count, my executable
> aborts with the error message below. My system has globally distributed
> rows & RHS and locally replicated columns & unknowns. My reading of the
> situation is that something in the Belos GMRES solve expects a differently
> shaped system. I can easily share example code for the trivial case that
> yields this error if needed.
>
>
>
> Joe
>
>
>
> <console>
>
>
>
> jmruther at heaviside:~/build/simple_solver$ mpirun -np 2 bin/complex_case.exe 100
>
> terminate called after throwing an instance of 'std::invalid_argument'
>
>   what():
> /home/jmruther/simple_solver/externals/trilinos/include/Tpetra_MultiVector_def.hpp:2460:
>
>
>
> Throw number = 2
>
>
>
> Throw test that evaluated to true: (lclNumRows != A.getLocalLength ())
>
>
>
> Tpetra::MultiVector<complex<float>,int,int,Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
> Kokkos::HostSpace>>::update: this->getLocalLength() = 100 !=
> A.getLocalLength() = 50.
>
> terminate called after throwing an instance of '[heaviside:35734] ***
> Process received signal ***
>
> [heaviside:35734] Signal: Aborted (6)
>
> [heaviside:35734] Signal code:  (-6)
>
> std::invalid_argument'
>
>   what():
> /home/jmruther/simple_solver/externals/trilinos/include/Tpetra_MultiVector_def.hpp:2460:
>
>
>
> Throw number = 2
>
>
>
> Throw test that evaluated to true: (lclNumRows != A.getLocalLength ())
>
>
>
> Tpetra::MultiVector<complex<float>,int,int,Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial,
> Kokkos::HostSpace>>::update: this->getLocalLength() = 100 !=
> A.getLocalLength() = 50.
>
> [heaviside:35735] *** Process received signal ***
>
> [heaviside:35735] Signal: Aborted (6)
>
> [heaviside:35735] Signal code:  (-6)
>
> [heaviside:35734] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)
> [0x7fda442e0340]
>
> [heaviside:35734] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)
> [0x7fda42d08c49]
>
> [heaviside:35734] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)
> [0x7fda42d0c058]
>
> [heaviside:35734] [ 3]
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)
> [0x7fda43613535]
>
> [heaviside:35734] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)
> [0x7fda436116d6]
>
> [heaviside:35734] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)
> [0x7fda43611703]
>
> [heaviside:35734] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)
> [0x7fda43611922]
>
> [heaviside:35734] [ 7]
> bin/complex_case.exe(_ZN6Tpetra11MultiVectorISt7complexIfEiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS3_6SerialENS3_9HostSpaceEEELb0EE6updateERKS2_RKS9_SB_+0x256)
> [0x921c10]
>
> [heaviside:35734] [ 8] bin/complex_case.exe(main+0x5fd) [0x8d48b4]
>
> [heaviside:35734] [ 9]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fda42cf3f45]
>
> [heaviside:35734] [10] bin/complex_case.exe() [0x8d402f]
>
> [heaviside:35734] *** End of error message ***
>
> [heaviside:35735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)
> [0x7f72ef096340]
>
> [heaviside:35735] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)
> [0x7f72edabec49]
>
> [heaviside:35735] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)
> [0x7f72edac2058]
>
> [heaviside:35735] [ 3]
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x155)
> [0x7f72ee3c9535]
>
> [heaviside:35735] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)
> [0x7f72ee3c76d6]
>
> [heaviside:35735] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)
> [0x7f72ee3c7703]
>
> [heaviside:35735] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e922)
> [0x7f72ee3c7922]
>
> [heaviside:35735] [ 7]
> bin/complex_case.exe(_ZN6Tpetra11MultiVectorISt7complexIfEiiN6Kokkos6Compat23KokkosDeviceWrapperNodeINS3_6SerialENS3_9HostSpaceEEELb0EE6updateERKS2_RKS9_SB_+0x256)
> [0x921c10]
>
> [heaviside:35735] [ 8] bin/complex_case.exe(main+0x5fd) [0x8d48b4]
>
> [heaviside:35735] [ 9]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f72edaa9f45]
>
> [heaviside:35735] [10] bin/complex_case.exe() [0x8d402f]
>
> [heaviside:35735] *** End of error message ***
>
> --------------------------------------------------------------------------
>
> mpirun noticed that process rank 0 with PID 35734 on node heaviside exited
> on signal 6 (Aborted).
>
> --------------------------------------------------------------------------
>
> </console>