[Trilinos-Users] [EXTERNAL] Segfaults with Tpetra

Bradley, Andrew Michael ambradl at sandia.gov
Tue Jun 9 17:49:44 EDT 2015


Hi Martin,

First, assigning as you do here: 
    m_ifpack_prec = factory.create("ILUT", tpetra_mat);
should be fine, so I don't see anything obviously wrong in the code and description you sent.

Thanks for sending the call stack. According to it, RCP::operator= decrements the reference count, finds that the number of strong references is now 0, and so destroys the preconditioner the RCP currently holds. That destruction cascades through ~ILUT() and ~CrsMatrix() into ~CrsGraph(), where releasing one of CrsGraph's RCP members crashes: unbindOne() jumps to the invalid address 0x1.

The question now is why. I haven't been able to deduce anything further from the data, so I think I need to ask for more. The first step is to build Trilinos with
     -D CMAKE_BUILD_TYPE:STRING=DEBUG \
I also like to add
     -D CMAKE_CXX_FLAGS:STRING="-ggdb" \

Second, run the problem again and send the output, if any, from the debug build. I'll particularly be looking for any error messages from Teuchos::RCP.

Third, if a small problem produces the error, run it with valgrind and send the output. Alternatively, run with gdb and send the stack trace from the seg fault.

Another alternative, if it's feasible, is to send a minimal example that reproduces the crash so that I can debug it myself.

We'll keep this conversation on the Trilinos-users list in case others think of something while we track down the problem.

Cheers,
Andrew

________________________________________
From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf of Martin Vymazal <martin.vymazal at vki.ac.be>
Sent: Sunday, June 7, 2015 1:06 PM
To: trilinos-users at trilinos.org
Subject: [EXTERNAL] [Trilinos-Users] Segfaults with Tpetra

Hello,

 My Tpetra linear solver and preconditioner have been segfaulting randomly since I
updated to gcc 5.1 (and, along with it, to glibc 2.21-4).

I'm getting the following stack trace:

#0  0x0000000000000001 in ?? ()
#1  0x00007fffe9725d80 in Teuchos::RCPNodeHandle::unbindOne() () from
/data/software/deps/trilinos-12.0.1/lib/libteuchoscore.so.12
#2  0x00007ffff20b2acb in Tpetra::CrsGraph<int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>::~CrsGraph()
() from /data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#3  0x00007ffff20b2dce in Teuchos::RCPNodeTmpl<Tpetra::CrsGraph<int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>,
Teuchos::DeallocDelete<Tpetra::CrsGraph<int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false> >
>::delete_obj() () from
/data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#4  0x00007fffe9725d80 in Teuchos::RCPNodeHandle::unbindOne() () from
/data/software/deps/trilinos-12.0.1/lib/libteuchoscore.so.12
#5  0x00007ffff20a492c in Tpetra::CrsMatrix<double, int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>::~CrsMatrix()
() from /data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#6  0x00007ffff20a4a89 in Tpetra::CrsMatrix<double, int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>::~CrsMatrix()
() from /data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#7  0x00007fffe9725d80 in Teuchos::RCPNodeHandle::unbindOne() () from
/data/software/deps/trilinos-12.0.1/lib/libteuchoscore.so.12
#8  0x00007ffff2087bda in Ifpack2::ILUT<Tpetra::CrsMatrix<double, int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false> >::~ILUT() ()
from /data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#9  0x00007ffff2087cc9 in Ifpack2::ILUT<Tpetra::CrsMatrix<double, int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false> >::~ILUT() ()
from /data/software/deps/trilinos-12.0.1/lib/libifpack2.so.12
#10 0x00007fffe9725d80 in Teuchos::RCPNodeHandle::unbindOne() () from
/data/software/deps/trilinos-12.0.1/lib/libteuchoscore.so.12
#11 0x00007ffff6ebb6ce in unbind (this=<optimized out>) at
/data/software/deps/trilinos-12.0.1/lib/cmake/Trilinos/../../../include/Teuchos_RCPNode.hpp:959
#12 ~RCPNodeHandle (this=<optimized out>) at
/data/software/deps/trilinos-12.0.1/lib/cmake/Trilinos/../../../include/Teuchos_RCPNode.hpp:784
#13 ~RCP (this=<optimized out>) at
/data/software/deps/trilinos-12.0.1/lib/cmake/Trilinos/../../../include/Teuchos_RCP.hpp:296
#14 operator= (this=<optimized out>, r_ptr=...) at
/data/software/deps/trilinos-12.0.1/lib/cmake/Trilinos/../../../include/Teuchos_RCP.hpp:308
#15 pdekit::ls::LSTpetra<double, int, int,
Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> >::configure
(this=0x9e15a8, matrix=..., rhs=..., x=..., print_output=<optimized out>)
    at src/linear_system/LSTpetra.hpp:198

LSTpetra is my own class, and line 198 contains:

 m_ifpack_prec = factory.create("ILUT", tpetra_mat);

where m_ifpack_prec is a member variable of type
Teuchos::RCP<Ifpack_prec_type>

with Ifpack_prec_type being a typedef for
Ifpack2::Preconditioner<Scalar, LocalOrdinal, GlobalOrdinal, Node>

tpetra_mat is a variable of type Teuchos::RCP<trilinos_matrix_type const>,
where 'trilinos_matrix_type' is Tpetra::CrsMatrix<Scalar, LocalOrdinal,
GlobalOrdinal, Node>

I did not have this problem before the new glibc (I usually compile with clang
anyway), but now it appears randomly regardless of whether I use gcc or clang
(it seems to be more frequent with gcc). Am I doing something wrong?
I'm also not sure it's correct to assign to the Teuchos::RCP holding
the preconditioner every time I solve my linear system (i.e., at every
iteration).

Can you tell from the stack trace where the problem is?

Thank you,

 Martin Vymazal

_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at trilinos.org
https://trilinos.org/mailman/listinfo/trilinos-users
