[Trilinos-Users] Kokkos::HostSpace destroyed with memory leaks
Martin Vymazal
martin.vymazal at vki.ac.be
Thu Apr 30 10:27:18 EDT 2015
Hello,
I think I have a similar problem.
When my code finishes, I see a lot of output like this:
Kokkos::HostSpace destroyed with memory leaks:
{ "Tpetra::CrsGraph::ind" count(3) memory[ 0x155da00 + 3268 ]
{ "DualView::modified_host" count(1) memory[ 0x155ec00 + 4 ]
{ "DualView::modified_host" count(1) memory[ 0x1564400 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x1564780 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x156ae00 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x156af00 + 4 ]
{ "DualView::modified_device" count(2) memory[ 0x1576e00 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x1577880 + 4 ]
{ "DualView::modified_device" count(2) memory[ 0x1577d00 + 4 ]
{ "DualView::modified_host" count(2) memory[ 0x1578100 + 4 ]
{ "MV::DualView" count(4) memory[ 0x1578300 + 2352 ]
{ "DualView::modified_host" count(2) memory[ 0x1578c80 + 4 ]
{ "MV::DualView" count(4) memory[ 0x1579200 + 2352 ]
{ "Tpetra::CrsGraph::ptr" count(3) memory[ 0x1579c80 + 2360 ]
{ "DualView::modified_host" count(1) memory[ 0x157f280 + 4 ]
{ "Tpetra::CrsGraph::ind" count(3) memory[ 0x1586200 + 4444 ]
{ "Tpetra::CrsGraph::ind" count(3) memory[ 0x1589380 + 3268 ]
{ "DualView::modified_device" count(1) memory[ 0x158a200 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x158c500 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x158c680 + 4 ]
{ "DualView::modified_device" count(1) memory[ 0x158c880 + 4 ]
{ "DualView::modified_host" count(1) memory[ 0x158c900 + 4 ]
{ "Tpetra::CrsGraph::ind" count(3) memory[ 0x159c400 + 7712 ]
{ "Tpetra::CrsMatrix::val" count(2) memory[ 0x159e280 + 15424 ]
{ "DualView::modified_device" count(1) memory[ 0x15a4c80 + 4 ]
{ "DualView::modified_host" count(1) memory[ 0x15a4e00 + 4 ]
{ "MV::DualView" count(4) memory[ 0x15a5680 + 2352 ]
{ "DualView::modified_device" count(2) memory[ 0x15a6180 + 4 ]
{ "DualView::modified_host" count(2) memory[ 0x15a6300 + 4 ]
{ "MV::DualView" count(4) memory[ 0x15a6680 + 2352 ]
{ "DualView::modified_device" count(2) memory[ 0x15a7100 + 4 ]
{ "DualView::modified_host" count(2) memory[ 0x15a7280 + 4 ]
{ "DualView::modified_device" count(2) memory[ 0x15b3080 + 4 ]
{ "Tpetra::CrsGraph::ptr" count(3) memory[ 0x15b4700 + 2360 ]
GDB does not complain, but Valgrind prints many messages such as:
==20250== 160 bytes in 1 blocks are definitely lost in loss record 509 of 874
==20250== at 0x4C2A4F0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20250== by 0x1622C269: Kokkos::Impl::host_allocate_not_thread_safe(std::string const&, unsigned long) (in /home/martin/local/trilinos-11.14.1/lib/libkokkoscore.so.11.14.1)
==20250== by 0x1622CA9F: Kokkos::HostSpace::allocate(std::string const&, unsigned long) (in /home/martin/local/trilinos-11.14.1/lib/libkokkoscore.so.11.14.1)
==20250== by 0x6619C47: allocate<true> (Kokkos_ViewSupport.hpp:258)
==20250== by 0x6619C47: View<std::basic_string<char> > (Kokkos_View.hpp:630)
==20250== by 0x6619C47: Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Serial, void>::DualView(std::string const&, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_DualView.hpp:223)
==20250== by 0x660DB3C: Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>::MultiVector(Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > const> const&, unsigned long, bool) (Tpetra_KokkosRefactor_MultiVector_def.hpp:158)
This one in particular is triggered at a spot where I construct a Tpetra
vector from a map.
I would post a minimal example, but unfortunately the code is too long. The
documentation says to use Kokkos::initialize() and Kokkos::finalize(), but
putting these between MPI_Init and MPI_Finalize did not help.
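One possible explanation, which nothing in this thread confirms, is that some reference-counted handles are still alive when the memory space is torn down, so the allocations behind them get reported as leaks. The sketch below models that idea with standard C++ only. `TrackedRuntime`, `allocate`, `finalize`, `leaks_when_scoped` and `leaks_when_outliving` are hypothetical stand-ins invented for illustration; they are not the Kokkos or Tpetra API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a runtime that tracks its live allocations,
// loosely modeled on the leak report quoted above. NOT the Kokkos API.
class TrackedRuntime {
public:
  // Hand out a reference-counted buffer; it counts as live until the
  // last handle to it is released.
  std::shared_ptr<std::vector<double>> allocate(std::size_t n) {
    std::shared_ptr<std::size_t> live = live_;
    std::shared_ptr<std::vector<double>> buffer(
        new std::vector<double>(n),
        [live](std::vector<double>* p) { delete p; --*live; });
    ++*live_;
    return buffer;
  }

  // Model of teardown: returns how many allocations are still live,
  // i.e. what a leak report would list.
  std::size_t finalize() const { return *live_; }

private:
  std::shared_ptr<std::size_t> live_ = std::make_shared<std::size_t>(0);
};

// Handles live in an inner scope and are released before teardown.
inline std::size_t leaks_when_scoped() {
  TrackedRuntime runtime;
  {
    auto vec = runtime.allocate(8);
    (*vec)[0] = 1.0;
  }  // vec is released here, before finalize()
  return runtime.finalize();
}

// A handle that is still alive at teardown shows up as a leak.
inline std::size_t leaks_when_outliving() {
  TrackedRuntime runtime;
  auto vec = runtime.allocate(8);
  (*vec)[0] = 1.0;
  return runtime.finalize();  // vec is still alive at this point
}
```

If this model applies, the thing to check would be whether every object holding such an allocation (here, the Solver objects and their wrappers) is destroyed before the finalize call, for example by keeping them in a scope that closes first.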
I'm using Trilinos 11.14.1 and running a small code (in serial, but with MPI
initialized and finalized) on a laptop.
The structure is roughly as follows: a Solver class holds internally
std::shared_ptrs to my own wrappers around a Tpetra matrix and two vectors (for
solution and right-hand side). These wrappers in turn have member pointers to
their respective Trilinos objects. The solver initializes the matrix and
vectors and then performs assembly with potentially multiple threads, which
buffer data and periodically ask the Tpetra matrix wrapper to insert them in
the sparse matrix. The 'insert' method of the matrix wrapper uses a lock so
that only one thread can accumulate at a time.
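A minimal sketch of the buffered, lock-protected insertion described above, using only the standard library. `LockedMatrix`, `Entry`, `insert`, `at` and `assemble` are hypothetical names, and a `std::map` stands in for the sparse matrix; in the real code the locked region would forward to the Tpetra matrix instead.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the matrix wrapper; a std::map plays the
// role of the sparse matrix. NOT the Tpetra API.
class LockedMatrix {
public:
  struct Entry { int row; int col; double value; };

  // Flush a whole buffer under one lock, so that only one thread
  // accumulates at a time and threads contend once per batch.
  void insert(const std::vector<Entry>& batch) {
    std::lock_guard<std::mutex> guard(mutex_);
    for (const Entry& e : batch) {
      entries_[std::make_pair(e.row, e.col)] += e.value;
    }
  }

  // Read one coefficient; returns 0.0 for entries never inserted.
  double at(int row, int col) const {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = entries_.find(std::make_pair(row, col));
    return it == entries_.end() ? 0.0 : it->second;
  }

private:
  mutable std::mutex mutex_;
  std::map<std::pair<int, int>, double> entries_;
};

// Each thread buffers contributions locally and hands them to the
// wrapper whenever the buffer holds batch_size entries.
inline void assemble(LockedMatrix& matrix, int num_threads,
                     int contributions_per_thread, std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&matrix, contributions_per_thread, batch_size] {
      std::vector<LockedMatrix::Entry> buffer;
      buffer.reserve(batch_size);
      for (int i = 0; i < contributions_per_thread; ++i) {
        buffer.push_back({i % 4, i % 4, 1.0});  // toy diagonal contribution
        if (buffer.size() == batch_size) {
          matrix.insert(buffer);
          buffer.clear();
        }
      }
      if (!buffer.empty()) matrix.insert(buffer);  // flush the remainder
    });
  }
  for (std::thread& w : workers) w.join();
}
```

Calling `assemble(m, 4, 1000, 64)` on a fresh `LockedMatrix m` runs four threads that each flush in batches of 64. The lock only serializes calls made through the wrapper; it says nothing about whether the library underneath is safe to call from user threads.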
I don't use threads for anything other than the linear system assembly, and I
did not use any threading capabilities of Trilinos. The error above was
triggered when running with a single thread, but after I allocated several
Solver objects at the same time.
The error is so cryptic to me that I have no idea if I did something wrong or
if this is a bug in Trilinos. It would also really help if the tutorials not
only explained how to initialize Kokkos, but also WHEN it is necessary and
WHY. I'm new to multithreading (hence I make mistakes frequently), but I have
a strong impression that the threads I launch interfere with Kokkos. For
example, multithreaded accumulation into a Tpetra matrix was segfaulting until I
hid it inside a wrapper function protected by a lock. Design issues aside (is
this efficient etc.), I would like to know if this can be done with Tpetra in a
correct manner, without segfaults, memory leaks etc.
Sadly, I have found no information on whether it is safe to use C++11
std::thread and Kokkos in the same code, or what the potential problems are.
Best regards,
Martin Vymazal
On Tuesday, April 21, 2015 10:39:07 PM Nico Schlömer wrote:
> Kokkos currently has two resource leaks [1]; I'm not sure if this is the
> cause for the message though.
>
> Cheers,
> Nico
>
>
> [1] https://scan8.coverity.com/reports.htm#v11178/p10118
>
> On Wed, Apr 22, 2015 at 12:30 AM Hysom, David A. <hysom1 at llnl.gov> wrote:
> > Hi,
> >
> > A colleague is using some code we wrote, and is getting this just before
> > exit:
> >
> > Kokkos::HostSpace destroyed with memory leaks
> >
> > The computation code appears to be correctly computing what it's
> > supposed to.
> > He said he's using trilinos 11.6, and I suggested he upgrade to a newer
> > version.
> > I'm wondering if you have any insight here. I've never seen such a message
> > myself.
> >
> > thanks, David