[Trilinos-Users] Kokkos::HostSpace destroyed with memory leaks

Martin Vymazal martin.vymazal at vki.ac.be
Thu Apr 30 10:27:18 EDT 2015


Hello,

 I think I have a similar problem.

 When my code exits, I see many messages like this:

Kokkos::HostSpace destroyed with memory leaks:  { "Tpetra::CrsGraph::ind" count(3) memory[ 0x155da00 + 3268 ]
  { "DualView::modified_host" count(1) memory[ 0x155ec00 + 4 ]
  { "DualView::modified_host" count(1) memory[ 0x1564400 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x1564780 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x156ae00 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x156af00 + 4 ]
  { "DualView::modified_device" count(2) memory[ 0x1576e00 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x1577880 + 4 ]
  { "DualView::modified_device" count(2) memory[ 0x1577d00 + 4 ]
  { "DualView::modified_host" count(2) memory[ 0x1578100 + 4 ]
  { "MV::DualView" count(4) memory[ 0x1578300 + 2352 ]
  { "DualView::modified_host" count(2) memory[ 0x1578c80 + 4 ]
  { "MV::DualView" count(4) memory[ 0x1579200 + 2352 ]
  { "Tpetra::CrsGraph::ptr" count(3) memory[ 0x1579c80 + 2360 ]
  { "DualView::modified_host" count(1) memory[ 0x157f280 + 4 ]
  { "Tpetra::CrsGraph::ind" count(3) memory[ 0x1586200 + 4444 ]
  { "Tpetra::CrsGraph::ind" count(3) memory[ 0x1589380 + 3268 ]
  { "DualView::modified_device" count(1) memory[ 0x158a200 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x158c500 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x158c680 + 4 ]
  { "DualView::modified_device" count(1) memory[ 0x158c880 + 4 ]
  { "DualView::modified_host" count(1) memory[ 0x158c900 + 4 ]
  { "Tpetra::CrsGraph::ind" count(3) memory[ 0x159c400 + 7712 ]
  { "Tpetra::CrsMatrix::val" count(2) memory[ 0x159e280 + 15424 ]
  { "DualView::modified_device" count(1) memory[ 0x15a4c80 + 4 ]
  { "DualView::modified_host" count(1) memory[ 0x15a4e00 + 4 ]
  { "MV::DualView" count(4) memory[ 0x15a5680 + 2352 ]
  { "DualView::modified_device" count(2) memory[ 0x15a6180 + 4 ]
  { "DualView::modified_host" count(2) memory[ 0x15a6300 + 4 ]
  { "MV::DualView" count(4) memory[ 0x15a6680 + 2352 ]
  { "DualView::modified_device" count(2) memory[ 0x15a7100 + 4 ]
  { "DualView::modified_host" count(2) memory[ 0x15a7280 + 4 ]
  { "DualView::modified_device" count(2) memory[ 0x15b3080 + 4 ]
  { "Tpetra::CrsGraph::ptr" count(3) memory[ 0x15b4700 + 2360 ]


GDB does not complain, but Valgrind prints many messages such as:

==20250== 160 bytes in 1 blocks are definitely lost in loss record 509 of 874
==20250==    at 0x4C2A4F0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20250==    by 0x1622C269: Kokkos::Impl::host_allocate_not_thread_safe(std::string const&, unsigned long) (in /home/martin/local/trilinos-11.14.1/lib/libkokkoscore.so.11.14.1)
==20250==    by 0x1622CA9F: Kokkos::HostSpace::allocate(std::string const&, unsigned long) (in /home/martin/local/trilinos-11.14.1/lib/libkokkoscore.so.11.14.1)
==20250==    by 0x6619C47: allocate<true> (Kokkos_ViewSupport.hpp:258)
==20250==    by 0x6619C47: View<std::basic_string<char> > (Kokkos_View.hpp:630)
==20250==    by 0x6619C47: Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Serial, void>::DualView(std::string const&, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_DualView.hpp:223)
==20250==    by 0x660DB3C: Tpetra::MultiVector<double, int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial>, false>::MultiVector(Teuchos::RCP<Tpetra::Map<int, int, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial> > const> const&, unsigned long, bool) (Tpetra_KokkosRefactor_MultiVector_def.hpp:158)

This one in particular is triggered at the point where I construct a Tpetra 
vector from a map.

I would post a minimal example, but unfortunately the code is too long. The 
documentation says to use Kokkos::initialize() and Kokkos::finalize(), but 
putting these between MPI_Init and MPI_Finalize did not help.
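
To make the ordering question concrete, here is a minimal sketch of one 
possible cause: an object that owns tracked memory is still alive when the 
shutdown check runs. It does not use Kokkos or MPI. TrackedSpace and 
TrackedBuffer are hypothetical stand-ins that only model the bookkeeping, and 
whether this is what actually happens in my code is an assumption that nothing 
above confirms.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracked memory space: it counts live
// allocations by label. This is NOT the Kokkos API, only a model of it.
class TrackedSpace {
public:
    void record(const std::string& label) { ++live_[label]; }

    void release(const std::string& label) {
        auto it = live_.find(label);
        if (it != live_.end() && --(it->second) == 0) live_.erase(it);
    }

    // Labels that are still allocated, i.e. what a leak report would list.
    std::vector<std::string> leaks() const {
        std::vector<std::string> out;
        for (const auto& kv : live_) out.push_back(kv.first);
        return out;
    }

private:
    std::map<std::string, int> live_;
};

// Hypothetical stand-in for an object (for example a vector wrapper) that
// holds tracked memory for its whole lifetime.
class TrackedBuffer {
public:
    TrackedBuffer(TrackedSpace& space, std::string label)
        : space_(space), label_(std::move(label)) { space_.record(label_); }
    ~TrackedBuffer() { space_.release(label_); }
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;

private:
    TrackedSpace& space_;
    std::string label_;
};

// Problematic ordering: the owner is still alive when the check runs.
std::vector<std::string> check_while_owner_alive() {
    TrackedSpace space;
    auto solver_vec = std::make_shared<TrackedBuffer>(space, "MV::DualView");
    return space.leaks();  // evaluated before solver_vec is released
}

// Safer ordering: every owner lives in an inner scope that closes first.
std::vector<std::string> check_after_owner_scope() {
    TrackedSpace space;
    {
        auto solver_vec = std::make_shared<TrackedBuffer>(space, "MV::DualView");
    }  // solver_vec is destroyed here, before the check
    return space.leaks();
}
```

In this model, check_while_owner_alive() still reports the "MV::DualView" 
label, while check_after_owner_scope() reports nothing. If the same reasoning 
applies to the real library, the Solver objects and everything they hold would 
need to be destroyed before the finalize call, not just created after the 
initialize call.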

I'm using Trilinos 11.14.1 and running a small code (in serial, but with MPI 
initialized and finalized) on a laptop.

The structure is roughly as follows: a Solver class holds internally 
std::shared_ptrs to my own wrappers around a Tpetra matrix and two vectors 
(for the solution and the right-hand side). These wrappers in turn have member 
pointers to their respective Trilinos objects. The solver initializes the 
matrix and vectors and then performs assembly with potentially multiple 
threads, which buffer data and periodically ask the Tpetra matrix wrapper to 
insert them into the sparse matrix. The 'insert' method of the matrix wrapper 
uses a lock so that only one thread can accumulate at a time.
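
The locking pattern described above can be sketched roughly as follows. This 
is a simplified illustration in standard C++ only: LockedMatrix, assemble and 
assemble_parallel are hypothetical names, and the std::map stands in for the 
Tpetra matrix that the real wrapper forwards to. It shows the buffering and 
the lock, not that the underlying Tpetra calls are safe to make from several 
threads.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the matrix wrapper: a sparse accumulator keyed
// by (row, col). A real wrapper would forward to the Tpetra matrix instead.
class LockedMatrix {
public:
    using Entry = std::pair<std::pair<int, int>, double>;

    // Add a whole buffer of contributions while holding the lock, so only
    // one thread touches the underlying storage at a time.
    void insert(const std::vector<Entry>& buffer) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const auto& e : buffer) values_[e.first] += e.second;
    }

    // Read one accumulated value (0.0 if the entry was never inserted).
    double at(int row, int col) const {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = values_.find(std::make_pair(row, col));
        return it == values_.end() ? 0.0 : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::pair<int, int>, double> values_;
};

// Each worker buffers contributions locally and flushes them periodically.
void assemble(LockedMatrix& matrix, int n_rows, std::size_t flush_size) {
    std::vector<LockedMatrix::Entry> buffer;
    for (int row = 0; row < n_rows; ++row) {
        buffer.emplace_back(std::make_pair(row, row), 1.0);
        if (buffer.size() >= flush_size) {
            matrix.insert(buffer);
            buffer.clear();
        }
    }
    if (!buffer.empty()) matrix.insert(buffer);  // flush the remainder
}

// Run n_threads workers against the same matrix and wait for all of them.
void assemble_parallel(LockedMatrix& matrix, int n_threads, int n_rows) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back(assemble, std::ref(matrix), n_rows, std::size_t{4});
    for (auto& w : workers) w.join();
}
```

With four threads each adding 1.0 to every diagonal entry, each diagonal value 
ends up as 4.0. Buffering keeps the time spent under the lock short, at the 
cost of serializing every insertion.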

I don't use threads for anything other than the linear system assembly, and I 
did not use any threading capabilities of Trilinos. The error above was 
triggered when running with a single thread, but after I allocated several 
Solver objects at the same time.

The error is so cryptic to me that I have no idea whether I did something 
wrong or whether this is a bug in Trilinos. It would also really help if the 
tutorials explained not only how to initialize Kokkos, but also WHEN it is 
necessary and WHY. I'm new to multithreading (hence I make mistakes 
frequently), but I have a strong impression that the threads I launch 
interfere with Kokkos. For example, multithreaded accumulation into a Tpetra 
matrix was segfaulting until I hid it inside a wrapper function protected by a 
lock. Design issues aside (is this efficient etc.), I would like to know 
whether this can be done with Tpetra in a correct manner - without segfaults, 
memory leaks etc.

Sadly, I have found no information on whether it is safe to use C++11 
std::thread and Kokkos in the same code, or on what the potential problems are.

Best regards,

  Martin Vymazal


On Tuesday, April 21, 2015 10:39:07 PM Nico Schlömer wrote:
> Kokkos currently has two resource leaks [1]; I'm not sure if this is the
> cause for the message though.
> 
> Cheers,
> Nico
> 
> 
> [1] https://scan8.coverity.com/reports.htm#v11178/p10118
> 
> On Wed, Apr 22, 2015 at 12:30 AM Hysom, David A. <hysom1 at llnl.gov> wrote:
> >  Hi,
> >  
> >  A colleague is using some code we wrote, and is getting this just before
> > exit:
> > 
> > Kokkos::HostSpace destroyed with memory leaks
> > 
> >  The computation code appears to be correctly computing what it's
> > supposed to.
> > He said he's using Trilinos 11.6, and I suggested he upgrade to a newer
> > version.
> > I'm wondering if you have any insight here. I've never seen such a message
> > myself.
> > 
> > 
> >  thanks, David