[Trilinos-Users] [EXTERNAL] test failures

Templeton, Jeremy Alan jatempl at sandia.gov
Fri Jul 11 09:38:08 MDT 2014


Hi Ross, your suggestion helped, although 72 out of 296 tests still fail.  They appear to be failing on RCP issues at termination, so for the purposes of debugging that should be OK.  I still don’t know what in my build environment is causing this, though.
Thanks for your help,
Jeremy

On Jul 9, 2014, at 12:20 PM, Bartlett, Roscoe A. <bartlettra at ornl.gov> wrote:

> Jeremy,
>  
> Assuming this is not a broken development version of Trilinos on your platform (which seems unlikely, since you have 60% of the tests failing, which we have not seen in many years) …
>  
> I saw which executable was failing, so I did not mean to imply it was your code.  Instead, I was asking whether the problem might be due to your compilation or runtime environment, or something else external to the source code.  To know more: exactly why did these tests fail according to ctest?  To see, you can look at the detailed output in:
>  
>    <YOUR-BUILD-DIR>/Testing/Temporary/LastTest.log
>  
> and it will tell you why CTest passed or failed each test.  My guess is that the failures are due to these dangling RCPNode objects.  The problem is that this executable is not terminating normally; it is being aborted prematurely for some reason, as shown in your output:
>  
> Program received signal SIGABRT, Aborted.
>  
> Why?   The program aborting prematurely would concern me given that these tests pass on a bunch of platforms and compilers without aborting (again, assuming this is not a broken version of Trilinos).
>  
> If you want, you can just make these errors go away by configuring with:
>  
>    -DTeuchos_ENABLE_DEBUG_RCP_NODE_TRACING:BOOL=OFF
>  
> and that will turn off this type of checking (blow away your CMakeCache.txt file to be safe).  The test programs will likely pass even with this abort because ctest is looking for:
>  
> End Result: TEST PASSED
>  
> and *not* a zero return code in most cases (because return codes from mpiexec or mpirun are unreliable).
>  
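As a rough illustration of the pass criterion described above, the following C++ sketch contrasts a check that only looks for the pass string in the captured output with a stricter one that also requires a zero exit code.  The TestRun struct and both function names are hypothetical and are not part of CTest or Trilinos; only the marker string comes from the test output in this thread.

```cpp
#include <string>

// Hypothetical record of one test run: captured output plus process exit code.
struct TestRun {
    std::string output;
    int exitCode;
};

// Pass criterion based only on the output marker; the exit code is ignored.
bool passedByMarker(const TestRun& run,
                    const std::string& marker = "End Result: TEST PASSED") {
    return run.output.find(marker) != std::string::npos;
}

// Stricter criterion: the marker must be present AND the exit code must be zero.
bool passedStrict(const TestRun& run) {
    return passedByMarker(run) && run.exitCode == 0;
}
```

For a run whose output contains the marker but whose exit code is nonzero (as with the aborting debug tests here), passedByMarker returns true while passedStrict returns false.
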
> But then again, if the program has finished its work at the end of main(), do you really care how it terminates?  The problem is that some sophisticated C++ programs may actually do necessary things after main() ends, in the destruction of static objects (but most good C++ programs will not do that).  This check in Teuchos is just helping to catch and debug circular references.
>  
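To make the circular-reference case concrete, here is a minimal sketch using std::shared_ptr and std::weak_ptr from the standard library as stand-ins for reference-counted pointers in general.  It does not use Teuchos::RCP, and the Node type, the liveNodes counter, and the function names are made up for illustration.

```cpp
#include <memory>

// Number of Node objects currently alive (illustrative bookkeeping only).
int liveNodes = 0;

struct Node {
    std::shared_ptr<Node> strongPeer;  // owning link: keeps the peer alive
    std::weak_ptr<Node> weakPeer;      // non-owning link: does not keep the peer alive
    Node() { ++liveNodes; }
    ~Node() { --liveNodes; }
};

// Two nodes own each other, so neither reference count can reach zero.
// Returns how many nodes created here are still alive after the scope ends.
// Note: this deliberately leaks the two nodes to show the effect.
int leakedAfterStrongCycle() {
    const int before = liveNodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongPeer = b;
        b->strongPeer = a;  // cycle of owning references
    }
    return liveNodes - before;
}

// Same shape, but the back link is weak, so both nodes are destroyed
// when the local handles go out of scope.
int leakedAfterWeakCycle() {
    const int before = liveNodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongPeer = b;
        b->weakPeer = a;    // non-owning back reference breaks the cycle
    }
    return liveNodes - before;
}
```

In the first function both nodes outlive their scope, which is the kind of undestroyed object a node-tracing check is meant to report; in the second, nothing is left behind.
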
> -Ross
>  
>  
>  
> From: Templeton, Jeremy Alan [mailto:jatempl at sandia.gov] 
> Sent: Wednesday, July 09, 2014 1:12 PM
> To: Bartlett, Roscoe A.
> Cc: trilinos-users at software.sandia.gov
> Subject: Re: [EXTERNAL] test failures
>  
> Hi Ross, 
>  
> Thanks for the response.  Just to be clear, the program which is failing is a Trilinos test which I have not modified, specifically Tpetra_ExportToStaticGraphCrsMatrix.  This is just one example; nearly 60% of the tests failed in similar fashion, again without any changes from the distribution.
>  
> Jeremy
>  
> On Jul 9, 2014, at 9:21 AM, Bartlett, Roscoe A. <bartlettra at ornl.gov> wrote:
> 
> 
> Jeremy,
>  
> See section:
>  
>    “5.11.2 Detection of circular references”
>  
> in:
>  
>     http://web.ornl.gov/~8vt/TeuchosMemoryManagementSAND.pdf
>  
> My guess is that your programs are terminating incorrectly for some reason.  This could be due to a number of different issues, but it is not a good sign.  I would not trust software built on this system with these types of failures.
>  
> -Ross
>  
>  
>  
> From: trilinos-users-bounces at software.sandia.gov [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Templeton, Jeremy Alan
> Sent: Wednesday, July 09, 2014 12:01 PM
> To: trilinos-users at software.sandia.gov
> Subject: [Trilinos-Users] test failures
>  
> Hi all,
>  
> I’m trying to get Trilinos 11.8 (tpetra, zoltan2) running on my Mac (10.9, gcc4.8, openmpi), but when I build an MPI debug version, nearly 60% of the tests fail on exit.  I ran one of them through GDB and the output is below.  When I build the MPI release version with no other configuration changes, all but one of the tests pass.  So this looks like some extra error checking.  How relevant is it?  If it’s not harmful, is there a way to turn it off?
>  
> Thanks,
> Jeremy
>  
> Starting program: /Users/jatempl/Codes/Trilinos_build/packages/tpetra/test/ImportExport/Tpetra_ExportToStaticGraphCrsMatrix.exe 
> End Result: TEST PASSED
>  
> ***
> *** Warning! The following Teuchos::RCPNode objects were created but have
> *** not been destroyed yet.  A memory checking tool may complain that these
> *** objects are not destroyed correctly.
> ***
> *** There can be many possible reasons that this might occur including:
> ***
> ***   a) The program called abort() or exit() before main() was finished.
> ***      All of the objects that would have been freed through destructors
> ***      are not freed but some compilers (e.g. GCC) will still call the
> ***      destructors on static objects (which is what causes this message
> ***      to be printed).
> ***
> ***   b) The program is using raw new/delete to manage some objects and
> ***      delete was not called correctly and the objects not deleted hold
> ***      other objects through reference-counted pointers.
> ***
> ***   c) This may be an indication that these objects may be involved in
> ***      a circular dependency of reference-counted managed objects.
> ***
>  
>   0: RCPNode (map_key_void_ptr=0x101353290)
>        Information = {T=Teuchos::SerializationTraits<int, unsigned long>, ConcreteT=Teuchos::SerializationTraits<int, unsigned long>, p=0x101353290, has_ownership=1}
>        RCPNode address = 0x1013533f0
>        insertionNumber = 14
>   1: RCPNode (map_key_void_ptr=0x101353440)
>        Information = {T=Teuchos::SerializationTraits<int, int>, ConcreteT=Teuchos::SerializationTraits<int, int>, p=0x101353440, has_ownership=1}
>        RCPNode address = 0x101354230
>        insertionNumber = 15
>  
> NOTE: To debug issues, open a debugger, and set a break point in the function where
> the RCPNode object is first created to determine the context where the object first
> gets created.  Each RCPNode object is given a unique insertionNumber to allow setting
> breakpoints in the code.  For example, in GDB one can perform:
>  
> 1) Open the debugger (GDB) and run the program again to get updated object addresses
>  
> 2) Set a breakpoint in the RCPNode insertion routine when the desired RCPNode is first
> inserted.  In GDB, to break when the RCPNode with insertionNumber==3 is added, do:
>  
>   (gdb) b 'Teuchos::RCPNodeTracer::addNewRCPNode( [TAB] ' [ENTER]
>   (gdb) cond 1 insertionNumber==3 [ENTER]
>  
> 3) Run the program in the debugger.  In GDB, do:
>  
>   (gdb) run [ENTER]
>  
> 4) Examine the call stack when the program breaks in the function addNewRCPNode(...)
> libc++abi.dylib: terminating with uncaught exception of type std::logic_error: /Users/jatempl/Codes/trilinos-11.8.1-Source/packages/teuchos/core/src/Teuchos_RCPNode.cpp:497:
>  
> Throw number = 1
>  
> Throw test that evaluated to true: !(rcp_node_list())
>  
> Error!
>  
> Program received signal SIGABRT, Aborted.
> 0x00007fff8d734866 in ?? ()
> (gdb) where
> #0  0x00007fff8d734866 in ?? ()
> #1  0x00007fff9294035c in ?? ()
> #2  0x0000000000000000 in ?? ()
> 
> --------------------------------------------------------
> Jeremy A. Templeton, Ph.D.
> Thermal/Fluid Sciences & Engineering
> jatempl at sandia.gov
> http://tiny.sandia.gov/jatempl
> 925-294-1429
>  






