[Trilinos-Users] aztec00 problem

Heroux, Mike MHeroux at csbsju.edu
Mon Jan 20 14:18:01 MST 2014


Matthias,

Do you have a sense of whether or not the data sizes you are using would
result in array indexing that exceeds 2.1 billion?  The kinds of issues you
are seeing would be consistent with trying to address an array using an
integer value that is bigger than what a signed int can handle.
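
To make the concern concrete, here is a minimal sketch in C. It is not
AztecOO code: the function names are made up for illustration, and it
assumes a 32-bit int. It shows how an element count that is perfectly
fine in 64-bit arithmetic can exceed what a signed int holds, and what a
wrapped 32-bit value would look like.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: number of doubles needed for `nvec` vectors of
   `len` doubles each. The product is formed in 64-bit arithmetic, so it
   cannot wrap for int-sized inputs. */
int64_t total_doubles(int nvec, int len)
{
    return (int64_t)nvec * (int64_t)len;
}

/* Returns 1 if `count` can be stored in a signed int, 0 otherwise. */
int fits_in_int(int64_t count)
{
    return count >= 0 && count <= (int64_t)INT_MAX;
}

/* The value a 32-bit two's-complement int would end up holding if the
   same count were truncated to 32 bits. Done with unsigned arithmetic
   to avoid undefined behaviour in this illustration. */
int32_t wrapped_to_int32(int64_t count)
{
    uint32_t low = (uint32_t)((uint64_t)count & 0xFFFFFFFFu);
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

With 5000 vectors of a hypothetical 1,000,000 doubles each,
total_doubles() gives 5,000,000,000, fits_in_int() gives 0, and
wrapped_to_int32() gives 705032704, i.e. a value that looks plausible
but describes a much smaller block.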

The hex values you are printing are very large (more than 140 trillion),
which seems to indicate an incorrect address calculation somewhere.  I
agree that the memory manager should detect the issue, no matter what.
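
As a rough cross-check on the addresses in your gdb output, the sketch
below computes the spacing between v[519] and v[5000]. The helper names
are hypothetical, and it rests on two assumptions: that the v[i] are
evenly spaced, and that you are on x86-64 Linux, where user-space
addresses normally stay below 2^47.

```c
#include <stdint.h>

/* Addresses copied from the gdb output quoted below. */
static const uint64_t ADDR_V519  = 0x7ffff0bf14f0ULL;  /* v[519]  */
static const uint64_t ADDR_V5000 = 0x8001cb466110ULL;  /* v[5000] */

/* Assumed upper bound for user-space addresses (x86-64 Linux). */
static const uint64_t USER_SPACE_LIMIT = 1ULL << 47;

/* Bytes between consecutive entries, assuming even spacing. */
uint64_t byte_stride(uint64_t addr_lo, uint64_t addr_hi,
                     unsigned idx_lo, unsigned idx_hi)
{
    return (addr_hi - addr_lo) / (uint64_t)(idx_hi - idx_lo);
}

/* Remainder of the same division; 0 means the spacing is exact. */
uint64_t byte_stride_remainder(uint64_t addr_lo, uint64_t addr_hi,
                               unsigned idx_lo, unsigned idx_hi)
{
    return (addr_hi - addr_lo) % (uint64_t)(idx_hi - idx_lo);
}
```

For these two addresses the difference divides evenly: 1,776,672 bytes
(222,084 doubles) per vector. 5000 such vectors would need roughly 8.9
billion bytes, well beyond the range of a signed int, and v[5000] lies
above 2^47. That proves nothing by itself, but it fits the overflow
picture.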

Mike

On 1/20/14 8:24 AM, "Matthias Heil" <matthias.heil at manchester.ac.uk> wrote:

>Hi,
>
>   we've come across a possible bug in Trilinos AztecOO.
>The code segfaults when trying to execute the line
>
>    *dst_ptr++ = s;
>
>in
>
>trilinos-11.4.3-Source/packages/epetra/src/Epetra_CrsMatrix.cpp:3327
>
>An attempt to de-reference that pointer (in ddd) shows:
>
>(gdb) print *dst_ptr
>Cannot access memory at address 0x8001cb466110
>
>Moving back through the call stack shows that the
>memory is initially allocated in AZ_manage_memory(...)
>which is called from just under
>
>    trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:239
>
>The problem arises only for large values of kspace, which is related
>to the max number of iterations. We've set this to a rather large
>value of 5000 (we don't usually need that many, but the code
>should hopefully still be able to handle this or fail
>gracefully). Things work ok for smaller values, e.g. kspace=1000.
>
>Following the return from this call, the memory allocated in
>AZ_manage_memory(...) gets distributed into two vectors, hh
>and v, and it's v that contains the illegal memory address.
>Placing a breakpoint in
>
>    trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:248
>
>(just after that loop) and interrogating various values of v yields:
>
>(gdb) print v[5000]
>$1 = (double *) 0x8001cb466110
>
>and, predictably:
>
>(gdb) print *v[5000]
>Cannot access memory at address 0x8001cb466110
>
>whereas
>
>(gdb) print *v[500]
>$4 = 0
>
>is fine.
>
>Trial and error shows that things go wrong beyond entry 518:
>
>(gdb) print *v[519]
>Cannot access memory at address 0x7ffff0bf14f0
>(gdb) print *v[518]
>$8 = 0
>
>Further information:
>
>   -- All code was completely built from source, using gcc
>      without optimisation and with -g.
>
>   -- Based on a (small) sample of machines, the problem only
>      arises on 64-bit machines (not 32-bit)
>
>   -- The problem only arises for sufficiently big problem sizes
>      (though they are still way short of the machines' total
>      available memory). When running on a machine with very
>      little memory, the call to AZ_manage_memory(...) fails
>      gracefully with the "maybe you should try a smaller problem"
>      message.
>
>   -- The problem arises with both serial and parallel installations
>      (i.e. when the code is compiled with and without mpi support)
>      and with different trilinos releases.
>
>   -- The problem is difficult to isolate further since we use
>      trilinos from within our own big library (which provides the
>      preconditioner). Note that our code works fine if we use our
>      own (serial) GMRES solver (or a direct solver).
>
>   Does any of this ring a bell?
>
>      Happy to run further tests here or provide additional
>      diagnostic information.
>
>      Best wishes,
>
>              Matthias
>
>-- 
>---------------------------------------------------------------------------
>Professor Matthias Heil
>
>Alan Turing Building, Room 2.224
>School of Mathematics           Tel. +44 (0)161 275 5808
>University of Manchester        Fax. +44 (0)161 275 5819
>Oxford Road                     email: M.Heil at maths.man.ac.uk
>Manchester M13 9PL              WWW: http://www.maths.man.ac.uk/~mheil/
>U.K.
>
>NEWS:   The beta release of oomph-lib, the object-oriented
>         multi-physics finite-element library is now available
>         as free open-source software at
>
>             http://www.oomph-lib.org
>
>---------------------------------------------------------------------------



