[Trilinos-Users] aztecoo problem

Matthias Heil matthias.heil at manchester.ac.uk
Mon Jan 20 07:24:04 MST 2014


Hi,

   we've come across a possible bug in Trilinos AztecOO.
The code segfaults when trying to execute the line

    *dst_ptr++ = s;

in

trilinos-11.4.3-Source/packages/epetra/src/Epetra_CrsMatrix.cpp:3327

Attempting to dereference that pointer (in ddd) shows:

(gdb) print *dst_ptr
Cannot access memory at address 0x8001cb466110

Moving back through the call stack shows that the
memory is initially allocated in AZ_manage_memory(...),
which is called from just below

    trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:239

The problem arises only for large values of kspace, which is related
to the maximum number of iterations. We've set this to a rather large
value of 5000. (We don't usually need that many, but the code
should still be able to handle this or fail gracefully.) Things work
OK for smaller values, e.g. kspace=1000.

Following the return from this call, the memory allocated in
AZ_manage_memory(...) gets distributed into two vectors, hh
and v, and it's v that contains the illegal memory address.
Placing a breakpoint in

    trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:248

(just after that loop) and interrogating various values of v yields:

(gdb) print v[5000]
$1 = (double *) 0x8001cb466110

and, predictably:

(gdb) print *v[5000]
Cannot access memory at address 0x8001cb466110

whereas

(gdb) print *v[500]
$4 = 0

is fine.

Trial and error shows that things go wrong beyond entry 518:

(gdb) print *v[519]
Cannot access memory at address 0x7ffff0bf14f0
(gdb) print *v[518]
$8 = 0
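
Rather than probing entries by trial and error, the first bad entry
could be located with a range check of the following kind. This is
only a sketch: carve_rows and first_row_out_of_range are hypothetical
helpers, not AztecOO functions, and the sketch assumes the vectors are
laid out back to back in one contiguous block whose true length is
known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not AztecOO code: point rows[i] at the i-th
   vector of length n inside one contiguous block of doubles.
   The offset i * n is computed in size_t. */
void carve_rows(double *block, double **rows, size_t n_rows, size_t n)
{
    assert(block != NULL && rows != NULL);
    for (size_t i = 0; i < n_rows; ++i)
        rows[i] = block + i * n;
}

/* Return the index of the first row that does not lie entirely inside
   [block, block + n_doubles), or n_rows if every row is in range. */
size_t first_row_out_of_range(double *const *rows, size_t n_rows, size_t n,
                              const double *block, size_t n_doubles)
{
    uintptr_t lo = (uintptr_t)block;
    uintptr_t hi = lo + n_doubles * sizeof(double);
    for (size_t i = 0; i < n_rows; ++i) {
        uintptr_t p = (uintptr_t)rows[i];
        /* Row i must start inside the block and leave room for n doubles. */
        if (p < lo || p > hi || hi - p < n * sizeof(double))
            return i;
    }
    return n_rows;
}
```

Called with the number of doubles that were actually allocated, the
second function returns the index of the first vector that extends
past the end of the block.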

Further information:

   -- All code was completely built from source, using gcc
      without optimisation and with -g.

   -- Based on a (small) sample of machines, the problem only
      arises on 64-bit machines (not on 32-bit ones).

   -- The problem only arises for sufficiently big problem sizes
      (though they are still way short of the machines' total
      available memory). When running on a machine with very
      little memory, the call to AZ_manage_memory(...) fails
      gracefully with the "maybe you should try a smaller problem"
      message.

   -- The problem arises with both serial and parallel installations
      (i.e. when the code is compiled with and without MPI support)
      and with different Trilinos releases.

   -- The problem is difficult to isolate further since we use
      Trilinos from within our own big library (which provides the
      preconditioner). Note that our code works fine if we use our
      own (serial) GMRES solver (or a direct solver).
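
Since the failure depends on both kspace and the problem size, one
thing worth ruling out is that the requested byte count exceeds what
a 32-bit integer type can hold. Whether the size passes through such
a type anywhere in AztecOO is an assumption, not something established
here. The sketch below uses hypothetical helper names and assumes the
Krylov storage is (kspace + 1) vectors of n_rows doubles:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for (kspace + 1) vectors of n_rows doubles, computed in
   64-bit arithmetic. Hypothetical helper, not part of AztecOO. */
uint64_t krylov_bytes(uint64_t kspace, uint64_t n_rows)
{
    return (kspace + 1u) * n_rows * (uint64_t)sizeof(double);
}

/* Nonzero if the byte count cannot be represented in an unsigned int,
   i.e. a size passed through such a type would be truncated. */
int exceeds_unsigned_int(uint64_t bytes)
{
    return bytes > (uint64_t)UINT_MAX;
}

/* Largest kspace whose storage still fits in an unsigned int.
   Requires n_rows > 0; returns 0 if not even two vectors fit. */
uint64_t max_kspace_for_unsigned_int(uint64_t n_rows)
{
    uint64_t per_vector = n_rows * (uint64_t)sizeof(double);
    uint64_t vectors = (uint64_t)UINT_MAX / per_vector;
    return vectors > 0 ? vectors - 1 : 0;
}
```

For an illustrative n_rows of 200000 (a made-up figure, not our
actual problem size) and a 32-bit unsigned int, kspace=1000 fits but
kspace=5000 does not. Comparing the result of the last function for
the real problem size with the observed index 518 would show whether
this line of enquiry is relevant.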

   Does any of this ring a bell?

   Happy to run further tests here or provide additional diagnostic
information.

      Best wishes,

              Matthias

-- 
---------------------------------------------------------------------------
Professor Matthias Heil

Alan Turing Building, Room 2.224
School of Mathematics           Tel. +44 (0)161 275 5808
University of Manchester        Fax. +44 (0)161 275 5819
Oxford Road                     email: M.Heil at maths.man.ac.uk
Manchester M13 9PL              WWW: http://www.maths.man.ac.uk/~mheil/
U.K.

NEWS:   The beta release of oomph-lib, the object-oriented
         multi-physics finite-element library is now available
         as free open-source software at

             http://www.oomph-lib.org

---------------------------------------------------------------------------


