[Trilinos-Users] aztec00 problem
Matthias Heil
matthias.heil at manchester.ac.uk
Mon Jan 20 07:24:04 MST 2014
Hi,
we've come across a possible bug in Trilinos AztecOO.
The code seg faults when trying to execute the line
*dst_ptr++ = s;
in
trilinos-11.4.3-Source/packages/epetra/src/Epetra_CrsMatrix.cpp:3327
An attempt to de-reference that pointer (in ddd) shows:
(gdb) print *dst_ptr
Cannot access memory at address 0x8001cb466110
Moving back through the call stack shows that the
memory is initially allocated in AZ_manage_memory(...)
which is called from just under
trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:239
The problem arises only for large values of kspace, which is related
to the max. number of iterations. We've set this to a rather large
value of 5000. We don't usually need that many, BUT the code
should hopefully still be able to handle this or fail
gracefully. Things work ok for smaller values, e.g. kspace=1000.
Following the return from this call, the memory allocated in
AZ_manage_memory(...) gets distributed into two vectors, hh
and v, and it's v that contains the illegal memory address.
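To make the layout concrete, here is a minimal sketch of how one block of
doubles might be carved into an array of vector pointers like v. This is
NOT the actual AztecOO code: the function name carve_vectors, the
parameter stride and the use of size_t offsets are all assumptions made
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout (not taken from az_gmres.c): `block` holds
   (kspace + 1) vectors of `stride` doubles each, and v[k] points at
   the first entry of vector k. The caller frees the returned array. */
double **carve_vectors(double *block, size_t kspace, size_t stride)
{
    assert(block != NULL);

    double **v = malloc((kspace + 1) * sizeof *v);
    if (v == NULL)
        return NULL;

    for (size_t k = 0; k <= kspace; k++)
        v[k] = block + k * stride;   /* offset computed in size_t, not int */

    return v;
}
```

With a layout like this, v[k] is only valid if the block really contains
(kspace + 1) * stride doubles, so any shortfall in the allocation shows
up as bad pointers at the high end of v.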
Placing a breakpoint in
trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:248
(just after the loop that distributes the memory) and inspecting
various entries of v yields:
(gdb) print v[5000]
$1 = (double *) 0x8001cb466110
and, predictably:
(gdb) print *v[5000]
Cannot access memory at address 0x8001cb466110
whereas
(gdb) print *v[500]
$4 = 0
is fine.
Trial and error shows that things go wrong beyond entry 518:
(gdb) print *v[519]
Cannot access memory at address 0x7ffff0bf14f0
(gdb) print *v[518]
$8 = 0
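The trial and error above could be automated along the following lines.
This is only a sketch: first_bad_index is a hypothetical helper, and it
assumes that the start of the allocated block (base), its size in
doubles (total_doubles) and the length of each vector (vec_len) are
known at the breakpoint, which we have not extracted from AztecOO here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first index i in [0, count) for which v[i] does not leave
   room for vec_len doubles inside [base, base + total_doubles),
   or -1 if every pointer is in range. Addresses are compared as
   integers so that out-of-block pointers can be tested safely. */
long first_bad_index(double *const *v, long count,
                     const double *base, size_t total_doubles,
                     size_t vec_len)
{
    assert(v != NULL && base != NULL);

    uintptr_t lo = (uintptr_t)base;
    uintptr_t hi = lo + total_doubles * sizeof(double);

    for (long i = 0; i < count; i++) {
        uintptr_t p = (uintptr_t)v[i];
        if (p < lo || p > hi || hi - p < vec_len * sizeof(double))
            return i;
    }
    return -1;
}
```

Called with count = kspace + 1 just after the pointers are set up, this
would report the first unusable entry directly instead of probing
individual entries in the debugger.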
Further information:
-- All code was completely built from source, using gcc
without optimisation and with -g.
-- Based on a (small) sample of machines, the problem only
arises on 64-bit machines (not on 32-bit ones).
-- The problem only arises for sufficiently big problem sizes
(though they are still way short of the machines' total
available memory). When running on a machine with very
little memory, the call to AZ_manage_memory(...) fails
gracefully with the "maybe you should try a smaller problem"
message.
-- The problem arises with both serial and parallel installations
(i.e. when the code is compiled with and without MPI support)
and with different Trilinos releases.
-- The problem is difficult to isolate further since we use
trilinos from within our own big library (which provides the
preconditioner). Note that our code works fine if we use our
own (serial) GMRES solver (or a direct solver).
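One check that could narrow this down: the 64-bit-only behaviour for
large kspace would be consistent with a byte count that no longer fits
in a signed 32-bit int somewhere along the way. This is an unverified
guess, not something we have confirmed in the AztecOO source. The sketch
below uses hypothetical helper names and assumes the workspace holds
(kspace + 1) vectors of n doubles, ignoring any padding.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for (kspace + 1) vectors of n doubles, computed in
   64 bits (assumed layout, ignoring padding). */
uint64_t krylov_bytes(uint64_t kspace, uint64_t n)
{
    assert(n > 0);
    return (kspace + 1) * n * (uint64_t)sizeof(double);
}

/* 1 if the byte count fits in a signed 32-bit int, 0 otherwise. */
int fits_in_int32(uint64_t bytes)
{
    return bytes <= (uint64_t)INT32_MAX;
}

/* Largest number of vectors of n doubles whose total size still
   fits in INT32_MAX bytes. */
uint64_t max_vectors_int32(uint64_t n)
{
    assert(n > 0);
    return (uint64_t)INT32_MAX / (n * (uint64_t)sizeof(double));
}
```

For an illustrative n = 100000 (not our actual problem size),
kspace = 1000 needs about 0.8 GB and fits, whereas kspace = 5000 needs
about 4 GB and does not. Comparing max_vectors_int32(n) for the real n
against the index at which v goes bad would either support or rule out
this guess.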
Does any of this ring a bell?
Happy to run further tests here or provide additional diagnostic
information.
Best wishes,
Matthias
--
---------------------------------------------------------------------------
Professor Matthias Heil
Alan Turing Building, Room 2.224
School of Mathematics Tel. +44 (0)161 275 5808
University of Manchester Fax. +44 (0)161 275 5819
Oxford Road email: M.Heil at maths.man.ac.uk
Manchester M13 9PL WWW: http://www.maths.man.ac.uk/~mheil/
U.K.
NEWS: The beta release of oomph-lib, the object-oriented
multi-physics finite-element library is now available
as free open-source software at
http://www.oomph-lib.org
---------------------------------------------------------------------------