[Trilinos-Users] Crash in AZ_gdot_vec
David Neckels
dneckels at ucar.edu
Thu Jan 31 16:41:05 MST 2008
Hi,
This may be a long shot, but maybe someone has seen this before:
My program appears to be crashing in AZ_gdot_vec during a linear solve
using AztecOO:
MPI_Wait(139): MPI_Wait(request=0xfff6558, status0xfff6510) failed
(unknown)(): Message truncated
AZ_gdot_vec: ERROR on node 156
md_wait failed, message type = 1234
I am working on an IBM Blue Gene/L with IBM's version of MPICH.
Just before the crash, their (modified) MPICH prints:
'Truncating! 8 instead of <some number>'
According to IBM, this means a send and a receive are mismatched, and the
truncation is MPICH's attempt to cope with the send/recv size mismatch.
I have bracketed the Solver.Iterate call with MPI_Barrier checkpoints, so I
know the crash happens inside Solver.Iterate.
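The barrier checkpointing described above can be sketched as a small helper. This is a minimal sketch, not code from the original program: `checkpoint` and the `USE_MPI` build macro are hypothetical names, and without `USE_MPI` the helper degrades to a plain serial print so it can be compiled anywhere. A barrier only proves that every rank reached the same point; it does not say which rank later fails.

```c
#include <stdio.h>
#ifdef USE_MPI /* hypothetical build macro: define when compiling with mpicc */
#include <mpi.h>
#endif

/* Hypothetical helper: wait until all ranks arrive, then let rank 0 report
 * the label. Returns the calling rank (always 0 in a serial build). */
int checkpoint(const char *label)
{
    int rank = 0;
#ifdef USE_MPI
    MPI_Barrier(MPI_COMM_WORLD);          /* every rank must reach this call */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#endif
    if (rank == 0) {
        printf("checkpoint: %s\n", label);
        fflush(stdout);                   /* flush so the line survives a crash */
    }
    return rank;
}
```

Calling `checkpoint("before Solver.Iterate")` and `checkpoint("after Solver.Iterate")` around the solve gives the same bracketing as described in the post.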
The crash is also non-deterministic.
Has anyone seen this? I was wondering whether I could replace
AZ_gdot_vec with a (perhaps less efficient) MPI_Allreduce, or does it
do something different?
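A replacement along these lines might look like the sketch below. It assumes AZ_gdot_vec's job is to sum a short vector of partial results (for example, several local dot products) element-wise across all processors, leaving the totals on every processor. That is an assumption to check against the Aztec source. `local_dot`, `global_sum_vec`, and the `USE_MPI` macro are hypothetical names; without `USE_MPI` the reduction is a no-op, as in a one-process run.

```c
#ifdef USE_MPI /* hypothetical build macro: define when compiling with mpicc */
#include <mpi.h>
#endif

/* Partial dot product over the n_local entries owned by this rank. */
double local_dot(int n_local, const double *x, const double *y)
{
    double s = 0.0;
    for (int i = 0; i < n_local; ++i)
        s += x[i] * y[i];
    return s;
}

/* Sum n values element-wise across all ranks; the totals overwrite vals on
 * every rank. work must have room for n doubles. */
void global_sum_vec(int n, double vals[], double work[])
{
#ifdef USE_MPI
    /* Collective call: every rank must call it with the same n. */
    MPI_Allreduce(vals, work, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < n; ++i)
        vals[i] = work[i];
#else
    /* Serial build: the local values already are the global sums. */
    (void)n;
    (void)vals;
    (void)work;
#endif
}
```

Because MPI_Allreduce is a single collective with no user-visible send/recv pairing, it sidesteps hand-matched message sizes. It still requires every rank to take part with the same count, since mismatched counts across ranks are erroneous in MPI.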
This crash only happens on this particular machine, so the modified
MPICH may well be causing the trouble...
Again, this may be a long shot, but thanks in advance.
-David N.