[Trilinos-Users] Crash in AZ_gdot_vec

David Neckels dneckels at ucar.edu
Thu Jan 31 16:41:05 MST 2008


Hi,

This may be a long shot, but maybe someone has seen this before:

My program appears to be crashing in AZ_gdot_vec during a linear solve 
using AztecOO:

MPI_Wait(139): MPI_Wait(request=0xfff6558, status0xfff6510) failed
(unknown)(): Message truncatedAZ_gdot_vec: ERROR on node 156
md_wait failed, message type = 1234

I am working on an IBM Blue Gene/L, using IBM's version of MPICH.
Just before dying, their (modified) version of MPICH prints the message:
'Truncating! 8 instead of <some number>'

According to IBM, this means a send and its matching recv disagree on the 
message size, and the truncation is an attempt to recover from that mismatch.

I have bracketed the Solver.Iterate call with MPI_Barrier calls, so I know 
the crash is happening within Solver.Iterate.
The crash is also non-deterministic.

Has anyone seen this?  I was wondering if I could replace 
AZ_gdot_vec with a (perhaps less efficient) MPI_Allreduce, or does it do 
something different?
This crash only happens on this particular computer, so it may very well 
be the modified version of MPICH causing the trouble....
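
On the MPI_Allreduce question, here is a minimal sketch in C of what such a 
replacement might look like. It assumes that AZ_gdot_vec amounts to an 
element-wise global sum of a short array of doubles (worth confirming against 
the AztecOO source) and that the solve runs on MPI_COMM_WORLD. The names 
global_sum_vec and USE_MPI are hypothetical and not part of AztecOO. Without 
USE_MPI defined, the routine falls back to a serial copy so it can be compiled 
and checked on a single process.

```c
#ifdef USE_MPI          /* hypothetical build flag, not an AztecOO macro */
#include <mpi.h>
#endif

/*
 * Hypothetical stand-in for a hand-rolled global vector sum.
 * On return, vals[i] holds the sum of vals[i] over all ranks.
 * scratch must have room for n doubles.
 */
void global_sum_vec(int n, double vals[], double scratch[])
{
    int i;

#ifdef USE_MPI
    /* Assumption: the solver communicates on MPI_COMM_WORLD. */
    MPI_Allreduce(vals, scratch, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
#else
    /* Serial fallback: with one process the global sum is the local value. */
    for (i = 0; i < n; i++)
        scratch[i] = vals[i];
#endif

    /* Copy the reduced values back into the caller's array. */
    for (i = 0; i < n; i++)
        vals[i] = scratch[i];
}
```

Because MPI_Allreduce is a collective, every rank has to call it with the 
same n, which would sidestep the point-to-point send/recv size matching that 
the truncation message complains about.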

Again, this may be a long shot, but thanks in advance.

-David N.
