[Trilinos-Users] ML hangs

Jonas Thies J.Thies at rug.nl
Mon Oct 13 03:00:19 MDT 2008


Hi,

Somehow, the problem has been resolved. I changed several things:

- added -DREDUCE_SCATTER_BUG when compiling Epetra_MpiDistributor.cpp
- the xlC compiler was updated to 10.1 in the meantime
- disabled the -qhot (high-order transformations) flag, as it seems to be
buggy in the new compiler (maybe also in the old one, but less obviously)
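
A note on the first item: the REDUCE_SCATTER_BUG define appears to make
Epetra_MpiDistributor compute the per-process receive counts with a
reduce-to-root followed by a scatter, instead of a single MPI_Reduce_scatter
call. That reading of the Epetra source is an assumption and is not confirmed
in this thread. The serial sketch below contains no MPI calls; the function
names and the msg_count layout are hypothetical and only illustrate why the
two variants should produce the same counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// msg_count[s][r] == 1 if (hypothetical) rank s sends a message to rank r.

// Serial stand-in for a reduce-scatter with one result entry per rank:
// rank r ends up with the sum over all senders s of msg_count[s][r].
std::vector<int> reduce_scatter_counts(
    const std::vector<std::vector<int>>& msg_count) {
  const std::size_t nprocs = msg_count.size();
  std::vector<int> nrecvs(nprocs, 0);
  for (std::size_t r = 0; r < nprocs; ++r)
    for (std::size_t s = 0; s < nprocs; ++s)
      nrecvs[r] += msg_count[s][r];
  return nrecvs;
}

// Serial stand-in for the two-step variant: first sum every sender's
// row on a root, then hand entry r of the summed row to rank r.
std::vector<int> reduce_then_scatter_counts(
    const std::vector<std::vector<int>>& msg_count) {
  const std::size_t nprocs = msg_count.size();
  std::vector<int> summed_on_root(nprocs, 0);
  for (std::size_t s = 0; s < nprocs; ++s)      // "reduce" step
    for (std::size_t r = 0; r < nprocs; ++r)
      summed_on_root[r] += msg_count[s][r];
  std::vector<int> nrecvs(nprocs, 0);
  for (std::size_t r = 0; r < nprocs; ++r)      // "scatter" step
    nrecvs[r] = summed_on_root[r];
  return nrecvs;
}
```

If a rank ended up with a receive count that does not match the sends
actually posted, it would wait for a message that never arrives, which would
fit the hang in DoWaits described below. Whether that is what happened here
remains unverified.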

I'll probably never find out which of these was the actual cause, but I
don't think the problem was in ML after all.

Cheers,
Jonas

Jonas Thies wrote:
> Hi Ray,
> 
> I managed to produce a hang with a 3-level hierarchy where I use
> "symmetric Gauss-Seidel" (from Ifpack) on level 0, "do-nothing" on level
> 1 and Amesos-KLU on level 2. I also got a hang with Aztec's ILUT
> smoother recently.
> What all these cases have in common is that some process waits for a
> message that is never posted, in the function
> 
>       Epetra_MpiDistributor::DoWaits
> 
> whereupon the remaining processes get stuck at whatever MPI call comes
> next, I guess.
> 
> regards,
> Jonas
> 
> Ray Tuminaro wrote:
>> Jonas,
>>    Could you try switching to a symmetric Gauss-Seidel smoother to see
>> if things still hang?
>> -Ray
>>
>>
>> Jonas Thies wrote:
>>> Hi,
>>>
>>> I have a rather technical problem with ML.
>>> When running in parallel, from time to time the program
>>> hangs during an ML-preconditioned Aztec solve. I attached
>>> a debugger and found the processes are MPI_Waiting in one of these
>>> three functions:
>>>
>>> * Epetra_MpiDistributor::DoWaits
>>>
>>> * Epetra_MpiComm::SumAll
>>>
>>> * ML_Comm_CheapWait
>>>
>>> I'm using a full MGV cycle; my smoother is a non-overlapping
>>> Ifpack_AdditiveSchwarz with a custom ILU on each subdomain. I'm currently
>>> using Trilinos 8.0.5 on 16 IBM Power6 CPUs; the compiler is IBM xlC_r
>>> 9.0 with POE MPI.
>>>
>>> I guess that's pretty vague...
>>> I'm not very good with TotalView; maybe someone has some hints for me?
>>> Or is it a known issue and I just have to update to a more recent
>>> Trilinos version? Furthermore, I do not trust the machine and its
>>> software :)
>>>
>>> Thanks,
>>> Jonas
>>>
>>>
>>> _______________________________________________
>>> Trilinos-Users mailing list
>>> Trilinos-Users at software.sandia.gov
>>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>>   


