[Trilinos-Users] ML hangs

Jonathan Hu jhu at sandia.gov
Wed Oct 1 12:01:54 MDT 2008


Hi Jonas,

    Could you attach a more detailed stack trace that includes the calls 
above ML_Comm_CheapWait()?  Also, could you rerun your program with the 
ML output set to 10, i.e., mlList.set("ML output", 10), and send us the 
result?
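
[For reference, the verbose-output setting above, together with the 
per-level smoother choices Jonas describes in the quoted message, would 
look roughly like this in a driver. This is a sketch only: the list name 
mlList, the matrix A, and the exact option strings are assumptions based 
on this thread and the ML defaults, not code taken from Jonas's program.]

```cpp
// Sketch: assumes Trilinos (ML + Teuchos) headers and an
// Epetra_RowMatrix A supplied by the application.
#include "Teuchos_ParameterList.hpp"
#include "ml_MultiLevelPreconditioner.h"

Teuchos::ParameterList mlList;
ML_Epetra::SetDefaults("SA", mlList);        // start from smoothed-aggregation defaults
mlList.set("ML output", 10);                 // verbosity 10, as requested above
mlList.set("max levels", 3);
// Per-level smoothers, as in the hang report quoted below:
mlList.set("smoother: type (level 0)", "symmetric Gauss-Seidel");
mlList.set("smoother: type (level 1)", "do-nothing");
mlList.set("coarse: type", "Amesos-KLU");    // direct solve on the coarsest level

ML_Epetra::MultiLevelPreconditioner mlPrec(A, mlList, true);
```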

Thanks,
Jonathan

trilinos-users-request at software.sandia.gov wrote:
> Today's Topics:
>
>    1. Re: ML hangs (Jonas Thies)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 25 Sep 2008 10:29:15 +0200
> From: "Jonas Thies" <J.Thies at rug.nl>
> Subject: Re: [Trilinos-Users] ML hangs
> To: "Ray Tuminaro" <rstumin at sandia.gov>
> Cc: "trilinos-users at software.sandia.gov"
>         <trilinos-users at software.sandia.gov>
> Message-ID: <48DB4BDB.90109 at rug.nl>
> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>
> Hi Ray,
>
> I managed to produce a hang with a 3-level hierarchy where I use
> "symmetric Gauss-Seidel" (from Ifpack) on level 0, "do-nothing" on level
> 1 and Amesos-KLU on level 2. I also got a hang with Aztec's ILUT
> smoother recently.
> What they all have in common is that one process waits for a message
> that was never posted in the function
>
>       Epetra_MpiDistributor::DoWaits
>
> whereupon the rest get stuck at whatever MPI call comes next, I guess.
>
> regards,
> Jonas
>
> Ray Tuminaro wrote:
>   
>> Jonas,
>>    Could you try switching to a symmetric Gauss-Seidel smoother to see
>> if things
>> still hang?
>> -Ray
>>
>>
>> Jonas Thies wrote:
>>     
>>> Hi,
>>>
>>> I have a rather technical problem with ML.
>>> When running in parallel, from time to time the program
>>> hangs during an ML preconditioned Aztec solve. I attached
>>> a debugger and found the processes are MPI_Waiting in either of these
>>> three functions:
>>>
>>> * Epetra_MpiDistributor::DoWaits
>>>
>>> * Epetra_MpiComm::SumAll
>>>
>>> * ML_Comm_CheapWait
>>>
>>> I'm using a full MGV cycle; my smoother is a non-overlapping
>>> Ifpack_AdditiveSchwarz with custom ILU on each subdomain. I'm currently
>>> using Trilinos 8.0.5 on 16 IBM Power6 CPUs; the compiler is IBM xlC_r
>>> 9.0 with POE MPI.
>>>
>>> I guess that's pretty vague...
>>> I'm not very good with TotalView; maybe someone has some hints for me?
>>> Or is it a known issue and I just have to update to a more recent
>>> Trilinos version? Furthermore, I do not trust the machine and its
>>> software :)
>>>
>>> Thanks,
>>> Jonas
>>>
>>>       


-- 
Jonathan J. Hu, mailto:jhu at sandia.gov
Postal address: Sandia National Laboratories
                Mailstop 9159
                PO Box 969, Livermore, CA 94551-0969
Tel / Fax (925) 294-2931 
