[Trilinos-Users] [EXTERNAL] Re: Matrix Free operator, and loss of accuracy error

Tom Goffrey t.goffrey at exeter.ac.uk
Tue Dec 9 08:02:40 MST 2014


Roger, Mike,

Thanks for all your help on this issue. I think the final email clears up
any remaining questions.

Cheers,

Tom

On Tue, Dec 9, 2014 at 1:51 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:

>  Tom,
>
> For NOX::Direction::Newton, there is a parameter that allows a "failed"
> Newton step to be used.  Typically, your Krylov solver still makes good
> enough progress, and this flag allows the nonlinear system to converge.
> See the parameter "Rescue Bad Newton Solve".  So yes - we are able to
> converge nonlinear systems even with Aztec reporting this warning.
> You do have to be careful in this case - using only a relative residual
> status test for convergence can result in a "false" convergence due to
> stagnation if the linear solver is essentially making no progress.  Make
> sure to use an absolute residual norm as well in your stopping criteria.
>
> Roger
>
>
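A minimal sketch of the combined stopping test recommended above, assuming a plain AND of a relative and an absolute residual check. The function and argument names are hypothetical and are not part of the NOX API.

```python
def converged(res_norm, initial_res_norm, rel_tol, abs_tol):
    """Accept the nonlinear iterate only if BOTH residual tests pass."""
    # Relative test: residual reduced by the requested factor.
    relative_ok = res_norm <= rel_tol * initial_res_norm
    # Absolute test: residual is small in its own right.
    absolute_ok = res_norm <= abs_tol
    return relative_ok and absolute_ok


# The relative test alone would accept this iterate (5.0 <= 1e-2 * 1e3),
# but the absolute test rejects it because the residual is still large.
print(converged(5.0, 1.0e3, rel_tol=1.0e-2, abs_tol=1.0e-6))
# A genuinely small residual passes both tests.
print(converged(1.0e-7, 1.0e3, rel_tol=1.0e-2, abs_tol=1.0e-6))
```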
>
> On 12/09/2014 07:51 AM, Tom Goffrey wrote:
>
>   Hi Mike,
>
> Sorry for the delay; I wanted to be absolutely sure that I had the details
> straight before replying.
>
> I agree that this test is only carried out once the recursive residual
> indicates convergence. The true residual is then calculated, and as you say
> we hit this warning if the difference is too large.
>
>  The trouble we are having is that quite often the difference is
> sufficient that the recursive residual implies convergence but the true
> residual does not, and we get something like:
>         ***************************************************************
>
>         Warning: recursive residual indicates convergence
>         though the true residual is too large.
>
>         Sometimes this occurs when storage is overwritten (e.g. the
>         solution vector was not dimensioned large enough to hold
>         external variables). Other times, this is due to roundoff. In
>         this case, the solution has either converged to the accuracy
>         of the machine or intermediate roundoff errors occurred
>         preventing full convergence. In the latter case, try solving
>         again using the new solution as an initial guess.
>
>         Solver:                 gmres_condnum
>         number of iterations:   11
>
>         Actual residual =  3.3361e-05   Recursive residual =  3.4301e-06
>
>         Calculated Norms                                Requested Norm
>         --------------------------------------------    --------------
>
>         ||r||_2 / ||b||_2:              7.213690e-02    1.000000e-02
>
>         ***************************************************************
>
>
>  In some cases we are able to restart GMRES based on the solution we get
> out, and eventually we get a solution that has converged according to the
> true residual. However, in other cases we seem to get stuck, repeatedly
> hitting this error.
>
>  Do you have any advice in this situation?
>
>  Roger, when you say that by default NOX doesn't worry about this, are you
> basically saying that NOX would treat this situation as an acceptable
> convergence?
>
>  Thanks,
>
> Tom
>
> On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>
>> As I recall, this test is invoked when the recursive residual has reached
>> the threshold specified in params[AZ_tol].
>>
>> Then the true residual is computed. If the absolute value of the
>> difference between the recursive and true residual is greater than
>> params[AZ_tol], this error condition is enabled.
>>
>> You could try increasing your value for AZ_tol.  This would decrease the
>> chance of seeing the error, by not forcing iterations to continue beyond
>> what is reachable and because the difference test would be easier to
>> satisfy.
>>
>> There is some mathematical reasoning behind this test.  Do you know that
>> continued iterations improve the overall solution process?
>>
>> You could, for experimental purposes, modify az_util.c:1413 and make the
>> logic test trivially false.  Then the solver would continue and you could
>> see the impact of continued iterations.
>>
>> There is no “official” way to turn off this test.
>>
>> Mike
>>
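A minimal sketch of the check as Mike recalls it, not the actual AztecOO source. The function name is hypothetical. The true scaled norm and the tolerance come from the warning output earlier in this thread; the recursive scaled norm is an assumption, obtained by scaling the true value by the logged ratio of recursive to actual residual.

```python
def loss_of_accuracy(recursive, true, tol):
    """Return True when the warning condition described above would fire."""
    # The test is only reached once the recursive residual meets the tolerance.
    recursive_converged = recursive <= tol
    # It fires when the two residuals then differ by more than the tolerance.
    return recursive_converged and abs(true - recursive) > tol


true_scaled = 7.213690e-02  # ||r||_2 / ||b||_2 from the log
# Assumed: same scaling applied to the recursive residual.
recursive_scaled = true_scaled * (3.4301e-06 / 3.3361e-05)

print(loss_of_accuracy(recursive_scaled, true_scaled, tol=1.0e-2))  # prints True
# With a looser tolerance the same pair of residuals passes the difference test.
print(loss_of_accuracy(recursive_scaled, true_scaled, tol=1.0e-1))  # prints False
```

The second call only illustrates why a larger AZ_tol makes the difference test easier to satisfy; in a real solve a looser tolerance would also stop the iteration earlier, so the residuals themselves would differ.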
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 2:12 PM
>> To: Michael A Heroux <mheroux at csbsju.edu>
>> Cc: Roger Pawlowski <rppawlo at sandia.gov>,
>> trilinos-users <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] Matrix Free operator, and loss of accuracy
>> error
>>
>> Hi Mike,
>>
>> The silence I don't mind (can always check return status), but presumably
>> Aztec will still detect this error and abort early (i.e. before requested
>> tolerance has been achieved), albeit silently?
>> Is there any way I can ask Aztec to ignore this error and carry on
>> iterating, as I got the impression NOX was capable of?
>>
>> Thanks,
>>
>> Tom
>>
>> On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>> Tom,
>>
>> If AZ_output is set to AZ_none, no output, including this warning
>> message, will be generated.  This might be too much “silence” but it will
>> work.
>>
>> Mike
>>
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 1:38 PM
>> To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>> <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free operator, and loss
>> of accuracy error
>>
>> Hi Roger,
>>
>> Thanks for the input. I think perhaps my initial question was misleading.
>>
>> We rely on Trilinos to solve the linear system only; the non-linear
>> portion is handled by our own code. We use NOX to provide the operator
>> (interfaced with our own code for residual evaluations etc.), but don't use
>> NOX as the non-linear solver itself.
>>
>> I've experimented quite a lot with scaling, and whilst I've found that the
>> frequency of this error does depend on what I use, I've never been able to
>> eradicate it. Do you know of any way I can convince Aztec not to perform
>> this check, as you suggest the NOX solvers do?
>>
>> Many thanks,
>>
>> Tom
>>
>>
>>
>>  On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski <rppawlo at sandia.gov>
>> wrote:
>> Hi Tom,
>>
>> This is a known issue.  JFNK uses a directional derivative for the
>> Jacobian-vector product, and Aztec is detecting the loss of accuracy from
>> the finite differencing.  It could be indicative of a poorly scaled system
>> of equations and/or linear solve tolerances in Aztec that are too tight.
>> You can switch to a higher-order derivative in the JFNK algorithm or loosen
>> the linear solve tolerances.  If it is poorly scaled, you can enable row
>> sum scaling of the linear system in the NOX group for the linear solves.
>> We tend to ignore the warning, as it was an overly stringent check that, if
>> memory serves, assumed you had an analytic Jacobian.  By default, NOX with
>> JFNK should ignore this warning.  Please send your output so I can check to
>> make sure that is happening in your case. Note that if you switch to Belos
>> instead of Aztec (preferred), this warning will not occur.
>>
>> Best,
>> Roger
>>
>>
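A minimal sketch of the accuracy difference between a first-order (forward) and a second-order (central) finite-difference Jacobian-vector product, which is the kind of "higher-order derivative" switch suggested above. The residual function F, the point x, the direction v and the fixed perturbation eps are all hypothetical choices for illustration; this is not how NOX selects its perturbation.

```python
import math


def F(x):
    # Hypothetical two-equation non-linear residual, for illustration only.
    return [x[0] ** 2 + x[1], math.sin(x[0]) * x[1]]


def jv_exact(x, v):
    # Analytic Jacobian-vector product of F at x, used as the reference.
    return [2.0 * x[0] * v[0] + v[1],
            math.cos(x[0]) * x[1] * v[0] + math.sin(x[0]) * v[1]]


def jv_forward(x, v, eps):
    # First-order accurate: (F(x + eps v) - F(x)) / eps
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    return [(fp - f0) / eps for fp, f0 in zip(F(xp), F(x))]


def jv_central(x, v, eps):
    # Second-order accurate: (F(x + eps v) - F(x - eps v)) / (2 eps)
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(fp - fm) / (2.0 * eps) for fp, fm in zip(F(xp), F(xm))]


def max_error(approx, exact):
    return max(abs(a - e) for a, e in zip(approx, exact))


x, v, eps = [1.0, 2.0], [1.0, 1.0], 1.0e-4
err_forward = max_error(jv_forward(x, v, eps), jv_exact(x, v))
err_central = max_error(jv_central(x, v, eps), jv_exact(x, v))
print(err_forward, err_central)
```

The central difference costs one extra residual evaluation per product, so whether the smaller truncation error is worth it depends on how expensive the residual is.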
>>
>> On 12/08/2014 12:04 PM, Tom Goffrey wrote:
>> Hello,
>>
>> We currently utilise AztecOO GMRES (with condition number estimate) to
>> solve the linear system resulting from the time-implicit solution of
>> multi-dimensional hydrodynamics. However, we have run into problems when
>> adapting our code to apply a Jacobian-free approach.
>>
>> We replaced our previous explicit Jacobian operator with the matrix free
>> operator from the NOX package.
>> (
>> http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html
>> )
>>
>> However we find a high proportion of our GMRES solves abort prematurely
>> due to an error relating to an apparent loss of accuracy:
>>
>>         Warning: recursive residual indicates convergence
>>         though the true residual is too large.
>>
>>         Sometimes this occurs when storage is overwritten (e.g. the
>>         solution vector was not dimensioned large enough to hold
>>         external variables). Other times, this is due to roundoff. In
>>         this case, the solution has either converged to the accuracy
>>         of the machine or intermediate roundoff errors occurred
>>         preventing full convergence. In the latter case, try solving
>>         again using the new solution as an initial guess.
>>
>> I didn't manage to find much within Trilinos documentation on this, but
>> from looking through the source code it appears this is triggered by a
>> significant difference between the recursive residual and the true
>> residual, where the true residual is calculated for the solution vector,
>> but the recursive residual is calculated by using an incremental solution
>> vector.
>>
>> Currently we believe the high frequency of this error we see for the JFNK
>> approach is because we have in fact used a non-linear operator (in this
>> case our non-linear residual function) to approximate the linear
>> Jacobian-vector product. As such it is not clear to us that this sort of
>> error checking makes sense in this situation, or perhaps it should be
>> interpreted as a sign the Jacobian free approximation is becoming worse?
>>
>> Presumably I can control this error checking by allowing the two
>> residuals to vary by a greater amount, but I was wondering if others have
>> had similar problems, and had any advice on the issue?
>>
>> Of course, any alternative explanation of the high frequency would be
>> equally appreciated!
>>
>> Thanks,
>>
>> Tom Goffrey
>>
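A minimal sketch of the point raised above, that a finite-difference operator built from a non-linear residual is not itself linear in the vector it is applied to. The residual F, the point x and the fixed perturbation eps are hypothetical; a real matrix-free operator may choose its perturbation differently.

```python
def F(x):
    # Hypothetical non-linear residual, for illustration only.
    return [x[0] ** 2, x[0] * x[1]]


def fd_operator(x, v, eps):
    # Forward-difference approximation of J(x) v with a fixed perturbation eps.
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    return [(fp - f0) / eps for fp, f0 in zip(F(xp), F(x))]


x, eps = [1.0, 2.0], 1.0e-3
u, w = [1.0, 0.0], [1.0, 1.0]

a_sum = fd_operator(x, [ui + wi for ui, wi in zip(u, w)], eps)
a_u = fd_operator(x, u, eps)
a_w = fd_operator(x, w, eps)

# A linear operator would give A(u + w) - (A(u) + A(w)) = 0 exactly.
defect = [s - (p + q) for s, p, q in zip(a_sum, a_u, a_w)]
print(defect)
```

For this F the defect is of order eps, so it shrinks as the perturbation is reduced, at the price of more cancellation error in the difference.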
>>
>>
>>
>>
>>
>>
>
>