[Trilinos-Users] [EXTERNAL] Re: Matrix Free operator, and loss of accuracy error
Tom Goffrey
t.goffrey at exeter.ac.uk
Tue Dec 9 08:02:40 MST 2014
Roger, Mike,
Thanks for all your help on this issue. I think the final email clears up
any remaining questions.
Cheers,
Tom
On Tue, Dec 9, 2014 at 1:51 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
> Tom,
>
> For the NOX::Direction::Newton, there is a parameter that you can use to
> allow for a "failed" Newton step to be used. Typically, your Krylov solver
> makes good enough progress and this flag allows for the nonlinear system to
> converge. See the parameter "Rescue Bad Newton Solve". So yes - we are
> able to converge nonlinear systems even with Aztec reporting this warning.
> You do have to be careful in this case - using only a relative residual
> status test for convergence can result in a "false" convergence due to
> stagnation if the linear solver is essentially making no progress. Make
> sure to use an absolute residual norm as well in your stopping criteria.
>
> Roger
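
The advice above about pairing a relative test with an absolute residual norm can be sketched as follows. This is a minimal illustration in plain Python, not NOX's status-test classes; the names `norm2` and `converged` and all tolerance values are hypothetical. It shows a case where a relative-only test reports convergence while the absolute residual is still large.

```python
import math

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def converged(f_now, f_initial, rel_tol, abs_tol):
    """Hypothetical combined test: BOTH the relative and the absolute
    residual criteria must hold before convergence is declared."""
    f_norm = norm2(f_now)
    rel_ok = f_norm <= rel_tol * norm2(f_initial)
    abs_ok = f_norm <= abs_tol
    return rel_ok and abs_ok

# Assumed example data: a very large initial residual makes the
# relative test easy to satisfy even though ||F|| is still O(10).
f0 = [1.0e6, -2.0e6]
fk = [1.0e1, -2.0e1]

rel_only = norm2(fk) <= 1.0e-4 * norm2(f0)            # relative test alone
both = converged(fk, f0, rel_tol=1.0e-4, abs_tol=1.0e-8)
print(rel_only, both)
```

Requiring both criteria means a step is accepted only once the residual is small in absolute terms as well as relative to the starting point.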
>
>
>
> On 12/09/2014 07:51 AM, Tom Goffrey wrote:
>
> Hi Mike,
>
> Sorry for the delay, wanted to be absolutely sure that I had the details
> straight before replying.
>
> I agree that this test is only carried out once the recursive residual
> indicates convergence. The true residual is then calculated, and as you say
> we hit this warning if the difference is too large.
>
> The trouble we are having is that quite often the difference is
> sufficient that the recursive residual implies convergence but the true
> residual does not, and we get something like:
> ***************************************************************
>
> Warning: recursive residual indicates convergence
> though the true residual is too large.
>
> Sometimes this occurs when storage is overwritten (e.g. the
> solution vector was not dimensioned large enough to hold
> external variables). Other times, this is due to roundoff. In
> this case, the solution has either converged to the accuracy
> of the machine or intermediate roundoff errors occurred
> preventing full convergence. In the latter case, try solving
> again using the new solution as an initial guess.
>
> Solver: gmres_condnum
> number of iterations: 11
>
> Actual residual = 3.3361e-05 Recursive residual = 3.4301e-06
>
>                Calculated Norms                  Requested Norm
>  --------------------------------------------    --------------
>
>  ||r||_2 / ||b||_2:              7.213690e-02    1.000000e-02
>
> ***************************************************************
>
>
> In some cases we are able to restart GMRES based on the solution we get
> out and eventually we get a solution that has converged according to the
> true residual. However in other cases we seem to get stuck, repeatedly
> hitting this error.
>
> Do you have any advice in this situation?
>
> Roger, when you say that by default NOX doesn't worry about this, are you
> basically saying that NOX would treat this situation as an acceptable
> convergence?
>
> Thanks,
>
> Tom
>
> On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>
>> As I recall, this test is invoked when the recursive residual has reached
>> the threshold specified in params[AZ_tol].
>>
>> Then the true residual is computed. If the absolute value of the
>> difference between the recursive and true residual is greater than
>> params[AZ_tol], this error condition is enabled.
>>
>> You could try increasing your value for AZ_tol. This would decrease the
>> chance of seeing the error, by not forcing iterations to continue beyond
>> what is reachable and because the difference test would be easier to
>> satisfy.
>>
>> There is some mathematical reasoning behind this test. Do you know that
>> continued iterations improve the overall solution process?
>>
>> You could, for experimental purposes, modify az_util.c:1413 and make the
>> logic test trivially false. Then the solver would continue and you could
>> see the impact of continued iterations.
>>
>> There is no “official” way to turn off this test.
>>
>> Mike
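
The check as recalled above can be written out as a small sketch. This is not the code in az_util.c; the function name `loss_of_accuracy` is hypothetical, and it assumes both residuals are scaled by ||b||_2 as in the `||r||_2 / ||b||_2` line of the warning quoted earlier in this thread. The numbers are taken from that warning.

```python
def loss_of_accuracy(recursive, true, tol):
    """Hypothetical version of the test as described in this thread:
    it only applies once the recursive residual has reached tol, and it
    flags the solve when |true - recursive| exceeds tol."""
    if recursive > tol:
        return False  # recursive residual has not reached tol yet
    return abs(true - recursive) > tol

# Values from the quoted warning. ||b||_2 is inferred from the actual
# residual and the reported scaled norm (an assumption about the scaling).
b_norm = 3.3361e-05 / 7.213690e-02
recursive_scaled = 3.4301e-06 / b_norm   # about 7.4e-03, below 1e-02
true_scaled = 7.213690e-02

flag_tight = loss_of_accuracy(recursive_scaled, true_scaled, 1.0e-2)
flag_loose = loss_of_accuracy(recursive_scaled, true_scaled, 1.0e-1)
print(flag_tight, flag_loose)
```

Under these assumptions the requested tolerance of 1e-02 triggers the flag, while a looser tolerance of 1e-01 does not, which is the effect of increasing AZ_tol described above.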
>>
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 2:12 PM
>> To: Michael A Heroux <mheroux at csbsju.edu>
>> Cc: Roger Pawlowski <rppawlo at sandia.gov>,
>> trilinos-users <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] Matrix Free operator, and loss of accuracy
>> error
>>
>> Hi Mike,
>>
>> The silence I don't mind (can always check return status), but presumably
>> Aztec will still detect this error and abort early (i.e. before requested
>> tolerance has been achieved), albeit silently?
>> Is there any way I can ask Aztec to ignore this error and carry on
>> iterating, as I got the impression NOX was capable of?
>>
>> Thanks,
>>
>> Tom
>>
>> On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>> Tom,
>>
>> If AZ_output is set to AZ_none, no output, including this warning
>> message, will be generated. This might be too much “silence” but it will
>> work.
>>
>> Mike
>>
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 1:38 PM
>> To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>> <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free operator, and loss
>> of accuracy error
>>
>> Hi Roger,
>>
>> Thanks for the input. I think perhaps my initial question was misleading.
>>
>> We rely on Trilinos to solve the linear system only; the non-linear
>> portion is handled by our own code. We use NOX to provide the operator
>> (interfaced with our own code for residual evaluations etc), but don't use
>> NOX as the non-linear solver itself.
>>
>> I've experimented quite a lot with scaling, and whilst I've found that the
>> frequency of this error does depend on what I use, I've never been able to
>> eradicate it. Do you know of any way I can convince Aztec to not perform
>> this check, as you suggest the NOX solvers do?
>>
>> Many thanks,
>>
>> Tom
>>
>>
>>
>> On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski <rppawlo at sandia.gov>
>> wrote:
>> Hi Tom,
>>
>> This is a known issue. JFNK uses a directional derivative for the
>> Jacobian-vector product and Aztec is detecting that loss of accuracy from
>> the finite differencing. It could be indicative of a poorly scaled system
>> of equations and/or linear solve tolerances in Aztec that are too tight.
>> You can switch to a higher order derivative in the JFNK algorithm or loosen
>> the linear solve tolerances. If it is poorly scaled, you can enable row
>> sum scaling of the linear system in the nox group for the linear solves.
>> We tend to ignore the warning as it was an overly stringent check that, if
>> memory serves, assumed you had an analytic Jacobian. By default, nox with
>> JFNK should ignore this warning. Please send your output so I can check to
>> make sure that is happening in your case. Note that if you switch to Belos
>> instead of Aztec (preferred), this warning will not occur.
>>
>> Best,
>> Roger
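
The directional-derivative approximation mentioned above, and the effect of moving to a higher-order formula, can be illustrated with a short sketch. It is plain Python, not the NOX matrix-free operator; the test function `F`, the fixed perturbation `eps`, and all helper names are assumptions chosen for illustration. A centered difference is used here as one example of a higher-order formula.

```python
import math

def F(u):
    """Hypothetical nonlinear residual function with a known Jacobian."""
    return [u[0] ** 2 + u[1], math.sin(u[0]) + u[1] ** 3]

def jacobian_times(u, v):
    """Exact Jacobian-vector product of F, used only as a reference."""
    return [2.0 * u[0] * v[0] + v[1],
            math.cos(u[0]) * v[0] + 3.0 * u[1] ** 2 * v[1]]

def shifted(u, v, s):
    """Return u + s * v."""
    return [ui + s * vi for ui, vi in zip(u, v)]

def jv_forward(u, v, eps):
    """First-order forward difference: (F(u + eps v) - F(u)) / eps."""
    f0, f1 = F(u), F(shifted(u, v, eps))
    return [(a - b) / eps for a, b in zip(f1, f0)]

def jv_centered(u, v, eps):
    """Second-order centered difference:
    (F(u + eps v) - F(u - eps v)) / (2 eps)."""
    fp, fm = F(shifted(u, v, eps)), F(shifted(u, v, -eps))
    return [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]

def max_error(approx, exact):
    return max(abs(a - e) for a, e in zip(approx, exact))

u, v, eps = [1.0, 2.0], [0.5, -1.0], 1.0e-4
exact = jacobian_times(u, v)
err_forward = max_error(jv_forward(u, v, eps), exact)
err_centered = max_error(jv_centered(u, v, eps), exact)
print(err_forward, err_centered)
```

The centered formula costs one extra residual evaluation per product, so whether the accuracy gain is worth it depends on how expensive the residual function is.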
>>
>>
>>
>> On 12/08/2014 12:04 PM, Tom Goffrey wrote:
>> Hello,
>>
>> We currently utilise AztecOO GMRES (with condition number estimate) to
>> solve the linear system resulting from the time implicit solution of
>> multi-dimensional hydrodynamics. However we have run into problems when
>> adapting our code to apply a Jacobian free approach.
>>
>> We replaced our previous explicit Jacobian operator with the matrix free
>> operator from the NOX package.
>> (http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)
>>
>> However we find a high proportion of our GMRES solves abort prematurely
>> due to an error relating to an apparent loss of accuracy:
>>
>> Warning: recursive residual indicates convergence
>> though the true residual is too large.
>>
>> Sometimes this occurs when storage is overwritten (e.g. the
>> solution vector was not dimensioned large enough to hold
>> external variables). Other times, this is due to roundoff. In
>> this case, the solution has either converged to the accuracy
>> of the machine or intermediate roundoff errors occurred
>> preventing full convergence. In the latter case, try solving
>> again using the new solution as an initial guess.
>>
>> I didn't manage to find much within Trilinos documentation on this, but
>> from looking through the source code it appears this is triggered by a
>> significant difference between the recursive residual and the true
>> residual, where the true residual is calculated for the solution vector,
>> but the recursive residual is calculated by using an incremental solution
>> vector.
>>
>> Currently we believe the high frequency of this error we see for the JFNK
>> approach is because we have in fact used a non-linear operator (in this
>> case our non-linear residual function) to approximate the linear
>> Jacobian-vector product. As such it is not clear to us that this sort of
>> error checking makes sense in this situation, or perhaps it should be
>> interpreted as a sign the Jacobian free approximation is becoming worse?
>>
>> Presumably I can control this error checking by allowing the two
>> residuals to vary by a greater amount, but I was wondering if others have
>> had similar problems, and had any advice on the issue?
>>
>> Of course, any alternative explanation of the high frequency would be
>> equally appreciated!
>>
>> Thanks,
>>
>> Tom Goffrey
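
The hypothesis in the message above, that a finite-difference operator built from a non-linear residual function is not exactly linear, can be checked on a toy problem. This is a sketch under stated assumptions: `F`, the base state `u`, and the fixed perturbation `eps` are hypothetical, and a fixed `eps` is a simplification chosen to keep the example short. If the operator were linear, applying it to a sum would equal the sum of applying it to each part, which is what a recursively updated residual implicitly relies on.

```python
def F(u):
    """Hypothetical nonlinear residual function."""
    return [u[0] ** 2, u[0] * u[1]]

def fd_operator(u, v, eps):
    """Finite-difference stand-in for J(u) v: (F(u + eps v) - F(u)) / eps."""
    f0 = F(u)
    f1 = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(f1, f0)]

u = [1.0, 1.0]        # assumed base state
eps = 1.0e-3          # assumed fixed perturbation size
a = [1.0, 0.0]        # two increments, e.g. successive solution updates
b = [2.0, 1.0]
a_plus_b = [x + y for x, y in zip(a, b)]

# Operator applied to the accumulated vector (as for a true residual) ...
whole = fd_operator(u, a_plus_b, eps)
# ... versus operator applied increment by increment (as for a recursive one).
parts = [p + q for p, q in
         zip(fd_operator(u, a, eps), fd_operator(u, b, eps))]

defect = [w - p for w, p in zip(whole, parts)]
print(defect)
```

For this quadratic `F` the defect is proportional to `eps` times products of the increments, so it does not vanish however accurately the arithmetic is carried out.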
>>
>>
>>
>
>