[Trilinos-Users] [EXTERNAL] Re: Matrix Free operator, and loss of accuracy error
Roger Pawlowski
rppawlo at sandia.gov
Tue Dec 9 08:20:34 MST 2014
Tom,
I forgot to mention, there are a bunch of examples that exercise the
MatrixFree object with the Aztec solver in the directory:
Trilinos/packages/nox/test/epetra/1Dfem
So you can compare performance of JFNK when encountering this Aztec
warning against having analytic Jacobians.
Roger
On 12/09/2014 10:02 AM, Tom Goffrey wrote:
> Roger, Mike,
>
> Thanks for all your help on this issue, I think the final email clears
> up any remaining questions.
>
> Cheers,
>
> Tom
>
> On Tue, Dec 9, 2014 at 1:51 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
>
> Tom,
>
> For the NOX::Direction::Newton, there is a parameter that you can
> use to allow for a "failed" Newton step to be used. Typically,
> your Krylov solver makes good enough progress and this flag allows
> for the nonlinear system to converge. See the parameter "Rescue
> Bad Newton Solve". So yes, we are able to converge nonlinear
> systems even with Aztec reporting this warning. You do have to be
> careful in this case - using only a relative residual status test
> for convergence can result in a "false" convergence due to
> stagnation if the linear solver is essentially making no
> progress. Make sure to use an absolute residual norm as well in
> your stopping criteria.
>
> Roger
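
The advice above about stopping criteria can be sketched as follows. This is a minimal illustration in plain Python, not NOX code: the function name, argument names, and default tolerances are all hypothetical. It only shows why a relative residual test on its own can accept an iterate whose residual is still large, and how requiring an absolute test as well guards against that.

```python
def nonlinear_converged(res_norm, initial_res_norm, rel_tol=1e-2, abs_tol=1e-8):
    """Accept a nonlinear iterate only if BOTH residual tests pass.

    All names and default tolerances are hypothetical, chosen for illustration.
    """
    # Relative test: the residual has dropped by the requested factor.
    rel_ok = res_norm <= rel_tol * initial_res_norm
    # Absolute test: the residual is small in its own right.
    abs_ok = res_norm <= abs_tol
    return rel_ok and abs_ok


# A large initial residual makes the relative test easy to satisfy
# while the residual itself is still far from zero.
print(nonlinear_converged(res_norm=1e3, initial_res_norm=1e6))   # relative passes, absolute fails
print(nonlinear_converged(res_norm=1e-9, initial_res_norm=1e6))  # both tests pass
```

In a real application the two tolerances would be chosen from the scaling of the problem; the values here are placeholders.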
>
>
>
> On 12/09/2014 07:51 AM, Tom Goffrey wrote:
>> Hi Mike,
>>
>> Sorry for the delay, wanted to be absolutely sure that I had the
>> details straight before replying.
>>
>> I agree that this test is only carried out once the recursive
>> residual indicates convergence. The true residual is then
>> calculated, and as you say we hit this warning if the difference
>> is too large.
>>
>> The trouble we are having is that quite often the difference is
>> sufficient that the recursive residual implies convergence but
>> the true residual does not, and we get something like:
>> ***************************************************************
>>
>> Warning: recursive residual indicates convergence
>> though the true residual is too large.
>>
>> Sometimes this occurs when storage is overwritten (e.g. the
>> solution vector was not dimensioned large enough to hold
>> external variables). Other times, this is due to roundoff. In
>> this case, the solution has either converged to the accuracy
>> of the machine or intermediate roundoff errors occurred
>> preventing full convergence. In the latter case, try solving
>> again using the new solution as an initial guess.
>>
>> Solver: gmres_condnum
>> number of iterations: 11
>>
>> Actual residual = 3.3361e-05     Recursive residual = 3.4301e-06
>>
>> Calculated Norms                               Requested Norm
>> --------------------------------------------   --------------
>>
>> ||r||_2 / ||b||_2:          7.213690e-02       1.000000e-02
>>
>> ***************************************************************
>>
>>
>> In some cases we are able to restart GMRES based on the solution
>> we get out and eventually we get a solution that has converged
>> according to the true residual. However in other cases we seem to
>> get stuck, repeatedly hitting this error.
>>
>> Do you have any advice in this situation?
>>
>> Roger, when you say that by default NOX doesn't worry about this,
>> are you basically saying that NOX would treat this situation as
>> an acceptable convergence?
>>
>> Thanks,
>>
>> Tom
>>
>> On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>>
>> As I recall, this test is invoked when the recursive residual
>> has reached the threshold specified in params[AZ_tol].
>>
>> Then the true residual is computed. If the absolute value of
>> the difference between the recursive and true residual is
>> greater than params[AZ_tol], this error condition is enabled.
>>
>> You could try increasing your value for AZ_tol. This would
>> decrease the chance of seeing the error, by not forcing
>> iterations to continue beyond what is reachable and because
>> the difference test would be easier to satisfy.
>>
>> There is some mathematical reasoning behind this test. Do
>> you know that continued iterations improve the overall
>> solution process?
>>
>> You could, for experimental purposes, modify az_util.c:1413
>> and make the logic test trivially false. Then the solver
>> would continue and you could see the impact of continued
>> iterations.
>>
>> There is no “official” way to turn off this test.
>>
>> Mike
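
The check described in the message above can be sketched in a few lines. This follows the description as recalled there, not the actual logic in az_util.c, and the function name is hypothetical. The numbers are the ones from the Aztec output quoted in this thread; scaling the recursive residual by the same ||b||_2 as the true residual is an assumption made for the illustration.

```python
def accuracy_warning(recursive_res, true_res, tol):
    """Return True when the check, as described in the thread, would fire.

    Hypothetical helper: the recursive residual has reached tol, but it
    differs from the true residual by more than tol.
    """
    recursive_converged = recursive_res <= tol
    return recursive_converged and abs(true_res - recursive_res) > tol


# Values taken from the Aztec output quoted in this thread.
true_scaled = 7.213690e-02                   # ||r||_2 / ||b||_2
actual, recursive = 3.3361e-05, 3.4301e-06   # unscaled residual norms

# Assumption: the recursive residual is scaled by the same ||b||_2.
b_norm = actual / true_scaled
recursive_scaled = recursive / b_norm

print(accuracy_warning(recursive_scaled, true_scaled, tol=1e-2))  # requested tolerance
print(accuracy_warning(recursive_scaled, true_scaled, tol=1e-1))  # looser tolerance
```

Under these assumptions the check fires at the requested tolerance of 1e-2 and does not fire at 1e-1, which is consistent with the suggestion that a larger AZ_tol makes the difference test easier to satisfy.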
>>
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 2:12 PM
>> To: Michael A Heroux <mheroux at csbsju.edu>
>> Cc: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>> <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] Matrix Free operator, and loss
>> of accuracy error
>>
>> Hi Mike,
>>
>> The silence I don't mind (can always check return status),
>> but presumably Aztec will still detect this error and abort
>> early (i.e. before requested tolerance has been achieved),
>> albeit silently?
>> Is there any way I can ask Aztec to ignore this error and
>> carry on iterating, as I got the impression NOX was capable of?
>>
>> Thanks,
>>
>> Tom
>>
>> On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>> Tom,
>>
>> If AZ_output is set to AZ_none, no output, including this
>> warning message, will be generated. This might be too much
>> “silence” but it will work.
>>
>> Mike
>>
>> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>> Date: Monday, December 8, 2014 at 1:38 PM
>> To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>> <trilinos-users at software.sandia.gov>
>> Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free
>> operator, and loss of accuracy error
>>
>> Hi Roger,
>>
>> Thanks for the input. I think perhaps my initial question was
>> misleading.
>>
>> We rely on Trilinos to solve the linear system only, the
>> non-linear portion is handled by our own code. We use NOX
>> to provide the operator (interfaced with our own code for
>> residual evaluations etc), but don't use NOX as the
>> non-linear solver itself.
>>
>> I've experimented quite a lot with scaling, and whilst I've
>> found the frequency of this error does depend on what I use,
>> I've never been able to eradicate it. Do you know of any way
>> I can convince Aztec to not perform this check, as you
>> suggest the NOX solvers do?
>>
>> Many thanks,
>>
>> Tom
>>
>>
>>
>> On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
>> Hi Tom,
>>
>> This is a known issue. JFNK uses a directional derivative
>> for the Jacobian-vector product and Aztec is detecting that
>> loss of accuracy from the finite differencing. It could be
>> indicative of a poorly scaled system of equations and/or
>> linear solve tolerances in Aztec that are too tight. You can
>> switch to a higher order derivative in the JFNK algorithm or
>> loosen the linear solve tolerances. If it is poorly scaled,
>> you can enable row sum scaling of the linear system in the
>> nox group for the linear solves. We tend to ignore the
>> warning as it was an overly stringent check that, if memory
>> serves, assumed you had an analytic Jacobian. By default,
>> nox with JFNK should ignore this warning. Please send your
>> output so I can check to make sure that is happening in your
>> case. Note that if you switch to Belos instead of Aztec
>> (preferred), this warning will not occur.
>>
>> Best,
>> Roger
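
The directional derivative mentioned in the message above, and the effect of moving to a higher-order difference, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: the two-equation residual, the function names, and the fixed perturbation eps are hypothetical, and a centered difference is used as one common higher-order choice. It is not how NOX selects its perturbation.

```python
import math


def residual(x):
    """Hypothetical nonlinear residual F(x), used only for illustration."""
    return [x[0] ** 3 + x[1] - 1.0, math.sin(x[0]) + x[1] ** 2]


def jv_exact(x, v):
    """Analytic Jacobian-vector product J(x) v for the residual above."""
    return [3.0 * x[0] ** 2 * v[0] + v[1],
            math.cos(x[0]) * v[0] + 2.0 * x[1] * v[1]]


def jv_forward(F, x, v, eps):
    """First-order approximation: (F(x + eps*v) - F(x)) / eps."""
    Fx = F(x)
    Fp = F([xi + eps * vi for xi, vi in zip(x, v)])
    return [(p - f) / eps for p, f in zip(Fp, Fx)]


def jv_centered(F, x, v, eps):
    """Second-order approximation: (F(x + eps*v) - F(x - eps*v)) / (2*eps)."""
    Fp = F([xi + eps * vi for xi, vi in zip(x, v)])
    Fm = F([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(Fp, Fm)]


def max_error(approx, exact):
    """Largest componentwise absolute error."""
    return max(abs(a - e) for a, e in zip(approx, exact))


x, v, eps = [1.0, 2.0], [1.0, -1.0], 1e-4
exact = jv_exact(x, v)
print("forward error: ", max_error(jv_forward(residual, x, v, eps), exact))
print("centered error:", max_error(jv_centered(residual, x, v, eps), exact))
```

The centered form costs one extra residual evaluation per product, which is the usual trade-off against its smaller truncation error.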
>>
>>
>>
>> On 12/08/2014 12:04 PM, Tom Goffrey wrote:
>> Hello,
>>
>> We currently utilise AztecOO GMRES (with condition number
>> estimate) to solve the linear system resulting from the time
>> implicit solution of multi-dimensional hydrodynamics. However
>> we have run into problems when adapting our code to apply a
>> Jacobian free approach.
>>
>> We replaced our previous explicit Jacobian operator with the
>> matrix free operator from the NOX package.
>> (http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)
>>
>> However we find a high proportion of our GMRES solves abort
>> prematurely due to an error relating to an apparent loss of
>> accuracy:
>>
>> Warning: recursive residual indicates convergence
>> though the true residual is too large.
>>
>> Sometimes this occurs when storage is overwritten (e.g. the
>> solution vector was not dimensioned large enough to hold
>> external variables). Other times, this is due to roundoff. In
>> this case, the solution has either converged to the accuracy
>> of the machine or intermediate roundoff errors occurred
>> preventing full convergence. In the latter case, try solving
>> again using the new solution as an initial guess.
>>
>> I didn't manage to find much within Trilinos documentation on
>> this, but from looking through the source code it appears
>> this is triggered by a significant difference between the
>> recursive residual and the true residual, where the true
>> residual is calculated for the solution vector, but the
>> recursive residual is calculated by using an incremental
>> solution vector.
>>
>> Currently we believe the high frequency of this error we see
>> for the JFNK approach is because we have in fact used a
>> non-linear operator (in this case our non-linear residual
>> function) to approximate the linear Jacobian-vector product.
>> As such it is not clear to us that this sort of error
>> checking makes sense in this situation, or perhaps it should
>> be interpreted as a sign the Jacobian free approximation is
>> becoming worse?
>>
>> Presumably I can control this error checking by allowing the
>> two residuals to vary by a greater amount, but I was
>> wondering if others have had similar problems, and had any
>> advice on the issue?
>>
>> Of course, any alternative explanation of the high frequency
>> would be equally appreciated!
>>
>> Thanks,
>>
>> Tom Goffrey
>>
>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> https://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>
>>
>>
>
>