[Trilinos-Users] [EXTERNAL] Re: Matrix Free operator, and loss of accuracy error

Roger Pawlowski rppawlo at sandia.gov
Tue Dec 9 08:20:34 MST 2014


Tom,

I forgot to mention, there are a bunch of examples that exercise the 
MatrixFree object with the Aztec solver in the directory:

Trilinos/packages/nox/test/epetra/1Dfem

So you can compare performance of JFNK when encountering this Aztec 
warning against having analytic Jacobians.

Roger


On 12/09/2014 10:02 AM, Tom Goffrey wrote:
> Roger, Mike,
>
> Thanks for all your help on this issue, I think the final email clears 
> up any remaining questions.
>
> Cheers,
>
> Tom
>
> On Tue, Dec 9, 2014 at 1:51 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
>
>     Tom,
>
>     For NOX::Direction::Newton, there is a parameter that allows a
>     "failed" Newton step to be used: "Rescue Bad Newton Solve".
>     Typically, your Krylov solver makes good enough progress, and this
>     flag allows the nonlinear system to converge.  So yes - we are able
>     to converge nonlinear systems even with Aztec reporting this
>     warning.  You do have to be careful in this case - using only a
>     relative residual status test for convergence can result in a
>     "false" convergence due to stagnation if the linear solver is
>     essentially making no progress.  Make sure to use an absolute
>     residual norm as well in your stopping criteria.
>
>     Roger
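
The advice above about pairing a relative test with an absolute one can be sketched in a few lines. This is a minimal illustration only, not NOX code: the function names, tolerances and residual values are all hypothetical, and it shows just one way a relative-only test can report convergence while the residual is still large.

```python
# Hypothetical stopping tests; names and tolerances are assumptions,
# not part of NOX or AztecOO.

def relative_ok(norm_f, norm_f0, rel_tol):
    """Relative test: residual reduced by rel_tol w.r.t. the initial residual."""
    return norm_f <= rel_tol * norm_f0

def absolute_ok(norm_f, abs_tol):
    """Absolute test: residual norm itself is below abs_tol."""
    return norm_f <= abs_tol

def converged(norm_f, norm_f0, rel_tol=1e-4, abs_tol=1e-8):
    """Declare convergence only when BOTH tests pass."""
    return relative_ok(norm_f, norm_f0, rel_tol) and absolute_ok(norm_f, abs_tol)

# Made-up numbers: a large initial residual and an iteration that has stalled.
norm_f0 = 1.0e6
stalled = 50.0

print(relative_ok(stalled, norm_f0, 1e-4))  # True: 50 <= 1e-4 * 1e6
print(converged(stalled, norm_f0))          # False: 50 is far above abs_tol
```

With the relative test alone the stalled iterate would be accepted; requiring the absolute test as well rejects it.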
>
>
>
>     On 12/09/2014 07:51 AM, Tom Goffrey wrote:
>>     Hi Mike,
>>
>>     Sorry for the delay; I wanted to be absolutely sure that I had the
>>     details straight before replying.
>>
>>     I agree that this test is only carried out once the recursive
>>     residual indicates convergence. The true residual is then
>>     calculated, and as you say we hit this warning if the difference
>>     is too large.
>>
>>     The trouble we are having is that quite often the difference is
>>     sufficient that the recursive residual implies convergence but
>>     the true residual does not, and we get something like:
>>     ***************************************************************
>>
>>             Warning: recursive residual indicates convergence
>>             though the true residual is too large.
>>
>>             Sometimes this occurs when storage is overwritten (e.g. the
>>             solution vector was not dimensioned large enough to hold
>>             external variables). Other times, this is due to roundoff. In
>>             this case, the solution has either converged to the accuracy
>>             of the machine or intermediate roundoff errors occurred
>>             preventing full convergence. In the latter case, try solving
>>             again using the new solution as an initial guess.
>>
>>             Solver: gmres_condnum
>>             number of iterations:   11
>>
>>             Actual residual =  3.3361e-05   Recursive residual =  3.4301e-06
>>
>>             Calculated Norms                     Requested Norm
>>             ----------------------------------   --------------
>>             ||r||_2 / ||b||_2:  7.213690e-02     1.000000e-02
>>
>>     ***************************************************************
>>
>>
>>     In some cases we are able to restart GMRES based on the solution
>>     we get out and eventually we get a solution that has converged
>>     according to the true residual. However in other cases we seem to
>>     get stuck, repeatedly hitting this error.
>>
>>     Do you have any advice in this situation?
>>
>>     Roger, when you say that by default NOX doesn't worry about this,
>>     are you basically saying that NOX would treat this situation as an
>>     acceptable convergence?
>>
>>     Thanks,
>>
>>     Tom
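
The numbers in the Aztec output quoted above can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated in the output itself) that the printed ||r||_2 / ||b||_2 was formed from the "Actual residual" and that both residuals are 2-norms of the same kind. Under that assumption it recovers ||b||_2 and the scaled recursive residual.

```python
# Values copied from the Aztec output quoted in the message above.
actual = 3.3361e-05        # "Actual residual"
recursive = 3.4301e-06     # "Recursive residual"
actual_rel = 7.213690e-02  # printed ||r||_2 / ||b||_2
requested = 1.000000e-02   # requested norm

# Assumption: actual_rel == actual / ||b||_2, so ||b||_2 can be recovered.
norm_b = actual / actual_rel
recursive_rel = recursive / norm_b

print(f"implied ||b||_2         = {norm_b:.4e}")
print(f"recursive / ||b||_2     = {recursive_rel:.4e}")
print(f"actual / recursive      = {actual / recursive:.2f}")
print("recursive below tol, actual above tol:",
      recursive_rel < requested < actual_rel)
```

Under these assumptions the scaled recursive residual comes out at roughly 7.4e-03, below the requested 1e-02, while the true scaled residual is about ten times larger. That is consistent with the wording of the warning.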
>>
>>     On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu>
>>     wrote:
>>
>>         As I recall, this test is invoked when the recursive residual
>>         has reached the threshold specified in params[AZ_tol].
>>
>>         Then the true residual is computed. If the absolute value of
>>         the difference between the recursive and true residual is
>>         greater than params[AZ_tol], this error condition is enabled.
>>
>>         You could try increasing your value for AZ_tol.  This would
>>         decrease the chance of seeing the error, by not forcing
>>         iterations to continue beyond what is reachable and because
>>         the difference test would be easier to satisfy.
>>
>>         There is some mathematical reasoning behind this test.  Do
>>         you know that continued iterations improve the overall
>>         solution process?
>>
>>         You could, for experimental purposes, modify az_util.c:1413
>>         and make the logic test trivially false.  Then the solver
>>         would continue and you could see the impact of continued
>>         iterations.
>>
>>         There is no “official” way to turn off this test.
>>
>>         Mike
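
The test as described in the message above can be written down as a short sketch. This follows only the description given here ("as I recall"), not the AztecOO source; the function name and the residual values are hypothetical, and whether the real check works on scaled or unscaled residuals is an assumption left open.

```python
def loss_of_accuracy(recursive_res, true_res, tol):
    """Sketch of the check as described in the message, not the AztecOO code.

    The test is only invoked once the recursive residual has reached tol;
    it then flags an error if the two residuals differ by more than tol.
    """
    if recursive_res > tol:
        return False  # recursive residual not yet at tol: test not invoked
    return abs(recursive_res - true_res) > tol

# Illustrative values of roughly the magnitude reported in this thread.
recursive_res = 7.4e-3
true_res = 7.2e-2

print(loss_of_accuracy(recursive_res, true_res, tol=1e-2))  # True: gap exceeds tol
print(loss_of_accuracy(recursive_res, true_res, tol=1e-1))  # False: gap within tol
```

The second call illustrates the suggestion to increase AZ_tol: with a looser tolerance the same pair of residuals passes the difference test.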
>>
>>         From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>>         Date: Monday, December 8, 2014 at 2:12 PM
>>         To: Michael A Heroux <mheroux at csbsju.edu>
>>         Cc: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>>         <trilinos-users at software.sandia.gov>
>>         Subject: Re: [Trilinos-Users] Matrix Free operator, and loss
>>         of accuracy error
>>
>>         Hi Mike,
>>
>>         The silence I don't mind (can always check return status),
>>         but presumably Aztec will still detect this error and abort
>>         early (i.e. before requested tolerance has been achieved),
>>         albeit silently?
>>         Is there any way I can ask Aztec to ignore this error and
>>         carry on iterating, as I got the impression NOX was capable of?
>>
>>         Thanks,
>>
>>         Tom
>>
>>         On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike
>>         <MHeroux at csbsju.edu> wrote:
>>         Tom,
>>
>>         If AZ_output is set to AZ_none, no output, including this
>>         warning message, will be generated.  This might be too much
>>         “silence” but it will work.
>>
>>         Mike
>>
>>         From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>>         Date: Monday, December 8, 2014 at 1:38 PM
>>         To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>>         <trilinos-users at software.sandia.gov>
>>         Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free
>>         operator, and loss of accuracy error
>>
>>         Hi Roger,
>>
>>         Thanks for the input. I think perhaps my initial question was
>>         misleading.
>>
>>         We rely on Trilinos to solve the linear system only; the
>>         non-linear portion is handled by our own code. We use NOX
>>         to provide the operator (interfaced with our own code for
>>         residual evaluations etc.), but don't use NOX as the
>>         non-linear solver itself.
>>
>>         I've experimented quite a lot with scaling, and whilst I've
>>         found that the frequency of this error does depend on what I
>>         use, I've never been able to eradicate it. Do you know of any
>>         way I can convince Aztec not to perform this check, as you
>>         suggest the NOX solvers do?
>>
>>         Many thanks,
>>
>>         Tom
>>
>>
>>
>>         On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski
>>         <rppawlo at sandia.gov> wrote:
>>         Hi Tom,
>>
>>         This is a known issue.  JFNK uses a directional derivative
>>         for the Jacobian-vector product, and Aztec is detecting the
>>         loss of accuracy from the finite differencing.  It could be
>>         indicative of a poorly scaled system of equations and/or
>>         linear solve tolerances in Aztec that are too tight.  You can
>>         switch to a higher order derivative in the JFNK algorithm or
>>         loosen the linear solve tolerances.  If it is poorly scaled,
>>         you can enable row sum scaling of the linear system in the
>>         NOX group for the linear solves.  We tend to ignore the
>>         warning as it was an overly stringent check that, if memory
>>         serves, assumed you had an analytic Jacobian.  By default,
>>         NOX with JFNK should ignore this warning.  Please send your
>>         output so I can check to make sure that is happening in your
>>         case. Note that if you switch to Belos instead of Aztec
>>         (preferred), this warning will not occur.
>>
>>         Best,
>>         Roger
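
The finite-difference Jacobian-vector product mentioned above, and the effect of a higher order difference, can be illustrated with a small standalone sketch. It does not use NOX. The residual F, the point x, the direction v and the step eps are all made up for illustration, and "higher order" is taken here to mean a centered difference, which is an assumption.

```python
import math

def F(x):
    """Toy nonlinear residual (hypothetical, for illustration only)."""
    return [x[0] ** 2 + math.sin(x[1]) - 1.0,
            math.exp(x[0]) * x[1] - 2.0]

def jv_exact(x, v):
    """Analytic Jacobian-vector product J(x) v for the toy F above."""
    return [2.0 * x[0] * v[0] + math.cos(x[1]) * v[1],
            math.exp(x[0]) * x[1] * v[0] + math.exp(x[0]) * v[1]]

def jv_forward(x, v, eps):
    """First-order directional derivative: (F(x + eps v) - F(x)) / eps."""
    fx = F(x)
    fp = F([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(fp, fx)]

def jv_central(x, v, eps):
    """Second-order variant: (F(x + eps v) - F(x - eps v)) / (2 eps)."""
    fp = F([xi + eps * vi for xi, vi in zip(x, v)])
    fm = F([xi - eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]

def err(approx, exact):
    """Largest componentwise error."""
    return max(abs(a - b) for a, b in zip(approx, exact))

x, v, eps = [0.7, 1.3], [1.0, -0.5], 1.0e-4
exact = jv_exact(x, v)
print("forward-difference error:", err(jv_forward(x, v, eps), exact))
print("central-difference error:", err(jv_central(x, v, eps), exact))
```

The forward difference carries an error proportional to eps, so the operator seen by the Krylov solver is only an approximation of J. The centered variant reduces that error at the cost of one extra residual evaluation per product.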
>>
>>
>>
>>         On 12/08/2014 12:04 PM, Tom Goffrey wrote:
>>         Hello,
>>
>>         We currently utilise AztecOO GMRES (with condition number
>>         estimate) to solve the linear system resulting from the time
>>         implicit solution of multi-dimensional hydrodynamics. However
>>         we have run into problems when adapting our code to apply a
>>         Jacobian free approach.
>>
>>         We replaced our previous explicit Jacobian operator with the
>>         matrix free operator from the NOX package.
>>         (http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)
>>
>>         However we find a high proportion of our GMRES solves abort
>>         prematurely due to an error relating to an apparent loss of
>>         accuracy:
>>
>>                 Warning: recursive residual indicates convergence
>>                 though the true residual is too large.
>>
>>                 Sometimes this occurs when storage is overwritten (e.g. the
>>                 solution vector was not dimensioned large enough to hold
>>                 external variables). Other times, this is due to roundoff. In
>>                 this case, the solution has either converged to the accuracy
>>                 of the machine or intermediate roundoff errors occurred
>>                 preventing full convergence. In the latter case, try solving
>>                 again using the new solution as an initial guess.
>>
>>         I didn't manage to find much within Trilinos documentation on
>>         this, but from looking through the source code it appears
>>         this is triggered by a significant difference between the
>>         recursive residual and the true residual, where the true
>>         residual is calculated for the solution vector, but the
>>         recursive residual is calculated by using an incremental
>>         solution vector.
>>
>>         Currently we believe the high frequency of this error we see
>>         for the JFNK approach is because we have in fact used a
>>         non-linear operator (in this case our non-linear residual
>>         function) to approximate the linear Jacobian-vector product.
>>         As such it is not clear to us that this sort of error
>>         checking makes sense in this situation, or perhaps it should
>>         be interpreted as a sign the Jacobian free approximation is
>>         becoming worse?
>>
>>         Presumably I can control this error checking by allowing the
>>         two residuals to vary by a greater amount, but I was
>>         wondering if others have had similar problems, and had any
>>         advice on the issue?
>>
>>         Of course, any alternative explanation of the high frequency
>>         would be equally appreciated!
>>
>>         Thanks,
>>
>>         Tom Goffrey
>>
>>
>>
>>         _______________________________________________
>>         Trilinos-Users mailing list
>>         Trilinos-Users at software.sandia.gov
>>         https://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>
>>
>>
>
>
