[Trilinos-Users] [EXTERNAL] Re: Matrix Free operator, and loss of accuracy error

Roger Pawlowski rppawlo at sandia.gov
Tue Dec 9 06:51:05 MST 2014


Tom,

For NOX::Direction::Newton, there is a parameter, "Rescue Bad Newton
Solve", that allows a "failed" Newton step to be used anyway.  Typically
the Krylov solver has still made good enough progress, so this flag
allows the nonlinear system to converge.  So yes - we are able to
converge nonlinear systems even with Aztec reporting this warning.  You
do have to be careful in this case - using only a relative residual
status test for convergence can result in a "false" convergence due to
stagnation if the linear solver is essentially making no progress.  Make
sure to use an absolute residual norm test as well in your stopping
criteria.
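
A minimal sketch of the stopping-criteria advice, in plain Python rather
than the NOX status-test classes.  The function name and the tolerance
values are hypothetical, and combining the two tests with a logical AND
is one reading of the advice, not something the thread spells out:

```python
def nonlinear_converged(res_norm, initial_res_norm, rel_tol=1e-8, abs_tol=1e-10):
    """Accept convergence only if BOTH residual tests pass.

    res_norm         -- current nonlinear residual norm ||F(u_k)||
    initial_res_norm -- residual norm at the initial guess ||F(u_0)||
    rel_tol, abs_tol -- illustrative tolerances (assumed values)
    """
    rel_ok = res_norm <= rel_tol * initial_res_norm   # relative reduction
    abs_ok = res_norm <= abs_tol                      # absolute size
    return rel_ok and abs_ok


# A large initial residual can satisfy the relative test while the
# residual itself is still far from small.
print(nonlinear_converged(1e-3, 1e6))    # relative passes, absolute fails
print(nonlinear_converged(1e-12, 1.0))   # both tests pass
```

The absolute test is what rejects the first case, where the relative
reduction alone looks acceptable.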

Roger


On 12/09/2014 07:51 AM, Tom Goffrey wrote:
> Hi Mike,
>
> Sorry for the delay, wanted to be absolutely sure that I had the 
> details straight before replying.
>
> I agree that this test is only carried out once the recursive residual 
> indicates convergence. The true residual is then calculated, and as 
> you say we hit this warning if the difference is too large.
>
> The trouble we are having is that quite often the difference is 
> sufficient that the recursive residual implies convergence but the 
> true residual does not, and we get something like:
> ***************************************************************
>
>         Warning: recursive residual indicates convergence
>         though the true residual is too large.
>
>         Sometimes this occurs when storage is overwritten (e.g. the
>         solution vector was not dimensioned large enough to hold
>         external variables). Other times, this is due to roundoff. In
>         this case, the solution has either converged to the accuracy
>         of the machine or intermediate roundoff errors occurred
>         preventing full convergence. In the latter case, try solving
>         again using the new solution as an initial guess.
>
>         Solver:                 gmres_condnum
>         number of iterations:   11
>
>         Actual residual =  3.3361e-05   Recursive residual =  3.4301e-06
>
>         Calculated Norms                              Requested Norm
>         --------------------------------------------  --------------
>
>         ||r||_2 / ||b||_2:              7.213690e-02  1.000000e-02
>
> ***************************************************************
>
>
> In some cases we are able to restart GMRES based on the solution we 
> get out and eventually we get a solution that has converged according 
> to the true residual. However in other cases we seem to get stuck, 
> repeatedly hitting this error.
>
> Do you have any advice in this situation?
>
> Roger, when you say by default NOX doesn't worry about this, are you
> basically saying that NOX would treat this situation as an acceptable 
> convergence?
>
> Thanks,
>
> Tom
>
> On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>
>     As I recall, this test is invoked when the recursive residual has
>     reached the threshold specified in params[AZ_tol]
>
>     Then the true residual is computed. If the absolute value of the
>     difference between the recursive and true residual is greater than
>     params[AZ_tol], this error condition is triggered.
>
>     You could try increasing your value for AZ_tol.  This would
>     decrease the chance of seeing the error, by not forcing iterations
>     to continue beyond what is reachable and because the difference
>     test would be easier to satisfy.
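
The check as recalled above can be paraphrased in a few lines of plain
Python.  This is a sketch of the description in this message, not the
AztecOO source; the function name and the residual values are
hypothetical:

```python
def loss_of_accuracy_flag(recursive_res, true_res, tol):
    """Paraphrase of the test as described in the thread.

    The check only runs once the recursive residual has reached tol.
    It then flags the solve if the recursive and true residuals differ
    by more than tol.
    """
    recursive_converged = recursive_res <= tol
    return recursive_converged and abs(recursive_res - true_res) > tol


# Illustrative (made-up) relative residuals.
recursive, true = 8e-3, 7.2e-2

print(loss_of_accuracy_flag(recursive, true, tol=1e-2))  # tight tolerance
print(loss_of_accuracy_flag(recursive, true, tol=1e-1))  # looser tolerance
```

With the looser tolerance the same pair of residuals no longer trips the
flag, which is the effect of increasing AZ_tol described above.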
>
>     There is some mathematical reasoning behind this test.  Do you
>     know that continued iterations improve the overall solution process?
>
>     You could, for experimental purposes, modify az_util.c:1413 and
>     make the logic test trivially false.  Then the solver would
>     continue and you could see the impact of continued iterations.
>
>     There is no “official” way to turn off this test.
>
>     Mike
>
>     From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>     Date: Monday, December 8, 2014 at 2:12 PM
>     To: Michael A Heroux <mheroux at csbsju.edu>
>     Cc: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>     <trilinos-users at software.sandia.gov>
>     Subject: Re: [Trilinos-Users] Matrix Free operator, and loss of
>     accuracy error
>
>     Hi Mike,
>
>     The silence I don't mind (can always check return status), but
>     presumably Aztec will still detect this error and abort early
>     (i.e. before requested tolerance has been achieved), albeit silently?
>     Is there any way I can ask Aztec to ignore this error and carry on
>     iterating, as I got the impression NOX was capable of?
>
>     Thanks,
>
>     Tom
>
>     On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
>     Tom,
>
>     If AZ_output is set to AZ_none, no output, including this warning
>     message, will be generated.  This might be too much “silence” but
>     it will work.
>
>     Mike
>
>     From: Tom Goffrey <t.goffrey at exeter.ac.uk>
>     Date: Monday, December 8, 2014 at 1:38 PM
>     To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users
>     <trilinos-users at software.sandia.gov>
>     Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free operator, and
>     loss of accuracy error
>
>     Hi Roger,
>
>     Thanks for the input. I think perhaps my initial question was
>     misleading.
>
>     We rely on Trilinos to solve the linear system only; the
>     non-linear portion is handled by our own code. We use NOX to
>     provide the operator (interfaced with our own code for residual
>     evaluations etc.), but don't use NOX as the non-linear solver itself.
>
>     I've experimented quite a lot with scaling, and whilst I've found
>     the frequency of this error does depend on what I use, I've never
>     been able to eradicate it. Do you know of any way I can convince
>     Aztec to not perform this check, as you suggest the NOX solvers do?
>
>     Many thanks,
>
>     Tom
>
>
>
>     On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski
>     <rppawlo at sandia.gov> wrote:
>     Hi Tom,
>
>     This is a known issue.  JFNK uses a directional derivative for the
>     Jacobian-vector product and Aztec is detecting that loss of
>     accuracy from the finite differencing.  It could be indicative of
>     a poorly scaled system of equations and/or linear solve tolerances
>     in Aztec that are too tight.  You can switch to a higher-order
>     finite-difference approximation of the derivative in the JFNK
>     algorithm or loosen the linear solve tolerances.  If it is poorly
>     scaled, you can enable row sum scaling of the linear system in the
>     NOX group for the linear solves.  We tend to ignore the warning as
>     it was an overly stringent check that, if memory serves, assumed
>     you had an analytic Jacobian.  By default, NOX with JFNK should
>     ignore this warning.  Please send your output so I can check to
>     make sure that is happening in your case.  Note that if you switch
>     to Belos instead of Aztec (preferred), this warning will not occur.
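
To illustrate the finite-differencing point, here is a small
self-contained Python sketch (not NOX code) comparing a first-order
forward difference with a second-order centered difference for the
Jacobian-vector product.  The two-equation residual, the vectors, and
the perturbation size are all hypothetical, chosen only so that an
analytic Jacobian is available for reference:

```python
import math


def residual(u):
    """Hypothetical two-equation nonlinear residual F(u)."""
    x, y = u
    return [x * x + math.sin(y) - 1.0, math.exp(x) * y - 2.0]


def jacobian_times(u, v):
    """Analytic J(u) v for the residual above, used only as a reference."""
    x, y = u
    return [2.0 * x * v[0] + math.cos(y) * v[1],
            math.exp(x) * y * v[0] + math.exp(x) * v[1]]


def jv_forward(F, u, v, eps):
    """First-order estimate: (F(u + eps*v) - F(u)) / eps."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(p - f) / eps for p, f in zip(Fp, Fu)]


def jv_centered(F, u, v, eps):
    """Second-order estimate: (F(u + eps*v) - F(u - eps*v)) / (2*eps)."""
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    Fm = F([ui - eps * vi for ui, vi in zip(u, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(Fp, Fm)]


def max_error(approx, exact):
    """Largest componentwise absolute error."""
    return max(abs(a - e) for a, e in zip(approx, exact))


u, v, eps = [0.7, 1.3], [1.0, -0.5], 1e-4
exact = jacobian_times(u, v)
err_forward = max_error(jv_forward(residual, u, v, eps), exact)
err_centered = max_error(jv_centered(residual, u, v, eps), exact)
print("forward error: ", err_forward)
print("centered error:", err_centered)
```

The centered formula costs one extra residual evaluation per product,
but its truncation error scales with eps squared rather than eps, so the
Krylov solver sees an operator much closer to the true Jacobian.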
>
>     Best,
>     Roger
>
>
>
>     On 12/08/2014 12:04 PM, Tom Goffrey wrote:
>     Hello,
>
>     We currently utilise AztecOO GMRES (with condition number
>     estimate) to solve the linear system resulting from the
>     time-implicit solution of multi-dimensional hydrodynamics. However,
>     we have run into problems when adapting our code to apply a
>     Jacobian-free approach.
>
>     We replaced our previous explicit Jacobian operator with the
>     matrix free operator from the NOX package.
>     (http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)
>
>     However we find a high proportion of our GMRES solves abort
>     prematurely due to an error relating to an apparent loss of accuracy:
>
>             Warning: recursive residual indicates convergence
>             though the true residual is too large.
>
>             Sometimes this occurs when storage is overwritten (e.g. the
>             solution vector was not dimensioned large enough to hold
>             external variables). Other times, this is due to roundoff. In
>             this case, the solution has either converged to the accuracy
>             of the machine or intermediate roundoff errors occurred
>             preventing full convergence. In the latter case, try solving
>             again using the new solution as an initial guess.
>
>     I didn't manage to find much within Trilinos documentation on
>     this, but from looking through the source code it appears this is
>     triggered by a significant difference between the recursive
>     residual and the true residual, where the true residual is
>     calculated for the solution vector, but the recursive residual is
>     calculated by using an incremental solution vector.
>
>     Currently we believe the high frequency of this error we see for
>     the JFNK approach is because we have in fact used a non-linear
>     operator (in this case our non-linear residual function) to
>     approximate the linear Jacobian-vector product. As such it is not
>     clear to us that this sort of error checking makes sense in this
>     situation, or perhaps it should be interpreted as a sign the
>     Jacobian free approximation is becoming worse?
>
>     Presumably I can control this error checking by allowing the two
>     residuals to vary by a greater amount, but I was wondering if
>     others have had similar problems, and had any advice on the issue?
>
>     Of course, any alternative explanation of the high frequency would
>     be equally appreciated!
>
>     Thanks,
>
>     Tom Goffrey
>
>
>
>     _______________________________________________
>     Trilinos-Users mailing list
>     Trilinos-Users at software.sandia.gov
>     https://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
>
>
