[Trilinos-Users] Matrix Free operator, and loss of accuracy error

Heroux, Mike MHeroux at CSBSJU.EDU
Tue Dec 9 08:14:23 MST 2014


Tom,

If you have subsequent issues controlling the behavior of the standard convergence testing in AztecOO, there is an API for creating your own status tests.  The base class, AztecOO_StatusTest, has several configurable derived classes, and you can combine those with your own custom tests.  This is the best way to have complete control over the iteration process.

The base class is defined here:  http://trilinos.org/docs/r11.10/packages/aztecoo/doc/html/classAztecOO__StatusTest.html
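
A minimal sketch of the idea of combining status tests, written in plain Python rather than against the AztecOO C++ interface. All class and function names below (Status, ResidualTest, MaxItersTest, AnyOf, run) are hypothetical stand-ins chosen for illustration; the real classes are documented at the link above.

```python
from enum import Enum


class Status(Enum):
    UNCONVERGED = 0
    CONVERGED = 1
    FAILED = 2


class ResidualTest:
    """Reports CONVERGED once the residual norm is at or below tol."""

    def __init__(self, tol):
        self.tol = tol

    def check(self, iteration, residual_norm):
        if residual_norm <= self.tol:
            return Status.CONVERGED
        return Status.UNCONVERGED


class MaxItersTest:
    """Reports FAILED once the iteration count reaches max_iters."""

    def __init__(self, max_iters):
        self.max_iters = max_iters

    def check(self, iteration, residual_norm):
        if iteration >= self.max_iters:
            return Status.FAILED
        return Status.UNCONVERGED


class AnyOf:
    """Stops as soon as any child test returns something other than UNCONVERGED."""

    def __init__(self, *tests):
        self.tests = tests

    def check(self, iteration, residual_norm):
        for test in self.tests:
            status = test.check(iteration, residual_norm)
            if status is not Status.UNCONVERGED:
                return status
        return Status.UNCONVERGED


def run(residual_history, test):
    """Walk a list of residual norms; return (iteration, status) at the stop."""
    for iteration, norm in enumerate(residual_history):
        status = test.check(iteration, norm)
        if status is not Status.UNCONVERGED:
            return iteration, status
    return len(residual_history) - 1, Status.UNCONVERGED
```

A custom test only needs a check method with the same shape, so it can be passed to AnyOf alongside the two tests shown here.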

Mike

From: Tom Goffrey <t.goffrey at exeter.ac.uk>
Date: Tuesday, December 9, 2014 at 9:02 AM
To: Roger Pawlowski <rppawlo at sandia.gov>
Cc: Michael A Heroux <mheroux at csbsju.edu>, trilinos-users <trilinos-users at software.sandia.gov>
Subject: Re: [EXTERNAL] Re: [Trilinos-Users] Matrix Free operator, and loss of accuracy error

Roger, Mike,

Thanks for all your help on this issue. I think the final email clears up any remaining questions.

Cheers,

Tom

On Tue, Dec 9, 2014 at 1:51 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
Tom,

For NOX::Direction::Newton, there is a parameter that allows a "failed" Newton step to be used.  Typically, your Krylov solver makes good enough progress, and this flag allows the nonlinear system to converge.  See the parameter "Rescue Bad Newton Solve".  So yes, we are able to converge nonlinear systems even with Aztec reporting this warning.  You do have to be careful in this case: using only a relative residual status test for convergence can result in a "false" convergence due to stagnation if the linear solver is essentially making no progress.  Make sure to use an absolute residual norm as well in your stopping criteria.

Roger



On 12/09/2014 07:51 AM, Tom Goffrey wrote:
Hi Mike,

Sorry for the delay; I wanted to be absolutely sure that I had the details straight before replying.

I agree that this test is only carried out once the recursive residual indicates convergence. The true residual is then calculated, and as you say we hit this warning if the difference is too large.

The trouble we are having is that quite often the difference is sufficient that the recursive residual implies convergence but the true residual does not, and we get something like:
        ***************************************************************

        Warning: recursive residual indicates convergence
        though the true residual is too large.

        Sometimes this occurs when storage is overwritten (e.g. the
        solution vector was not dimensioned large enough to hold
        external variables). Other times, this is due to roundoff. In
        this case, the solution has either converged to the accuracy
        of the machine or intermediate roundoff errors occurred
        preventing full convergence. In the latter case, try solving
        again using the new solution as an initial guess.

        Solver:                 gmres_condnum
        number of iterations:   11

        Actual residual =  3.3361e-05   Recursive residual =  3.4301e-06

        Calculated Norms                                Requested Norm
        --------------------------------------------    --------------

        ||r||_2 / ||b||_2:              7.213690e-02    1.000000e-02

        ***************************************************************


In some cases we are able to restart GMRES based on the solution we get out and eventually we get a solution that has converged according to the true residual. However in other cases we seem to get stuck, repeatedly hitting this error.

Do you have any advice in this situation?

Roger, when you say that by default NOX doesn't worry about this, are you basically saying that NOX would treat this situation as an acceptable convergence?

Thanks,

Tom

On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
As I recall, this test is invoked when the recursive residual has reached the threshold specified in params[AZ_tol].

Then the true residual is computed. If the absolute value of the difference between the recursive and true residual is greater than params[AZ_tol], this error condition is enabled.

You could try increasing your value for AZ_tol.  This would decrease the chance of seeing the error, by not forcing iterations to continue beyond what is reachable and because the difference test would be easier to satisfy.
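
The test as described above can be sketched in Python as follows. This follows the description in this message, not the AztecOO source, and the function name is hypothetical. The numbers are taken from the warning output quoted in this thread; scaling the recursive residual by the same ||b||_2 as the true residual is an assumption.

```python
def accuracy_warning(recursive_res, true_res, tol):
    """Return True when the condition described above would hold.

    Assumption: both residuals are scaled the same way as tol
    (here ||r||_2 / ||b||_2).
    """
    if recursive_res > tol:
        # The recursive residual has not reached the threshold yet,
        # so the comparison is not performed.
        return False
    return abs(true_res - recursive_res) > tol


# Values taken from the warning output quoted in this thread.
actual_abs = 3.3361e-05      # "Actual residual"
recursive_abs = 3.4301e-06   # "Recursive residual"
true_scaled = 7.213690e-02   # reported ||r||_2 / ||b||_2

# Assumption: the recursive residual is scaled by the same ||b||_2.
norm_b = actual_abs / true_scaled
recursive_scaled = recursive_abs / norm_b

warn_tight = accuracy_warning(recursive_scaled, true_scaled, tol=1.0e-02)
warn_loose = accuracy_warning(recursive_scaled, true_scaled, tol=1.0e-01)
```

Under these assumptions the condition holds for the requested tolerance of 1.0e-02 (warn_tight is True) and does not hold for a looser, hypothetical tolerance of 1.0e-01 (warn_loose is False), which is consistent with the suggestion to increase AZ_tol.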

There is some mathematical reasoning behind this test.  Do you know whether continued iterations improve the overall solution process?

You could, for experimental purposes, modify az_util.c:1413 and make the logic test trivially false.  Then the solver would continue and you could see the impact of continued iterations.

There is no “official” way to turn off this test.

Mike

From: Tom Goffrey <t.goffrey at exeter.ac.uk>
Date: Monday, December 8, 2014 at 2:12 PM
To: Michael A Heroux <mheroux at csbsju.edu>
Cc: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users <trilinos-users at software.sandia.gov>
Subject: Re: [Trilinos-Users] Matrix Free operator, and loss of accuracy error

Hi Mike,

The silence I don't mind (can always check return status), but presumably Aztec will still detect this error and abort early (i.e. before requested tolerance has been achieved), albeit silently?
Is there any way I can ask Aztec to ignore this error and carry on iterating, as I got the impression NOX was capable of?

Thanks,

Tom

On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
Tom,

If AZ_output is set to AZ_none, no output, including this warning message, will be generated.  This might be too much “silence” but it will work.

Mike

From: Tom Goffrey <t.goffrey at exeter.ac.uk>
Date: Monday, December 8, 2014 at 1:38 PM
To: Roger Pawlowski <rppawlo at sandia.gov>, trilinos-users <trilinos-users at software.sandia.gov>
Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free operator, and loss of accuracy error

Hi Roger,

Thanks for the input. I think perhaps my initial question was misleading.

We rely on Trilinos to solve the linear system only; the non-linear portion is handled by our own code. We use NOX to provide the operator (interfaced with our own code for residual evaluations etc.), but don't use NOX as the non-linear solver itself.

I've experimented quite a lot with scaling, and whilst I've found that the frequency of this error does depend on what I use, I've never been able to eradicate it. Do you know of any way I can convince Aztec not to perform this check, as you suggest the NOX solvers do?

Many thanks,

Tom



On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
Hi Tom,

This is a known issue.  JFNK uses a directional derivative for the Jacobian-vector product and Aztec is detecting that loss of accuracy from the finite differencing.  It could be indicative of a poorly scaled system of equations and/or linear solve tolerances in Aztec that are too tight.  You can switch to a higher order derivative in the JFNK algorithm or loosen the linear solve tolerances.  If it is poorly scaled, you can enable row sum scaling of the linear system in the nox group for the linear solves.  We tend to ignore the warning as it was an overly stringent check that, if memory serves, assumed you had an analytic Jacobian.  By default, nox with JFNK should ignore this warning.  Please send your output so I can check to make sure that is happening in your case. Note that if you switch to Belos instead of Aztec (preferred), this warning will not occur.
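
To illustrate the remark about the directional derivative, here is a small Python sketch comparing a first-order (forward) and a second-order (centered) finite-difference approximation of the Jacobian-vector product. The residual function, the point x, the direction v, and the step eps are all hypothetical choices for illustration; this is not the NOX implementation.

```python
import math


def residual(x):
    """Hypothetical nonlinear residual F(x), chosen only for illustration."""
    return [x[0] ** 2 + math.sin(x[1]), x[0] * x[1] - 1.0]


def jacobian_times(x, v):
    """Exact J(x) v for the residual above, used as the reference."""
    return [2.0 * x[0] * v[0] + math.cos(x[1]) * v[1],
            x[1] * v[0] + x[0] * v[1]]


def jv_forward(f, x, v, eps):
    """First-order approximation: (F(x + eps v) - F(x)) / eps."""
    fx = f(x)
    fp = f([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(fp, fx)]


def jv_centered(f, x, v, eps):
    """Second-order approximation: (F(x + eps v) - F(x - eps v)) / (2 eps)."""
    fp = f([xi + eps * vi for xi, vi in zip(x, v)])
    fm = f([xi - eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]


def max_error(approx, exact):
    """Largest componentwise absolute error."""
    return max(abs(a - b) for a, b in zip(approx, exact))


x = [1.0, 2.0]
v = [0.5, -0.3]
eps = 1.0e-4
exact = jacobian_times(x, v)
err_forward = max_error(jv_forward(residual, x, v, eps), exact)
err_centered = max_error(jv_centered(residual, x, v, eps), exact)
```

For this example the centered difference is markedly more accurate than the forward difference at the same step size, at the cost of one extra residual evaluation per product.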

Best,
Roger



On 12/08/2014 12:04 PM, Tom Goffrey wrote:
Hello,

We currently utilise AztecOO GMRES (with condition number estimate) to solve the linear system resulting from the time-implicit solution of multi-dimensional hydrodynamics. However, we have run into problems when adapting our code to apply a Jacobian-free approach.

We replaced our previous explicit Jacobian operator with the matrix free operator from the NOX package.
(http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)

However we find a high proportion of our GMRES solves abort prematurely due to an error relating to an apparent loss of accuracy:

        Warning: recursive residual indicates convergence
        though the true residual is too large.

        Sometimes this occurs when storage is overwritten (e.g. the
        solution vector was not dimensioned large enough to hold
        external variables). Other times, this is due to roundoff. In
        this case, the solution has either converged to the accuracy
        of the machine or intermediate roundoff errors occurred
        preventing full convergence. In the latter case, try solving
        again using the new solution as an initial guess.

I didn't manage to find much within Trilinos documentation on this, but from looking through the source code it appears this is triggered by a significant difference between the recursive residual and the true residual, where the true residual is calculated for the solution vector, but the recursive residual is calculated by using an incremental solution vector.

Currently we believe the high frequency of this error we see for the JFNK approach is because we have in fact used a non-linear operator (in this case our non-linear residual function) to approximate the linear Jacobian-vector product. As such it is not clear to us that this sort of error checking makes sense in this situation, or perhaps it should be interpreted as a sign the Jacobian free approximation is becoming worse?
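
The point about using a non-linear function in place of a linear operator can be checked on a toy problem. The Python sketch below measures how far a forward-difference operator is from being additive, that is, how much A(v + w) differs from A(v) + A(w). The residual function, vectors, and step sizes are hypothetical and chosen only for illustration.

```python
def residual(x):
    """Hypothetical quadratic residual F(x), chosen only for illustration."""
    return [x[0] ** 2 + x[1], x[0] * x[1] - 1.0]


def fd_operator(x, v, eps):
    """Forward-difference stand-in for J(x) v: (F(x + eps v) - F(x)) / eps."""
    fx = residual(x)
    fp = residual([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(fp, fx)]


def linearity_defect(x, v, w, eps):
    """max |A(v + w) - A(v) - A(w)|; this is zero for a truly linear operator."""
    s = [a + b for a, b in zip(v, w)]
    a_sum = fd_operator(x, s, eps)
    a_v = fd_operator(x, v, eps)
    a_w = fd_operator(x, w, eps)
    return max(abs(c - a - b) for a, b, c in zip(a_v, a_w, a_sum))


x = [1.0, 2.0]
v = [1.0, 0.0]
w = [1.0, 1.0]
defect_large_step = linearity_defect(x, v, w, eps=1.0e-2)
defect_small_step = linearity_defect(x, v, w, eps=1.0e-3)
```

For this quadratic residual the defect scales with eps, so the operator seen by the Krylov solver is only approximately linear, and the approximation degrades as the perturbation grows.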

Presumably I can control this error checking by allowing the two residuals to vary by a greater amount, but I was wondering if others have had similar problems, and had any advice on the issue?

Of course, any alternative explanation of the high frequency would be equally appreciated!

Thanks,

Tom Goffrey



_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at software.sandia.gov
https://software.sandia.gov/mailman/listinfo/trilinos-users







