[Trilinos-Users] Matrix Free operator, and loss of accuracy error

Tom Goffrey t.goffrey at exeter.ac.uk
Tue Dec 9 05:51:04 MST 2014


Hi Mike,

Sorry for the delay, wanted to be absolutely sure that I had the details
straight before replying.

I agree that this test is only carried out once the recursive residual
indicates convergence. The true residual is then calculated, and as you say
we hit this warning if the difference is too large.
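
To make the distinction between the two residuals concrete, here is a minimal Python sketch. It is a toy and not AztecOO or NOX code: the names fd_operator and minimal_residual, the 2x2 test functions, and the deliberately exaggerated perturbation eps = 1e-3 are all assumptions made for illustration, and a simple minimal-residual iteration stands in for GMRES. The recursive residual is only ever updated from operator applications to the search directions, while the true residual applies the operator to the accumulated solution. When the operator is a finite difference of a non-linear function, it is not linear in its argument, so the two need not agree.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def fd_operator(F, u, eps):
    """Return v -> (F(u + eps*v) - F(u)) / eps, a forward-difference
    stand-in for the Jacobian-vector product J(u) v (hypothetical helper)."""
    Fu = F(u)
    def apply(v):
        shifted = [ui + eps * vi for ui, vi in zip(u, v)]
        return [(fs - f0) / eps for fs, f0 in zip(F(shifted), Fu)]
    return apply

def minimal_residual(apply, b, iters=50):
    """Minimal-residual iteration starting from x = 0.
    Returns (recursive residual norm, true residual norm)."""
    x = [0.0] * len(b)
    r = list(b)  # recursive residual: only ever updated incrementally
    for _ in range(iters):
        q = apply(r)
        qq = dot(q, q)
        if qq == 0.0:
            break
        alpha = dot(r, q) / qq  # step length minimising ||r - alpha*q||
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
    # true residual: operator applied to the accumulated solution
    true_r = [bi - ai for bi, ai in zip(b, apply(x))]
    return norm(r), norm(true_r)

def linear_F(u):
    return [4.0 * u[0] + u[1], u[0] + 5.0 * u[1]]

def nonlinear_F(u):
    return [2.0 * u[0] + u[1] + u[0] ** 2, u[0] + 3.0 * u[1] + u[1] ** 2]

u0, b, eps = [1.0, 1.0], [1.0, 2.0], 1.0e-3  # eps exaggerated on purpose
for name, F in (("linear", linear_F), ("non-linear", nonlinear_F)):
    rec_norm, true_norm = minimal_residual(fd_operator(F, u0, eps), b)
    print(f"{name:>10}: recursive = {rec_norm:.3e}, true = {true_norm:.3e}")
```

In this toy both functions have the same Jacobian at u0, so any gap between the two printed norms in the non-linear case comes purely from the finite-difference operator not being linear. A realistic perturbation would be much smaller than 1e-3, and the gap correspondingly smaller.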

The trouble we are having is that quite often the difference is sufficient
that the recursive residual implies convergence but the true residual does
not, and we get something like:
        ***************************************************************

        Warning: recursive residual indicates convergence
        though the true residual is too large.

        Sometimes this occurs when storage is overwritten (e.g. the
        solution vector was not dimensioned large enough to hold
        external variables). Other times, this is due to roundoff. In
        this case, the solution has either converged to the accuracy
        of the machine or intermediate roundoff errors occurred
        preventing full convergence. In the latter case, try solving
        again using the new solution as an initial guess.

        Solver:                 gmres_condnum
        number of iterations:   11

        Actual residual =  3.3361e-05   Recursive residual =  3.4301e-06

        Calculated Norms                                Requested Norm
        --------------------------------------------    --------------

        ||r||_2 / ||b||_2:              7.213690e-02    1.000000e-02

        ***************************************************************
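
As a rough check on these numbers, the short Python calculation below puts the recursive residual on the same relative scale as the reported norm. It rests on assumptions that the output itself does not confirm: that "Actual residual" and "Recursive residual" are unscaled 2-norms, and that both are compared against the same ||b||_2. The variable names are mine.

```python
# Values copied from the warning output above.
actual_residual = 3.3361e-05
recursive_residual = 3.4301e-06
true_relative = 7.213690e-02   # reported ||r||_2 / ||b||_2
requested = 1.000000e-02       # requested norm

# Assumption: actual_residual is the unscaled ||r||_2, so ||b||_2 follows
# from the reported ratio.
norm_b = actual_residual / true_relative
recursive_relative = recursive_residual / norm_b

print(f"implied ||b||_2             = {norm_b:.4e}")
print(f"recursive ||r||_2 / ||b||_2 = {recursive_relative:.4e}")
print("recursive residual meets tolerance:", recursive_relative <= requested)
print("true residual meets tolerance:     ", true_relative <= requested)
print("difference exceeds tolerance:      ",
      abs(true_relative - recursive_relative) > requested)
```

Under those assumptions the recursive residual sits just below the requested 1e-2 while the true residual is roughly ten times larger, which matches the situation the warning describes.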


In some cases we are able to restart GMRES from the solution we get out,
and eventually we obtain a solution that has converged according to the true
residual. In other cases, however, we seem to get stuck, repeatedly hitting
this error.
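
For reference, the restart strategy can be sketched in Python as follows. This is only an outline under stated assumptions: solve and true_residual are hypothetical callables standing in for one AztecOO GMRES solve from a given initial guess and for an explicit evaluation of the relative true residual, and the min_gain threshold used to declare stagnation is an arbitrary choice, not something taken from Trilinos.

```python
def restart_until_converged(solve, true_residual, x0, tol,
                            max_restarts=10, min_gain=0.9):
    """Repeatedly re-solve from the latest iterate.

    solve(x)         -> new iterate, starting from initial guess x (hypothetical)
    true_residual(x) -> relative true residual of x (hypothetical)
    Stops when the true residual meets tol, when a restart fails to reduce
    the true residual below min_gain times its previous value, or after
    max_restarts solves.
    """
    x, res = x0, true_residual(x0)
    history = [res]
    for _ in range(max_restarts):
        if res <= tol:
            return x, history, "converged"
        x_new = solve(x)
        res_new = true_residual(x_new)
        history.append(res_new)
        stalled = res_new > min_gain * res
        if res_new < res:          # keep the best iterate seen so far
            x, res = x_new, res_new
        if stalled:
            return x, history, "stagnated"
    status = "converged" if res <= tol else "max_restarts"
    return x, history, status

# Toy usage: a scalar "solver" that halves the error on every call.
x, history, status = restart_until_converged(
    solve=lambda x: x + 0.5 * (1.0 - x),
    true_residual=lambda x: abs(1.0 - x),
    x0=0.0,
    tol=1.0e-2,
)
print(status, len(history) - 1, "restarts")
```

Returning the residual history makes it easy to tell the two cases apart: a slowly decreasing sequence versus one that stops improving.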

Do you have any advice in this situation?

Roger, when you say that by default NOX doesn't worry about this, are you
basically saying that NOX would treat this situation as an acceptable
convergence?

Thanks,

Tom

On Mon, Dec 8, 2014 at 8:43 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:

> As I recall, this test is invoked when the recursive residual has reached
> the threshold specified in params[AZ_tol].
>
> Then the true residual is computed. If the absolute value of the
> difference between the recursive and true residual is greater than
> params[AZ_tol], this error condition is enabled.
>
> You could try increasing your value for AZ_tol.  This would decrease the
> chance of seeing the error, by not forcing iterations to continue beyond
> what is reachable and because the difference test would be easier to
> satisfy.
>
> There is some mathematical reasoning behind this test.  Do you know that
> continued iterations improve the overall solution process?
>
> You could, for experimental purposes, modify az_util.c:1413 and make the
> logic test trivially false.  Then the solver would continue and you could
> see the impact of continued iterations.
>
> There is no “official” way to turn off this test.
>
> Mike
>
> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
> Date: Monday, December 8, 2014 at 2:12 PM
> To: Michael A Heroux <mheroux at csbsju.edu>
> Cc: Roger Pawlowski <rppawlo at sandia.gov>,
> trilinos-users <trilinos-users at software.sandia.gov>
> Subject: Re: [Trilinos-Users] Matrix Free operator, and loss of accuracy
> error
>
> Hi Mike,
>
> The silence I don't mind (can always check return status), but presumably
> Aztec will still detect this error and abort early (i.e. before requested
> tolerance has been achieved), albeit silently?
> Is there any way I can ask Aztec to ignore this error and carry on
> iterating, as I got the impression NOX was capable of?
>
> Thanks,
>
> Tom
>
> On Mon, Dec 8, 2014 at 7:55 PM, Heroux, Mike <MHeroux at csbsju.edu> wrote:
> Tom,
>
> If AZ_output is set to AZ_none, no output, including this warning message,
> will be generated.  This might be too much “silence” but it will work.
>
> Mike
>
> From: Tom Goffrey <t.goffrey at exeter.ac.uk>
> Date: Monday, December 8, 2014 at 1:38 PM
> To: Roger Pawlowski <rppawlo at sandia.gov>,
> trilinos-users <trilinos-users at software.sandia.gov>
> Subject: Re: [Trilinos-Users] [EXTERNAL] Matrix Free operator, and loss of
> accuracy error
>
> Hi Roger,
>
> Thanks for the input. I think perhaps my initial question was misleading.
>
> We rely on Trilinos to solve the linear system only, the non-linear
> portion is handled by our own code. We use NOX to provide the operator
> (interfaced with our own code for residual evaluations etc), but don't use
> NOX as the non-linear solver itself.
>
> I've experimented quite a lot with scaling, and whilst I've found that the
> frequency of this error does depend on what I use, I've never been able to
> eradicate it. Do you know of any way I can convince Aztec to not perform
> this check, as you suggest the NOX solvers do?
>
> Many thanks,
>
> Tom
>
>
>
> On Mon, Dec 8, 2014 at 6:30 PM, Roger Pawlowski <rppawlo at sandia.gov> wrote:
> Hi Tom,
>
> This is a known issue.  JFNK uses a directional derivative for the
> Jacobian-vector product and Aztec is detecting that loss of accuracy from
> the finite differencing.  It could be indicative of a poorly scaled system
> of equations and/or linear solve tolerances in Aztec that are too tight.
> You can switch to a higher order derivative in the JFNK algorithm or loosen
> the linear solve tolerances.  If it is poorly scaled, you can enable row
> sum scaling of the linear system in the NOX group for the linear solves.
> We tend to ignore the warning as it was an overly stringent check that, if
> memory serves, assumed you had an analytic Jacobian.  By default, NOX with
> JFNK should ignore this warning.  Please send your output so I can check to
> make sure that is happening in your case. Note that if you switch to Belos
> instead of Aztec (preferred), this warning will not occur.
>
> Best,
> Roger
>
>
>
> On 12/08/2014 12:04 PM, Tom Goffrey wrote:
> Hello,
>
> We currently utilise AztecOO GMRES (with condition number estimate) to
> solve the linear system resulting from the time implicit solution of
> multi-dimensional hydrodynamics. However we have run into problems when
> adapting our code to apply a Jacobian free approach.
>
> We replaced our previous explicit Jacobian operator with the matrix free
> operator from the NOX package.
> (http://trilinos.org/docs/dev/packages/nox/doc/html/classNOX_1_1Epetra_1_1MatrixFree.html)
>
> However we find a high proportion of our GMRES solves abort prematurely
> due to an error relating to an apparent loss of accuracy:
>
>         Warning: recursive residual indicates convergence
>         though the true residual is too large.
>
>         Sometimes this occurs when storage is overwritten (e.g. the
>         solution vector was not dimensioned large enough to hold
>         external variables). Other times, this is due to roundoff. In
>         this case, the solution has either converged to the accuracy
>         of the machine or intermediate roundoff errors occurred
>         preventing full convergence. In the latter case, try solving
>         again using the new solution as an initial guess.
>
> I didn't manage to find much within Trilinos documentation on this, but
> from looking through the source code it appears this is triggered by a
> significant difference between the recursive residual and the true
> residual, where the true residual is calculated for the solution vector,
> but the recursive residual is calculated by using an incremental solution
> vector.
>
> Currently we believe the high frequency of this error we see for the JFNK
> approach is because we have in fact used a non-linear operator (in this
> case our non-linear residual function) to approximate the linear
> Jacobian-vector product. As such it is not clear to us that this sort of
> error checking makes sense in this situation, or perhaps it should be
> interpreted as a sign the Jacobian free approximation is becoming worse?
>
> Presumably I can control this error checking by allowing the two residuals
> to vary by a greater amount, but I was wondering if others have had similar
> problems, and had any advice on the issue?
>
> Of course, any alternative explanation of the high frequency would be
> equally appreciated!
>
> Thanks,
>
> Tom Goffrey
>
>
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> https://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
>
>