[Trilinos-Users] Belos GMRES results changing with MPI rank count
Alicia Klinvex
aklinvex at gmail.com
Wed Jun 8 11:28:23 EDT 2016
Hello Joseph,
I recommend you try calling the MatrixMarket writer to ensure that your
operators are in fact the same:
https://trilinos.org/docs/dev/packages/tpetra/doc/html/classTpetra_1_1MatrixMarket_1_1Writer.html#a6fcc0d5884f6b7ec0016dc628b170e86
I don't know if you've ever used MatrixMarket before, but it's a filetype
that you can easily read into Matlab for debugging (assuming your matrix is
small enough/computer is big enough). I would start by calling that writer
from 1, 2, and 4 MPI processes and using Matlab to determine whether those
matrices are actually the same. I would also use Matlab to
estimate/compute the condition number. I assume you're aware that you can
get small differences in your mat-vec with different numbers of MPI
processes, just because the order of operations is different, and those
differences can be magnified like any other error.
Best wishes,
Alicia
On Wed, Jun 8, 2016 at 9:17 AM, Rutherford, Joseph M <jmruther at illinois.edu>
wrote:
> All,
>
>
>
> My custom operator expresses a diagonalized dense operator as a cascade of
> sparse operators. If A,B,C are Tpetra::Operators, I is an optional
> identity Operator effecting Tpetra::Import, and x,y are MultiVectors, then
> my custom operator is executing y=(A*B+C)*I*x. My system is
>
> 1.) Vectors x,y are 1:1 distributed in a non-uniform map.
>
> 2.) apply() consistently computes the same mat-vec with MPI rank
> counts 1 and 2.
>
> 3.) Belos GMRES converges to validated answers with 1 MPI rank.
>
> 4.) Belos GMRES converges to different answers with MPI rank counts 1,
> 2, and 4.
>
>
>
> Points 1 and 2 suggest that perhaps the operator itself is working
> correctly. I have no idea what I might be doing incorrectly with Belos.
> The only “GMRES” solver parameters I’m defining are “Convergence Tolerance”
> and “Verbosity”. For verbosity=127 (all enums summed together), I get the
> following output:
>
>
>
> <console>
>
> Belos::StatusTestGeneralOutput: Passed
>
> (Num calls,Mod test,State test): (153, 1, Passed)
>
> Passed.......OR Combination ->
>
> OK...........Number of Iterations = 152 < 1000
>
> Converged....(2-Norm Res Vec) / (2-Norm Prec Res0)
>
> residual [ 0 ] = 8.47249e-07 < 1e-06
>
>
>
> Passed.......OR Combination ->
>
> OK...........Number of Iterations = 152 < 1000
>
> Converged....(2-Norm Res Vec) / (2-Norm Prec Res0)
>
> residual [ 0 ] = 8.47249e-07 < 1e-06
>
>
>
>
>
>
> =========================================================================================================================
>
>
>
> TimeMonitor results over 4
> processors
>
>
>
> Timer Name MinOverProcs
> MeanOverProcs MaxOverProcs MeanOverCallCounts
>
>
> -------------------------------------------------------------------------------------------------------------------------
>
> Belos: Operation Op*x 9.564 (154) 9.717
> (154) 9.883 (154) 0.0631 (154)
>
> Belos: Operation Prec*x 0 (0) 0
> (0) 0 (0) 0 (0)
>
> Belos: Orthogonalization 0.1221 (153) 0.2889
> (153) 0.4433 (153) 0.001888 (153)
>
> Belos: PseudoBlockGmresSolMgr total solve time 9.949 (1) 9.949
> (1) 9.95 (1) 9.949 (1)
>
>
> =========================================================================================================================
>
> </console>
>
>
>
> Can anyone please suggest how to better diagnose the problem?
>
>
>
> Joe
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at trilinos.org
> https://trilinos.org/mailman/listinfo/trilinos-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://trilinos.org/pipermail/trilinos-users/attachments/20160608/2bffea4b/attachment.html>
More information about the Trilinos-Users
mailing list