[Trilinos-Users] ML + Zoltan

Lucas Wilcox lucasw at ices.utexas.edu
Wed Oct 21 09:34:44 MDT 2009


Hi all,

We are trying to use ML to precondition large Stokes solves with a
Wathen/Sylvester-style Schur complement preconditioner.  This means
that we use AMG V-cycles for a symmetric positive definite vector
system.  We are typically running on >1000 cores of TACC's Ranger and
would like to use ML with Zoltan repartitioning.  Unfortunately, we are
experiencing what we think is a bug in ML+Zoltan or in the way we use it.
We sent an email to the ML-Users mailing list but have not received a
response.  I hope it is OK to cross-post this issue, as it seems that the
trilinos-users list has more traffic.

We have boiled our problem down to a simple single-processor test that
produces a valgrind invalid read at the same place where the large-scale
run sometimes segfaults (according to its backtrace).  With Trilinos
10.0 (the problem also appears in 9.0) we have modified the example

   trilinos-10.0-Source/packages/ml/test/Zoltan/cxx_main.cpp

with our ML parameters

  MLList.set ("PDE equations", NumPDEEqns);
  MLList.set ("cycle applications", 1);
  MLList.set ("ML output", 10);
  MLList.set ("max levels", 10);
  MLList.set ("increasing or decreasing", "increasing");
  MLList.set ("aggregation: type", "Uncoupled");
  MLList.set ("aggregation: threshold", 0.01);
  MLList.set ("smoother: type", "Chebyshev");
  MLList.set ("smoother: sweeps", 3);
  MLList.set ("smoother: pre or post", "both");
  MLList.set ("coarse: type", "Amesos-KLU");
  MLList.set ("x-coordinates", x_coord);
  MLList.set ("y-coordinates", y_coord);
  MLList.set ("z-coordinates", z_coord);
  MLList.set ("repartition: enable", 1);
  MLList.set ("repartition: max min ratio", 1.3);
  MLList.set ("repartition: min per proc", 500);
  MLList.set ("repartition: partitioner", "Zoltan");
  MLList.set ("repartition: Zoltan dimensions", 2);

and valgrind reports the following invalid read:

  ==20427== Invalid read of size 8
  ==20427==    at 0x5C0E07: CSR_matvec (ml_mat_formats.c:910)
  ==20427==    by 0x5AD96E: ML_Operator_Apply (ml_operator.c:607)
  ==20427==    by 0x561E26: ML_repartition_Acoarse (ml_aggregate.c:2315)
  ==20427==    by 0x546BD5: ML_Gen_MultiLevelHierarchy (ml_agg_genP.c:2643)
  ==20427==    by 0x5462B8: ML_Gen_MultiLevelHierarchy_UsingAggregation (ml_agg_genP.c:2444)
  ==20427==    by 0x4DE071: ML_Epetra::MultiLevelPreconditioner::ComputePreconditioner(bool) (ml_MultiLevelPreconditioner.cpp:1474)
  ==20427==    by 0x4E4171: ML_Epetra::MultiLevelPreconditioner::MultiLevelPreconditioner(Epetra_RowMatrix const&, Teuchos::ParameterList const&, bool) (ml_MultiLevelPreconditioner.cpp:351)
  ==20427==    by 0x4C6EE7: main (cxx_main.cpp:138)
  ==20427==  Address 0x11391978 is 0 bytes after a block of size 9,912 alloc'd
  ==20427==    at 0x4C22FAB: malloc (vg_replace_malloc.c:207)
  ==20427==    by 0x544D03: ML_Project_Coordinates (ml_agg_genP.c:2116)
  ==20427==    by 0x546B0C: ML_Gen_MultiLevelHierarchy (ml_agg_genP.c:2629)
  ==20427==    by 0x5462B8: ML_Gen_MultiLevelHierarchy_UsingAggregation (ml_agg_genP.c:2444)
  ==20427==    by 0x4DE071: ML_Epetra::MultiLevelPreconditioner::ComputePreconditioner(bool) (ml_MultiLevelPreconditioner.cpp:1474)
  ==20427==    by 0x4E4171: ML_Epetra::MultiLevelPreconditioner::MultiLevelPreconditioner(Epetra_RowMatrix const&, Teuchos::ParameterList const&, bool) (ml_MultiLevelPreconditioner.cpp:351)
  ==20427==    by 0x4C6EE7: main (cxx_main.cpp:138)
  ==20427==
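
To make the report above concrete: the invalid 8-byte read sits 0 bytes
past the end of a 9,912-byte block (1,239 entries, assuming 8-byte
doubles) that was allocated in ML_Project_Coordinates and is read in
CSR_matvec.  That pattern is what one would see if a matrix-vector
product touches a column index at or beyond the length of its input
vector.  We do not know whether that is the actual cause inside ML.  The
following is only a minimal sketch of that class of error, with a
hypothetical CsrMatrix type and checked_csr_matvec function that are not
ML's data structures or API:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal CSR container for illustration; not ML's internal format.
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;  // size = number of rows + 1
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// Computes y = A * x, but throws instead of reading past the end of x
// when a column index is not smaller than x.size().
std::vector<double> checked_csr_matvec(const CsrMatrix& A,
                                       const std::vector<double>& x) {
  assert(!A.row_ptr.empty());
  assert(A.col_idx.size() == A.values.size());

  const std::size_t n_rows = A.row_ptr.size() - 1;
  std::vector<double> y(n_rows, 0.0);

  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const std::size_t j = A.col_idx[k];
      if (j >= x.size()) {
        // An unchecked x[j] here is the kind of access valgrind flags
        // as an invalid read just past the end of a heap block.
        throw std::out_of_range("row " + std::to_string(i) + ": column " +
                                std::to_string(j) + " >= vector length " +
                                std::to_string(x.size()));
      }
      y[i] += A.values[k] * x[j];
    }
  }
  return y;
}
```

Calling checked_csr_matvec with a vector that is one entry shorter than
the largest column index requires raises std::out_of_range, where an
unchecked loop would silently read past the allocation.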

We have attached the full source of the example that fails.  Is there
anything we are doing wrong in setting the ML parameters?  Any guidance
would be helpful.  We would be happy to use different parameters; we are
just trying to reduce the time ML takes on large core counts.
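
For reference, this is how we read the two repartitioning thresholds we
set: repartition when the ratio of the largest to the smallest number of
rows per processor exceeds "repartition: max min ratio", or when the
smallest number of rows per processor falls below "repartition: min per
proc".  That reading is our assumption, and the should_repartition
helper below is hypothetical, not ML code.  A minimal sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ML: decides whether a level would be
// repartitioned under our reading of the two thresholds.
bool should_repartition(const std::vector<std::size_t>& rows_per_proc,
                        double max_min_ratio,      // e.g. 1.3
                        std::size_t min_per_proc)  // e.g. 500
{
  if (rows_per_proc.empty()) {
    return false;  // nothing to balance
  }

  const std::size_t min_rows =
      *std::min_element(rows_per_proc.begin(), rows_per_proc.end());
  const std::size_t max_rows =
      *std::max_element(rows_per_proc.begin(), rows_per_proc.end());

  // Too few rows on some processor.
  if (min_rows < min_per_proc) {
    return true;
  }
  // Avoid dividing by zero when a processor owns no rows.
  if (min_rows == 0) {
    return max_rows > 0;
  }
  // Load imbalance larger than the requested ratio.
  return static_cast<double>(max_rows) / static_cast<double>(min_rows) >
         max_min_ratio;
}
```

With our settings (1.3 and 500), a level with 1400 and 1000 rows on two
processors would be repartitioned under this reading, while one with
1200 and 1000 rows would not.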

Thanks,
Lucas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cxx_main.cpp
Type: text/x-c++src
Size: 4089 bytes
Desc: not available
Url : https://software.sandia.gov/pipermail/trilinos-users/attachments/20091021/ae7b4233/attachment.bin 