###############################################################################
#                                                                             #
# Trilinos Release 12.8 Release Notes                                         #
#                                                                             #
###############################################################################

Overview:

The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.

Packages:

The Trilinos 12.8 general release contains 58 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Domi, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, MueLu, NOX, Optika, OptiPack,
Pamgen, Phalanx, Pike, Piro, Pliris, PyTrilinos, ROL, RTOp, Rythmos, Sacado,
SEACAS, Shards, ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos,
ThreadPool, Thyra, Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra,
Zoltan, Zoltan2.

(* denotes package is being released externally as a part of Trilinos for the
first time.)

Domi

  - Added replicated boundaries
    - A replicated boundary exists only on a periodic domain.  It is
      the convention that the two end points of an axis are the same
      point: for example, a left end coordinate that represents 0
      degrees and a right end coordinate that represents 360 degrees.
      Domi now supports periodic domains with or without replicated
      boundaries, and the choice affects communication.
    - Added additional tests for periodic domains

  - Enhancements
    - New MDVector constructor that takes a parent MDVector and an array
      of Slices
    - MDMap support for axis maps
    - MDMap getMDComm() method
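
  The replicated-boundary convention described above can be illustrated
  with a small sketch in plain C++.  It does not use Domi's API; the
  function axisCoordinate and its parameters are hypothetical and exist
  only for this illustration.  It assumes n grid points on a periodic
  axis that covers 0 to 360 degrees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Domi: coordinate (in degrees) of grid
// point i on a periodic axis of n points covering one full period.
double axisCoordinate(std::size_t i, std::size_t n, bool replicatedBoundary)
{
  assert(n >= 2 && i < n);
  const double period = 360.0;
  // With a replicated boundary, point n-1 is the same physical point as
  // point 0 (360 degrees == 0 degrees), so the period is split into n-1
  // intervals.  Without one, the period is split into n intervals and the
  // last point sits one interval short of 360 degrees.
  const std::size_t intervals = replicatedBoundary ? n - 1 : n;
  return period * static_cast<double>(i) / static_cast<double>(intervals);
}
```

  With n = 5, the last point lies at 360 degrees when the boundary is
  replicated and at 288 degrees when it is not.  In the replicated case
  the two end points hold the same value, which is presumably why the
  choice affects communication.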

PyTrilinos

  - General
    - Improved formatting in example scripts

  - Domi
    - Update MDMap constructor for replicated boundaries
    - Fixed ETI bugs

  - NOX/LOCA
    - Fixed memory leak by updating NOX typemaps

  - Tpetra
    - Fix difficult-to-wrap Map class by using %inline

Tpetra

  - Stop creating Node instances explicitly!

    Hi users!  Please don't create Node instances explicitly any more.
    Tpetra::Map creates one for you, if you really need one.  You really
    don't need Node instances: Map's constructors and nonmember
    "constructors" don't need them any more, nor do Tpetra's Matrix Market
    readers.

    Creating Node instances explicitly causes issues with Kokkos
    initialization.  Node will go away eventually, in favor of Kokkos
    execution spaces and memory spaces.

  - Lots of bug fixes, especially for CUDA

  - Computing offsets in CrsGraph and CrsMatrix is now thread parallel

    CrsGraph's and CrsMatrix's fillComplete method computes row offsets,
    if they have not yet been computed.  This is now thread parallel.  It
    uses Kokkos::parallel_scan.
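
    Computing row offsets from per-row entry counts is a prefix sum,
    which is why a scan applies.  The following is a minimal sequential
    sketch in plain C++, not Tpetra's implementation; the function name
    computeRowOffsets is hypothetical, and std::partial_sum stands in
    for the parallel scan.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential stand-in for a parallel scan.  Given the number of entries in
// each row, return CSR-style row offsets of length numRows + 1, where
// offsets[i] is the index of the first entry of row i and offsets[numRows]
// is the total number of entries.
std::vector<std::size_t>
computeRowOffsets(const std::vector<std::size_t>& entriesPerRow)
{
  std::vector<std::size_t> offsets(entriesPerRow.size() + 1, 0);
  // An inclusive scan written one slot to the right gives the exclusive
  // scan of the counts, with the grand total in the last slot.
  std::partial_sum(entriesPerRow.begin(), entriesPerRow.end(),
                   offsets.begin() + 1);
  assert(offsets.front() == 0);
  return offsets;
}
```

    For the counts {2, 0, 3, 1} this returns the offsets {0, 2, 2, 5, 6}.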

  - More BlockCrsMatrix kernels are thread parallel

  - Interface changes to KokkosSparse::CrsMatrix (the "local" matrix)

    - The replaceValues and sumIntoValues methods now take "is_sorted" and
      "force_atomic" arguments.  These methods now use binary search
      (falling back to linear search for short rows) for the sorted case.

    - Row views in KokkosSparse::CrsMatrix are no longer templated.  They
      now use the ordinal type, rather than the offset type, for indexing.
      This suffices as long as no row holds more entries (counting
      duplicates) than ordinal_type can represent.  This has the beneficial
      side effect of
      reducing the number of local sparse matrix-vector multiply kernel
      instantiations.
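
    The search strategy mentioned for replaceValues and sumIntoValues
    can be sketched as follows.  This is plain C++ under stated
    assumptions, not the KokkosSparse code: the names findInRow and
    sumIntoRow, the linearThreshold parameter, and its default of 16
    are all hypothetical, and the "force_atomic" behavior is only noted
    in a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: locate column index `col` among the column indices
// of one row.  Returns the position within the row, or rowLength if the
// column is not present.  Binary search is used only for sorted rows longer
// than linearThreshold (an assumed cutoff); otherwise the row is scanned.
std::size_t findInRow(const int* colInds, std::size_t rowLength, int col,
                      bool isSorted, std::size_t linearThreshold = 16)
{
  assert(colInds != nullptr || rowLength == 0);
  if (isSorted && rowLength > linearThreshold) {
    const int* end = colInds + rowLength;
    const int* it = std::lower_bound(colInds, end, col);
    return (it != end && *it == col)
             ? static_cast<std::size_t>(it - colInds)
             : rowLength;
  }
  for (std::size_t k = 0; k < rowLength; ++k) {
    if (colInds[k] == col) return k;
  }
  return rowLength;
}

// Hypothetical helper: add vals[j] into the row entry whose column index is
// cols[j].  Input columns that are absent from the row are skipped.
// Returns the number of row entries that were modified.
std::size_t sumIntoRow(double* rowVals, const int* rowCols,
                       std::size_t rowLength, const int* cols,
                       const double* vals, std::size_t numInput,
                       bool isSorted)
{
  std::size_t numModified = 0;
  for (std::size_t j = 0; j < numInput; ++j) {
    const std::size_t k = findInRow(rowCols, rowLength, cols[j], isSorted);
    if (k != rowLength) {
      rowVals[k] += vals[j];  // a thread-safe variant would add atomically
      ++numModified;
    }
  }
  return numModified;
}
```

    Passing isSorted = false always takes the linear scan, which is
    the only safe choice when the row's column indices are unsorted.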

  - Got rid of LittleBlock and LittleVector (for Block* classes)

    Instead, use the little_block_type, const_little_block_type,
    little_vec_type, and const_little_vec_type typedefs in BlockCrsMatrix
    and other related classes.  Underlying data layout has NOT changed
    (yet), but constructors HAVE changed.  This is technically a
    non-backwards-compatible interface change, but all these classes are
    in an Experimental namespace anyway.

  - Got rid of KokkosClassic::DefaultArithmetic

    Stokhos was using this, so we had left it in place in previous
    releases for backwards compatibility.  Now that no other packages
    depend on it, we have gotten rid of it for good.  Its functionality
    has been replaced by various functions in TpetraKernels.

    The original idea behind DefaultArithmetic, as suggested in the name,
    was that users could swap out this "default" implementation of
    multivector operations with their own implementations.  This is
    generally less useful than swapping out the implementation of sparse
    matrix kernels (like sparse matrix-vector multiply or sparse
    triangular solve).  As a result, since at least January 2010, Tpetra
    has never had an implementation of multivector operations other than
    DefaultArithmetic.

ROL

  - NEW FEATURES
    - Methods
      - New phi-divergence capabilities for distributionally-robust
        optimization.
      - NonlinearLeastSquaresObjective functionality enables the solution of
        nonlinear equations through the EqualityConstraint object.

    - Infrastructure
      - Composite bound constraint (ROL_BoundConstraint_Partitioned).
      - Composite equality constraint (ROL_EqualityConstraint_Partitioned).
      - Merit function for interior point methods.
      - Adapter for Teuchos::SerialDenseVector.
      - L1, Lp, Linf norms for interior point methods.
      - Allow user-defined bracketing objects.
      - Line searches can take user-defined scalar minimizers.
      - Ability to supply ScalarMinimizationLineSearch with custom
        ScalarFunction.
      - New application development and interface tools for PDE-constrained
        optimization in PDE-OPT.
      - New PDE-OPT examples: stochastic Stefan-Boltzmann, stochastic
        advection-diffusion, etc.
      - Adaptive sparse grid capabilities with TriKota.

Zoltan

  - Improved robustness of RCB partitioner for problems where many objects have
    weight = 0 (e.g., PIC codes).  Convergence is faster and the stopping 
    criteria are more robust.

  - Fixed a bug that occurred when RETURN_LIST=PARTS and (Num_GID > 1 or
    Num_LID > 1); GIDs and LIDs are now copied correctly into return lists.

  - Fixed a bug related to struct padding in the siMPI serial MPI interface.


Zoltan2 

  - Graph/Matrix ordering
    - Scotch now can be used for graph/matrix ordering.
    - The ordering interface Zoltan2::OrderingSolution has been updated
      to allow users to access separator info, if it is available.
    - The Zoltan2::OrderingSolution method getPermutation() has been
      renamed to getPermutationView().

  - Partitioning Metrics
    - Partitioning metrics have been moved out of the PartitioningProblem.
      They are now accessed through a separate class:  
      Zoltan2::EvaluatePartition.  
    - EvaluatePartition accepts as input a
      Zoltan2::Adapter and, optionally, a Zoltan2::PartitioningSolution.
      Thus, it can be used before or after partitioning, and before or
      after migration.
    - Imbalance and graph metrics are available.
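
    As an illustration of what an imbalance metric measures, the sketch
    below uses one common definition: the largest part weight divided
    by the average part weight.  It is plain C++ and does not use the
    Zoltan2::EvaluatePartition interface; the function partImbalance and
    its arguments are hypothetical, and the exact definition used by
    Zoltan2 should be checked in its documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: imbalance of a partition, defined here as
// (maximum part weight) / (average part weight).  partOf[i] is the part
// assigned to object i and weight[i] is that object's weight.
double partImbalance(const std::vector<int>& partOf,
                     const std::vector<double>& weight,
                     std::size_t numParts)
{
  assert(partOf.size() == weight.size() && numParts > 0);
  std::vector<double> partWeight(numParts, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < partOf.size(); ++i) {
    assert(partOf[i] >= 0 && static_cast<std::size_t>(partOf[i]) < numParts);
    partWeight[static_cast<std::size_t>(partOf[i])] += weight[i];
    total += weight[i];
  }
  // With no weight at all, report perfect balance (a choice made for this
  // sketch, not taken from Zoltan2).
  if (total == 0.0) return 1.0;
  const double average = total / static_cast<double>(numParts);
  return *std::max_element(partWeight.begin(), partWeight.end()) / average;
}
```

    Under this definition a value of 1.0 means perfectly balanced
    parts.  Because the function needs only an assignment of objects to
    parts, it can be evaluated on the current distribution before
    partitioning or on a computed solution afterwards.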

  - Task placement
    - A new PartitionMapping class maps parts to processors.  
    - The MachineRepresentation has been updated, and specializations using
      Cray RCA and IBM TopoMgr are provided.
    - Geometric task placement using Multijagged partitioning better handles
      cases where the machine's network dimension is greater than the 
      dimension of the coordinates.

  - Multijagged partitioning
    - Zoltan2's Multijagged partitioner can now partition with respect to
      the longest coordinate dimension, or in specified x-y-z order.
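
    Choosing the longest coordinate dimension amounts to comparing the
    extent (maximum minus minimum) of the coordinates along each axis.
    The following is a minimal sketch in plain C++, not the Multijagged
    implementation; the function longestDimension and the coords[d][i]
    layout are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: coords[d][i] is coordinate d of point i.  Returns
// the dimension whose coordinates span the largest extent (max - min).
// Ties are resolved in favor of the lowest dimension index.
std::size_t longestDimension(const std::vector<std::vector<double>>& coords)
{
  assert(!coords.empty());
  std::size_t best = 0;
  double bestExtent = -1.0;
  for (std::size_t d = 0; d < coords.size(); ++d) {
    assert(!coords[d].empty());
    const auto mm = std::minmax_element(coords[d].begin(), coords[d].end());
    const double extent = *mm.second - *mm.first;
    if (extent > bestExtent) {
      bestExtent = extent;
      best = d;
    }
  }
  return best;
}
```

    Cutting along the longest dimension tends to produce parts that
    are closer to cubes than to slabs, whereas a fixed x-y-z order gives
    the caller explicit control over the cut sequence.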

  - TPLs
    - Conversions between the index types in TPLs (ParMETIS, Scotch, Zoltan)
      are handled more robustly through the TPL_Traits class.
    - Interfaces to ParMETIS' AdaptiveRepart and RefineKway algorithms were
      added.
    - Bugs in the Zoltan interface are fixed.
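
    One way to make conversions between index types robust is to check
    that the value survives the cast before it is used.  The sketch
    below shows that general technique in plain C++.  It is not the
    TPL_Traits implementation, and the name checkedIndexCast is
    hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert an integer index from type From to type To,
// throwing std::out_of_range if the value does not fit.  The check is a
// round trip through the target type plus a comparison of signs.
template <typename To, typename From>
To checkedIndexCast(From value)
{
  static_assert(std::numeric_limits<To>::is_integer &&
                std::numeric_limits<From>::is_integer,
                "checkedIndexCast requires integer types");
  const To result = static_cast<To>(value);
  const bool roundTrips = (static_cast<From>(result) == value);
  const bool sameSign = ((result < To(0)) == (value < From(0)));
  if (!roundTrips || !sameSign) {
    throw std::out_of_range("index does not fit in the target index type");
  }
  return result;
}
```

    For example, checkedIndexCast<std::int32_t>(std::int64_t(42))
    returns 42, while a 64-bit value of 2^40 throws because it cannot
    be represented in 32 bits.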