[Trilinos-Users] [EXTERNAL] Re: Updating values in a Tpetra CRS matrix

Hoemmen, Mark mhoemme at sandia.gov
Sat Sep 3 15:46:24 EDT 2016


On 9/3/16, 11:40 AM, "Eric Marttila" <eric.marttila at thermoanalytics.com> wrote:
>On Friday, September 02, 2016 03:52:25 pm Hoemmen, Mark wrote:
>> Best practice for your use case, in both Epetra and Tpetra, is the
>> following: 1. Create the CrsGraph first
>> 2. Create the CrsMatrix using the constructor that takes a CrsGraph
>> 3. Do your Newton solves; use sumInto*Values / replace*Values, not
>> insert*Values, to change the matrix’s values
>Thank you Mark, that is very helpful.  In my case I am not creating a 
>CrsGraph, but instead am creating a Tpetra Map and then creating the 
>CrsMatrix from the Map.  So my steps are:
>
>1. Create Map
>2. Create CrsMatrix from Map
>3. Insert values into CrsMatrix
>4. Call fillComplete
>5. Perform Newton solves; for each solve the code will:
>5.0  Call resumeFill
>5.1  Update values in CrsMatrix using sumInto*Values / replace*Values
>         (each process only updates rows that it owns)
>5.2  Call fillComplete
>5.3  Call Belos solve
>
>Will the fill complete in my step 5.2 still be a cheap call as you 
>describe above?  Or do I have to create the CrsGraph?

Only the first fillComplete call (Step 4) will be expensive.  Subsequent fillComplete calls with the same matrix (Step 5.2) will not be expensive.  Nevertheless, best practice is still to create the graph first.  This makes expensive things explicit, which is a good idea in general.  Also, since we tell our applications to create the graph first, this is the case we optimize the most.  This is true for both Epetra and Tpetra.
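Concretely, the graph-first pattern looks something like the sketch below.  (This is illustrative, not drop-in code: the type parameters are defaulted, the stencil width and iteration count are made up, and the assembly and Belos calls are placeholders for your application code.)

```cpp
#include <Tpetra_CrsGraph.hpp>
#include <Tpetra_CrsMatrix.hpp>
#include <Teuchos_RCP.hpp>

// Sketch of the graph-first pattern; the assembly and solver calls
// are hypothetical placeholders for application code.
void newtonLoop (const Teuchos::RCP<const Tpetra::Map<> >& rowMap)
{
  using Teuchos::RCP;
  using Teuchos::rcp;
  typedef Tpetra::CrsGraph<> graph_type;
  typedef Tpetra::CrsMatrix<> matrix_type;

  // 1. Build the graph once, with the Jacobian's structure.
  const size_t maxEntriesPerRow = 5; // e.g., a 5-point stencil (assumed)
  RCP<graph_type> graph = rcp (new graph_type (rowMap, maxEntriesPerRow));
  // ... insertGlobalIndices on each owned row ...
  graph->fillComplete (); // structure is now fixed

  // 2. Create the matrix from the graph; only values may change now.
  RCP<matrix_type> A = rcp (new matrix_type (graph));
  A->fillComplete (); // the one expensive fillComplete

  for (int step = 0; step < 10; ++step) { // Newton iterations
    A->resumeFill ();
    // ... sumIntoLocalValues / replaceLocalValues on owned rows ...
    A->fillComplete (); // cheap: the structure did not change
    // ... hand A to Belos and solve ...
  }
}
```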

CrsMatrix::insert*Values (dynamic structure changes to CrsMatrix) significantly complicates CrsMatrix’s implementation.  (That one sentence encapsulates so much frustration over the past three years.)  At some point, I would like to get rid of it, and make the optimized use case the _only_ use case.  I would provide a different, more dynamic data structure for sparse matrices with frequent structure changes.  However, for now, your use case will continue to work.

Note that {replace, sumInto}LocalValues are faster than {replace, sumInto}GlobalValues, because they avoid the overhead of converting global column indices to local indices.  In general, each global-to-local conversion requires a hash table lookup.  These lookups are thread safe, but they still cost something.  Thus, if you _can_ use local column indices, you should prefer them over global column indices.
 
>Regarding preconditioners, I am interested in using MueLu.  My 
>understanding is that MueLu does use thread parallelization - can you 
>confirm whether or not this is true?

MueLu _apply_ currently uses thread parallelization.  More specifically, if you use MueLu with Tpetra data structures, then MueLu will use Tpetra, Ifpack2, Amesos2, Zoltan2, etc. for sparse matrix-vector multiply, vector operations, smoothers, coarse-grid solvers, and load balancing.  Thus, MueLu does whatever those packages do.  Some of those things are thread parallel, and some are not yet thread parallel.  For example, MueLu apply with Chebyshev or Jacobi smoothing is fully thread parallel.  We will bring other thread-parallel smoothers (e.g., Gauss-Seidel with coloring and ILU(k)) online this fall.  MueLu _setup_ is not yet thread parallel, although the MueLu developers are working on this right now. 
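For reference, wiring MueLu into a Tpetra-based solve can be as short as the sketch below, using the CreateTpetraPreconditioner helper.  The parameter name and the Chebyshev smoother choice follow MueLu's documented parameter list, but check them against your Trilinos version; this is a sketch, not a tested configuration.

```cpp
#include <MueLu_CreateTpetraPreconditioner.hpp>
#include <Teuchos_ParameterList.hpp>
#include <Tpetra_CrsMatrix.hpp>

typedef Tpetra::CrsMatrix<> matrix_type;

// Build a MueLu preconditioner for a Tpetra matrix, selecting a
// smoother (Chebyshev) that is already thread parallel.
Teuchos::RCP<MueLu::TpetraOperator<> >
makePreconditioner (const Teuchos::RCP<matrix_type>& A)
{
  Teuchos::ParameterList params;
  params.set ("smoother: type", "CHEBYSHEV");
  return MueLu::CreateTpetraPreconditioner (A, params);
}
```

The resulting TpetraOperator can then be handed to Belos as the (right or left) preconditioner for the linear solves in Step 5.3.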

mfh


