Epetra  Development
Epetra Lesson 04: Sparse matrix fill

Different ways to add entries to Epetra sparse matrix

# Lesson topics

This lesson explains how to add entries to a Epetra sparse matrix (Epetra_CrsMatrix). It explains how you can choose any of various ways to add entries to the matrix. Some are more flexible but less efficient, and others are more efficient but less flexible. In general, knowing the sparse matrix structure in advance will reduce both peak memory usage and total run time.

This lesson does not currently include code examples. The section Exercise at the end suggests modifying the example Epetra Lesson 03: Power method in order to learn about the different ways to add or change the entries of a sparse matrix.

# Filling a sparse matrix

We call adding entries to a sparse matrix filling it, and call the general procedure "sparse matrix fill" or just fill. Often people have finite element assembly in mind when they use this expression, but fill here means adding entries for any reason. For example, you might want to

• solve an optimization problem with a set of linear constraints, so you would add one row to the sparse matrix for each constraint; or
• represent a weighted graph as a sparse matrix, so that each row of the matrix represents a vertex's adjacency list and corresponding edge weights.

Also, by extension, "fill" might refer to modifying existing entries of a sparse matrix.

# Rowwise access, not rowwise distribution

Epetra_CrsMatrix inherits from the Epetra_RowMatrix interface; it "is a" RowMatrix. Some people have the misconception that "RowMatrix" (and therefore CrsMatrix) refers to a rowwise distribution of the sparse matrix over parallel processes. It does not. CrsMatrix supports an arbitrary "two-dimensional" distribution of rows and columns over processes. The term instead refers to rowwise access to the data. That is, the methods in this interface and in CrsMatrix let you add or access entries on each process by row.

This distinction matters because two or more processes might share entries in a row. Asking for "all the entries in a row" on a particular process only accesses the entries owned by that process, which is not necessarily all the entries in a row.

Whether adding entries or modifying existing ones, you may always do so for any number of entries in the row, such that their columns are owned by the calling process. You should always modify as many entries with one method call as possible, in order to amortize function call and data access overhead.

# Phases of fill

An Epetra_CrsMatrix may be in either of two different states:

1. Post construction, before user calls `FillComplete()`
2. After user calls `FillComplete()`

In both states, you are allowed to set or modify values in the matrix. In State 1, if the matrix was not created with a constant graph (see below), and if there is room in the row (see "static profile" discussion below), you are allowed to change the graph structure of the matrix by inserting entries.

We call a matrix in State 2 "fill complete." In State 2, you are not allowed to modify the graph structure of the matrix, but you are allowed to modify the values in the matrix.

Calling `FillComplete()` does a lot of work. In general, it requires distributed-memory communication as well as local rearrangement of data. The intent is that you should exploit this work for as long as possible before creating a new matrix with a different graph structure. You can do so by performing multiple sparse matrix-vector multiplies or triangular solves.

# Trade-offs between flexibility and performance

## The most general case: dynamic allocation, no size hint

CrsMatrix lets you make trade-offs between flexibility and performance. All of these trade-offs revolve around how much you know about the structure of the sparse matrix – that is, its graph – in advance. In the most general case, all you know is which process owns which row – the row Map.

Epetra_Map rowMap (...); // whatever row Map you want
Epetra_CrsMatrix A (Copy, rowMap, 0);

This code snippet says that you want to create a sparse matrix with the given row distribution. It also says that you intend each process to own all the entries in a given row. If you want to allow a more general "2-D" distribution, you must also tell the constructor which process owns which columns of the matrix. You do so by supplying a "column Map":

Epetra_Map rowMap (...); // whatever row Map you want
Epetra_Map colMap (...); // whatever column Map you want
Epetra_CrsMatrix A (Copy, rowMap, colMap, 0);

All the examples that follow allow you to supply a column Map if you which, so we omit this for brevity.

Both of these ways to create a sparse matrix allocate memory for the entries in each row dynamically. This means that it reallocates space as necessary. We call this dynamic profile, where "profile" refers to the structure (the graph) of the sparse matrix. (Static profile, which we describe later, does not reallocate; it fixes an upper bound on the number of entries allowed per row.) CrsMatrix implements this currently as an array of arrays, one array per row, though this may change. You might hear this array of arrays structure referred to as "2-D storage," versus the "1-D" storage of the conventional compressed sparse row format.

# Dynamic allocation with a size hint

In the above examples, the matrix has no information about how many entries you want to add to each row. Thus, it has to make some default assumption. The "size hint" defaults to zero, to ensure that empty or nearly empty sparse matrices take as little memory as possible. An initial reservation of zero entries per row may cause frequent reallocation if you add entries to a row many times. Reallocation takes time, because it requires talking to the C++ run-time library or the system library. It also tends to fragment memory and thus disturb locality, further reducing performance.

If you have a pretty good idea how many entries will be in each row, you may give a "size hint" to the constructor. You may do so either with a single number, which is an expected upper bound on the number of entries in any one row, or with an upper bound for each row of the matrix. The typical use case is a single upper bound, which we illustrate below.

Epetra_Map rowMap (...); // whatever row Map you want
// With dynamic profile, this is only a hint, not a hard constraint.
const int maxEntriesPerRow = 3;
Epetra_CrsMatrix A (Copy, rowMap, maxEntriesPerRow);

If you have an upper bound for each row, you probably could use static profile, but it is still legal to use dynamic profile in this case.

Giving an upper bound on the number of entries per row allows the matrix to preallocate space for each row. Dynamic profile still allows you to put more entries in a row than that, but if you don't, then the matrix will only allocate space once per row during the fill phase.

# Static allocation

Static allocation, what we call static profile, means that the "size hint" becomes a hard constraint. You may either specify a single upper bound on the number of entries in any row, or an array with a separate upper bound for each row. Here is an example of the former:

// With static profile, this is a hard constraint.
const int maxEntriesPerRow = 3;
const bool staticProfile = true;
Epetra_CrsMatrix A (Copy, rowMap, maxEntriesPerRow, staticProfile);

What happens when you specify static profile is that the matrix allocates "1-D" storage. That is, rather than the aforementioned "2-D" array of arrays, the matrix allocates the usual three compressed sparse row arrays (ptr, ind, val), with fixed space in each row (given either by the upper bound, or by the aforementioned array). When you're done filling and call `FillComplete()`, this means the matrix need only "pack" the 1-D storage into its final representation by copying the entries that you filled in (and not any of the "slack space" you might have left). Static profile also may use less memory than dynamic profile, because you don't have to store a complicated dynamic data structure for each row of the matrix.

# Constant graph

The most efficient but least flexible fill method is to create the Epetra_CrsMatrix with a constant graph. That is, you create a Epetra_CrsGraph separately, fill it, call its `FillComplete()` method, then pass the graph to the Epetra_CrsMatrix constructor. This completely constrains the structure of the CrsMatrix. You may only set or modify values in the matrix, not the structure. (This means you may not call `InsertGlobalValues()` or `InsertMyValues()`, only the `Replace*Values()`, `SumInto*Values`, `Scale()`, and `PutScalar()` methods.

// Note that the CrsMatrix constructor takes a "const" graph.
// This means that the CrsMatrix is not allowed to change the
// graph. You shouldn't either. The graph must be fill complete.
Epetra_CrsGraph graph (...);
// The graph comes with the row and column Maps already built in.
// CrsMatrix's constructor takes the graph by "value," but this
// does a shallow copy; it does not copy the entries in the graph.

Filling the CrsGraph works much like filling a CrsMatrix, except that you only specify structure, not values. Furthermore, the same size hint and dynamic profile vs. static profile options apply to CrsGraph.

# Local vs. global indices

Please see Epetra Lesson 02: Map and Vector for a detailed explanation of global and local indices.

In the sparse matrix's final state, it represents column indices as local indices. This is faster for operations like sparse matrix-vector multiply. It saves storage as well, especially if Epetra was configured to use 64-bit global indices. However, accessing or inserting entries by local index is only allowed under certain conditions. In particular, the matrix must have a column Map, which tells it how to convert between global and local indices for the column indices of the matrix. If you didn't create the matrix with a precomputed column Map (either in its constructor, or in the constant graph), it has to compute the column Map on its own. It does so in `FillComplete()`.

If the matrix doesn't have a column Map yet, you must use global indices, since local indices don't exist yet. However, if you can use local indices, you should. For global indices, the matrix must go to the Map to look up the corresponding local index for every global index. Local indices only require an array access.

Remember that if column indices are local, then they are counted with respect to the column Map. Local row and column indices may not necessarily be the same. For example, the matrix entry at position `(i_local, i_local)`, where the first `i_local` is the local row index and the second `i_local` the local column index, may not necessarily be a diagonal entry of the matrix. However, the matrix entry at global position `(i_global, i_global)` is always a diagonal entry of the matrix.

# Exercise

Start with the Epetra Lesson 03: Power method example, which uses a dynamic profile and global indices. Try all of the different fill techniques described in this lesson. If you like, you may also try cyclic boundary conditions instead of Dirichlet boundary conditions in the 1-D Poisson discretization. (This makes the number of entries per row the same for all rows.)