[Trilinos-Users] Copying Amesos2 Solver and Tpetra MultiVectors

D H mrhyde at stanford.edu
Sun Jun 8 23:37:26 MDT 2014


I have experimented with this a little bit, and SuperLU_MT seems to be working pretty well as my chosen solver implementation.  If I use one solver per thread, I can do solves in parallel and get accurate results / no crashes (so, things are acting like they're thread safe, at least on the surface).  I'm not totally sure about the thread scalability; after each thread's solver has done its LU factorization and is just repeatedly doing triangular solves, my CPU usage seems to only go up to about 70%.  That could be for a number of different reasons, so definitely take it with a grain of salt!, but in any case, things seem to be working pretty well.

I did think about some MPI options, but my whole simulation is parallelized based on OpenMP, so it would require a decent amount of effort to rewire everything to be MPI-based.  In light of this, I think SuperLU_MT is the right choice, since it will allow my simulation to compute the LU factorization using multiple threads (internally parallel), then I can simultaneously do triangular solves from different threads (externally parallel, even if it's internally serial that's fine).

I guess my only remaining concern is one I will wait to hear from some Amesos2 developers about, namely this question of getting the results of a factorization out of one solver instance and giving them to another solver instance.  Browsing through the source code, I can see that the computed "L" and "U" are tucked away in the private data_ member of the implementation-specific Amesos2::Superlumt< Matrix, Vector > class.  Looking at this, it doesn't look like there's an obvious way to achieve what I'm trying to do (I guess this is not a freqeuent feature request!), but hopefully some of the developers can shed some light on a way to do this, even if I have to tweak the source code a bit.

Thanks again!

David

----- Original Message -----
From: "Mark Hoemmen" <mhoemme at sandia.gov>
To: "D H" <mrhyde at stanford.edu>, trilinos-users at software.sandia.gov
Sent: Sunday, June 8, 2014 12:23:26 PM
Subject: Re: [Trilinos-Users] Copying Amesos2 Solver and Tpetra MultiVectors

On 6/7/14, 11:57 PM, "D H" <mrhyde at stanford.edu> wrote:
>Thanks very much for your reply!  I really appreciate your help.

Glad to help out!

>
>It's good to know about the deep copying for Tpetra::MultiVector.  I do
>have one follow-up, though.  In my use case, I would like to create a
>solver and compute the LU factorization (using SuperLU_MT), and then once
>that is done, I would like to simultaneously solve the system for a large
>number of right hand sides (using OpenMP).  I don't know all the
>right-hand sides before I start solving; the right-hand sides are
>computed one at a time on the fly on different threads, and each right
>hand side is destroyed as soon as the system is solved with it, so I
>can't just tell the solver to solve for an NxM right hand side.  Is
>solver->solve thread-safe, and will it run in parallel ... ?

Amesos2::Solver does not currently promise that solve() is thread safe.
There are two reasons for this:

  1. The underlying solver (e.g., SuperLU_MT, Cholmod, ...) might not be
thread safe.

Just because a solver implementation (e.g., SuperLU_MT) uses threads
_inside_, doesn't necessarily mean that it's safe to call multiple
instances of the solver concurrently.

  2. Teuchos::RCP reference count updates are not currently thread safe.

This would not be hard to fix, but we haven't had a chance to do it.
Also, just because something is thread safe, doesn¹t mean that it's thread
scalable.  (I know you understand the difference by how you asked the
question :-) .)  Anything that works like Teuchos::RCP -- including
boost::shared_ptr and std::shared_ptr (C++11) -- can never be thread
scalable, because atomic reference count updates are a serialization
bottleneck.  Any copy constructor or assignment operator call for any of
these "smart pointers" updates its reference count.  This is why we've
resisted making Teuchos::RCP thread safe.  (Another reason is that we get
very few requests for this functionality, so we haven't prioritized it.)

For your application, RCP reference count updates _should_ not be a
bottleneck, but we haven't done an experiment to test this, as far as I
know.

Whether solve() is parallel _inside_, depends on the solver
implementation.  I would have to look at SuperLU_MT to be sure, but I'm
guessing that it only parallelizes the LU factorization, not the
triangular solves.  Parallelizing triangular solves effectively incurs
set-up overhead and doesn't always scale well.


>If one solver per thread is a good way to go, though, ...

See Point #1 above.  It's important to check whether a solver
implementation promises that concurrent use of a factorization is thread
safe.  Amesos2 just wraps solvers; it doesn't implement them (currently).

>I would appreciate any insights into how one might extract the LU
>factorization from one solver instance and put it into another, even if
>it's not exactly a deep copy.

I'll have to defer to the Amesos2 developers to see whether this is
possible, and if so, how to do it.

>P.S. Thanks for the tip regarding Tpetra template parameters!

Glad to help! :-)  We may eventually even make the first template
parameter optional, so that you could write e.g., Map<> and Vector<>.
However, we haven't done that yet.

If you would like to do concurrent solves, another approach would be to
run them in different MPI processes, and use Tpetra's Import / Export
functionality to move data around between processes.

Thanks!
mfh

>
>----- Original Message -----
>From: "Mark Hoemmen" <mhoemme at sandia.gov>
>To: trilinos-users at software.sandia.gov
>Sent: Saturday, June 7, 2014 10:30:47 PM
>Subject: Re: [Trilinos-Users] Copying Amesos2 Solver and Tpetra
>MultiVectors
>
>On 6/7/14, 12:00 PM, "trilinos-users-request at software.sandia.gov"
><trilinos-users-request at software.sandia.gov> wrote:
>>Date: Sat, 7 Jun 2014 10:29:55 -0700 (PDT)
>>From: D H <mrhyde at stanford.edu>
>>To: trilinos-users at software.sandia.gov
>>Subject: [Trilinos-Users] Copying Amesos2 Solver and Tpetra
>>	MultiVectors
>>Message-ID:
>>	<2083721164.11288811.1402162195681.JavaMail.zimbra at stanford.edu>
>>Content-Type: text/plain; charset=utf-8
>>
>>Hi everyone,
>>
>>I am new to working with Trilinos and have what is hopefully an easy
>>C++/Trilinos question.
>>
>>I'm working with Amesos2 to solve linear systems, and I have code that is
>>structured like:
>>
>>class A {
>>...
>>public B b_instance;
>>...
>>}
>>
>>class B {
>>...
>>public Teuchos::RCP<Amesos2::Solver<
>>Tpetra::CrsMatrix<double,int,int,Tpetra::DefaultPlatform::DefaultPlatform
>>T
>>ype::NodeType> , 
>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatfo
>>r
>>mType::NodeType> > > amesos2_solver;
>>public Teuchos::RCP<
>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatfo
>>r
>>mType::NodeType> > amesos2_X;
>>public Teuchos::RCP<
>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatfo
>>r
>>mType::NodeType> > amesos2_B;
>>...
>>}
>>
>>In my program, I have several instances of class A.  For the first
>>instance, I use its b_instance to compute the numeric (LU) factorization
>>of a sparse matrix, which is another class variable of B.  At that point,
>>amesos2_X and amesos2_B are just random Nx1 vectors, but they have been
>>associated with amesos2_solver using "amesos2_solver =
>>Amesos2::create<MAT,MV>(solver_name, my_matrix, amesos2_X, amesos2_B);".
>>
>>So now, what I would like to do is to copy the amesos2_solver, amesos2_X,
>>and amesos2_B variables into the b_instances of my other instances of
>>class A.  I don't want to just create copies of pointers that are all
>>pointing to the same place - I would like separate variables that contain
>>the same information.  The motivation for this is so I can create several
>>independent solvers with the same matrix but only have to compute the LU
>>factorization once.
>>
>>Can you recommend what is the best way to copy these variabes, or the
>>underlying computed factorization, to other instances?  I'd be very
>>grateful for any help you can provide - thanks very much!
>
>It is certainly possible to make a deep copy of a Tpetra::MultiVector.
>You may use operator= to do that for now; in the development version of
>Trilinos and in future releases, you may use the Tpetra::deep_copy
>nonmember function.
>
>As far as I know, Amesos2 does not currently give you a way to make a deep
>copy of a Solver instance.  However, for your use case, you don't need to
>do this.  You could simply share one Amesos2::Solver instance by RCP among
>all your class instances, and call it with different vectors by using the
>two-argument version of solve():
>
>solver->solve (x, b);
>
>The input arguments of the two-argument version of solve() overwrite any
>setB() or setX() inputs.
>
>btw, since you are using all default values of the Tpetra template
>parameters, you may omit all but one of the Tpetra template parameters.
>For example:
>
>Tpetra::CrsMatrix<double> A;
>Tpetra::MultiVector<double> x;
>
>Please let us know if you have any further questions.
>mfh
>
>_______________________________________________
>Trilinos-Users mailing list
>Trilinos-Users at software.sandia.gov
>https://software.sandia.gov/mailman/listinfo/trilinos-users



More information about the Trilinos-Users mailing list