[Trilinos-Users] [EXTERNAL] Re: Copying Amesos2 Solver and Tpetra MultiVectors

Mon Jun 9 09:40:23 MDT 2014

On 6/8/14, 11:37 PM, "D H" <mrhyde at stanford.edu> wrote:

>I have experimented with this a little bit, and SuperLU_MT seems to be
>working pretty well as my chosen solver implementation.  If I use one
>solver per thread,

Then why not just use SuperLU instead of SuperLU_MT? ;-)  (Has SuperLU
been made concurrency-safe?)  SuperLU might be a bit faster for the
single-threaded case, and it's probably easier to build.  Ah, but I see,
you would like to have the option to use multiple threads in the
factorization itself.

>I can do solves in parallel and get accurate results / no crashes (so,
>things are acting like they're thread safe, at least on the surface).
>I'm not totally sure about the thread scalability; after each thread's
>solver has done its LU factorization and is just repeatedly doing
>triangular solves, my CPU usage seems to only go up to about 70%.  That
>could be for a number of different reasons, so definitely take it with a
>grain of salt!, but in any case, things seem to be working pretty well.
>
>I did think about some MPI options, but my whole simulation is
>parallelized based on OpenMP, so it would require a decent amount of
>effort to rewire everything to be MPI-based.  In light of this, I think
>SuperLU_MT is the right choice, since it will allow my simulation to
>compute the LU factorization using multiple threads (internally
>parallel), then I can simultaneously do triangular solves from different
>threads (externally parallel, even if it's internally serial that's fine).
>
>I guess my only remaining concern is one I will wait to hear from some
>Amesos2 developers about, namely this question of getting the results of
>a factorization out of one solver instance and giving them to another
>solver instance.  Browsing through the source code, I can see that the
>computed "L" and "U" are tucked away in the private data_ member of the
>implementation-specific Amesos2::Superlumt< Matrix, Vector > class.
>Looking at this, it doesn't look like there's an obvious way to achieve
>what I'm trying to do (I guess this is not a freqeuent feature request!),
>but hopefully some of the developers can shed some light on a way to do
>this, even if I have to tweak the source code a bit.

It could be hard to support this feature for all solvers.  Not every
solver implementation exposes its post-factorization data structures.
However, it shouldn't be hard to add a mix-in to ask the solver if it
knows how to give back the L and U factors and any permutation
information.  The trick is getting time to do it...

btw, how would you feel about making a little contribution to Amesos2
yourself?  No pressure, just curious if you're interested.  We have a
contributor agreement form for making it official, but there's no rush.
It sounds like you find it a useful tool.

Thanks!
mfh

>
>Thanks again!
>
>David
>
>----- Original Message -----
>From: "Mark Hoemmen" <mhoemme at sandia.gov>
>To: "D H" <mrhyde at stanford.edu>, trilinos-users at software.sandia.gov
>Sent: Sunday, June 8, 2014 12:23:26 PM
>Subject: Re: [Trilinos-Users] Copying Amesos2 Solver and Tpetra
>MultiVectors
>
>On 6/7/14, 11:57 PM, "D H" <mrhyde at stanford.edu> wrote:
>>Thanks very much for your reply!  I really appreciate your help.
>
>Glad to help out!
>
>>
>>It's good to know about the deep copying for Tpetra::MultiVector.  I do
>>have one follow-up, though.  In my use case, I would like to create a
>>solver and compute the LU factorization (using SuperLU_MT), and then once
>>that is done, I would like to simultaneously solve the system for a large
>>number of right hand sides (using OpenMP).  I don't know all the
>>right-hand sides before I start solving; the right-hand sides are
>>computed one at a time on the fly on different threads, and each right
>>hand side is destroyed as soon as the system is solved with it, so I
>>can't just tell the solver to solve for an NxM right hand side.  Is
>>solver->solve thread-safe, and will it run in parallel ... ?
>
>Amesos2::Solver does not currently promise that solve() is thread safe.
>There are two reasons for this:
>
>  1. The underlying solver (e.g., SuperLU_MT, Cholmod, ...) might not be
>thread safe.
>
>Just because a solver implementation (e.g., SuperLU_MT) uses threads
>_inside_, doesn't necessarily mean that it's safe to call multiple
>instances of the solver concurrently.
>
>  2. Teuchos::RCP reference count updates are not currently thread safe.
>
>This would not be hard to fix, but we haven't had a chance to do it.
>Also, just because something is thread safe, doesn¹t mean that it's thread
>scalable.  (I know you understand the difference by how you asked the
>question :-) .)  Anything that works like Teuchos::RCP -- including
>boost::shared_ptr and std::shared_ptr (C++11) -- can never be thread
>scalable, because atomic reference count updates are a serialization
>bottleneck.  Any copy constructor or assignment operator call for any of
>these "smart pointers" updates its reference count.  This is why we've
>resisted making Teuchos::RCP thread safe.  (Another reason is that we get
>very few requests for this functionality, so we haven't prioritized it.)
>
>For your application, RCP reference count updates _should_ not be a
>bottleneck, but we haven't done an experiment to test this, as far as I
>know.
>
>Whether solve() is parallel _inside_, depends on the solver
>implementation.  I would have to look at SuperLU_MT to be sure, but I'm
>guessing that it only parallelizes the LU factorization, not the
>triangular solves.  Parallelizing triangular solves effectively incurs
>set-up overhead and doesn't always scale well.
>
>
>>If one solver per thread is a good way to go, though, ...
>
>See Point #1 above.  It's important to check whether a solver
>implementation promises that concurrent use of a factorization is thread
>safe.  Amesos2 just wraps solvers; it doesn't implement them (currently).
>
>>I would appreciate any insights into how one might extract the LU
>>factorization from one solver instance and put it into another, even if
>>it's not exactly a deep copy.
>
>I'll have to defer to the Amesos2 developers to see whether this is
>possible, and if so, how to do it.
>
>>P.S. Thanks for the tip regarding Tpetra template parameters!
>
>Glad to help! :-)  We may eventually even make the first template
>parameter optional, so that you could write e.g., Map<> and Vector<>.
>However, we haven't done that yet.
>
>If you would like to do concurrent solves, another approach would be to
>run them in different MPI processes, and use Tpetra's Import / Export
>functionality to move data around between processes.
>
>Thanks!
>mfh
>
>>
>>----- Original Message -----
>>From: "Mark Hoemmen" <mhoemme at sandia.gov>
>>To: trilinos-users at software.sandia.gov
>>Sent: Saturday, June 7, 2014 10:30:47 PM
>>Subject: Re: [Trilinos-Users] Copying Amesos2 Solver and Tpetra
>>MultiVectors
>>
>>On 6/7/14, 12:00 PM, "trilinos-users-request at software.sandia.gov"
>><trilinos-users-request at software.sandia.gov> wrote:
>>>Date: Sat, 7 Jun 2014 10:29:55 -0700 (PDT)
>>>From: D H <mrhyde at stanford.edu>
>>>To: trilinos-users at software.sandia.gov
>>>Subject: [Trilinos-Users] Copying Amesos2 Solver and Tpetra
>>>	MultiVectors
>>>Message-ID:
>>>	<2083721164.11288811.1402162195681.JavaMail.zimbra at stanford.edu>
>>>Content-Type: text/plain; charset=utf-8
>>>
>>>Hi everyone,
>>>
>>>I am new to working with Trilinos and have what is hopefully an easy
>>>C++/Trilinos question.
>>>
>>>I'm working with Amesos2 to solve linear systems, and I have code that
>>>is
>>>structured like:
>>>
>>>class A {
>>>...
>>>public B b_instance;
>>>...
>>>}
>>>
>>>class B {
>>>...
>>>public Teuchos::RCP<Amesos2::Solver<
>>>Tpetra::CrsMatrix<double,int,int,Tpetra::DefaultPlatform::DefaultPlatfor
>>>m
>>>T
>>>ype::NodeType> ,
>>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatf
>>>o
>>>r
>>>mType::NodeType> > > amesos2_solver;
>>>public Teuchos::RCP<
>>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatf
>>>o
>>>r
>>>mType::NodeType> > amesos2_X;
>>>public Teuchos::RCP<
>>>Tpetra::MultiVector<double,int,int,Tpetra::DefaultPlatform::DefaultPlatf
>>>o
>>>r
>>>mType::NodeType> > amesos2_B;
>>>...
>>>}
>>>
>>>In my program, I have several instances of class A.  For the first
>>>instance, I use its b_instance to compute the numeric (LU) factorization
>>>of a sparse matrix, which is another class variable of B.  At that
>>>point,
>>>amesos2_X and amesos2_B are just random Nx1 vectors, but they have been
>>>associated with amesos2_solver using "amesos2_solver =
>>>Amesos2::create<MAT,MV>(solver_name, my_matrix, amesos2_X, amesos2_B);".
>>>
>>>So now, what I would like to do is to copy the amesos2_solver,
>>>amesos2_X,
>>>and amesos2_B variables into the b_instances of my other instances of
>>>class A.  I don't want to just create copies of pointers that are all
>>>pointing to the same place - I would like separate variables that
>>>contain
>>>the same information.  The motivation for this is so I can create
>>>several
>>>independent solvers with the same matrix but only have to compute the LU
>>>factorization once.
>>>
>>>Can you recommend what is the best way to copy these variabes, or the
>>>underlying computed factorization, to other instances?  I'd be very
>>>grateful for any help you can provide - thanks very much!
>>
>>It is certainly possible to make a deep copy of a Tpetra::MultiVector.
>>You may use operator= to do that for now; in the development version of
>>Trilinos and in future releases, you may use the Tpetra::deep_copy
>>nonmember function.
>>
>>As far as I know, Amesos2 does not currently give you a way to make a
>>deep
>>copy of a Solver instance.  However, for your use case, you don't need to
>>do this.  You could simply share one Amesos2::Solver instance by RCP
>>among
>>all your class instances, and call it with different vectors by using the
>>two-argument version of solve():
>>
>>solver->solve (x, b);
>>
>>The input arguments of the two-argument version of solve() overwrite any
>>setB() or setX() inputs.
>>
>>btw, since you are using all default values of the Tpetra template
>>parameters, you may omit all but one of the Tpetra template parameters.
>>For example:
>>
>>Tpetra::CrsMatrix<double> A;
>>Tpetra::MultiVector<double> x;
>>
>>Please let us know if you have any further questions.
>>mfh
>>
>>_______________________________________________
>>Trilinos-Users mailing list
>>Trilinos-Users at software.sandia.gov
>>https://software.sandia.gov/mailman/listinfo/trilinos-users
>