[Trilinos-Users] [EXTERNAL] Solving linear systems on GPU

Elliott, James John jjellio at sandia.gov
Tue Jan 31 13:15:54 EST 2017


Ashesh,

You're welcome.

If you are curious about devices and execution spaces, you may want to write a small function like this:

The compiler defines are needed because the types will not be defined if you did not compile with that execution space enabled. The process is convoluted, and perhaps in the future, this functionality will be pushed into Kokkos. The point is to be able to quickly report where your code will run and which memory it will use.

  template <typename node_type>
  std::string
  execspace_toString ()
  {
    typedef typename node_type::device_type device_type;
    typedef typename device_type::execution_space ExecSpace;

    bool have_openmp = false;
    bool have_serial = false;
    bool have_cuda = false;

    #ifdef KOKKOS_HAVE_OPENMP
      have_openmp = true;
    #endif

    #ifdef KOKKOS_HAVE_SERIAL
      have_serial = true;
    #endif

    #ifdef KOKKOS_HAVE_CUDA
      have_cuda = true;
    #endif

      if (!(have_openmp || have_serial || have_cuda)) {
        return ("Not OpenMP, Not Serial, and Not CUDA, are you GOD?");
      }

    #ifdef KOKKOS_HAVE_SERIAL
      else if (std::is_same<ExecSpace, Kokkos::Serial>::value) {
        return ("Serial");
      }
    #endif

    #ifdef KOKKOS_HAVE_OPENMP
      else if (std::is_same<ExecSpace, Kokkos::OpenMP>::value) {
        return ("OpenMP");
      }
    #endif

    #ifdef KOKKOS_HAVE_CUDA
      else if (std::is_same <ExecSpace , Kokkos::Cuda>::value)
      {
        return ("CUDA");
      }
    #endif
      else {
        return ("How did we get here? Why do we exist?");
      }
  }

  template <typename node_type>
  std::string
  memspace_toString ()
  {

    typedef typename node_type::device_type device_type;
    typedef typename device_type::memory_space MemSpace;

    if (std::is_same<MemSpace, Kokkos::HostSpace>::value) {
      return ("HostSpace");
    }

    #ifdef KOKKOS_HAVE_CUDA
      else if (std::is_same<MemSpace, Kokkos::CudaSpace>::value) {
        return ("CudaSpace");
      }
    #endif

    #ifdef KOKKOS_HAVE_HBWSPACE
      else if (std::is_same<MemSpace, Kokkos::HBWSpace>::value) {
        return ("HBWSpace");
      }
    #endif

    else {
      return ("Memory space did not match CudaSpace, HBWSpace, or HostSpace.");
    }

  }

  std::cout << "I ran on " << execspace_toString<node_type> ()
                << std::endl
                << "I was stored in " << memspace_toString<node_type> ()
                << std::endl;

Output: (E.g., for an OpenMP code)
I ran on OpenMP
I was stored in HostSpace

James

________________________________
From: Ashesh Chattopadhyay <ashesh6810 at gmail.com>
Sent: Tuesday, January 31, 2017 10:55 AM
To: Elliott, James John; Trilinos-Users at trilinos.org
Subject: Re: [Trilinos-Users] [EXTERNAL] Solving linear systems on GPU


Sir,
Thanks. This helps a lot. I was looking into the node type and I guess I understand it now. Once again, thank you.
Ashesh

Ashesh,

Look at the 4th template parameter for Tpetra CRS Matrix and MultiVector (the node_type).

If you built with CUDA, then

  typedef double   scalar_type;
  typedef int local_ordinal_type;
  typedef int global_ordinal_type;
  // Use the CUDA execution space.
  // A 'device' as a tuple of ExecutionSpace, MemorySpace
  // Specify the ExecSpace, but leave the other out, and it will use the default memory space for that execution space (CUDA UVM)
  typedef Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::CUDA> node_type;

  // Now define your matrix, MV, etc..
  typedef Tpetra::CrsMatrix<scalar_type, local_ordinal_type, global_ordinal_type, node_type> crs_matrix_type;

The device is a Kokkos device, but wrapped by Tpetra for compatibility reasons.
https://trilinos.org/docs/dev/packages/kokkos/doc/html/structKokkos_1_1Device.html

James



________________________________
From: Trilinos-Users <trilinos-users-bounces at trilinos.org<mailto:trilinos-users-bounces at trilinos.org>> on behalf of Rajamanickam, Sivasankaran <srajama at sandia.gov<mailto:srajama at sandia.gov>>
Sent: Tuesday, January 31, 2017 8:38 AM
To: Ashesh Chattopadhyay
Cc: Trilinos-Users at trilinos.org<mailto:Trilinos-Users at trilinos.org>
Subject: Re: [Trilinos-Users] [EXTERNAL] Solving linear systems on GPU


Ashes

   The execution space is based on the Tpetra matrix and vector execution space.


    Let me see if we can find a short example for you. BTW, please CC the list so you get responses from different people and the responses are archived.


-Siva


________________________________
From: Ashesh Chattopadhyay <ashesh6810 at gmail.com<mailto:ashesh6810 at gmail.com>>
Sent: Tuesday, January 31, 2017 8:18 AM
To: Rajamanickam, Sivasankaran
Subject: Re: [Trilinos-Users] [EXTERNAL] Solving linear systems on GPU

Sir,
I was using this function to set everything up in Belos to solve the linear system.

template<class MV, class OP>
MV  solve ( MV& X, const MV& B, const OP& A)
{
  using Teuchos::ParameterList;
  using Teuchos::parameterList;
  using Teuchos::RCP;
  using Teuchos::rcp;
  using Teuchos::rcpFromRef;
  typedef typename MV::scalar_type scalar_type;

  RCP<ParameterList> solverParams = parameterList();
solverParams->set ("Num Blocks", 40);
solverParams->set ("Maximum Iterations", 5000);
solverParams->set ("Convergence Tolerance", 1.0e-3);
Belos::SolverFactory<scalar_type, MV, OP> factory;
  RCP<Belos::SolverManager<scalar_type, MV, OP> > solver =
    factory.create ("GMRES", solverParams);


  typedef Belos::LinearProblem<scalar_type, MV, OP> problem_type;
  RCP<problem_type> problem =
    rcp (new problem_type (rcpFromRef (A), rcpFromRef (X), rcpFromRef (B)));
 problem->setProblem ();

  solver->setProblem (problem);
  Belos::ReturnType result = solver->solve();
 int numIters = solver->getNumIters();
if (result == Belos::Converged) {
   std::cout << "The Belos solve took " << numIters << " iteration(s) to reach "
      "a relative residual tolerance of " << 1.0e-4 << "." << std::endl;
  } else {
   std::cout << "The Belos solve took " << numIters << " iteration(s), but did not reach "
      "a relative residual tolerance of " << 1.0e-4 << "." << std::endl;
  }


return X;

Is there a way to specify, something like execution space as device or something like that, the way one can do in Kokkos to execute on GPU? Or any other way, kindly point me to a reference from which I can learn how to do it.

Thanks
Ashesh



On Tue, Jan 31, 2017 at 8:06 AM, Ashesh Chattopadhyay <ashesh6810 at gmail.com<mailto:ashesh6810 at gmail.com>> wrote:

Sir,
Thank you. Yes , I have a Tpetra Crs Matrix A and a multivector b. I simply wanna solve Ax=b,without preconditioning. My A matrix is a saddle point matrix coming out of a modified nodal analysis algorithm. Can you please point me to an example or documentation as to how to use Belos to solve this on GPU? Till now I was doing this on CPU using GMRES in belos, the way the examples are there on git.

Thanks
Ashesh

On Jan 31, 2017 7:33 AM, "Rajamanickam, Sivasankaran" <srajama at sandia.gov<mailto:srajama at sandia.gov>> wrote:
...



--
Research Assistant, Multi-Scale Multi-Physics Computation Lab
University of Texas at El Paso
Texas, El Paso, USA
(+1) 915-355-5013<tel:(915)%20355-5013>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://trilinos.org/pipermail/trilinos-users/attachments/20170131/1870deb3/attachment.html>


More information about the Trilinos-Users mailing list