[Trilinos-Users] performance of trilinos

Andrey Prokopenko prok at math.uh.edu
Wed Aug 22 19:49:19 MDT 2012


Chen-Liang,

Have you tried running the program with a custom rankfile that binds the
processes to cores? I think it is possible that process migration between
cores is skewing your timings.
Also, did you compile Trilinos with thread support? If so, could it be that
your OMP_NUM_THREADS != 1?
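
If you are using Open MPI, a minimal sketch would be something along these
lines (adjust the host names, core numbers and executable name to your
setup -- "my_rankfile" and "your_program" are just placeholders):

    # my_rankfile: pin four ranks to four distinct cores of one node
    rank 0=compute-0-0 slot=0
    rank 1=compute-0-0 slot=1
    rank 2=compute-0-0 slot=2
    rank 3=compute-0-0 slot=3

    export OMP_NUM_THREADS=1
    mpirun -np 4 --rankfile my_rankfile ./your_program

That way each MPI process stays on its own core for the whole run, and any
threaded code inside the libraries is limited to a single thread.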

Sincerely,
Andrey




On Wed, Aug 22, 2012 at 8:05 AM, Andrey Prokopenko <prok at math.uh.edu> wrote:

> Sorry everyone, we've accidentally moved the discussion away from the
> list. Here are the missing messages:
>
>
> ------------------------------------------------------------------------
>  Chen-Liang,
>
> I don't see any problems with the code. Could you provide some info on the
> cluster you use and on how you run the program (i.e., the mpirun parameters)?
>
>
> ------------------------------------------------------------------------
>
> Andrey:
> The attachments are the log files from the build and the runs. I ran
> "run.sh" to compare the speedup; the times are 78.645025 s, 71.102732 s,
> 66.204640 s, 52.381109 s and 65.338082 s, respectively.
> I also tested it through PBS and the result is similar. The PBS script is
> attached.
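>
> (Relative to the first run that is a speedup of roughly
> 78.645025 / 52.381109 ≈ 1.50 at best and 78.645025 / 65.338082 ≈ 1.20 for
> the last run, assuming the first time is the single-process baseline.)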
>
> P.S.:
> management server:
> Dell PowerEdge R710 x1
> CPU: Intel Xeon E5520 (quad core) x2, 2.26GHz, 8MB cache, 5.86 GT/s QPI,
> Turbo;
> MEM: 16GB (8x2GB), 1066MHz, dual-ranked UDIMMs for 2 processors;
> HardDisk: 146GB x2, 3.5-inch 15K RPM hot-plug SAS; 1TB x3, 3.5-inch 7.2K RPM
> hot-plug SAS
>
> compute servers:
> Dell PowerEdge R900 x3
> CPU: Intel Xeon E7450 x4 (6 cores each), 12MB cache, 2.40GHz, 1066MHz FSB,
> 90W;
> MEM: 32GB (8x4GB), 667MHz, ECC
> HD: 146GB x2, 3.5-inch 15K RPM hot-plug SAS; 1TB x3, 3.5-inch 7.2K RPM
> hot-plug SAS
>
>
>
> the output of pbsnodes:
> compute-0-1
> state = free
> np = 24
> ntype = cluster
> status = opsys=linux,uname=Linux compute-0-1.local 2.6.18-164.6.1.el5 #1
> SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
> 0,nusers=0,idletime=11236734,totmem=33959576kb,availmem=31820776kb,physmem=32939460kb,ncpus=24,loadave=0.00,netload=16239304497,state=free,jobs=
> 315.lreis-hpc.lreis.ac.cn 385.lreis-hpc.lreis.ac.cn
> ,varattr=,rectime=1345621337
>
> compute-0-2
> state = free
> np = 24
> ntype = cluster
> status = opsys=linux,uname=Linux compute-0-2.local 2.6.18-164.6.1.el5 #1
> SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
> 0,nusers=0,idletime=11236724,totmem=33959576kb,availmem=31818208kb,physmem=32939460kb,ncpus=24,loadave=0.00,netload=15779316071,state=free,jobs=
> 385.lreis-hpc.lreis.ac.cn 315.lreis-hpc.lreis.ac.cn
> ,varattr=,rectime=1345621307
>
> compute-0-0
> state = free
> np = 24
> ntype = cluster
> status = opsys=linux,uname=Linux compute-0-0.local 2.6.18-164.6.1.el5 #1
> SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
> 0,nusers=0,idletime=11236753,totmem=33959576kb,availmem=33523540kb,physmem=32939460kb,ncpus=24,loadave=0.02,netload=14747635908,state=free,jobs=,varattr=,rectime=1345621347
>
> On Wed, Aug 22, 2012 at 2:00 AM, chenliang wang <hi181904665 at msn.com>wrote:
>
>> On 2012-8-22 14:03, Andrey Prokopenko wrote:
>>
>> On Tue, Aug 21, 2012 at 10:32 PM, chenliang wang <hi181904665 at msn.com>wrote:
>>
>>>  Hi Andrey,
>>> I have attached the cpp file and the Makefile (if you want to use it, the
>>> paths should be changed to the correct locations). Thanks!
>>>
>>> On 2012-8-22 11:06, Andrey Prokopenko wrote:
>>>
>>> Hi  Chen-Liang ,
>>>
>>>  This code does not compile. For instance, the prototype of
>>> get_neighbours takes an "int" as its first argument, but in the body of
>>> main the MyGlobalElements pointer is passed instead. The same goes for
>>> A.InsertGlobalValues.
>>>
>>>  Could some misprints have crept in during a copy/paste?
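>>>
>>>  From the prototype I would guess that the intended calls look something
>>> like this (just my guess, since only you have the actual file):
>>>
>>>     get_neighbours( MyGlobalElements[i], nx, ny,
>>>                     left, right, lower, upper);
>>>     A.InsertGlobalValues(MyGlobalElements[i], NumEntries, Values, Indices);
>>>     A.InsertGlobalValues(MyGlobalElements[i], 1, &diag, MyGlobalElements+i);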
>>>
>>>  Andrey
>>>
>>> On Tue, Aug 21, 2012 at 8:38 PM, chenliang wang <hi181904665 at msn.com>wrote:
>>>
>>>>
>>>> Hi,
>>>> I use the code listed at the end of this mail to test the speedup with
>>>> Trilinos (10.10.2, compiled against MKL) on our Dell cluster. To my
>>>> surprise, there is no significant speedup no matter how many processes I
>>>> use (only about 1.2x-1.5x).
>>>> Is that possible? Are there any rougher benchmarks I could use to compare
>>>> the speedup?
>>>>
>>>> Chen-Liang Wang
>>>>
>>>>
>>>> ************************************************************************
>>>> //the example code:
>>>> // Solve a 2D Laplacian problem
>>>> // This example builds the matrix and solves it with AztecOO.
>>>> #include "Epetra_ConfigDefs.h"
>>>> #ifdef HAVE_MPI
>>>> #include "mpi.h"
>>>> #include "Epetra_MpiComm.h"
>>>> #else
>>>> #include "Epetra_SerialComm.h"
>>>> #endif
>>>> #include "Epetra_Map.h"
>>>> #include "Epetra_Vector.h"
>>>> #include "Epetra_CrsMatrix.h"
>>>> #include "AztecOO.h"
>>>>
>>>> // external function
>>>> void get_neighbours( const int i, const int nx, const int ny,
>>>>               int & left, int & right,
>>>>               int & lower, int & upper);
>>>>
>>>> // =========== //
>>>> // main driver //
>>>> // =========== //
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>
>>>> #ifdef HAVE_MPI
>>>> MPI_Init(&argc, &argv);
>>>> Epetra_MpiComm Comm(MPI_COMM_WORLD);
>>>> #else
>>>> Epetra_SerialComm Comm;
>>>> #endif
>>>>
>>>> // number of nodes along the x- and y-axis
>>>> int nx = 4806;
>>>> int ny = 4046;
>>>> int NumGlobalElements = nx * ny;
>>>>
>>>> // create a linear map
>>>> Epetra_Map Map(NumGlobalElements,0,Comm);
>>>>
>>>> // local number of rows
>>>> int NumMyElements = Map.NumMyElements();
>>>> // get update list
>>>> int * MyGlobalElements = new int [NumMyElements];
>>>> Map.MyGlobalElements( MyGlobalElements );
>>>>
>>>> // Create an Epetra_CrsMatrix with at most 5 nonzeros per row
>>>>
>>>> Epetra_CrsMatrix A(Copy,Map,5);
>>>>
>>>> // Add rows one-at-a-time
>>>> // Need some vectors to help
>>>>
>>>> double Values[4];
>>>> int Indices[4];
>>>> int left, right, lower, upper;
>>>> double diag = 4.0;
>>>>
>>>> for( int i=0 ; i<NumMyElements; ++i ) {
>>>>     int NumEntries=0;
>>>>     // MyGlobalElements[i] is the global row index of local row i
>>>>     get_neighbours( MyGlobalElements[i], nx, ny,
>>>>              left, right, lower, upper);
>>>>     if( left != -1 ) {
>>>>     Indices[NumEntries] = left;
>>>>     Values[NumEntries] = -1.0;
>>>>     ++NumEntries;
>>>>     }
>>>>     if( right != -1 ) {
>>>>       Indices[NumEntries] = right;
>>>>       Values[NumEntries] = -1.0;
>>>>       ++NumEntries;
>>>>     }
>>>>     if( lower != -1 ) {
>>>>       Indices[NumEntries] = lower;
>>>>       Values[NumEntries] = -1.0;
>>>>       ++NumEntries;
>>>>     }
>>>>     if( upper != -1 ) {
>>>>       Indices[NumEntries] = upper;
>>>>       Values[NumEntries] = -1.0;
>>>>       ++NumEntries;
>>>>     }
>>>>     // put the off-diagonal entries of this row
>>>>     A.InsertGlobalValues(MyGlobalElements[i], NumEntries, Values, Indices);
>>>>     // put in the diagonal entry
>>>>     A.InsertGlobalValues(MyGlobalElements[i], 1, &diag,
>>>>                          MyGlobalElements+i);
>>>> }
>>>>
>>>> // Finish up
>>>> A.FillComplete();
>>>>
>>>> // create x and b vectors
>>>> Epetra_Vector x(Map);
>>>> Epetra_Vector b(Map);
>>>> //b.PutScalar(100.0);
>>>> b.Random();
>>>> x.Random();
>>>> // ==================== AZTECOO INTERFACE ======================
>>>>
>>>> // create linear problem  Ax=b
>>>> Epetra_LinearProblem Problem(&A,&x,&b);
>>>> // create AztecOO instance
>>>> AztecOO Solver(Problem);
>>>>
>>>> Solver.SetAztecOption( AZ_precond, AZ_none );     // no preconditioner
>>>> //Solver.SetAztecOption( AZ_precond, AZ_Jacobi ); // Jacobi preconditioner
>>>> Solver.SetAztecOption( AZ_solver, AZ_cg );        // CG solver
>>>> //Solver.SetAztecOption( AZ_solver, AZ_gmres );   // GMRES solver
>>>> Solver.SetAztecOption( AZ_output, AZ_all );       // print residual every iteration
>>>> Solver.Iterate(100, 1E-4);                        // at most 100 iterations, tolerance 1e-4
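>>>>
>>>> // (Not part of the original posting:) one way to obtain wall-clock times
>>>> // like the ones quoted in this thread would be Epetra's timer, e.g.
>>>> //
>>>> //   #include "Epetra_Time.h"
>>>> //   Epetra_Time timer(Comm);                   // starts counting at construction
>>>> //   Solver.Iterate(100, 1E-4);
>>>> //   double solveTime = timer.ElapsedTime();    // seconds spent in the solve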
>>>>
>>>> // ==================== END OF AZTECOO INTERFACE ================
>>>>
>>>> if( Comm.MyPID() == 0 ) {
>>>>     std::cout << "Solver performed " << Solver.NumIters()
>>>>               << " iterations.\n";
>>>>     std::cout << "Norm of the true residual = " << Solver.TrueResidual()
>>>>               << std::endl;
>>>> }
>>>>
>>>> delete [] MyGlobalElements;
>>>>
>>>> #ifdef HAVE_MPI
>>>> MPI_Finalize();
>>>> #endif
>>>>
>>>> return(EXIT_SUCCESS);
>>>>
>>>> }
>>>>
>>>>
>>>> /****************************************************************************/
>>>>
>>>> void get_neighbours( const int i, const int nx, const int ny,
>>>>               int & left, int & right,
>>>>               int & lower, int & upper)
>>>> {
>>>>
>>>> int ix, iy;
>>>> ix = i%nx;
>>>> iy = (i - ix)/nx;
>>>>
>>>> if( ix == 0 )
>>>>     left = -1;
>>>> else
>>>>     left = i-1;
>>>> if( ix == nx-1 )
>>>>     right = -1;
>>>> else
>>>>     right = i+1;
>>>> if( iy == 0 )
>>>>     lower = -1;
>>>> else
>>>>     lower = i-nx;
>>>> if( iy == ny-1 )
>>>>     upper = -1;
>>>> else
>>>>     upper = i+nx;
>>>>
>>>> return;
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Trilinos-Users mailing list
>>>> Trilinos-Users at software.sandia.gov
>>>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20120822/fe4b928b/attachment.html 


More information about the Trilinos-Users mailing list