[Trilinos-Users] performance of trilinos

Andrey Prokopenko prok at math.uh.edu
Wed Aug 22 08:05:14 MDT 2012


Sorry everyone, we've accidentally moved the discussion away from the list.
Here are the missing messages:

------------------------------------------------------------------------
Chen-Liang,

I don't see any problems with the code. Could you provide some info on
the cluster you use and on how you run it (i.e., the mpirun parameters)?
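For example, do you launch it with something like the following? (This is a
made-up invocation, not taken from the thread; the executable name and
hostfile are placeholders. What matters is the process count and how the
ranks get mapped onto the nodes.)

  # hypothetical launch line; names and counts are placeholders
  mpirun -np 8 -machinefile ./hosts ./laplace2d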

------------------------------------------------------------------------

Andrey:
The attachments are the make and run log files. I ran "run.sh" to compare
the speedup. The times are 78.645025 s, 71.102732 s, 66.204640 s,
52.381109 s, and 65.338082 s, respectively, so the best case is only about
a 1.5x speedup (78.645/52.381). I also tested it using PBS; the result is
similar. The PBS file is attached.
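(Roughly, run.sh does something like the following; this is an illustrative
sketch only, not the actual attachment -- the process counts and the binary
name here are guesses:)

  #!/bin/sh
  # Hypothetical sketch, for illustration only -- not the actual run.sh.
  # Run the same solver at several MPI process counts so the wall-clock
  # times can be compared; the counts and binary name are assumptions.
  for np in 1 2 4 8 16; do
      echo "=== $np MPI processes ==="
      mpirun -np $np ./laplace2d
  done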

PS:
management server:
Dell PowerEdge R710 x1
CPU: quad-core Intel Xeon E5520 x2, 2.26GHz, 8MB cache, 5.86 GT/s QPI, Turbo
MEM: 16GB (8x2GB), 1066MHz, dual-ranked UDIMMs for 2 processors
HD: 146GB x2, 3.5-inch, 15K RPM hot-plug SAS; 1TB x3, 3.5-inch, 7.2K RPM
hot-plug SAS

compute servers:
Dell PowerEdge R900 x3
CPU: six-core Intel Xeon E7450 x4, 2.40GHz, 12MB cache, 1066MHz FSB, 90W
MEM: 32GB (8x4GB), 667MHz, ECC
HD: 146GB x2, 3.5-inch, 15K RPM hot-plug SAS; 1TB x3, 3.5-inch, 7.2K RPM
hot-plug SAS



The output of pbsnodes:
compute-0-1
     state = free
     np = 24
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-1.local 2.6.18-164.6.1.el5 #1
SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
0,nusers=0,idletime=11236734,totmem=33959576kb,availmem=31820776kb,physmem=32939460kb,ncpus=24,loadave=0.00,netload=16239304497,state=free,jobs=
315.lreis-hpc.lreis.ac.cn 385.lreis-hpc.lreis.ac.cn
,varattr=,rectime=1345621337

compute-0-2
     state = free
     np = 24
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-2.local 2.6.18-164.6.1.el5 #1
SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
0,nusers=0,idletime=11236724,totmem=33959576kb,availmem=31818208kb,physmem=32939460kb,ncpus=24,loadave=0.00,netload=15779316071,state=free,jobs=
385.lreis-hpc.lreis.ac.cn 315.lreis-hpc.lreis.ac.cn
,varattr=,rectime=1345621307

compute-0-0
     state = free
     np = 24
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-0.local 2.6.18-164.6.1.el5 #1
SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=?
0,nusers=0,idletime=11236753,totmem=33959576kb,availmem=33523540kb,physmem=32939460kb,ncpus=24,loadave=0.02,netload=14747635908,state=free,jobs=,varattr=,rectime=1345621347

On Wed, Aug 22, 2012 at 2:00 AM, chenliang wang <hi181904665 at msn.com> wrote:

> On 2012-8-22 14:03, Andrey Prokopenko wrote:
>
>
> On Tue, Aug 21, 2012 at 10:32 PM, chenliang wang <hi181904665 at msn.com> wrote:
>
>> Hi, Andrey,
>> I attached the cpp file and the Makefile (to use it, the paths should be
>> modified to the right locations). Thanks!
>>
>> On 2012-8-22 11:06, Andrey Prokopenko wrote:
>>
>> Hi, Chen-Liang,
>>
>> This code does not compile. For instance, the prototype of
>> get_neighbours requires an "int" as its first argument, but in the body
>> of main the MyGlobalElements pointer is passed. The same goes for
>> A.InsertGlobalValues.
>>
>> Were some misprints introduced during the copy/paste?
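>>
>> For reference, the mismatch in question, next to the presumably intended
>> call:
>>
>>   void get_neighbours( const int i, ... );     // prototype takes an int
>>   get_neighbours( MyGlobalElements, ... );     // as posted: int* passed
>>   get_neighbours( MyGlobalElements[i], ... );  // presumably intended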
>>
>>  Andrey
>>
>> On Tue, Aug 21, 2012 at 8:38 PM, chenliang wang <hi181904665 at msn.com> wrote:
>>
>>>
>>> Hi,
>>> I use the code listed at the end of this mail to test the speedup of
>>> Trilinos (10.10.2, compiled with MKL) on our Dell cluster. To my
>>> surprise, there is no significant speedup no matter how many processors
>>> I use (only about 1.2x to 1.5x).
>>> Is that possible? Are there any rough benchmarks against which to
>>> compare the speedup?
>>>
>>> Chen-Liang Wang
>>>
>>>
>>> ****************************************************************************
>>> // The example code:
>>> // Solve a 2D Laplacian problem
>>> // This example builds the matrix and solves it with AztecOO.
>>> #include "Epetra_ConfigDefs.h"
>>> #ifdef HAVE_MPI
>>> #include "mpi.h"
>>> #include "Epetra_MpiComm.h"
>>> #else
>>> #include "Epetra_SerialComm.h"
>>> #endif
>>> #include "Epetra_Map.h"
>>> #include "Epetra_Vector.h"
>>> #include "Epetra_CrsMatrix.h"
>>> #include "AztecOO.h"
>>>
>>> // external function
>>> void get_neighbours( const int i, const int nx, const int ny,
>>>               int & left, int & right,
>>>               int & lower, int & upper);
>>>
>>> // =========== //
>>> // main driver //
>>> // =========== //
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>
>>> #ifdef HAVE_MPI
>>> MPI_Init(&argc, &argv);
>>> Epetra_MpiComm Comm(MPI_COMM_WORLD);
>>> #else
>>> Epetra_SerialComm Comm;
>>> #endif
>>>
>>> // number of nodes along the x- and y-axis
>>> int nx = 4806;
>>> int ny = 4046;
>>> int NumGlobalElements = nx * ny;
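>>> // i.e. roughly 19.4 million unknowns (4806 * 4046 = 19,445,076)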
>>>
>>> // create a linear map
>>> Epetra_Map Map(NumGlobalElements,0,Comm);
>>>
>>> // local number of rows
>>> int NumMyElements = Map.NumMyElements();
>>> // get update list
>>> int * MyGlobalElements = new int [NumMyElements];
>>> Map.MyGlobalElements( MyGlobalElements );
>>>
>>> // Create an Epetra_CrsMatrix with at most 5 nonzeros per row
>>>
>>> Epetra_CrsMatrix A(Copy,Map,5);
>>>
>>> // Add rows one-at-a-time
>>> // Need some vectors to help
>>>
>>> double Values[4];
>>> int Indices[4];
>>> int left, right, lower, upper;
>>> double diag = 4.0;
>>>
>>> for( int i=0 ; i<NumMyElements; ++i ) {
>>>     int NumEntries=0;
>>>     // as posted, the pointer MyGlobalElements was passed here, but the
>>>     // prototype takes an int; the i-th global row index is what's meant:
>>>     get_neighbours( MyGlobalElements[i], nx, ny,
>>>              left, right, lower, upper);
>>>     if( left != -1 ) {
>>>     Indices[NumEntries] = left;
>>>     Values[NumEntries] = -1.0;
>>>     ++NumEntries;
>>>     }
>>>     if( right != -1 ) {
>>>       Indices[NumEntries] = right;
>>>       Values[NumEntries] = -1.0;
>>>       ++NumEntries;
>>>     }
>>>     if( lower != -1 ) {
>>>       Indices[NumEntries] = lower;
>>>       Values[NumEntries] = -1.0;
>>>       ++NumEntries;
>>>     }
>>>     if( upper != -1 ) {
>>>       Indices[NumEntries] = upper;
>>>       Values[NumEntries] = -1.0;
>>>       ++NumEntries;
>>>     }
>>>     // put the off-diagonal entries (again, the row index is an int)
>>>     A.InsertGlobalValues(MyGlobalElements[i], NumEntries, Values, Indices);
>>>     // put in the diagonal entry
>>>     A.InsertGlobalValues(MyGlobalElements[i], 1, &diag, MyGlobalElements+i);
>>> }
>>>
>>> // Finish up
>>> A.FillComplete();
>>>
>>> // create x and b vectors
>>> Epetra_Vector x(Map);
>>> Epetra_Vector b(Map);
>>> //b.PutScalar(100.0);
>>> b.Random();
>>> x.Random();
>>> // ==================== AZTECOO INTERFACE ======================
>>>
>>> // create linear problem  Ax=b
>>> Epetra_LinearProblem Problem(&A,&x,&b);
>>> // create AztecOO instance
>>> AztecOO Solver(Problem);
>>>
>>> Solver.SetAztecOption( AZ_precond, AZ_none );   // no preconditioner
>>> //Solver.SetAztecOption( AZ_precond, AZ_Jacobi ); // Jacobi
>>> Solver.SetAztecOption( AZ_solver, AZ_cg );      // CG
>>> //Solver.SetAztecOption( AZ_solver, AZ_gmres ); // GMRES
>>> Solver.SetAztecOption( AZ_output, AZ_all );     // report every iteration
>>> Solver.Iterate(100, 1E-4);                      // max 100 iterations, tol 1e-4
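>>>
>>> // A minimal sketch (not in the original post): timing just the solve
>>> // with Epetra_Time (header "Epetra_Time.h") would separate solver
>>> // scaling from the matrix-assembly cost, e.g.:
>>> //   Epetra_Time timer(Comm);
>>> //   timer.ResetStartTime();
>>> //   Solver.Iterate(100, 1E-4);
>>> //   double solveTime = timer.ElapsedTime();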
>>>
>>> // ==================== END OF AZTECOO INTERFACE ================
>>>
>>> if( Comm.MyPID() == 0 ) {
>>>     std::cout << "Solver performed " << Solver.NumIters()
>>>               << " iterations.\n";
>>>     std::cout << "Norm of the true residual = " << Solver.TrueResidual()
>>>               << std::endl;
>>> }
>>>
>>> delete [] MyGlobalElements;
>>>
>>> #ifdef HAVE_MPI
>>> MPI_Finalize();
>>> #endif
>>>
>>> return(EXIT_SUCCESS);
>>>
>>> }
>>>
>>>
>>> /****************************************************************************/
>>>
>>> void get_neighbours( const int i, const int nx, const int ny,
>>>               int & left, int & right,
>>>               int & lower, int & upper)
>>> {
>>>
>>> // grid coordinates (ix, iy) of node i in the nx-by-ny mesh
>>> int ix, iy;
>>> ix = i%nx;
>>> iy = (i - ix)/nx;
>>>
>>> // a neighbour index of -1 marks the mesh boundary
>>> if( ix == 0 )
>>>     left = -1;
>>> else
>>>     left = i-1;
>>> if( ix == nx-1 )
>>>     right = -1;
>>> else
>>>     right = i+1;
>>> if( iy == 0 )
>>>     lower = -1;
>>> else
>>>     lower = i-nx;
>>> if( iy == ny-1 )
>>>     upper = -1;
>>> else
>>>     upper = i+nx;
>>>
>>> return;
>>>
>>> }
>>>
>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.log
Type: application/octet-stream
Size: 35996 bytes
Desc: not available
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20120822/81f468c0/attachment-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.sh
Type: application/x-sh
Size: 140 bytes
Desc: not available
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20120822/81f468c0/attachment-0001.sh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tltest.pbs
Type: application/octet-stream
Size: 255 bytes
Desc: not available
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20120822/81f468c0/attachment-0004.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 984 bytes
Desc: not available
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20120822/81f468c0/attachment-0005.obj


More information about the Trilinos-Users mailing list