[Trilinos-Users] strange performance on simple matrix/vector product
Hoekstra, Robert J
rjhoeks at sandia.gov
Fri Feb 29 10:35:35 MST 2008
Daniele,
I would also recommend looping this test, at least 100 times. This
is a small enough problem that a single run will be significantly affected by system calls, etc.
If you want good timings, you really want the average over many operations.
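
A minimal sketch of that kind of timing loop, in plain C++ so it can be tried without Trilinos. Everything in it is an assumption made for illustration: the CrsMatrix struct, multiply(), averageSeconds() and makeTridiagonal() are hypothetical names, not Epetra types or functions, and the hand-written CRS product only stands in for whatever operation is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical minimal CRS (compressed row storage) matrix; not an Epetra type.
struct CrsMatrix {
    std::size_t numRows = 0;
    std::vector<std::size_t> rowPtr;  // numRows + 1 offsets into colInd/values
    std::vector<std::size_t> colInd;  // column index of each stored entry
    std::vector<double> values;       // value of each stored entry
};

// y = A * x for a matrix stored in CRS format.
void multiply(const CrsMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
    assert(y.size() == A.numRows);
    for (std::size_t i = 0; i < A.numRows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            sum += A.values[k] * x[A.colInd[k]];
        }
        y[i] = sum;
    }
}

// Run op() once untimed as a warm-up, then time `reps` consecutive calls
// and return the average wall-clock seconds per call.
template <typename F>
double averageSeconds(F&& op, int reps) {
    op();
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / reps;
}

// Small n x n tridiagonal test matrix (2 on the diagonal, -1 beside it),
// used here only so the sketch has something to multiply.
CrsMatrix makeTridiagonal(std::size_t n) {
    CrsMatrix A;
    A.numRows = n;
    A.rowPtr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) { A.colInd.push_back(i - 1); A.values.push_back(-1.0); }
        A.colInd.push_back(i);
        A.values.push_back(2.0);
        if (i + 1 < n) { A.colInd.push_back(i + 1); A.values.push_back(-1.0); }
        A.rowPtr.push_back(A.colInd.size());
    }
    return A;
}
```

A call such as averageSeconds([&] { multiply(A, x, y); }, 100) then gives the mean time of 100 products. With Epetra, the timed operation would presumably be the call to Epetra_CrsMatrix::Multiply instead of the hand-written multiply().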
Rob
Robert Hoekstra
____________________________
Electrical & Microsystems Modeling
Sandia National Laboratories
P.O. Box 5800 / MS 0316
Albuquerque, NM 87185
phone: 505-844-7627
fax: 505-284-5451
e-mail: rjhoeks at sandia.gov
web: http://www.cs.sandia.gov
-----Original Message-----
From: trilinos-users-bounces at software.sandia.gov [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Daniele Bettella
Sent: Friday, February 29, 2008 7:55 AM
To: trilinos-users at software.sandia.gov
Subject: Re: [Trilinos-Users] strange performance on simple matrix/vector product
Thanks for the answer,
1) Yes, you are correct; sorry I didn't state it in my original post.
2) This is the code used for importing; it's based on the Teuchos tutorial:
#ifdef HAVE_MPI
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
  Epetra_SerialComm Comm;
#endif

  // Objects allocated by the Harwell-Boeing reader below
  Epetra_Map* readMap;
  Epetra_CrsMatrix* readA;
  Epetra_Vector* readx;
  Epetra_Vector* readb;
  Epetra_Vector* readxexact;
  char* matrix_file;
  int matrix_format;

  // Matrix file from the command line, or a default
  if (argc > 1) {
    matrix_file = argv[1];
  }
  else {
    matrix_file = "tols4000.rua";
  }

  Trilinos_Util_ReadHb2Epetra(matrix_file, Comm, readMap, readA, readx, readb, readxexact);

  // Redistribute the matrix onto a uniform linear map
  Epetra_Map map(readMap->NumGlobalElements(), 0, Comm);
  Epetra_CrsMatrix A(Copy, map, 0);
  const Epetra_Map &OriginalMap = readA->RowMatrixRowMap();
  assert(OriginalMap.SameAs(*readMap));
  Epetra_Export exporter(OriginalMap, map);
  A.Export(*readA, exporter, Add);
  A.FillComplete();
3) This is how I configured Trilinos on my laptop:
../configure --disable-default-packages --enable-pytrilinos --enable-teuchos --enable-epetra --enable-triutils --enable-aztecoo --with-mpi-compilers=/usr/local/bin/
On the Xeon I used basically the same command, except that I had to add the -fPIC flag to the C++, C and Fortran flags.
To compile the test itself I just use -O3 (same results with -O2).
I had already read that guide, thanks for pointing it out, but I found nothing in it that tells me why the same test runs slower on the Xeon than on my laptop,
at least as far as I can understand.
Hope that helps, thanks again
Daniele
Heroux, Michael A wrote:
> Daniele,
>
> A few questions:
>
> 1) I assume first that you are using Epetra_CrsMatrix and its
> Multiply method. Correct?
>
> 2) How are you importing the matrix into Epetra?
>
> 3) How are you compiling Trilinos? Depending on how the matrix is
> imported, the Multiply method is very sensitive to either the C++ or
> the Fortran compiler optimization flags.
>
> Also, there is a performance optimization guide
>
> http://trilinos.sandia.gov/packages/epetra/EpetraPerformanceGuide.pdf
>
> Mike
>
>
> On 2/28/08 11:11 AM, "Daniele Bettella" <jagfsdfhf at libero.it> wrote:
>
> Hello, I have a strange performance problem with Epetra, using a simple
> multiply.
> I have two configurations for testing purposes: the first is a
> personal laptop, a single Core Duo T2300 with 1GB of RAM; the second
> is a dual Xeon quad-core E5345 with 16GB of RAM.
>
> I take as an example the matrix e30r5000 from Matrix Market. The matrix
> is imported into Trilinos and multiplied, via the Epetra multiply function,
> by a random vector. I time it and get 0.0148 seconds on my laptop
> and 0.00227 seconds on the dual Xeon. Since I'm not the only one
> using the Xeon, I wrote another test for comparison; this time I multiply
> "by hand", storing the matrix in CRS. This is what I got:
> on my laptop - 0.00168 seconds
> on the Xeon - 0.00168 seconds
> So basically I get the same time with the manual implementation, but Trilinos
> is noticeably slower on the dual Xeon.
> I prepared many more matrices on which I tested the same program; on
> many of them the Xeon has better times than the laptop for the manual
> multiply, but the Trilinos multiply is always slower...
>
> I understand it's quite hard to tell what the problem might be, but is
> there something I'm missing?
> I mean, I'm using just one core on each machine, but a core on my laptop
> runs at 1.6GHz, whereas the Xeon runs at 2.33GHz... 32 bit vs 64 bit, 2MB vs
> 8MB of cache... and it generally shows in the product I wrote...
> If anyone has any idea I'd be grateful.
>
> Thanks in advance and sorry for my English
>
> Daniele
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>