[Trilinos-Users] strange performance on simple matrix/vector product

Fri Feb 29 10:35:35 MST 2008

Daniele,

I would also recommend looping this test, at least 100 times.  This is a
small enough problem that a single run will be significantly affected by
system calls and similar overhead.  If you want good timings, you really
want the average over many operations.

Rob

Robert Hoekstra
____________________________
Electrical & Microsystems Modeling
Sandia National Laboratories
P.O. Box 5800 / MS 0316
Albuquerque, NM 87185
phone: 505-844-7627
fax: 505-284-5451
e-mail: rjhoeks at sandia.gov
web: http://www.cs.sandia.gov

-----Original Message-----
From: trilinos-users-bounces at software.sandia.gov [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Daniele Bettella
Sent: Friday, February 29, 2008 7:55 AM
To: trilinos-users at software.sandia.gov
Subject: Re: [Trilinos-Users] strange performance on simple matrix/vector product

Thanks for the answer,
1) Yes, you are correct; sorry I didn't state it in my original post.
2) This is the code used for importing; it's based on the Teuchos tutorial:

#ifdef HAVE_MPI
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
  Epetra_SerialComm Comm;
#endif
  Epetra_Map* readMap;
  Epetra_CrsMatrix* readA;
  Epetra_Vector* readx;
  Epetra_Vector* readb;
  Epetra_Vector* readxexact;
  char* matrix_file;
  int matrix_format;
  if(argc > 1){
    matrix_file = argv[1];
  }
  else{
    matrix_file = "tols4000.rua";
  }
  Trilinos_Util_ReadHb2Epetra(matrix_file, Comm, readMap, readA, readx, readb, readxexact);
  Epetra_Map map(readMap->NumGlobalElements(), 0, Comm);
  Epetra_CrsMatrix A(Copy, map, 0);

  const Epetra_Map &OriginalMap = readA->RowMatrixRowMap() ;
  assert (OriginalMap.SameAs(*readMap));
  Epetra_Export exporter(OriginalMap, map);
  A.Export(*readA, exporter, Add);
  A.FillComplete();

3) This is how I configured Trilinos on my laptop:

../configure --disable-default-packages --enable-pytrilinos --enable-teuchos --enable-epetra --enable-triutils --enable-aztecoo --with-mpi-compilers=/usr/local/bin/

On the Xeon I used basically the same command, except that I had to add the -fPIC flag to the C++, C, and Fortran flags.

As for compiling the test itself, I just use -O3 (same results with -O2).

I read that guide already, thanks for pointing it out, but as far as I can
understand, nothing in it tells me why the same test runs slower on the Xeon
than on my laptop.

Hope that helps, thanks again.

Daniele

Heroux, Michael A wrote:
> Daniele,
>
> A few questions:
>
> 1)  I assume first that you are using Epetra_CrsMatrix and its
> Multiply method.  Correct?
>
> 2) How are you importing the matrix into Epetra?
>
> 3) How are you compiling Trilinos?  Depending on how the matrix is
> imported, the Multiply method is very sensitive to either the C++ or
> the Fortran compiler optimization flags.
>
> Also, there is a performance optimization guide
>
> http://trilinos.sandia.gov/packages/epetra/EpetraPerformanceGuide.pdf
>
> Mike
>
>
> On 2/28/08 11:11 AM, "Daniele Bettella" <jagfsdfhf at libero.it> wrote:
>
>     Hello, I have a strange performance problem with Epetra, using a simple
>     multiply.
>     I have two configurations for testing purposes: the first is a
>     personal laptop with a single Core Duo T2300 and 1 GB of RAM; the second
>     is a dual Xeon quad-core E5345 with 16 GB of RAM.
>
>     As an example, I take the matrix e30r5000 from Matrix Market. The matrix
>     is imported into Trilinos and multiplied by a random vector via the
>     Epetra multiply function. I time it and get 0.0148 seconds on my laptop
>     and 0.00227 seconds on the dual Xeon. Since I'm not the only one
>     using the Xeon, I wrote another test for comparison; this time I multiply
>     "by hand", storing the matrix in CRS. This is what I got:
>     on my laptop - 0.00168 seconds
>     on the Xeon - 0.00168 seconds
>     So I get basically the same time with the manual implementation, but
>     Trilinos is noticeably slower on the dual Xeon.
>     I prepared many more matrices and tested the same program on them; on
>     many of them the Xeon has better times than the laptop for the manual
>     multiply, but the Trilinos multiply is always slower...
>
>     I understand it's quite hard to tell what the problem might be, but is
>     there something I'm missing?
>     I mean, I'm using just one core on each machine, but a core on my laptop
>     runs at 1.6 GHz, whereas the Xeon runs at 2.33 GHz... 32 bit vs 64 bit,
>     2 MB vs 8 MB of cache... and it generally shows in the product I wrote...
>     If anyone has any idea, I'd be grateful.
>
>     Thanks in advance, and sorry for my English.
>
>     Daniele
>
>     _______________________________________________
>     Trilinos-Users mailing list
>     Trilinos-Users at software.sandia.gov
>     http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
