[Trilinos-Users] strange performance on simple matrix/vector product

Fri Feb 29 10:45:47 MST 2008

Actually I already do that (exactly 100 times :D) and then take the average
time...
Too many things I forgot to mention...
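
A minimal sketch of that kind of averaged timing, for illustration only. It
assumes a hand-written CRS product rather than Epetra's Multiply, and the
names CrsMatrix, crs_multiply and average_multiply_seconds are made up for
this example. The untimed warm-up call is an extra assumption, not something
discussed in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical minimal CRS (compressed row storage) matrix; not an Epetra type.
struct CrsMatrix {
    std::size_t num_rows = 0;
    std::vector<std::size_t> row_ptr;  // size num_rows + 1
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> values;        // value of each stored entry
};

// y = A * x, written "by hand" over the CRS arrays.
void crs_multiply(const CrsMatrix& A, const std::vector<double>& x,
                  std::vector<double>& y) {
    assert(A.row_ptr.size() == A.num_rows + 1);
    y.assign(A.num_rows, 0.0);
    for (std::size_t i = 0; i < A.num_rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
}

// Average wall-clock seconds per product over `repeats` back-to-back runs.
double average_multiply_seconds(const CrsMatrix& A,
                                const std::vector<double>& x,
                                std::vector<double>& y, int repeats = 100) {
    crs_multiply(A, x, y);  // untimed warm-up (assumption, see note above)
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        crs_multiply(A, x, y);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling average_multiply_seconds(A, x, y) with the default of 100 repetitions
returns the per-product average. On a problem this small, a single timed call
is easily dominated by noise such as system calls.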

Daniele

On Fri, Feb 29, 2008 at 6:35 PM, Hoekstra, Robert J <rjhoeks at sandia.gov>
wrote:

>
> Daniele,
>
> I would also recommend looping this test at least 100 times. This is a
> small enough problem that your timings will be significantly affected by
> system calls, etc. If you want good timings, you really want the average
> of many operations.
>
> Rob
>
> Robert Hoekstra
> ____________________________
> Electrical & Microsystems Modeling
> Sandia National Laboratories
> P.O. Box 5800 / MS 0316
> Albuquerque, NM 87185
> phone: 505-844-7627
> fax: 505-284-5451
> e-mail: rjhoeks at sandia.gov
> web: http://www.cs.sandia.gov
>
>
> -----Original Message-----
> From: trilinos-users-bounces at software.sandia.gov [mailto:
> trilinos-users-bounces at software.sandia.gov] On Behalf Of Daniele Bettella
> Sent: Friday, February 29, 2008 7:55 AM
> To: trilinos-users at software.sandia.gov
> Subject: Re: [Trilinos-Users] strange performance on simple matrix/vector
> product
>
> Thanks for the answer.
> 1) Yes, you are correct; sorry I didn't state it in my original post.
> 2) This is the code used for importing; it's based on the Teuchos tutorial:
>
> #ifdef HAVE_MPI
>  MPI_Init(&argc, &argv);
>  Epetra_MpiComm Comm(MPI_COMM_WORLD);
> #else
>  Epetra_SerialComm Comm;
> #endif
>  // Filled in by the Harwell-Boeing reader below.
>  Epetra_Map* readMap;
>  Epetra_CrsMatrix* readA;
>  Epetra_Vector* readx;
>  Epetra_Vector* readb;
>  Epetra_Vector* readxexact;
>  char* matrix_file;
>  int matrix_format;
>  // Take the matrix file from the command line, else use a default.
>  if(argc > 1){
>    matrix_file = argv[1];
>  }
>  else{
>    matrix_file = "tols4000.rua";
>  }
>  Trilinos_Util_ReadHb2Epetra(matrix_file, Comm, readMap, readA, readx,
>                              readb, readxexact);
>  // Uniform linear map over all ranks, and an empty matrix built on it.
>  Epetra_Map map(readMap->NumGlobalElements(), 0, Comm);
>  Epetra_CrsMatrix A(Copy, map, 0);
>
>  // Redistribute the rows of readA onto the new map.
>  const Epetra_Map &OriginalMap = readA->RowMatrixRowMap() ;
>  assert (OriginalMap.SameAs(*readMap));
>  Epetra_Export exporter(OriginalMap, map);
>  A.Export(*readA, exporter, Add);
>  A.FillComplete();
>
> 3) This is how I configured Trilinos on my laptop:
> ../configure --disable-default-packages --enable-pytrilinos
> --enable-teuchos --enable-epetra --enable-triutils --enable-aztecoo
> --with-mpi-compilers=/usr/local/bin/
> On the Xeon I used basically the same command, except that I had to add
> the -fPIC flag to the C++, C and Fortran flags.
>
> To compile the test itself I just use -O3 (same results with -O2).
>
> I had already read that guide, thanks for pointing it out, but I found
> nothing in it that tells me why the same test runs slower on the Xeon than
> on my laptop, at least as far as I can understand.
>
> Hope that helps, thanks again.
>
> Daniele
>
> Heroux, Michael A wrote:
> > Daniele,
> >
> > A few questions:
> >
> > 1)  I assume first that you are using Epetra_CrsMatrix and its
> > Multiply method.  Correct?
> >
> > 2) How are you importing the matrix into Epetra?
> >
> > 3) How are you compiling Trilinos?  Depending on how the matrix is
> > imported, the Multiply method is very sensitive to either the C++ or
> > the Fortran compiler optimization flags.
> >
> > Also, there is a performance optimization guide
> >
> > http://trilinos.sandia.gov/packages/epetra/EpetraPerformanceGuide.pdf
> >
> > Mike
> >
> >
> > On 2/28/08 11:11 AM, "Daniele Bettella" <jagfsdfhf at libero.it> wrote:
> >
> >     Hello, I have a strange performance problem with Epetra, using a
> >     simple multiply.
> >     I have two configurations for testing purposes: the first is a
> >     personal laptop, a single Core Duo T2300 with 1 GB of RAM; the
> >     second is a dual quad-core Xeon E5345 with 16 GB of RAM.
> >
> >     I take as an example the matrix e30r5000 from Matrix Market. The
> >     matrix is imported into Trilinos and multiplied by a random vector
> >     via the Epetra multiply function. I time it and get 0.0148 seconds
> >     on my laptop and 0.00227 seconds on the dual Xeon. Since I'm not
> >     the only one using the Xeon, I wrote another test for comparison;
> >     this time I multiply "by hand", storing the matrix in CRS. This is
> >     what I got:
> >     on my laptop - 0.00168 seconds
> >     on the Xeon - 0.00168 seconds
> >     So basically I get the same time with the manual implementation,
> >     but Trilinos is noticeably slower on the dual Xeon.
> >     I prepared many more matrices and tested the same program on them;
> >     on many of them the Xeon has better times than the laptop for the
> >     manual multiply, but the Trilinos multiply is always slower...
> >
> >     I understand it's quite hard to tell what the problem might be, but
> >     is there something I'm missing?
> >     I mean, I'm using just one core on each machine, but a core on my
> >     laptop runs at 1.6 GHz, whereas the Xeon runs at 2.33 GHz... 32 bit
> >     vs 64 bit, 2 MB vs 8 MB of cache... and it generally shows in the
> >     product I wrote...
> >     If anyone has any idea I'd be grateful.
> >
> >     Thanks in advance, and sorry for my English.
> >
> >     Daniele
> >
> >     _______________________________________________
> >     Trilinos-Users mailing list
> >     Trilinos-Users at software.sandia.gov
> >     http://software.sandia.gov/mailman/listinfo/trilinos-users
> >
> >
>