[Trilinos-Users] Performance of InsertGlobalValue() in Epetra

Fri Apr 18 09:52:28 MDT 2008

Thanks Andrea,

One more question: did you run your program in parallel (more than one MPI process) or serial (1 MPI process, or no MPI at all)?

This matters because in serial the FECrsMatrix class forwards all data straight through to its base class, CrsMatrix. In parallel, each process also does extra work to temporarily store "overlapping" matrix coefficients that belong to other processes.
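
To make the serial/parallel distinction concrete, here is a minimal toy model. It is not the Epetra code: the class ToyFEAssembler and all of its methods are hypothetical names invented for illustration, and the contiguous row-ownership scheme is an assumption. It only shows the idea that coefficients for locally owned rows go straight into local storage, while coefficients for rows owned elsewhere are parked in a temporary buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy model only (hypothetical, NOT the Epetra implementation).
// This "process" owns the contiguous rows
// [firstOwnedRow, firstOwnedRow + numOwnedRows).
class ToyFEAssembler {
public:
    ToyFEAssembler(int firstOwnedRow, int numOwnedRows)
        : first_(firstOwnedRow), count_(numOwnedRows) {
        assert(numOwnedRows >= 0);
    }

    // True if this process owns the given global row.
    bool owns(int row) const {
        return row >= first_ && row < first_ + count_;
    }

    // Owned rows are stored directly; other rows are buffered so they
    // could later be sent to the owning process.
    void insert(int row, int col, double value) {
        if (owns(row)) {
            // Repeated (row, col) pairs are summed here purely to keep
            // the toy simple.
            local_[std::make_pair(row, col)] += value;
        } else {
            Entry e = {row, col, value};
            buffered_.push_back(e);
        }
    }

    std::size_t numLocalEntries() const { return local_.size(); }
    std::size_t numBufferedEntries() const { return buffered_.size(); }

private:
    struct Entry {
        int row;
        int col;
        double value;
    };

    int first_;
    int count_;
    std::map<std::pair<int, int>, double> local_;  // owned coefficients
    std::vector<Entry> buffered_;                  // "overlapping" coefficients
};
```

In this sketch, an assembler that owns every row (the serial case) never touches the buffer, while one that owns only part of the rows accumulates buffered entries for each coefficient in a row it does not own. That buffering is the extra per-process work referred to above.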

Alan

> -----Original Message-----
> From: andrea3.villa at mail.polimi.it
> [mailto:andrea3.villa at mail.polimi.it]
> Sent: Friday, April 18, 2008 7:17 AM
> To: Williams, Alan B
> Cc: trilinos-users at software.sandia.gov
> Subject: RE: [Trilinos-Users] Performance of InsertGlobalValue() in Epetra
>
> I have used this function:
>
> int Epetra_FECrsMatrix::InsertGlobalValues(int numRows,
>                                            const int* rows,
>                                            int numCols,
>                                            const int* cols,
>                                            const double* const* values,
>                                            int format = Epetra_FECrsMatrix::ROW_MAJOR)
>
> Thank you!
>
>
>
> Quoting "Williams, Alan B" <william at sandia.gov>:
>
> > Andrea,
> >
> > That is a large and unexpected performance difference.
> >
> > I looked through the FECrsMatrix code and there is no function
> > called exactly 'InsertGlobalValue'. Do you mean one of the
> > overloaded 'InsertGlobalValues' functions? There are several of
> > these, both in FECrsMatrix and its base class CrsMatrix. Can you
> > tell which of these it is?
> >
> > I'll set up a test of my own and attempt to reproduce this.
> > Alan
> >
> >
> >> -----Original Message-----
> >> From: trilinos-users-bounces at software.sandia.gov
> >> [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf
> >> Of andrea3.villa at mail.polimi.it
> >> Sent: Thursday, April 17, 2008 8:17 AM
> >> To: trilinos-users at software.sandia.gov
> >> Subject: [Trilinos-Users] Performance of InsertGlobalValue() in Epetra
> >>
> >>   Hi,
> >>
> >> I have a problem assembling a FECrsMatrix, in particular in the
> >> InsertGlobalValue() function. On my laptop, assembling a square
> >> matrix with 500,000 rows takes about 50 seconds; on my workstation
> >> it takes about 2300 seconds. I profiled the code and found that
> >> the difference is inside the InsertGlobalValue() function.
> >>
> >> The laptop is:
> >> 2.5 GHz Intel T9300 (4 MB cache), 4 GB RAM, OS: SUSE 10.3
> >>
> >> The workstation is:
> >> 2.0 GHz AMD Opteron 8212 (1 MB cache), 16 GB RAM, OS: Scientific
> >> Linux 5.0
> >>
> >> On both I've installed Trilinos 8.0.5 with these flags:
> >> CXXFLAGS="-DMPICH_SKIP_MPICXX -O3"
> >> CFLAGS="-O3"
> >> FFLAGS="-O5 -funroll-all-loops -ftree-vectorize"
> >>
> >> Has anyone experienced a similar problem?
> >> Any ideas on how to close the gap?
> >>
> >> Thanks for any help,
> >>
> >>     Andrea V.