[Trilinos-Users] Sending the resulting eigenvector to master node

Mehmet Salih YILDIRIM linux at isadamlari.org
Fri Jun 4 11:43:33 MDT 2010


Thanks everybody for their replies, I have found the problem. In case
somebody else has encountered or will encounter the same issue, i would like
to write down the reason and how i solved it:

The problem was because of this section of code:

Teuchos::RCP<Epetra_Vector> vectorToSend;

//and if solver manager converges:
vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));

well, because of this, when vectorToSend's destructor is called, the
resulting eigenvector is *delete*d somehow because it was the only
Teuchos::RCP<Epetra_Vector> object which points to that vector, and after
when the BasicEigenProblem's destructor is called, it was deleted *again*.
Hence it was a double freeing problem (sometimes the error message indicated
double free also.) How i solved it was simple, I changed the type of
vectorToSend from Teuchos::RCP<Epetra_Vector> to Epetra_Vector * and this
way the resulting eigenvector was not deleted upon returning from the
function because no destructors are called. So the problem is solved.

Again, thanks everybody their help.

Best regards.


On Fri, Jun 4, 2010 at 6:31 PM, Bartlett, Roscoe A <rabartl at sandia.gov>wrote:

>  Mehmet,
>
>
>
> Did you run valgrind or purify on the code?   If there is some memory usage
> error, the segfault you are seeing may have been caused by other code and
> the problem may now just be manifesting itself here.
>
>
>
> If all of the code is not following the idioms described in
>
>
>
>      http://www.cs.sandia.gov/~rabartl/TeuchosMemoryManagementSAND.pdf
>
>
>
> then you can see these types of memory usage errors that result in
> undefined behavior (that often result in segfaults at some point).
>
>
>
> This is the nature of C++ programming.
>
>
>
> - Ross
>
>
>
>
>
>
>
>
>
> *From:* trilinos-users-bounces at software.sandia.gov [mailto:
> trilinos-users-bounces at software.sandia.gov] *On Behalf Of *Mehmet Salih
> YILDIRIM
> *Sent:* Friday, June 04, 2010 6:26 AM
>
> *To:* trilinos-users at software.sandia.gov
> *Subject:* Re: [Trilinos-Users] Sending the resulting eigenvector to
> master node
>
>
>
> I am sorry for my wrong comment and sending the email twice,
> BasicEigenproblem is actually pointed by a RCP pointer. However, as i told
> before, this segfault doesn't happen when i run the code, for example like
> this:
>
>
>
>  if (0 && solverReturn == Anasazi::Converged) {
>
>          Anasazi::Eigensolution<ST, MV> solution =
> generalizedProblem->getSolution();
>
>
>
>          int numEigenPairs = solution.numVecs;
>
>  //and the rest of the code...
>
>
>
>
>
> but BasicEigenproblem is nothing to do with this part of code. So I don't
> understand why.
>
>
>
> Best regards,
>
> Mehmet Salih YILDIRIM
>
>
>
> On Fri, Jun 4, 2010 at 3:19 PM, Mehmet Salih YILDIRIM <
> linux at isadamlari.org> wrote:
>
> Hello,
>
>
>
> I tried to debug the program with gdb (I don't know how useful it would be
> since the program is supposed to be run with mpirun normally) and i got the
> following results when i backtrace:
>
>
>
> (gdb) backtrace
>
>  #0  0x0000003f7b552c38 in main_arena () from /lib64/libc.so.6
>
>  #1  0x00000000004c8d8f in Epetra_MultiVector::~Epetra_MultiVector ()
>
>  #2  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
>
>  #3  0x0000000000432262 in ~BasicEigenproblem (this=0x18c0aeb0) at
> /usr/local/include/Teuchos_RCPNode.hpp:907
>
>  #4  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
>
>  #5  0x000000000040ee8e in DistNCut::Image::doSegmentate (this=0x18bef670,
> segment=0x7fff6ef8dfd0, output=0x18bef6a0)
>
>      at /usr/local/include/Teuchos_RCPNode.hpp:907
>
>  #6  0x000000000040e898 in DistNCut::Image::doSegmentate (this=0x18bef670,
> segment=0x7fff6ef8e700, output=0x18bef6a0) at DistNCut.cpp:436
>
>  #7  0x000000000040fc38 in DistNCut::Image::segmentate (this=0x18bef670)
> at DistNCut.cpp:223
>
>  #8  0x0000000000474806 in main (argc=1, argv=0x7fff6ef8e8a8) at
> main.cpp:33
>
>  (gdb)
>
>
>
> I don't know why the RCPNodeHandle::unbindOne() is invoked (and it calls
> the destructor of BasicEigenproblem surprisingly, even though
> BasicEigenproblem was not pointed by some RCP object.) from my function
> Image::doSegmentate since i didn't call it in my code.
>
>
>
> Any ideas about why the segmentation fault happens after seeing this
> backtrace?
>
> Any help is appreciated.
>
>
>
> Best regards.
>
> Mehmet Salih YILDIRIM
>
>
>
> On Thu, Jun 3, 2010 at 8:07 PM, Mehmet Salih YILDIRIM <
> linux at isadamlari.org> wrote:
>
>   Hi,
>
>
>
> Well, I commented out the Multiply method, however nothing changed.
>
>
>
> I was unable to solve the eigenproblem before, so i decided to send a
> random vector to master node and go on, so, when i produce a random vector,
> everything works ok, but the problem is only when i try to send the
> resulting eigenvector, i'm sending the code snippet again, with the
> producing random vector part, so maybe it would be more clear:
>
>
>
>     int numMyElements;
>
>
>
>      if (Comm.MyPID() == 0)
>
>          numMyElements = segment->pixelIndices.size();
>
>       else
>
>           numMyElements = 0;
>
>
>
>      Epetra_Map targetMap(-1, numMyElements, 0, Comm);
>
>
>
>      Epetra_Vector eigenVector(targetMap); //will be collected on the
> master node.
>
>
>
>      Teuchos::RCP<Epetra_Vector> vectorToSend;
>
>
>
>      const Epetra_BlockMap *sourceMap;
>
>      if (solverReturn == Anasazi::Converged) {
>
>          Anasazi::Eigensolution<ST, MV> solution =
> generalizedProblem->getSolution();
>
>
>
>          int numEigenPairs = solution.numVecs;
>
>
>
>          Teuchos::RCP<MV> evecs = solution.Evecs;
>
>          std::vector<Anasazi::Value<ST> > evals = solution.Evals;
>
>
>
>          int ourEvIdx = 0;
>
>           if (numEigenPairs == 2) //send the second smaller eigenvector
>
>              if (evals[0].realpart < evals[1].realpart)
>
>                  ourEvIdx = 1;
>
>              else
>
>                  ourEvIdx = 0;
>
>
>
>          //evecs->Multiply(1,*invSqrtD, *evecs, 0.0); commented this one
> out, nothing changed.
>
>           vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
>
>          sourceMap = &vectorToSend->Map();
>
>      } else {
>
>          vectorToSend = Teuchos::rcp(new
> Epetra_Vector(DminusW->RowMap()));
>
>          vectorToSend->Random();
>
>           sourceMap = &vectorToSend->Map();
>
>      Epetra_Import importer(targetMap, *sourceMap);
>
>      eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector
> to processor 1
>
>
>
>
>
> Well, the error message is, as follows again:
>
>
>
> *** glibc detected *** ./opencv_deneme: corrupted double-linked list:
> 0x000000001a254ec0 ***
>
>  *** glibc detected *** ./opencv_deneme: corrupted double-linked list:
> 0x000000001d68fa00 ***
>
>   ======= Backtrace: =========
>
>  /lib64/libc.so.6[0x3f7b2723e5]
>
>  /lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
>
>   ./opencv_deneme[0x4c930f]
>
>  ./opencv_deneme[0x481ce5]
>
>  ./opencv_deneme[0x4358b2]
>
>  ./opencv_deneme[0x481ce5]
>
>  ./opencv_deneme[0x40f139]
>
>  ./opencv_deneme[0x40eacf]
>
>  ./opencv_deneme[0x40eacf]
>
>  ./opencv_deneme[0x4102f4]
>
>  ./opencv_deneme[0x474d86]
>
>   /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
>
>  ./opencv_deneme(__gxx_personality_v0+0x299)[0x40ba89]
>
>   ======= Memory map: ========
>
>  00400000-0051e000 r-xp 00000000 03:01 9864086
>  /home/proje/trilinos_ile_ilk_deneme/opencv
> deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>
>   0071d000-00724000 rw-p 0011d000 03:01 9864086
>  /home/proje/trilinos_ile_ilk_deneme/opencv
> deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>
>  1d61a000-1d722000 rw-p 1d61a000 00:00 0
>  [heap]
>
>   3a68600000-3a689f6000 r-xp 00000000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
>
>  3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
>
>  3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
>
>  3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
>
>  3a69600000-3a69cec000 r-xp 00000000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
>
>  3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
>
>  3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
>
>  3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
>
>  3a6c254000-3a6c453000 ---p 00054000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
>
>  3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
>
>  3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833
>  /usr/lib64/atlas/libcblas.so.3.0
>
>   3a6ca1e000-3a6cc1e000 ---p 0001e000 03:01 6687833
>  /usr/lib64/atlas/libcblas.so.3.0
>
>  3a6cc1e000-3a6cc1f000 rw-p 0001e000 03:01 6687833
>  /usr/lib64/atlas/libcblas.so.3.0
>
>  3a6ce00000-3a6ce1c000 r-xp 00000000 03:01 6687835
>  /usr/lib64/atlas/libf77blas.so.3.0
>
>  3a6ce1c000-3a6d01c000 ---p 0001c000 03:01 6687835
>  /usr/lib64/atlas/libf77blas.so.3.0
>
>
>
> Any help still would be appreciated :)
>
>
>
> Best regards,
>
> Mehmet Salih YILDIRIM
>
>
>
> On Thu, Jun 3, 2010 at 1:58 AM, Thornquist, Heidi K <hkthorn at sandia.gov>
> wrote:
>
>  Hi Mehmet,
>
> If you are getting a seg fault that traces back to LAPACK, then that would
> incriminate the “Multiply” method.  If you comment out
> that line, does that help?  Also, I’m not sure how safe it is to call
> multiply on the same multivector that you are including in the arguments and
> also
> asking to be zeroed out.
>
> Thanks,
> Heidi
>
>
>
>
>
> On 6/2/10 2:02 PM, "Mehmet Salih YILDIRIM" <linux at isadamlari.org> wrote:
>
> Hey!
>
> I'd been trying to send the resulting eigenvector to the master node (that
> is, with MyPID() =0) and i wrote the following code:
>
>  int numMyElements;
>
>     if (Comm.MyPID() == 0)
>         numMyElements = segment->pixelIndices.size(); //that is, the total
> number of elements of an eigenvector (also the row count of operator of
> eigenproblem)
>     else
>         numMyElements = 0;
>
>     Epetra_Map targetMap(-1, numMyElements, 0, Comm);
>
>     Epetra_Vector eigenVector(targetMap); //will be collected on the master
> node.
>
>     Teuchos::RCP<Epetra_Vector> vectorToSend;
>
>     const Epetra_BlockMap *sourceMap;
>     if (solverReturn == Anasazi::Converged) {
>         Anasazi::Eigensolution<ST, MV> solution =
> generalizedProblem->getSolution();
>
>         int numEigenPairs = solution.numVecs;
>
>         Teuchos::RCP<MV> evecs = solution.Evecs;
>         std::vector<Anasazi::Value<ST> > evals = solution.Evals;
>
>         int ourEvIdx = 0;
>         if (numEigenPairs == 2)
>             if (evals[0].realpart < evals[1].realpart)
>                 ourEvIdx = 1;
>         evecs->Multiply(1,*invSqrtD, *evecs, 0.0);
>         vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
>         sourceMap = &vectorToSend->Map();
>     }
>
>     Epetra_Import importer(targetMap, *sourceMap);
>
>     eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector
> to processor 1
>
>
> However i get a fault like : (something like segmentation fault i think)
>
> *** glibc detected *** ./opencv_deneme: corrupted double-linked list:
> 0x000000002006cec0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3f7b2723e5]
> /lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
> ./opencv_deneme[0x4c976f]
> ./opencv_deneme[0x482145]
> ./opencv_deneme[0x42a292]
> ./opencv_deneme[0x482145]
> ./opencv_deneme[0x40f0be]
> ./opencv_deneme[0x40ea52]
> ./opencv_deneme[0x40ea52]
> ./opencv_deneme[0x410294]
> ./opencv_deneme[0x4751e6]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
> ./opencv_deneme(__gxx_personality_v0+0x299)[0x40b9d9]
> ======= Memory map: ========
> 00400000-0051e000 r-xp 00000000 03:01 9864086
>  /home/proje/trilinos_ile_ilk_deneme/opencv
> deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
> 0071e000-00725000 rw-p 0011e000 03:01 9864086
>  /home/proje/trilinos_ile_ilk_deneme/opencv
> deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
> 1fff8000-20100000 rw-p 1fff8000 00:00 0
>  [heap]
> 3a68600000-3a689f6000 r-xp 00000000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
> 3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
> 3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839
>  /usr/lib64/atlas/liblapack.so.3.0
> 3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
> 3a69600000-3a69cec000 r-xp 00000000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
> 3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
> 3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846
>  /usr/lib64/atlas/libatlas.so.3.0
> 3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
> 3a6c254000-3a6c453000 ---p 00054000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
> 3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294
>  /usr/lib64/libblas.so.3.0.3
> 3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833
>  /usr/lib64/atlas/libcblas.so.3.0
>
> and so on.
>
>
>
> So how may i solve this problem? I couldn't figure out where I am
> mistaken.
>
> Any help is appreciated,
> Mehmet Salih YILDIRIM
>
>
>
>
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20100604/62289246/attachment-0001.html 


More information about the Trilinos-Users mailing list