[Trilinos-Users] Sending the resulting eigenvector to master node

Mehmet Salih YILDIRIM linux at isadamlari.org
Fri Jun 4 06:26:05 MDT 2010


I am sorry for my incorrect comment and for sending the email twice.
BasicEigenproblem is actually pointed to by an RCP. However, as I said
before, the segfault does not happen when I run the code like this, for
example:

    if (0 && solverReturn == Anasazi::Converged) {
        Anasazi::Eigensolution<ST, MV> solution =
            generalizedProblem->getSolution();
        int numEigenPairs = solution.numVecs;
        // ...and the rest of the code

But BasicEigenproblem has nothing to do with this part of the code, so I
don't understand why.

Best regards,
Mehmet Salih YILDIRIM


On Fri, Jun 4, 2010 at 3:19 PM, Mehmet Salih YILDIRIM
<linux at isadamlari.org> wrote:

> Hello,
>
> I tried to debug the program with gdb (I don't know how useful this is,
> since the program is normally run with mpirun), and I got the following
> backtrace:
>
> (gdb) backtrace
> #0  0x0000003f7b552c38 in main_arena () from /lib64/libc.so.6
> #1  0x00000000004c8d8f in Epetra_MultiVector::~Epetra_MultiVector ()
> #2  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
> #3  0x0000000000432262 in ~BasicEigenproblem (this=0x18c0aeb0) at
>     /usr/local/include/Teuchos_RCPNode.hpp:907
> #4  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
> #5  0x000000000040ee8e in DistNCut::Image::doSegmentate (this=0x18bef670,
>     segment=0x7fff6ef8dfd0, output=0x18bef6a0)
>     at /usr/local/include/Teuchos_RCPNode.hpp:907
> #6  0x000000000040e898 in DistNCut::Image::doSegmentate (this=0x18bef670,
>     segment=0x7fff6ef8e700, output=0x18bef6a0) at DistNCut.cpp:436
> #7  0x000000000040fc38 in DistNCut::Image::segmentate (this=0x18bef670) at
>     DistNCut.cpp:223
> #8  0x0000000000474806 in main (argc=1, argv=0x7fff6ef8e8a8) at main.cpp:33
> (gdb)
>
>
> I don't know why RCPNodeHandle::unbindOne() is invoked from my function
> Image::doSegmentate, since I don't call it anywhere in my code.
> Surprisingly, it also calls the destructor of BasicEigenproblem, even
> though BasicEigenproblem is not pointed to by any RCP object.
>
> Does anyone have an idea why the segmentation fault happens, given this
> backtrace? Any help is appreciated.
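The destructor calls in this backtrace do not have to be explicit. An owning RCP destroys the object it manages when its last reference is released, and local objects (including RCPs) are released when they go out of scope. So `~BasicEigenproblem` can run at the end of `doSegmentate` and, through the RCP it holds, destroy a multivector in turn. The sketch below models only that ownership chain, with `std::shared_ptr` standing in for `Teuchos::RCP`; `ToyEigenproblem`, `ToyMultiVector` and `toySegmentate` are hypothetical names, not Trilinos classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records destructor calls so their order can be inspected.
std::vector<std::string> destructionLog;

// Hypothetical stand-in for Epetra_MultiVector (ownership only).
struct ToyMultiVector {
    ~ToyMultiVector() { destructionLog.push_back("~ToyMultiVector"); }
};

// Hypothetical stand-in for Anasazi::BasicEigenproblem: it shares ownership
// of the eigenvectors, as the solution's Evecs RCP does.
struct ToyEigenproblem {
    std::shared_ptr<ToyMultiVector> evecs = std::make_shared<ToyMultiVector>();
    ~ToyEigenproblem() { destructionLog.push_back("~ToyEigenproblem"); }
};

// Stand-in for Image::doSegmentate: no destructor is called explicitly.
void toySegmentate() {
    auto problem = std::make_shared<ToyEigenproblem>();
    std::shared_ptr<ToyMultiVector> evecs = problem->evecs;  // second owner
    (void)evecs;  // ... work with the eigenvectors here ...
}  // Locals are released in reverse order: `evecs` drops the count to 1,
   // then `problem` reaches 0, so ~ToyEigenproblem runs and releasing its
   // member destroys the multivector.
```

Calling `toySegmentate()` records `~ToyEigenproblem` followed by `~ToyMultiVector`, the same nesting as frames #3 and #1 of the gdb backtrace.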
>
> Best regards.
> Mehmet Salih YILDIRIM
>
> On Thu, Jun 3, 2010 at 8:07 PM, Mehmet Salih YILDIRIM <
> linux at isadamlari.org> wrote:
>
>> Hi,
>>
>> I commented out the Multiply call, but nothing changed.
>>
>> I was unable to solve the eigenproblem before, so I decided to send a
>> random vector to the master node and continue. When I produce a random
>> vector, everything works; the problem occurs only when I try to send the
>> resulting eigenvector. Here is the code snippet again, including the part
>> that produces the random vector, so that it may be clearer:
>>
>>     int numMyElements;
>>     if (Comm.MyPID() == 0)
>>         numMyElements = segment->pixelIndices.size();
>>     else
>>         numMyElements = 0;
>>
>>     Epetra_Map targetMap(-1, numMyElements, 0, Comm);
>>     Epetra_Vector eigenVector(targetMap); // will be collected on the master node
>>     Teuchos::RCP<Epetra_Vector> vectorToSend;
>>     const Epetra_BlockMap *sourceMap;
>>
>>     if (solverReturn == Anasazi::Converged) {
>>         Anasazi::Eigensolution<ST, MV> solution =
>>             generalizedProblem->getSolution();
>>         int numEigenPairs = solution.numVecs;
>>         Teuchos::RCP<MV> evecs = solution.Evecs;
>>         std::vector<Anasazi::Value<ST> > evals = solution.Evals;
>>
>>         int ourEvIdx = 0;
>>         if (numEigenPairs == 2) // send the second-smallest eigenvector
>>             if (evals[0].realpart < evals[1].realpart)
>>                 ourEvIdx = 1;
>>             else
>>                 ourEvIdx = 0;
>>
>>         // evecs->Multiply(1, *invSqrtD, *evecs, 0.0); // commented out, nothing changed
>>         vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
>>         sourceMap = &vectorToSend->Map();
>>     } else {
>>         vectorToSend = Teuchos::rcp(new Epetra_Vector(DminusW->RowMap()));
>>         vectorToSend->Random();
>>         sourceMap = &vectorToSend->Map();
>>     }
>>
>>     Epetra_Import importer(targetMap, *sourceMap);
>>     eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector to the master node (PID 0)
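A likely explanation, assuming `Epetra_MultiVector::operator()(int)` returns a pointer to a column view that the multivector itself owns and deletes in its destructor: passing that pointer to `Teuchos::rcp(...)` creates a second owner, so the same `Epetra_Vector` is deleted twice, once when `vectorToSend` is released and again when the eigenproblem's multivector is destroyed. That fits the backtrace (the crash is inside `~Epetra_MultiVector`, reached from `~BasicEigenproblem`), the fact that the `new Epetra_Vector(...)` in the `else` branch (a single owner) works, and the `if (0 && ...)` variant not crashing (that branch never runs). In Epetra terms the usual remedies would be a non-owning RCP, `Teuchos::rcp((*evecs)(ourEvIdx), false)`, or an independent copy, `Teuchos::rcp(new Epetra_Vector(Copy, *evecs, ourEvIdx))`. The standard-library sketch below models both options under that ownership assumption; `ToyVector`, `ToyMultiVector`, `borrowColumn` and `copyColumn` are hypothetical names and do not reproduce Epetra.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Counts destroyed column views, to check each is deleted exactly once.
int columnViewsDestroyed = 0;

// Hypothetical stand-in for Epetra_Vector: either a view of a column or an owned copy.
class ToyVector {
public:
    explicit ToyVector(const std::vector<double>* viewOf) : viewOf_(viewOf) {}
    explicit ToyVector(std::vector<double> values) : values_(std::move(values)) {}
    ~ToyVector() { if (viewOf_) ++columnViewsDestroyed; }
    const std::vector<double>& data() const { return viewOf_ ? *viewOf_ : values_; }
private:
    std::vector<double> values_;
    const std::vector<double>* viewOf_ = nullptr;
};

// Hypothetical stand-in for Epetra_MultiVector: column(i) hands out a raw
// pointer to a view that the multivector keeps and deletes itself.
class ToyMultiVector {
public:
    explicit ToyMultiVector(std::vector<std::vector<double>> columns)
        : data_(std::move(columns)), views_(data_.size(), nullptr) {}
    ~ToyMultiVector() {
        for (ToyVector* v : views_) delete v;  // the multivector's own cleanup
    }
    ToyMultiVector(const ToyMultiVector&) = delete;
    ToyMultiVector& operator=(const ToyMultiVector&) = delete;

    ToyVector* column(int i) {
        if (views_[i] == nullptr) views_[i] = new ToyVector(&data_[i]);
        return views_[i];
    }
private:
    std::vector<std::vector<double>> data_;
    std::vector<ToyVector*> views_;
};

// Buggy pattern, for contrast only (do NOT do this):
//   std::shared_ptr<ToyVector> bad(mv.column(i));
// Both `bad` and ~ToyMultiVector would delete the same view.

// Option 1: non-owning handle (like Teuchos::rcp(ptr, false)).
// Only valid while `mv` is alive.
std::shared_ptr<ToyVector> borrowColumn(ToyMultiVector& mv, int i) {
    return std::shared_ptr<ToyVector>(mv.column(i), [](ToyVector*) {});
}

// Option 2: independent deep copy, usable after `mv` is gone.
std::shared_ptr<ToyVector> copyColumn(ToyMultiVector& mv, int i) {
    return std::make_shared<ToyVector>(mv.column(i)->data());
}
```

The non-owning handle avoids a copy but must not outlive the multivector (here, the eigenproblem holding the solution); the copy costs one extra vector and has no such restriction.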
>>
>>
>>
>> The error message is again as follows:
>>
>> *** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000001a254ec0 ***
>> *** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000001d68fa00 ***
>> ======= Backtrace: =========
>> /lib64/libc.so.6[0x3f7b2723e5]
>> /lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
>> ./opencv_deneme[0x4c930f]
>> ./opencv_deneme[0x481ce5]
>> ./opencv_deneme[0x4358b2]
>> ./opencv_deneme[0x481ce5]
>> ./opencv_deneme[0x40f139]
>> ./opencv_deneme[0x40eacf]
>> ./opencv_deneme[0x40eacf]
>> ./opencv_deneme[0x4102f4]
>> ./opencv_deneme[0x474d86]
>> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
>> ./opencv_deneme(__gxx_personality_v0+0x299)[0x40ba89]
>> ======= Memory map: ========
>> 00400000-0051e000 r-xp 00000000 03:01 9864086      /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>> 0071d000-00724000 rw-p 0011d000 03:01 9864086      /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>> 1d61a000-1d722000 rw-p 1d61a000 00:00 0            [heap]
>> 3a68600000-3a689f6000 r-xp 00000000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>> 3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>> 3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>> 3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
>> 3a69600000-3a69cec000 r-xp 00000000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>> 3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>> 3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>> 3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>> 3a6c254000-3a6c453000 ---p 00054000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>> 3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>> 3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833  /usr/lib64/atlas/libcblas.so.3.0
>> 3a6ca1e000-3a6cc1e000 ---p 0001e000 03:01 6687833  /usr/lib64/atlas/libcblas.so.3.0
>> 3a6cc1e000-3a6cc1f000 rw-p 0001e000 03:01 6687833  /usr/lib64/atlas/libcblas.so.3.0
>> 3a6ce00000-3a6ce1c000 r-xp 00000000 03:01 6687835  /usr/lib64/atlas/libf77blas.so.3.0
>> 3a6ce1c000-3a6d01c000 ---p 0001c000 03:01 6687835  /usr/lib64/atlas/libf77blas.so.3.0
>>
>>
>> Any help would still be appreciated :)
>>
>> Best regards,
>> Mehmet Salih YILDIRIM
>>
>> On Thu, Jun 3, 2010 at 1:58 AM, Thornquist, Heidi K <hkthorn at sandia.gov> wrote:
>>
>>>  Hi Mehmet,
>>>
>>> If you are getting a seg fault that traces back to LAPACK, then that
>>> would incriminate the “Multiply” method.  If you comment out that line,
>>> does that help?  Also, I’m not sure how safe it is to call Multiply on the
>>> same multivector that you are including in the arguments and also asking
>>> to be zeroed out.
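On the aliasing question: the four-argument `Multiply(ScalarAB, A, B, ScalarThis)` used in the original snippet is an element-wise product, so one way to keep the output from doubling as an operand is to write into a separate multivector, for example `Epetra_MultiVector scaled(evecs->Map(), evecs->NumVectors()); scaled.Multiply(1.0, *invSqrtD, *evecs, 0.0);`, assuming `invSqrtD` is an `Epetra_Vector` on the same map. The plain C++ sketch below shows the same computation (each eigenvector scaled row by row by a diagonal such as D^{-1/2}) on column-major data; `Columns` and `scaleRows` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Column-major "multivector": x[j][i] is row i of vector j.
using Columns = std::vector<std::vector<double>>;

// Returns out[j][i] = d[i] * x[j][i]. The result goes into a fresh
// container, so the input is never read and written at the same time.
Columns scaleRows(const std::vector<double>& d, const Columns& x) {
    Columns out(x.size());
    for (std::size_t j = 0; j < x.size(); ++j) {
        if (x[j].size() != d.size())
            throw std::invalid_argument("vector length does not match diagonal");
        out[j].resize(d.size());
        for (std::size_t i = 0; i < d.size(); ++i)
            out[j][i] = d[i] * x[j][i];
    }
    return out;
}
```

Because the input is only read, the unscaled eigenvectors remain available, which can help when checking whether the scaling step is involved in a crash.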
>>>
>>> Thanks,
>>> Heidi
>>>
>>>
>>>
>>>
>>> On 6/2/10 2:02 PM, "Mehmet Salih YILDIRIM" <linux at isadamlari.org> wrote:
>>>
>>> Hey!
>>>
>>> I have been trying to send the resulting eigenvector to the master node
>>> (that is, the process with MyPID() == 0), and I wrote the following code:
>>>
>>>     int numMyElements;
>>>
>>>     if (Comm.MyPID() == 0)
>>>         // the total number of elements of an eigenvector
>>>         // (also the row count of the eigenproblem's operator)
>>>         numMyElements = segment->pixelIndices.size();
>>>     else
>>>         numMyElements = 0;
>>>
>>>     Epetra_Map targetMap(-1, numMyElements, 0, Comm);
>>>
>>>     Epetra_Vector eigenVector(targetMap); // will be collected on the master node
>>>
>>>     Teuchos::RCP<Epetra_Vector> vectorToSend;
>>>
>>>     const Epetra_BlockMap *sourceMap;
>>>     if (solverReturn == Anasazi::Converged) {
>>>         Anasazi::Eigensolution<ST, MV> solution =
>>>             generalizedProblem->getSolution();
>>>
>>>         int numEigenPairs = solution.numVecs;
>>>
>>>         Teuchos::RCP<MV> evecs = solution.Evecs;
>>>         std::vector<Anasazi::Value<ST> > evals = solution.Evals;
>>>
>>>         int ourEvIdx = 0;
>>>         if (numEigenPairs == 2)
>>>             if (evals[0].realpart < evals[1].realpart)
>>>                 ourEvIdx = 1;
>>>         evecs->Multiply(1, *invSqrtD, *evecs, 0.0);
>>>         vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
>>>         sourceMap = &vectorToSend->Map();
>>>     }
>>>
>>>     Epetra_Import importer(targetMap, *sourceMap);
>>>
>>>     eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector to the master node (PID 0)
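The pattern in this snippet (a target map that places every global index on PID 0, followed by an `Epetra_Import` with the `Add` combine mode) amounts to gathering the distributed vector on the root. The serial sketch below models only the data movement and does no MPI communication; `LocalPiece` and `gatherToRoot` are hypothetical names. It assumes each global index is owned by exactly one process, in which case adding into a zero-initialized target is the same as copying.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// One process's share of a distributed vector: the global indices it owns
// and the matching values (a stand-in for the source map and vector).
struct LocalPiece {
    std::vector<int> globalIds;
    std::vector<double> values;
};

// Serial model of importing into a root-only target map with Add:
// every piece adds its values into the root vector at their global indices.
std::vector<double> gatherToRoot(const std::vector<LocalPiece>& pieces, int globalLength) {
    std::vector<double> root(globalLength, 0.0);  // target starts zeroed
    for (const LocalPiece& piece : pieces) {
        if (piece.globalIds.size() != piece.values.size())
            throw std::invalid_argument("ids and values differ in length");
        for (std::size_t k = 0; k < piece.globalIds.size(); ++k) {
            int gid = piece.globalIds[k];
            if (gid < 0 || gid >= globalLength)
                throw std::out_of_range("global id outside target map");
            root[gid] += piece.values[k];
        }
    }
    return root;
}
```

Note that in this first version of the snippet, `sourceMap` and `vectorToSend` are left unset when the solver does not converge; the later version adds an `else` branch for that case.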
>>>
>>>
>>> However, I get a fault like the following (some kind of segmentation fault, I think):
>>>
>>> *** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000002006cec0 ***
>>> ======= Backtrace: =========
>>> /lib64/libc.so.6[0x3f7b2723e5]
>>> /lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
>>> ./opencv_deneme[0x4c976f]
>>> ./opencv_deneme[0x482145]
>>> ./opencv_deneme[0x42a292]
>>> ./opencv_deneme[0x482145]
>>> ./opencv_deneme[0x40f0be]
>>> ./opencv_deneme[0x40ea52]
>>> ./opencv_deneme[0x40ea52]
>>> ./opencv_deneme[0x410294]
>>> ./opencv_deneme[0x4751e6]
>>> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
>>> ./opencv_deneme(__gxx_personality_v0+0x299)[0x40b9d9]
>>> ======= Memory map: ========
>>> 00400000-0051e000 r-xp 00000000 03:01 9864086      /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>>> 0071e000-00725000 rw-p 0011e000 03:01 9864086      /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
>>> 1fff8000-20100000 rw-p 1fff8000 00:00 0            [heap]
>>> 3a68600000-3a689f6000 r-xp 00000000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>>> 3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>>> 3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839  /usr/lib64/atlas/liblapack.so.3.0
>>> 3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
>>> 3a69600000-3a69cec000 r-xp 00000000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>>> 3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>>> 3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846  /usr/lib64/atlas/libatlas.so.3.0
>>> 3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>>> 3a6c254000-3a6c453000 ---p 00054000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>>> 3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294  /usr/lib64/libblas.so.3.0.3
>>> 3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833  /usr/lib64/atlas/libcblas.so.3.0
>>>
>>> and so on.
>>>
>>>
>>>
>>> So how can I solve this problem? I couldn't figure out where my mistake
>>> is.
>>>
>>> Any help is appreciated,
>>> Mehmet Salih YILDIRIM
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>
>