[Trilinos-Users] Sending the resulting eigenvector to master node

Bartlett, Roscoe A rabartl at sandia.gov
Fri Jun 4 09:31:47 MDT 2010


Mehmet,

Did you run valgrind or purify on the code?   If there is some memory usage error, the segfault you are seeing may have been caused by other code and the problem may now just be manifesting itself here.

If all of the code is not following the idioms described in

     http://www.cs.sandia.gov/~rabartl/TeuchosMemoryManagementSAND.pdf

then you can see these types of memory usage errors that result in undefined behavior (that often result in segfaults at some point).

This is the nature of C++ programming.

- Ross




From: trilinos-users-bounces at software.sandia.gov [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Mehmet Salih YILDIRIM
Sent: Friday, June 04, 2010 6:26 AM
To: trilinos-users at software.sandia.gov
Subject: Re: [Trilinos-Users] Sending the resulting eigenvector to master node

I am sorry for my wrong comment and sending the email twice, BasicEigenproblem is actually pointed by a RCP pointer. However, as i told before, this segfault doesn't happen when i run the code, for example like this:

 if (0 && solverReturn == Anasazi::Converged) {
        Anasazi::Eigensolution<ST, MV> solution = generalizedProblem->getSolution();

        int numEigenPairs = solution.numVecs;
//and the rest of the code...


but BasicEigenproblem is nothing to do with this part of code. So I don't understand why.

Best regards,
Mehmet Salih YILDIRIM

On Fri, Jun 4, 2010 at 3:19 PM, Mehmet Salih YILDIRIM <linux at isadamlari.org<mailto:linux at isadamlari.org>> wrote:
Hello,

I tried to debug the program with gdb (I don't know how useful it would be since the program is supposed to be run with mpirun normally) and i got the following results when i backtrace:

(gdb) backtrace
#0  0x0000003f7b552c38 in main_arena () from /lib64/libc.so.6
#1  0x00000000004c8d8f in Epetra_MultiVector::~Epetra_MultiVector ()
#2  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
#3  0x0000000000432262 in ~BasicEigenproblem (this=0x18c0aeb0) at /usr/local/include/Teuchos_RCPNode.hpp:907
#4  0x0000000000481765 in Teuchos::RCPNodeHandle::unbindOne ()
#5  0x000000000040ee8e in DistNCut::Image::doSegmentate (this=0x18bef670, segment=0x7fff6ef8dfd0, output=0x18bef6a0)
    at /usr/local/include/Teuchos_RCPNode.hpp:907
#6  0x000000000040e898 in DistNCut::Image::doSegmentate (this=0x18bef670, segment=0x7fff6ef8e700, output=0x18bef6a0) at DistNCut.cpp:436
#7  0x000000000040fc38 in DistNCut::Image::segmentate (this=0x18bef670) at DistNCut.cpp:223
#8  0x0000000000474806 in main (argc=1, argv=0x7fff6ef8e8a8) at main.cpp:33
(gdb)

I don't know why the RCPNodeHandle::unbindOne() is invoked (and it calls the destructor of BasicEigenproblem surprisingly, even though BasicEigenproblem was not pointed by some RCP object.) from my function Image::doSegmentate since i didn't call it in my code.

Any ideas about why the segmentation fault happens after seeing this backtrace?
Any help is appreciated.

Best regards.
Mehmet Salih YILDIRIM

On Thu, Jun 3, 2010 at 8:07 PM, Mehmet Salih YILDIRIM <linux at isadamlari.org<mailto:linux at isadamlari.org>> wrote:
Hi,

Well, I commented out the Multiply method, however nothing changed.

I was unable to solve the eigenproblem before, so i decided to send a random vector to master node and go on, so, when i produce a random vector, everything works ok, but the problem is only when i try to send the resulting eigenvector, i'm sending the code snippet again, with the producing random vector part, so maybe it would be more clear:

    int numMyElements;

    if (Comm.MyPID() == 0)
        numMyElements = segment->pixelIndices.size();
    else
        numMyElements = 0;

    Epetra_Map targetMap(-1, numMyElements, 0, Comm);

    Epetra_Vector eigenVector(targetMap); //will be collected on the master node.

    Teuchos::RCP<Epetra_Vector> vectorToSend;

    const Epetra_BlockMap *sourceMap;
    if (solverReturn == Anasazi::Converged) {
        Anasazi::Eigensolution<ST, MV> solution = generalizedProblem->getSolution();

        int numEigenPairs = solution.numVecs;

        Teuchos::RCP<MV> evecs = solution.Evecs;
        std::vector<Anasazi::Value<ST> > evals = solution.Evals;

        int ourEvIdx = 0;
        if (numEigenPairs == 2) //send the second smaller eigenvector
            if (evals[0].realpart < evals[1].realpart)
                ourEvIdx = 1;
            else
                ourEvIdx = 0;

        //evecs->Multiply(1,*invSqrtD, *evecs, 0.0); commented this one out, nothing changed.
        vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
        sourceMap = &vectorToSend->Map();
    } else {
        vectorToSend = Teuchos::rcp(new Epetra_Vector(DminusW->RowMap()));
        vectorToSend->Random();
        sourceMap = &vectorToSend->Map();
    Epetra_Import importer(targetMap, *sourceMap);
    eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector to processor 1


Well, the error message is, as follows again:

*** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000001a254ec0 ***
*** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000001d68fa00 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3f7b2723e5]
/lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
./opencv_deneme[0x4c930f]
./opencv_deneme[0x481ce5]
./opencv_deneme[0x4358b2]
./opencv_deneme[0x481ce5]
./opencv_deneme[0x40f139]
./opencv_deneme[0x40eacf]
./opencv_deneme[0x40eacf]
./opencv_deneme[0x4102f4]
./opencv_deneme[0x474d86]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
./opencv_deneme(__gxx_personality_v0+0x299)[0x40ba89]
======= Memory map: ========
00400000-0051e000 r-xp 00000000 03:01 9864086                            /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
0071d000-00724000 rw-p 0011d000 03:01 9864086                            /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
1d61a000-1d722000 rw-p 1d61a000 00:00 0                                  [heap]
3a68600000-3a689f6000 r-xp 00000000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
3a69600000-3a69cec000 r-xp 00000000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6c254000-3a6c453000 ---p 00054000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833                        /usr/lib64/atlas/libcblas.so.3.0
3a6ca1e000-3a6cc1e000 ---p 0001e000 03:01 6687833                        /usr/lib64/atlas/libcblas.so.3.0
3a6cc1e000-3a6cc1f000 rw-p 0001e000 03:01 6687833                        /usr/lib64/atlas/libcblas.so.3.0
3a6ce00000-3a6ce1c000 r-xp 00000000 03:01 6687835                        /usr/lib64/atlas/libf77blas.so.3.0
3a6ce1c000-3a6d01c000 ---p 0001c000 03:01 6687835                        /usr/lib64/atlas/libf77blas.so.3.0

Any help still would be appreciated :)

Best regards,
Mehmet Salih YILDIRIM

On Thu, Jun 3, 2010 at 1:58 AM, Thornquist, Heidi K <hkthorn at sandia.gov<mailto:hkthorn at sandia.gov>> wrote:
Hi Mehmet,

If you are getting a seg fault that traces back to LAPACK, then that would incriminate the "Multiply" method.  If you comment out
that line, does that help?  Also, I'm not sure how safe it is to call multiply on the same multivector that you are including in the arguments and also
asking to be zeroed out.

Thanks,
Heidi




On 6/2/10 2:02 PM, "Mehmet Salih YILDIRIM" <linux at isadamlari.org<mailto:linux at isadamlari.org>> wrote:
Hey!

I'd been trying to send the resulting eigenvector to the master node (that is, with MyPID() =0) and i wrote the following code:
 int numMyElements;

    if (Comm.MyPID() == 0)
        numMyElements = segment->pixelIndices.size(); //that is, the total number of elements of an eigenvector (also the row count of operator of eigenproblem)
    else
        numMyElements = 0;

    Epetra_Map targetMap(-1, numMyElements, 0, Comm);

    Epetra_Vector eigenVector(targetMap); //will be collected on the master node.

    Teuchos::RCP<Epetra_Vector> vectorToSend;

    const Epetra_BlockMap *sourceMap;
    if (solverReturn == Anasazi::Converged) {
        Anasazi::Eigensolution<ST, MV> solution = generalizedProblem->getSolution();

        int numEigenPairs = solution.numVecs;

        Teuchos::RCP<MV> evecs = solution.Evecs;
        std::vector<Anasazi::Value<ST> > evals = solution.Evals;

        int ourEvIdx = 0;
        if (numEigenPairs == 2)
            if (evals[0].realpart < evals[1].realpart)
                ourEvIdx = 1;
        evecs->Multiply(1,*invSqrtD, *evecs, 0.0);
        vectorToSend = Teuchos::rcp((*evecs)(ourEvIdx));
        sourceMap = &vectorToSend->Map();
    }

    Epetra_Import importer(targetMap, *sourceMap);

    eigenVector.Import(*vectorToSend, importer, Add); // send eigenvector to processor 1

However i get a fault like : (something like segmentation fault i think)
*** glibc detected *** ./opencv_deneme: corrupted double-linked list: 0x000000002006cec0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3f7b2723e5]
/lib64/libc.so.6(cfree+0x4b)[0x3f7b27273b]
./opencv_deneme[0x4c976f]
./opencv_deneme[0x482145]
./opencv_deneme[0x42a292]
./opencv_deneme[0x482145]
./opencv_deneme[0x40f0be]
./opencv_deneme[0x40ea52]
./opencv_deneme[0x40ea52]
./opencv_deneme[0x410294]
./opencv_deneme[0x4751e6]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f7b21d994]
./opencv_deneme(__gxx_personality_v0+0x299)[0x40b9d9]
======= Memory map: ========
00400000-0051e000 r-xp 00000000 03:01 9864086                            /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
0071e000-00725000 rw-p 0011e000 03:01 9864086                            /home/proje/trilinos_ile_ilk_deneme/opencv deneme/dist/Debug/GNU-Linux-x86/opencv_deneme
1fff8000-20100000 rw-p 1fff8000 00:00 0                                  [heap]
3a68600000-3a689f6000 r-xp 00000000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a689f6000-3a68bf6000 ---p 003f6000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a68bf6000-3a68bf9000 rw-p 003f6000 03:01 6687839                        /usr/lib64/atlas/liblapack.so.3.0
3a68bf9000-3a68cfd000 rw-p 3a68bf9000 00:00 0
3a69600000-3a69cec000 r-xp 00000000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a69cec000-3a69eeb000 ---p 006ec000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a69eeb000-3a69ef5000 rw-p 006eb000 03:01 6687846                        /usr/lib64/atlas/libatlas.so.3.0
3a6c200000-3a6c254000 r-xp 00000000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6c254000-3a6c453000 ---p 00054000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6c453000-3a6c454000 rw-p 00053000 03:01 5380294                        /usr/lib64/libblas.so.3.0.3
3a6ca00000-3a6ca1e000 r-xp 00000000 03:01 6687833                        /usr/lib64/atlas/libcblas.so.3.0

and so on.


So how may i solve this problem? I couldn't figure out where I am mistaken.

Any help is appreciated,
Mehmet Salih YILDIRIM



_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at software.sandia.gov<mailto:Trilinos-Users at software.sandia.gov>
http://software.sandia.gov/mailman/listinfo/trilinos-users


-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://software.sandia.gov/pipermail/trilinos-users/attachments/20100604/556559dd/attachment-0001.html 


More information about the Trilinos-Users mailing list