[Trilinos-Users] Epetra with MPT

SungHwan Choi sunghwanchoi91 at gmail.com
Tue Jul 5 23:02:16 EDT 2016


Dear all,
I am trying to run Anasazi with the Epetra library on an SGI machine. MPT is
SGI's MPI implementation. During diagonalization, MPT aborted with the error
message below. The failure seems to come not from my own code but from
Epetra's communication layer. What makes it stranger is that the first few
eigenvalue iterations work fine, and the program only shuts down at a certain
point. If anyone has hints, please let me know. I don't know whether this is a
problem with MPT or with Epetra, and I have no experience with this situation.
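
For reference, I notice that the traceback below shows PMPI_Allreduce being
entered with count=0 on the aborting ranks, while the inner MPT frames use
count=1. Below is a minimal sketch (not my application code; the count
mismatch is only my guess at what might be going on) of how ranks disagreeing
on the MPI_Allreduce count could lead to a "Message truncated on receive"
error. Please correct me if this reading of the traceback is wrong.

// Minimal sketch, not my application code: ranks disagreeing on the
// MPI_Allreduce count is (as far as I understand) one way to make the
// MPI library report "Message truncated on receive".
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int send = rank;
  int recv = 0;
  // Hypothetical mismatch: rank 0 reduces one element, the other ranks
  // reduce none, so the internal point-to-point receives can see more
  // data than they expect.
  int count = (rank == 0) ? 1 : 0;
  MPI_Allreduce(&send, &recv, count, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

  std::printf("rank %d: recv = %d\n", rank, recv);
  MPI_Finalize();
  return 0;
}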
Sincerely
Sunghwan Choi

MPT ERROR: rank:2, function:MPI_ALLREDUCE, Message truncated on receive:
sender sent too much data
MPT ERROR: rank:1, function:MPI_ALLREDUCE, Message truncated on receive:
sender sent too much data
MPT: Global rank 2 is aborting with error code 0.
     Process ID: 7497, Host: r8i0n1, Program:
/home/shchoi/program/ACE-Molecule.0626/ace

MPT: --------stack traceback-------
MPT: Global rank 1 is aborting with error code 0.
     Process ID: 10053, Host: r8i0n0, Program:
/home/shchoi/program/ACE-Molecule.0626/ace

MPT: --------stack traceback-------
MPT: Attaching to program: /proc/7497/exe, process 7497
MPT: Try: zypper install -C
"debuginfo(build-id)=1866bf73eac40a9b336d493c5620d2611ef06ed9"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=f0721cb50ab9fbdf06314a53bff5af581bbefe64"
MPT: (no debugging symbols found)...done.
MPT: done.
MPT: Try: zypper install -C
"debuginfo(build-id)=2f51a06469a025d507534fe292dcf4e02235bd18"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=3b149eccd897f1f37dce50ad22614043eba757a2"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=48172710254f4e2549684d7d3e9f9622272d6c66"
MPT: (no debugging symbols found)...done.
MPT: [New LWP 7522]
MPT: [New LWP 7520]
MPT: [New LWP 7518]
MPT: [New LWP 7516]
MPT: [New LWP 7514]
MPT: [New LWP 7512]
MPT: [New LWP 7510]
MPT: [New LWP 7508]
MPT: [New LWP 7506]
MPT: [New LWP 7503]
MPT: [New LWP 7501]
MPT: [New LWP 7499]
MPT: [Thread debugging using libthread_db enabled]
MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
MPT: Try: zypper install -C
"debuginfo(build-id)=e2cab3c95cb1189420734b4af264b047355be2e5"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=732292820e69f70459cb927ade5b49bc56d32b0f"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=9fdc592b21682a31f460f6f043f50eea8c8b6821"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=bf68e74bb76519b8748d888d18b5d3b2c0b58593"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=e1a13ecb56367b69b89d1c9ca1a4c42167336030"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=b9520e6d84e0c308008b8f365e221fe8b4414043"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=64c277cdb888f64f47291cb237fc7b5dc4c0dac4"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=8b72ed29ee8ae44b89251097bf571d29b0438d04"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=a9c9a6309b08729440240fb0b19855a3cc5eef72"
MPT: (no debugging symbols found)...done.
MPT: 0x00002aaab3ca93bf in waitpid () from /lib64/libpthread.so.0
MPT: (gdb) #0  0x00002aaab3ca93bf in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaab31bc44c in mpi_sgi_system (command=<optimized out>) at
sig.c:97
MPT: #2  MPI_SGI_stacktraceback (header=<optimized out>) at sig.c:281
MPT: #3  0x00002aaab311804a in print_traceback (ecode=0) at abort.c:176
MPT: #4  0x00002aaab311839c in MPI_SGI_abort () at abort.c:87
MPT: #5  0x00002aaab314dc82 in errors_are_fatal (comm=<optimized out>,
MPT:     code=<optimized out>) at errhandler.c:224
MPT: #6  0x00002aaab314dfe3 in MPI_SGI_error (comm=6, code=15) at
errhandler.c:60
MPT: #7  0x00002aaab31b63da in MPI_SGI_request_test
(request=0x7fffffff6c64,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>, set=0x7fffffff6c60,
MPT:     rc=0x7fffffff6c5c) at req.c:1450
MPT: #8  0x00002aaab31b6441 in MPI_SGI_request_wait
(request=0x7fffffff6c64,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>, set=0x7fffffff6c60,
MPT:     gen_rc=0x7fffffff6c5c) at req.c:1711
MPT: #9  0x00002aaab31c01fd in MPI_SGI_recv (buf=<optimized out>,
MPT:     count=<optimized out>, type=<optimized out>, des=<optimized out>,
MPT:     tag=<optimized out>, comm=<optimized out>,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>) at sugar.c:40
MPT: #10 0x00002aaab311f6c8 in allreduce_idoub (sendbuf=0x7fffffff6f50,
MPT:     recvbuf=0x7fffffff7184, count=1, type=3, op=<optimized out>,
comm=6)
MPT:     at allreduce.c:650
MPT: #11 0x00002aaab311fd48 in MPI_SGI_allreduce (sendbuf=0x7fffffff6f50,
MPT:     recvbuf=0x7fffffff7184, count=1, type=3, op=2, comm=6) at
allreduce.c:513
MPT: #12 0x00002aaab31204d1 in MPI_SGI_allreduce (sendbuf=0x7fffffff7180,
MPT:     recvbuf=0x7fffffff7184, count=1, type=3, op=2, comm=1) at
allreduce.c:451
MPT: Attaching to program: /proc/10053/exe, process 10053
MPT: Try: zypper install -C
"debuginfo(build-id)=1866bf73eac40a9b336d493c5620d2611ef06ed9"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=f0721cb50ab9fbdf06314a53bff5af581bbefe64"
MPT: (no debugging symbols found)...done.
MPT: done.
MPT: Try: zypper install -C
"debuginfo(build-id)=2f51a06469a025d507534fe292dcf4e02235bd18"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=3b149eccd897f1f37dce50ad22614043eba757a2"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=48172710254f4e2549684d7d3e9f9622272d6c66"
MPT: (no debugging symbols found)...done.
MPT: [New LWP 10077]
MPT: [New LWP 10075]
MPT: [New LWP 10073]
MPT: [New LWP 10071]
MPT: [New LWP 10069]
MPT: [New LWP 10067]
MPT: [New LWP 10065]
MPT: [New LWP 10063]
MPT: [New LWP 10061]
MPT: [New LWP 10059]
MPT: [New LWP 10058]
MPT: [New LWP 10056]
MPT: [Thread debugging using libthread_db enabled]
MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
MPT: Try: zypper install -C
"debuginfo(build-id)=e2cab3c95cb1189420734b4af264b047355be2e5"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=732292820e69f70459cb927ade5b49bc56d32b0f"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=9fdc592b21682a31f460f6f043f50eea8c8b6821"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=bf68e74bb76519b8748d888d18b5d3b2c0b58593"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=e1a13ecb56367b69b89d1c9ca1a4c42167336030"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=b9520e6d84e0c308008b8f365e221fe8b4414043"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=64c277cdb888f64f47291cb237fc7b5dc4c0dac4"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=8b72ed29ee8ae44b89251097bf571d29b0438d04"
MPT: (no debugging symbols found)...done.
MPT: Try: zypper install -C
"debuginfo(build-id)=a9c9a6309b08729440240fb0b19855a3cc5eef72"
MPT: (no debugging symbols found)...done.
MPT: 0x00002aaab3ca93bf in waitpid () from /lib64/libpthread.so.0
MPT: (gdb) #0  0x00002aaab3ca93bf in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaab31bc44c in mpi_sgi_system (command=<optimized out>) at
sig.c:97
MPT: #2  MPI_SGI_stacktraceback (header=<optimized out>) at sig.c:281
MPT: #3  0x00002aaab311804a in print_traceback (ecode=0) at abort.c:176
MPT: #4  0x00002aaab311839c in MPI_SGI_abort () at abort.c:87
MPT: #5  0x00002aaab314dc82 in errors_are_fatal (comm=<optimized out>,
MPT:     code=<optimized out>) at errhandler.c:224
MPT: #6  0x00002aaab314dfe3 in MPI_SGI_error (comm=5, code=15) at
errhandler.c:60
MPT: #7  0x00002aaab31b63da in MPI_SGI_request_test
(request=0x7fffffff6e74,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>, set=0x7fffffff6e70,
MPT:     rc=0x7fffffff6e6c) at req.c:1450
MPT: #8  0x00002aaab31b6441 in MPI_SGI_request_wait
(request=0x7fffffff6e74,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>, set=0x7fffffff6e70,
MPT:     gen_rc=0x7fffffff6e6c) at req.c:1711
MPT: #9  0x00002aaab31c01fd in MPI_SGI_recv (buf=<optimized out>,
MPT:     count=<optimized out>, type=<optimized out>, des=<optimized out>,
MPT:     tag=<optimized out>, comm=<optimized out>,
MPT:     status=0x2aaab344cbd0 <mpi_sgi_status_ignore>) at sugar.c:40
MPT: #10 0x00002aaab312aed7 in MPI_SGI_bcast_basic (comm=<optimized out>,
MPT:     root=<optimized out>, type=<optimized out>, count=<optimized out>,
MPT:     buffer=<optimized out>) at bcast.c:242
MPT: #11 MPI_SGI_bcast (buffer=0x7fffffff7184, count=1, type=3, root=0,
comm=5)
MPT:     at bcast.c:397
MPT: #12 0x00002aaab311fe66 in MPI_SGI_allreduce (sendbuf=0x7fffffff7180,
MPT:     recvbuf=0x7fffffff7184, count=1, type=3, op=2, comm=1) at
allreduce.c:461
MPT: #13 0x00002aaab3120e58 in PMPI_Allreduce (sendbuf=0x7fffffff7180,
MPT:     recvbuf=0x7fffffff7184, count=0, type=3, op=2, comm=1) at
allreduce.c:97
MPT: #14 0x00002aaaaca9c35f in Epetra_MpiComm::MinAll(int*, int*, int)
const ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #15 0x00002aaaac9c3930 in Epetra_BlockMap::IsDistributedGlobal(long
long, int) const () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #16 0x00002aaaac9c4468 in Epetra_BlockMap::ConstructUserLinear(long
long, int, int, long long, Epetra_Comm const&, bool) ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #17 0x00002aaaac9c435b in Epetra_BlockMap::Epetra_BlockMap(int, int,
int, int, Epetra_Comm const&) ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #18 0x00002aaaaca59148 in Epetra_Map::Epetra_Map(int, int, int,
Epetra_Comm const&) () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #19 0x00002aaaaca58ba6 in Epetra_LocalMap::Epetra_LocalMap(int, int,
Epetra_Comm const&) () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #20 0x0000000000572c6a in Anasazi::MultiVecTraits<double,
Epetra_MultiVector>::MvTimesMatAddMv (alpha=0, A=..., B=...,
beta=9.532824124368238e-130, mv=...)
MPT:     at
/home/shchoi/program/trilinos-12.6.3/include/AnasaziEpetraAdapter.hpp:1038
MPT: #21 0x000000000056a91a in Anasazi::LOBPCG<double, Epetra_MultiVector,
Epetra_Operator>::iterate (this=0x1d8d)
MPT:     at
/home/shchoi/program/trilinos-12.6.3/include/AnasaziLOBPCG.hpp:1870
MPT: #22 0x00000000004e9b12 in Anasazi::modifiedLOBPCGSolMgr<double,
Epetra_MultiVector, Epetra_Operator>::solve (this=0x1d8d, Ritz_vectors=...,
Ritz_values=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Core/Diagonalize/AnasazimodifiedLOBPCGSolMgr.hpp:1120
MPT: #23 0x0000000000495e2e in Pure_Diagonalize::diagonalize (this=0x1d8d,
MPT:     matrix=..., numev=0,
MPT:     initial_eigenvector=<error reading variable: Cannot access memory
at address 0xffffffffffffffff>,
MPT:     overlap_matrix=<error reading variable: Cannot access memory at
address 0x0>)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Core/Diagonalize/Pure_Diagonalize.cpp:163
MPT: #24 0x0000000000691f9b in Scf::iterate (this=0x1d8d,
initial_state=...,
MPT:     final_state=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/Scf.cpp:335
MPT: #25 0x000000000068ee1b in Scf::compute (this=0x1d8d, mesh=...,
states=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/Scf.cpp:133
MPT: #26 0x0000000000686109 in main (argc=7565, argv=0x7fffffff666c)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/main.cpp:363
MPT: (gdb) A debugging session is active.
MPT:
MPT:  Inferior 1 [process 7497] will be detached.
MPT:
MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
MPT: Detaching from program: /proc/7497/exe, process 7497
MPT: #13 0x00002aaab3120e58 in PMPI_Allreduce (sendbuf=0x7fffffff7180,
MPT:     recvbuf=0x7fffffff7184, count=0, type=3, op=2, comm=1) at
allreduce.c:97
MPT: #14 0x00002aaaaca9c35f in Epetra_MpiComm::MinAll(int*, int*, int)
const ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #15 0x00002aaaac9c3930 in Epetra_BlockMap::IsDistributedGlobal(long
long, int) const () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #16 0x00002aaaac9c4468 in Epetra_BlockMap::ConstructUserLinear(long
long, int, int, long long, Epetra_Comm const&, bool) ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #17 0x00002aaaac9c435b in Epetra_BlockMap::Epetra_BlockMap(int, int,
int, int, Epetra_Comm const&) ()
MPT:    from /home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #18 0x00002aaaaca59148 in Epetra_Map::Epetra_Map(int, int, int,
Epetra_Comm const&) () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #19 0x00002aaaaca58ba6 in Epetra_LocalMap::Epetra_LocalMap(int, int,
Epetra_Comm const&) () from
/home/shchoi/program/trilinos-12.6.3/lib/libepetra.so.12
MPT: #20 0x0000000000572c6a in Anasazi::MultiVecTraits<double,
Epetra_MultiVector>::MvTimesMatAddMv (alpha=0, A=..., B=...,
beta=9.532824124368238e-130, mv=...)
MPT:     at
/home/shchoi/program/trilinos-12.6.3/include/AnasaziEpetraAdapter.hpp:1038
MPT: #21 0x000000000056a91a in Anasazi::LOBPCG<double, Epetra_MultiVector,
Epetra_Operator>::iterate (this=0x2787)
MPT:     at
/home/shchoi/program/trilinos-12.6.3/include/AnasaziLOBPCG.hpp:1870
MPT: #22 0x00000000004e9b12 in Anasazi::modifiedLOBPCGSolMgr<double,
Epetra_MultiVector, Epetra_Operator>::solve (this=0x2787, Ritz_vectors=...,
Ritz_values=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Core/Diagonalize/AnasazimodifiedLOBPCGSolMgr.hpp:1120
MPT: #23 0x0000000000495e2e in Pure_Diagonalize::diagonalize (this=0x2787,
MPT:     matrix=..., numev=0,
MPT:     initial_eigenvector=<error reading variable: Cannot access memory
at address 0xffffffffffffffff>,
MPT:     overlap_matrix=<error reading variable: Cannot access memory at
address 0x0>)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Core/Diagonalize/Pure_Diagonalize.cpp:163
MPT: #24 0x0000000000691f9b in Scf::iterate (this=0x2787,
initial_state=...,
MPT:     final_state=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/Scf.cpp:335
MPT: #25 0x000000000068ee1b in Scf::compute (this=0x2787, mesh=...,
states=...)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/Scf.cpp:133
MPT: #26 0x0000000000686109 in main (argc=10119, argv=0x7fffffff687c)
MPT:     at
/home/shchoi/program/ACE-Molecule.0626/source/Compute/main.cpp:363
MPT: (gdb) A debugging session is active.
MPT:
MPT:  Inferior 1 [process 10053] will be detached.
MPT:
MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
MPT: Detaching from program: /proc/10053/exe, process 10053

MPT: -----stack traceback ends-----

MPT: -----stack traceback ends-----
MPT: MPI_COMM_WORLD rank 1 has terminated without calling MPI_Finalize()
 aborting job