[Trilinos-Users] Trilinos-Users Digest, Vol 101, Issue 9
Riccardo Rossi
rrossi at cimne.upc.edu
Wed Jan 22 00:53:30 MST 2014
Just as a comment regarding possible future Epetra-Tpetra interfaces:
in my opinion it would be very nice to have a performant (ideally
zero-copy) translation of Epetra CRS matrices to Tpetra, particularly since, as I
understand it, Tpetra lacks the FECrs matrix capabilities, which are very
handy for FE programs.
Having said this, I should also say that I have never used Tpetra, but it
would indeed be nice to have a simple way to try out possible performance
improvements without writing a lot of code...
greetings
Riccardo
On Tue, Jan 21, 2014 at 8:00 PM, <trilinos-users-request at software.sandia.gov
> wrote:
>
> Today's Topics:
>
> 1. Re: Simple way to convert Epetra CrsMatrix to Tpetra
> CrsMatrix (Hoemmen, Mark)
> 2. Re: aztec00 problem (Heroux, Mike)
> 3. Re: [EXTERNAL] Re: Trilinos Build Problem
> (Gary.Myers.Contractor at unnpp.gov)
> 4. CMAKE issues in PyTrilinos (Bunde, Kermit A)
> 5. Re: aztec00 problem (Matthias Heil)
> 6. Re: aztec00 problem (Heroux, Mike)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 20 Jan 2014 21:03:37 +0000
> From: "Hoemmen, Mark" <mhoemme at sandia.gov>
> Subject: Re: [Trilinos-Users] Simple way to convert Epetra CrsMatrix
> to Tpetra CrsMatrix
> To: "<trilinos-users at software.sandia.gov>"
> <trilinos-users at software.sandia.gov>
> Message-ID: <FFFE87AD-71A4-4BEC-88C9-E80F68155C69 at sandia.gov>
> Content-Type: text/plain; charset="Windows-1252"
>
> On Jan 20, 2014, at 12:00 PM, <trilinos-users-request at software.sandia.gov>
> <trilinos-users-request at software.sandia.gov> wrote:
> > Date: Mon, 20 Jan 2014 14:02:08 +0000
> > From: Timo Betcke <t.betcke at ucl.ac.uk>
> > Subject: [Trilinos-Users] Simple way to convert Epetra CrsMatrix to
> > Tpetra CrsMatrix
> > To: trilinos-users at software.sandia.gov
> >
> > Dear Trilinos Users,
> >
> > I am starting to move some code from Epetra to Tpetra. To begin, I'd like to
> > create Tpetra Crs matrices given Epetra Crs matrices as input. Is there a
> > simpler way to do this than iterating through the rows of the
> > Epetra matrix and copying elements over to a Tpetra matrix?
>
> Hi Timo --
>
> You could use the setAllValues() and expertStaticFillComplete() methods of
> Tpetra::CrsMatrix to speed up the process. I won't say that this is
> "simpler" but it may perform better. Here is how it would work:
>
> 1. Create the Epetra_CrsMatrix A_e.
>
> 2. Call FillComplete on A_e.
>
> 3. Extract raw pointers to the data in A_e, and their lengths.
>
> int* ptr;
> int* ind;
> double* val;
>
> // The three pointers are passed by reference and set by the call.
> int info = A_e.ExtractCrsDataPointers (ptr, ind, val);
> if (info != 0) {
>   // ... report error and exit ...
> }
> const int numRows = A_e.Graph ().NumMyRows ();
> const int nnz = A_e.Graph ().NumMyEntries ();
>
> 4. Copy data into new Teuchos::ArrayRCP arrays. (Note that ptr2 and ptr
> store data of different types.)
>
> Teuchos::ArrayRCP<size_t> ptr2 (numRows+1);
> Teuchos::ArrayRCP<int> ind2 (nnz);
> Teuchos::ArrayRCP<double> val2 (nnz);
>
> std::copy (ptr, ptr + numRows + 1, ptr2.begin ());
> std::copy (ind, ind + nnz, ind2.begin ());
> std::copy (val, val + nnz, val2.begin ());
>
> 5. Create a Tpetra::CrsMatrix A_t with the appropriate row (and column)
> Map(s), just as you created A_e above.
>
> 6. Give A_t the new data arrays:
>
> A_t.setAllValues (ptr2, ind2, val2);
>
> 7. Call fillComplete on A_t. You may also use expertStaticFillComplete if
> you already have Import (and Export) objects constructed.
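
As a rough sketch of the copy in steps 3 and 4 above, without any Trilinos
dependency: the snippet below uses std::vector as a stand-in for
Teuchos::ArrayRCP, and the names CrsArrays and copyCrs are hypothetical,
made up for illustration. The point it shows is that the row offsets change
type (int to size_t) during the copy, while indices and values keep theirs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three CRS arrays of a local matrix.
struct CrsArrays {
  std::vector<std::size_t> ptr;  // row offsets, numRows + 1 entries
  std::vector<int> ind;          // local column indices, nnz entries
  std::vector<double> val;       // matrix values, nnz entries
};

// Copy raw CRS arrays with int offsets into containers whose offsets
// are size_t. Assumes ptr has numRows + 1 entries and ptr[numRows] == nnz.
CrsArrays copyCrs(const int* ptr, const int* ind, const double* val,
                  int numRows) {
  assert(numRows >= 0);
  const int nnz = ptr[numRows];  // the last offset is the entry count
  CrsArrays out;
  out.ptr.resize(static_cast<std::size_t>(numRows) + 1);
  out.ind.resize(static_cast<std::size_t>(nnz));
  out.val.resize(static_cast<std::size_t>(nnz));
  std::copy(ptr, ptr + numRows + 1, out.ptr.begin());  // int -> size_t
  std::copy(ind, ind + nnz, out.ind.begin());
  std::copy(val, val + nnz, out.val.begin());
  return out;
}
```

For the 3x3 matrix [[4,1,0],[0,5,0],[2,0,6]], the inputs would be
ptr = {0, 2, 3, 5}, ind = {0, 1, 1, 0, 2} and val = {4, 1, 5, 2, 6}.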
>
> If there is sufficient interest, we may also provide a Tpetra::RowMatrix
> interface to an Epetra_CrsMatrix object. This would let you use Ifpack2
> with Epetra objects.
>
> mfh
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 20 Jan 2014 21:18:01 +0000
> From: "Heroux, Mike" <MHeroux at csbsju.edu>
> Subject: Re: [Trilinos-Users] aztec00 problem
> To: Matthias Heil <matthias.heil at manchester.ac.uk>,
> "trilinos-bugs at software.sandia.gov"
> <trilinos-bugs at software.sandia.gov>,
> "trilinos-users at software.sandia.gov"
> <trilinos-users at software.sandia.gov>
> Cc: Andrew Hazel <ahazel at maths.manchester.ac.uk>, David Wilke
> <david.wilke at student.adelaide.edu.au>
> Message-ID: <CF02E861.B2724%mheroux at csbsju.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Matthias,
>
> Do you have a sense of whether the data sizes you are using would
> result in array indexing that exceeds 2.1 billion? The kinds of issues you
> are seeing would be consistent with trying to address an array using an
> integer value that is bigger than what a signed int can handle.
>
> The hex values you are printing are very large (more than 140 trillion),
> which seems to indicate an incorrect address calculation somewhere. I
> agree that the memory manager should detect the issue, no matter what.
>
> Mike
>
> On 1/20/14 8:24 AM, "Matthias Heil" <matthias.heil at manchester.ac.uk>
> wrote:
>
> >Hi,
> >
> > we've come across a possible bug in trilinos aztecoo.
> >The code seg faults when trying to execute the line
> >
> > *dst_ptr++ = s;
> >
> >in
> >
> >trilinos-11.4.3-Source/packages/epetra/src/Epetra_CrsMatrix.cpp:3327
> >
> >An attempt to de-reference that pointer (in ddd) shows:
> >
> >(gdb) print *dst_ptr
> >Cannot access memory at address 0x8001cb466110
> >
> >Moving back through the call stack shows that the
> >memory is initially allocated in AZ_manage_memory(...)
> >which is called from just under
> >
> > trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:239
> >
> >The problem arises only for large values of kspace which is related
> >to the max number of iterations. We've set this to a rather large
> >value of 5000 (We don't usually need that many, BUT the code
> >should hopefully still be able to handle this or fail
> >gracefully. Things work ok for smaller values, e.g kspace=1000).
> >
> >Following the return from this call, the memory allocated in
> >AZ_manage_memory(...) gets distributed into two vectors, hh
> >and v, and it's v that contains the illegal memory address:
> >Placing a breakpoint in
> >
> > trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:248
> >
> >(just after that loop) and interrogating various values of v yields:
> >
> >(gdb) print v[5000]
> >$1 = (double *) 0x8001cb466110
> >
> >and, predictably:
> >
> >(gdb) print *v[5000]
> >Cannot access memory at address 0x8001cb466110
> >
> >whereas
> >
> >(gdb) print *v[500]
> >$4 = 0
> >
> >is fine.
> >
> >Trial and error shows that things go wrong beyond entry 518:
> >
> >(gdb) print *v[519]
> >Cannot access memory at address 0x7ffff0bf14f0
> >(gdb) print *v[518]
> >$8 = 0
> >
> >Further information:
> >
> > -- All code was completely built from source, using gcc
> > without optimisation and with -g.
> >
> > -- Based on a (small) sample of machines, the problem only
> > arises on 64 bit machines (not 32)
> >
> > -- The problem only arises for sufficiently big problem sizes
> > (though they are still way short of the machines' total
> > available memory). When running on a machine with very
> > little memory, the call to AZ_manage_memory(...) fails
> > gracefully with the "maybe you should try a smaller problem"
> > message.
> >
> > -- The problem arises with both serial and parallel installations
> > (i.e. when the code is compiled with and without mpi support)
> > and with different trilinos releases.
> >
> > -- The problem is difficult to isolate further since we use
> > trilinos from within our own big library (which provides the
> > preconditioner). Note that our code works fine if we use our
> > own (serial) GMRES solver (or a direct solver).
> >
> > Does any of this ring a bell?
> >
> > Happy to run further tests here or provide additional diagnostic
> >information.
> >
> > Best wishes,
> >
> > Matthias
> >
> >--
> >--------------------------------------------------------------------------
> >-
> >Professor Matthias Heil
> >
> >Alan Turing Building, Room 2.224
> >School of Mathematics Tel. +44 (0)161 275 5808
> >University of Manchester Fax. +44 (0)161 275 5819
> >Oxford Road email: M.Heil at maths.man.ac.uk
> >Manchester M13 9PL WWW: http://www.maths.man.ac.uk/~mheil/
> >U.K.
> >
> >NEWS: The beta release of oomph-lib, the object-oriented
> > multi-physics finite-element library is now available
> > as free open-source software at
> >
> > http://www.oomph-lib.org
> >
> >--------------------------------------------------------------------------
> >-
> >
> >_______________________________________________
> >Trilinos-Users mailing list
> >Trilinos-Users at software.sandia.gov
> >http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 21 Jan 2014 09:33:47 -0500
> From: <Gary.Myers.Contractor at unnpp.gov>
> Subject: Re: [Trilinos-Users] [EXTERNAL] Re: Trilinos Build Problem
> To: <Gary.Myers.Contractor at unnpp.gov>, <bmpersc at sandia.gov>,
> <trilinos-users at software.sandia.gov>
> Message-ID:
> <7FC4686EDC94604991223BE416ADD1DA0D3BF0 at UIBETPEXV1.ias.unrpnet.gov
> >
> Content-Type: text/plain; charset="us-ascii"
>
> Brent,
>
> As you suspected, there is a conflict with an older version of Trilinos,
> which was installed in the same area as the Boost installation I referenced
> in the build of Trilinos 11.4.2.
>
> I managed to separate Boost from the old Trilinos installation and
> successfully tested the simple combined teuchos and sacado package build
> of Trilinos 11.4.2.
>
> THANKS VERY MUCH FOR YOUR HELP!
>
> Gary
>
> -----Original Message-----
> From: trilinos-users-bounces at software.sandia.gov
> [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of
> Gary.Myers.Contractor at unnpp.gov
> Sent: Monday, January 20, 2014 1:05 PM
> To: bmpersc at sandia.gov; trilinos-users at software.sandia.gov
> Subject: Re: [Trilinos-Users] [EXTERNAL] Re: Trilinos Build Problem
>
> Brent,
>
> I am in the process of getting the full verbose compile over to our
> computers which interface with the internet.
>
> In the meantime, I have successfully built and tested teuchos by itself
> from Trilinos 11.4.2. Then I attempted to build and test just teuchos
> and sacado. This attempt failed in a manner consistent with
> what I have been getting with the full build. I looked at the
> Teuchos_config.h file and the TEUCHOSCORE_LIB_DLL_EXPORT macro is not
> defined.
>
> Now, there is a possibility that my environmental pathnames could be
> pulling in an older version. Let me check into this and see if this is
> causing an issue.
>
> I will get back to you on this.
>
> Thanks,
>
> Gary
>
> -----Original Message-----
> From: Perschbacher, Brent M [mailto:bmpersc at sandia.gov]
> Sent: Monday, January 20, 2014 12:29 PM
> To: Myers, Gary T (Contractor); bundeka at id.doe.gov;
> trilinos-users at software.sandia.gov
> Subject: Re: [EXTERNAL] Re: [Trilinos-Users] Trilinos Build Problem
>
> Gary,
> That macro being empty is fine; in fact, it is expected to be empty
> more often than not. However, the error in question states that the
> macro isn't defined, even as empty. Since the preprocessor didn't
> replace it, the compiler is trying to figure it out based on context and
> isn't able to.
> Getting the build files from you would help greatly, specifically the
> full verbose compile line and error message. I'm not sure if this is the
> cause of your issues, but do you have a previously installed copy of
> Trilinos somewhere on your machine? Some Linux distributions have been
> shipping older versions of Trilinos for a while now. It is possible
> that an older copy of Teuchos_DLLExportMacro.h is being picked up
> instead of the copy in your source tree. When teuchos was refactored
> into subpackages, that file was changed from a generated file
> to a static copy. This was the solution we settled on
> for teuchos when we discovered how the subpackages interacted with our
> Windows DLL build.
>
> Brent
>
> On 1/17/14 10:15 AM, "Gary.Myers.Contractor at unnpp.gov"
> <Gary.Myers.Contractor at unnpp.gov> wrote:
>
> >Brent and Kermit,
> >
> >Thanks for the suggestions. Although it goes against what I want in
> >my final product, I did set BUILD_SHARED_LIBS to OFF. However, I still
> >get a similar error in Teuchos. In either case for BUILD_SHARED_LIBS,
> >I think TEUCHOSCORE_LIB_DLL_EXPORT is being defined, but as an empty
> >macro. However, because this macro is being used in some declarations,
> >it is producing compile errors. It's not clear to me how I am digging
> >myself into this hole, so I am in the process of getting output files
> >over to a computer so I can share them with everyone. In the meantime,
> >here is my configuration command line, which I am typing into this
> >e-mail (so there may be some typos). Because many items are referenced
> >from nonstandard areas, they need to be explicitly defined in the cmake
> >command line.
> >
> >cmake -D BUILD_SHARED_LIBS:BOOL=ON \
> > -D CMAKE_BUILD_TYPE:STRING='Release' \
> > -D CMAKE_INSTALL_PREFIX:PATH= (path name here) \
> > -D Trilinos_ENABLE_ALL_PACKAGES:BOOL=ON \
> > -D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
> > -D Trilinos_ENABLE_TESTS:BOOL=ON \
> > -D Trilinos_ENABLE_SECONDARY_STABLE_CODE:BOOL=ON \
> > -D Trilinos_ENABLE_Fortran:BOOL=ON \
> > -D TPL_ENABLE_MPI:BOOL=ON \
> > -D MPI_BASE_DIR:PATH= (path name here) \
> > -D MPI_EXEC:FILEPATH= (path name here) \
> > -D MPI_C_COMPILER:FILEPATH= (path name to openmpi 1.4.5 mpicc) \
> > -D MPI_CXX_COMPILER:FILEPATH= (path name to openmpi 1.4.5 mpicxx) \
> > -D MPI_Fortran_COMPILER:FILEPATH= (path name to openmpi 1.4.5 mpif90) \
> > -D TPL_ENABLE_Boost:BOOL=ON \
> > -D Boost_LIBRARY_DIRS:PATH= (path name here) \
> > -D Boost_INCLUDE_DIRS:PATH= (path name here) \
> > -D TPL_ENABLE_MKL:BOOL=ON \
> > -D MKL_LIBRARY_DIRS:FILEPATH= (path name to intel64 libraries) \
> > -D MKL_INCLUDE_DIRS:FILEPATH= (path name to intel64 includes) \
> > -D MKL_LIBRARY_NAMES:STRING='mkl_rt' \
> > -D DOXYGEN_EXECUTABLE:FILEPATH= (path name) \
> > -D Netcdf_INCLUDE_DIRS:PATH= (path name) \
> > -D Netcdf_LIBRARY_DIRS:PATH= (path name) \
> > -D Matio_INCLUDE_DIRS:PATH= (path name) \
> > -D Matio_LIBRARY_DIRS:PATH= (path name) \
> > -D PYTHON_EXECUTABLE:FILEPATH= (path name to python 2.7.5) \
> > -D PYTHON_INCLUDE_DIRS:PATH= (path name to python 2.7.5 includes) \
> > -D PYTHON_LIBRARIES:PATH= (path list to python 2.7 libraries) \
> > -D BLAS_LIBRARY_DIRS:PATH= (path to intel64 library ) \
> > -D BLAS_LIBRARY_NAMES:STRING='mkl_rt' \
> > -D LAPACK_LIBRARY_DIRS:PATH= (path to intel64 library ) \
> > -D LAPACK_LIBRARY_NAMES:STRING='mkl_rt' \
> > (path name to source distribution for trilinos 11.4.2)
> >
> >Error occurs in line 82 of Teuchos_ScalarTraits.hpp - this declaration
> >has no storage class or type specifier
> > TEUCHOSCORE_LIB_DLL_EXPORT
> >
> >Error occurs in line 83 of Teuchos_ScalarTraits.hpp - expected a ;
> > void throwScalarTraitsNanInfError( const std::string &errMsg );
> >
> >. . . . etc. . . .
> >
> >Please note that if I disable Teuchos I am successful, but then I lose
> >too many other packages which are important to me. Any additional
> >guidance of how I can dig out of this hole would be great.
> >
> >Regards,
> >
> >Gary T. Myers
> >Principal Scientist
> >Bechtel Marine Propulsion Corp.
> >Gary.Myers.contractor at unnpp.gov
> >
> >
> >
> >
> >
> >-----Original Message-----
> >From: Perschbacher, Brent M [mailto:bmpersc at sandia.gov]
> >Sent: Thursday, January 16, 2014 4:00 PM
> >To: Myers, Gary T (Contractor); bundeka at id.doe.gov;
> >trilinos-users at software.sandia.gov
> >Subject: Re: [EXTERNAL] Re: [Trilinos-Users] Trilinos Build Problem
> >
> >Gary,
> > You really don't want to force _WIN32 to be set if you aren't on a
> >Windows machine creating a shared build. The TEUCHOSCORE_LIB_DLL_EXPORT
> >macro is used to specify which classes/functions will be publicly
> >available inside DLLs. However, it should be defined, but as empty, on
> >non-Windows machines. The last few lines of Teuchos_DLLExportMacro.h
> >should have that assignment to empty. I have an idea what might be
> >causing this, but I'm not sure why it wouldn't have caused us trouble
> >before if that is the case. I will look into it. In the meantime, you can
> >either build a static build (i.e., don't set BUILD_SHARED_LIBS to ON) or
> >you could add the empty define manually with CMAKE_CXX_FLAGS and
> >CMAKE_C_FLAGS.
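
To illustrate the pattern described above, here is a minimal sketch of an
export macro that is always defined, but expands to nothing outside a
Windows shared build. This is not the content of Teuchos_DLLExportMacro.h;
the names MYLIB_DLL_EXPORT, MYLIB_BUILD_SHARED and describePlatform are
hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro names, for illustration only.
#if defined(_WIN32) && defined(MYLIB_BUILD_SHARED)
#  define MYLIB_DLL_EXPORT __declspec(dllexport)
#else
#  define MYLIB_DLL_EXPORT  // defined, but expands to nothing
#endif

// Because the macro is defined on every platform, the preprocessor always
// replaces it and this declaration stays valid. If the macro were left
// undefined, the compiler would see an unknown identifier here and report
// errors like the ones quoted in this thread.
MYLIB_DLL_EXPORT std::string describePlatform();

std::string describePlatform() {
#if defined(_WIN32)
  return "windows";
#else
  return "non-windows";
#endif
}
```

The manual workaround mentioned above would presumably amount to passing a
flag such as -DTEUCHOSCORE_LIB_DLL_EXPORT= to the compiler (with GCC-style
compilers, a trailing "=" defines the macro as empty). Treat that exact
flag as an assumption, not something confirmed in this thread.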
> >
> >Brent
> >
> >On 1/16/14 1:30 PM, "Gary.Myers.Contractor at unnpp.gov"
> ><Gary.Myers.Contractor at unnpp.gov> wrote:
> >
> >>Kermit,
> >>
> >>Yes, I already have this set. But I suspect that _WIN32 is not set
> >>(which my intuition suggests is correct), since I am not on a Win32-based
> >>computer. I am on a 64-bit Linux cluster. I guess I could force
> >>_WIN32 to be defined and see what happens.
> >>
> >>Gary
> >>
> >>
> >>
> >>-----Original Message-----
> >>From: Bunde, Kermit A [mailto:bundeka at id.doe.gov]
> >>Sent: Thursday, January 16, 2014 3:02 PM
> >>To: Myers, Gary T (Contractor)
> >>Subject: RE: [Trilinos-Users] Trilinos Build Problem
> >>
> >>Gary,
> >>
> >>Please ignore my last reply. In doing a little Google searching:
> >>
> >>http://msdn.microsoft.com/en-us/library/b0084kay.aspx lists
> >>_WIN32 as a predefined macro.
> >>
> >>Try adding this to your do_configure script:
> >>
> >> -D BUILD_SHARED_LIBS:BOOL=ON \
> >>
> >>Kermit Bunde
> >>Enforcement Coordinator
> >>Criticality Safety SME
> >>Nuclear Safety SME
> >>DOE-ID Aviation Safety Officer
> >>208-526-5188 (office)
> >>208-526-1926 (fax)
> >>208-680-6843 (cell)
> >>"Accept the challenges so that you may feel the exhilaration of
> >>victory."
> >>
> >>Never tell people how to do things. Tell them what to do and they will
>
> >>surprise you with their ingenuity."
> >>
> >>--George S. Patton Jr.,
> >>American Army general
> >>
> >>
> >>
> >>From: trilinos-users-bounces at software.sandia.gov
> >>[mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of
> >>Gary.Myers.Contractor at unnpp.gov
> >>Sent: Thursday, January 16, 2014 8:42 AM
> >>To: trilinos-users at software.sandia.gov
> >>Subject: [Trilinos-Users] Trilinos Build Problem
> >>
> >>
> >>
> >>Hi,
> >>
> >>Newbie question here: first-time build of Trilinos using CMake.
> >>
> >>I am building Trilinos 11.4.2 on a Linux cluster using Intel compilers,
> >>OpenMPI, MKL, . . .
> >>
> >>CMake configuration completes, but Teuchos fails to compile because the
> >>macro TEUCHOSCORE_LIB_DLL_EXPORT is not explicitly defined. The first
> >>reference occurs on line 82 of Teuchos_ScalarTraits.hpp.
> >>
> >>In looking at Teuchos_DLLExportMacro.h, I could see how this could
> >>happen since _WIN32 probably is not defined.
> >>
> >>Can someone suggest how I can get past this? (Please note that I am not
> >>in a position to easily share files since the Linux cluster is not on
> >>the grid.)
> >>
> >>Thanks,
> >>
> >>Gary T. Myers
> >>Principal Scientist
> >>Bechtel Marine Propulsion Corp.
> >>Gary.Myers.contractor at unnpp.gov
> >>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 21 Jan 2014 14:52:37 +0000
> From: "Bunde, Kermit A" <bundeka at id.doe.gov>
> Subject: [Trilinos-Users] CMAKE issues in PyTrilinos
> To: "'trilinos-users at software.sandia.gov'"
> <trilinos-users at software.sandia.gov>
> Message-ID:
> <1A714649FC7A5A44ACF6FE45DAFD86275823471D at IDXCH1.id.doe.lcl>
> Content-Type: text/plain; charset="us-ascii"
>
> I believe that there is an error in the code fragment from the
> CMakeLists.txt file in "trilinos-11.4.3-Source\packages\PyTrilinos\src "
> directory below:
>
> #
> # On Mac OS X Gnu compilers, add dynamic lookup for undefined symbols
> # to the pytrilinos library and PyTrilinos extension modules
> SET(EXTRA_LINK_ARGS "${CMAKE_SHARED_LINKER_FLAGS}")
> IF(APPLE)
> IF((CMAKE_CXX_COMPILER_ID MATCHES) "GNU" OR (CMAKE_CXX_COMPILER_ID
> MATCHES "Clang"))
> SET(EXTRA_LINK_ARGS "${EXTRA_LINK_ARGS} -undefined dynamic_lookup")
> ENDIF()
> ENDIF(APPLE)
>
>
> This line:
> IF((CMAKE_CXX_COMPILER_ID MATCHES) "GNU" OR (CMAKE_CXX_COMPILER_ID MATCHES
> "Clang"))
>
> Should be:
> IF((CMAKE_CXX_COMPILER_ID MATCHES "GNU") OR (CMAKE_CXX_COMPILER_ID MATCHES
> "Clang"))
>
> Also I am getting a parse error of this form from one of the lines below
> from the same file:
> Parse error. Function missing ending ")".
> Instead found unterminated string with text ")
> ".
> #
> # Add the additional "make clean" files
> GET_DIRECTORY_PROPERTY(clean_files ADDITIONAL_MAKE_CLEAN_FILES)
> LIST(APPEND clean_files ${ADDITIONAL_CLEAN_FILES})
> LIST(REMOVE_DUPLICATES clean_files)
> LIST(REMOVE_ITEM clean_files "")
> SET_DIRECTORY_PROPERTIES(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES
> "${clean_files}")
>
>
> Thanks for your help.
> Kermit Bunde
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 21 Jan 2014 15:58:38 +0000
> From: Matthias Heil <matthias.heil at manchester.ac.uk>
> Subject: Re: [Trilinos-Users] aztec00 problem
> To: "Heroux, Mike" <MHeroux at CSBSJU.EDU>
> Cc: Andrew Hazel <ahazel at maths.manchester.ac.uk>, David Wilke
> <david.wilke at student.adelaide.edu.au>,
> "trilinos-users at software.sandia.gov"
> <trilinos-users at software.sandia.gov>,
> "trilinos-bugs at software.sandia.gov"
> <trilinos-bugs at software.sandia.gov>
> Message-ID: <52DE992E.5070901 at manchester.ac.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Mike,
>
> we've made some progress. After some more digging we've
> established that the offending call to AZ_manage_memory is
> made with the following arguments:
>
> AZ_manage_memory (input_size=295202080,
> action=0,
> type=-914901,
> name=0x7fffffffd720 "vblock in gmres0",
> status=0x7fffffffd6c8)
>
>
> at trilinos-11.4.3-Source/packages/aztecoo/src/az_util.c:944
> when "viewed" from within AZ_manage_memory.
>
> However, looking at the calling code, the first argument, input_size,
> is derived from: kspace=5000; aligned_N_total=222084;
> sizeof(double)=8, so it should be (5000+1)*222084*8=8885136672.
>
> Andrew Hazel then wrote the small test code, below, which shows that
> 295202080 is the value given to an unsigned int that stores the result
> of the calculation. The problem appears to be that the first argument
> to AZ_manage_memory is an unsigned int, rather than an unsigned
> long (or some other custom type).
>
> Matthias
>
>
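> #include <iostream>
>
> int main()
> {
>   unsigned kspace = 5000;
>   unsigned aligned_N_total = 222084;
>
>   // Stored in an unsigned int (32 bits here): the result wraps modulo 2^32.
>   unsigned temp = (kspace+1)*aligned_N_total*sizeof(double);
>   // Stored in an unsigned long (64 bits on 64-bit Linux): the full value.
>   unsigned long temp2 = (kspace+1)*aligned_N_total*sizeof(double);
>
>   // On a 64-bit Linux machine this prints: 295202080 8885136672
>   std::cout << temp << " " << temp2 << "\n";
> }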
>
>
>
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 21 Jan 2014 17:05:20 +0000
> From: "Heroux, Mike" <MHeroux at csbsju.edu>
> Subject: Re: [Trilinos-Users] aztec00 problem
> To: Matthias Heil <matthias.heil at manchester.ac.uk>
> Cc: Andrew Hazel <ahazel at maths.manchester.ac.uk>, David Wilke
> <david.wilke at student.adelaide.edu.au>,
> "trilinos-bugs at software.sandia.gov"
> <trilinos-bugs at software.sandia.gov>,
> "trilinos-users at software.sandia.gov"
> <trilinos-users at software.sandia.gov>
> Message-ID: <CF04042E.B28A6%mheroux at csbsju.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Matthias,
>
> AztecOO has been "upgraded" to handle larger problems. We can now use it
> for problems where the global integer data is "long long", but we decided
> to avoid a complete transition to 64-bit ints. Instead, the Belos
> package, along with Tpetra is our long-term solution for this issue.
>
> I am recording the result of this conversation with trilinos-bugs, so we
> can get the fix on the queue.
>
> Thanks.
>
> Mike
>
> On 1/21/14 12:00 PM, "Matthias Heil" <matthias.heil at manchester.ac.uk>
> wrote:
>
> >Mike,
> >
> > thanks for your quick reply. I agree that if this
> >
> >"AztecOO is not designed to work with problems
> >where the size of any local data objects is beyond the range of signed
> >32-bit ints."
> >
> >is the policy (which is sensible -- if potentially inconvenient/confusing
> >for the user who can't necessarily assess what happens
> >internally) then it's not a bug. I'd noticed some recent changes
> >to aztec's source code where ints had been upgraded to long ints,
> >and inferred (wrongly!) that there was a general attempt to allow
> >it to handle bigger problems. I had hoped that we might simply
> >need a similar tweak here but do realise that there would almost
> >certainly be additional problems lurking further "downstream".
> >
> >However, adding some internal sanity checking that issues warnings
> >(or aborts) if this problem arises would be VERY helpful. Took us
> >quite a while to get to the bottom of this...
> >
> > Anyway, with your explanation I'm happy to regard this as
> >resolved...
> >
> > Thanks for the quick feedback (and the great code!).
> >
> > Best wishes,
> >
> > Matthias
> >
> >
> >
> >
> >On 21/01/14 16:26, Heroux, Mike wrote:
> >> Matthias,
> >>
> >> Just to make sure I understand: This is a real overflow of range, so the
> >> issue is not a bug in the correct execution of AztecOO, but in not
> >> detecting the memory error. AztecOO is not designed to work with problems
> >> where the size of any local data objects is beyond the range of signed
> >> 32-bit ints.
> >>
> >> It seems that we could add a quick check to AZ_manage_memory that would
> >> copy the input_size value into a signed int and then compare the result,
> >> or something similar.
> >>
> >> Is this the kind of fix you could use? Please let me know if I am missing
> >> the point.
> >>
> >> Thanks.
> >>
> >> Mike
> >>
> >> On 1/21/14 9:58 AM, "Matthias Heil" <matthias.heil at manchester.ac.uk>
> >>wrote:
> >>
> >>> Mike,
> >>>
> >>> we've made some progress. After some more digging we've
> >>> established that the offending call to AZ_manage_memory is
> >>> made with the following arguments:
> >>>
> >>> AZ_manage_memory (input_size=295202080,
> >>> action=0,
> >>> type=-914901,
> >>> name=0x7fffffffd720 "vblock in gmres0",
> >>> status=0x7fffffffd6c8)
> >>>
> >>>
> >>> at trilinos-11.4.3-Source/packages/aztecoo/src/az_util.c:944
> >>> when "viewed" from within AZ_manage_memory.
> >>>
> >>> However, looking at the calling code, the first argument, input_size,
> >>> is derived from: kspace=5000; aligned_N_total=222084;
> >>> sizeof(double)=8, so it should be (5000+1)*222084*8=8885136672.
> >>>
> >>> Andrew Hazel then wrote the small test code, below, which shows that
> >>> 295202080 is the value given to an unsigned int that stores the result
> >>> of the calculation. The problem appears to be that the first argument
> >>> to AZ_manage_memory is an unsigned int, rather than an unsigned
> >>> long (or some other custom type).
> >>>
> >>> Matthias
> >>>
> >>>
> >>> #include <iostream>
> >>>
> >>> int main()
> >>> {
> >>>   unsigned kspace = 5000;
> >>>   unsigned aligned_N_total = 222084;
> >>>
> >>>   // temp wraps modulo 2^32 when unsigned is 32 bits wide;
> >>>   // temp2 keeps the full value when unsigned long is 64 bits wide.
> >>>   unsigned temp = (kspace+1)*aligned_N_total*sizeof(double);
> >>>   unsigned long temp2 = (kspace+1)*aligned_N_total*sizeof(double);
> >>>
> >>>   std::cout << temp << " " << temp2 << "\n";
> >>> }
> >>>
> >>>
> >>>
> >>> On 20/01/14 21:18, Heroux, Mike wrote:
> >>>> Matthias,
> >>>>
> >>>> Do you have a sense of whether or not the data sizes you are using would
> >>>> result in array indexing that exceeds 2.1 billion? The kinds of issues you
> >>>> are seeing would be consistent with trying to address an array using an
> >>>> integer value that is bigger than what signed int can handle.
> >>>>
> >>>> The hex values you are printing are very large (more than 140 trillion),
> >>>> which seems to indicate an incorrect address calculation somewhere. I
> >>>> agree that the memory manager should detect the issue, no matter what.
> >>>>
> >>>> Mike
> >>>>
> >>>> On 1/20/14 8:24 AM, "Matthias Heil" <matthias.heil at manchester.ac.uk>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> we've come across a possible bug in Trilinos AztecOO.
> >>>>> The code segfaults when trying to execute the line
> >>>>>
> >>>>> *dst_ptr++ = s;
> >>>>>
> >>>>> in
> >>>>>
> >>>>> trilinos-11.4.3-Source/packages/epetra/src/Epetra_CrsMatrix.cpp:3327
> >>>>>
> >>>>> An attempt to de-reference that pointer (in ddd) shows:
> >>>>>
> >>>>> (gdb) print *dst_ptr
> >>>>> Cannot access memory at address 0x8001cb466110
> >>>>>
> >>>>> Moving back through the call stack shows that the
> >>>>> memory is initially allocated in AZ_manage_memory(...)
> >>>>> which is called from just under
> >>>>>
> >>>>> trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:239
> >>>>>
> >>>>> The problem arises only for large values of kspace, which is related
> >>>>> to the max number of iterations. We've set this to a rather large
> >>>>> value of 5000 (we don't usually need that many, BUT the code
> >>>>> should hopefully still be able to handle this or fail
> >>>>> gracefully; things work ok for smaller values, e.g. kspace=1000).
> >>>>>
> >>>>> Following the return from this call, the memory allocated in
> >>>>> AZ_manage_memory(...) gets distributed into two vectors, hh
> >>>>> and v, and it's v that contains the illegal memory address:
> >>>>> Placing a breakpoint in
> >>>>>
> >>>>> trilinos-11.4.3-Source/packages/aztecoo/src/az_gmres.c:248
> >>>>>
> >>>>> (just after that loop) and interrogating various values of v yields:
> >>>>>
> >>>>> (gdb) print v[5000]
> >>>>> $1 = (double *) 0x8001cb466110
> >>>>>
> >>>>> and, predictably:
> >>>>>
> >>>>> (gdb) print *v[5000]
> >>>>> Cannot access memory at address 0x8001cb466110
> >>>>>
> >>>>> whereas
> >>>>>
> >>>>> (gdb) print *v[500]
> >>>>> $4 = 0
> >>>>>
> >>>>> is fine.
> >>>>>
> >>>>> Trial and error shows that things go wrong beyond entry 518:
> >>>>>
> >>>>> (gdb) print *v[519]
> >>>>> Cannot access memory at address 0x7ffff0bf14f0
> >>>>> (gdb) print *v[518]
> >>>>> $8 = 0
> >>>>>
> >>>>> Further information:
> >>>>>
> >>>>> -- All code was completely built from source, using gcc
> >>>>> without optimisation and with -g.
> >>>>>
> >>>>> -- Based on a (small) sample of machines, the problem only
> >>>>> arises on 64 bit machines (not 32)
> >>>>>
> >>>>> -- The problem only arises for sufficiently big problem sizes
> >>>>> (though they are still way short of the machines' total
> >>>>> available memory). When running on a machine with very
> >>>>> little memory, the call to AZ_manage_memory(...) fails
> >>>>> gracefully with the "maybe you should try a smaller problem"
> >>>>> message.
> >>>>>
> >>>>> -- The problem arises with both serial and parallel installations
> >>>>> (i.e. when the code is compiled with and without mpi support)
> >>>>> and with different trilinos releases.
> >>>>>
> >>>>> -- The problem is difficult to isolate further since we use
> >>>>> trilinos from within our own big library (which provides the
> >>>>> preconditioner). Note that our code works fine if we use our
> >>>>> own (serial) GMRES solver (or a direct solver).
> >>>>>
> >>>>> Does any of this ring a bell?
> >>>>>
> >>>>> Happy to run further tests here or provide additional diagnostic
> >>>>> information.
> >>>>>
> >>>>> Best wishes,
> >>>>>
> >>>>> Matthias
> >>>>>
> >>>>> --
> >>>>>
> >>>>>
> >>>>> ----------------------------------------------------------------------
> >>>>> Professor Matthias Heil
> >>>>>
> >>>>> Alan Turing Building, Room 2.224
> >>>>> School of Mathematics Tel. +44 (0)161 275 5808
> >>>>> University of Manchester Fax. +44 (0)161 275 5819
> >>>>> Oxford Road email: M.Heil at maths.man.ac.uk
> >>>>> Manchester M13 9PL WWW:
> >>>>>http://www.maths.man.ac.uk/~mheil/
> >>>>> U.K.
> >>>>>
> >>>>> NEWS: The beta release of oomph-lib, the object-oriented
> >>>>> multi-physics finite-element library is now available
> >>>>> as free open-source software at
> >>>>>
> >>>>> http://www.oomph-lib.org
> >>>>>
> >>>>>
> >>>>>
> >>>>> ----------------------------------------------------------------------
> >>>>>
> >>>>> _______________________________________________
> >>>>> Trilinos-Users mailing list
> >>>>> Trilinos-Users at software.sandia.gov
> >>>>> http://software.sandia.gov/mailman/listinfo/trilinos-users
> >>>
> >>>
> >>>
> >
> >
> >
> >
>
>
>
>
> ------------------------------
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users
>
>
> End of Trilinos-Users Digest, Vol 101, Issue 9
> **********************************************
>
--
Dr. Riccardo Rossi, Civil Engineer
Member of Kratos Team
International Center for Numerical Methods in Engineering - CIMNE
Campus Norte, Edificio C1
c/ Gran Capitán s/n
08034 Barcelona, España
Tel: (+34) 93 401 56 96
Fax: (+34) 93.401.6517
web: www.cimne.com