[Trilinos-Users] [EXTERNAL] Re: decomp tool Error

Sai P Uppati uppatis at utexas.edu
Mon Feb 22 20:27:06 EST 2016


Wow. That worked! Adding those two lines to my cmake script for Trilinos
and rebuilding did the trick.
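
For anyone who finds this thread later, here is a minimal sketch of the two
options Greg suggested, written as a small shell helper that prints them in
single-argument form. TPL is an assumption: it stands for whatever prefix
holds your HDF5, zlib and netCDF installs. The /usr/local default and the
function name netcdf4_cmake_args are only illustrative and are not taken
from my actual script.

```shell
#!/bin/sh
# Sketch only. TPL is assumed to be the install prefix of HDF5, zlib and
# netCDF; the default below is a hypothetical placeholder.
TPL="${TPL:-/usr/local}"

# Print the two extra CMake options from Greg's message, one per line.
# -DNAME:TYPE=VALUE is the single-argument spelling of "-D NAME:TYPE=VALUE".
netcdf4_cmake_args() {
    printf '%s\n' \
        "-DTPL_Netcdf_Enables_Netcdf4:BOOL=ON" \
        "-DTrilinos_EXTRA_LINK_FLAGS:STRING=-L${TPL}/lib -lhdf5_hl -lhdf5 -lz"
}

netcdf4_cmake_args
```

Each printed line is meant to be passed to cmake as one quoted argument,
next to the -D options already in the configure script. The link flags
contain spaces, so that argument has to stay quoted. The build directory
needs to be reconfigured and rebuilt afterwards.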

Thanks a lot for your patience with this issue, Greg. Thanks also to
everyone who offered input.

Sai

On Mon, Feb 22, 2016 at 3:03 PM, Sjaardema, Gregory D <gdsjaar at sandia.gov>
wrote:

> OK. I’m not yet on El Capitan, so not sure if that could be part of the
> issue…
>
> Could you try doing the netcdf-4 options enabled and then adding:
>
> -D TPL_Netcdf_Enables_Netcdf4:BOOL=ON \
> -D Trilinos_EXTRA_LINK_FLAGS:STRING="-L${TPL}/lib -lhdf5_hl -lhdf5 -lz" \
>
> —Greg
>
> --
> "A supercomputer is a device for turning compute-bound problems into
> I/O-bound problems"
>
> From: Sai P Uppati <uppatis at utexas.edu>
> Date: Monday, February 22, 2016 at 12:51 PM
>
> To: "Sjaardema, Gregory D" <gdsjaar at sandia.gov>
> Cc: "Bradley, Andrew Michael" <ambradl at sandia.gov>, "
> trilinos-users at trilinos.org" <trilinos-users at trilinos.org>
> Subject: Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>
> Mac OS X 10.11.3 (El Capitan)
>
> I've attached the cmake configure script for the Trilinos build. I'm using
> MPI compilers from open-mpi version 1.10.1_1, which I installed via
> Homebrew on my machine.
>
> Sai
>
> On Mon, Feb 22, 2016 at 9:13 AM, Sjaardema, Gregory D <gdsjaar at sandia.gov>
> wrote:
>
>> What version of OS X are you running?
>>
>> Can you send me your Trilinos configure script with the compilers that
>> you are trying to use?
>>
>> Not sure what is happening since my OS X version runs correctly… Will
>> have to figure out what is different on your system…
>> ..Greg
>>
>> --
>> "A supercomputer is a device for turning compute-bound problems into
>> I/O-bound problems"
>>
>> From: Sai P Uppati <uppatis at utexas.edu>
>> Date: Friday, February 19, 2016 at 11:58 AM
>> To: "Sjaardema, Gregory D" <gdsjaar at sandia.gov>
>> Cc: "Bradley, Andrew Michael" <ambradl at sandia.gov>, "
>> trilinos-users at trilinos.org" <trilinos-users at trilinos.org>
>>
>> Subject: Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>>
>> UPDATE 2:
>>
>> NetCDF 4.4.0: Tried two variations, both with netcdf-4 and dap enabled:
>>
>> I) With the following changes in netcdf.h:
>>
>> # Modify the following #define statements in the netcdf.h file.  Change the values to match what is given below.
>> #define NC_MAX_DIMS 65536
>> #define NC_MAX_ATTRS 8192
>> #define NC_MAX_VARS 524288
>> #define NC_MAX_NAME 256
>> #define NC_MAX_VAR_DIMS 8
>>
>>
>> II) With default numbers in the netcdf.h
>>
>> In case I: the following tests in the netcdf test suite failed:
>>
>> 155/169 Test #155: ncdap_tst_remote3 .....................***Failed   11.89 sec
>>         Start 156: ncdap_tst_formatx
>> 156/169 Test #156: ncdap_tst_formatx .....................   Passed    0.47 sec
>>         Start 157: ncdap_test_partvar
>> 157/169 Test #157: ncdap_test_partvar ....................   Passed    0.52 sec
>>         Start 158: ncdap_testurl
>> 158/169 Test #158: ncdap_testurl .........................   Passed    0.77 sec
>>         Start 159: ncdap_test_nstride_cached
>> 159/169 Test #159: ncdap_test_nstride_cached .............***Exception: SegFault  0.59 sec
>>         Start 160: ncdap_t_misc
>> 160/169 Test #160: ncdap_t_misc ..........................   Passed    0.14 sec
>>         Start 161: ncdap_test_varm3
>> 161/169 Test #161: ncdap_test_varm3 ......................***Exception: SegFault  0.58 sec
>>         Start 162: C_tests_simple_xy_wr
>> 162/169 Test #162: C_tests_simple_xy_wr ..................   Passed    0.01 sec
>>         Start 163: C_tests_simple_xy_rd
>> 163/169 Test #163: C_tests_simple_xy_rd ..................   Passed    0.04 sec
>>         Start 164: C_tests_sfc_pres_temp_wr
>> 164/169 Test #164: C_tests_sfc_pres_temp_wr ..............   Passed    0.01 sec
>>         Start 165: C_tests_sfc_pres_temp_rd
>> 165/169 Test #165: C_tests_sfc_pres_temp_rd ..............   Passed    0.01 sec
>>         Start 166: C_tests_pres_temp_4D_wr
>> 166/169 Test #166: C_tests_pres_temp_4D_wr ...............   Passed    0.06 sec
>>         Start 167: C_tests_pres_temp_4D_rd
>> 167/169 Test #167: C_tests_pres_temp_4D_rd ...............   Passed    0.03 sec
>>         Start 168: cdl_create_sample_files
>> 168/169 Test #168: cdl_create_sample_files ...............   Passed    0.05 sec
>>         Start 169: cdl_do_comps
>> 169/169 Test #169: cdl_do_comps ..........................   Passed    0.01 sec
>>
>> 98% tests passed, 3 tests failed out of 169
>>
>> Total Test time (real) = 111.31 sec
>>
>> The following tests FAILED:
>>         155 - ncdap_tst_remote3 (Failed)
>>         159 - ncdap_test_nstride_cached (SEGFAULT)
>>         161 - ncdap_test_varm3 (SEGFAULT)
>> Errors while running CTest
>>
>> In case II, all tests passed (100%).
>>
>> But in either case, a Trilinos build against that netcdf still results in
>> the decomp error. However, Peridigm builds fine with all tests passing in
>> either case.
>>
>> Sai
>>
>>
>>
>>
>> On Fri, Feb 19, 2016 at 11:47 AM, Sai P Uppati <uppatis at utexas.edu>
>> wrote:
>>
>>> Hi Greg,
>>>
>>> It's not a specific mesh that I'm having trouble with. Basically, any
>>> mesh I'm trying to decompose, I get either the netcdf error I've pasted in
>>> previous messages on this thread, or the following about 'segmentation
>>> fault':
>>>
>>> Executing:
>>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>>> cube_split.g.nem -m mesh=4 cube_split.g
>>>    ...see cube_split.g.decomp.out for nem_slice status
>>>
>>> Beginning nem_slice execution.
>>> Input Mesh File = 'cube_split.g'
>>> Using 32-bit integer mode for decomposition...
>>> [dhcp-128-83-76-100:51612] *** Process received signal ***
>>> [dhcp-128-83-76-100:51612] Signal: Segmentation fault: 11 (11)
>>> [dhcp-128-83-76-100:51612] Signal code: Address not mapped (1)
>>> [dhcp-128-83-76-100:51612] Failing at address: 0x7fbdae400000
>>> [dhcp-128-83-76-100:51612] [ 0] 0   libsystem_platform.dylib  0x00007fff91404eaa _sigtramp + 26
>>> [dhcp-128-83-76-100:51612] [ 1] 0   ???  0x00007fff6df35390 0x0 + 140735038051216
>>> [dhcp-128-83-76-100:51612] [ 2] 0   nem_slice  0x000000010ab53522 _Z13write_nemesisIiEiRNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEP19Machine_DescriptionP19Problem_DescriptionP16Mesh_DescriptionIT_EP14LB_DescriptionISD_EP11Sphere_Info + 16162
>>> [dhcp-128-83-76-100:51612] [ 3] 0   nem_slice  0x000000010ab4afce _Z13internal_mainIiEiiPPcT_ + 8814
>>> [dhcp-128-83-76-100:51612] [ 4] 0   nem_slice  0x000000010ab44400 main + 1680
>>> [dhcp-128-83-76-100:51612] [ 5] 0   libdyld.dylib  0x00007fff9788e5ad start + 1
>>> [dhcp-128-83-76-100:51612] *** End of error message ***
>>> /usr/local/trilinos/bin/decomp: line 125: 51612 Segmentation fault: 11
>>>  ( $NEM_SLICE -e $spheres $decomp_method $do_viz $nem_slice_flag -o
>>> $nemesis -m mesh=$processors $genesis >> $output )
>>>
>>> ERROR:******************************************************************
>>> ERROR:
>>> ERROR     During nem_slice execution. Check error output above and rerun
>>> ERROR:
>>> ERROR:******************************************************************
>>>
>>> You can see that this is not one of the meshes you see in the errors
>>> prior to this message. Any mesh I try to decompose is having this problem.
>>> All these errors though are 'During nem_slice execution'.
>>>
>>> I'm using netcdf 4.4.0, the latest stable version. I've also tried
>>> netcdf version 4.3.3.1, which was working before I rebased to the latest
>>> GitHub Trilinos version. I've tried variations with netcdf-4 and dap
>>> enabled and disabled. None of these variations helped.
>>> The latest build I have is with netcdf-4 and dap enabled, and I've not made
>>> the changes to the numbers in the netcdf.h file described on the Peridigm
>>> webpage. All the tests in the netcdf test suite (~160 tests, I believe)
>>> passed.
>>>
>>> I'm attaching the mesh file you requested, but it's not a mesh specific
>>> issue. I really appreciate your help looking into this issue. Meanwhile,
>>> I'll try leaving netcdf-4 and dap enabled, but changing the variables as
>>> shown on the peridigm page, and see if that variation works to fix this
>>> issue.
>>>
>>> Sai
>>>
>>> On Fri, Feb 19, 2016 at 7:52 AM, Sjaardema, Gregory D <
>>> gdsjaar at sandia.gov> wrote:
>>>
>>>> Most builds of nem_slice use a netcdf with netcdf4 enabled.  It looks
>>>> like there is a logic error somewhere in determining whether something is
>>>> 32 or 64 bit (even in 32-bit mode we output some values to the nem file as
>>>> 64-bit values if netcdf4 is enabled).  If you could enable netcdf4 on your
>>>> build then hopefully it will work.  I'm still trying to track down why
>>>> yours is failing but can't replicate it yet.
>>>>
>>>> What netcdf version are you using?
>>>>
>>>> .. Greg
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 18, 2016, at 9:28 PM, Sjaardema, Gregory D <gdsjaar at sandia.gov>
>>>> wrote:
>>>>
>>>> Is it possible to send me the mesh?
>>>>
>>>> Also, you don't need to and probably shouldn't disable netcdf4.  It
>>>> will give you more options, but shouldn't cause the problem you are seeing.
>>>>
>>>> If I could try the mesh I might be able to replicate the issue.
>>>>
>>>> ..Greg
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 18, 2016, at 4:47 PM, Bradley, Andrew Michael <
>>>> ambradl at sandia.gov> wrote:
>>>>
>>>> Hi Sai,
>>>>
>>>>
>>>> OK. Sorry, but I'll have to let the experts step in at this point. I
>>>> had guessed that might work based on examining elb_main.C lines 139,
>>>> 168-171, but there must be some deeper issue that I'm not seeing.
>>>>
>>>>
>>>> Andrew
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Sai P Uppati <uppatis at utexas.edu>
>>>> *Sent:* Thursday, February 18, 2016 4:39 PM
>>>> *To:* Bradley, Andrew Michael
>>>> *Cc:* Sjaardema, Gregory D; trilinos-users at trilinos.org
>>>> *Subject:* Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>>>>
>>>> Andrew,
>>>>
>>>> There is no change in the error it throws. Still uses a 32-bit integer
>>>> mode for decomposition.
>>>>
>>>> Sai
>>>>
>>>> On Thu, Feb 18, 2016 at 5:28 PM, Bradley, Andrew Michael <
>>>> ambradl at sandia.gov> wrote:
>>>>
>>>>> Hi Sai,
>>>>>
>>>>>
>>>>> Just a guess, but what happens if you add the command-line flag -64 ?
>>>>>
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* Trilinos-Users <trilinos-users-bounces at trilinos.org> on
>>>>> behalf of Sai P Uppati <uppatis at utexas.edu>
>>>>> *Sent:* Thursday, February 18, 2016 4:23 PM
>>>>> *To:* Sjaardema, Gregory D
>>>>> *Cc:* trilinos-users at trilinos.org
>>>>> *Subject:* Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>>>>>
>>>>> UPDATE:
>>>>>
>>>>> I think all other tools may be working fine from my trilinos build.
>>>>> The tools I commonly use from trilinos include decomp, epu and exodiff.
>>>>> All except decomp seem to be working fine.
>>>>>
>>>>> Even after rebuilding trilinos several times (varying some options
>>>>> each time) and Peridigm passing all the tests each time, decomp throws
>>>>> errors like these:
>>>>>
>>>>> Executing:
>>>>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>>>>> HEGF-res-cylin.g.nem -m mesh=4 HEGF-res-cylin.g
>>>>>    ...see HEGF-res-cylin.g.decomp.out for nem_slice status
>>>>>
>>>>> Beginning nem_slice execution.
>>>>> Input Mesh File = 'HEGF-res-cylin.g'
>>>>> Using 32-bit integer mode for decomposition...
>>>>> Exodus Library Warning/Error: [ex_put_cmap_params_cc]
>>>>> Error: failed to add dimension for "ncnt_cmap" of size 6313656973 in
>>>>> file ID 65536
>>>>> NetCDF: Invalid dimension size
>>>>>
>>>>> ================================messages================================
>>>>> fatal: unable to output communication map parameters
>>>>> fatal: could not output Nemesis file
>>>>>
>>>>>
>>>>> ERROR:******************************************************************
>>>>> ERROR:
>>>>> ERROR     During nem_slice execution. Check error output above and
>>>>> rerun
>>>>> ERROR:
>>>>>
>>>>> ERROR:******************************************************************
>>>>>
>>>>> Sai
>>>>>
>>>>> On Thu, Feb 18, 2016 at 11:03 AM, Sai P Uppati <uppatis at utexas.edu>
>>>>> wrote:
>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> An example mesh I'm trying to decompose contains 178320 elements,
>>>>>> 189405 nodes and 1 block. I tried decomposing for 4, 6 and 8 processors. I
>>>>>> haven't had problems with previous Trilinos versions I was using before. I
>>>>>> think it was only since I rebased to the official version hosted on the
>>>>>> GitHub page.
>>>>>>
>>>>>> Anyways, getopt I was able to fix with John Foster's help. I just
>>>>>> installed a gnu-getopt version from Homebrew and modified the PATH variable
>>>>>> to look for it first before looking in /usr/bin.
>>>>>>
>>>>>> Coming to Netcdf, I followed the instructions exactly as stated
>>>>>> on the following page: https://peridigm.sandia.gov/content/netcdf.
>>>>>> So, I disabled netcdf-4 and dap, and installed it using the changed numbers
>>>>>> in the netcdf.h file as well. All the tests passed when I did 'make check'. So
>>>>>> I didn't think there were any issues with the netcdf installation. Doing it
>>>>>> this way, however, the netcdf build never referenced the HDF5 build I did in
>>>>>> the previous step. Even in the summary of the netcdf configuration, HDF5
>>>>>> support seems to be off. I left HDF5 installed though because I saw that it
>>>>>> may be needed for the SEACAS package in Trilinos.
>>>>>>
>>>>>> But as I mentioned before, I didn't have issues like this with
>>>>>> previous Trilinos versions (I also didn't follow the netcdf instructions
>>>>>> given at the webpage before, I just installed whatever was default from
>>>>>> unidata). Perhaps, the instructions on the page are not completely correct?
>>>>>>
>>>>>> Sorry for the long email, but those are all the details.
>>>>>>
>>>>>> Sai
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 7:38 AM, Sjaardema, Gregory D <
>>>>>> gdsjaar at sandia.gov> wrote:
>>>>>>
>>>>>>> What size mesh are you decomposing (#elem, #block, #node) and how
>>>>>>> many processors are you decomposing it for?
>>>>>>>
>>>>>>> Did you also install hdf5 and reference it in the netcdf build for
>>>>>>> netcdf-4 support, or is it a netcdf build only?
>>>>>>>
>>>>>>> The current getopt that you have will work, but will give reduced
>>>>>>> functionality in regards to long options which you can see by entering -H
>>>>>>> and -h and seeing the difference.  I’m not sure if installing the
>>>>>>> gnu-getopt in parallel with the system getopt would cause issues or not,
>>>>>>> but on my and many other macs we have both installed and have not noticed
>>>>>>> any issues (However, I use port instead of brew).
>>>>>>>
>>>>>>> ..Greg
>>>>>>> --
>>>>>>> "A supercomputer is a device for turning compute-bound problems into
>>>>>>> I/O-bound problems"
>>>>>>>
>>>>>>> From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on
>>>>>>> behalf of "John T. Foster" <jfoster at austin.utexas.edu>
>>>>>>> Date: Wednesday, February 17, 2016 at 6:00 PM
>>>>>>> To: Sai P Uppati <uppatis at utexas.edu>
>>>>>>> Cc: "trilinos-users at trilinos.org" <trilinos-users at trilinos.org>
>>>>>>> Subject: [EXTERNAL] Re: [Trilinos-Users] decomp tool Error
>>>>>>>
>>>>>>> Sai,
>>>>>>>
>>>>>>> I believe you're using Homebrew as a package manager, so use:
>>>>>>>
>>>>>>> brew install getopt
>>>>>>>
>>>>>>> To install the getopt command line utility.
>>>>>>>
>>>>>>> JTF
>>>>>>>
>>>>>>> On Wednesday, February 17, 2016, Sai P Uppati <uppatis at utexas.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I installed Trilinos and Peridigm (official versions hosted on
>>>>>>>> GitHub) on my Mac OS X 10.11.3, including the dependencies boost, hdf5 and
>>>>>>>> netcdf. I followed the instructions on Sandia's Peridigm installation guide
>>>>>>>> to the letter.
>>>>>>>>
>>>>>>>> The Peridigm unit tests all passed, which is good. However, when I
>>>>>>>> try to use the decomp tool from Trilinos, I get the following errors:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ########################################################################
>>>>>>>> The "getopt" executable that is available on this system is an older
>>>>>>>> version that is not compatible with the needs of the "decomp" tool.
>>>>>>>> If possible, you should update your getopt to a newer version and
>>>>>>>> make
>>>>>>>> sure that the new getopt is in your path.
>>>>>>>>
>>>>>>>> Below are some options for getting the current getopt version:
>>>>>>>> * If on a Mac: "sudo port install getopt"
>>>>>>>> * Search the internet for "getopt-1.1.5" or "getopt-1.1.4";
>>>>>>>> download and build
>>>>>>>>
>>>>>>>> Enter "-h" for the modified options that this version supports.
>>>>>>>> Enter "-H" for the options that the standard version supports.
>>>>>>>>
>>>>>>>> ########################################################################
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Executing:
>>>>>>>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>>>>>>>> prism-precrack.g.nem -m mesh=8 prism-precrack.g
>>>>>>>>    ...see prism-precrack.g.decomp.out for nem_slice status
>>>>>>>>
>>>>>>>> Beginning nem_slice execution.
>>>>>>>> Input Mesh File = 'prism-precrack.g'
>>>>>>>> Using 32-bit integer mode for decomposition...
>>>>>>>> Exodus Library Warning/Error: [ex_put_cmap_params_cc]
>>>>>>>> Error: unable to output variable in file ID 65536
>>>>>>>> NetCDF: Index exceeds dimension bound
>>>>>>>>
>>>>>>>> ================================messages================================
>>>>>>>> fatal: unable to output communication map parameters
>>>>>>>> fatal: could not output Nemesis file
>>>>>>>>
>>>>>>>>
>>>>>>>> ERROR:******************************************************************
>>>>>>>> ERROR:
>>>>>>>> ERROR     During nem_slice execution. Check error output above and
>>>>>>>> rerun
>>>>>>>> ERROR:
>>>>>>>>
>>>>>>>> ERROR:******************************************************************
>>>>>>>>
>>>>>>>>
>>>>>>>> There are multiple errors here.
>>>>>>>>
>>>>>>>> 1) I don't know how to update the getopt executable. It seems Mac
>>>>>>>> OS X already comes with a built-in version (which I checked and found to be
>>>>>>>> in /usr/bin), but this version is not compatible with decomp. I checked
>>>>>>>> Homebrew, and there is a keg-only option to install gnu-getopt, but they
>>>>>>>> have a warning that installing different versions in parallel can cause
>>>>>>>> trouble. I'm not able to find any other working way to install getopt
>>>>>>>> without causing errors.
>>>>>>>>
>>>>>>>> 2) NetCDF error about exceeding dimensions. I installed the latest
>>>>>>>> version of netcdf-c, 4.4.0. I changed the numbers in netcdf.h as instructed
>>>>>>>> in the Peridigm installation guide. I have a feeling that this may have
>>>>>>>> something to do with the error, but I'm not quite sure. All tests passed,
>>>>>>>> however, when I installed netcdf from source.
>>>>>>>>
>>>>>>>> There may be other errors I'm not seeing. I would appreciate some
>>>>>>>> guidance on how to address these errors.
>>>>>>>>
>>>>>>>> Sai
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from iPhone
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>