[Trilinos-Users] [EXTERNAL] Re: decomp tool Error

Sai P Uppati uppatis at utexas.edu
Mon Feb 22 14:51:44 EST 2016


Mac OS X 10.11.3 (El Capitan)

I've attached the cmake configure script for the Trilinos build. I'm using
MPI compilers from open-mpi version 1.10.1_1, which I installed via
Homebrew on my machine.

Sai

On Mon, Feb 22, 2016 at 9:13 AM, Sjaardema, Gregory D <gdsjaar at sandia.gov>
wrote:

> What version of OS X are you running?
>
> Can you send me your Trilinos configure script with the compilers that you
> are trying to use?
>
> Not sure what is happening since my OS X version runs correctly… Will have
> to figure out what is different on your system…
> ..Greg
>
> --
> "A supercomputer is a device for turning compute-bound problems into
> I/O-bound problems"
>
> From: Sai P Uppati <uppatis at utexas.edu>
> Date: Friday, February 19, 2016 at 11:58 AM
> To: "Sjaardema, Gregory D" <gdsjaar at sandia.gov>
> Cc: "Bradley, Andrew Michael" <ambradl at sandia.gov>, "
> trilinos-users at trilinos.org" <trilinos-users at trilinos.org>
>
> Subject: Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>
> UPDATE 2:
>
> NetCDF 4.4.0: Tried two variations, both with netcdf-4 and dap enabled:
>
> I) With the following changes in netcdf.h:
>
> # Modify the following #define statements in the netcdf.h file.  Change the values to match what is given below.
> #define NC_MAX_DIMS 65536
> #define NC_MAX_ATTRS 8192
> #define NC_MAX_VARS 524288
> #define NC_MAX_NAME 256
> #define NC_MAX_VAR_DIMS 8
>
>
> II) With default numbers in the netcdf.h
>
> In case I: the following tests in the netcdf test suite failed:
>
> 155/169 Test #155: ncdap_tst_remote3 .....................***Failed   11.89 sec
>         Start 156: ncdap_tst_formatx
> 156/169 Test #156: ncdap_tst_formatx .....................   Passed    0.47 sec
>         Start 157: ncdap_test_partvar
> 157/169 Test #157: ncdap_test_partvar ....................   Passed    0.52 sec
>         Start 158: ncdap_testurl
> 158/169 Test #158: ncdap_testurl .........................   Passed    0.77 sec
>         Start 159: ncdap_test_nstride_cached
> 159/169 Test #159: ncdap_test_nstride_cached .............***Exception: SegFault  0.59 sec
>         Start 160: ncdap_t_misc
> 160/169 Test #160: ncdap_t_misc ..........................   Passed    0.14 sec
>         Start 161: ncdap_test_varm3
> 161/169 Test #161: ncdap_test_varm3 ......................***Exception: SegFault  0.58 sec
>         Start 162: C_tests_simple_xy_wr
> 162/169 Test #162: C_tests_simple_xy_wr ..................   Passed    0.01 sec
>         Start 163: C_tests_simple_xy_rd
> 163/169 Test #163: C_tests_simple_xy_rd ..................   Passed    0.04 sec
>         Start 164: C_tests_sfc_pres_temp_wr
> 164/169 Test #164: C_tests_sfc_pres_temp_wr ..............   Passed    0.01 sec
>         Start 165: C_tests_sfc_pres_temp_rd
> 165/169 Test #165: C_tests_sfc_pres_temp_rd ..............   Passed    0.01 sec
>         Start 166: C_tests_pres_temp_4D_wr
> 166/169 Test #166: C_tests_pres_temp_4D_wr ...............   Passed    0.06 sec
>         Start 167: C_tests_pres_temp_4D_rd
> 167/169 Test #167: C_tests_pres_temp_4D_rd ...............   Passed    0.03 sec
>         Start 168: cdl_create_sample_files
> 168/169 Test #168: cdl_create_sample_files ...............   Passed    0.05 sec
>         Start 169: cdl_do_comps
> 169/169 Test #169: cdl_do_comps ..........................   Passed    0.01 sec
>
> 98% tests passed, 3 tests failed out of 169
>
> Total Test time (real) = 111.31 sec
>
> The following tests FAILED:
>         155 - ncdap_tst_remote3 (Failed)
>         159 - ncdap_test_nstride_cached (SEGFAULT)
>         161 - ncdap_test_varm3 (SEGFAULT)
> Errors while running CTest
>
> In case II, all tests passed (100%).
>
> But in either case, a Trilinos build against these variations still
> produces the decomp error. However, Peridigm builds fine, with all tests
> passing, in either case.
>
> Sai
>
>
>
>
> On Fri, Feb 19, 2016 at 11:47 AM, Sai P Uppati <uppatis at utexas.edu> wrote:
>
>> Hi Greg,
>>
>> It's not a specific mesh that I'm having trouble with. For any mesh I
>> try to decompose, I get either the netcdf error I pasted in previous
>> messages on this thread, or the following segmentation fault:
>>
>> Executing:
>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>> cube_split.g.nem -m mesh=4 cube_split.g
>>    ...see cube_split.g.decomp.out for nem_slice status
>>
>> Beginning nem_slice execution.
>> Input Mesh File = 'cube_split.g'
>> Using 32-bit integer mode for decomposition...
>> [dhcp-128-83-76-100:51612] *** Process received signal ***
>> [dhcp-128-83-76-100:51612] Signal: Segmentation fault: 11 (11)
>> [dhcp-128-83-76-100:51612] Signal code: Address not mapped (1)
>> [dhcp-128-83-76-100:51612] Failing at address: 0x7fbdae400000
>> [dhcp-128-83-76-100:51612] [ 0] 0   libsystem_platform.dylib
>>  0x00007fff91404eaa _sigtramp + 26
>> [dhcp-128-83-76-100:51612] [ 1] 0   ???
>> 0x00007fff6df35390 0x0 + 140735038051216
>> [dhcp-128-83-76-100:51612] [ 2] 0   nem_slice
>> 0x000000010ab53522
>> _Z13write_nemesisIiEiRNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEP19Machine_DescriptionP19Problem_DescriptionP16Mesh_DescriptionIT_EP14LB_DescriptionISD_EP11Sphere_Info
>> + 16162
>> [dhcp-128-83-76-100:51612] [ 3] 0   nem_slice
>> 0x000000010ab4afce _Z13internal_mainIiEiiPPcT_ + 8814
>> [dhcp-128-83-76-100:51612] [ 4] 0   nem_slice
>> 0x000000010ab44400 main + 1680
>> [dhcp-128-83-76-100:51612] [ 5] 0   libdyld.dylib
>> 0x00007fff9788e5ad start + 1
>> [dhcp-128-83-76-100:51612] *** End of error message ***
>> /usr/local/trilinos/bin/decomp: line 125: 51612 Segmentation fault: 11  (
>> $NEM_SLICE -e $spheres $decomp_method $do_viz $nem_slice_flag -o $nemesis
>> -m mesh=$processors $genesis >> $output )
>>
>> ERROR:******************************************************************
>> ERROR:
>> ERROR     During nem_slice execution. Check error output above and rerun
>> ERROR:
>> ERROR:******************************************************************
>>
>> You can see that this is not one of the meshes that appear in the errors
>> prior to this message. Any mesh I try to decompose has this problem.
>> All of these errors, though, occur 'During nem_slice execution'.
>>
>> I'm using netcdf 4.4.0, the latest stable version. I've also tried netcdf
>> version 4.3.3.1, which was working before I rebased to the latest GitHub
>> Trilinos version. I've tried variations with netcdf-4 and dap both enabled
>> and disabled. None of these variations helped. The latest build I have is
>> with netcdf-4 and dap enabled, and I've not made the changes to the
>> numbers in the netcdf.h file that the Peridigm webpage instructs. All the
>> tests in the netcdf test suite (~160 tests, I believe) passed.
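
One way to confirm how an installed netcdf was actually configured is to ask
nc-config, the helper script that netcdf-c installs, and to look at the limits
in the installed netcdf.h. The following is only a sketch: it assumes nc-config
is on the PATH and belongs to the same netcdf that Trilinos links against, and
show_nc_limits is a hypothetical helper name, not part of any tool in this
thread.

```shell
#!/bin/sh
# Print the NC_MAX_* limits defined in a netcdf.h header.
# $1 is the path to the header file (hypothetical helper, for illustration).
show_nc_limits() {
    grep -E '^#define[[:space:]]+NC_MAX_(DIMS|ATTRS|VARS|NAME|VAR_DIMS)[[:space:]]' "$1"
}

# Report how the netcdf found on PATH was configured.
# Assumption: nc-config belongs to the netcdf that Trilinos was built against.
if command -v nc-config >/dev/null 2>&1; then
    echo "netcdf version:  $(nc-config --version)"
    echo "netcdf-4 (HDF5): $(nc-config --has-nc4)"
    echo "DAP:             $(nc-config --has-dap)"
    show_nc_limits "$(nc-config --includedir)/netcdf.h" \
        || echo "no NC_MAX_* defines found"
else
    echo "nc-config not found on PATH"
fi
```

Comparing this output between a working and a failing build may help narrow
down which netcdf option, if any, makes a difference.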
>>
>> I'm attaching the mesh file you requested, but it's not a mesh-specific
>> issue. I really appreciate your help looking into this issue. Meanwhile,
>> I'll try leaving netcdf-4 and dap enabled, but changing the variables as
>> shown on the Peridigm page, and see if that variation fixes this issue.
>>
>> Sai
>>
>> On Fri, Feb 19, 2016 at 7:52 AM, Sjaardema, Gregory D <gdsjaar at sandia.gov
>> > wrote:
>>
>>> Most builds of nem_slice use a netcdf library with netcdf4 enabled.  It
>>> looks like there is a logic error somewhere in determining whether
>>> something is 32 or 64 bit (even in 32-bit mode we output some values to
>>> the nem file as 64-bit values if netcdf4 is enabled).  If you could enable
>>> netcdf4 on your build then hopefully it will work.  I'm still trying to
>>> track down why yours is failing but can't replicate it yet.
>>>
>>> What netcdf version are you using?
>>>
>>> .. Greg
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 18, 2016, at 9:28 PM, Sjaardema, Gregory D <gdsjaar at sandia.gov>
>>> wrote:
>>>
>>> Is it possible to send me the mesh?
>>>
>>> Also, you don't need to and probably shouldn't disable netcdf4.  It will
>>> give you more options, but shouldn't cause the problem you are seeing.
>>>
>>> If I could try the mesh I might be able to replicate the issue.
>>>
>>> ..Greg
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 18, 2016, at 4:47 PM, Bradley, Andrew Michael <ambradl at sandia.gov>
>>> wrote:
>>>
>>> Hi Sai,
>>>
>>>
>>> OK. Sorry, but I'll have to let the experts step in at this point. I had
>>> guessed that might work based on examining elb_main.C lines 139, 168-171,
>>> but there must be some deeper issue that I'm not seeing.
>>>
>>>
>>> Andrew
>>>
>>>
>>> ------------------------------
>>> *From:* Sai P Uppati <uppatis at utexas.edu>
>>> *Sent:* Thursday, February 18, 2016 4:39 PM
>>> *To:* Bradley, Andrew Michael
>>> *Cc:* Sjaardema, Gregory D; trilinos-users at trilinos.org
>>> *Subject:* Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>>>
>>> Andrew,
>>>
>>> There is no change in the error it throws. Still uses a 32-bit integer
>>> mode for decomposition.
>>>
>>> Sai
>>>
>>> On Thu, Feb 18, 2016 at 5:28 PM, Bradley, Andrew Michael <
>>> ambradl at sandia.gov> wrote:
>>>
>>>> Hi Sai,
>>>>
>>>>
>>>> Just a guess, but what happens if you add the command-line flag -64 ?
>>>>
>>>>
>>>> Andrew
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf
>>>> of Sai P Uppati <uppatis at utexas.edu>
>>>> *Sent:* Thursday, February 18, 2016 4:23 PM
>>>> *To:* Sjaardema, Gregory D
>>>> *Cc:* trilinos-users at trilinos.org
>>>> *Subject:* Re: [Trilinos-Users] [EXTERNAL] Re: decomp tool Error
>>>>
>>>> UPDATE:
>>>>
>>>> I think all other tools from my Trilinos build may be working fine. The
>>>> tools I commonly use from Trilinos include decomp, epu and exodiff.
>>>> All except decomp seem to be working fine.
>>>>
>>>> Even after rebuilding trilinos several times (varying some options each
>>>> time) and Peridigm passing all the tests each time, decomp throws errors
>>>> like these:
>>>>
>>>> Executing:
>>>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>>>> HEGF-res-cylin.g.nem -m mesh=4 HEGF-res-cylin.g
>>>>    ...see HEGF-res-cylin.g.decomp.out for nem_slice status
>>>>
>>>> Beginning nem_slice execution.
>>>> Input Mesh File = 'HEGF-res-cylin.g'
>>>> Using 32-bit integer mode for decomposition...
>>>> Exodus Library Warning/Error: [ex_put_cmap_params_cc]
>>>> Error: failed to add dimension for "ncnt_cmap" of size 6313656973 in
>>>> file ID 65536
>>>> NetCDF: Invalid dimension size
>>>> ================================messages================================
>>>> fatal: unable to output communication map parameters
>>>> fatal: could not output Nemesis file
>>>>
>>>> ERROR:******************************************************************
>>>> ERROR:
>>>> ERROR     During nem_slice execution. Check error output above and rerun
>>>> ERROR:
>>>> ERROR:******************************************************************
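
The size reported for "ncnt_cmap" above, 6313656973, is larger than the largest
signed 32-bit integer (2147483647), so it cannot be represented as a signed
32-bit value. The sketch below only checks that arithmetic. It assumes the
shell evaluates integers with 64 bits, and fits_int32 is an illustrative
helper, not part of nem_slice or the Exodus library.

```shell
#!/bin/sh
# Assumption: the shell uses 64-bit integers (true for common sh/bash builds
# on 64-bit systems).
INT32_MAX=2147483647
INT32_MIN=-2147483648

# Return 0 if $1 fits in a signed 32-bit integer, 1 otherwise.
fits_int32() {
    [ "$1" -ge "$INT32_MIN" ] && [ "$1" -le "$INT32_MAX" ]
}

reported=6313656973   # size of "ncnt_cmap" in the error output above

if fits_int32 "$reported"; then
    echo "$reported fits in a signed 32-bit integer"
else
    echo "$reported does not fit in a signed 32-bit integer"
    # Show the value split into its upper and lower 32-bit halves.
    echo "upper 32 bits: $(( reported >> 32 ))"
    echo "lower 32 bits: $(( reported & 4294967295 ))"
fi
```

A value that only makes sense as a 64-bit quantity appearing in a run that
reports "32-bit integer mode" is at least consistent with a 32/64-bit mix-up,
though the check by itself does not show where that happens.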
>>>>
>>>> Sai
>>>>
>>>> On Thu, Feb 18, 2016 at 11:03 AM, Sai P Uppati <uppatis at utexas.edu>
>>>> wrote:
>>>>
>>>>> Hi Greg,
>>>>>
>>>>> An example mesh I'm trying to decompose contains 178320 elements,
>>>>> 189405 nodes and 1 block. I tried decomposing for 4, 6 and 8 processors. I
>>>>> haven't had problems with previous Trilinos versions I was using before. I
>>>>> think it was only since I rebased to the official version hosted on the
>>>>> GitHub page.
>>>>>
>>>>> Anyways, getopt I was able to fix with John Foster's help. I just
>>>>> installed a gnu-getopt version from Homebrew and modified the PATH variable
>>>>> to look for it first before looking in /usr/bin.
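
The getopt fix described above can be sketched roughly as follows. This is
an illustration under assumptions, not the exact steps used: it assumes the
Homebrew formula gnu-getopt is installed, relies on the enhanced getopt
convention of exiting with status 4 for "getopt -T", and the helper names
is_enhanced_getopt and prepend_path are hypothetical.

```shell
#!/bin/sh
# Return 0 if the getopt binary at $1 is the enhanced version.
# Enhanced getopt prints nothing and exits with status 4 for "getopt -T".
is_enhanced_getopt() {
    status=0
    "$1" -T >/dev/null 2>&1 || status=$?
    [ "$status" -eq 4 ]
}

# Put directory $1 at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: gnu-getopt was installed with Homebrew. It is not linked into
# the default prefix, so its bin directory has to be added by hand.
if command -v brew >/dev/null 2>&1; then
    gnu_getopt_bin="$(brew --prefix gnu-getopt 2>/dev/null)/bin"
    if [ -x "$gnu_getopt_bin/getopt" ]; then
        prepend_path "$gnu_getopt_bin"
    fi
fi

if is_enhanced_getopt "$(command -v getopt)"; then
    echo "enhanced getopt found: $(command -v getopt)"
else
    echo "getopt on PATH is not the enhanced version"
fi
```

To make the change permanent, the PATH line would go in the shell startup
file; running the check afterwards shows which getopt decomp will pick up.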
>>>>>
>>>>> Coming to NetCDF, I followed the instructions exactly as stated on
>>>>> the following page: https://peridigm.sandia.gov/content/netcdf.
>>>>> So, I disabled netcdf-4 and dap, and installed it using the changed numbers
>>>>> in the netcdf.h file as well. All the tests passed when I did 'make check', so
>>>>> I didn't think there were any issues with the netcdf installation. Doing it
>>>>> this way, however, there was no reference to the HDF5 build I did in the
>>>>> previous step. Even in the summary of the netcdf configuration, HDF5 support
>>>>> seems to be off. I left HDF5 installed, though, because I saw that it may be
>>>>> needed for the SEACAS package in Trilinos.
>>>>>
>>>>> But as I mentioned before, I didn't have issues like this with
>>>>> previous Trilinos versions (I also didn't follow the netcdf instructions
>>>>> given at the webpage before, I just installed whatever was default from
>>>>> unidata). Perhaps, the instructions on the page are not completely correct?
>>>>>
>>>>> Sorry for the long email, but those are all the details.
>>>>>
>>>>> Sai
>>>>>
>>>>> On Thu, Feb 18, 2016 at 7:38 AM, Sjaardema, Gregory D <
>>>>> gdsjaar at sandia.gov> wrote:
>>>>>
>>>>>> What size mesh are you decomposing (#elem, #block, #node) and how
>>>>>> many processors are you decomposing it for?
>>>>>>
>>>>>> Did you also install hdf5 and reference it in the netcdf build for
>>>>>> netcdf-4 support, or is it a netcdf build only?
>>>>>>
>>>>>> The current getopt that you have will work, but will give reduced
>>>>>> functionality in regards to long options which you can see by entering -H
>>>>>> and -h and seeing the difference.  I’m not sure if installing the
>>>>>> gnu-getopt in parallel with the system getopt would cause issues or not,
>>>>>> but on my and many other macs we have both installed and have not noticed
>>>>>> any issues (However, I use port instead of brew).
>>>>>>
>>>>>> ..Greg
>>>>>> --
>>>>>> "A supercomputer is a device for turning compute-bound problems into
>>>>>> I/O-bound problems"
>>>>>>
>>>>>> From: Trilinos-Users <trilinos-users-bounces at trilinos.org> on behalf
>>>>>> of "John T. Foster" <jfoster at austin.utexas.edu>
>>>>>> Date: Wednesday, February 17, 2016 at 6:00 PM
>>>>>> To: Sai P Uppati <uppatis at utexas.edu>
>>>>>> Cc: "trilinos-users at trilinos.org" <trilinos-users at trilinos.org>
>>>>>> Subject: [EXTERNAL] Re: [Trilinos-Users] decomp tool Error
>>>>>>
>>>>>> Sai,
>>>>>>
>>>>>> I believe you're using Homebrew as a package manager, so use:
>>>>>>
>>>>>> brew install getopt
>>>>>>
>>>>>> To install the getopt command line utility.
>>>>>>
>>>>>> JTF
>>>>>>
>>>>>> On Wednesday, February 17, 2016, Sai P Uppati <uppatis at utexas.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I installed Trilinos and Peridigm (official versions hosted on
>>>>>>> GitHub) on my Mac OS X 10.11.3, including the dependencies boost, hdf5 and
>>>>>>> netcdf. I followed the instructions in Sandia's Peridigm installation guide
>>>>>>> to the letter.
>>>>>>>
>>>>>>> The Peridigm unit tests all passed, which is good. However, when I
>>>>>>> try to use the decomp tool from Trilinos, I get the following errors:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ########################################################################
>>>>>>> The "getopt" executable that is available on this system is an older
>>>>>>> version that is not compatible with the needs of the "decomp" tool.
>>>>>>> If possible, you should update your getopt to a newer version and
>>>>>>> make
>>>>>>> sure that the new getopt is in your path.
>>>>>>>
>>>>>>> Below are some options for getting the current getopt version:
>>>>>>> * If on a Mac: "sudo port install getopt"
>>>>>>> * Search the internet for "getopt-1.1.5" or "getopt-1.1.4"; download
>>>>>>> and build
>>>>>>>
>>>>>>> Enter "-h" for the modified options that this version supports.
>>>>>>> Enter "-H" for the options that the standard version supports.
>>>>>>>
>>>>>>> ########################################################################
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Executing:
>>>>>>>    /usr/local/trilinos/bin/nem_slice -e -S  -l inertial -c -o
>>>>>>> prism-precrack.g.nem -m mesh=8 prism-precrack.g
>>>>>>>    ...see prism-precrack.g.decomp.out for nem_slice status
>>>>>>>
>>>>>>> Beginning nem_slice execution.
>>>>>>> Input Mesh File = 'prism-precrack.g'
>>>>>>> Using 32-bit integer mode for decomposition...
>>>>>>> Exodus Library Warning/Error: [ex_put_cmap_params_cc]
>>>>>>> Error: unable to output variable in file ID 65536
>>>>>>> NetCDF: Index exceeds dimension bound
>>>>>>>
>>>>>>> ================================messages================================
>>>>>>> fatal: unable to output communication map parameters
>>>>>>> fatal: could not output Nemesis file
>>>>>>>
>>>>>>>
>>>>>>> ERROR:******************************************************************
>>>>>>> ERROR:
>>>>>>> ERROR     During nem_slice execution. Check error output above and
>>>>>>> rerun
>>>>>>> ERROR:
>>>>>>>
>>>>>>> ERROR:******************************************************************
>>>>>>>
>>>>>>>
>>>>>>> There are multiple errors here.
>>>>>>>
>>>>>>> 1) I don't know how to update the getopt executable. It seems Mac OS
>>>>>>> X already comes with a built-in version (which I checked and found to be in
>>>>>>> /usr/bin), but this version is not compatible with decomp. I checked
>>>>>>> Homebrew, and there is a keg-only option to install gnu-getopt, but they
>>>>>>> have a warning that installing different versions in parallel can cause
>>>>>>> trouble. I'm not able to find any other working way to install getopt
>>>>>>> without causing errors.
>>>>>>>
>>>>>>> 2) NetCDF error about exceeding dimensions. I installed the latest
>>>>>>> version of netcdf-c, 4.4.0. I changed the numbers in netcdf.h as instructed
>>>>>>> in the Peridigm installation guide. I have a feeling that this may have
>>>>>>> something to do with the error, but I'm not quite sure. All tests passed,
>>>>>>> however, when I installed netcdf from source.
>>>>>>>
>>>>>>> There may be other errors I'm not seeing. I would appreciate some
>>>>>>> guidance on how to address these errors.
>>>>>>>
>>>>>>> Sai
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from iPhone
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trilinos-macosx-clang-cmake.sh
Type: application/x-sh
Size: 2354 bytes
Desc: not available
URL: <https://trilinos.org/pipermail/trilinos-users/attachments/20160222/4b113c54/attachment.sh>

