[Trilinos-Users] [EXTERNAL] Re: Odd Behavior in EpetraExt_HDF5

Lofstead, Gerald F II gflofst at sandia.gov
Tue Sep 16 14:16:24 MDT 2014


Someone with Epetra HDF experience will have to help.

This does not change the inherently inappropriate &testVec[0]. Yes, it will
work in many cases, but as compiler versions change or you change
platforms, it may stop working and give you a subtle bug.
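
If you can use C++11, data() sidesteps the question entirely. A minimal
sketch of the distinction (untested, plain STL, no Trilinos):

    #include <vector>

    int main()
    {
      std::vector<double> testVec(1000, 1.0);

      // C++11: data() returns a pointer to the vector's contiguous
      // storage and is well defined even when the vector is empty.
      double *p = testVec.data();

      // The pre-C++11 idiom used in the original code; note it needs
      // an explicit guard for the empty-vector case.
      double *q = testVec.empty() ? 0 : &testVec[0];

      return (p == q) ? 0 : 1;  // both point at the first element here
    }

Either pointer can then be passed to hdf5.Write(...) exactly as in the
code quoted below.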

Best,

Jay

On 9/16/14 2:11 PM, "Truman Ellis" <truman at ices.utexas.edu> wrote:

>This is an STL vector. I am relying on the fact that vectors store data
>contiguously. There is no error when I remove the Trilinos code. The
>point of this example was to demonstrate that the EpetraExt_HDF5 code
>hangs when two processors try to write data of different sizes. The
>stall is not caused by extra time to write the data: when both
>processors write the larger amount (2000 doubles), the write finishes
>immediately. The stall causes the program to never complete.
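>
>To be concrete, "removing the Trilinos code" here means going through
>the serial HDF5 C API directly; an untested sketch of that version
>(file names made up) looks like this:
>
>    #include <hdf5.h>
>    #include <mpi.h>
>    #include <sstream>
>    #include <vector>
>
>    int main(int argc, char *argv[])
>    {
>      MPI_Init(&argc, &argv);
>      int commRank = 0;
>      MPI_Comm_rank(MPI_COMM_WORLD, &commRank);
>
>      // Same sizes as before: rank 0 writes 1000 doubles, rank 1
>      // writes 2000.
>      std::vector<double> testVec(1000 + 1000 * commRank, 1.0);
>
>      // "plainN.h5" is just a placeholder name.
>      std::ostringstream fname;
>      fname << "plain" << commRank << ".h5";
>
>      // Each rank uses the serial HDF5 API on its own file, so nothing
>      // here is collective across the two ranks.
>      hid_t file = H5Fcreate(fname.str().c_str(), H5F_ACC_TRUNC,
>                             H5P_DEFAULT, H5P_DEFAULT);
>      hsize_t dims[1];
>      dims[0] = testVec.size();
>      hid_t space = H5Screate_simple(1, dims, NULL);
>      hid_t dset = H5Dcreate2(file, "Test", H5T_NATIVE_DOUBLE, space,
>                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
>      H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
>               &testVec[0]);
>      H5Dclose(dset);
>      H5Sclose(space);
>      H5Fclose(file);
>
>      MPI_Finalize();
>      return 0;
>    }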
>
>Truman
>On 9/16/14, 2:52 PM, Lofstead, Gerald F II wrote:
>> Which vector is that? It looks like a C++ STL vector rather than a
>> Trilinos vector (E/Tpetra). If so, I expect you can simplify this by
>> removing all of Trilinos and see the same behavior. Depending on the
>> version of C++ (C++11 is different), how you get at the raw data
>> differs. C++11 added a function called data() that gives you a pointer
>> to the elements and would work for the code as written. What you have
>> in &testVec[0] takes the address of a single element reference.
>> Whether or not that is part of the actual data storage in the vector
>> is not defined. For that matter, taking the address of a reference is
>> not a good idea either. You need to use an iterator or upgrade to
>> C++11 to get the code you have written to work properly and portably.
>> I am not sure why the stall is happening. If the sizes are what you
>> expect, it may just be the additional time needed to write the larger
>> quantity of data. How long is the stall compared to writing out the
>> base case (1000 elements)?
>>
>> Jay
>>
>> On 9/16/14 12:08 PM, "Jonathan Hu" <jhu at sandia.gov> wrote:
>>
>>> trilinos-users-request at software.sandia.gov wrote on 09/16/2014 11:00 AM:
>>>> Subject:
>>>> Re: [Trilinos-Users] Odd Behavior in EpetraExt_HDF5
>>>> From:
>>>> Truman Ellis <truman at ices.utexas.edu>
>>>> Date:
>>>> 09/15/2014 03:56 PM
>>>>
>>>> To:
>>>> <trilinos-users at software.sandia.gov>
>>>>
>>>>
>>>> There isn't any distributed data in this example. I just wanted two
>>>> MPI processes to simultaneously write out two independent HDF5 files.
>>>> But I noticed that if the two HDF5 files were different sizes (1000
>>>> data items vs. 2000 data items), then I got a stall. If they both
>>>> write data of the same size, everything goes through.
>>>>
>>>> On 9/15/14, 4:10 PM, Jonathan Hu wrote:
>>>>>> I am using the EpetraExt_HDF5 interface to save and load solutions,
>>>>>> but I've run into some odd behavior and was wondering if anyone
>>>>>> could explain it. My goal is to have each processor write out its
>>>>>> own part of the solution in a different HDF5 file. For the time
>>>>>> being, I am assuming that the number of processors loading the
>>>>>> solution is equal to the number writing it. Since each processor is
>>>>>> completely independent, I shouldn't get any weird race conditions
>>>>>> or anything like that (theoretically). In order to communicate this
>>>>>> to EpetraExt, I am using an Epetra_SerialComm in the constructor.
>>>>>> However, the following code hangs when I run with 2 MPI processes:
>>>>>>
>>>>>>
>>>>>> {
>>>>>>      int commRank = Teuchos::GlobalMPISession::getRank();
>>>>>>      Epetra_SerialComm Comm;
>>>>>>      EpetraExt::HDF5 hdf5(Comm);
>>>>>>      hdf5.Create("file"+Teuchos::toString(commRank)+".h5");
>>>>>>      vector<double> testVec;
>>>>>>      for (int i=0; i<1000+1000*commRank; i++)
>>>>>>      {
>>>>>>        testVec.push_back(1.0);
>>>>>>      }
>>>>>>      hdf5.Write("Test", "Group", H5T_NATIVE_DOUBLE, testVec.size(),
>>>>>> &testVec[0]);
>>>>>> }
>>>>>> {
>>>>>>      int commRank = Teuchos::GlobalMPISession::getRank();
>>>>>>
>>>>>>      Epetra_SerialComm Comm;
>>>>>>      EpetraExt::HDF5 hdf5(Comm);
>>>>>>      hdf5.Open("file"+Teuchos::toString(commRank)+".h5");
>>>>>>      hdf5.Close();
>>>>>> }
>>>>>>
>>>>>> Note that commRank 0 writes 1000 elements while commRank 1 writes
>>>>>> 2000. The code works just fine when both write the same number of
>>>>>> elements. Can someone enlighten me on what I am doing wrong? Is it
>>>>>> possible to get the behavior I want, where each processor's read
>>>>>> and write is independent of others?
>>>>>>
>>>>>> Thanks,
>>>>>> Truman Ellis
>>>>> Truman,
>>>>>
>>>>>      Rank 1 is loading/writing testVec from 0..2000 due to the
>>>>> bounds in your for loop.  I'm guessing that you want rank 1 to load
>>>>> from 1001..2000 instead, so replace
>>>>>
>>>>>     for (int i=0; i<1000+1000*commRank; i++)
>>>>>
>>>>> with
>>>>>
>>>>>     for (int i=1000*commRank; i<1000+1000*commRank; i++)
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Jonathan
>>>>>
>>> Truman,
>>>
>>>    Ok, I completely misunderstood your original email.  Hopefully one
>>> of the I/O developers can chime in here.
>>>
>>> Jonathan
>>>


