[Trilinos-Users] [EXTERNAL] Re: Odd Behavior in EpetraExt_HDF5

Lofstead, Gerald F II gflofst at sandia.gov
Tue Sep 16 14:24:25 MDT 2014


Thanks for pointing out a change in the standard since I last looked at it. Taking the address of a reference was a bad idea for a long time.

Jay

From: Truman Ellis <truman at ices.utexas.edu>
Date: Tuesday, September 16, 2014 2:22 PM
To: Jay Lofstead <gflofst at sandia.gov>, "trilinos-users at software.sandia.gov" <trilinos-users at software.sandia.gov>
Subject: Re: [EXTERNAL] Re: [Trilinos-Users] Odd Behavior in EpetraExt_HDF5

The C++ standard guarantees that vector will store data contiguously:

23.2.6 Class template vector [vector]

1 A vector is a sequence container that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
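
As a quick standalone check of that identity (a minimal sketch of my own, which any C++03-or-later compiler should accept):

#include <cassert>
#include <cstddef>
#include <vector>

int main()
{
  std::vector<double> v(5, 1.0);
  // The guarantee quoted above: element n lives at &v[0] + n.
  for (std::size_t n = 0; n < v.size(); ++n)
    assert(&v[n] == &v[0] + n);
  return 0;
}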

On 9/16/14, 3:16 PM, Lofstead, Gerald F II wrote:

Someone with Epetra HDF experience will have to help.

This does not change the inherently inappropriate &testVec[0]. Yes, it will work in many cases, but as compiler versions change or you change platforms, it may not work, giving you a subtle bug.

Best,

Jay

On 9/16/14 2:11 PM, "Truman Ellis" <truman at ices.utexas.edu> wrote:



This is an STL vector. I am relying on the fact that vectors store data contiguously. There is no error when I remove the Trilinos code. The point of this example was to demonstrate that the EpetraExt_HDF5 code hangs when two processors try to write data of different sizes. The stall is not caused by extra time to write data: when both processors write the larger amount (2000 doubles), it finishes immediately. The stall causes the program to never complete.

Truman
On 9/16/14, 2:52 PM, Lofstead, Gerald F II wrote:


Which vector is that? It looks like a C++ STL vector rather than a Trilinos vector (E/Tpetra). If so, I expect you can simplify this by removing all of Trilinos and see the same behavior. Depending on the version of C++ (C++11 is different), how you get the raw data differs. C++11 added a member function, data(), that gives you a pointer to the elements and would work for the code as written. What you have in &testVec[0] takes the address of a single element reference; whether or not that points into the vector's actual data storage is not defined, and taking the address of a reference is not a good idea in any case. You need to use an iterator or upgrade to C++11 to get the code you have written to work properly and portably. As for the stall, I am not sure why it is happening. If the sizes are what you expect, it may just be the additional time needed to write the larger quantity of data. How long is the stall compared to writing out the base case (1000 elements)?
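
For instance, sketching both idioms against a stand-in writer function (write_doubles here is a hypothetical placeholder, not an HDF5 call):

#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a C-style API such as hdf5.Write.
void write_doubles(const double* buf, std::size_t n)
{
  std::printf("writing %zu doubles starting at %p\n", n, (const void*)buf);
}

int main()
{
  std::vector<double> testVec(1000, 1.0);

  // Pre-C++11 idiom: address of the first element; only valid when the
  // vector is non-empty.
  write_doubles(&testVec[0], testVec.size());

  // C++11: data() returns a pointer to the underlying storage directly,
  // and is well-defined even for an empty vector.
  write_doubles(testVec.data(), testVec.size());
  return 0;
}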

Jay

On 9/16/14 12:08 PM, "Jonathan Hu" <jhu at sandia.gov> wrote:



trilinos-users-request at software.sandia.gov wrote on 09/16/2014 11:00 AM:


Subject: Re: [Trilinos-Users] Odd Behavior in EpetraExt_HDF5
From: Truman Ellis <truman at ices.utexas.edu>
Date: 09/15/2014 03:56 PM
To: <trilinos-users at software.sandia.gov>


There isn't any distributed data in this example. I just wanted two MPI processes to simultaneously write out two independent HDF5 files. But I noticed that if the two HDF5 files were different sizes (1000 data items vs. 2000 data items), then I got a stall. If they both write data of the same size, then everything goes through.

On 9/15/14, 4:10 PM, Jonathan Hu wrote:


I am using the EpetraExt_HDF5 interface to save and load solutions, but I've run into some odd behavior and was wondering if anyone could explain it. My goal is to have each processor write out its own part of the solution in a different HDF5 file. For the time being, I am assuming that the number of processors loading the solution is equal to the number writing it. Since each processor is completely independent, I shouldn't get any weird race conditions or anything like that (theoretically). In order to communicate this to EpetraExt, I am using an Epetra_SerialComm in the constructor. However, the following code hangs when I run with 2 MPI processes:


{
     int commRank = Teuchos::GlobalMPISession::getRank();
     Epetra_SerialComm Comm;
     EpetraExt::HDF5 hdf5(Comm);
     hdf5.Create("file"+Teuchos::toString(commRank)+".h5");
     vector<double> testVec;
     // Rank 0 pushes 1000 doubles; rank 1 pushes 2000.
     for (int i=0; i<1000+1000*commRank; i++)
     {
       testVec.push_back(1.0);
     }
     hdf5.Write("Test", "Group", H5T_NATIVE_DOUBLE, testVec.size(), &testVec[0]);
}
{
     int commRank = Teuchos::GlobalMPISession::getRank();
     Epetra_SerialComm Comm;
     EpetraExt::HDF5 hdf5(Comm);
     hdf5.Open("file"+Teuchos::toString(commRank)+".h5");
     hdf5.Close();
}
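
For reference, a rough sketch of scaffolding that should wrap the two blocks above; the header names and the GlobalMPISession setup are assumptions about a typical Trilinos build, not taken from this thread:

// Assumed headers for a typical Trilinos + HDF5 build (not from the thread).
#include <vector>
#include "Teuchos_GlobalMPISession.hpp"
#include "Teuchos_toString.hpp"
#include "Epetra_SerialComm.h"
#include "EpetraExt_HDF5.h"

int main(int argc, char* argv[])
{
  // Initializes and finalizes MPI; getRank() then reports this rank.
  Teuchos::GlobalMPISession mpiSession(&argc, &argv);
  using namespace std;  // the snippets use vector<double> unqualified

  // ... the two blocks above go here unchanged ...

  return 0;
}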

Note that commRank 0 writes 1000 elements while commRank 1 writes 2000. The code works just fine when both write the same number of elements. Can someone enlighten me on what I am doing wrong? Is it possible to get the behavior I want, where each processor's read and write is independent of others?

Thanks,
Truman Ellis


Truman,

     Rank 1 is loading/writing testVec over indices 0..1999 due to the bounds in your for loop.  I'm guessing that you want rank 1 to load indices 1000..1999 instead, so replace

    for (int i=0; i<1000+1000*commRank; i++)

with

    for (int i=1000*commRank; i<1000+1000*commRank; i++)
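
Spelled out, the revised bounds give each rank a disjoint range of the same length:

    // commRank 0: i runs 0..999     -> 1000 elements
    // commRank 1: i runs 1000..1999 -> 1000 elements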

Hope this helps.

Jonathan



Truman,

   Ok, I completely misunderstood your original email.  Hopefully one of the I/O developers can chime in here.

Jonathan

_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at software.sandia.gov
https://software.sandia.gov/mailman/listinfo/trilinos-users
