computationalradiationphysics / libsplash
libSplash - Simple Parallel file output Library for Accumulating Simulation data using HDF5
License: GNU Lesser General Public License v3.0
@PrometheusPi reported a problem reading particle bool attributes with h5py in ComputationalRadiationPhysics/picongpu#682.
The problem is, of course, that HDF5 has no native boolean type, which is why we implemented a 1-byte (?) bitfield as a work-around.
h5py instead uses HDF5 enums as a work-around; these are then mapped directly to a 1-byte numpy bool while reading, see also the HDF5 Enum Docs (section 8).
The idea now is to switch to that representation as well to increase compatibility (that will increase the internal file format version).
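For reference, a minimal sketch of how such an enum-based bool type could be created with the HDF5 C API (the member names FALSE/TRUE mirror h5py's convention; variable names are illustrative, not libSplash code):

```c
/* sketch: an 8-bit enum type with FALSE=0, TRUE=1, matching h5py */
hid_t boolType = H5Tenum_create(H5T_NATIVE_INT8);
int8_t val = 0;
H5Tenum_insert(boolType, "FALSE", &val);
val = 1;
H5Tenum_insert(boolType, "TRUE", &val);
/* use boolType when writing the attribute, then H5Tclose(boolType) */
```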
Related to a misunderstanding in #127
add a unified header file (splash.h) which allows testing for parallel I/O support
I'll have to fix the Travis support.
Seems that the line sudo apt-get install -qq libcppunit-dev libboost-program-options-dev $APTMPI $APTHDF5
is broken, investigating...
we should also support linking with HDF5 statically
In principle, invalid offsets are bound to fail. However, zero reads must be allowed with invalid offsets.
Cross-posting of ComputationalRadiationPhysics/picongpu#330 since the grid/poly file splitting already happens in splash2xdmf.py.
During the deployment of CPack routines in #70 I realized a slight problem with the "huge number" (>1) of include files we produce.
Technically, a user only sees splash.h, but of course this one pulls in more header dependencies. If I wanted to install libSplash the clean way, I would specify a prefix like /usr (that's the default).
By that, our two libraries (.so, .a) go to /usr/lib/, our binaries (splashtools, later splash2txt) to /usr/bin, and our headers/includes to /usr/include.
So far, so nice.
Unfortunately, that introduces the risk of header file name collisions in /usr/include. As a work-around, boost uses a separate folder like boost/allHeaders.h (that is, of course, a kind of namespace approach).
I tried to realize this at install time, which is possible. But since we mix tests, examples and real tools a little (some depend on already-built libs, others get built directly from their sources), one must technically also move the source tree:
src/include/* -> src/include/splash/
to reflect that, and one has to change all internal includes to #include "splash/..." to be consistent.
I know it's a major change, but I suggest changing it, and the earlier the better.
How about adding a simple data format version to version.hpp?
A change in the libSplash version does not necessarily mean (in)compatible HDF5 files...
We could increment it (e.g. start with 1 in Splash 1.1) for any incompatible changes that occur later on.
We could also use major and minor for compatible and incompatible data format changes, but maybe a single number is enough.
This data format version should be added as standard meta data to each data file created with libSplash, especially for parallel I/O.
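A minimal sketch of the single-number variant (macro and function names are made up for illustration, not actual libSplash identifiers):

```cpp
#include <cassert>

// Hypothetical: one integer in version.hpp that only changes on
// incompatible file format changes, independent of the library version.
#define SPLASH_FILE_FORMAT_VERSION 1

// A reader could refuse files written with a newer, incompatible format.
inline bool canReadFormat(int fileFormatVersion)
{
    return fileFormatVersion <= SPLASH_FILE_FORMAT_VERSION;
}
```

Writers would store this number as a standard attribute in every file; readers compare it against their own constant before parsing anything else.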
Create Fortran bindings to make libSplash usable in Fortran programs
Parallel_SimpleDataTest has been disabled and is currently not tested
Add a FindSplash.cmake module for CMake's find_package.
This module should find Splash and set some useful but space-wasting variables like SPLASH_VERSION, SPLASH_FORMAT, SPLASH_PARALLEL, ... in CMake.
Copied to e.g. <INSTALL PATH>/cmake, it is found if $SPLASH_ROOT(/bin) is in $PATH: doc
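A sketch of how downstream projects might then consume it (the variable names are the ones proposed above; the exact set is still open):

```cmake
# hypothetical downstream usage of the proposed FindSplash.cmake
find_package(Splash REQUIRED)
if(SPLASH_PARALLEL)
    message(STATUS "Found parallel libSplash ${SPLASH_VERSION}")
endif()
include_directories(${SPLASH_INCLUDE_DIRS})
target_link_libraries(myApp ${SPLASH_LIBRARIES})
```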
Btw: I would prefer "talking" (doc) about Splash rather than libSplash, since otherwise we should call our libraries liblibsplash.so :)
Some write/read interfaces are quite long and have several overloaded versions. To increase usability, use helper/meta classes like Domain
and Selection
to reduce the number of required parameters and avoid confusing their order. Moreover, this will reduce the number of required overloaded functions.
Let's perform some benchmarks.
At least on our Panasas (hypnos) and on taurus (Lustre?).
They should end up like the usual (parallel) HDF5 benchmarks.
HDF5 1.8.12 is out - the world is a better place now! ✨
Do some of these changes affect us?
Shouldn't this entry in localDomain come from the Selection input? Like this:
Domain localDomain(Dimensions(0, 0, 0), select.size);
Furthermore, shouldn't the offset be the offset of the local domain within the globalDomain? That does not make sense for a ParallelDomain write, so (0,0,0) is fine, I guess.
If N processes call the collective ParallelDataCollector::read(...) and some of them read a zero amount of data, the read call hangs.
This is the case when all processes that read zero data set sizeRead to zero and the pointer buf to NULL, e.g.
read(100, "dataName", Dimensions(0, 0, 0), NULL)
The bug is triggered by this check, because not all read calls arrive at H5Dread. This is not allowed if collective operations are used.
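A possible fix, sketched with the HDF5 C API (identifiers like elementsToRead, memSpace, fileSpace are placeholders, not the actual internals): ranks with nothing to read keep an empty selection instead of skipping the collective call.

```c
/* sketch: every rank must reach the collective H5Dread */
if (elementsToRead == 0)
{
    H5Sselect_none(memSpace);   /* zero-element selection in memory */
    H5Sselect_none(fileSpace);  /* ...and in the file               */
}
H5Dread(dataset, nativeType, memSpace, fileSpace, xferPlist, buf);
```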
@psychocoderHPC , @ax3l , @bussmann
I would like to brainstorm new ideas, feature requests for the next version of libSplash.
Some ideas:
We should make use of an MPI_Info object:
- striping_factor / striping_unit
- direct_io should rather be true; use the env variable MPICH_MPIIO_HINTS instead of hard-coding; stating to use cb_align 2
- fill_value via NULL, H5Pset_fill_value and H5D_FILL_TIME_NEVER
- H5Pset_alignment to the disk block size is reported to improve performance; also IBM_largeblock_io=true for the MPI hints in H5Pset_fapl_mpio

If $build and $1 are not set, a small help text would be useful.
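The MPI_Info hints mentioned above could be set roughly like this (a sketch: hint names follow MPI-IO/ROMIO conventions, the values are examples only and need tuning per file system, and faplId stands for an already-created file access property list):

```c
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "striping_factor", "16");      /* Lustre: OST count */
MPI_Info_set(info, "striping_unit", "4194304");   /* Lustre: 4 MiB     */
MPI_Info_set(info, "IBM_largeblock_io", "true");  /* GPFS              */
/* hand the info object to HDF5 via the file access property list */
H5Pset_fapl_mpio(faplId, MPI_COMM_WORLD, info);
```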
@ax3l Any concerns over releasing this as version 1.2.3?
Need to update the version header. No update of the file format is required.
There is a bug in ParallelDataCollector in getMaxID(): it always returns zero.
I added some more debug output and saw that the bug is in listFilesInDir().
These are my debug changes (only the log_msg):
// extract id from filename
int32_t id = atoi(
fname.substr(name.size(), fname.size() - 3 - name.size()).c_str());
ids.insert(id);
log_msg(2, "add file to max id search list %s with id %i", fname.c_str(),id);
I get this output:
[1,1]<stderr>:[SPLASH_LOG:1] add file to max id search list h5_500.h5 with id 0
[1,1]<stderr>:[SPLASH_LOG:1] add file to max id search list h5_0.h5 with id 0
[1,1]<stderr>:[SPLASH_LOG:1] add file to max id search list h5_1000.h5 with id 0
There is something wrong inside the parameter calculation for atoi().
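A hypothetical reconstruction of the faulty extraction (function name and the exact filename scheme are assumptions based on the log output above): if the base name passed in lacks the trailing underscore, substr() starts one character too early and atoi() parses a string beginning with '_' as 0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// For files named "<name>_<id>.h5" (e.g. "h5_500.h5" with name "h5"),
// the id starts after the base name AND the underscore; starting at
// name.size() alone yields "_500", which atoi() reads as 0.
int32_t extractId(const std::string &fname, const std::string &name)
{
    const std::size_t start = name.size() + 1;         // skip "<name>_"
    const std::size_t len   = fname.size() - 3 - start; // drop ".h5"
    return std::atoi(fname.substr(start, len).c_str());
}
```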
The debian package h5utils contains some interesting tools.
It might be useful to play around with them on our data sets and to document some useful applications.
quote:
@f-schmitt-zih @psychocoderHPC
We were discussing whether it would be more useful to define the domainOffset starting with (0,0,0) from the globalDomOffset.
That would "extract" the globalDomOffset as a meta attribute for the whole domain, for example if I want to follow a moving window and need an absolute position.
testIntersection would have to read the globalDomOffset in this case.
Another nice fact is that we already store particles this way right now, because it is the intuitive way of seeing a moving simulation window in post-processing (adding the global offset again for an absolute position is also possible, but a little less often needed).
According to FindHDF5.cmake, we should remove
OPTION(PARALLEL "enable parallel MPI I/O" OFF)
and instead check
HDF5_IS_PARALLEL - Whether or not HDF5 was found with parallel IO support
from FIND_PACKAGE(HDF5).
Easy and nice.
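Roughly like this (a sketch; the compile definition name is illustrative, not an existing libSplash flag):

```cmake
find_package(HDF5 REQUIRED)
if(HDF5_IS_PARALLEL)
    add_definitions(-DSPLASH_SUPPORTED_PARALLEL)
else()
    message(STATUS "HDF5 was built without parallel I/O support")
endif()
```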
We should describe somewhere the generic HDF5 structure we create.
Even if we prefer using libSplash for reading and writing, it's good manners to transparently allow other HDF5 readers/writers to jump in.
A tool reading a png picture and translating it into a scalar 2D grid entry would be really useful (e.g. to start simulations with a gas profile like the PIConGPU logo).
Option: transform to a 3D field by cloning in one direction.
An explicit point to free all allocated resources would be beneficial. Otherwise, there is no nice way to free the internal MPI_Comm of a stack-allocated PDC in main before the user calls MPI_Finalize.
At least for .deb packages, yeah!
Getting started: http://www.cmake.org/Wiki/CMake:Packaging_With_CPack
Example: http://www.cmake.org/Wiki/CMake/CPackExample
Deb, RPM & OSX: http://www.cmake.org/Wiki/CMake:CPackPackageGenerators#DEB_.28UNIX_only.29
Add these packages as binary downloads to each new libSplash release.
All write methods (e.g. writeDomain) do not support writing empty data with a NULL data pointer.
Error:
[1,15]:terminate called after throwing an instance of 'DCollector::DCException'
[1,15]: what(): Exception for SerialDataCollector::write: a parameter was NULL
Build & test the examples during Travis CI runs, too.
How can one compare a stored type annotation to a corresponding DCollector::CollectionType again?
DomainCollector::DomDataClass data_class;
DataContainer *particles_container =
dataCollector.readDomain(simulationStep,
name.c_str(),
domain_offset,
domain_size,
&data_class);
DomDataClass::getDataType() returns an H5DataType (not useful, because we only know DCollector::CollectionTypes).
How can I find my float/ColTypeFloat or double/ColTypeDouble from data_class?
1-3D --> nD
to prevent interference with user-app MPI messages
I would like to add at least a parallel write example for the sandbox MD simulation (= particles only) I wrote here.
One of the interfaces of writeDomain in at least the parallel domain collector is broken for the following use case.
Apply the following patch to a checked-out version of v1.1.1 to test. I assume the overloaded member in v1.2.0 is also affected, and the error may lie in ParallelDataCollector::gatherMPIWrite:
diff a/examples/2Din1Dtop/2Din1Dtop.cpp b/examples/2Din1Dtop/2Din1Dtop.cpp
new file mode 100644
index 0000000..db6f21c
--- /dev/null
+++ b/examples/2Din1Dtop/2Din1Dtop.cpp
@@ -0,0 +1,70 @@
+// Copyright 2014 Axel Huebl
+//
+// LGPL
+//
+// In this example I am going to write a 2D data set
+// which is distributed over a 1D MPI topology
+
+#include <mpi.h>
+#include <splash/splash.h>
+
+int main()
+{
+ using namespace splash;
+ MPI_Init(NULL, NULL);
+
+ int size, rank, vrank;
+ MPI_Comm_size( MPI_COMM_WORLD, &size );
+ MPI_Comm_rank( MPI_COMM_WORLD, &rank );
+ // "simulate" a moving window
+ vrank = ( rank + 3 ) % size;
+
+ {
+ ParallelDomainCollector pdc(
+ MPI_COMM_WORLD, MPI_INFO_NULL, Dimensions(size, 1, 1), 10 );
+
+ /* use a shifted "virtual rank" for data distribution */
+ DataCollector::FileCreationAttr fAttr;
+ Dimensions mpiPosition( vrank, 0, 0 );
+ fAttr.mpiPosition.set( mpiPosition );
+
+ /* open file */
+ pdc.open( "testDomain", fAttr );
+
+ /* write my virtual rank -> output should be purely ascending */
+ const int numVal = 2;
+ ColTypeFloat ctFlt;
+ float a[] = {(float)vrank, (float)vrank};
+
+ /* sizes and offsets, naming conventions see
+ https://github.com/ComputationalRadiationPhysics/picongpu/issues/128#issuecomment-41366257 */
+ Dimensions localDomainSize( 1, numVal, 1 );
+ Dimensions localDomainOffset( vrank, 0, 0 );
+
+ Dimensions globalDomainSize( size, numVal, 1 );
+ Dimensions globalDomainOffset( 0, 0, 0 );
+
+ /* call write routine */
+ pdc.writeDomain( 0, /* time step */
+ ctFlt,
+ 2, /* 2D data set */
+ localDomainSize,
+ "myfield",
+ localDomainOffset, /* ignored anyway ... :( */
+ localDomainSize, /* ignored anyway ... :( */
+ globalDomainOffset,
+ globalDomainSize,
+ DomainCollector::GridType,
+ a );
+
+ /* close and return */
+ pdc.close();
+ }
+
+ int fin;
+ MPI_Finalized( &fin );
+ if( !fin )
+ MPI_Finalize();
+
+ return 0;
+}
diff a/examples/2Din1Dtop/testOutput.py b/examples/2Din1Dtop/testOutput.py
new file mode 100755
index 0000000..ab41370
--- /dev/null
+++ b/examples/2Din1Dtop/testOutput.py
@@ -0,0 +1,15 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Copyright 2014 Axel Huebl
+#
+# LGPL
+#
+
+import h5py as h5
+
+f=h5.File("testDomain_0.h5", "r")
+data=f['data/0/myfield']
+
+print "shape", data.shape
+print "data", data[:,:]
diff a/examples/CMakeLists.txt b/examples/CMakeLists.txt
index a13f5e8..53c1105 100644
--- a/examples/CMakeLists.txt
+++ b/examples/CMakeLists.txt
@@ -39,7 +39,7 @@ SET(CMAKE_BUILD_TYPE Debug)
OPTION(WITH_MPI "build MPI examples" OFF)
SET(EXAMPLES domain_read/domain_read)
-SET(MPI_EXAMPLES domain_read/domain_read_mpi domain_write/domain_write_mpi)
+SET(MPI_EXAMPLES domain_read/domain_read_mpi domain_write/domain_write_mpi 2Din1Dtop/2Din1Dtop)
FOREACH(EXAMPLE_NAME ${EXAMPLES})
SET(EXAMPLE_FILES "${EXAMPLE_FILES};${EXAMPLE_NAME}.cpp")
to test:
mpirun -n 4 2Din1Dtop.cpp.out
~/src/libSplash/examples/2Din1Dtop/testOutput.py
output (wrong)
shape (2, 4)
data
[[ 3. 0. 1. 2.]
[ 3. 0. 1. 2.]]
output (should be)
shape (2, 4)
data
[[ 0. 1. 2. 3. ]
[ 0. 1. 2. 3. ]]
It would be a good idea to create a version file like boost/version.hpp.
This would enable users to check the version number with CMake's VERSION_LESS compare operator, as seen in cmake-2.X/Modules/FindBoost.cmake, for upcoming interface changes and releases.
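A sketch of what that check could look like downstream, assuming the version gets exported to CMake (the variable name is hypothetical):

```cmake
find_package(Splash)
if(Splash_VERSION VERSION_LESS "1.2.0")
    message(FATAL_ERROR "libSplash >= 1.2.0 required")
endif()
```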
Yes, probably one day. But that's another topic.
Glad you asked! Well, there are two ways: one way would be putting the new version number in every time you push to master (e.g. major.minor.patchlvl).
The other would be not to commit to master on a daily basis at all, but to consider every pull request/commit to master and its resulting merge a release (and therefore to develop in a separate dev branch).
We kind of try to follow that strategy in PIConGPU, but the right way to do it is described in a-successful-git-branching-model.
Anyway, that method allows setting up a hook for post-commits to master. Sadly, this will only work client-side or via a double commit, since GitHub only supports post-receive hooks (for obvious security reasons with pre-receive hooks).
Last but not least: since every commit on master (yes, merging pull requests creates a merge commit, too) should be considered a release, it is self-evident to put an annotated tag on it which marks it as a GitHub release :)
Allow the user to change the internal file structure to avoid the timestep group.
Required to use Splash files with VisIt.
Add an internal flag stating that this group is missing, to allow transparent reading of such files.
HDF5 seems to support dimension scales and labels [1]. Is this already included in libSplash?
If not, do you think this might be a good feature to add?
I think this would be extremely useful for documenting the physics quantities stored in arrays.
I have not found this option in the HDF5 documentation, but in some tutorials [2] of the HDF Group.
Analyzing *.h5 files with such scales using h5dump, it looks like scales are actually attributes:
DATASET "data" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4, 3, 2 ) / ( 4, 3, 2 ) }
DATA {
(0,0,0): 1, 1,
(0,1,0): 1, 1,
(0,2,0): 1, 1,
(1,0,0): 1, 1,
(1,1,0): 1, 1,
(1,2,0): 1, 1,
(2,0,0): 1, 1,
(2,1,0): 1, 1,
(2,2,0): 1, 1,
(3,0,0): 1, 1,
(3,1,0): 1, 1,
(3,2,0): 1, 1
}
ATTRIBUTE "DIMENSION_LABELS" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): "z", NULL , "x"
}
ATTRIBUTE "DIMENSION_LIST" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): (DATASET 8560 /z1 ), (DATASET 8288 /y1 ),
(2): (DATASET 1536 /x1 , DATASET 1808 /x2 )
}
}
}
[1] http://docs.h5py.org/en/latest/high/dims.html
[2] http://www.hdfgroup.org/HDF5/Tutor/h5dimscale.html
We could try to write an xdmf file during DataCollector.close() that describes all the written HDF5 data sets.
It might be an option to automatically remember all the attributes we wrote since the DataCollector was opened.
Positive effect: this feature would allow us to use the native HDF5/XDMF readers of tools like VisIt and ParaView.
Document the writeDomain method's param id: read from? ... write to?
It seems that libSplash uses resizable datasets in any case. This might be good for data that may change size, but not for data with a fixed size (e.g. magnetic field data). Always allowing resizable datasets might cost performance.
For information on resizable datasets, see [1].
As an example in PIConGPU, see the following h5ls -r *.h5 dump:
/ Group
/custom Group
/data Group
/data/2000 Group
/data/2000/fields Group
/data/2000/fields/Density_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/Density_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB Group
/data/2000/fields/FieldB/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE Group
/data/2000/fields/FieldE/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/particles Group
/data/2000/particles/e Group
/data/2000/particles/e/globalCellIdx Group
/data/2000/particles/e/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/e/momentum Group
/data/2000/particles/e/momentum/x Dataset {29491200/Inf}
/data/2000/particles/e/momentum/y Dataset {29491200/Inf}
/data/2000/particles/e/momentum/z Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1 Group
/data/2000/particles/e/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/e/particles_info Dataset {32/Inf}
/data/2000/particles/e/position Group
/data/2000/particles/e/position/x Dataset {29491200/Inf}
/data/2000/particles/e/position/y Dataset {29491200/Inf}
/data/2000/particles/e/position/z Dataset {29491200/Inf}
/data/2000/particles/e/weighting Dataset {29491200/Inf}
/data/2000/particles/i Group
/data/2000/particles/i/globalCellIdx Group
/data/2000/particles/i/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/i/momentum Group
/data/2000/particles/i/momentum/x Dataset {29491200/Inf}
/data/2000/particles/i/momentum/y Dataset {29491200/Inf}
/data/2000/particles/i/momentum/z Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1 Group
/data/2000/particles/i/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/i/particles_info Dataset {32/Inf}
/data/2000/particles/i/position Group
/data/2000/particles/i/position/x Dataset {29491200/Inf}
/data/2000/particles/i/position/y Dataset {29491200/Inf}
/data/2000/particles/i/position/z Dataset {29491200/Inf}
/data/2000/particles/i/weighting Dataset {29491200/Inf}
/header Group
All datasets have the option to become infinitely large (marked by .../Inf).
With (parallel) HDF5 it should be possible to create both fixed- and arbitrarily-sized datasets.
A Python example to illustrate this is given here:
from mpi4py import MPI
import h5py
rank = MPI.COMM_WORLD.rank
print("Hello from processor {}".format(rank))
f = h5py.File('example_dataSize.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
f.create_dataset('dataset_fixed', (10,5), dtype='f')
f.create_dataset('dataset_variable1', (10,5), maxshape=(10,10), dtype='f')
f.create_dataset('dataset_variable2', (10,5), maxshape=(None,None), dtype='f')
f.close()
The corresponding HDF5 file looks like this when using h5ls -r *.h5:
/ Group
/dataset_fixed Dataset {10, 5}
/dataset_variable1 Dataset {10, 5/10}
/dataset_variable2 Dataset {10/Inf, 5/Inf}
Is there a reason to always use arbitrarily-sized datasets?
[1] http://docs.h5py.org/en/latest/high/dataset.html#resizable-datasets
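For comparison, the same distinction sketched with the HDF5 C API (a fixed-size dataspace simply passes NULL for maxdims, while a growable one needs maxdims larger than dims, and the dataset then requires a chunked layout):

```c
hsize_t dims[2]    = {10, 5};
hsize_t maxdims[2] = {10, 10};

/* fixed size: maxdims defaults to dims when NULL is passed */
hid_t fixedSpace    = H5Screate_simple(2, dims, NULL);
/* resizable up to maxdims: requires chunking on the dataset */
hid_t growableSpace = H5Screate_simple(2, dims, maxdims);
```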
Write a CHANGELOG.md for each new commit in master.
With a special highlight on interface changes.
Commit/pull into master from an intermediate release-XYZ branch -> tag it as vX.Y
need support for 64bit integer basetypes (signed+unsigned)