
simphony-common's Introduction

Simphony-common

The native implementation of the SimPhoNy cuds objects and io code (http://www.simphony-project.eu/).


Repository

Simphony-common is hosted on github: https://github.com/simphony/simphony-common

Requirements

  • enum34 >= 1.0.4
  • stevedore >= 1.2.0
  • numpy >= 1.11.1
  • PyTables >= 3.2.3.1

Optional requirements

To build the documentation you need the following packages:

  • sphinx >= 1.3.1
  • mock

Installation

The package requires Python 2.7.x. Installation is based on setuptools:

# build and install
python setup.py install

or:

# build for in-place development
python setup.py develop

Generation of EDM egg

An EDM egg can be generated with:

python edmsetup.py egg

The resulting egg will be left in the endist directory. Uploading to the repository is only possible from an Enthought jenkins build process. Eggs are built automatically when a branch (not a tag, due to jenkins github limitations) named release-<version>-<build> is created. If the build is successful, the package will appear in the enthought/simphony-dev repository shortly afterwards.

We recommend leaving these branches in place for future reference.

Testing

To run the full test-suite run:

python -m unittest discover -p "test*"

Documentation

To build the documentation in the doc/build directory run:

python setup.py build_sphinx

To recreate the UML diagrams you need to have java and xdot installed:

sudo apt-get install default-jre xdot

A copy of plantuml.jar also needs to be available in the :file:`doc/` folder. Running make uml inside the :file:`doc/` directory will recreate all the UML diagrams.

Note

  • One can use the --help option with a setup.py command to see all available options.
  • The documentation will be saved in the :file:`./build` directory.
  • Not all the png files of the UML diagrams are used.

Directory structure

The repository contains the following subpackages and directories:

  • core -- used for common low level classes and utility code
  • cuds -- to hold all the native cuds implementations
  • io -- to hold the io specific code
  • bench -- holds basic benchmarking code
  • examples -- holds SimPhoNy example code
  • doc -- Documentation related files
    • source -- Sphinx rst source files
    • build -- Documentation build directory, if documentation has been generated using the make script in the doc directory.

SimPhoNy Framework

The simphony library is the core component of the SimPhoNy Framework; information on setting up the framework is provided in a separate repository: https://github.com/simphony/simphony-framework. Note that this repository is being deprecated, as we are moving toward an Enthought Deployment Manager (EDM) based installation and deployment.

For Developers

The data structures used in this project are based on the metadata which is defined under ontology.

In order to reflect the latest changes to the metadata repository, these entities should be regenerated. The generator is hosted in the simphony-metatools repository: https://github.com/simphony/simphony-metatools. It is used to recreate the Python classes in simphony/cuds/meta and is installed and invoked by the setup.py script.

simphony-common's People

Contributors

ahashibon, dpinte, itziakos, jeremy-rutman, kemattil, khiltunen, kitchoi, mehdisadeghi, nathanfranklin, roigcarlo, sgggarcia, stefanoborini, tuopuu


simphony-common's Issues

Add an ABCCheckLatticeContainers testing template

This new class will be similar to the ABCCheckParticlesContainers template class and will allow easier integration and more reliable testing of the multiple implementations (native or adapter) of the ABCLattice api.

use uuid for all adapters

Looking at the implementations of the various containers, a common pattern is to generate an id using a random number generator; other containers do not provide an id at all.

Based on the discussion in the simphony wiki high level containers should
(a) assign an id to items that are added when those items do not have an id assigned and
(b) use the id of items that have one assigned.
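A minimal sketch of rules (a) and (b), using hypothetical Item/Container classes (not the actual SimPhoNy implementation):

```python
import uuid


class Item(object):
    """Minimal item with an optional pre-assigned id."""
    def __init__(self, id=None):
        self.id = id


class Container(object):
    """Hypothetical container illustrating rules (a) and (b)."""
    def __init__(self):
        self._items = {}

    def add(self, item):
        if item.id is None:
            # rule (a): assign a fresh id to items that have none
            item.id = uuid.uuid4()
        # rule (b): honour the id of items that already have one
        self._items[item.id] = item
        return item.id
```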

The main problem is that, because of (b), there is no control over how the ids are created. So each time case (b) happens, a number of ids are randomly generated until one that does not conflict is found (this operation is very expensive for file or remote adapters).

Assume that there are two adapters which assign ids based on internal logic; then we cannot be sure that the following will not happen:

mesh1 = adapter1.Mesh()
mesh2 = adapter2.Mesh()

point1 = Point(1, 1, 1)
point1.id = mesh2.add_point(point1)

point2 = Point(0.0, 0.0, 0.0)
point2.id = mesh1.add_point(point2)

# In the generic case, where the adapters do not share the same process, a
# collision between the two independently generated ids is not unlikely.
print point1.id == point2.id

I would suggest forcing all adapters to use the RFC 4122 standard for creating new uuids. This way we can create new uuids without having to worry about identical ones (within practical limits). RFC 4122 is supported in Python (the uuid package) and in C/C++, so using it will not be a big problem. The only cost I can see is 128 extra bits for each object that we instantiate in Python, but I think the advantages in speed and code simplicity will outweigh this.
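The standard library already covers this: the uuid module implements RFC 4122. A minimal sketch of what adapters would rely on:

```python
import uuid

uid = uuid.uuid4()             # random (version 4) RFC 4122 UUID
assert uid.version == 4
assert len(uid.bytes) == 16    # the 128 bits mentioned above

# Round-trips losslessly through its canonical string form.
assert uuid.UUID(str(uid)) == uid
```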

Use tuples vs lists for mesh items where possible

Using tuples for items like coordinates or points, and other items that have a known size, could simplify the implementation. The main reason is that when using immutable types like tuples we do not have to worry when we copy data by reference from one Point or Edge to another. In general, using tuples makes the code a little cleaner and, depending on the use case, uses fewer resources. Tuples are not a free lunch, since operations that manipulate part of a tuple are much more cumbersome; yet I think that the coordinates and points attributes are good candidates for tuples.
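A small illustration of why immutability helps here (plain Python, no SimPhoNy classes involved):

```python
coordinates = (0.0, 1.0, 2.5)   # immutable, safe to share by reference

alias = coordinates
try:
    alias[0] = 9.9              # tuples reject item assignment
except TypeError:
    mutated = False
else:
    mutated = True

# "Changing" a coordinate means building a new tuple instead,
# leaving every other holder of the original reference untouched.
shifted = (coordinates[0] + 1.0,) + coordinates[1:]
```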

Nature of a Boundary Condition

Here's something for you to comment on:

I would say that we can specify the type of the boundary condition as some kind of agreed key instead of trying to describe the equational form of the exact condition. The former is by far easier, and for the same type of boundary there may be several ways to implement it. For example, the wall condition for the turbulent kinetic energy is usually either a (nonlinear) Dirichlet-type condition or a no-flux (homogeneous Neumann) condition with a special boundary-layer source term. The exact implementation should be irrelevant to the user, hence the same key.

The boundary condition, of course, needs the specification of the boundary... and we have the uuids. For me, it would make most sense to give the uuids of the boundary faces (in 3D), since the whole boundary is then uniquely defined and from the faces it is easy to get all the nodes or edges. Anyway, a list (or something) of uuids should be sufficient.

Then the values. I have understood that some codes like to have the nodal values specified. OpenFOAM, on the other hand, needs the face-center values. As a generalization, the boundary condition should be able to give a value for any specified point on the boundary. We could probably make value-giving classes (for the possible multiple values needed by the boundary condition) inside the boundary condition class, with built-in interpolation routines from nodal or face-centered data to any point. The point specification could be the barycentric coordinates of an edge or face plus the corresponding uuid. The barycentric coordinates are easily transformed (also built in) to ordinary Eulerian coordinates, to support analytical expressions, for example.
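A minimal sketch of the barycentric-to-Cartesian transformation mentioned above (a hypothetical helper, not an existing SimPhoNy API):

```python
def barycentric_to_cartesian(weights, vertices):
    """Convert barycentric weights over an edge's or face's vertices
    to an ordinary Cartesian point. The weights are expected to sum
    to one; each vertex is a coordinate tuple."""
    dim = len(vertices[0])
    return tuple(
        sum(w * v[axis] for w, v in zip(weights, vertices))
        for axis in range(dim)
    )


# Example: the midpoint of an edge is the point with equal weights
# on both of its end nodes.
midpoint = barycentric_to_cartesian(
    (0.5, 0.5), ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
```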

The variable for which the boundary condition is applied could be given as CUBA key(s).

What slightly bothers me with all this is that when several boundary conditions act on the same boundary for different variables, we end up repeating the same list of uuids to specify the boundary. To avoid this, we would need separate entities for boundary and boundary condition.

Cannot reliably store uuids using the `bytes` attribute

@roigcarlo has seen intermittent problems with using the bytes attribute for recovering a uuid stored as a 16-character byte string. Please find below an analysis of the issue.

After some digging I ended up in this situation. Consider this fragment of execution:

print "Stored:\t \\x" + '\\x'.join(x.encode('hex') for x in cell.uuid.bytes), len(cell.uuid.bytes)
row['uuid'] = cell.uuid.bytes

# some useful code here

try:
    print "Retrieved:\t \\x" + '\\x'.join(x.encode('hex') for x in row['uuid']), len(row['uuid'])
except ValueError:
    import pdb
    pdb.set_trace()

A possible output when the error occurs is:

Stored:     \xab\xbe\x19\x39\x72\x24\x4e\x68\x92\x44\x21\xc6\x00\x26\xe0\x00 16
Retrieved:  \xab\xbe\x19\x39\x72\x24\x4e\x68\x92\x44\x21\xc6\x00\x26\xe0 15

In this case it is clear that either PyTables or Python is ignoring \x00, probably because it is a null character at the end of the string.

But this is not the only situation in which such an error happens:

Stored:      \x2c\xc0\x1e\xd0\x0f\x0f\x4f\x1c\xb1\x68\x22\x69\xae\x5d\x10\x9a 16
Retrieved:   \x2c\xc0\x1e\xd0\x0f\x0f\x4f\x1c\xb1\x68\x22\x69\xae\x5d\x10\x9a 16

Here the length is the same and there does not appear to be any problem, but look at the character values (note that I will use <> to delimit a char):

(Pdb) p row['uuid']
<,> <\xc0> <\x1e> <\xd0> <\x0f> <\x0f> <O> <\x1c> <\xb1> <h> <"> <i> <\xae> <]> <\x10> <\x9a>

In this case I assume that <"> is probably causing the string to be misparsed by PyTables again. I'm not aware of the internal implementation of UUID, but the documentation suggests that

UUID(bytes='\x12\x34\x56\x78'*4)

is accepted, so I guess our value is actually being interpreted as a string due to the format in which it is stored, and hence parsed by Python.

So my suggestion would be either to use hex, since this does not seem to cause any problems, or to stop using

tables.StringCol(16, pos=0)

It may be worth creating a separate issue, since Lattice and ParticleContainer IO will also face this problem once they implement UUIDs for file IO.
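As a sanity check on the hex suggestion: the 32-character hex form contains no NUL bytes and no quote characters, so a fixed-width string column has nothing to truncate or misparse, and it round-trips losslessly (standard uuid module only, PyTables not involved):

```python
import uuid

original = uuid.uuid4()

# 32 hexadecimal characters; always printable, never contains \x00 or ".
stored = original.hex
assert len(stored) == 32

recovered = uuid.UUID(hex=stored)
assert recovered == original
```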

CUBA keywords probably need to be grouped

It looks like some CUBA keywords only have meaning as part of a specific component (e.g. the keywords BOND_LABEL and BOND_TYPE probably only make sense for a Particle).

Having one DataContainer type might be convenient now, but supporting all the keywords when they are not needed is inefficient performance-wise and makes poor use of memory and disk space. It can also lead to confusion (e.g. a mesh cell with SIMULATION_DOMAIN_DIMENSIONS does sound a little weird).

We should probably use a subtype of the DataContainer for the different components where such an object is used. The implementation could be as simple as binding each DataContainer subtype into a specific CUBA keyword group.
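A minimal sketch of the binding idea, using a plain dict subclass; the key names below are illustrative strings, not the real CUBA enumeration or grouping:

```python
class DataContainer(dict):
    """Simplified stand-in for the real DataContainer."""


class ParticleDataContainer(DataContainer):
    """Hypothetical subtype bound to a particle-related keyword group."""

    # Assumed group membership, for illustration only.
    ALLOWED_KEYS = frozenset(['BOND_LABEL', 'BOND_TYPE', 'VELOCITY'])

    def __setitem__(self, key, value):
        if key not in self.ALLOWED_KEYS:
            raise KeyError(
                '%r is not in the particle keyword group' % (key,))
        super(ParticleDataContainer, self).__setitem__(key, value)
```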

Nevertheless, I believe that this avenue needs to be discussed after the current first phase of the api implementation is delivered.

Missing import in bench module

There was a refactoring about a month ago that renamed cuds_file.py to h5_cuds_file.py. Apparently the imports in the bench module were never updated. The incorrect lines are:

from simphony.io.cuds_file import CudsFile
from simphony.cuds.particle import Particle

Moreover, these modules are not covered by tests, so nobody noticed.

Implementation of FileMesh class

"Task 2: Definition of hdf5 file format for storing mesh. Implementation of using the defined hdf5 file format to read/write mesh "

Consistent identification (id/uuid) for parameters/attributes in Particle/Point/etc

Mesh and ParticleContainer hold entities (e.g. Particles, Points, Edges) which have a means of identification. That identification is used when getting, updating or adding such entities to the high-level objects (Mesh, ParticleContainer).

This identification should have a consistent name across the classes. Currently, id is used as the attribute name on Particle/Point/Edge/Cell.
However, the __init__ methods currently differ:

  • id is used as a parameter in Particle.
  • uuid is used as a parameter in Point, Edge, Face, and Cell.

Example:

from simphony.cuds.mesh import Point
from simphony.cuds.particles import Particle

particle = Particle(id=None, coordinates=(0.0, 0.0, 0.0))
point = Point(uuid=None, coordinates=(0.0, 0.0, 0.0))

Add an ABCCheckMeshContainers testing template

This new class will be similar to the ABCCheckParticlesContainers template class and will allow easier integration and more reliable testing of the multiple implementations (native or adapter) of the ABCMesh api.

Inconsistent behaviour of methods responsible for adding low-level objects

Currently in the master branch, we have some inconsistencies related to adding low-level objects (like Particle, Point).

In the master branch, we have 3 concrete implementations (2 for Particles and 1 for Mesh). Each of their add_XXX methods behaves slightly differently. See the following example:

from __future__ import print_function
from simphony.io.cuds_file import CudsFile
from simphony.cuds.particles import Particle, ParticleContainer
from simphony.cuds.mesh import Mesh, Point

cuds_file = CudsFile.open("dummy.cuds")

file_pc = cuds_file.add_particle_container("test")
pc = ParticleContainer()
mesh = Mesh()

particle_a = Particle(coordinates=(1.0,1.0,1.0))
particle_b = Particle(coordinates=(1.0,1.0,1.0))
point_c = Point(coordinates=(1.0, 1.0, 1.0))

# Case A
print(particle_a.id) # None
id_a = file_pc.add_particle(particle_a)
print(id_a) # the newly generated id
print(particle_a.id) # None

# Case B
print(particle_b.id) # None
id_b = pc.add_particle(particle_b)
print(id_b) # None
print(particle_b.id) # the newly generated id

# Case C
print(point_c.uuid) # None
id_c = mesh.add_point(point_c)
print(id_c) # the newly generated id
print(point_c.uuid) # the newly generated id

CUDS api extension: Shared data

There is a need to share common data within groups of cuds components, especially some of the values in data containers. This is expected to improve memory usage.

Make a feature release

The following tasks are necessary to make a new feature release of simphony-common:

  • Make a release preparation PR to:
  • As soon as the preparation PR is merged make the release in github (see https://help.github.com/articles/creating-releases/)
  • make a post release PR to:
    • update the version in setup.py and other places to the next milestone version e.g. 0.1.1.dev0
  • Send an e-mail to collaborators announcing the new release, attaching the list of changes it brings to the library.

HDF5: Storing the `data` of higher-level objects like ParticleContainer

from @nathanfranklin:

How will we store the ParticleContainer's data (or the data member of any high-level CUDS instance)? Will we place it in a separate single-row table (e.g. at /particle_container/foo/data):

MaterialType Label
"MyType" "my label"

Or can we just use the HDF5 attributes of the /particle_container/foo/ node to store the contents of the data DataContainer? Here, I am unclear about the limitations of using HDF5 attributes.

Any thoughts on this part of the issue (storing ParticleContainer's data)?

ids vs references

Looking at the implementations, there are a number of problems and potential issues with having references. I would like to reopen the discussion regarding the use of ids vs references in Bond and Element-like objects (i.e. Face, Edge).

Please find below a summary of the problems (I will use Edge as the example).

Let's assume that we have the following points and edges:

points = [Point(0.0, 0.0, z) for z in range(3)]
edge1 = Edge(points=[points[0], points[1]])
edge2 = Edge(points=[points[1], points[2]])

Adding the edges will add points multiple times.

mesh = Mesh()

# This operation will add points[0] and points[1]
edge1.id = mesh.add_edge(edge1)

# points[0] and points[1] now have a non-None id.

# This operation will add points[1] and points[2]
edge2.id = mesh.add_edge(edge2)

So the above code will attempt to add points[1] multiple times. The first time an id will be assigned; the second time (assuming updates are allowed), the Mesh container has two options: either update points[1] directly, or first check whether points[1] is exactly the same as the copy in internal storage (with all the issues of floating point comparison) and only update if necessary. In practice it is faster and more robust to update directly.

What if we only used ids

If the Edge object stored only ids, the above code would have to change:

points = [Point(0.0, 0.0, z) for z in range(3)]

mesh = Mesh()

ids = []
for point in points:
    id = mesh.add_point(point)
    ids.append(id)

edge1 = Edge(points=ids[:2])
edge2 = Edge(points=ids[1:3])

edge1.id = mesh.add_edge(edge1)
edge2.id = mesh.add_edge(edge2)

In this case there is no attempt to save the same element twice. The add_edge method only needs to make sure that the ids contained in the edges are valid, and the implementation can be much simpler.

Updating an Edge will need to update all the contained points

edge2.data[CUBA.PRESSURE] = 6.7

# This will attempt to also update the points[1] and points[2]
mesh.update_edge(edge2)

The issue here is again that there is no safe way for the Mesh to know whether the Points have changed without comparing them with the internal storage, and thus it is more practical to just update them anyway.

We pay only for the objects that we use

Using ids will reduce the number of Python instances created when getting an Edge from a Mesh. The user will have to explicitly get the points if they are needed. Depending on the common use cases, using ids can have a big impact on speed and memory usage.

Using ids will have some side effects

Copying Edge elements from one adapter to another needs to take place in two steps: first all the points, then all the edges. However, I think that forcing the copy operation into two steps is actually cleaner and easier to optimize.

Yet from a usability point of view it might not be the best, and this will depend heavily on the use cases. For example, calculating properties like the length of an edge is no longer trivial.

I still think that we can improve usability by providing purpose-built methods and special iterators.

for edge, points in mesh.iter_edges_with_points():
    # points is a tuple of the points that this edge connects
    print points[1].coordinates - points[0].coordinates

In this case calculating properties (e.g. length) of edges in batch might be faster and more memory efficient.
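A sketch of such a batch computation over an iter_edges_with_points-style iterable (the iterator and the Point attributes are assumptions taken from the proposal above, not an existing API):

```python
import math


def edge_lengths(edges_with_points):
    """Compute every edge length in one pass over an iterable of
    (edge, points) pairs, where points holds the two end Points."""
    lengths = []
    for edge, points in edges_with_points:
        a = points[0].coordinates
        b = points[1].coordinates
        lengths.append(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
    return lengths
```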

Please let me know what you think and give counter examples where using ids might make the implementation problematic.

Mesh implementation does not properly support the `data` attribute.

The Mesh implementation does not implement the data attribute properly:

  • The data attribute should be set to a DataContainer, not 0, at initialization.
  • The data attribute should be a property; setting and getting should go through a copy of the internal representation.
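A minimal sketch of the expected behaviour, using a dict-based stand-in for DataContainer (the real class differs):

```python
class DataContainer(dict):
    """Simplified stand-in for the real DataContainer."""


class Mesh(object):
    """Sketch of the `data` behaviour described above (assumed API)."""

    def __init__(self):
        self._data = DataContainer()       # a DataContainer, not 0

    @property
    def data(self):
        return DataContainer(self._data)   # getter hands out a copy

    @data.setter
    def data(self, value):
        self._data = DataContainer(value)  # setter stores a copy
```

With this in place, mutating what the getter returned (or the object passed to the setter) cannot corrupt the mesh's internal state.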

Implementation of FileLattice class

"Task 2: Definition of hdf5 file format for storing mesh. Implementation of using the defined hdf5 file format to read/write a lattice mesh "

Extensions to DataContainerTable

The following extensions to the DataContainerTable are necessary:

  • Support versioning
  • Control which keywords need to be saved/restored

Storing element Points in FILE I/O

The number of points that can be stored for a given element is currently fixed in the code.

For example:

MAX_POINTS_IN_FACE = 3
MAX_POINTS_IN_CELL = 4

Points are then stored as:

points_uuids = tables.StringCol(32, pos=2, shape=(MAX_POINTS_IN_EDGE,))

We agreed in the last meeting that this needs to be changed.

Currently we are considering two options:

  • Increase the maximum number to fit the requirements of any element.
  • Use VLArray (variable length arrays), see #22 (comment)

While I prefer to use VLArray, I recall that someone argued that this would decrease the performance of the code.

Please let me know what you think.

H5CUDS support for storing CUDS-objects with a select set of CUBA-keys

The current implementation of H5CUDS only allows saving a CUDS-object to an H5CUDS file in its entirety; that is, the data for all CUBA-keys is saved. However, at least for the Lattice object, we can already see a need to select which CUBA-keys are saved.

For the Lattice-Boltzmann method, we generally have CUBA-data for material id, density and velocity stored in SimPhoNy, and optionally model-specific internal variables used by the simulation engines. We want to allow extracting this data separately, for example saving only the geometry (for models that allow changing it), or only the internal variables for analysis or restart purposes. The ability to select which CUBA-data is saved might become useful for the other CUDS-objects as well.

In the current implementation of H5CUDS/FileLattice, selective saving of the CUBA-data can be done by constructing a new FileLattice with a custom data description class that defines the correct columns. One can then iterate over the nodes of the Lattice to be saved and update the corresponding nodes in the FileLattice. This would, however, handle the two cases at different abstraction levels (H5CUDS, which handles CUDS-objects, vs. FileLattice, which provides methods to manipulate in-file LatticeNodes).

The question is: Should we allow partial saving of the CUDS-objects already at the H5CUDS-level?
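A minimal sketch of what partial saving could look like at the H5CUDS level: filter a node's data down to the requested CUBA keys before writing (a hypothetical helper; the key names are illustrative strings, not the real CUBA enumeration):

```python
def select_cuba_data(node_data, cuba_keys):
    """Keep only the requested CUBA keys from a lattice node's data,
    so only that subset reaches the file writer."""
    return dict((key, value) for key, value in node_data.items()
                if key in cuba_keys)


node_data = {'MATERIAL_ID': 2, 'DENSITY': 1.0, 'VELOCITY': (0.1, 0.0, 0.0)}

# Save only the geometry-related data for this node.
geometry_only = select_cuba_data(node_data, {'MATERIAL_ID'})
```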

Provide batch operations for adding multiple elements (e.g. particles/bonds)

Please find below the proposed changes to the current CUDS api.

class ABCLattice(object):

    def add_nodes(self, nodes):
        """ Add a set of nodes from the provided iterable.

        """

    def update_nodes(self, nodes):
        """ Update a set of nodes from the provided iterable.

        """

class ABCParticles(object):

    def add_particles(self, iterable):
        """ Add a set of particles from the provided iterable.

        """

    def update_particles(self, iterable):
        """ Update a set of particles from the provided iterable.

        """

    def remove_particles(self, uids):
        """ Remove the particles with the provided uids.

        """

    def add_bonds(self, iterable):
        """ Add a set of bonds from the provided iterable.

        """

    def update_bonds(self, iterable):
        """ Update a set of bonds from the provided iterable.

        """

    def remove_bonds(self, uids):
        """ Remove the bonds with the provided uids.

        """

class ABCMesh(object):

    def add_points(self, iterable):
        """ Add a set of points from the provided iterable.

        """

    def update_points(self, iterable):
        """ Update a set of points from the provided iterable.

        """

    def remove_points(self, uids):
        """ Remove the points with the provided uids.

        """

    def add_edges(self, iterable):
        """ Add a set of edges from the provided iterable.

        """

    def update_edges(self, iterable):
        """ Update a set of edges from the provided iterable.

        """

    def remove_edges(self, uids):
        """ Remove the edges with the provided uids.

        """

    def add_faces(self, iterable):
        """ Add a set of faces from the provided iterable.

        """

    def update_faces(self, iterable):
        """ Update a set of faces from the provided iterable.

        """

    def remove_faces(self, uids):
        """ Remove the faces with the provided uids.

        """

    def add_cells(self, iterable):
        """ Add a set of cells from the provided iterable.

        """

    def update_cells(self, iterable):
        """ Update a set of cells from the provided iterable.

        """

    def remove_cells(self, uids):
        """ Remove the cells with the provided uids.

        """

notes:

  • Remove-related methods are now provided for the Mesh container.
  • All add operations return a list of uids of the items that were added, in the same order as the input items.
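A minimal sketch of the proposed batch semantics under the notes above (stand-in classes, not the real implementation):

```python
import uuid


class Particle(object):
    """Minimal particle stand-in with an optional pre-assigned uid."""
    def __init__(self, uid=None):
        self.uid = uid


class Particles(object):
    """Sketch of the proposed add_particles contract: one uid per
    input item, returned in input order."""
    def __init__(self):
        self._particles = {}

    def add_particles(self, iterable):
        uids = []
        for particle in iterable:
            if particle.uid is None:
                particle.uid = uuid.uuid4()
            self._particles[particle.uid] = particle
            uids.append(particle.uid)
        return uids
```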

tasks:

Template for testing against cuds

It would be really good to have a set of templates that the collaborators can use in order to test their wrapper implementations against the official cuds api.

Skeleton

Dear all,

I have been testing the mesh module and I realized that, with the current skeleton, running the command "python -m unittest discover" does not execute my module's tests.

For the tests to be executed automatically I had to add an __init__.py under "simphony-common/simphony" and rename my "tests" folder to "test".

The changes would be like this:

simphony-common/
    simphony/
        __init__.py
        cuds/
            test/

Can someone tell me whether this is normal or whether something has to be added to the setup.py file?
Br.
