openpmd / openpmd-viewer
:snake: Python visualization tools for openPMD files
Home Page: https://openpmd-viewer.readthedocs.io/
License: Other
We should probably add some Continuous Integration (CI) via https://travis-ci.org/ to automatically run a large sample of API tests on each PR before we merge it.
Idea: pytest and/or tox (shamelessly stolen from Open Sourcing a Python Project the Right Way)
As suggested by @jgerity , passing a callable would give more flexibility to the user.
Hi,
this is really a minor thing: I noticed that my Anaconda Python does not ship with wget, which is used to download the sample data in the tutorials. I'm not sure whether this is because my Anaconda is outdated or whether this is standard.
This could lead to some confusion for inexperienced Python users. I think adding it to the requirements.txt is a little bit overkill, since it is technically not needed for the viewer itself, but we could add a side note in the tutorials.
If wget now comes with Python, forget all I've said above :)
Hi,
in its current implementation, check_all_files=False prevents the usage of the slider widget as well as the ability to call get_field(t=50years...). This happens because self.t is acquired in the same for loop where the data checking is done:
self.t = np.zeros( N_files )
...
if check_all_files:
    for k in range(1, N_files):
        t, params = read_openPMD_params(self.h5_files[k])
        self.t[k] = t
        for key in params0.keys():
            if params[key] != params0[key]:
                print("Warning: File %s has different openPMD "
                      "parameters than the rest of the time series."
                      % self.h5_files[k])
This results in a self.t array filled with zeros if check_all_files=False.
Sorry for not being too familiar with the standard, but if the timestep is saved within the openPMD file, we could use it to determine the time from the iteration data. Otherwise (if the timestep is not allowed to change; again, I need to get more familiar with this stuff) we could open only two files and get the timestep from the difference in time and iteration.
EDIT:
"READ THE FREAKING MANUAL!!"
Turns out this is in fact stated in the standard, but not in a way that helps with the problem:
dt
    type: (float)
    description: The latest time step (that was used to reach this iteration). This is needed at the iteration level, since the time step may vary from iteration to iteration in certain codes.
Okay, then I see no other way than opening all files to acquire the times.
Technically the API should work without the times, so if we want to have a "fast" mode, we could throw a custom exception whenever a user uses the fast mode and passes a time argument.
EDIT 2:
This could be done, for example, in _find_output(self, t, iteration):
def _find_output(self, t, iteration):
    ...
    # If a time is requested
    elif (t is not None):
        # NEW
        if self.t is None:
            raise APIError(
                'Fast-read mode does not support time data acquisition')
        # Make sure the time requested does not exceed the allowed bounds
        if t < self.tmin:
            self.current_i = 0
        elif t > self.tmax:
            self.current_i = len(self.t) - 1
    ...
To conclude: I think this flag is very nice, since checking large datasets can take a great amount of time, so we should try to find an elegant solution to this problem. I also tried to use multiprocessing for the initialisation step, but my attempts failed with the really not-so-nice implementation of the Python multiprocessing module... What are your thoughts on this?
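One possible direction, sketched below: dispatch the per-file metadata reads through concurrent.futures (threads are enough for I/O-bound HDF5 metadata access), which sidesteps the pickling pitfalls of the bare multiprocessing module. The read_params helper here is a hypothetical stand-in for read_openPMD_params and only parses the timestep from the filename:

```python
import re
from concurrent.futures import ThreadPoolExecutor

def read_params(path):
    # Hypothetical stand-in for read_openPMD_params: here it only
    # extracts the timestep number from the filename
    m = re.search(r'(\d+)\.h5$', path)
    return int(m.group(1)) if m else None

def read_all_params(h5_files, workers=4):
    # Map the per-file read over a thread pool instead of a serial loop;
    # threads avoid the pickling issues of multiprocessing here
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_params, h5_files))
```

This keeps the existing serial logic intact and only parallelizes the slow per-file reads.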
I upgraded a Python 2.7 install of openPMD-viewer 0.3.3 to 0.5.3 via pip today (note: there is still a tag for 0.5.3 missing on GitHub -> Releases) and noted the following install problem:
$ pip install openPMD-viewer --user -U
[....]
Collecting openPMD-viewer
  Using cached openPMD-viewer-0.5.3.tar.gz
    Complete output from command python setup.py egg_info:
    /home/axel/.local/lib/python2.7/site-packages/pkg_resources/__init__.py:1869: UserWarning: /usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info could not be properly decoded in UTF-8
      warnings.warn(msg)
    Maybe try:
        sudo apt-get install pandoc
    See http://johnmacfarlane.net/pandoc/installing.html
    for installation options
pandoc is missing in the (pip) requirements.txt, and besides the Python part it also needs the binary part printed above to be installed. Anyway, it's probably only used for documentation building or rst file creation for packaging?
Not sure when this issue was introduced, because it has been in setup.py for a long time. Note that I installed previous versions (0.3.3) via python setup.py install from the sources directly; that might be the reason it was not triggered.
Repeating the same install via Python 3.4 seemed fine even without pandoc, but only installs 0.5.2 o.0
Also, are we sure we drop support and CI for Python 3.4 already (see setup.py meta information)? It's still the stable Python release in many distributions.
It appears that for Cartesian datasets, avail_circ_modes will be set to None in params_reader.py, but this appears to cause an error when calling OpenPMDTimeSeries.slider() for interactive exploration in an IPython (4.2.1-aed0eae) notebook with widgets. This is replicated for the example datasets.
Replacing avail_circ_modes with [] as needed after creating the OpenPMDTimeSeries object seems to yield the correct behavior:
import opmd_viewer
ts = opmd_viewer.OpenPMDTimeSeries('./example-3d/hdf5/')
%matplotlib inline
if ts.avail_circ_modes is None:
    ts.avail_circ_modes = []
ts.slider()
Unless I've misunderstood something about this object, using the empty list [] instead of None at params_reader.py:L99 will resolve the issue without the need for intervention on the user's part.
To improve the generality of the viewer without harming its usefulness for a specific domain, we could make the following adjustment regarding unit systems:
By default, the OpenPMDTimeSeries should not convert, rename, or exclude records.
But we could set a unit system that is used for reading data and formatting plots, in a way such as:
from opmd_viewer import OpenPMDTimeSeries
import opmd_viewer.unit_systems
ts = OpenPMDTimeSeries('...')
# change to lambda0 & c based system
ts.set_unitsystem(unit_systems.LPA(800.e-9))
# change back to SI
ts.set_unitsystem(unit_systems.SI())
# change to a plasma system
ts.set_unitsystem(unit_systems.Plasma(1.e15))
# read or plot data now
The important thing is that we apply the transformations as we read the raw data, together with unitSI multiplications, etc.
- SI(): just forward what we read, scaled by unitSI
- Microns(): use microns instead of meters for lengths
- LPA(lambda0): scale lengths to lambda0, times to lambda0/c, speeds to c, ...
- Plasma(omega_pe): scale times by (2 pi) / omega_pe, speeds by c, lengths by (c * 2 pi) / omega_pe, ...
- Raw(): ignore all scalings including unitSI (for debugging codes)
- CGS(): I am just kidding, cgs units are deprecated and will be removed in future versions of science :)
If we document the interface of those classes properly, power users could even use their own unit-system converters, even if we did not implement them (yet).
During the change of the unit system we can also rename the records, since e.g. a momentum in beta gamma (ux) is more readable and better known in a specific domain.
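To make the proposal concrete, here is a minimal sketch of what such converter classes could look like; all names (UnitSystem, the length/time methods) are illustrative assumptions, not existing openPMD-viewer API:

```python
# Sketch only: names are illustrative, not existing openPMD-viewer API.
C = 299792458.0  # speed of light in m/s

class UnitSystem:
    """Base converter: subclasses define how raw SI values are rescaled."""
    def length(self, value_si):
        return value_si
    def time(self, value_si):
        return value_si

class LPA(UnitSystem):
    """Scale lengths to lambda0 and times to lambda0/c."""
    def __init__(self, lambda0):
        self.lambda0 = lambda0
    def length(self, value_si):
        return value_si / self.lambda0
    def time(self, value_si):
        return value_si / (self.lambda0 / C)
```

With a documented base class like this, a power user could pass their own subclass to a hypothetical ts.set_unitsystem(...) without the viewer knowing about it in advance.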
The slicing slider in the field view of the slider() method is inverted.
It goes from -1 to +1, with 0 representing the center. At least for slicing in x in 3D, +1 is equal to the smallest x-cell index while -1 is equal to the highest x-cell index. This is a bit counter-intuitive.
For more consistency, the import statement should be changed from import opmd_viewer to import openpmd_viewer.
In addition:
Right now, openPMD-viewer can handle only the case where there is one iteration per file; moreover, the iteration is obtained from the filename, not from the basePath (e.g. /data/100). Finally, if there are additional paths in /data/, this could cause the reader to fail (e.g. if /data/ contains /data/100/ and /data/some_other_path).
These are clearly limitations, and they should be fixed.
Currently the pip install is failing with the following error:
pip install openPMD-viewer
Collecting openPMD-viewer
  Using cached openPMD-viewer-0.5.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-2_8yy6dj/openPMD-viewer/setup.py", line 47, in <module>
        "opmd_viewer/openpmd_timeseries/cython_function.pyx"),
      File "/data/home/branco77/python/data_analysis/lib/python3.4/site-packages/Cython/Build/Dependencies.py", line 818, in cythonize
        aliases=aliases)
      File "/data/home/branco77/python/data_analysis/lib/python3.4/site-packages/Cython/Build/Dependencies.py", line 704, in create_extension_list
        for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
      File "/data/home/branco77/python/data_analysis/lib/python3.4/site-packages/Cython/Build/Dependencies.py", line 108, in nonempty
        raise ValueError(error_msg)
    ValueError: 'opmd_viewer/openpmd_timeseries/cython_function.pyx' doesn't match any files
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2_8yy6dj/openPMD-viewer/
With the introduction of bremsstrahlung and Compton scattering into PIConGPU, a photon model has been added to the available particle species. Since a photon's mass is zero, we encounter a divide by zero during the normalization of the momentum data to (mc)^-1.
/lib/python3.4/site-packages/opmd_viewer/openpmd_timeseries/data_reader/particle_reader.py:84:
RuntimeWarning: divide by zero encountered in true_divide
  norm_factor = 1. / (get_data(species_grp['mass']) * constants.c)
In this case it would be desirable to have an arbitrary normalization, and maybe a logarithmic plot scaling as a default. Then users could set a normalization value themselves after seeing the logarithmic plot and deciding where the regions of interest are, and maybe change back to linear scaling. :)
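A minimal sketch of the suggested guard, assuming a helper inside the particle reader (the function name and the fallback parameter are illustrative, not existing openPMD-viewer API):

```python
# Sketch: guard against zero-mass species (e.g. photons) when normalizing
# momenta to (mc)^-1; `fallback` is an assumed user-supplied normalization.
C = 299792458.0  # speed of light in m/s

def momentum_norm_factor(mass, fallback=1.0):
    if mass == 0.0:
        # Massless species: use the arbitrary user normalization
        # instead of dividing by zero
        return fallback
    return 1.0 / (mass * C)
```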
Wanted to report this for a long time but always forgot. As seen here, we like to have rather long field and particle attribute names, which breaks the layout a bit. Can one adjust it a little so it is more flexible with regard to long record names?
Also, the range "number input" boxes in the Plotting options are sometimes too small in width to see all the numbers one entered (e.g. "19" for "1e19"). The last point might also just be a Chrome issue.
We should add a license header to each file stating that the 3-Clause-BSD-LBNL license is used.
This is good practice since derivatives might only clone a specific file.
I can prepare a snippet later on, not urgent.
"""
File description
...
__authors__ = "Remi Lehe, Soeren Jalas, Axel Huebl, ..." (from `git log` of each file)
__copyright__ = "Copyright 2015-2016"
__credits__ = ["Remi Lehe", "Soeren Jalas", "Axel Huebl"]
__license__ = "3-Clause-BSD-LBNL"
"""
(addLicense tool in PIConGPU)
Also: avoid UTF-8 for now, write names in pure ascii (sometimes crashes people's setups otherwise)
For one of the sets of simulations that I am currently running, I would need to be able to track particles in postprocessing.
Assuming that the id of the particles has been stored in the openPMD data, this should be possible with openPMD-viewer, if we add this feature.
I would definitely volunteer for this, but I would like us to first agree on the API. I thought about introducing a ParticleTracker object. This object would be used in the following way: for instance, in order to select electrons having a longitudinal momentum between 10 and 100 at iteration 2000, and then fetch the positions of these electrons at iterations 30000 and 50000:
ts = OpenPMDTimeSeries( 'some/directory' )
# Select particles at iteration 2000
pt = ParticleTracker( ts, species='electrons', select={'uz':[10,100]}, iteration=2000 )
# Fetch them at iteration 30000 and 50000
x1, y1, z1 = ts.get_particle( ['x', 'y', 'z'], particle_tracker=pt, iteration=30000 )
x2, y2, z2 = ts.get_particle( ['x', 'y', 'z'], particle_tracker=pt, iteration=50000 )
Also, in this case, x1, y1, etc. would be arrays that have the same length as the initial number of particles selected at iteration 2000, and the same tracked particle would have the same position within the arrays x1 and x2, for instance. There would be NaNs in x1 if some of the particles were present at iteration 2000 but not anymore at iteration 30000.
@ax3l @soerenjalas @MKirchen @jgerity : Does the above api make sense to you? Do you have suggestions for improvements, before I start coding?
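The NaN-padded alignment described above could be implemented along these lines (a sketch; the helper name is made up): given the ids selected at the reference iteration, return a quantity at a later iteration in the same order, with NaN where a particle has disappeared.

```python
import numpy as np

def align_by_id(selected_ids, ids_later, values_later):
    # Output has the length of the initial selection; missing particles
    # are filled with NaN
    out = np.full(len(selected_ids), np.nan)
    # Map each id present at the later iteration to its array index
    index = {int(pid): i for i, pid in enumerate(ids_later)}
    for j, pid in enumerate(selected_ids):
        if int(pid) in index:
            out[j] = values_later[index[int(pid)]]
    return out
```

This preserves a stable per-particle position across iterations, so x1[j] and x2[j] always refer to the same tracked particle.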
- get_gamma)
- avail_ptcl_quantities a dictionary (with macroWeighted yes/no)
- momentum/x)

We should add a CHANGES.md or CHANGELOG.md documenting the changes between two releases, so users can update their scripts accordingly.
Examples:
The get_particle function should be modified so that, if there is only one species, the user does not need to pass it.
Currently unitSI seems not to be used while reading data, since warp creates data already in SI (making unitSI==1.0).
We could combine that multiplication (scaling) with the scaling by weighting to make it more efficient.
Follow-up to #148: the "coord" option in the field API and interactive ipywidget should be renamed to component (or comp), since a component of a field does not necessarily reflect a quantity along a vector component of the chosen geometry. The current naming can be confusing, e.g. for tensor fields.
It is also the name used in the standard, for the same reason: record and record component.
In the new version of ipywidgets, the width of the boxes has changed.
The slider looks quite cumbersome in this case. Thus the widths need to be tailored again.
To save time while opening an OpenPMDTimeSeries, we could add an optional parameter that only queries the first file for, e.g., the available particle records, and assumes the other files will not change the available attributes for each species over time.
We still need a central place to update the version, both in the package metadata and in the module's __version__.
Maybe we also add a small update.sh bash script to bump the version number before releases, so we do not forget places to update.
When using the slider of openpmd-viewer, one can pass keyword arguments, but they will not be used.
Hi,
I came across some minor annoyances when using the API in rather long Python scripts. When using wrong field or var_list arguments in get_field or get_particle, we currently issue a warning in the form of a Python print:
if valid_var_list == False:
    quantity_list = '\n - '.join( self.avail_ptcl_quantities )
    print("The argument `var_list` is missing or erroneous.\nIt "
          "should be a list of strings representing particle "
          "quantities.\n The available quantities are: "
          "\n - %s" % quantity_list )
    print("Please set the argument `var_list` accordingly.")
    return(None)
This has the downside of not knowing where exactly in the code the error occurs. Something like
field = get_field(t, field=bogus)
then returns None and will lead to errors further down the road.
My suggestion would be to additionally throw an exception, to get a proper Python traceback. Would this lead to errors with the interactive interface?
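A sketch of the suggested change (the exception name OpenPMDException and the helper are illustrative, not existing API): raise instead of printing and returning None, so the traceback points at the offending call site.

```python
class OpenPMDException(Exception):
    """Illustrative exception type for invalid user arguments."""

def check_var_list(var_list, avail_ptcl_quantities):
    # Raise with the same helpful message instead of print + return None
    if var_list is None or not set(var_list) <= set(avail_ptcl_quantities):
        quantity_list = '\n - '.join(avail_ptcl_quantities)
        raise OpenPMDException(
            "The argument `var_list` is missing or erroneous.\n"
            "It should be a list of strings representing particle "
            "quantities.\nThe available quantities are:\n - %s"
            % quantity_list)
```

The interactive GUI could catch this exception and fall back to displaying the message, keeping the current behavior there.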
But remove them in the interactive GUI, as it would take some space on the screen and it would not be very useful.
Hi @RemiLehe,
Does the main package of the viewer lack a __version__ attribute, or am I missing it?
import opmd_viewer
print(opmd_viewer.__version__)
# AttributeError: 'module' object has no attribute '__version__'
We might want to add it; there are two PEPs about it :)
ornladios/ADIOS#61 (comment)
Plotting e.g. 2D fields right now does not add the correct labels to the axes (e.g. they are x, z as in warp).
For that to change, we just need to read the axisLabels attribute.
The slider() could take an optional exclude_particle_records=list argument which can be used to exclude charge and mass, as these are currently hard-coded.
The syntax should use a dictionary, as in the case of the Warp OpenPMD output.
The interactive GUI should be updated accordingly.
There are several tasks related to the installation that we should go through:
- pip.
- conda.
In particular, we need to add install instructions in the README for both pip and conda. In the case of pip, it would be good to distinguish between:
- module load instructions for hdf5 parallel)

When trying to read a PIConGPU file with the current dev branch as of 135809c, reading the position via
y,x = ts.get_particle( var_list=['y','x'], t=0, species='e', plot=True, nbins=300, vmax=5e7)
leads to:
openPMD-viewer/opmd_viewer/openpmd_timeseries/main.pyc in get_particle(self, var_list, species, t, iteration, select, output, plot, nbins, **kw)
239 for quantity in var_list:
240 data_list.append(read_species_data(
--> 241 file_handle, species, quantity, self.extensions))
242 # Apply selection if needed
243 if select is not None:
openPMD-viewer/opmd_viewer/openpmd_timeseries/data_reader/particle_reader.pyc in read_species_data(file_handle, species, record_comp, extensions)
72 # - Return positions in microns, with an offset
73 if record_comp in ['x', 'y', 'z']:
---> 74 offset = get_data(species_grp['positionOffset/%s' % record_comp])
75 data += offset
76 data *= 1.e6
/openPMD-viewer/opmd_viewer/openpmd_timeseries/data_reader/utilities.pyc in get_data(dset, i_slice, pos_slice)
92 # Scale by the conversion factor
93 if dset.attrs['unitSI'] != 1.0:
---> 94 data *= dset.attrs['unitSI']
95
96 return(data)
TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
That is likely due to the recent changes in #81 #85, and caused by our internal storage of positionOffset as an int32 :)
The result of positionOffset[()] (int32) times unitSI (float64) should of course be promoted to float64 if unitSI != 1.0. Since *= is an in-place operation, we can just add a reallocation & cast beforehand if necessary:
if dset.attrs['unitSI'] != 1.0:
    if data.dtype != np.float64:
        data = data.astype(np.float64)
    data *= dset.attrs['unitSI']
In recent versions of numpy, astype also has a copy=False flag: see this answer. Note: view() will only work if the two casted types have the same size in bytes.
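The failure and the fix can be reproduced in isolation, independent of any HDF5 file (a minimal sketch):

```python
import numpy as np

# Minimal reproduction: an in-place multiply of an int32 array by a float
# produces a float64 result, which numpy refuses to cast back to int32
# under the default 'same_kind' casting rule.
data = np.arange(3, dtype=np.int32)
try:
    data *= 1.5
    raised = False
except TypeError:
    raised = True

# The suggested fix: reallocate as float64 before the in-place multiply
data = data.astype(np.float64)
data *= 1.5
```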
I noticed an unexpected behaviour in how get_fields returns the fields and the extent of the fields:
while the extents are returned as (z_min, z_max, r_min, r_max), or similar for other coordinates, the shape of the field array is (Nr, Nz), i.e. "flipped" in order. I think this behaviour can lead to confusion, and if we don't change it, we should at least explicitly mention the structure of the returned quantities in the docstring.
For all kinds of particle binning methods (in space, in momentum, ...) it would be very useful to add shape (assignment) functions, so one can remove the aliasing noise in the bins.
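As an illustration of the idea (a sketch; the function name and signature are made up), a first-order "cloud-in-cell" shape function splits each particle between its two nearest bins instead of dumping it entirely into one, which already suppresses much of the aliasing noise:

```python
import numpy as np

def cic_histogram(x, bins, bin_range):
    # First-order (linear) deposition: each particle contributes to its
    # two nearest bins, weighted by distance to the bin centers
    lo, hi = bin_range
    dx = (hi - lo) / bins
    hist = np.zeros(bins)
    # Fractional position relative to bin centers
    s = (np.asarray(x, dtype=float) - lo) / dx - 0.5
    i = np.floor(s).astype(int)
    frac = s - i
    for idx, f in zip(i, frac):
        if 0 <= idx < bins:
            hist[idx] += 1.0 - f
        if 0 <= idx + 1 < bins:
            hist[idx + 1] += f
    return hist
```

Higher-order shapes (quadratic, cubic splines) would follow the same pattern with wider stencils.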
Hi,
is there a problem with our continuous integration? It seems that the tests on all PRs are on hold and do not show up on https://travis-ci.org/openPMD/openPMD-viewer.
Currently, all time series extractions are done using the list_h5_files function from opmd_viewer.openpmd_timeseries.main. This function extracts the time step from the last number before the suffix, not from the time step in the hdf5 files. See code here. Thus, files like libSplash serial output will result in wrong time steps (the serial output writes files with a given mpi rank as filename_[timestep]_[rankx]_[ranky]_[rankz].h5).
Currently the method get_main_frequency in LpaDiagnostics simply searches for the maximum of the FFT. This produces quite unusable data when studying e.g. the redshift of the pulse in plasma. As you can see below, the value changes quite unsteadily.
I think that by fitting the spectrum one could probably get better results, as more data is taken into account.
The question is whether a Gaussian fit is sufficient for this purpose, as the pulse can change quite violently, or people might use exotic pulses. I can work out a PR so we could test whether this method gives better results. @RemiLehe any thoughts on this?
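As a cheap stand-in for a full Gaussian fit (a sketch; the function name and window parameter are made up), even an intensity-weighted centroid around the FFT peak averages over several spectral points instead of relying on a single argmax bin:

```python
import numpy as np

def main_frequency(freqs, spectrum, window=5):
    # Intensity-weighted centroid of the spectrum around its peak;
    # smoother than a bare argmax, though a real fit may do better still
    k = int(np.argmax(spectrum))
    lo, hi = max(0, k - window), min(len(freqs), k + window + 1)
    w = np.asarray(spectrum[lo:hi], dtype=float)
    return float(np.sum(np.asarray(freqs[lo:hi]) * w) / np.sum(w))
```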
The function dlink from the package traitlets provides the ability to link the attributes of 2 widgets. For instance, we could link the time sliders of two different simulations.
However, in order to do so, I think the method slider of OpenPMDTimeSeries needs to return the slider, not just display it; but this can easily be changed in openPMD-viewer. I'll investigate this.
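For illustration, here is a pure-Python sketch of the one-directional link semantics that traitlets.dlink provides (the Slider class and the dlink function below are stand-ins, not the traitlets API):

```python
class Slider:
    # Stand-in for an ipywidgets slider: holds a value and forwards
    # updates to linked targets
    def __init__(self, value=0.0):
        self._value = value
        self._targets = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        self._value = v
        for target in self._targets:
            target.value = v

def dlink(source, target):
    # One-directional link: updating source.value updates target.value
    source._targets.append(target)

a, b = Slider(), Slider()
dlink(a, b)
a.value = 3.0  # b.value follows
```

If slider() returned the real widget objects, the actual traitlets.dlink could be applied to them in exactly this way.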
Right now, the figure has to be selected through the arguments ifig_f and ifig_p of the slider method; this is not very convenient.
With "slice direction" in OpenPMDTimeSeries.slider, we actually mean the slice normal, don't we? Should we rename it? I find the term "direction" quite arbitrary in the widget (because a plane would need at least two base vectors pointing somewhere in the plane to span a slice).
We might need to update the object in OpenPMDTimeSeries to use dictionaries with key iteration instead of plain arrays.
Right now, the viewer assumes all simulations are zero-based in iterations. This still works when restarting (e.g. for high-resolution output) for output dumps only in a specific interval [i_min:i_max], but variables like current_i are wrong.
Also, due to that, current_i is actually the n-th output and not the iteration of that output.
We can also generalize _find_output to return the found time and iteration again, instead of modifying the members directly (this allows us to implement new member functions such as find_time(iteration) and find_iteration(time)).
The datasets in https://github.com/openPMD/openPMD-example-datasets will soon be modified. The tutorial notebooks (esp. notebook 3) should reflect this.
It would be good if the user can extract the resolution easily.
I noticed that the dev branch is a few commits behind the master branch. I think we should try to keep the branch up to date, since the contribution guidelines say to fork and update from this branch. Could you merge the current master branch into the dev branch?
It would be nice to have pep8 incorporated in the automated tests, so that we automatically check for lines longer than 80 characters, etc.
Right now the GUI has a time slider, which indicates the time in femtoseconds.
However, in version 1.0.0 of openPMD-viewer, we are planning to enforce unit consistency and return everything in SI. This means that, for LWFA simulations, the time in the slider will be of the order of 1.e-12.
It turns out that, in this case, the slider does not work anymore! (See for instance: jupyter-widgets/ipywidgets#259)
Therefore, for version 1.0.0, I am planning to replace the time slider with an iteration slider (which will be more robust to small values of the time). @soerenjalas @ax3l : any objection to this?
The current implementation of get_spectrogram seems to return wrong results.
The white plots in the spectrogram show the projections of the spectrogram and should give the pulse spectrum and envelope.
For comparison, the spectrum is calculated with numpy.fft.fft() and the envelope with get_envelope. These plots definitely match the input data of the simulation. As you can see, they don't really agree with the spectrogram.
I'm not yet sure what's causing this discrepancy; it could either be an error in the algorithm or some other issue. One idea I had is that the FROG method just doesn't work well with the given pulse. In that case one could try to use a different gating function. Or maybe you have some other idea what might be the cause.
I just wanted to warn you about this issue and will look further into it. You can assign this issue to me.
When using openPMD-viewer's OpenPMDTimeSeries with the slider() method in 2D, the ranges of the x and y axes are interchanged.
In PIConGPU, the laser propagates in the +y direction, as correctly marked by the axis labels. However, the ranges given are interchanged.
(In this LWFA simulation, I used a moving window.)
Currently, the zoom level is reset to unzoomed when the time step is changed. This can be inconvenient when one wants to see only a detail and study its change over time.
Keeping the zoom level, just as all the other settings are kept, would be very helpful.
Off-topic: Cool tool!!!!