exa-analytics / exa
The exa framework for data management, processing, and visualization
Home Page: https://exa-analytics.github.io/exa
License: Apache License 2.0
When calling the base container save method, categories must be converted to their "standard" type.
The odict doesn't seem to provide advantages over, for example, a simpler list of tuples that has the same data, i.e.:
{'find_key1': OrderedDict([(lineno1, 'value1'), (lineno2, 'value2')]),
'find_key2': ...}
versus
{'find_key': [(lineno1, 'value1'), (lineno2, 'value2')], 'find_key2': ...}
IMO changing to list of tuples simplifies parsing functions in packages such as exatomic, exadf, etc.
Consider:
ed = exa.Editor("""
Some text with appropriate {
braces like some JSON
}""").write('mynewfile.json')
Currently fails with a KeyError because the call to format() expects a dict with the keys being the contents in the brackets. A work-around is to do something simple like:
ed = exa.Editor("""
Some text with appropriate {
braces like some JSON
}""".replace('{', '{{').replace('}', '}}')).write('mynewfile.json')
This assumes there are no format kwargs to pass and renders the contents as plain text.
Ideas: perhaps a "raw" keyword argument to Editor.write to supersede calling format()
It is also possible that one wants to mix and match raw curly braces with formatting curly braces, in which case the "raw" curly braces would need to be escaped somehow.
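A minimal sketch of the "raw" idea, written independently of exa. The `render` helper and its `raw` keyword are hypothetical names for illustration, not part of the current Editor API. With `raw=True` the text is returned untouched; otherwise `str.format` is applied, so literal braces have to be doubled.

```python
def render(text, raw=False, **kwargs):
    """Return text as-is when raw is True, otherwise apply str.format.

    Hypothetical helper; `raw` mirrors the keyword proposed for Editor.write.
    """
    if raw:
        return text
    return text.format(**kwargs)


json_like = '{\n    "key": "value"\n}'

# raw=True skips format() entirely, so the braces survive untouched
plain = render(json_like, raw=True)

# Mixing: doubled braces are literal, single braces are format fields
mixed = render('{{\n    "key": "{value}"\n}}', value="filled")
```

Calling `render(json_like)` without `raw=True` reproduces the KeyError described above, because `format()` treats the brace contents as a field name.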
Occasionally I will bork a file path when calling exa.Editor('/path/to/my/file') but the constructor happily assumes you want an Editor object with a single line whose contents are the broken file path. If it is a string and "looks" like a file path but os.path.isfile is False, maybe we should print a warning that you probably actually wanted a file that didn't get found. It would explain all sorts of IndexError/AttributeError/etc. down the line for subclasses when specific parse_methods are called but cannot execute properly.
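One possible shape for such a check, as a sketch only. The function names and the "looks like a path" heuristic (a single line containing a path separator) are assumptions made here, not existing exa behavior.

```python
import os
import warnings


def looks_like_path(text):
    """Heuristic: a single line containing a path separator."""
    return "\n" not in text and (os.sep in text or "/" in text)


def check_editor_input(data):
    """Warn when data resembles a file path but no such file exists.

    Returns False when a warning was issued, True otherwise.
    """
    if isinstance(data, str) and looks_like_path(data) and not os.path.isfile(data):
        warnings.warn(
            "{!r} looks like a file path but no such file was found; "
            "treating it as literal text".format(data)
        )
        return False
    return True
```

A warning rather than an exception keeps the current behavior for anyone who really does want a one-line editor containing a slash.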
Or should we consider error catching using a different paradigm?
In exa.numerical
For example, when two-body properties (in atomic) are created, only update the traits associated with that dataframe. Possibly related to an event system for dataframes.
We should consider supporting MongoDB as the relational and numeric backend. Mongo stores data essentially as JSON documents on disk; this is very convenient for us because we already expect to be able to dump containers to JSON objects.
The suspicion is that the newly implemented cylinders are causing a dramatic FPS drop during animation. We need to confirm what is causing the slowdown and come up with a fix. We could consider a line shader that makes the line look cylindrical rather than using CylinderGeometry.
Is there a programmatic way to prevent compilation failure when numba is not present? Or do we make it a required dependency? The preference is for the former approach.
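One common pattern for the optional-dependency approach is a guarded import with a no-op fallback decorator. This is a sketch, not what exa currently does; the `total` function is only an illustration.

```python
try:
    from numba import jit  # compiled path when numba is installed
except ImportError:
    def jit(*args, **kwargs):
        """Fallback no-op decorator supporting both @jit and @jit(...)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def decorator(func):
            return func
        return decorator


@jit(nopython=True)
def total(values):
    # Plain loop: compiled by numba when available, pure Python otherwise
    acc = 0.0
    for v in values:
        acc += v
    return acc
```

The trade-off is that the fallback runs at pure Python speed, so a warning at import time may be worth adding.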
If the line
Container(name='test', description='created during install...')
is not included in exa.install.initialize() then a SQLAlchemy FlushError is raised if a user attempts to create an inherited container (e.g. Universe from atomic) before creating a generic Container. Is it because tables are created in two different packages (see exa.tools.initialize_database and/or atomic.tools.initialize_database)?
This is a useful tutorial on building bidirectional communication using websocket within a tornado app.
http://iot-projects.com/index.php?id=websocket-a-simple-example
The Editor class is intended to interact with files on disk, parse them, and create the container (exatomic's universe) object. Upon successful creation of the container, it would be nice to be able to generate new text files in various file formats as determined by Editor's subclasses.
One approach to this idea:
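    class Editor:
        @classmethod
        def from_container(cls, container):
            # Build the text for this file format, then wrap it in an editor
            parsed = cls.class_specific_string_builder(container)
            return cls(parsed)
        @classmethod
        def class_specific_string_builder(cls, container):
            raise NotImplementedError()
    class SubEditor(Editor):
        @classmethod
        def class_specific_string_builder(cls, container):
            # correct_string is a placeholder for the text built from container
            return correct_string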
Is your feature request related to a problem? Please describe.
Documentation (rst) files are currently added by hand. Let's automate their creation.
Describe the solution you'd like
sphinx-apidoc can accomplish this
Describe alternatives you've considered
Since we're currently using Sphinx for documentation generation, no other doc generation tools were considered.
Additional context
See exatomic#130 for a similar approach
Container concatenation needs an implementation
Describe the bug
numba.autojit has been deprecated in version 0.47.0. I suggest that this be fixed as the tests fail.
If a cardinal does not exist for a container, then iteration is ill-defined.
Currently editors don't support regex that spans line breaks;
editor.regex("my string.*\nAnd another string")
doesn't work. I think that we could add a kwarg like multiline=False
and provide multiline support via something like:
re.search(pattern, "\n".join(self))
inside the bound method regex. Alternatively, we could preparse the pattern for line breaks and handle it accordingly inside the bound method. I'm not sure which option is best, but either way, what line number is returned for a multiline match? Currently (for a single search pattern) the return value is a list of tuples (line number, line string), which doesn't map cleanly onto a multiline match.
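One way to answer the line-number question is to report the line on which each match begins. The sketch below is a standalone function, not the Editor method; the name `multiline_regex` and the zero-based numbering are assumptions. It joins the lines, records the character offset at which each line starts, and maps each match offset back to a line index.

```python
import re
from bisect import bisect_right


def multiline_regex(lines, pattern, flags=0):
    """Search across line breaks; return (first line number, matched text) pairs.

    Line numbers are zero-based indexes into `lines`.
    """
    text = "\n".join(lines)
    # Character offset at which each line starts in the joined text
    starts = []
    offset = 0
    for line in lines:
        starts.append(offset)
        offset += len(line) + 1  # +1 for the joining newline
    results = []
    for match in re.finditer(pattern, text, flags):
        lineno = bisect_right(starts, match.start()) - 1
        results.append((lineno, match.group()))
    return results


lines = ["header", "my string here", "And another string", "footer"]
found = multiline_regex(lines, r"my string.*\nAnd another string")
```

Here `found` holds a single tuple whose line number is 1 and whose text spans two lines, so the existing (line number, string) return shape is kept at the cost of the string containing a newline.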
self._set_categories() and self._revert_categories() do not always behave as expected.
Depending on the order of import, config-dependent code may or may not run as expected if the base package atexit is called prior to importing the subpackage. Points to exa-analytics/exatomic#68
reference pull request #40
If we use scoped sessions, only one import of exa is possible per compute (Config). If we don't, multiple import exa statements can be run (e.g. in multiple notebooks) but we run the risk of db issues (non-atomic commits).
Currently the exa.editor.Editor.regex method expects single line regex queries only. This is because editor lines are stored as a list of strings. Supporting multiline regex would require something like
'\n'.join(self._lines)
and then a re.search on the joined object. Also, the regex method currently only supports string arguments; no support for compiled re expressions (e.g. re.MULTILINE could be relevant).
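Supporting compiled expressions could be as small as a type check before searching. A sketch with a hypothetical standalone function (`search_joined` is not an exa name):

```python
import re


def search_joined(lines, pattern, flags=0):
    """Accept a str or a compiled pattern and search the joined text."""
    if isinstance(pattern, str):
        # flags only apply to string patterns; compiled ones carry their own
        pattern = re.compile(pattern, flags)
    return pattern.search("\n".join(lines))


with_flag = search_joined(["first", "second"], re.compile(r"^second$", re.MULTILINE))
without_flag = search_joined(["first", "second"], r"^second$")
```

With re.MULTILINE the anchors match at each line boundary of the joined text, so `with_flag` is a match object while `without_flag` is None.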
Tests should follow the design of (and be added to) exa.test.test_numerical.py
Use df_or_series.memory_usage() instead of adding numpy nbytes in exa.container.BaseContainer.info
A medium sized container (~2GB in memory) spikes the RAM usage above 16GB on save!!!
Reference commit #26 for a full list.
Wrote capturing, but there may be a more direct way. This caught the eye for capturing:
https://github.com/spite/ccapture.js/
The Editor should be redesigned to be more efficient, and a StructuredTextEditor (e.g. csv) can serve as an example implementation.
Rather than requiring the creation of persistent storage, if the directory .exa doesn't exist, simply create the database in memory (sqlite:///:memory:) and warn the user that CMS features will not be persistent. This will require retooling the exa.tools.finalize_install() function (or removing it - same for atomic) as well as changing the behavior of exa.config.Config.init to prevent creation of directories and files.
Rather than updating all traits whenever _traits_need_update, we could update only the traits that actually need updating. This could be accomplished by transforming the aforementioned variable into a list that tracks trait names (_tablename_traitname) that need to be updated (for example). Other more clever solutions may exist.
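A sketch of the in-memory fallback. The function name `database_url` and the file name `exa.sqlite` are hypothetical, and the path handling assumes a Unix-like absolute path; `sqlite:///:memory:` is the SQLAlchemy URL form for an in-memory SQLite database.

```python
import os
import warnings


def database_url(exa_dir):
    """Return a SQLAlchemy-style SQLite URL, falling back to in-memory storage."""
    if os.path.isdir(exa_dir):
        # exa.sqlite is a placeholder file name
        return "sqlite:///" + os.path.join(exa_dir, "exa.sqlite")
    warnings.warn(
        "{} does not exist; using an in-memory database, "
        "so CMS features will not be persistent".format(exa_dir)
    )
    return "sqlite:///:memory:"
```

Nothing is created on disk in either branch, which matches the goal of not forcing directory creation at import time.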
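The name-tracking idea could look roughly like the sketch below. The `TraitTracker` class and its methods are hypothetical; a set is used instead of a list so that marking the same trait twice is harmless.

```python
class TraitTracker:
    """Track which traits are stale instead of using one boolean flag."""

    def __init__(self):
        self._stale = set()

    def mark(self, tablename, traitname):
        # Names follow the _tablename_traitname convention
        self._stale.add("_{}_{}".format(tablename, traitname))

    def update(self, updaters):
        """Call only the updater functions whose trait names are stale."""
        updated = []
        for name in sorted(self._stale):
            if name in updaters:
                updaters[name]()
                updated.append(name)
        self._stale.difference_update(updated)
        return updated
```

Here `updaters` is assumed to be a dict mapping trait names to zero-argument callables; traits that were never marked are left alone.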
Describe the bug
When import exa is uncommented in the included code, I can no longer change the font size with the font variable in matplotlib. When I comment out import exa, I have full control of the font size as expected. Is there some global variable in exa that causes this?
To Reproduce
With the code below, when import exa is commented out, you can change the font size via the font variable and the plot changes as expected. I used a font size of 1 to make the difference extreme. With import exa uncommented, as in the original example, the font size cannot be changed at all.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import exa
# This is the variable that has to be changed to see the issue
# I think 9 or 10 is the default in matplotlib
font = {'size': 1}
matplotlib.rc('font', **font)
# This code was taken from a contour tutorial for matplotlib
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)
# Create a simple contour plot with labels using default colors. The
# inline argument to clabel will control whether the labels are drawn
# over the line segments of the contour, removing the lines beneath
# the label
plt.figure()
CS = plt.contour(X, Y, Z)
plt.clabel(CS, inline=1, fontsize=10)
plt.title('Simplest default with labels')
plt.show()
Expected behavior
Control of the fontsize.
Desktop (please complete the following information):
Ubuntu 18.04
Python 3.7.1
Matplotlib 3.0.2 (from conda)
See numerical.py; this has consequences on load (container.from_hdf) as well. We should consider changing the dfclasses attribute of the container to allow for wildcards (i.e. arbitrary dataframe names).
Experiencing the following bug with a minimal working example:
t = exa.Editor('')
t.meta['name'] = 'main'
t.meta
returns
{'name': 'main'}
as expected. Then creating a new instance
b = exa.Editor('')
b.meta['name'] = 'face'
b.meta
returns
{'name': 'face'}
also as expected. But then:
t.meta
returns
{'name': 'face'}
as though meta is a class-level attribute of the Editor, not an instance attribute.
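The symptom matches a mutable object defined once and shared by all instances. The sketch below reproduces it with two toy classes (hypothetical names, not the exa source); a mutable default argument such as `meta={}` in `__init__` would produce the same behavior, and which of the two applies to Editor is not confirmed here.

```python
class SharedMeta:
    meta = {}  # one dict object shared by every instance


class OwnMeta:
    def __init__(self, meta=None):
        # a fresh dict per instance unless one is passed in
        self.meta = {} if meta is None else meta


t, b = SharedMeta(), SharedMeta()
t.meta["name"] = "main"
b.meta["name"] = "face"
shared_result = t.meta  # both names point at the same dict

t2, b2 = OwnMeta(), OwnMeta()
t2.meta["name"] = "main"
b2.meta["name"] = "face"
own_result = t2.meta  # unaffected by b2
```

Creating the dict inside `__init__` is the usual fix in either case.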
This involves some changes in the way custom widgets behave http://blog.jupyter.org/2016/04/22/ipywidgets5/
Is this a color issue rather than a material issue?
DirectX has an incompatibility with WebGL lines (THREE.Line/LineSegments/LineBasicMaterial). We need to use care with these; check what is (currently) incompatible and fix it. It is likely just an option in the line's material.
This module provides enhanced search functionality for Session, Program, Project, Job, Container, and File but is not yet implemented.
In exa.container's docstring it would be nice to have a fuller description of the concept of a container and how it facilitates interaction with a collection of (related or not) data objects.
Is your feature request related to a problem? Please describe.
It is difficult to compile documentation locally when developing and testing.
Describe the solution you'd like
It would be nice to have a script for compiling documentation locally.
Describe alternatives you've considered
The command line: the pro is flexibility; the con is having to remember commands and options.
This method will be used to properly attach a dataframe including traits (and the updating of traits).
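Such a script could be a thin wrapper around the two Sphinx commands, as in the sketch below. The directory names (`exa`, `docs/source`, `docs/build/html`) and the function names are assumptions about the layout, not taken from the repository.

```python
import subprocess
import sys


def doc_commands(package_dir="exa", source_dir="docs/source",
                 build_dir="docs/build/html"):
    """Return the two commands: regenerate API rst stubs, then build HTML."""
    return [
        # -f overwrites existing stubs, -o sets the output directory
        ["sphinx-apidoc", "-f", "-o", source_dir, package_dir],
        [sys.executable, "-m", "sphinx", "-b", "html", source_dir, build_dir],
    ]


def build_docs(**kwargs):
    """Run each command, stopping at the first failure."""
    for command in doc_commands(**kwargs):
        subprocess.run(command, check=True)
```

Calling `build_docs()` from the repository root would then replace the commands that otherwise have to be remembered.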
Currently the Field class (on the python side) has the following columns:
nx, ny, nz, ox, oy, oz, xi, xj, xk, yi, yj, yk, zi, zj, zk
but the Field class (on the JS side) uses the following attributes (and assumes a cubic cube file):
nx, ny, nz, xmin, ymin, zmin, dx, dy, dz, (xmax, ymax, zmax, x[:], y[:], z[:], xvalues, yvalues, zvalues)
where the ones in parentheses need not be shipped back to python. If we have xmin, dx and nx we have x[:]/xvalues defined for us. The attributes on the JS side can be streamlined to reflect more closely the attributes on the python side, regardless of the internal implementation of x/y/zvalues on the JS side.
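To illustrate why xmin, dx and nx are enough, the sketch below rebuilds the axis values from them. The helper name `axis_values` is hypothetical; for an axis-aligned grid one would presumably pass ox as the minimum and xi as the step, which is an assumption about how the Python-side columns map onto the JS-side attributes.

```python
def axis_values(vmin, dv, n):
    """Return the n grid coordinates vmin, vmin + dv, ..., vmin + (n - 1) * dv."""
    return [vmin + i * dv for i in range(n)]


xvalues = axis_values(-1.0, 0.5, 5)
xmax = xvalues[-1]
```

Since xmax and the value arrays follow from the other three numbers, they do not need to be stored or shipped in either direction.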
Describe the bug
Networkx is missing from the requirements.txt despite being used by the package's Container class.
Expected behavior
We expect all dependencies to be included in the requirements.txt file.
The repository size is fairly large now and takes uncomfortably long to clone, for example. In this post: https://stackoverflow.com/questions/2116778/reduce-git-repository-size there are some hints on things that could be done to reduce the repo size. We should explore what can be done here.
Remove all of the boilerplate metaclass=DimensionMeta i.e. simplify the inheritance of the dimension tables.
Need to implement update logic in the case where a new isotope or unit conversion is added (maybe related to bulk update in relational?)
An instance of a subclass of the Editor class for certain files yields the following stack trace:
/home/tjd/Programs/analytics-exa/alex/exatomic/exatomic/editor.py in __init__(self, sgtfo_func, cgtfo_func, *args, **kwargs)
191
192 def __init__(self, *args, sgtfo_func=None, cgtfo_func=None, **kwargs):
--> 193 super().__init__(*args, **kwargs)
194 self.meta = {'program': None}
195 self._atom = None
/home/tjd/Programs/analytics-exa/alex/exa/exa/editor.py in __init__(self, data, filename, meta, as_interned, **kwargs)
333 self._lines = data
334 elif ispath:
--> 335 self._lines = lines_from_file(data, as_interned)
336 self.filename = os.path.basename(data)
337 elif isinstance(data, StringIO):
/home/tjd/Programs/analytics-exa/alex/exa/exa/editor.py in lines_from_file(path, as_interned)
388 lines = [sys.intern(line) for line in f.read().splitlines()]
389 else:
--> 390 lines = f.read().splitlines()
391 return lines
392
/home/tjd/miniconda3/lib/python3.5/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 5720: invalid continuation byte
Can treat the symptoms with a try/except.
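A sketch of that try/except, trying UTF-8 first and then a fallback encoding. The function name `read_lines` and the choice of latin-1 as the fallback are assumptions; the file's real encoding is not known from the trace.

```python
def read_lines(path, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return the file's lines."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return f.read().splitlines()
        except UnicodeDecodeError as error:
            last_error = error  # remember the failure and try the next one
    raise last_error
```

latin-1 maps every byte to a character, so it never raises, but it may silently produce the wrong characters if the file is in some other encoding.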
Should be fixed in exa/setup.py
This will simplify passing default options from the python side to the javascript side. Note that this API should be compatible with a standalone webapp, so it may be advantageous for the Container to have this API. In principle, the GUI for a type of data container is systematic regardless of whether it is being viewed in the notebook or in the standalone webapp.