exa-analytics / exa

The exa framework for data management, processing, and visualization

Home Page: https://exa-analytics.github.io/exa

License: Apache License 2.0

Python 99.67% Batchfile 0.16% Shell 0.17%

data-science

exa's People

Contributors

tjduigna vmarchen gitter-badger adamphil herbertludowieg farnoushnouri rdmontgomery rulingf codacy-badger

exa's Issues

container.save doesn't revert categories

When calling the base container save method, categories must be converted to their "standard" type.

Should exa.editor.Editor.find return OrderedDict values?

The odict doesn't seem to provide advantages over, for example, a simpler list of tuples that has the same data, i.e.:

{'find_key1': OrderedDict([(lineno1, 'value1'), (lineno2, 'value2')]),
 'find_key2': ...}

versus

{'find_key1': [(lineno1, 'value1'), (lineno2, 'value2')], 'find_key2': ...}

IMO changing to list of tuples simplifies parsing functions in packages such as exatomic, exadf, etc.

Editor.write with curly braces

Consider:

ed = exa.Editor("""
Some text with appropriate {
    braces like some JSON
}""").write('mynewfile.json')

Currently fails with a KeyError because the call to format() expects a dict with the keys being the contents in the brackets. A work-around is to do something simple like:

ed = exa.Editor("""
Some text with appropriate {
    braces like some JSON
}""".replace('{', '{{').replace('}', '}}')).write('mynewfile.json')

This assumes no format kwargs are passed and renders the content as plain text.

Ideas: perhaps a "raw" keyword argument to Editor.write to supersede calling format()

It is also possible that one wants to mix and match raw curly braces with formatting curly braces, in which case the "raw" curly braces would need to be escaped somehow.
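A minimal sketch of the "raw" idea, assuming a standalone helper named render (hypothetical, not part of exa's API) that stands in for the formatting step inside Editor.write. It also shows the mixed case, where literal braces are escaped by doubling them, which is the standard str.format convention:

```python
def render(text, raw=False, **kwargs):
    """Return text unchanged when raw is True, otherwise apply str.format.

    Hypothetical helper for illustration; not part of exa's API.
    """
    if raw:
        # Skip formatting entirely so literal braces survive untouched.
        return text
    return text.format(**kwargs)


json_like = '{\n    "key": "value"\n}'
plain = render(json_like, raw=True)

# Mixed case: doubled braces are literal, single braces are format fields.
mixed = render('{{"name": "{name}"}}', name="exa")
```

With raw=True the JSON-like text passes through as-is; in the mixed case only {name} is substituted.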

Exa editor should throw a warning if string passed looks like a file path but is not file

Occasionally I will bork a file path when calling exa.Editor('/path/to/my/file') but the constructor happily assumes you want an Editor object with a single line whose contents are the broken file path. If it is a string and "looks" like a file path but os.path.isfile is False, maybe we should print a warning that you probably actually wanted a file that didn't get found. It would explain all sorts of IndexError/AttributeError/etc. down the line for subclasses when specific parse_methods are called but cannot execute properly.

Or should we consider error catching using a different paradigm?
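One way the warning could look, as a rough sketch. The function name warn_if_missing_path and the heuristic (a single-line string containing a path separator) are assumptions for illustration, not existing exa behavior:

```python
import os
import warnings


def warn_if_missing_path(data):
    """Warn when a single-line string resembles a path but no file exists.

    Returns True if a warning was issued, False otherwise.
    """
    is_single_line = isinstance(data, str) and "\n" not in data
    # Heuristic (an assumption): a path separator suggests a path was intended.
    looks_like_path = is_single_line and (os.sep in data or "/" in data)
    if looks_like_path and not os.path.isfile(data):
        warnings.warn(
            "'{}' looks like a file path but no such file exists".format(data)
        )
        return True
    return False
```

A constructor could call this before falling back to treating the string as editor content, so the warning shows up before any downstream IndexError or AttributeError.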

Complete implementation for sparse traits

In exa.numerical

Create a conda package

http://conda.pydata.org/docs/build_tutorials/pkgs2.html

Only update traits as needed

For example, when two-body properties (in atomic) are created, only update the traits associated with that dataframe. Possibly related to an event system for dataframes.

DB backend support

We should consider supporting MongoDB as the relational and numeric backend. Mongo stores stuff essentially as json docs on disk; this is very convenient for us because we already expect to be able to dump containers to json objects.

Extend builds to multiple os

https://docs.travis-ci.com/user/multi-os/

app3d cylinder performance

Suspicion is that the newly implemented cylinders are causing a dramatic FPS drop on animation. Need to figure out what is causing the slowdown for sure and come up with a fix. Could consider a line shader to make the line look cylindrical rather than using CylinderGeometry.

Numba dependent docs are currently set to automatically be included.

Is there a programmatic way to prevent compilation failure when numba is not present? Or do we make numba a required dependency? The former approach is preferred.
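A common pattern for the optional-dependency approach is an import guard with a no-op stand-in decorator. This is only a sketch; the fallback jit below is an assumption for illustration, and the function total is a made-up example:

```python
try:
    from numba import jit  # optional accelerator
except ImportError:
    def jit(*args, **kwargs):
        """Fallback no-op decorator used when numba is not installed."""
        # Support both bare @jit and parameterized @jit(nopython=True) usage.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def decorator(func):
            return func

        return decorator


@jit(nopython=True)
def total(n):
    """Sum the integers 0..n-1; compiled if numba is available."""
    s = 0
    for i in range(n):
        s += i
    return s
```

Modules (and the docs build that imports them) then load whether or not numba is present; only the speed differs.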

FlushError

If the line

Container(name='test', description='created during install...')

is not included in exa.install.initialize(), then a sqlalchemy FlushError is thrown if a user attempts to create an inherited container (e.g. Universe from atomic) before creating a generic Container. Is it because tables are created in two different packages (see exa.tools.initialize_database and/or atomic.tools.initialize_database)?

Web App Frontend

This is a useful tutorial on building bidirectional communication using websocket within a tornado app.

http://iot-projects.com/index.php?id=websocket-a-simple-example

Adding the reverse conceptual direction of the editor class

The Editor class is intended to interact with files on disk, parse them, and create the container (exatomic's universe) object. Upon successful creation of the container, it would be nice to be able to generate new text files in various file formats as determined by Editor's subclasses.

One approach to this idea:

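class Editor:
    @classmethod
    def from_container(cls, container):
        # Build the file text from the container, then wrap it in an editor.
        parsed = cls.class_specific_string_builder(container)
        return cls(parsed)

    @classmethod
    def class_specific_string_builder(cls, container):
        # Subclasses implement the format-specific text generation.
        raise NotImplementedError()

class SubEditor(Editor):
    @classmethod
    def class_specific_string_builder(cls, container):
        # correct_string is a placeholder for the generated file text.
        return correct_string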

Use sphinx-apidoc to generate documentation

Is your feature request related to a problem? Please describe.
Documentation (rst) files are currently added by hand. Let's automate their creation.

Describe the solution you'd like
sphinx-apidoc can accomplish this

Describe alternatives you've considered
Since we're using Sphinx for documentation generation currently no other doc generation tools were considered.

Additional context
See exatomic#130 for a similar approach

exa.container.concat not implemented

Container concatenation needs an implementation

Tests are failing

Describe the bug
numba.autojit has been deprecated in version 0.47.0. I suggest that this be fixed as the tests fail.

iter for container should iterate over cardinal if exists

If a cardinal does not exist for a container, then iteration is ill-defined.

Support for multiline regex

Currently editors don't support regex that spans line breaks;

editor.regex("my string.*\nAnd another string")

doesn't work. I think that we could add a kwarg like multiline=False and provide multiline support via something like:

re.search(pattern, "\n".join(self))

inside the bound method regex. Alternatively, we could preparse the pattern for line breaks and handle it accordingly inside the bound method. Not really sure what the best option is, but either way, what line number is returned for a multiline regex in the return values? Currently (for a single search pattern) the return is a list of tuples (line number, line string). That doesn't exactly work for a multiline match.
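One possible answer to the line-number question is to return the first and last line of each match, recovered by counting newlines in the joined text. The sketch below is a standalone function (multiline_regex is a hypothetical name, not the existing bound method) using 0-based line numbers:

```python
import re


def multiline_regex(lines, pattern):
    """Search the joined lines; return (first_line, last_line, text) tuples."""
    text = "\n".join(lines)
    results = []
    for match in re.finditer(pattern, text):
        # Newlines before the match start give the 0-based first line.
        first = text.count("\n", 0, match.start())
        # Use the last matched character so a trailing newline in the match
        # is attributed to the line it terminates.
        last_char = max(match.start(), match.end() - 1)
        last = text.count("\n", 0, last_char)
        results.append((first, last, match.group()))
    return results


lines = ["header", "my string here", "And another string", "footer"]
hits = multiline_regex(lines, r"my string.*\nAnd another string")
```

For a single-line match the first and last line numbers coincide, so the existing (line number, line string) shape could be kept as a special case.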

Automatic category handling

self._set_categories() and self._revert_categories() do not always behave as expected.

Config incompatible with subpackages

Depending on the order of import, config-dependent code may or may not run as expected if the base package atexit is called prior to importing the subpackage. Points to exa-analytics/exatomic#68

Should we be using scoped sessions?

reference pull request #40

If we use scoped sessions, only one import of exa is possible per compute (Config). If we don't, multiple import exa statements can be run (e.g. in multiple notebooks), but we run the risk of db issues (non-atomic commits).

Editor doesn't support multiline regex

Currently the exa.editor.Editor.regex method expects single line regex queries only. This is because editor lines are stored as a list of strings. Supporting multiline regex would require something like

'\n'.join(self._lines)

and then a re.search on the joined object. Also, the regex method currently only supports string arguments; no support for compiled re expressions (e.g. re.MULTILINE could be relevant).
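Accepting compiled expressions could be as simple as compiling only when a string is given. A sketch under that assumption (search_joined is a hypothetical helper, not the current Editor.regex):

```python
import re


def search_joined(lines, pattern, flags=0):
    """Find all matches in the joined lines for a str or compiled pattern."""
    # Compile only plain strings; compiled patterns carry their own flags.
    regex = re.compile(pattern, flags) if isinstance(pattern, str) else pattern
    return regex.findall("\n".join(lines))


lines = ["alpha 1", "beta 2", "alpha 3"]
# re.MULTILINE makes ^ and $ match at every line boundary of the joined text.
starts = search_joined(lines, re.compile(r"^alpha \d$", re.MULTILINE))
```

Here the compiled pattern matches both "alpha" lines even though the search runs over one joined string.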

Add tests for exa.numerical.Field, SparseSeries, and SparseDataFrame

Tests should follow the design of (and be added to) exa.test.test_numerical.py

container.info() raises error when getting memory usage of Series

Use df_or_series.memory_usage() instead of adding numpy nbytes in exa.container.BaseContainer.info

Memory usage spiking when saving containers

A medium sized container (~2GB in memory) spikes the RAM usage above 16GB on save!!!

Container dunder methods

Reference commit #26 for a full list.

Need a method for capturing three.js animations and converting to video

Wrote capturing, but there may be a more direct way? This caught the eye for capturing:
https://github.com/spite/ccapture.js/

Epic 1: Editor re-write

The Editor should be redesigned to be more efficient, and a StructuredTextEditor (e.g. csv) can serve as an example implementation.

Exa should not force creation of the .exa directory

Rather than requiring the creation of persistent storage, if the directory .exa doesn't exist, simply create the database in memory (sqlite:///:memory:) and warn the user that CMS features will not be persistent. This will require retooling the exa.tools.finalize_install() function (or removing it - same for atomic) as well as changing the behavior of exa.config.Config.init to prevent creation of directories and files.
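A sketch of the fallback logic, assuming a helper that picks the SQLAlchemy-style connection URL. The function name database_url and the file name exa.sqlite are made up for illustration:

```python
import os
import warnings


def database_url(root):
    """Return an on-disk SQLite URL if root exists, else an in-memory URL."""
    if os.path.isdir(root):
        # File name is hypothetical; any path inside root would do.
        return "sqlite:///" + os.path.join(root, "exa.sqlite")
    warnings.warn(
        "{} not found; using an in-memory database, "
        "CMS features will not be persistent".format(root)
    )
    return "sqlite:///:memory:"
```

Nothing is created on disk in the fallback branch, which is the behavior change the issue asks for.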

Update traits only as needed

Rather than updating all traits whenever _traits_need_update is set, we could update only the traits that actually need updating. This could be accomplished by transforming the aforementioned variable into a list that tracks the trait names (_tablename_traitname) that need to be updated (for example). Other, more clever solutions may exist.
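To make the idea concrete, here is a rough sketch that replaces the boolean flag with a set of stale trait names. TraitTracker and its methods are hypothetical and only illustrate the bookkeeping, not how exa builds traits:

```python
class TraitTracker:
    """Track which traits are stale instead of using one boolean flag."""

    def __init__(self):
        self._stale = set()

    def mark(self, table, trait):
        # Name format follows the _tablename_traitname idea from the issue.
        self._stale.add("_{}_{}".format(table, trait))

    def update(self, builders):
        """Rebuild only stale traits; builders maps trait name -> callable."""
        updated = {}
        for name in sorted(self._stale):
            updated[name] = builders[name]()
        self._stale.clear()
        return updated


tracker = TraitTracker()
tracker.mark("atom", "x")
result = tracker.update({
    "_atom_x": lambda: [0.0, 1.0],
    "_atom_y": lambda: [2.0],  # not marked, so never rebuilt
})
```

A set avoids duplicate entries when the same trait is marked more than once between updates.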

Fontsizes in matplotlib are changed

Describe the bug
When import exa is uncommented in the included code I can no longer change the fontsize with the font variable in matplotlib. When I comment out import exa I have full control of the fontsize as expected. Is there some global variable in exa that causes this?

To Reproduce
With the code below, when import exa is commented out you can change the fontsize on the font variable and it will change the fontsize as expected. I put a fontsize of 1 just to make an extreme difference. But, with it uncommented as in the original example you cannot change it at all.

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import exa

# This is the variable that has to be changed to see the issue
# I think 9 or 10 is the default in matplotlib
font = {'size': 1}
matplotlib.rc('font', **font)

# This code was taken from a contour tutorial for matplotlib
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
# difference of Gaussians
Z = 10.0 * (Z2 - Z1)

# Create a simple contour plot with labels using default colors.  The
# inline argument to clabel will control whether the labels are drawn
# over the line segments of the contour, removing the lines beneath
# the label
plt.figure()
CS = plt.contour(X, Y, Z)
plt.clabel(CS, inline=1, fontsize=10)
plt.title('Simplest default with labels')

plt.show()

Expected behavior
Control of the fontsize.

Desktop (please complete the following information):
Ubuntu 18.04
Python 3.7.1
Matplotlib 3.0.2 (from conda)

SparseDataFrame objects are not "sparsified" on init

See numerical.py; this has consequences on load (container.from_hdf) as well. Should consider changing the dfclasses attribute of the container to allow for wildcards (i.e. arbitrary dataframe names).

exa.Editor meta kwarg behaves as though it is class level

Experiencing the following bug with a minimal working example:

t = exa.Editor('')
t.meta['name'] = 'main'
t.meta

returns

{'name': 'main'}

as expected. Then creating a new instance

b = exa.Editor('')
b.meta['name'] = 'face'
b.meta

returns

{'name': 'face'}

also as expected. But then:

t.meta

returns

{'name': 'face'}

as though meta is a class-level attribute of the Editor, not an instance attribute.
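One plausible cause (an assumption, not confirmed against the exa source) is a mutable default argument or class-level dict shared across instances. The toy classes below reproduce the symptom and show the usual fix of building a fresh dict per instance:

```python
class SharedMeta:
    def __init__(self, meta={}):  # the default dict is created once and shared
        self.meta = meta


class OwnMeta:
    def __init__(self, meta=None):
        # Build a fresh dict per instance so edits do not leak across objects.
        self.meta = {} if meta is None else dict(meta)


t, b = SharedMeta(), SharedMeta()
t.meta['name'] = 'main'
b.meta['name'] = 'face'
shared_result = t.meta['name']  # 'face': both instances hold the same dict

t2, b2 = OwnMeta(), OwnMeta()
t2.meta['name'] = 'main'
b2.meta['name'] = 'face'
own_result = t2.meta['name']  # 'main': each instance has its own dict
```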

Prep for a switch to ipywidgets 5.0

This involves some changes in the way custom widgets behave http://blog.jupyter.org/2016/04/22/ipywidgets5/

app3d red and neon green spheres don't look shaded when using spheres?

Is this a color issue rather than a material issue?

app3d wireframe doesn't display on windows os

DirectX has an incompatibility with WebGL lines (THREE.Line/LineSegments/LineBasicMaterial). Need to use care with these; check what is (currently) incompatible and fix. Likely just an option in the line's material.

Implement exa.relational.keywords

This module provides enhanced search functionality for Session, Program, Project, Job, Container, and File but is not yet implemented.

Improve the description of container objects

In exa.container's docstring it would be nice to have a fuller description of the concept of a container and how it facilitates interaction with a collection of (related or not) data objects.

Update how documents are built

Is your feature request related to a problem? Please describe.
It is difficult to compile documentation locally when developing and testing.

Describe the solution you'd like
It would be nice to have a script for compiling documentation locally.

Describe alternatives you've considered
The command line: the pro is flexibility; the con is having to remember commands and options.

Create a container.add_numerical method

This method will be used to properly attach a dataframe, including traits (and updating of traits).

Clean up field generation on JS side

Currently the Field class (on the python side) has the following columns:

nx, ny, nz, ox, oy, oz, xi, xj, xk, yi, yj, yk, zi, zj, zk

but the Field class (on the JS side) uses the following attributes (and assumes a cubic cube file):

nx, ny, nz, xmin, ymin, zmin, dx, dy, dz, (xmax, ymax, zmax, x[:], y[:], z[:], xvalues, yvalues, zvalues)

where the ones in parentheses need not be shipped back to python. If we have xmin, dx and nx we have x[:]/xvalues defined for us. The attributes on the JS side can be streamlined to reflect more closely the attributes on the python side, regardless of the internal implementation of x/y/zvalues on the JS side.
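As a small illustration of why the parenthesized attributes are redundant, the axis values and the maximum follow directly from the minimum, spacing, and count. The helper name axis_values is made up for this sketch:

```python
def axis_values(vmin, dv, n):
    """Return the n grid coordinates starting at vmin with spacing dv."""
    return [vmin + i * dv for i in range(n)]


# xmin, dx, and nx fully determine x[:] and xmax.
x = axis_values(-1.0, 0.5, 5)
xmax = x[-1]
```

The same function applies to y and z, so only nine numbers (three per axis) need to cross the python/JS boundary for a regular grid.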

Networkx missing from requirements.txt

Describe the bug
Networkx is missing from the requirements.txt despite being used by the package's Container class.

Expected behavior
We expect all dependencies to be included in the requirements.txt file.

Repository size

The repository size is fairly large now and takes uncomfortably long to clone, for example. In this post: https://stackoverflow.com/questions/2116778/reduce-git-repository-size there are some hints on things that could be done to reduce the repo size. We should explore what can be done here.

cleanup relational.py

Remove all of the boilerplate metaclass=DimensionMeta i.e. simplify the inheritance of the dimension tables.

Should we use an alias table for units, constants, etc?

exa.tools.initialize_database force = True not implemented

Need to implement update logic in the case where a new isotope or unit conversion is added (maybe related to bulk update in relational?)

UnicodeDecodeError on some files -- Editor doesn't catch alternative encodings

An instance of a subclass of the Editor class for certain files yields the following stack trace:

/home/tjd/Programs/analytics-exa/alex/exatomic/exatomic/editor.py in __init__(self, sgtfo_func, cgtfo_func, *args, **kwargs)
    191 
    192     def __init__(self, *args, sgtfo_func=None, cgtfo_func=None, **kwargs):
--> 193         super().__init__(*args, **kwargs)
    194         self.meta = {'program': None}
    195         self._atom = None

/home/tjd/Programs/analytics-exa/alex/exa/exa/editor.py in __init__(self, data, filename, meta, as_interned, **kwargs)
    333             self._lines = data
    334         elif ispath:
--> 335             self._lines = lines_from_file(data, as_interned)
    336             self.filename = os.path.basename(data)
    337         elif isinstance(data, StringIO):

/home/tjd/Programs/analytics-exa/alex/exa/exa/editor.py in lines_from_file(path, as_interned)
    388             lines = [sys.intern(line) for line in f.read().splitlines()]
    389         else:
--> 390             lines = f.read().splitlines()
    391     return lines
    392 

/home/tjd/miniconda3/lib/python3.5/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 5720: invalid continuation byte

Can treat the symptoms with a try/except.
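The try/except could look something like the sketch below, which tries a list of encodings in order. The function name read_lines and the choice of latin-1 as the fallback are assumptions; latin-1 decodes any byte sequence, so it never raises but may not reflect the file's true encoding:

```python
def read_lines(path, encodings=("utf-8", "latin-1")):
    """Read a text file into a list of lines, trying each encoding in turn."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read().splitlines()
        except UnicodeDecodeError as err:
            # Remember the failure and fall through to the next encoding.
            last_error = err
    raise last_error
```

Exposing the encoding list as a parameter would also let a caller pass the correct encoding explicitly when it is known.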

exa installer doesn't install nbextensions/libs/ directory

Should be fixed in exa/setup.py

Widget or Container should provide a pythonic api for GUI controls

This will simplify passing default options from the python side to the javascript side. Note that this API should be compatible with a standalone webapp, so it may be advantageous for the Container to have this API - in principle the GUI for a type of data container is systematic regardless of whether it is being viewed in the notebook or in the standalone webapp.
