nsls-ii / amostra

amostra is a collection of light-weight sample management classes.

Home Page: https://nsls-ii.github.io/amostra

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

amostra's Introduction

Amostra

Sample management backed by MongoDB

Free software: 3-clause BSD license
Documentation: (COMING SOON!) https://danielballan.github.io/amostra.

Features

TODO

amostra's People

Contributors

cj-wright chiahaoliu licode danielballan mrakitin ryde ke-zhang-rd junaishima

amostra's Issues

Bulk Insert

Allow bulk insert of containers, samples and requests.
The first two will be heavily used when importing data from files.

Allow updating the "time" and "name" fields

Since we are allowed to update the entries for Sample, Container and others, it would be useful to also be allowed to update the time to reflect the last modified time.

We should also be allowed to update the name field: references must be made to the "uid", not the name, so there is no problem in updating it.

Attn. @arkilic
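Below is a minimal sketch of the update rule described above. It assumes documents are plain dicts. Only `uid` is frozen, `name` may change, and `time` is always reset to the last-modified time. `apply_update` and `IMMUTABLE_FIELDS` are hypothetical names, not amostra's API. The sketch builds the error message from the same tuple that is checked, which also avoids the kind of message/check mismatch reported in the next issue.

```python
import time

# Hypothetical policy: references use "uid", so only "uid" is frozen.
IMMUTABLE_FIELDS = ("uid",)


def apply_update(doc, changes, now=None):
    """Return a copy of doc with changes applied and "time" set to the
    last-modified time (now, or the current time if now is None)."""
    forbidden = sorted(set(changes) & set(IMMUTABLE_FIELDS))
    if forbidden:
        # The message is built from the same tuple that is checked,
        # so the error text cannot drift out of sync with the check.
        raise ValueError(", ".join(forbidden) + " cannot be updated")
    updated = {**doc, **changes}
    # "time" is owned by the server and always reflects the last change.
    updated["time"] = time.time() if now is None else now
    return updated


sample = {"uid": "a1b2", "name": "Fe2O3 pellt", "time": 0.0}
fixed = apply_update(sample, {"name": "Fe2O3 pellet"}, now=10.0)
```

Returning a new dict instead of mutating in place keeps the previous version available, which is useful if revisions are stored.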

Incorrect error message at Containers update.

See: https://github.com/NSLS-II/amostra/blob/master/amostra/server/engine.py#L367

It should be:

    raise compose_err_msg(500, 'Time, uid and name cannot be updated')

Or the name parameter must be removed from the verification.

Att. @arkilic

improve connection checking when connecting to MongoDB server

Two references:

https://pymongo.readthedocs.io/en/stable/api/pymongo/mongo_client.html
https://github.com/mongodb/mongo-python-driver/blob/c8d920a46bfb7b054326b3e983943bfc794cb676/pymongo/mongo_client.py#L157

ContainerReference must raise HTTPError on duplicate uid

A new template.json file may be needed as a template to generate random fake sample data.

@danielballan

We want to generate random fake sample data that is human readable, not random Unicode code points. In hypothesis, this can be achieved with the alphabet argument of strategies.text, doc here.

We may want to do the same thing when calling from_schema. Unfortunately (or fortunately), hypothesis-jsonschema doesn't let us explicitly specify a string alphabet. Alternatively, we could set up a pattern in the jsonschema. The current sample.json file's role is validation, so a new file might be useful to separate the template and validation roles.
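Below is a minimal sketch of both options with hypothesis. The first restricts `strategies.text` to letters, digits and a few separators. The second uses `st.from_regex(..., fullmatch=True)`, which is the closest hypothesis analogue of a jsonschema `pattern`. The specific alphabet, regex and length limits are illustrative assumptions.

```python
import string

from hypothesis import given, settings, strategies as st

# Illustrative alphabet for human-readable sample names.
READABLE = string.ascii_letters + string.digits + " _-"

# Option 1: restrict the characters drawn by strategies.text.
readable_names = st.text(alphabet=READABLE, min_size=1, max_size=30)

# Option 2: mirror a jsonschema "pattern" with a regex strategy.
pattern_names = st.from_regex(r"[A-Za-z][A-Za-z0-9_-]{0,29}", fullmatch=True)


@settings(max_examples=50)
@given(readable_names, pattern_names)
def check_names_are_readable(name, pattern_name):
    # Every generated name stays within the readable alphabet and length.
    assert 1 <= len(name) <= 30 and set(name) <= set(READABLE)
    assert pattern_name[0].isalpha() and len(pattern_name) <= 30
```

With the pattern option, the same regex could live in a separate template schema and be used for generation, while sample.json keeps its validation role.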

update packaging to pyproject.toml

linting
(testing from setup.py)
(version - remove versioneer)

(edited to add features)

Separate client code from server

Code Walkthrough

@arkilic and I are going over Amostra locally and here are the bugs/improvements needed:

- Allow the sample name to be updated. Sometimes a typo happens and we need to fix it.
- Create a log file for the errors on the server side.
- Get rid of the local sample name lookup, as a result of the first change.

To be continued...
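For the server-side error log, Python's standard `logging` module is sufficient. Below is a minimal sketch. The log path and the logger name (`amostra.server`) are illustrative assumptions, and `make_error_logger` is a hypothetical helper, not part of amostra.

```python
import logging
import os


def make_error_logger(path, name="amostra.server"):
    """Return a logger that appends ERROR-and-above records to the file at path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    target = os.path.abspath(path)
    # getLogger returns the same object for a given name, so skip adding a
    # second handler for the same file if this is called more than once.
    if not any(getattr(h, "baseFilename", None) == target for h in logger.handlers):
        handler = logging.FileHandler(target)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Inside an `except` block in a request handler, `logger.exception("...")` records the message together with the active traceback.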

get_schema is not working

sample_ref.get_schema()
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-12-bc74dccf93d4> in <module>()
----> 1 sample_ref.get_schema()

/home/slepicka/git/amostra/amostra/client/commands.py in get_schema(self)
    152         r = requests.get(self._server_path +
    153                         '/schema_ref', params=ujson.dumps('sample'))
--> 154         r.raise_for_status()
    155         return ujson.loads(r.text)
    156 

/home/slepicka/mc/envs/collection_dev/lib/python3.5/site-packages/requests/models.py in raise_for_status(self)
    838 
    839         if http_error_msg:
--> 840             raise HTTPError(http_error_msg, response=self)
    841 
    842     def close(self):

HTTPError: 404 Client Error: Not Found for url: http://localhost:7770//schema_ref?%22sample%22

Also get_schema is not present on the other References (Container & Requests).
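The failing URL above shows two construction problems. First, the server path ends in `/`, which produces `//schema_ref`. Second, passing `ujson.dumps('sample')` as `params` sends the raw JSON string `%22sample%22` as the whole query instead of a key/value pair. The sketch below builds the requests without sending them, so the URLs can be inspected. The query key `name` is a hypothetical placeholder, since the server's expected parameter is not shown here. The 404 may also mean that no `/schema_ref` route is registered on the server at all.

```python
import requests

server_path = "http://localhost:7770/"  # note the trailing slash

# What the client currently builds: '"sample"' is what ujson.dumps('sample')
# returns, and it becomes the entire query string.
broken = requests.Request(
    "GET", server_path + "/schema_ref", params='"sample"'
).prepare().url

# Join without doubling the slash and pass params as a dict.
# "name" is a hypothetical query key; use whatever the server handler reads.
fixed = requests.Request(
    "GET", server_path.rstrip("/") + "/schema_ref", params={"name": "sample"}
).prepare().url
```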

ENH: reference files

I would like a way to reference files in amostra using filestore. This way we can load data associated with the sample, e.g., keep a couple of potential atomic structures with the sample so that we can compare the theoretical structure against the observed x-ray data.

Package installation not working

Fix the setup.py so I can use the 📦 in my tests!

make the Ansible roles carry the index-creation changes

Do this for analysisstore and conftrak as well.

Fill out default fields on the server side
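Below is a minimal sketch of server-side defaults. It assumes incoming documents are dicts. The server assigns `uid` and `time` when the client omits them and fills in the missing optional fields. The default values mirror the `Sample` repr shown in the recursion issue below (empty strings and empty lists). `fill_defaults` and `OPTIONAL_DEFAULTS` are hypothetical names.

```python
import time
import uuid

# Defaults mirroring Sample(composition='', description='', projects=[], tags=[]).
OPTIONAL_DEFAULTS = {"composition": "", "description": "", "projects": [], "tags": []}


def fill_defaults(doc):
    """Return a new dict with server-generated and default fields filled in."""
    # Copy list defaults so documents never share one mutable list.
    filled = {k: (list(v) if isinstance(v, list) else v)
              for k, v in OPTIONAL_DEFAULTS.items()}
    filled.update(doc)
    # Server-generated fields, used only when the client did not send them.
    filled.setdefault("uid", str(uuid.uuid4()))
    filled.setdefault("time", time.time())
    return filled
```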

update tornado dependency

Currently requires tornado < 5; update the code so the library can run with newer tornado releases.

Validate chemical formulas

This library may be useful for validating 'composition'. Via @bruceravel: https://pypi.org/project/chempy/#parsing-formulae
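chempy's formula parser is the suggestion here. The sketch below is a dependency-free illustration of what validating `composition` involves, and it is not chempy's API. This small stack-based parser accepts element symbols, integer counts and parenthesized groups. It rejects unknown symbols, stray characters and unbalanced parentheses. It does not handle charges, hydrates or other notation a full parser would cover. `parse_formula` is a hypothetical name.

```python
import re
from collections import Counter

# All 118 element symbols.
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg
Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

# One token: an element with optional count, "(", or ")" with optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def parse_formula(formula):
    """Return {element: count} for a formula like 'Ca(OH)2'; raise ValueError if invalid."""
    stack = [Counter()]  # one Counter per open parenthesis level
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character {formula[pos]!r} at {pos}")
        symbol, count, open_paren, _close_paren, multiplier = match.groups()
        if symbol:
            if symbol not in ELEMENTS:
                raise ValueError(f"unknown element {symbol!r}")
            stack[-1][symbol] += int(count or 1)
        elif open_paren:
            stack.append(Counter())
        else:  # closing parenthesis: fold the group into the enclosing level
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            for element, n in group.items():
                stack[-1][element] += n * int(multiplier or 1)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if not stack[0]:
        raise ValueError("empty formula")
    return dict(stack[0])
```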

Document REST API with swagger

A standard way to document REST APIs is with a swagger.io JSON file, such as this example, which can then be used to generate nice documentation.
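The skeleton below shows such a file in Swagger 2.0 form, written as a Python dict and dumped to JSON. The `/schema_ref` path comes from the get_schema traceback in this document. Everything else is a placeholder, including the version and the summary and response descriptions, and none of it is the server's actual contract.

```python
import json

spec = {
    "swagger": "2.0",
    "info": {"title": "amostra", "version": "0.0.0"},  # placeholder version
    "paths": {
        "/schema_ref": {
            "get": {
                "summary": "Return a JSON schema (placeholder description)",
                "produces": ["application/json"],
                # Swagger 2.0 requires a responses map on every operation.
                "responses": {
                    "200": {"description": "The schema as JSON"},
                    "404": {"description": "Unknown route or schema"},
                },
            }
        }
    },
}

spec_json = json.dumps(spec, indent=2)
```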

Set up doctr

The documentation at https://nsls-ii.github.io/amostra is out of date. Follow https://nsls-ii.github.io/scientific-python-cookiecutter/publishing-docs.html to set up doctr.

The deploy command should be slightly different from the one in that documentation, though, because we publish to an organization repo, NSLS-II/NSLS-II.github.io. The command should be:

doctr deploy --deploy-repo NSLS-II/NSLS-II.github.io --deploy-branch-name master amostra;

The link in the README, which currently points to my personal fork, should also be updated.

Is Diff better than whole document/sample in implementation of revision related methods?

I got this idea from here.

https://stackoverflow.com/questions/4185105/ways-to-implement-data-versioning-in-mongodb
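Below is a minimal sketch of the diff option for plain-dict documents. Each revision stores only the fields to set and the fields to unset, named after MongoDB's `$set`/`$unset` operators, and an old revision is rebuilt by replaying diffs from a base snapshot. Shallow, field-level diffs are an assumption for illustration. Nested documents would need a recursive diff or a format such as JSON Patch. `diff`, `apply_diff` and `rebuild` are hypothetical names.

```python
def diff(old, new):
    """Shallow diff: fields to set and fields to unset to turn old into new."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": sorted(k for k in old if k not in new),
    }


def apply_diff(doc, change):
    """Return a new dict with one diff applied to doc."""
    out = {k: v for k, v in doc.items() if k not in change["unset"]}
    out.update(change["set"])
    return out


def rebuild(base, diffs, revision):
    """Reconstruct revision N (0 = base) by replaying the first N diffs."""
    doc = dict(base)
    for change in diffs[:revision]:
        doc = apply_diff(doc, change)
    return doc


v0 = {"name": "abc", "tags": []}
v1 = {"name": "abc", "tags": ["powder"]}
v2 = {"name": "abd", "tags": ["powder"], "description": "typo fixed"}
diffs = [diff(v0, v1), diff(v1, v2)]
```

Diffs save space, but reading an old revision costs a replay. Whole-document snapshots make reads trivial at the cost of storage. One common compromise is to store a full snapshot every N revisions and diffs in between.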

Issue about recursion triggered

This issue continues the discussion of #44, but it should be readable on its own.
Usually a Sample object can be created successfully, as in the scenario below.

Success

>>> from amostra.objects import Sample
>>> foo = Sample(None, name='abc')
>>> foo

Sample(composition='', description='', name='abc', projects=[], tags=[])

Failure
However, if name is the empty string '', infinite recursion is triggered. This was pointed out by @tacaswell; I just copied the code here.

>>> from amostra.objects import Sample
>>> foo = Sample(None, name='')
>>> foo

***Recursion maximum triggered

This issue explains what happens under the hood. First, some tips.
Tips

Two kinds of attributes are involved. composition, description and name are Unicode traits, which are raw types. projects and tags are List traits, which inherit from Instance in traitlets.
When Sample.__get__ is called, traitlets tries to get the value from Sample._trait_values, a dict of attribute_name: attribute_value. If that lookup fails, it tries to be smart and works out a value for you (including calling the dynamic default).
_validate function

    def _validate(self, obj, value):
        if value is None and self.allow_none:
            return value
        if hasattr(self, 'validate'):
            value = self.validate(obj, value)
        if obj._cross_validation_lock is False:
            value = self._cross_validate(obj, value)
        return value

Pipeline of the success scenario

traitlets initializes the raw-type trait (Unicode here) to its default value, the empty string ''.
The user's code sets the name argument, here name='abc'.
if past_value == value:
    skip validation
else:
    with self.hold_trait_notifications():
        # sets _cross_validation_lock = True on __enter__ and back to False on __exit__
        1. call the user-defined validation function; in amostra it is _validate_with_jsonschema
        2. _validate_with_jsonschema invokes the to_dict method, which calls get for all attributes
        3. traitlets finds that projects is not initialized yet (not in foo._trait_values), so it calls the dynamic default method to generate a default value
        4. this dynamic default value is passed to the _validate function
        5. since _cross_validation_lock == True, self._cross_validate is SKIPPED
        6. the value is returned and stored: self._trait_values['projects'] = value

Pipeline of the failure scenario

>>> foo
***Recursion maximum triggered

1. run __repr__
2. it eventually calls __get__, then the get method, for all attributes
3. traitlets finds that 'projects' is not initialized (not in foo._trait_values), so it calls the dynamic default method to generate a default value
4. this dynamic default value is passed to the _validate function
5. since _cross_validation_lock == False, self._cross_validate RUNS
6. self._cross_validate triggers the user-defined validation function, _validate_with_jsonschema
7. _validate_with_jsonschema invokes the to_dict method, which calls __get__ for all attributes
8. go to step 2 (infinite loop)

Solution (implemented in #52)
The solution is to use the cross_validation_lock context manager to control step 5 and break the loop.

1. run __repr__, which does:
   with self.cross_validation_lock:
       2. it calls __get__ for all attributes
       3. traitlets finds that 'projects' is not initialized (not in foo._trait_values), so it calls the dynamic default method to generate a default value
       4. this dynamic default value is passed to the _validate function
       5. since _cross_validation_lock == True inside the context, self._cross_validate is SKIPPED
       6. the value is returned and stored: self._trait_values['projects'] = value
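The three pipelines can be reproduced with a toy model of the mechanism in plain Python. This is not traitlets, and `ToySample` and its `get` method are hypothetical stand-ins. The model imitates four pieces of the real behaviour: the lazy dynamic default of `projects`, `_validate`, and `_cross_validation_lock`. It also includes a validator that reads every attribute through `to_dict`, the way `_validate_with_jsonschema` does. Version-specific traitlets details are deliberately left out.

```python
from contextlib import contextmanager


class ToySample:
    """Toy imitation of the behaviour described above (not traitlets)."""

    fields = ("name", "projects")

    def __init__(self, name=""):
        # name starts at its raw default; projects is left to its lazy default.
        self._trait_values = {"name": ""}
        self._cross_validation_lock = False
        if name != "":
            # Success path: a changed value is validated with the lock on,
            # which also fills in projects along the way.
            with self.cross_validation_lock:
                self._cross_validate("name", name)
            self._trait_values["name"] = name

    @property
    @contextmanager
    def cross_validation_lock(self):
        previous = self._cross_validation_lock
        self._cross_validation_lock = True
        try:
            yield
        finally:
            self._cross_validation_lock = previous

    def get(self, field):
        # Step 3: a missing value triggers the dynamic default ([] here).
        if field not in self._trait_values:
            self._trait_values[field] = self._validate(field, [])
        return self._trait_values[field]

    def _validate(self, field, value):
        # Step 5: cross-validation runs only while the lock is off.
        if not self._cross_validation_lock:
            value = self._cross_validate(field, value)
        return value

    def _cross_validate(self, field, value):
        # Steps 6-7: like _validate_with_jsonschema, read every attribute.
        self.to_dict()
        return value

    def to_dict(self):
        return {f: self.get(f) for f in self.fields}

    def __repr__(self):
        # The fix: hold the lock while reading attributes.
        with self.cross_validation_lock:
            return f"ToySample({self.to_dict()!r})"
```

Calling `ToySample(name="").to_dict()` without the lock loops through steps 2 to 8 until Python raises `RecursionError`. `ToySample(name="abc")` works because `projects` was filled in under the lock during construction. `repr(ToySample(name=""))` works because `__repr__` takes the lock itself.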