
metadatastore's Introduction

This project has been incorporated into the databroker repository.

metadatastore's People

Contributors

arkilic, cj-wright, cowanml, danielballan, ericdill, klauer, licode, tacaswell


metadatastore's Issues

remove time_as_datetime from Event

We do not want to support this long term and removing it later will require data migration. This needs to be done as soon as possible before we have lots of data to migrate.
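For reference, any consumer that wants a datetime can derive it on demand from the epoch timestamp already stored on the Event, so nothing extra needs to be persisted (a minimal sketch; the event dict below is illustrative):

```python
from datetime import datetime

# Event 'time' is stored as epoch seconds; derive the datetime on the
# fly instead of persisting a redundant time_as_datetime field.
event = {'time': 1407936560.0}  # illustrative epoch timestamp
time_as_datetime = datetime.fromtimestamp(event['time'])
```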

Error in sample_collection_code.py

Currently on the master branch. I ran sample_collection_code.py and got the following error:

Traceback (most recent call last):
  File "/home/edill/dev/python/metadataStore/example/sample_collection_code.py", line 71, in <module>
    create_event(event)
  File "/home/edill/dev/python/metadataStore/metadataStore/collectionapi/commands.py", line 106, in create_event
    owner=owner, data=data)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 271, in insert_event
    __validate_keys(formatted_data_keys, desc_data_keys)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 293, in __validate_keys
    raise ValueError('Data keys for event data and descriptor data do not match! Check ' + str(key))
ValueError: Data keys for event data and descriptor data do not match! Check some_motor_1

Process finished with exit code 1 

Fix documentation

Generated documentation (i.e., Sphinx) should be on a gh-pages branch and not under version control

The insert_run_start api doesn't match the RunStart Document

The insert_run_start function would suggest that time and beamline_id are the only required RunStart fields:

def insert_run_start(time, beamline_id, beamline_config=None, owner=None,
                     scan_id=None, custom=None, uid=None):

However, the fields in RunStart are this:

beamline_config, required
owner, required
scan_id, required
time, required
uid, required

beamline_id, optional
group, optional
project, optional
sample, optional
  • owner is an optional kwarg to insert_run_start because we can programmatically get the logged-in user
  • beamline_config could be guessed by doing something like grabbing the last beamline_config that mongo knows about, but that feels really dangerous
  • scan_id could be guessed by grabbing the last run_start and incrementing its scan_id by 1

However, these do not feel like terribly good ideas, as it should not be the job of mds to guess what the user wanted to do. That's what ophyd is for!

In the meantime, scan_id and beamline_config should not be kwargs. In the interest of @tacaswell's sanity, however, this change should wait until after tomorrow's deployment.

It would also be a good idea for someone to verify that the rest of the insert_* APIs respect the requiredness as implemented by the Documents in odm_templates.py.
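For illustration, a signature that matched the Document's required fields might look like the sketch below. This is hypothetical: the argument order, the returned dict, and the getpass-based owner default are assumptions, not the actual mds code.

```python
import getpass

def insert_run_start(time, scan_id, beamline_config, owner=None,
                     beamline_id=None, uid=None, custom=None):
    """Sketch of a RunStart insert whose signature matches the Document.

    Required fields are positional, so a missing one fails loudly at
    call time; owner stays optional because it can be filled in from
    the logged-in user.
    """
    if owner is None:
        owner = getpass.getuser()
    return {'time': time, 'scan_id': scan_id,
            'beamline_config': beamline_config, 'owner': owner,
            'beamline_id': beamline_id, 'uid': uid, 'custom': custom}
```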

WIN: import error

In [2]: from metadataStore.userapi.commands import search
---------------------------------------------------------------------------
NoSectionError                            Traceback (most recent call last)
<ipython-input-2-b282184031ce> in <module>()
----> 1 from metadataStore.userapi.commands import search

c:\dev\my_src\python\metadatastore\metadataStore\userapi\commands.py in <module>()
      8 from pymongo.errors import OperationFailure
      9
---> 10 from metadataStore.sessionManager.databaseInit import metadataLogger
     11
     12 from metadataStore.dataapi.raw_commands import save_header, save_beamline_config, insert_event, insert_event_descriptor, find

c:\dev\my_src\python\metadatastore\metadataStore\sessionManager\databaseInit.py in <module>()
      4 from pymongo.errors import ConnectionFailure
      5
----> 6 from metadataStore.config.parseConfig import database, host, port
      7 from metadataStore.sessionManager.databaseLogger import DbLogger
      8

c:\dev\my_src\python\metadatastore\metadataStore\config\parseConfig.py in <module>()
      4
      5
----> 6 database = conf_dict.get('metadataStore', 'database')
      7 host = conf_dict.get('metadataStore', 'host')
      8 port = conf_dict.get('metadataStore', 'port')

C:\Users\edill\AppData\Local\Continuum\Anaconda\lib\ConfigParser.pyc in get(self, section, option, raw, vars)
    605         except KeyError:
    606             if section != DEFAULTSECT:
--> 607                 raise NoSectionError(section)
    608         # Update with the entry specific variables
    609         vardict = {}

NoSectionError: No section: 'metadataStore'
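The traceback suggests that the config file parsed by parseConfig.py is missing a [metadataStore] section on this Windows machine. A config along these lines (values illustrative) would satisfy the three conf_dict.get() lookups:

```ini
[metadataStore]
database = metadataStore
host = localhost
port = 27017
```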

Bulk Event record

event_type_descriptor_id is not retrieved during bulk inserts. This should be fixed inside metadataStore.dataapi.raw_commands.insert_bulk_event().

event_type_descriptor_id and header_id must be required fields for events so that problems of this nature do not recur in future development.
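At the schema level this means marking both references as required. As a plain-Python illustration of the pre-flight check that insert_bulk_event() could run before writing (the function name and error wording are made up, not the actual mds API):

```python
REQUIRED_EVENT_FIELDS = ('event_type_descriptor_id', 'header_id')

def validate_bulk_events(events):
    """Raise if any event in the batch is missing a required reference.

    Running this before the bulk write keeps half-formed events out of
    the database instead of discovering them at query time.
    """
    for i, ev in enumerate(events):
        missing = [f for f in REQUIRED_EVENT_FIELDS if f not in ev]
        if missing:
            raise ValueError('event %d missing required fields: %s'
                             % (i, missing))
    return events
```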

Setup.py fails to find README.md

setup.py looks for README.md, can't find it and fails.

python setup.py install
0.0.2
Traceback (most recent call last):
  File "setup.py", line 80, in <module>
    long_description=read('README.md'),
  File "setup.py", line 34, in read
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
IOError: [Errno 2] No such file or directory: 'README.md'
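One conventional fix is to make the helper tolerate a missing file, so a tree without README.md still installs (a sketch, not the actual setup.py):

```python
import os

# Directory containing setup.py; fall back to the cwd when __file__ is
# undefined (e.g. in an interactive session).
HERE = (os.path.dirname(os.path.abspath(__file__))
        if '__file__' in globals() else os.getcwd())

def read(fname):
    """Return the file's contents, or '' if the file is missing."""
    try:
        with open(os.path.join(HERE, fname)) as f:
            return f.read()
    except IOError:
        return ''
```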

Set up testing framework

As this code is currently being used to store actual commissioning data on 23-ID, it needs a comprehensive testing framework to ensure that bugs aren't introduced by updates to the source code and changes to APIs.

Events field is not returning

Description

  • Cleared the mongo database with the following:
ssh xf23id-broker
mongo
use metaDataStore
db.header.drop()
db.event_descriptor.drop()
db.beamline_config.drop()
db.event.drop()
  • grabbed the head of arkilic/metadataStore/dev branch
  • ran metadataStore/example/sample_code_userapi.py which added an entry to the newly emptied metaDataStore db
  • ran vistools/qt_apps/broker_query_example.py (from https://github.com/NSLS-II/vistools.git, vistrails_integration branch)
  • searched on 'owner' = edill and 'data' = True
  • results are shown in the screengrab below

[screenshot from 2014-08-13 09:29:20]

  • if I use the search command from ipython, I still don't get an 'events' field, so it's not a problem with the query widget. screengrab of that shown below (top right corner of terminator)

[screenshot from 2014-08-13 09:31:04]

Problem

The events field is not showing up in the dictionaries returned by metadataStore.userapi.commands.search()

Fix collection api search

search() finds the right header but cannot parse the correct event_descriptor and events. Also, make sure this does not originate from the bulk event insert.

API into the channel archiver

I realize that this issue is really more related to the channel archiver, but I'm leaving it here as a placeholder for now.

I can see both of these as valid options for the api into the channel archiver.

Option 1:

def get_pv(time_list, pv_name, beamline_id, interpolation_type):
    '''
    Parameters
    ----------
    time_list: list
        list of times to obtain the values of the pv for
    pv_name: string
        pv name that the channel archiver knows about
    beamline_id: string
        Not sure if this is needed, since the channel archivers are specific to the 
        beamline (i think?)
    interpolation_type: string
        Type of interpolation to use: easy examples are linear and step function

    Returns
    -------
    list
        List of pv values interpolated based on "interpolation_type" at times 
        occurring at the values in "time_list"
    '''

Option 2:

def get_pv(t_start, t_finish, pv_name, beamline_id):
    '''
    Parameters
    ----------
    t_start : datetime.datetime
        earliest time to obtain the values of pv_name
    t_finish : datetime.datetime
        latest time to obtain the values of pv_name
    pv_name : string
        pv name that the channel archiver knows about
    beamline_id : string
        Not sure if this is needed, since the channel archivers are specific to the beamline (i think?)

    Returns
    -------
    pv_vals, time
    pv_vals : list
        List of all pv values
    time : list of datetime objects
        time stamps of pv values 
    '''

Thoughts @arkilic @tacaswell ?
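Either option ultimately reduces to sampling an archived (time, value) series at requested times. A rough pure-Python sketch of the interpolation step behind Option 1 (the archiver retrieval itself is elided, and the function name is made up):

```python
import bisect

def sample_pv(archived_times, archived_values, time_list,
              interpolation_type='step'):
    """Sample an archived (time, value) series at the requested times.

    archived_times must be sorted ascending.  'step' holds the last
    archived value; 'linear' interpolates between neighbors.
    """
    out = []
    for t in time_list:
        # Index of the last archived sample at or before t.
        i = bisect.bisect_right(archived_times, t) - 1
        if interpolation_type == 'step' or i >= len(archived_times) - 1:
            # Before the first sample we simply clamp to the first value.
            out.append(archived_values[max(i, 0)])
        else:
            t0, t1 = archived_times[i], archived_times[i + 1]
            v0, v1 = archived_values[i], archived_values[i + 1]
            frac = (t - t0) / (t1 - t0)
            out.append(v0 + frac * (v1 - v0))
    return out
```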

Event_descriptor key

The event_descriptor_# should be nested inside an event_descriptor key so that it is always clear where you should go (programmatically) to retrieve the event_descriptors.
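For illustration, the before/after shapes might look like this (the keys and values are hypothetical):

```python
# Flat shape: a consumer has to pattern-match key names to find
# descriptors among the header's other fields.
flat = {
    'event_descriptor_0': {'data_keys': ['some_motor_1']},
    'event_descriptor_1': {'data_keys': ['some_detector']},
    'owner': 'edill',
}

# Nested shape: everything lives under one 'event_descriptor' key, so
# a consumer can always just iterate header['event_descriptor'].
nested = {
    'event_descriptor': {
        'event_descriptor_0': {'data_keys': ['some_motor_1']},
        'event_descriptor_1': {'data_keys': ['some_detector']},
    },
    'owner': 'edill',
}
```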

CamelCase needs to go away in this repository

We really should get rid of the camelcase in this repo. The specific places I'm thinking of are:

  1. Repo name: metadataStore -> metadatastore
  2. Package name: metadataStore -> metadatastore
  3. mongo database name: (metaDataStore, metadataStore) -> metadatastore

Allow searching on parent Document's uid.

e.g., find_event_descriptors(run_start_uid=...)

Is this something that can be done, @arkilic? As discussed on the train, I agree we should prefer our uids over mongo's ids wherever possible, but the commands don't currently support that.
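A rough sketch of the shape this could take (the field and collection names are assumptions; real mds would presumably resolve the uid against the RunStart collection first and then query descriptors on the reference field):

```python
def find_event_descriptors(run_start_uid=None, **kwargs):
    """Build a mongo-style query dict, resolving a parent RunStart uid.

    Sketch only: attribute kwargs pass through unchanged, while a
    parent uid is rewritten into a query on the stored reference.
    """
    query = dict(kwargs)
    if run_start_uid is not None:
        query['run_start.uid'] = run_start_uid
    return query
```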

delaying creation of event_descriptors

I had a conversation with @dchabot many weeks ago where he made the (correct) point that the creation of event_descriptors should be delayed as long as possible, which means that they should be created just before the insertion of the first event. The recent changes to mds, especially here seem to be moving in the opposite direction because now event_descriptors need to be created before the run_header gets created. Given that I seem to have missed a great deal in the last week, @dchabot are you now thinking that we don't need to defer the creation of event descriptors as late as possible?

Brainstorm from Dan & Arman during MDS refactor

Random thoughts occurred to us while refactoring MDS to meet the jsonspec.

  • Should BeamlineConfig be an EmbeddedDocument? Lookup would be faster (fewer queries) at some storage cost; since this scales with runs, not with events or descriptors, the cost would be moderate.
  • Neither of us like the names BeginRunEvent and EndRunEvent and might prefer RunHead and RunTail.

Plan to revise MDS API for more consistency and more power

From a discussion with @arkilic @tacaswell @ericdill and myself, here is how we want to make the MDS API consistent:

All find_* functions have a signature like find_run_start(**kwargs). The keyword arguments can be:

  1. Documents that reference the document searched for: e.g., find_events(event_descriptor=ev_desc).
  2. Attributes of the document searched for: e.g., find_run_start(owner='dallan').
  3. Arbitrary queries that are passed to mongo as __raw__.

This means that fetch_events is removed and merged with find_events, and that all find_* functions need to accept kwargs.

Meanwhile, in databroker, DataBroker.find_headers has some pre-defined kwargs and open-ended **kwargs, accepting arbitrary mongo queries. The defined kwargs are like

find_headers(start_time=None, end_time=None, owner=None, scan_id=None, ..., **kwargs)

These kwargs fall into two categories:

  1. Queries that will be fast (things we index on, like owner)
  2. Common, difficult mongo queries (e.g. measuring='motor1', which looks into data_keys).
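As a sketch of how the kwarg categories could fold into a single mongo query (names illustrative, not the agreed API):

```python
def build_query(raw=None, **kwargs):
    """Merge document-attribute kwargs with an optional raw mongo query.

    A document reference collapses to its stored id (if it has one);
    plain attribute values pass through; an explicit raw dict wins on
    key conflicts, mirroring __raw__ semantics.
    """
    query = {}
    for key, value in kwargs.items():
        query[key] = getattr(value, 'id', value)
    if raw is not None:
        query.update(raw)
    return query
```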

search on _id

Having the ability to search on the hashed header id (_id) makes my life significantly easier when integrating into VisTrails, since I need to guarantee that I've returned a unique header.

different return from search

Make a search_2 function in userapi.commands that returns header_dict, event_dict, and event_descriptor_dict. I'm suggesting a search_2 function so that when people change their mind again about what the run header format should be, the work done for search is not lost...

Each of the dictionaries is keyed on _id so that you can access the header and event_descriptor for a given event by doing header_dict[event.header_id] and event_descriptor_dict[event.descriptor_id]. This will make it significantly easier to combine the events from multiple run headers.
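The join those keyed dictionaries enable might look like this (a sketch; the field names follow the issue text):

```python
def join_events(header_dict, event_descriptor_dict, events):
    """Attach each event's header and descriptor via _id lookup.

    Works across events drawn from multiple run headers, since the
    lookup dicts are keyed on _id rather than per-header position.
    """
    joined = []
    for ev in events:
        joined.append({
            'event': ev,
            'header': header_dict[ev['header_id']],
            'descriptor': event_descriptor_dict[ev['descriptor_id']],
        })
    return joined
```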

ObjectID in the run_header

VisTrails is whining about not knowing what an ObjectID is. Can you remove that from the run_header or will that blow a whole bunch of stuff up?
