
metadatastore's Introduction

This project has been incorporated into the databroker repository.

metadatastore's People

Contributors

arkilic, cj-wright, cowanml, danielballan, ericdill, klauer, licode, tacaswell


metadatastore's Issues

remove time_as_datetime from Event

We do not want to support this long term and removing it later will require data migration. This needs to be done as soon as possible before we have lots of data to migrate.
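For reference, any consumer that wants a datetime can derive it on demand from the epoch timestamp already stored on the Event, so nothing extra needs to be persisted (a minimal sketch; the event dict below is illustrative):

```python
from datetime import datetime

# Event 'time' is stored as epoch seconds; derive the datetime on the
# fly instead of persisting a redundant time_as_datetime field.
event = {'time': 1407936560.0}  # illustrative epoch timestamp
time_as_datetime = datetime.fromtimestamp(event['time'])
```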

Error in sample_collection_code.py

Currently on the master branch. I ran sample_collection_code.py and got the following error:

Traceback (most recent call last):
  File "/home/edill/dev/python/metadataStore/example/sample_collection_code.py", line 71, in <module>
    create_event(event)
  File "/home/edill/dev/python/metadataStore/metadataStore/collectionapi/commands.py", line 106, in create_event
    owner=owner, data=data)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 271, in insert_event
    __validate_keys(formatted_data_keys, desc_data_keys)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 293, in __validate_keys
    raise ValueError('Data keys for event data and descriptor data do not match! Check ' + str(key))
ValueError: Data keys for event data and descriptor data do not match! Check some_motor_1

Process finished with exit code 1 

Fix documentation

Generated documentation (i.e., Sphinx) should be on a gh-pages branch and not under version control

The insert_run_start api doesn't match the RunStart Document

The insert_run_start function would suggest that time and beamline_id are the only required RunStart fields:

def insert_run_start(time, beamline_id, beamline_config=None, owner=None,
                     scan_id=None, custom=None, uid=None):

However, the fields in RunStart are this:

beamline_config, required
owner, required
scan_id, required
time, required
uid, required

beamline_id, optional
group, optional
project, optional
sample, optional
  • owner is an optional kwarg to insert_run_start because we can programmatically get the logged-in user
  • beamline_config could be guessed by doing something like grabbing the last beamline_config that mongo knows about, but that feels really dangerous
  • scan_id could be guessed by grabbing the last run_start and incrementing its scan_id by 1

However, these do not feel like terribly good ideas, as it should not be the job of mds to guess what the user wanted to do. That's what ophyd is for!

In the meantime, scan_id and beamline_config should not be kwargs. In the interest of @tacaswell's sanity, however, this change should wait until after tomorrow's deployment.

It would also be a good idea for someone to verify that the rest of the insert_* APIs respect the requiredness as implemented by the Documents in odm_templates.py.
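For illustration, a signature that matched the Document's required fields might look like the sketch below. This is hypothetical: the argument order, the returned dict, and the getpass-based owner default are assumptions, not the actual mds code.

```python
import getpass

def insert_run_start(time, scan_id, beamline_config, owner=None,
                     beamline_id=None, uid=None, custom=None):
    """Sketch of a RunStart insert whose signature matches the Document.

    Required fields are positional, so a missing one fails loudly at
    call time; owner stays optional because it can be filled in from
    the logged-in user.
    """
    if owner is None:
        owner = getpass.getuser()
    return {'time': time, 'scan_id': scan_id,
            'beamline_config': beamline_config, 'owner': owner,
            'beamline_id': beamline_id, 'uid': uid, 'custom': custom}
```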

WIN: import error

In [2]: from metadataStore.userapi.commands import search
---------------------------------------------------------------------------
NoSectionError                            Traceback (most recent call last)
<ipython-input-2-b282184031ce> in <module>()
----> 1 from metadataStore.userapi.commands import search

c:\dev\my_src\python\metadatastore\metadataStore\userapi\commands.py in <module>()
      8 from pymongo.errors import OperationFailure
      9
---> 10 from metadataStore.sessionManager.databaseInit import metadataLogger
     11
     12 from metadataStore.dataapi.raw_commands import save_header, save_beamline_config, insert_event, insert_event_descriptor, find

c:\dev\my_src\python\metadatastore\metadataStore\sessionManager\databaseInit.py in <module>()
      4 from pymongo.errors import ConnectionFailure
      5
----> 6 from metadataStore.config.parseConfig import database, host, port
      7 from metadataStore.sessionManager.databaseLogger import DbLogger
      8

c:\dev\my_src\python\metadatastore\metadataStore\config\parseConfig.py in <module>()
      4
      5
----> 6 database = conf_dict.get('metadataStore', 'database')
      7 host = conf_dict.get('metadataStore', 'host')
      8 port = conf_dict.get('metadataStore', 'port')

C:\Users\edill\AppData\Local\Continuum\Anaconda\lib\ConfigParser.pyc in get(self, section, option, raw, vars)
    605         except KeyError:
    606             if section != DEFAULTSECT:
--> 607                 raise NoSectionError(section)
    608         # Update with the entry specific variables
    609         vardict = {}

NoSectionError: No section: 'metadataStore'
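The traceback suggests that the config file parsed by parseConfig.py is missing a [metadataStore] section on this Windows machine. A config along these lines (values illustrative) would satisfy the three conf_dict.get() lookups:

```ini
[metadataStore]
database = metadataStore
host = localhost
port = 27017
```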

Bulk Event record

event_type_descriptor_id is not retrieved during bulk inserts. This should be fixed inside metadataStore.dataapi.raw_commands.insert_bulk_event().

event_type_descriptor_id and header_id must be required fields for events so that problems of this nature do not recur in future development.
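At the schema level this means marking both references as required. As a plain-Python illustration of the pre-flight check that insert_bulk_event() could run before writing (the function name and error wording are made up, not the actual mds API):

```python
REQUIRED_EVENT_FIELDS = ('event_type_descriptor_id', 'header_id')

def validate_bulk_events(events):
    """Raise if any event in the batch is missing a required reference.

    Running this before the bulk write keeps half-formed events out of
    the database instead of discovering them at query time.
    """
    for i, ev in enumerate(events):
        missing = [f for f in REQUIRED_EVENT_FIELDS if f not in ev]
        if missing:
            raise ValueError('event %d missing required fields: %s'
                             % (i, missing))
    return events
```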

Setup.py fails to find README.md

setup.py looks for README.md, can't find it and fails.

python setup.py install
0.0.2
Traceback (most recent call last):
  File "setup.py", line 80, in <module>
    long_description=read('README.md'),
  File "setup.py", line 34, in read
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
IOError: [Errno 2] No such file or directory: 'README.md'
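One conventional fix is to make the helper tolerate a missing file, so a tree without README.md still installs (a sketch, not the actual setup.py):

```python
import os

# Directory containing setup.py; fall back to the cwd when __file__ is
# undefined (e.g. in an interactive session).
HERE = (os.path.dirname(os.path.abspath(__file__))
        if '__file__' in globals() else os.getcwd())

def read(fname):
    """Return the file's contents, or '' if the file is missing."""
    try:
        with open(os.path.join(HERE, fname)) as f:
            return f.read()
    except IOError:
        return ''
```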

Set up testing framework

As this code is currently being used to store actual commissioning data on 23-ID, it needs a comprehensive testing framework to ensure that bugs aren't introduced by updates to the source code and changes to APIs.

Events field is not returning

Description

  • Cleared the mongo database with the following:
ssh xf23id-broker
mongo
use metaDataStore
db.header.drop()
db.event_descriptor.drop()
db.beamline_config.drop()
db.event.drop()
  • grabbed the head of arkilic/metadataStore/dev branch
  • ran metadataStore/example/sample_code_userapi.py which added an entry to the newly emptied metaDataStore db
  • ran vistools/qt_apps/broker_query_example.py (from https://github.com/NSLS-II/vistools.git, vistrails_integration branch)
  • searched on 'owner' = edill and 'data' = True
  • results are shown in the screengrab below

[screenshot from 2014-08-13 09:29:20]

  • if I use the search command from ipython, I still don't get an 'events' field, so it's not a problem with the query widget. screengrab of that shown below (top right corner of terminator)

[screenshot from 2014-08-13 09:31:04]

Problem

The events field is not showing up in the dictionaries returned by metadataStore.userapi.commands.search()

Fix collection api search

search() finds the right header but cannot parse the correct event_descriptor and events. Also, make sure this does not originate from the bulk event insert.

API into the channel archiver

I realize that this issue is really more related to the channel archiver, but I'm leaving it here as a placeholder for now.

I can see both of these as valid options for the api into the channel archiver.

Option 1:

def get_pv(time_list, pv_name, beamline_id, interpolation_type):
    '''
    Parameters
    ----------
    time_list: list
        list of times to obtain the values of the pv for
    pv_name: string
        pv name that the channel archiver knows about
    beamline_id: string
        Not sure if this is needed, since the channel archivers are specific to the 
        beamline (i think?)
    interpolation_type: string
        Type of interpolation to use: easy examples are linear and step function

    Returns
    -------
    list
        List of pv values interpolated based on "interpolation_type" at times 
        occurring at the values in "time_list"
    '''

Option 2:

def get_pv(t_start, t_finish, pv_name, beamline_id):
    '''
    Parameters
    ----------
    t_start : datetime.datetime
        earliest time to obtain the values of pv_name
    t_finish : datetime.datetime
        latest time to obtain the values of pv_name
    pv_name : string
        pv name that the channel archiver knows about
    beamline_id : string
        Not sure if this is needed, since the channel archivers are specific to the beamline (i think?)

    Returns
    -------
    pv_vals, time
    pv_vals : list
        List of all pv values
    time : list of datetime objects
        time stamps of pv values 
    '''

Thoughts @arkilic @tacaswell ?
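Either option ultimately reduces to sampling an archived (time, value) series at requested times. A rough pure-Python sketch of the interpolation step behind Option 1 (the archiver retrieval itself is elided, and the function name is made up):

```python
import bisect

def sample_pv(archived_times, archived_values, time_list,
              interpolation_type='step'):
    """Sample an archived (time, value) series at the requested times.

    archived_times must be sorted ascending.  'step' holds the last
    archived value; 'linear' interpolates between neighbors.
    """
    out = []
    for t in time_list:
        # Index of the last archived sample at or before t.
        i = bisect.bisect_right(archived_times, t) - 1
        if interpolation_type == 'step' or i >= len(archived_times) - 1:
            # Before the first sample we simply clamp to the first value.
            out.append(archived_values[max(i, 0)])
        else:
            t0, t1 = archived_times[i], archived_times[i + 1]
            v0, v1 = archived_values[i], archived_values[i + 1]
            frac = (t - t0) / (t1 - t0)
            out.append(v0 + frac * (v1 - v0))
    return out
```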

Event_descriptor key

The event_descriptor_# should be nested inside an event_descriptor key so that it is always clear where you should go (programmatically) to retrieve the event_descriptors.
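For illustration, the before/after shapes might look like this (the keys and values are hypothetical):

```python
# Flat shape: a consumer has to pattern-match key names to find
# descriptors among the header's other fields.
flat = {
    'event_descriptor_0': {'data_keys': ['some_motor_1']},
    'event_descriptor_1': {'data_keys': ['some_detector']},
    'owner': 'edill',
}

# Nested shape: everything lives under one 'event_descriptor' key, so
# a consumer can always just iterate header['event_descriptor'].
nested = {
    'event_descriptor': {
        'event_descriptor_0': {'data_keys': ['some_motor_1']},
        'event_descriptor_1': {'data_keys': ['some_detector']},
    },
    'owner': 'edill',
}
```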

CamelCase needs to go away in this repository

We really should get rid of the camelcase in this repo. The specific places I'm thinking of are:

  1. Repo name: metadataStore -> metadatastore
  2. Package name: metadataStore -> metadatastore
  3. mongo database name: (metaDataStore, metadataStore) -> metadatastore

Allow searching on parent Document's uid.

e.g., find_event_descriptors(run_start_uid=...)

Is this something that can be done, @arkilic? As discussed on the train, I agree we should prefer our uids over mongo's ids wherever possible, but the commands don't currently support that.
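A rough sketch of the shape this could take (the field and collection names are assumptions; real mds would presumably resolve the uid against the RunStart collection first and then query descriptors on the reference field):

```python
def find_event_descriptors(run_start_uid=None, **kwargs):
    """Build a mongo-style query dict, resolving a parent RunStart uid.

    Sketch only: attribute kwargs pass through unchanged, while a
    parent uid is rewritten into a query on the stored reference.
    """
    query = dict(kwargs)
    if run_start_uid is not None:
        query['run_start.uid'] = run_start_uid
    return query
```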

delaying creation of event_descriptors

I had a conversation with @dchabot many weeks ago where he made the (correct) point that the creation of event_descriptors should be delayed as long as possible, which means that they should be created just before the insertion of the first event. The recent changes to mds, especially here seem to be moving in the opposite direction because now event_descriptors need to be created before the run_header gets created. Given that I seem to have missed a great deal in the last week, @dchabot are you now thinking that we don't need to defer the creation of event descriptors as late as possible?

Brainstorm from Dan & Arman during MDS refactor

Random thoughts occurred to us while refactoring MDS to meet the jsonspec.

  • Should BeamlineConfig be an EmbeddedDocument? Lookup would be faster (fewer queries) at some storage cost; since this scales with runs, not with events or descriptors, the cost would be moderate.
  • Neither of us like the names BeginRunEvent and EndRunEvent and might prefer RunHead and RunTail.

Plan to revise MDS API for more consistency and more power

From a discussion with @arkilic @tacaswell @ericdill and myself, here is how we want to make the MDS API consistent:

All find_* functions have a signature like find_run_start(**kwargs). The keyword arguments can be:

  1. Documents that reference the document searched for: e.g., find_events(event_descriptor=ev_desc).
  2. Attributes of the document searched for: e.g., find_run_start(owner='dallan').
  3. Arbitrary queries that are passed to mongo as __raw__.

This means that fetch_events is removed and merged with find_events, and that all find_* functions need to accept kwargs.

Meanwhile, in databroker, DataBroker.find_headers has some pre-defined kwargs and open-ended **kwargs, accepting arbitrary mongo queries. The defined kwargs are like

find_headers(start_time=None, end_time=None, owner=None, scan_id=None, ..., **kwargs)

These kwargs fall into two categories:

  1. Queries that will be fast (things we index on, like owner)
  2. Common, difficult mongo queries (e.g. measuring='motor1', which looks into data_keys).
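As a sketch of how the kwarg categories could fold into a single mongo query (names illustrative, not the agreed API):

```python
def build_query(raw=None, **kwargs):
    """Merge document-attribute kwargs with an optional raw mongo query.

    A document reference collapses to its stored id (if it has one);
    plain attribute values pass through; an explicit raw dict wins on
    key conflicts, mirroring __raw__ semantics.
    """
    query = {}
    for key, value in kwargs.items():
        query[key] = getattr(value, 'id', value)
    if raw is not None:
        query.update(raw)
    return query
```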

search on _id

Having the ability to search on the hashed header id (_id) makes my life significantly easier when integrating into VisTrails, since I need to guarantee that I've returned a unique header.

different return from search

Make a search_2 function in userapi.commands that returns header_dict, event_dict, and event_descriptor_dict. I'm suggesting a search_2 function so that when people change their mind again about what the run header format should be, the work done for search is not lost...

Each of the dictionaries is keyed on _id so that you can access the header and event_descriptor for a given event by doing header_dict[event.header_id] and event_descriptor_dict[event.descriptor_id]. This will make it significantly easier to combine the events from multiple run headers.
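The join those keyed dictionaries enable might look like this (a sketch; the field names follow the issue text):

```python
def join_events(header_dict, event_descriptor_dict, events):
    """Attach each event's header and descriptor via _id lookup.

    Works across events drawn from multiple run headers, since the
    lookup dicts are keyed on _id rather than per-header position.
    """
    joined = []
    for ev in events:
        joined.append({
            'event': ev,
            'header': header_dict[ev['header_id']],
            'descriptor': event_descriptor_dict[ev['descriptor_id']],
        })
    return joined
```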

ObjectID in the run_header

VisTrails is whining about not knowing what an ObjectID is. Can you remove that from the run_header or will that blow a whole bunch of stuff up?
