nsls-ii / metadatastore
DEPRECATED: Incorporated into https://github.com/NSLS-II/databroker
License: Other
This project has been incorporated into the databroker repository.
Add a list of the keys for the event data dictionary to the event_descriptor
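A hedged sketch of what that could look like on the descriptor document; every field name here other than data_keys is illustrative:

event_descriptor = {
    'run_start_id': 'abc123',  # illustrative
    # The keys that every event's data dictionary must carry:
    'data_keys': ['some_motor_1', 'some_detector_1'],
}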
To fully exploit all of the DynamicDocuments we have, we need a way to shove arbitrary extra data in from the command api.
See #84 for a template.
We do not want to support this long term and removing it later will require data migration. This needs to be done as soon as possible before we have lots of data to migrate.
Bulk header creation does not apply the default values for owner, custom, etc.
insert_run_start does not accept or document the project or group fields. find_run_start does not document group as an accepted argument.
Currently on the master branch. Ran sample_collection_code.py and got the following error:
Traceback (most recent call last):
  File "/home/edill/dev/python/metadataStore/example/sample_collection_code.py", line 71, in <module>
    create_event(event)
  File "/home/edill/dev/python/metadataStore/metadataStore/collectionapi/commands.py", line 106, in create_event
    owner=owner, data=data)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 271, in insert_event
    __validate_keys(formatted_data_keys, desc_data_keys)
  File "/home/edill/dev/python/metadataStore/metadataStore/dataapi/commands.py", line 293, in __validate_keys
    raise ValueError('Data keys for event data and descriptor data do not match! Check ' + str(key))
ValueError: Data keys for event data and descriptor data do not match! Check some_motor_1
Process finished with exit code 1
Generated documentation (i.e., Sphinx) should be on a gh-pages branch and not under version control.
xf23id1-ioc2:/POOL/demo_data/CHX/C4b_T/250K$
The insert_run_start function would suggest that time and beamline_id are the only required RunStart fields:
def insert_run_start(time, beamline_id, beamline_config=None, owner=None,
                     scan_id=None, custom=None, uid=None):
However, the fields in RunStart are this:
beamline_config, required
owner, required
scan_id, required
time, required
uid, required
beamline_id, optional
group, optional
project, optional
sample, optional
However, these do not feel like terribly good ideas, as it should not be the job of mds to guess what the user wanted to do. That's what ophyd is for!
In the meantime, scan_id and beamline_config should not be kwargs; in the interest of @tacaswell's sanity, though, this change should wait until after tomorrow's deployment. A corrected signature is sketched below.
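A hedged sketch of the signature implied by the field list above, with the required fields made positional; this is a proposal, not the current mds code:

def insert_run_start(time, beamline_config, owner, scan_id, uid,
                     beamline_id=None, group=None, project=None,
                     sample=None, custom=None):
    '''Insert a RunStart document; required fields are positional.'''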
Use Sphinx to create documentation for the code
In [2]: from metadataStore.userapi.commands import search
---------------------------------------------------------------------------
NoSectionError Traceback (most recent call last)
<ipython-input-2-b282184031ce> in <module>()
----> 1 from metadataStore.userapi.commands import search
c:\dev\my_src\python\metadatastore\metadataStore\userapi\commands.py in <module>()
8 from pymongo.errors import OperationFailure
9
---> 10 from metadataStore.sessionManager.databaseInit import metadataLogger
11
12 from metadataStore.dataapi.raw_commands import save_header, save_beamline_config, insert_event, insert_event_descriptor, find
c:\dev\my_src\python\metadatastore\metadataStore\sessionManager\databaseInit.py in <module>()
4 from pymongo.errors import ConnectionFailure
5
----> 6 from metadataStore.config.parseConfig import database, host, port
7 from metadataStore.sessionManager.databaseLogger import DbLogger
8
c:\dev\my_src\python\metadatastore\metadataStore\config\parseConfig.py in <module>()
4
5
----> 6 database = conf_dict.get('metadataStore', 'database')
7 host = conf_dict.get('metadataStore', 'host')
8 port = conf_dict.get('metadataStore', 'port')
C:\Users\edill\AppData\Local\Continuum\Anaconda\lib\ConfigParser.pyc in get(self, section, option, raw, vars)
605 except KeyError:
606 if section != DEFAULTSECT:
--> 607 raise NoSectionError(section)
608 # Update with the entry specific variables
609 vardict = {}
NoSectionError: No section: 'metadataStore'
NoSection exceptions are really annoying
Maybe do this in the broker, @arkilic suggests.
calling search() with no parameters should not return any results
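A minimal guard sketch, assuming search takes open-ended kwargs; _do_search is a hypothetical stand-in for the existing query machinery:

def search(**kwargs):
    if not kwargs:
        # Refuse an unconstrained query instead of returning everything.
        raise ValueError('search() requires at least one query parameter')
    return _do_search(kwargs)  # hypothetical existing query machinery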
event_type_descriptor_id is not retrieved during bulk inserts. This should be fixed inside metadataStore.dataapi.raw_commands.insert_bulk_event()
event_type_descriptor_id and header_id must be required fields for events so that problems of this nature cannot recur in future development; a sketch follows.
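A hedged sketch of enforcing that at the document layer, assuming the mongoengine-style documents mds uses elsewhere (the exact class and field types here are assumptions):

import mongoengine

class Event(mongoengine.DynamicDocument):
    # required=True makes mongoengine reject saves that omit these ids.
    event_type_descriptor_id = mongoengine.ObjectIdField(required=True)
    header_id = mongoengine.ObjectIdField(required=True)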
setup.py looks for README.md, can't find it and fails.
python setup.py install
0.0.2
Traceback (most recent call last):
  File "setup.py", line 80, in <module>
    long_description=read('README.md'),
  File "setup.py", line 34, in read
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
IOError: [Errno 2] No such file or directory: 'README.md'
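A hedged sketch of a more forgiving read() helper for setup.py, keeping the same fname-relative-to-setup.py behavior seen in the traceback; falling back to an empty string is an assumption about acceptable behavior:

import os

def read(fname):
    # Return the file's contents, or '' if it is missing, so a missing
    # README.md no longer aborts installation.
    path = os.path.join(os.path.dirname(__file__), fname)
    try:
        with open(path) as f:
            return f.read()
    except IOError:
        return ''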
When there are huge numbers of Events, wrap them lazily in a generator rather than materializing a list:
(Document(e) for e in Cursor)
As per our discussion today, @arkilic.
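A minimal generator sketch of that idea; wrapper stands in for the Document class above:

def lazy_documents(cursor, wrapper):
    # Yield wrapped events one at a time; nothing is held in memory
    # beyond the event currently being consumed.
    for raw in cursor:
        yield wrapper(raw)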
As this code is currently being used to store actual commissioning data on 23-ID, it needs a comprehensive testing framework to ensure that bugs aren't introduced by updates to the source code and changes to APIs.
Make it more user-friendly
If there is any external documentation, we could link to it. We need a really simple how-to guide for VisTrails-metadataStore integration in order to guide users to use VisTrails for their analysis.
ssh xf23id-broker
mongo
use metaDataStore
db.header.drop()
db.event_descriptor.drop()
db.beamline_config.drop()
db.event.drop()
The events field is not showing up in the dictionaries returned by metadataStore.userapi.commands.search()
search() finds the right header but cannot parse the correct event_descriptor and events. Also, make sure this does not originate from the bulk event insert.
calling userapi.commands.search(data=True) or userapi.commands.search(data=False) returns all entries in the mongo db
I realize that this issue is really more related to the channel archiver, but I'm leaving it here as a placeholder for now.
I can see both of these as valid options for the api into the channel archiver.
Option 1:
def get_pv(time_list, pv_name, beamline_id, interpolation_type):
    '''
    Parameters
    ----------
    time_list : list
        List of times to obtain the values of the pv for
    pv_name : string
        pv name that the channel archiver knows about
    beamline_id : string
        Not sure if this is needed, since the channel archivers are
        specific to the beamline (I think?)
    interpolation_type : string
        Type of interpolation to use; easy examples are linear and step function

    Returns
    -------
    list
        List of pv values interpolated based on "interpolation_type" at
        times occurring at the values in "time_list"
    '''
Option 2:
def get_pv(t_start, t_finish, pv_name, beamline_id):
    '''
    Parameters
    ----------
    t_start : datetime.datetime
        Earliest time to obtain the values of pv_name
    t_finish : datetime.datetime
        Latest time to obtain the values of pv_name
    pv_name : string
        pv name that the channel archiver knows about
    beamline_id : string
        Not sure if this is needed, since the channel archivers are
        specific to the beamline (I think?)

    Returns
    -------
    pv_vals : list
        List of all pv values
    time : list of datetime objects
        Time stamps of pv values
    '''
Thoughts @arkilic @tacaswell ?
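One observation: Option 1 could be layered on top of Option 2 client-side. A hedged sketch, assuming numpy for the linear case (the function name and the layering are illustrative, not a proposed final API):

import time
import numpy as np

def get_pv_interpolated(time_list, pv_name, beamline_id, t_start, t_finish):
    # Fetch the raw series via the Option 2 signature above.
    pv_vals, times = get_pv(t_start, t_finish, pv_name, beamline_id)
    # Convert datetimes to epoch seconds so numpy can interpolate.
    to_epoch = lambda dt: time.mktime(dt.timetuple())
    sample_t = [to_epoch(dt) for dt in time_list]
    source_t = [to_epoch(dt) for dt in times]
    # Linear interpolation only; other interpolation_type values would
    # need scipy.interpolate or a step-function variant.
    return np.interp(sample_t, source_t, pv_vals).tolist()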
The event_descriptor_# should be nested inside an event_descriptor key so that it is always clear where you should go (programmatically) to retrieve the event_descriptors; see the sketch below.
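A hedged sketch of the proposed nesting (the surrounding header fields are assumptions):

header = {
    'event_descriptor': {
        'event_descriptor_0': {},  # descriptor documents live here
        'event_descriptor_1': {},
    },
    # ...other header fields unchanged...
}
# Consumers can then always iterate header['event_descriptor'].values().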
We really should get rid of the camelcase in this repo. The specific places I'm thinking of are:
http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
I have some experience with setting this up; mpl is compliant.
Convert time from datetime to Unix time
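A minimal sketch of the conversion, assuming naive datetimes in local time (calendar.timegm on dt.utctimetuple() would be the UTC variant):

import time
import datetime

def to_unix_time(dt):
    # Seconds since the epoch, preserving sub-second precision.
    return time.mktime(dt.timetuple()) + dt.microsecond / 1e6

print(to_unix_time(datetime.datetime.now()))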
Add a run header version field to the run header so that we can keep track of the expected fields coming out of the data broker.
Mostly relevant to collection time, e.g., find_event_descriptors(run_start_uid=...). Is this something that can be done, @arkilic? As discussed on the train, I agree we should prefer our uids over mongo's ids wherever possible, but the commands don't currently support that.
I had a conversation with @dchabot many weeks ago where he made the (correct) point that the creation of event_descriptors should be delayed as long as possible, which means that they should be created just before the insertion of the first event. The recent changes to mds, especially here, seem to be moving in the opposite direction, because now event_descriptors need to be created before the run_header gets created. Given that I seem to have missed a great deal in the last week, @dchabot, are you now thinking that we don't need to defer the creation of event descriptors as late as possible?
More information to come later
Each event needs to have a time associated with it
Random thoughts that occurred to us while refactoring MDS to meet the jsonspec:
Should BeamlineConfig be an EmbeddedDocument? Lookup would be faster (fewer queries) at a small but significant storage cost; since this scales with runs, not events or descriptors, the cost would be moderate.
We are also unsure about the names BeginRunEvent and EndRunEvent and might prefer RunHead and RunTail.
From a discussion with @arkilic, @tacaswell, @ericdill, and myself, here is how we want to make the MDS API consistent:
All find_* functions have a signature like find_run_start(**kwargs). The keyword arguments can be:
related documents, e.g. find_events(event_descriptor=ev_desc)
plain document fields, e.g. find_run_start(owner='dallan')
raw mongo queries via __raw__
This means that fetch_events is removed and merged with find_events, and that all find_* functions need to accept kwargs; a sketch follows.
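A hedged sketch (not the actual mds implementation) of how such a find_* function could fold documents, plain fields, and __raw__ queries into one pymongo query; the database and collection names are assumptions:

import pymongo

def find_run_start(**kwargs):
    # Raw mongo queries pass straight through.
    query = kwargs.pop('__raw__', {})
    for key, value in kwargs.items():
        # Documents (e.g. event_descriptor=ev_desc) match by their id;
        # plain fields (e.g. owner='dallan') match directly.
        query[key] = getattr(value, 'id', value)
    collection = pymongo.MongoClient()['metadatastore']['run_start']
    return collection.find(query)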
Meanwhile, in databroker, DataBroker.find_headers has some pre-defined kwargs and an open-ended **kwargs accepting arbitrary mongo queries. The defined kwargs are like
find_headers(start_time=None, end_time=None, owner=None, scan_id=None, ..., **kwargs)
These kwargs fall into two categories: plain document fields, and special searches (e.g., measuring='motor1', which looks into data_keys).
Having the ability to search on the hashed header id (_id) makes my life significantly easier when integrating into VisTrails, since I need to guarantee that I've returned a unique header.
Whatever happened with the reverting merges and re-reverting merges needs to be cleaned up in the history. I can't even follow what happened. Let's meet later in the week or next week, as python training is happening this week.
This is so that we can control the server of mds, fs, and analysis store separately if we want to.
Make a search_2 function in userapi.commands that returns header_dict, event_dict, and event_descriptor_dict. I'm suggesting a search_2 function so that when people change their mind again about what the run header format should be, the work done for search is not lost...
Each of the dictionaries is keyed on _id so that you can access the header and event_descriptor for a given event by doing header_dict[event.header_id] and event_descriptor_dict[event.descriptor_id]. This will make it significantly easier to combine the events from multiple run headers; a sketch follows.
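A hedged sketch of the proposed return shape; _run_query is a hypothetical helper standing in for the existing search machinery:

def search_2(**query):
    headers, descriptors, events = _run_query(query)  # hypothetical helper
    header_dict = dict((h['_id'], h) for h in headers)
    event_dict = dict((e['_id'], e) for e in events)
    event_descriptor_dict = dict((d['_id'], d) for d in descriptors)
    return header_dict, event_dict, event_descriptor_dict

Usage: constant-time lookup of the header and descriptor for any event:
header = header_dict[event.header_id]
descriptor = event_descriptor_dict[event.descriptor_id]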
The project field in the RunStart object needs to be exposed through the insert_run_start function.
VisTrails is whining about not knowing what an ObjectID is. Can you remove that from the run_header or will that blow a whole bunch of stuff up?