pennmem / cmlreaders

CML data reading made easier...

Home Page: https://pennmem.github.io/cmlreaders/html/index.html

Python 99.68% Batchfile 0.03% Shell 0.29%

cmlreaders's Issues

New release

I think it's about time for a new release (mainly for a new conda package for easier feedback from users). Things to do first:

Update README by removing the proposal and showing examples of what is actually implemented now
Ensure documentation is up to date and rebuild
Do some QA testing on the EEG reading (e.g., make sure that when loading by events we get the right clips)

Since we've added quite a bit of functionality, should we call this version 0.4?

Loading pairs/contacts fails for subjects with _1

For incremented montage numbers, the subject code appears as (e.g.) R1006P_1 in pairs.json. This results in a KeyError when reading this data.

Improve usability of specific readers

Generally, we want to use CMLReader to interact with specific readers indirectly. But there are some use cases where other readers are useful, such as the one for reading Ramulator event log files (this is helpful for me when analyzing output from tests). At present, it's a bit awkward to use:

reader = RamulatorEventLogReader("experiment_log", "fake subject", "fake experiment", 0, file_path="event_log.json")
events = reader.as_dataframe()

All that I really need to pass in this case is the path to the file, but the constructor requires subject, experiment, and session. I propose we add a new class method which works like this:

events = RamulatorEventLogReader.fromfile("event_log.json")

This has the advantage of not making the constructor overly complicated (e.g., subject, experiment, and session being optional when file_path is specified but required otherwise).
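
A minimal sketch of what such a class method could look like. EventLogReader is a hypothetical stand-in rather than the real RamulatorEventLogReader, and the file layout (a top-level "events" list) is assumed only for illustration:

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class EventLogReader:
    """Hypothetical stand-in for a reader such as RamulatorEventLogReader."""

    def __init__(self, data_type: str, subject: Optional[str] = None,
                 experiment: Optional[str] = None,
                 session: Optional[int] = None,
                 file_path: Optional[str] = None):
        self.data_type = data_type
        self.subject = subject
        self.experiment = experiment
        self.session = session
        self.file_path = file_path

    @classmethod
    def fromfile(cls, path) -> list:
        """Load a file directly, without subject/experiment/session."""
        reader = cls(data_type="experiment_log", file_path=str(path))
        return reader.load()

    def load(self) -> list:
        # Assumed layout: {"events": [...]}; the real log may differ.
        with open(self.file_path) as infile:
            return json.load(infile)["events"]


# Usage: write a toy log to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "event_log.json"
    log_path.write_text(json.dumps({"events": [{"event_label": "START"}]}))
    events = EventLogReader.fromfile(log_path)
```

The constructor signature stays untouched; only the class method knows that the other arguments can be left at their defaults.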

Allow setting root directory with an environment variable

It's annoying to have to remember to always specify the rootdir keyword argument. We should instead allow a RHINO_ROOT environment variable to be defined which is used by default unless rootdir is specified to override (/ is still default if neither is done).
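
The lookup order described above could be sketched like this. resolve_rootdir is a hypothetical helper name, not something that exists in cmlreaders:

```python
import os
from typing import Optional


def resolve_rootdir(rootdir: Optional[str] = None) -> str:
    """Pick the root directory: explicit argument, then RHINO_ROOT, then "/"."""
    if rootdir is not None:
        return rootdir
    return os.environ.get("RHINO_ROOT", "/")
```

An explicit rootdir always wins, so existing calls that pass the keyword keep behaving the same way.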

Allow loading multiple sessions worth of EEG data when loading via events

CML Data Readers version: 0.7.1
Python version: 3.6
Operating System:

The "load_eeg" method of CMLReader should handle multi-session EEG loading better: if a user has passed events from multiple sessions, the reader object should not look for (or require) the "session" argument given at instantiation. All the information it needs to load the EEG is in the events variable itself (which I think is kind of the whole point).

Right now, you get an error if you try to run the following code:

import numpy as np
import pandas as pd
from cmlreaders import CMLReader, get_data_index
df = get_data_index("r1")
s = 'R1111M'
exp = 'FR1'

sessions = df[np.logical_and(df["subject"] == s, df['experiment']==exp)]['session'].unique()

#Load events from all sessions
events = pd.concat([
    CMLReader(s, exp, session).load("events")
    for session in sessions
])

#Just get word events
word_events = events[events.type=='WORD']

#Get EEG
reader = CMLReader(s, exp)
pairs = CMLReader(s, exp).load("pairs")
eeg = reader.load_eeg(events=word_events, rel_start=-100, rel_stop=1700, scheme=pairs)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-62-37e9cfb3b0f4> in <module>()
     18 reader = CMLReader(s, exp)
     19 pairs = CMLReader(s, exp).load("pairs")
---> 20 eeg = reader.load_eeg(events=word_events, rel_start=-100, rel_stop=1700, scheme=pairs)

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/cmlreader.py in load_eeg(self, events, rel_start, rel_stop, epochs, scheme)
    268             })
    269 
--> 270         return self.load('eeg', **kwargs)

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/cmlreader.py in load(self, data_type, file_path, **kwargs)
    198                    montage=self.montage,
    199                    file_path=file_path,
--> 200                    rootdir=self.rootdir).load(**kwargs)
    201 
    202     def load_eeg(self, events: Optional[pd.DataFrame] = None,

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/readers/eeg.py in load(self, **kwargs)
    286                             rootdir=self.rootdir)
    287 
--> 288         path = Path(finder.find('sources'))
    289         with path.open() as metafile:
    290             self.sources_info = json.load(metafile,

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/path_finder.py in find(self, data_type)
    120             raise InvalidDataTypeRequest("Unknown data type")
    121 
--> 122         expected_path = self._lookup_file(data_type)
    123 
    124         return expected_path

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/path_finder.py in _lookup_file(self, data_type)
    160                                                session=self.session,
    161                                                localization=self.localization,
--> 162                                                montage=self.montage)
    163         return expected_path
    164 

~/anaconda3/envs/CML/lib/python3.6/site-packages/cmlreaders-0.7.1-py3.6.egg/cmlreaders/path_finder.py in _find_single_path(self, paths, **kwargs)
    210         if len(found_files) == 0:
    211             raise FileNotFoundError("Unable to find the requested file in any "
--> 212                                     "of the expected locations:\n {}".format('\n'.join(checked_paths)))
    213 
    214         if len(found_files) > 1:

FileNotFoundError: Unable to find the requested file in any of the expected locations:
 /protocols/r1/subjects/R1111M/experiments/FR1/sessions/None/ephys/current_processed/sources.json

This happens because load_eeg sees that "session" is None, even though it does not need the user to supply that information when it already has a DataFrame of events.

If every session number in events matches the session number supplied when the object was instantiated, the reader works. But this almost defeats the purpose of being able to give the reader arbitrary lists of events.
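
One way to take the session from the events themselves is to group them by their "session" field and load each group separately. This is only a sketch: events are plain dicts here so the example runs without pandas, and load_session is a hypothetical callable standing in for whatever loads EEG clips for one session. With a DataFrame, the grouping step would be events.groupby("session").

```python
from collections import defaultdict
from typing import Callable, Dict, List


def load_eeg_by_session(events: List[dict],
                        load_session: Callable[[int, List[dict]], list]) -> list:
    """Group events by their own "session" field and load each group."""
    by_session: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        by_session[event["session"]].append(event)

    clips = []
    # Sessions are processed in ascending order, so the result is ordered
    # by session rather than by the original order of the events.
    for session in sorted(by_session):
        clips.extend(load_session(session, by_session[session]))
    return clips
```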

r1.json path should be included

I should be able to use the path finder to get the path to r1.json.

Fix dependency on ramutils

ramutils is required to run some tests, but the ramutils conda package is broken if we want to use different versions of other dependencies in cmlreaders (see pennmem/ram_utils#219).

Improve performance of EEG loading

When rereferencing/filtering, we currently load in all data first before dropping what we don't need. Instead, we should infer which data we need and only load that.

Add support for filtering by contacts

Right now, the EEG reader allows you to filter based on events, but another common use case is to only load eeg corresponding to particular channels. @mivade's proposed API was:

# get electrode contact info as a DataFrame
# this will have contact labels, locations, regions, coordinates, etc.
contacts = reader.load('contacts')

# or specify only contacts that are located in the MTL
# require_monopolar will raise an exception if monopolar is not possible
subset_eeg = reader.load('eeg',
                         contacts=contacts[contacts.region == 'MTL'],
                         require_monopolar=True)

Data Quality Checks

As part of pre-release testing, cmlreaders should be used to conduct some high level data quality checks:

Confirm that every EEG session can be loaded
Confirm that the number of recorded channels matches the number of channels in pairs.json (hardware bipolar mode) or contacts.json (monopolar recordings)

Ensure compatibility with Jupyter

Unless a user is paying careful attention, Jupyter, CMLReader, pandas, and MNE have dependency conflicts, causing Jupyter kernel errors. Some of these might not be obvious to a developer who is testing functionality in plain Python or IPython.

Usually, it seems Jupyter issues can be resolved by simply re-installing Jupyter ('conda install jupyter') after installing the other packages. But this isn't ideal, especially if we're interested in getting first-time users up and running with a simple conda install. The current setup virtually guarantees that new lab members will attempt to get a CML/PTSA/Jupyter Anaconda environment working, fail, and come to others looking for help.

I've found that for Python 3 environments, at the least:

Users will need to ensure pandas 0.22 or 0.23 is installed from conda-forge.
Users will need to re-install Jupyter and pytz after installing PTSA or MNE.

I suspect these issues will change or get swapped for other problems as any of the above packages are updated.

Add support for loading pyFR data

Support loading SuperEEG objects

SuperEEG

Cache pairs/contacts or improve loading speed

Loading pairs or contacts can sometimes take close to a second. Since this data is not going to change if loading from the same subject, we should figure out a way to cache the results.

Alternatively, we could think of ways to improve the parsing of the horribly nested JSON structure.
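
As a rough sketch of the caching option, functools.lru_cache keyed on the file path would avoid re-parsing the same file. load_montage_json is a hypothetical helper, not an existing cmlreaders function:

```python
import functools
import json
import tempfile
from pathlib import Path


@functools.lru_cache(maxsize=32)
def load_montage_json(path: str) -> dict:
    """Parse a pairs/contacts JSON file once per path."""
    with open(path) as infile:
        return json.load(infile)
```

Note that every caller gets the same dict object back, so it should be treated as read-only (or copied) to keep one caller's changes from leaking into another's results.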

Integrate electrode_categories with contacts/pairs info

It's great that users now have an easy way to get electrode_categories information from a cml reader object.

As this is critical information for many analysis pipelines, it would be ideal if electrode_categories information is integrated into returned contacts/pairs.json information. That way, a user could easily filter electrodes by SOZ/ictal/lesion before passing to the EEG reader. Currently, users must code a routine wherein electrode labels in electrode_categories dicts are matched to labels in a pairs/contacts dataframe. People are definitely going to make mistakes.

One implementation could be as follows:

If a user passes 'contacts' or 'pairs' (and maybe 'localization') into a reader object:

Load the electrode_categories data for that subject/montage, if it exists (if not, return NaNs/empty strings)
Load the contacts/pairs dataframe
Add an electrode_categories column to the contacts/pairs dataframe
Populate the column with the dictionary key (e.g. 'soz' or 'ictal') corresponding to the category assigned to each electrode. If electrode pairs (bipolar), populate with the key name so long as either of the underlying monopolars appear in electrode_categories.
If an electrode label doesn't have a category, populate with NaN/empty string.
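
The matching step above could be sketched as follows. The assumptions here are mine: bipolar labels are written as two contact labels joined by a hyphen (e.g. LAD1-LAD2), the categories are a dict mapping category name to a list of contact labels, and an electrode in several categories gets them comma-joined.

```python
from typing import Dict, List


def categories_for_label(label: str, categories: Dict[str, List[str]]) -> str:
    """Return the category name(s) for a contact or pair label.

    A pair matches a category if either of its contacts is listed in it.
    Labels without a category get an empty string.
    """
    contacts = label.split("-")
    matches = [name for name, labels in categories.items()
               if any(contact in labels for contact in contacts)]
    return ",".join(matches)
```

Applying this to the label column of a contacts or pairs DataFrame would produce the proposed electrode_categories column.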

I understand that this isn't super elegant, but so long as we're storing all of our critical data in these ridiculous ways, I think this is the only option.

Fix testing on Python 3.7

TravisCI doesn't support Python 3.7 yet: travis-ci/travis-ci#9069

Since we're using conda for CI testing, the script should be updated to not use the Python version in the first place, but rather an environment variable to choose which version to install via conda.

Better error handling when things are None

PathFinder has weird behavior because a lot of things default to None. For example:

>>> finder = PathFinder('R1111M')
>>> finder.find('task_events')

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-8a564b501890> in <module>()
----> 1 finder.find('task_events')

~/src/cmlreaders/cmlreaders/path_finder.py in find(self, data_type)
    111             raise InvalidDataTypeRequest("Unknown data type")
    112 
--> 113         expected_path = self._lookup_file(data_type)
    114 
    115         return expected_path

~/src/cmlreaders/cmlreaders/path_finder.py in _lookup_file(self, data_type)
    150                                                session=self.session,
    151                                                localization=self.localization,
--> 152                                                montage=self.montage)
    153         return expected_path
    154 

~/src/cmlreaders/cmlreaders/path_finder.py in _find_single_path(self, paths, **kwargs)
    200         if len(found_files) == 0:
    201             raise FileNotFoundError("Unable to find the requested file in any "
--> 202                                     "of the expected locations:\n {}".format('\n'.join(checked_paths)))
    203 
    204         if len(found_files) > 1:

FileNotFoundError: Unable to find the requested file in any of the expected locations:
 /protocols/r1/subjects/R1111M/experiments/None/sessions/None/behavioral/current_processed/task_events.json
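
One possible fix is to check for missing values before formatting the path, so the user sees which argument is missing instead of a path containing None. A sketch, where format_path and TEMPLATE are hypothetical names and the template is copied from the path in the error above:

```python
from string import Formatter
from typing import Optional

TEMPLATE = ("protocols/r1/subjects/{subject}/experiments/{experiment}/"
            "sessions/{session}/behavioral/current_processed/task_events.json")


def format_path(template: str, **fields: Optional[object]) -> str:
    """Fill in a path template, refusing to substitute None."""
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for trailing literal text.
    required = [name for _, name, _, _ in Formatter().parse(template) if name]
    missing = [name for name in required if fields.get(name) is None]
    if missing:
        raise ValueError("Missing required value(s): " + ", ".join(missing))
    return template.format(**fields)
```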

Defer FileNotFoundError until calling the load method

The BaseCMLReader immediately tries to find files in __init__:

cmlreaders/cmlreaders/base_reader.py

Lines 60 to 76 in 9423bd9

    
    def __init__(self, data_type: str, subject: Optional[str] = None,
                 experiment: Optional[str] = None,
                 session: Optional[int] = None,
                 localization: Optional[int] = 0, montage: Optional[int] = 0,
                 file_path: Optional[str] = None, rootdir: Optional[str] = "/"):

        self._file_path = file_path

        # When no file path is given, look it up using PathFinder unless we're
        # loading EEG data. EEG data is treated differently because of the way
        # it is stored on rhino: sometimes it is split into one file per channel
        # and other times it is a single HDF5 or EDF/BDF file.
        if file_path is None and data_type != 'eeg':
            finder = PathFinder(subject=subject, experiment=experiment,
                                session=session, localization=localization,
                                montage=montage, rootdir=rootdir)
            self._file_path = finder.find(data_type)

This leads to some awkward exception handling logic if you want to optionally load something because you have to put the try...except around the creation of a reader object. It would be far more natural to do this around reader.load.

An example of what I mean follows. What you have to do now is:

try:
    category_reader = ElectrodeCategoriesReader(
        data_type="electrode_categories",
        subject=self.subject,
        experiment=self.experiment,
        session=self.session,
        localization=self.localization,
        montage=self.montage,
        rootdir=self.rootdir,
    )
except FileNotFoundError:
    print("oops")

categories = category_reader.load()

Ideally, we would instead do:

category_reader = ElectrodeCategoriesReader(
    data_type="electrode_categories",
    subject=self.subject,
    experiment=self.experiment,
    session=self.session,
    localization=self.localization,
    montage=self.montage,
    rootdir=self.rootdir,
)

try:
    categories = category_reader.load()
except FileNotFoundError:
    print("oops")
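
A sketch of how the lookup could be deferred: store the lookup as a callable and resolve it lazily through a property. LazyReader and find_path are hypothetical names; in the real code the callable would wrap the PathFinder lookup.

```python
from pathlib import Path
from typing import Callable, Optional


class LazyReader:
    """Resolve the file path on first use instead of in __init__."""

    def __init__(self, find_path: Callable[[], str],
                 file_path: Optional[str] = None):
        self._find_path = find_path
        self._file_path = file_path

    @property
    def file_path(self) -> str:
        # The lookup (and any FileNotFoundError) happens here, not in __init__.
        if self._file_path is None:
            self._file_path = self._find_path()
        return self._file_path

    def load(self) -> str:
        return Path(self.file_path).read_text()
```

With this shape, creating the reader never raises, and the try...except goes around load() as in the second example above.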

Rereferencing EEG data fails for some subjects

Example with R1264P:

IndexError                                Traceback (most recent call last)
<ipython-input-12-724e0abfb97a> in <module>()
----> 1 get_resting_connectivity(subject, rootdir)

<ipython-input-11-2a66889ef259> in get_resting_connectivity(subject, rootdir)
     11         events = connectivity.get_countdown_events(reader)
     12         resting = connectivity.countdown_to_resting(events, rate)
---> 13         eeg = connectivity.read_eeg_data(reader, resting, reref=True)
     14         eeg_data.append(eeg)
     15 

~/src/thetamod/thetamod/connectivity.py in read_eeg_data(reader, events, reref)
    112 
    113     eeg = reader.load_eeg(events=events, rel_start=0, rel_stop=1000,
--> 114                           scheme=scheme)
    115 
    116     return eeg

~/src/cmlreaders/cmlreaders/cmlreader.py in load_eeg(self, events, rel_start, rel_stop, epochs, contacts, scheme)
    162             })
    163 
--> 164         return self.load('eeg', **kwargs)

~/src/cmlreaders/cmlreaders/cmlreader.py in load(self, data_type, file_path, **kwargs)
     92                                        montage=self.montage,
     93                                        file_path=file_path,
---> 94                                        rootdir=self.rootdir).load(**kwargs)
     95 
     96     def load_eeg(self, events: Optional[pd.DataFrame] = None,

~/src/cmlreaders/cmlreaders/readers/eeg.py in load(self, **kwargs)
    272             kwargs['epochs'] = epochs
    273 
--> 274         return self.as_timeseries(**kwargs)
    275 
    276     def as_dataframe(self):

~/src/cmlreaders/cmlreaders/readers/eeg.py in as_timeseries(self, epochs, contacts, scheme)
    346             if not reader.rereferencing_possible:
    347                 raise RereferencingNotPossibleError
--> 348             data = self.rereference(data, scheme)
    349 
    350         # TODO: channels, tstart

~/src/cmlreaders/cmlreaders/readers/eeg.py in rereference(self, data, scheme)
    377         c1, c2 = scheme.contact_1 - 1, scheme.contact_2 - 1
    378         reref = np.array(
--> 379             [data[i, c1, :] - data[i, c2, :] for i in range(data.shape[0])]
    380         )
    381         return reref

~/src/cmlreaders/cmlreaders/readers/eeg.py in <listcomp>(.0)
    377         c1, c2 = scheme.contact_1 - 1, scheme.contact_2 - 1
    378         reref = np.array(
--> 379             [data[i, c1, :] - data[i, c2, :] for i in range(data.shape[0])]
    380         )
    381         return reref

IndexError: index 78 is out of bounds for axis 1 with size 78

I've only checked a few cases, but it looks like this is happening with pre-System 3 subjects. I have not encountered this problem with any monopolar Ramulator subjects.

Sorting timestamp-named directories shouldn't rely on mtime

Sorting directories named similar to 20180301_180306 should rely on the directory name rather than the mtime since it's conceivable that the mtime could wildly mismatch the name.
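
A minimal sketch of name-based sorting, assuming the names follow a YYYYMMDD_HHMMSS pattern (my reading of the example above). sort_timestamped is a hypothetical helper:

```python
from datetime import datetime
from typing import Iterable, List


def sort_timestamped(names: Iterable[str],
                     fmt: str = "%Y%m%d_%H%M%S") -> List[str]:
    """Sort directory names by the timestamp encoded in the name."""
    return sorted(names, key=lambda name: datetime.strptime(name, fmt))
```

For zero-padded names a plain string sort gives the same order; parsing has the side benefit of raising a ValueError for names that don't match the expected pattern.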

Simplify reader setup

As things stand, we have to import any new readers in cmlreaders/readers/__init__.py. This should be reworked to either automatically discover readers to import or restructure things to make this unnecessary.
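
For the "restructure" option, one pattern is to let reader classes register themselves when they are defined, via __init_subclass__. This is a sketch with hypothetical class and attribute names, not the current cmlreaders layout:

```python
class BaseReader:
    """Base class that records which subclass handles which data type."""

    registry = {}
    data_types = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition.
        for data_type in cls.data_types:
            BaseReader.registry[data_type] = cls


class MontageReader(BaseReader):
    data_types = ["pairs", "contacts"]
```

This removes the need for a hand-maintained mapping, but the module defining each reader still has to be imported somewhere for the registration to run.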

Add test on TravisCI for loading EEG for entire session

There was a bug that prevented using load_eeg without any kwargs (i.e., to load an entire session). This has been fixed in an upcoming PR, but we should add a good test for this. Do we have any known very short sessions to make this test quick?

Remove testing requiring ramutils from TravisCI test suite

ramutils does a lot of version pinning which can tend to cause a lot of issues. We should just run those tests locally or on rhino or something and not run them on TravisCI.

Add reader for session.json

Unity-based experiments produce a session.json file instead of experiment.log and session.log. This file should be supported by CMLReaders. It is formatted as one valid json string per line, so the parsing should be straightforward.
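
Since each line is its own JSON document, the core of the parser could be as small as this sketch (parse_json_lines is a hypothetical name; skipping blank lines is my assumption):

```python
import json
from typing import Iterable, List


def parse_json_lines(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON document per line, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]
```

An open file object can be passed directly, since iterating over it yields lines.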

Loading events fails for PS2

Example:

reader = CMLReader("R1111M", "PS2", 0)
events = reader.load("events")

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-37-2a71e3ba88c9> in <module>()
----> 1 events = reader.load("events")
      2 pairs = reader.load("pairs")

~/src/cmlreaders/cmlreaders/cmlreader.py in load(self, data_type, file_path, **kwargs)
    194                    montage=self.montage,
    195                    file_path=file_path,
--> 196                    rootdir=self.rootdir).load(**kwargs)
    197 
    198     def load_eeg(self, events: Optional[pd.DataFrame] = None,

~/src/cmlreaders/cmlreaders/base_reader.py in __init__(self, data_type, subject, experiment, session, localization, montage, file_path, rootdir)
     79                                 session=session, localization=localization,
     80                                 montage=montage, rootdir=rootdir)
---> 81             self._file_path = finder.find(data_type)
     82 
     83         self.subject = subject

~/src/cmlreaders/cmlreaders/path_finder.py in find(self, data_type)
    120             raise InvalidDataTypeRequest("Unknown data type")
    121 
--> 122         expected_path = self._lookup_file(data_type)
    123 
    124         return expected_path

~/src/cmlreaders/cmlreaders/path_finder.py in _lookup_file(self, data_type)
    160                                                session=self.session,
    161                                                localization=self.localization,
--> 162                                                montage=self.montage)
    163         return expected_path
    164 

~/src/cmlreaders/cmlreaders/path_finder.py in _find_single_path(self, paths, **kwargs)
    210         if len(found_files) == 0:
    211             raise FileNotFoundError("Unable to find the requested file in any "
--> 212                                     "of the expected locations:\n {}".format('\n'.join(checked_paths)))
    213 
    214         if len(found_files) > 1:

FileNotFoundError: Unable to find the requested file in any of the expected locations:
 /Users/depalati/mnt/rhino/protocols/r1/subjects/R1111M/experiments/PS2/sessions/0/behavioral/current_processed/all_events.json

Temporary workaround: request task_events instead of events.

Montage/Localization handled incorrectly

cmlreaders incorrectly uses the localization number to look up special subject identifiers instead of the montage number. For example, R1006P had a montage change that was not the result of a re-implant. Therefore, the montage number was incremented, but the localization number was not. Attempting to request pairs information for this subject results in an error:

reader = CMLReader(subject="R1006P", localization=0, montage=1)
pairs_df = reader.load("pairs")  # raises a KeyError

Instead, the montage number should always be used when looking up data in /data10/...
When loading data from the /protocols "database", both the localization and montage numbers are needed.

CMLReader.load_eeg() fails on sessions with >1 EEG file per session

CML Data Readers version: 0.4.0
Python version: 3.6
Operating System: Mac OS

Description

R1384J, experiment FR1, session 1 is split into 2 HDF5 files due to a pause in the session;
CMLReader is unable to load the data for this session.

What I Did

reader = CMLReader(subject='R1384J',experiment='FR1',session=1)
events = reader.load('events')
eeg = reader.load_eeg(events,0,100)

Traceback (most recent call last):
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-cbd19709a38f>", line 1, in <module>
    eeg  = reader.load_eeg(events,0,100)
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/cmlreaders/cmlreader.py", line 164, in load_eeg
    return self.load('eeg', **kwargs)
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/cmlreaders/cmlreader.py", line 94, in load
    rootdir=self.rootdir).load(**kwargs)
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/cmlreaders/readers/eeg.py", line 274, in load
    return self.as_timeseries(**kwargs)
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/cmlreaders/readers/eeg.py", line 343, in as_timeseries
    data = reader.read()
  File "/Users/leond/anaconda2/envs/thetamod/lib/python3.6/site-packages/cmlreaders/readers/eeg.py", line 210, in read
    data = np.array([ts[epoch[0]:epoch[1], :].T for epoch in self.epochs])
ValueError: could not broadcast input array from shape (178,100) into shape (178)

Loading Split EEG with Missing Channels Fails Silently

For some subjects that have split EEGs, cmlreaders does not correctly handle non-sequential or missing channels. For example:

reader = CMLReader(subject="R1006P", experiment="FR2", session=1)
eeg = reader.load_eeg()

The returned timeseries includes CH0, CH1, CH99, and CH100. None of these channels exist in the noreref directory for this subject/experiment/session combination. It appears that cmlreaders is assuming that all channels exist consecutively. It is not clear why these channels are not in the noreref directory, but the channels that are present are consistent with the channels listed in the jacksheet. At least for this subject, the split channels are also consistent with what is in the pairs.json file for the montage that was used in this session.
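
Rather than assuming consecutive channels, the channel numbers could be taken from the files that actually exist. The sketch below assumes split files end in a numeric extension (e.g. .001, .002), which is my assumption about the naming; channel_numbers is a hypothetical helper.

```python
import re
from typing import Iterable, List


def channel_numbers(filenames: Iterable[str]) -> List[int]:
    """Return the sorted channel numbers found as numeric file extensions."""
    pattern = re.compile(r"\.(\d+)$")
    found = set()
    for name in filenames:
        match = pattern.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)
```

The result could then be cross-checked against the jacksheet, raising an error on any mismatch instead of failing silently.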

Begin testing on Python 3.7

Python 3.7 is now available. As soon as it is available with conda, testing on 3.7 should be enabled on TravisCI.

Improve functionality of ElectrodeCategoryReader

The electrode category file is highly irregular, but we do need to be able to consistently read it in a reasonable form

Tests shouldn't write data to package

Test data output is currently written to cmlreaders/test/data/output. This should be going to a temporary directory instead.
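
A sketch of the temporary-directory approach using only the standard library. OutputTest is a made-up test case; if the suite runs under pytest, its built-in tmp_path fixture serves the same purpose.

```python
import tempfile
import unittest
from pathlib import Path


class OutputTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh directory that is removed afterwards.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.output_dir = Path(self._tmp.name)

    def test_write(self):
        out = self.output_dir / "events.csv"
        out.write_text("a,b\n1,2\n")
        self.assertTrue(out.exists())
```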

Multi-session event reading

A common use case when working with events is to load multiple sessions at a time. Right now, this is not as easy as it should be.

all_events = []
for session in sessions_completed:
    sess_events = cml.CMLReader(subject="R1409D", experiment="FR6", session=session,
                                localization=0, montage=0, rootdir=rhino_root).load('task_events')
    all_events.append(sess_events)
all_sessions_df = pd.concat(all_events)

The proposed API is to have a special method associated with cml_reader to allow loading multiple sessions of events. The other option is to allow additional kwargs in .load(), but then it becomes extremely difficult to document that function since it would take different parameters depending on the data type being loaded. Instead, we want to mimic the behavior of load_eeg and have it be a separate method associated with the class. Single sessions of events can be loaded using either reader.load() after having specified a session when creating the reader, or by using reader.load_events(sessions=[1]). At a minimum, the following cases should be handled:

Given a single experiment, load all completed sessions
Given a single experiment, load a specific subset of sessions

reader = CMLReader(subject="R1409D",  experiment="FR6")

# Load all sessions
all_fr6_events = reader.load_events()

# Load specific sessions
subset_fr6_events = reader.load_events(sessions=[0, 1])

Given multiple experiments, load all completed sessions from each experiment
Given a reader with no experiment specified, raise an error if load_events is called

reader = CMLReader(subject="R1409D")

# Invalid Request
all_events = reader.load_events()

# Load sessions across experiments
all_record_only_events = reader.load_events(experiments=['catFR1', 'FR1'])

If it turns out to be an important enough use case, it could also handle the following cases:

Given multiple experiments and a specific session, load that session number of each experiment, i.e. the first session of FR1 and catFR1 for a particular subject
Given multiple experiments and a specific set of sessions, load those specific sessions for each experiment given, raising an error if any of the requested session/experiment combinations are not available

reader = CMLReader(subject="R1409D")

# Multi-experiment, single session
multi_exp_single_sess = reader.load_events(experiments=['catFR1', 'FR1'], sessions=[0])

# Multi-experiment, multi-session
multi_exp_multi_sess =  reader.load_events(experiments=['catFR1', 'FR1'], sessions=[0, 1])

rootdir should expand userdirs

Instead of

self.rootdir = rootdir

PathFinder should have

self.rootdir = os.path.expanduser(rootdir)

Support scalp data

We can use the subject ID in PathFinder to determine which directory in /protocols to use.

Jacksheet reader doesn't work

When loading a jacksheet, I get a dataframe back that only has one column: "channel_label". This combines both the jackbox number and the contact label. We should instead have two columns named "number" and "label".
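
A sketch of the parsing, assuming each jacksheet line is a number followed by whitespace and then the label. parse_jacksheet is a hypothetical helper; its rows map directly onto "number" and "label" columns.

```python
from typing import Iterable, List, Tuple


def parse_jacksheet(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Split each line into (number, label) on the first run of whitespace."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        number, label = line.split(None, 1)
        rows.append((int(number), label.strip()))
    return rows
```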

Poor error message when loading EEG and events are empty

Example: trying to filter events with a resulting DataFrame with no rows results in an IndexError when concatenating TimeSeries objects because none get added in the as_timeseries method. We should explicitly check for this case and raise a more helpful error message.
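
The explicit check could be as simple as the sketch below. require_events is a hypothetical name and the message wording is only a suggestion; len() works for both a list and a DataFrame (where it is the row count).

```python
from typing import Sized


def require_events(events: Sized) -> None:
    """Raise a clear error when there are no events to load EEG for."""
    if len(events) == 0:
        raise ValueError(
            "No events were given, so there is no EEG to load. "
            "Check that the event filter matches at least one row."
        )
```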

Conda package doesn't list all requirements

Trying to build a package for pennmem/artdet, I get import errors because cmlreaders requires pandas, but pandas is not listed as a requirement in conda.recipe/meta.yaml.

Magic importing of readers doesn't work if not in the right directory

pkgutil defies all logic in how it works, and apparently as written, the magic importing only works if you are in the same directory as the cmlreaders package...
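
The issue doesn't show how pkgutil is being called, so this is a guess at the cause: passing a relative path to pkgutil.iter_modules makes the result depend on the current directory. A sketch that anchors discovery on the package's own __path__ instead (import_submodules is a hypothetical helper):

```python
import importlib
import pkgutil
from types import ModuleType
from typing import List


def import_submodules(package: ModuleType) -> List[str]:
    """Import every public submodule of a package, wherever we are run from."""
    imported = []
    # package.__path__ holds the package's real location(s) on disk.
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue
        importlib.import_module(f"{package.__name__}.{info.name}")
        imported.append(info.name)
    return imported
```

Called from within the readers package's __init__.py, this would be given the package module itself, e.g. sys.modules[__name__].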

Rename TimeSeries class

The TimeSeries class serves as a simple container for EEG data which can then be exported to other formats for actual analysis. This leads to some confusion (especially since PTSA has a class with the same name). Instead, it should be named EEGData or something similar.

Automatically determine localization and montage numbers

Rather than defaulting to 0, CMLReader should read the data index (this can be cached to avoid re-reading several times) and determine localization and montage number from there when session is specified. In cases where either is nan, 0 can be assumed.
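
A sketch of the lookup, with the data index represented as a list of dicts so the example runs without pandas. The "localization" and "montage" column names are assumed, and montage_info is a hypothetical helper:

```python
import math
from typing import List, Tuple


def _as_int(value) -> int:
    """Treat None or NaN as 0, as suggested above."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0
    return int(value)


def montage_info(index: List[dict], subject: str, experiment: str,
                 session: int) -> Tuple[int, int]:
    """Return (localization, montage) for one session from the data index."""
    for row in index:
        key = (row["subject"], row["experiment"], row["session"])
        if key == (subject, experiment, session):
            return _as_int(row.get("localization")), _as_int(row.get("montage"))
    raise KeyError(f"No index entry for {subject} {experiment} session {session}")
```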

BaseReader.fromfile can fail for readers that support multiple datatypes

Example: MontageReader doesn't know if it's trying to read contacts or pairs.

Add pandas accessors to add common shortcuts

Pandas has a notion of accessors to add additional namespaced functionality to DataFrame and other pandas objects. We could use this by adding things like event accessors which can do some common queries, for example something like:

@pd.api.extensions.register_dataframe_accessor("events")
class EventsAccessor(object):
    ...
    @property
    def stim_events(self):
        """Filter events and return only stim events."""
        return self._obj[self._obj["type"] == "STIM_ON"]

def __init__(self, subject, rootdir='/', experiment=None, session=None,
             localization=None, montage=None):

With this signature, rootdir sits between subject and experiment. It would be better to be able to type

finder = PathFinder(subject, experiment, session)

which is a more natural ordering than having rootdir appear right after subject. I would make rootdir the last keyword argument.
