
exmemo's People

Contributors

kalekundert, sdhutchins

exmemo's Issues

Unfriendly error messages for missing files/directories

Workspaces and experiment directories are defined by meeting certain requirements: having a name that matches a certain pattern, having certain subdirectories, containing certain files, etc.

When these requirements are not met, the error messages are not helpful. In the case of workspaces, it just says "this is not a workspace". In the case of experiments, it just doesn't find the experiment in question (and may find none at all), which leads to some confusing errors.

I was able to trigger a bunch of this behavior by cloning a workspace on a remote server. The workspace wasn't detected initially because none of the empty directories were cloned, so I had to make those directories manually. Maybe the root of the workspace should be defined only by the presence of the .exmemorc file. Exmemo does probably assume that all those directories exist, but I could probably just check for them as needed, and either fail gracefully or create them on-the-fly.

The experiment I had made also wasn't detected because I'd committed some of the files in the experiment, but not the *.rst file. Experiments are detected by (i) being a directory, (ii) being prefixed with a date, and (iii) containing an *.rst file with a name matching the directory. So without the *.rst file, the experiment wasn't detected. In this case, I'm not so sure what the best course of action is:

  • Maybe I could improve the error message somehow, i.e. detect if a directory with the right name would match, but is just missing its *.rst file. That would at least tell the user what to do.

  • I could add another kind of dot-file to identify experiments (i.e. either one in each experiment, or one in the experiment root that serves as an index), but it would be just as easy to not commit that file too. It would also require the user to manage a magic file that they don't really understand, and it would clutter up their directories. I don't think this is a good idea.

  • I could relax the requirements and consider any date-prefixed directory to be an experiment. One problem I can see with this is that exmemo would like to know the name of the *.rst file for each experiment, so it can open it up when you ask for it. Right now this always works, because the file and the directory have to have the same name (excluding the date prefix). If I get rid of this requirement, I'd have to guess at the name of the *.rst file, although in most cases I could probably guess right.
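
As a rough illustration of the first option (a friendlier error message), the sketch below checks the three detection rules one at a time and reports which one failed. The function name `diagnose_experiment` and the `YYYYMMDD_` date-prefix pattern are assumptions made for this example, not exmemo's actual code.

```python
import re
from pathlib import Path

# Assumed date prefix: eight digits and an underscore, e.g. "20210916_".
DATE_PREFIX = re.compile(r"^(\d{8})_(.+)$")

def diagnose_experiment(path):
    """Return (is_experiment, message) for a candidate directory (hypothetical helper)."""
    path = Path(path)
    if not path.is_dir():
        return False, f"{path.name}: not a directory"
    match = DATE_PREFIX.match(path.name)
    if not match:
        return False, f"{path.name}: name has no date prefix"
    # The *.rst file must share the directory's name, excluding the date prefix.
    expected = path / f"{match.group(2)}.rst"
    if not expected.is_file():
        return False, (
            f"{path.name}: looks like an experiment, "
            f"but '{expected.name}' is missing"
        )
    return True, f"{path.name}: ok"
```

A caller that fails to find an experiment could run this over the sibling directories and print any "looks like an experiment" messages as hints.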

Some project folders are not created

Hey, I really dig this package! It's extremely well written.

I'm curious if this has been tested on Windows?

When I run exmemo note new "Large step with half twist", it just hangs. Also, when I create a new project, the documents, data, and protocols folders aren't created. I'm not sure if this is a bug or not.

@kalekundert

More sophisticated keyword searches

The current method of translating keywords to paths (based on substring matching) has some annoying corner cases:

  • Names with several variants may be difficult to specify unambiguously. For example, I have protocols named "pcr" and "pcr_cloning". Typing just "pcr" is always ambiguous, even though I know that I mean just "pcr". This situation comes up fairly often.

  • Sometimes I find myself running a command with an ambiguous keyword a bunch of times in a row, e.g. if I'm trying to get the arguments how I want them. This is annoying because unless I edit the command to make it unambiguous (which is annoying itself), I'm asked which path I mean each time. Some ways to address this:

    • Provide a way to specify which variant I mean on the command line.
    • Just pick a variant to use by default, and provide a command-line option to use if you want to be asked which variant you want. The default could be based on a number of tie-breakers, including most recently used, most commonly used, some configured priority, Levenshtein distance, etc.
  • Sometimes I think a file name contains a certain substring when it doesn't. See #3. It would be nice to provide custom keywords for certain paths, basically aliases.
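
A minimal sketch of two of the ideas above: aliases are checked first, then substring hits are ordered by Levenshtein distance so that an exact match such as "pcr" sorts ahead of "pcr_cloning". The names `rank_matches` and `aliases` are hypothetical, and the other tie-breakers (most recently used, configured priority) are not shown.

```python
def levenshtein(a, b):
    """Edit distance between two strings (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def rank_matches(keyword, names, aliases=None):
    """Return candidate names for a keyword, best guess first."""
    aliases = aliases or {}
    # A configured alias wins outright, even if it is not a substring match.
    if keyword in aliases:
        return [aliases[keyword]]
    hits = [n for n in names if keyword in n]
    # Closest name first; ties are broken alphabetically.
    return sorted(hits, key=lambda n: (levenshtein(keyword, n), n))
```

The first element of the result could serve as the default, with the full list shown only when the user asks to choose.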

Bibliography directive

I often create ad-hoc bibliographies using footnotes and :doi: roles. This isn't so bad, but it does have some annoyances:

  • The footnote syntax is weird and hard to remember.
  • The RTD theme doesn't line up the footnotes nicely.

A custom bibliography directive could address both of these issues. The challenge would be integrating with the footnote system, so the same [AuthorYear]_ syntax could continue to be used in the document.

Alternatively, I could maybe play with the CSS to render the footnotes more nicely.

Keywords for notebook entries

I often don't remember the exact name I used for an experiment, but I remember keywords like what kind of experiment it was, what mutant I was working with, etc. It might be nice if I could associate keywords with experiments using some sort of config file.

The Sphinx extensions are a mess

I've got to think more about how to organize all the Sphinx extensions associated with exmemo. I think a lot of them could be useful for other projects, and shouldn't require installing all of exmemo. Plus, Sphinx extensions normally live in the sphinxcontrib directory, which these don't. I also need documentation.

More robust symlinks

I had an idea for how to symlink data files into notebook directories, in such a way that (i) rearranging the notebook directories won't break the symlinks and (ii) no absolute paths are used (important for collaboration). The idea is to have a "root" symlink in the .expt metadata directory associated with each experiment. All other symlinks will go through that one (e.g. .expt/root/data/gels/20210916_xyz.svg). If the experiment is moved, the root symlink can be automatically updated.
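
A minimal sketch of this idea, assuming a POSIX filesystem. The helper names `update_root_link` and `link_data` are made up for illustration; only the `.expt/root` layout comes from the description above.

```python
import os
from pathlib import Path

def update_root_link(expt_dir, workspace_root):
    """(Re)create .expt/root as a relative symlink to the workspace root."""
    meta = Path(expt_dir) / ".expt"
    meta.mkdir(exist_ok=True)
    link = meta / "root"
    if link.is_symlink():
        link.unlink()
    # The target is relative to the .expt directory, so no absolute path is stored.
    link.symlink_to(os.path.relpath(workspace_root, meta), target_is_directory=True)
    return link

def link_data(expt_dir, data_relpath):
    """Symlink a data file into the experiment by going through .expt/root."""
    link = Path(expt_dir) / Path(data_relpath).name
    # e.g. .expt/root/data/gels/20210916_xyz.svg, relative to the experiment.
    link.symlink_to(Path(".expt") / "root" / data_relpath)
    return link
```

After an experiment directory is moved, calling `update_root_link` again repairs every data link at once, because each of them resolves through the single root link.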

Cache protocol/data/experiment files

A lot of exmemo commands perform recursive globs to find files matching a keyword. These globs take longer and longer (eventually becoming a bottleneck) as the project grows and more files need to be searched. Part of the problem is that querying the filesystem is fairly expensive. I think I could speed these searches up significantly if I maintained a cache of the names of the files in the project.

However, a cache wouldn't do me any good if I need to update it every time I run a command anyway. (I definitely can't assume that new files won't be created between commands, and it would be annoying to the user if it were possible for the cache to be stale.) So I think I'd have to run a daemon to make this work. Having to manage a daemon would also be annoying, but I think I could make it stay behind the scenes pretty well:

  • When needing to convert a keyword to a path, check to see if a daemon is running.
  • If not, start one. Use locks to prevent race conditions.
  • Ask the daemon for the path. If the daemon just started, it'll have to check the filesystem first. So the answer will be right, but you won't get the performance boost. If the daemon has been running, it can monitor the filesystem via inotify and return an up-to-date answer immediately.
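
The sketch below covers only the caching half of this plan: an in-memory index that walks the filesystem once and then answers keyword lookups without touching the disk. The class name `NameIndex` is hypothetical, and the daemon, the locking, and the inotify monitoring are deliberately left out. In a real daemon, filesystem events would trigger `refresh` (or a finer-grained update).

```python
import os

class NameIndex:
    """In-memory index of the file paths under a project root (illustrative only)."""

    def __init__(self, root):
        self.root = root
        self.paths = []
        self.refresh()

    def refresh(self):
        """Walk the filesystem once and remember every relative file path."""
        self.paths = sorted(
            os.path.relpath(os.path.join(dirpath, name), self.root)
            for dirpath, _dirs, files in os.walk(self.root)
            for name in files
        )

    def lookup(self, keyword):
        """Return indexed paths whose file name contains the keyword."""
        return [p for p in self.paths if keyword in os.path.basename(p)]
```

Note that `lookup` can return stale results between calls to `refresh`, which is exactly the problem the filesystem monitoring is meant to solve.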

Better way to reference experiments

The RST role for referring to experiments is way too verbose, but it's not obvious how I could make it simpler.

Also, it's too hard to create new experiments, and there's no support for nested experiments.

Add retroactive admonitions

Sometimes, when I find mistakes in my protocols, I realize that I've been making the same mistake in all the experiments I've done up to that point. I wonder if I can think of a good way to annotate that...

Better ways to find experiments

I want a more flexible API for looking up experiments:

  • Search whole notebook? Just the current directory? Recursively?
  • Search by ID, substring, or path?
  • Expect one hit or many? What to do if too many/few hits found?

Despite wanting all this flexibility, I also need to think about how to provide a consistent experience to end-users.

Rename 'sequences' to 'reagents'

I'm actually not sure if this directory is even part of exmemo, but in any case 'reagents' better reflects what this directory is actually for.

Add command to automatically nest a notebook entry

I often set out to do an experiment to answer question X, but some complications come up, and I have to do some more related experiments to get to the bottom of things. Considering that I now want to organize experiments hierarchically, a natural way to handle this case would be to:

  • Make a new notebook directory for the broader question
  • Move the original directory into a subdirectory, named after the initial approach I tried.

It'd be nice to have a command to manage this automatically. Some possible names:

  • breakout
  • drill
  • expand
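
The two steps above could be sketched as follows. The function name `nest_entry` and its arguments are assumptions; a real command would presumably also need to update ids, *.rst file names, and any references to the moved experiment, none of which is attempted here.

```python
import shutil
from pathlib import Path

def nest_entry(entry, parent_name, child_name):
    """Wrap an existing notebook directory in a new parent directory.

    entry:       path to the original experiment directory
    parent_name: directory name for the broader question
    child_name:  new name for the original directory, inside the parent
    """
    entry = Path(entry)
    parent = entry.parent / parent_name
    if parent.exists():
        # Refuse to merge into an existing directory.
        raise FileExistsError(parent)
    parent.mkdir()
    child = parent / child_name
    shutil.move(str(entry), str(child))
    return child
```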

Create interactive web server

Currently, the lab notebook is hosted as a static site; it's basically just a graphical way to view notebook entries. However, it would be nice if the web interface were more interactive and more customized to the lab notebook application. Some specific features that would be nice:

  • Reorganize experiments
    • Split into parent/child
    • Add tags/categories
  • Browse experiments by tag/category
    • I like the idea of tags + arguments to create tables.
  • Improved search:
    • Lab notebooking really benefits from powerful search features.
    • The search functionality in Sphinx is not very good.
    • One straight-forward way to improve the search would be to generate the index in advance. This would make the search faster, and allow me to be more sophisticated about stemming etc.
  • Keep track of planned/current/complete experiments (Kanban style)
    • Inspired by Kanobo.
    • A simple way to think about kanban:
      • Visualize work
      • Limit work-in-progress
    • Would probably also want to be able to add experiments via the web server.
    • Manage dependencies between experiments, e.g. expt #2 can be done once I get expt #1 to work.
      • This could allow me to automatically move experiments from "planned" to "current" once the precursor experiment is marked as "complete".
  • Better experiment navigation:
    • With sphinx, there's a practical limit on how deep I can nest experiments before the navigation gets unmanageable.
    • This is a limit I'd really like to remove.
  • Distinguish between goals and experiments:
    • I haven't thought about this much, but I've gotten the sense that I have two different kinds of notebook entries.
    • "Experiments" detail the results of a single protocol:
      • Pretty formulaic:
        • Intro: why I'm doing the experiment
        • Protocol: what exactly I did (thank you stepwise!)
        • Figure: the data I collected
        • Observations: my interpretation of the data
      • Doesn't matter if the experiment "worked" or not, I just report what happened.
    • "Goals":
      • A big picture thing that I want to accomplish, like "Get cDNA display working".
      • Often a planning area for new experiments/directions.
      • Tend to have "Considerations" sections, and TOCs of other experiments.
      • Have to keep adding subgoals/subexperiments until I have success (or give up).
      • Sometimes start as normal experiments, but turn into goals when the experiment doesn't work and I need to figure out why.
    • Is there a benefit to distinguishing these types of notebook entries?
      • Only goals can really be nested.
      • Goals are much longer term than experiments.
      • Not clear.
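
To make the dependency idea from the Kanban item concrete, here is a small sketch of the automatic "planned" to "current" promotion. The dictionary-based data model and the name `promote_ready` are assumptions for illustration; nothing like this exists in exmemo yet.

```python
def promote_ready(status, depends_on):
    """Move "planned" experiments to "current" once all precursors are "complete".

    status:     maps experiment id -> "planned", "current", or "complete"
    depends_on: maps experiment id -> list of precursor experiment ids
    """
    updated = dict(status)
    for expt, state in status.items():
        precursors = depends_on.get(expt, [])
        # Experiments without precursors are left for the user to move by hand.
        if state == "planned" and precursors and all(
            status.get(p) == "complete" for p in precursors
        ):
            updated[expt] = "current"
    return updated
```

Promotion is decided from the status as it was when the function was called, so a chain of dependent experiments advances one step per completed precursor rather than all at once.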

The underlying directory structure would remain the single source of truth, and "edit" actions by the web server would directly operate on the file system. Some more thoughts about the implementation:

  • My first instinct would be to use Django, although the fact that the server wouldn't have a database might make Django a poor fit. Perhaps a simpler framework like Flask would be better.
  • I'd be able to develop this without interrupting my work at all, because this web server would just be a new interface to the same data I already have. The existing system should continue to work unchanged.
  • Would I continue to use Sphinx/restructured text?
    • Compatible with what I already have.
    • I need support for cross-linking and footnotes, which Sphinx has.
    • Sphinx is extensible, even though plugins are not easy to write.
    • I might be able to run sphinx like normal, and have Flask serve the resulting _build directory.
      • Flask would have to translate Sphinx's links into its own. Not clear how hard/fragile this would be.
      • Since Sphinx would be the tool generating the final HTML for each experiment page, Flask would have to supply Sphinx with a custom theme to provide sidebars, etc.
    • Alternatively, I might be able to invoke Sphinx within the web server to generate the necessary HTML.
      • In pseudocode:
        • Parse the *.rst into nodes.
        • Convert the nodes into html, with hooks to handle cross-references.
        • Embed the resulting html in a Mako/Jinja template with sidebars etc. and return to client.
      • I've never done anything like this before, so I don't know how hard it would be, but at the very least it would involve really digging into the guts of sphinx.
      • Hooking into the cross-referencing directives/roles (e.g. .. toctree::, :expt:) might be hard.
  • The routing would be pretty simple:
    • /: Dashboard-style views of current/planned/complete experiments.
    • /expt/<id>: Show the requested experiment.
      • Each experiment would have links to:
        • Child experiments, e.g. a TOC.
        • Parent experiments, perhaps as breadcrumbs.
        • Back-references, e.g. experiments that reference this one.
    • /tag/<name>: Show all the experiments associated with the given tag.
      • If the tag has arguments, this would be rendered as a table.
    • /search/<query>: Show a list of all the experiments matching the given query (with some context).
    • Every page would have a side bar with:
      • A search form.
      • Links to the top-level experiments.
      • Links to the current experiments.
  • I'd probably want to add a .meta directory to each experiment to hold information about ids and tags. Right now the only metadata is the .id file in each experiment. Alternatively, I could keep the metadata in a single structured file (e.g. nested text) that contains ids and tags. A directory is probably more future-proof, though.
    • Decided to use a directory, with as much information as possible stored in different files. This should reduce the chance of merge conflicts.
  • Frontend:
    • This is definitely something that could bog me down.
    • I could look for templates, but I'm unlikely to find anything that fits well.
    • There should be very little need for javascript, though. I should be able to get pretty far with CSS and semantic HTML, which are easier for me.
    • Use Furo theme for inspiration? I like that it has left and right side bars.
  • Might be nice to monitor the filesystem, and automatically rebuild the Sphinx/search index in response to changes.
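
As one possible shape for the .meta directory mentioned above, the sketch below stores each piece of metadata in its own file and looks up experiments by tag, which is what a /tag/<name> view would need. The file names `id` and `tags` and the one-tag-per-line format are assumptions; tag arguments are not handled.

```python
from pathlib import Path

def read_meta(expt_dir):
    """Read an experiment's metadata from its .meta directory (hypothetical layout)."""
    meta = Path(expt_dir) / ".meta"
    id_file = meta / "id"      # assumed: a single line holding the experiment id
    tags_file = meta / "tags"  # assumed: one tag per line
    return {
        "id": id_file.read_text().strip() if id_file.is_file() else None,
        "tags": tags_file.read_text().split() if tags_file.is_file() else [],
    }

def experiments_by_tag(expt_dirs, tag):
    """Return the experiment directories that carry the given tag."""
    return [d for d in expt_dirs if tag in read_meta(d)["tags"]]
```

Keeping ids and tags in separate files means that two collaborators editing different fields of the same experiment never touch the same file.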

Probably the first thing to play with is the Flask/Sphinx linkage. If I can get that to work, the rest should be pretty straight-forward and incremental.

Avoid conflicting experiment ids

If multiple people are collaborating on a project, they could easily end up creating experiments with duplicate keys. I'll need a way to avoid this. Off the top of my head, one option would be to have each user specify how many users there are, and which user number they are. New experiment ids would be multiplied by the total number of users, then summed with the specific user number. That would work, but requires a lot of manual intervention.
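
The arithmetic behind this option, as a short sketch. It assumes user numbers start at 0 and that each user keeps their own running count of experiments; the function name is hypothetical.

```python
def next_experiment_id(local_count, num_users, user_number):
    """Interleave ids so that no two users can ever generate the same one.

    local_count: how many experiments this user has created so far
    num_users:   total number of collaborators (must be agreed on by everyone)
    user_number: this user's number, assumed to be in range(num_users)
    """
    if not 0 <= user_number < num_users:
        raise ValueError("user_number must be in range(num_users)")
    return local_count * num_users + user_number
```

With three users, user 0 gets ids 0, 3, 6, ..., user 1 gets 1, 4, 7, ..., and user 2 gets 2, 5, 8, .... The scheme breaks if the number of users changes later, which is part of the manual intervention mentioned above.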
