Comments (10)

dotsdl commented on July 20, 2024

@datreant/coredevs opinions?

from scipy_proceedings.

orbeckst commented on July 20, 2024
  • big picture: what gap does it fill
  • organization of the library, major classes
  • examples with code
  • MDS as an example of a tool based on datreant

3-4 pages are enough, recycle your docs: you already spent a lot of effort on making them good.

Oliver Beckstein

On May 27, 2016, at 9:02, David Dotson wrote:

@datreant/coredevs opinions?


kain88-de commented on July 20, 2024

I will describe my two use cases for datreant.

1. Collect post-processed data from a cluster

I usually run hundreds of simulations at a time (parameter studies and improving sampling), resulting in a lot of raw data. Afterwards I post-process the trajectories and store the results next to the individual simulations. Ergo, I have hundreds of small files distributed over several folders. I then have a script which collects all of these files into one handy datreant object. Afterwards I only need to copy the datreant folder to my laptop. That is a single, easy-to-manage folder; I have some metadata saved with it already, and it uses compressed h5 files.
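
A minimal sketch of what such a collection script could look like, using only the standard library. It does not use the datreant API: the function name `collect_results`, the `*.h5` pattern, and the directory layout are assumptions made for illustration, and `dest` is assumed to lie outside `sim_root`.

```python
import shutil
from pathlib import Path


def collect_results(sim_root, dest, pattern="*.h5"):
    """Copy post-processed files from every folder below sim_root into dest,
    keeping each file's path relative to sim_root."""
    sim_root, dest = Path(sim_root), Path(dest)
    copied = []
    # sorted() materializes the file list before anything is copied
    for src in sorted(sim_root.rglob(pattern)):
        target = dest / src.relative_to(sim_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

After this, only the single `dest` folder has to be copied off the cluster.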

2. Analyzing hundreds of simulations simultaneously

The data I want to analyse at once is ~10 GB, which doubles or triples with intermediate results, so there is no way this fits into RAM. What I do instead is loop over the datreant limbs and load them one at a time. Intermediate results are also stored in the datreant. It goes a little bit like this:

for limb in simulations_keys:
    sim = sims.data[limb]
    res = fancy_math(sim)
    sims.data[limb + '/fancy-data'] = res

The end result is that I use 1-2 GB of RAM instead of 10-30 GB. This is still pretty fast because HDF5 loads data from disk quickly. Without datreant this would be quite hard.
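
The load-one-at-a-time pattern can be tried without datreant or HDF5. The sketch below is only an illustration: `DiskStore` is a hypothetical stand-in for a dict-like on-disk store (it writes pickle files, not h5 files), and `fancy_math` is a placeholder analysis.

```python
import pickle
import tempfile
from pathlib import Path


class DiskStore:
    """Hypothetical stand-in for a dict-like on-disk data store.

    Each key maps to one pickle file below root; nothing is cached in memory.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, key):
        return self.root / key / "data.pkl"

    def __setitem__(self, key, value):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(value, fh)

    def __getitem__(self, key):
        with self._path(key).open("rb") as fh:
            return pickle.load(fh)


def fancy_math(values):
    # Placeholder analysis: the mean of one simulation's data.
    return sum(values) / len(values)


with tempfile.TemporaryDirectory() as tmp:
    store = DiskStore(tmp)
    keys = ["sim_%03d" % i for i in range(3)]
    for i, key in enumerate(keys):
        store[key] = [float(i)] * 1000  # pretend this is a large dataset

    # Only one dataset is held in memory per iteration; each result goes
    # straight back to disk under a sub-key.
    for key in keys:
        sim = store[key]
        store[key + "/fancy-data"] = fancy_math(sim)

    results = [store[key + "/fancy-data"] for key in keys]
```

Peak memory is then set by the largest single dataset plus its result, not by the sum over all simulations.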

kain88-de commented on July 20, 2024

I can try to give you some more concrete code examples if you like.

dotsdl commented on July 20, 2024

Awesome. Thanks everyone. Yeah @kain88-de the more code examples you can share of where datreant really serves a crucial role, the more I have to work with. I definitely know how I use datreant (mostly through mdsynthesis), but how other people use these objects is more interesting to me.

If we have a central example we can keep coming back to for illustration in each section, I think that would be the best path forward. Have to find something compelling (but simple enough) for that, first.

kain88-de commented on July 20, 2024

One of the helper classes I use is TreantView (see the bottom of this comment). My Treant usually looks like this, only with more entries:

obs
+-- parameter1
     +-- value_1
     |    +--  rmsd
     |    |      +-- pyData.h5
     |    +-- energy
     |    |      +-- pyData.h5
     +-- value_2
     |    +--  rmsd
     |    |      +-- pyData.h5
     |    +-- energy
     |    |      +-- pyData.h5

Now the TreantView can be used like this:

>>> rmsd_view = TreantView(obs, 'parameter1', include='rmsd')
>>> print(rmsd_view.keys)
['value_1', 'value_2']

The view can then be used in a loop (keys is a property, so it is not called):

for rmsd in rmsd_view.keys:
     res = fancy_analysis(rmsd_view[rmsd])
     obs.data['parameter1/' + rmsd + '/fancy-analysis'] = res

Afterwards the Treant looks like this.

obs
+-- parameter1
     +-- value_1
     |    +--  rmsd
     |    |      +-- pyData.h5
     |    +-- energy
     |    |      +-- pyData.h5
     |    +-- fancy-analysis
     |    |      +-- pyData.h5
     +-- value_2
     |    +--  rmsd
     |    |      +-- pyData.h5
     |    +-- energy
     |    |      +-- pyData.h5
     |    +-- fancy-analysis
     |    |      +-- pyData.h5

Afterwards I generate a TreantView for fancy-analysis, iterate over it, and plot the results.

Bottom

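import numpy as np


def join_paths(*parts):
    # Assumed helper (not shown in the original post): join the non-empty
    # parts of a key with '/'.
    return '/'.join(p for p in parts if p)


class TreantView(object):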
    def __init__(self, treant, head, tail=None, include=None, exclude=None,
                 sort_func=None):
        self._treant = treant
        self._head = head
        self._tail = None
        self._get_keys()

        if include is not None:
            self.filter(include, exclude)

        self._tail = tail
        if tail is not None:
            self.use_tail(tail)

        if sort_func is not None:
            self.sort(sort_func)

    def _get_keys(self):
        """Collect the treant's data keys below head, with head stripped."""
        keys = self._treant.data.keys()
        keys = [k.replace(self._head + '/', '') for k in keys if self._head in k]
        if self._tail is not None:
            # strip the common tail, consistent with use_tail()
            keys = [k.replace('/' + self._tail, '') for k in keys]
        self._keys = keys

    def use_tail(self, tail):
        """Use a common tail for all keys and hide it in the selection
        process."""
        self._tail = tail
        self._keys = [k.replace('/' + tail, '') for k in self._keys]

    def sort(self, sort_func):
        """Use sort_func to convert elements to a data-type according to which
           the keys will get sorted
        """
        sort_idx = np.argsort([sort_func(k) for k in self._keys])
        self._keys = list(np.array(self._keys)[sort_idx])

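    def filter(self, include, exclude=None):
        """Keep only the keys that contain every word in include and no word
        in exclude. A single string is treated as one word."""
        if isinstance(include, str):
            include = [include]
        if exclude is None:
            exclude = []
        elif isinstance(exclude, str):
            exclude = [exclude]
        keys = [k for k in self._keys if
                (all(w in k for w in include) and
                 all(w not in k for w in exclude))]
        self._keys = keys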

    def __getitem__(self, item):
        if isinstance(item, int):
            return self._treant.data[join_paths(self.head, self._keys[item],
                                           self._tail)]
        elif isinstance(item, str):
            return self._treant.data[join_paths(self.head, item, self._tail)]
        else:
            raise NotImplementedError('can only handle "int" and "str" objects')

    @property
    def keys(self):
        return self._keys

    @property
    def treant(self):
        return self._treant

    @property
    def head(self):
        return self._head
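
The key handling can be tried without a Treant. The function below is a standalone sketch of the head/include/exclude/tail logic on a plain list of keys, not part of the class above: `view_keys` is a hypothetical name, and it matches `head` as a prefix and `tail` as a suffix, which is stricter than the substring checks in the class.

```python
def view_keys(keys, head, include=(), exclude=(), tail=None):
    """Return the keys below head that contain every word in include and no
    word in exclude, with head and an optional common tail stripped."""
    prefix = head + "/"
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        short = key[len(prefix):]
        if not all(word in short for word in include):
            continue
        if any(word in short for word in exclude):
            continue
        if tail is not None and short.endswith("/" + tail):
            short = short[: -len("/" + tail)]
        selected.append(short)
    return selected


keys = [
    "parameter1/value_1/rmsd",
    "parameter1/value_1/energy",
    "parameter1/value_2/rmsd",
    "parameter1/value_2/energy",
]
print(view_keys(keys, "parameter1", include=["rmsd"], tail="rmsd"))
# prints ['value_1', 'value_2']
```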

dotsdl commented on July 20, 2024

Thanks everyone; we're good. Have a look at the manuscript and give feedback if you have it. Making PR to main repo in 10 minutes or less, but we can still discuss and refine even once it's in review.

kain88-de commented on July 20, 2024

Can we still commit grammar and spelling PRs right now? Then I'll read over it today.

dotsdl commented on July 20, 2024

Yup! We can work on it as much as we want, I believe. There are plenty of
spelling and grammar errors to fix from my own rereads.
On May 31, 2016 12:38 AM, "kain88-de" wrote:

Can we still commit grammar and spelling PRs right now? Then I'll read over
it today.


kain88-de commented on July 20, 2024

But it's a good overview. I'll have to update my workflow to include all your cool new features.
