
Comments (7)

electronsandstuff commented on July 19, 2024

The current naming of load and save isn't the best, and migrating to the dict interface will require renaming anyway, so this is a good time to think about this topic.

Other Python libraries seem to use either load/dump or open/save. Since I am moving to a dict interface but want to maintain backwards compatibility, I don't think I can keep the current name load without causing issues. That name might always have to refer to the function that loads blocks of data. Instead, I could change to the names open/save (as in Pillow, for instance). That way the open function gets a new name which is unique to the dict interface. The save function can figure out which version to use based on the objects passed to it.
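As a rough illustration of how save could pick an interface from the objects passed to it, here is a minimal sketch. The helper name detect_interface is hypothetical and not part of easygdf; it only shows type-based dispatch between dict-like input and the existing list of blocks.

```python
from collections.abc import Mapping, Sequence


def detect_interface(blocks):
    """Return 'dict' for dict-like input and 'list' for a list of blocks.

    Hypothetical helper: a real save function would call the matching
    writer instead of returning a label.
    """
    if isinstance(blocks, Mapping):
        return "dict"
    # Strings are sequences too, so exclude them explicitly.
    if isinstance(blocks, Sequence) and not isinstance(blocks, (str, bytes)):
        return "list"
    raise TypeError(f"unsupported blocks type: {type(blocks).__name__}")


print(detect_interface({"a": 0.0}))                   # dict-like blocks
print(detect_interface([{"name": "a", "value": 0}]))  # existing list format
```

Checking for Mapping first matters because the two cases are otherwise easy to confuse when iterating.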

Since I am also planning for pandas compatibility, I also worry about the naming for those functions. The pandas naming convention is read_gdf and to_gdf. The issue is that there are two different standard files that can be read/written: screens/touts and initial distributions. I am against using one function to handle both of them, since it can be confusing to get two completely different types of output depending on the input, and I feel this could lead to strange bugs for users (i.e. screens/touts returns something like a dict of lists of dataframes, while the initial distribution files are just dataframes). I suppose I should also separate the naming of the pandas functions to make it clear what they do. I could put them in a submodule named pd or pandas. Maybe it makes sense to keep the same names as the non-pandas versions, but just add the pd or pandas prefix?

Note: one argument against using submodules with the name pd or pandas is that if a user does from easygdf import * then it could cause serious confusion with names.

Right now it's looking like the public interface's names will be:

  • open(...) - Reads file with dict-like blocks
  • save(...) - Saves file with dict-like (or array-like) blocks
  • open_screens_touts(...) - alias for load_screens_touts
  • open_initial_distribution(...) - alias for load_initial_distribution
  • pd_open_screens_touts(...) - same as open_screens_touts, but with pandas dataframes
  • pd_open_initial_distribution(...) - same as open_initial_distribution, but with pandas dataframes

I still think the last few function names are kind of long; I should maybe look into shorter names for them.

from easygdf.

electronsandstuff commented on July 19, 2024

Alternative suggestions for long function names:

screens_touts: st, particles, particle_data, pdata, parts
initial_distribution: dist, idist, init_dist


electronsandstuff commented on July 19, 2024

Now I am thinking of the following names:

  • read(...) - dict reader
  • write(...) - dict writer
  • read_particles(...) - alias to open_screens_touts(...)
  • read_dist(...) - alias to open_initial_distribution(...)
  • write_particles(...) - alias to save_screens_touts(...)
  • write_dist(...) - alias to save_initial_distribution(...)

I am still thinking of the names for the pandas functions. I would only need them for the reading functions, since the write methods can tell the type from the objects passed in. I am also now considering whether this should be a new function or a keyword argument. I am leaning towards the latter. I will also keep all existing functions for backwards compatibility.
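To make the keyword-argument option concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the as_dataframe parameter name is hypothetical, and load_screens_touts is a stand-in that returns fixed data rather than the real easygdf loader.

```python
def load_screens_touts(path):
    """Stand-in for the existing loader (assumption: the real one parses a GDF file)."""
    return {"screens": [{"x": [0.0, 1.0]}], "touts": []}


def read_particles(path, as_dataframe=False):
    """New-style name that wraps the old loader.

    The as_dataframe keyword is hypothetical; it switches the return type
    instead of adding a separate pandas-specific function.
    """
    data = load_screens_touts(path)
    if not as_dataframe:
        return data
    import pandas as pd  # imported lazily so pandas stays an optional dependency
    return {key: [pd.DataFrame(group) for group in groups]
            for key, groups in data.items()}


# The old name keeps working, so existing user code is unaffected.
print(read_particles("example.gdf") == load_screens_touts("example.gdf"))
```

The trade-off is that the return type now depends on an argument, which is the same kind of ambiguity raised earlier about handling two file types in one function.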


electronsandstuff commented on July 19, 2024

I'm starting to work on these changes in branch dict-pandas-interface


electronsandstuff commented on July 19, 2024

One issue I am immediately running into is that in the GDF format, objects can have both a value and children. This means I can't just interpret it as dicts of data.

That is, you can have blocks that look like:

[
  {'name': 'a', 'value': 0.0, 'children': [...]},
  {'name': 'b', 'value': 1.0, 'children': [...]},
]

Clearly, if there is only data and no children, then I think I should just return a key-value pair. If there is no value and only children, then I could return just a key-list pair. When there are both values and children, however, I guess the simplest thing to do would be {'value': 0.0, 'child': {...}}. Note: the multiple child blocks will get converted to a single dict and so it is child here, not children.

This does add another layer of depth to the tree, which I don't like, but if I don't do this, then the value would show up in the parent dict or in child. I guess I could put it in the child with some fixed name like 'parent_val', but that could cause a naming conflict. I could check for that issue, however.
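The three cases above (value only, children only, both) can be sketched as a small recursive converter. This is an illustration rather than easygdf code: the function name blocks_to_dict is hypothetical, a missing value is assumed to be None, and child blocks are merged into a single dict.

```python
def blocks_to_dict(blocks):
    """Convert a list of {'name', 'value', 'children'} blocks to nested dicts."""
    out = {}
    for block in blocks:
        value = block.get("value")
        children = block.get("children") or []
        if not children:
            # Value only: plain key-value pair.
            out[block["name"]] = value
        elif value is None:
            # Children only: the key maps straight to the converted children.
            out[block["name"]] = blocks_to_dict(children)
        else:
            # Both: add one extra layer holding the value and the children.
            out[block["name"]] = {"value": value, "child": blocks_to_dict(children)}
    return out


blocks = [
    {"name": "a", "value": 0.0, "children": [{"name": "x", "value": 1, "children": []}]},
    {"name": "b", "value": 1.0, "children": []},
]
print(blocks_to_dict(blocks))
# {'a': {'value': 0.0, 'child': {'x': 1}}, 'b': 1.0}
```

Note that this sketch silently overwrites blocks that share a name within one level.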


electronsandstuff commented on July 19, 2024

Actually, reading more through the standard, repeated block names are common in GDF files. For screen outputs, for instance, the saved blocks all have the same name but different values. This would imply using the key (name, value) to store things in the dict interface. This isn't that appealing, since the interface was meant to make things easier to use and now you would have to type crazy key names to access raw data.
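A small sketch of the collision problem, using made-up block data (the block name "position" and the helper names are assumptions for illustration): keying by name alone drops blocks, while keying by (name, value) keeps them at the cost of awkward keys.

```python
def key_by_name(blocks):
    """Naive dict interface: later blocks overwrite earlier ones with the same name."""
    return {b["name"]: b["children"] for b in blocks}


def key_by_name_and_value(blocks):
    """Use (name, value) tuples as keys so repeated names stay distinct."""
    return {(b["name"], b["value"]): b["children"] for b in blocks}


# Illustrative data: two blocks with the same name but different values.
blocks = [
    {"name": "position", "value": 0.1, "children": ["data for first screen"]},
    {"name": "position", "value": 0.2, "children": ["data for second screen"]},
]

print(len(key_by_name(blocks)))            # 1 (the first screen is lost)
print(len(key_by_name_and_value(blocks)))  # 2
print(key_by_name_and_value(blocks)[("position", 0.2)])
```

Float values as dict keys also require the user to reproduce the exact stored value, which makes lookups fragile.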


electronsandstuff commented on July 19, 2024

Sadly, it's starting to seem like a dict interface might not be reasonable for the GDF files. The most common output will be touts / screen outs which can't be represented this way. I might just have to stick to the current format. I suppose the names could still be improved, however, and I can still add the pandas methods.
