Comments (4)
Some trickier parts to port
I started working on this last week. It took a long time in part because are so many command-line options in mmpdb! In this comment I'll describe some of the trickier parts to port.
no built-in support for mutual exclusion groups
There were also some parts of the old argparse configuration which didn't easily port to click, in particular, what argparse calls a 'mutual exclusion group', where at most one of N options may be used. (Eg, you can specify --cut-rgroup
or --cut-rgroup-file
but not both.)
Click doesn't implement that at all, and requires that those be built using callback functions. While the callback system is much more flexible and powerful, it's certainly not as easy.
Argument grouping
Another problem was argument grouping. In argparse code, all of the arguments are stored in a single object. In click code, these are passed to the "main" handler each as its own argument.
# argparse
parser = ArgumentParser()
parser.add_argument("-x" ...)
args = parser.parse_args(...)
print(args.x)
# click
@option("-x", ...)
def main(x):
print(x)
I wanted to group some options together. For example, both fragment
and smifrag
take the same set of fragmentation parameters. These same parameters are also stored in the fragments file, so there's already a datatype for them. I wanted to pass that data type, rather than the individual arguments.
With a pointer by Charles Hoyt on Twitter, I was able to solve this using a wrapper adapter which accepts **kwargs
(meaning, accepts any and all arguments), pops off the appropriate terms, creates the correct data type, adds them back to the **kwargs
, and forwards the result to the original command.
I use this a few times in new click code.
Let me be clear - overall I prefer the click way. A lot of the argparse code did something like:
x = args.x
and I like removing that boilerplate.
detect user-specified vs. default arguments
The hardest problem was to support a feature in the old argparse code. If you specified a fragmentation option and specified a cache file, then it would give an error and say that you could only do one or the other, but not both. This isn't quite the same as a mutual exclusion group because it's more like "you can use one or more of these OR you can use that, but not both."
The only solution I could figure out requires leaving the fragmentation options None, as a special indicator for "has not been used". Then in the **kwargs
wrapper I can record which fields are not None, and change any None value to the default. It feels clumsy, but the old one was also clumsy.
from mmpdb.
Parameter validation during argument parsing
Both argparse and click support data type validation and conversion during argument parsing. For example, you might support a --smiles
option, like this:
command --smiles c1cccc
That's an invalid SMILES. When does the program determine its invalid?
Both argparse and click have ways to call a user-defined function to handle the parsing, and return something besides a string.
In the argparse version I deliberately did not use this feature, and instead waited until executing the command to do non-trivial validation like this. (See my next comment for reasons.)
The click version I used this feature, which makes for a somewhat better UI (error messages are more directly tied to the parameter name, and occur in the argument specification order), and simplifies the underlying command implementation because it doesn't need to call the verifier function.
from mmpdb.
Eager vs delayed loading
The old argparse-based version used delayed loading. The file commandline.py
contained nearly all of the code needed to express the command-line configuration, but imported no other functionality. For example, it did not import the rdkit module until a command was actually run.
The new click-based version imports all of the modules under mmpdb/cli/
:
__init__.py fragment.py list_.py rgroup2smarts.py
click_utils.py fragment_click.py loadprops.py smi_utils.py
create_index.py help_.py predict.py smifrag.py
drop_index.py index.py propcat.py transform.py
and those in turn import other modules. (Though I've checked and ensured that mmpdb --help
does not require importing RDKit.)
This may increase startup time. On the other hand, I will be migrating to importlib.resources
to find the SQL scripts, which means the mmpdb distribution can be switched to a wheel (a zip file), which should decrease startup time.
Background
The argparse system delayed imports until needed. This was implemented via a sort of wrapper function, like:
# in commandline.py
def fragment_command(parser, args):
from . import do_fragment
do_fragment.fragment_command(parser, args)
The argparse code would call fragment_command
, which imported the appropriate do_*.py
file then forwarded the call to the actual function.
The main reason I did this was to minimize startup time. Python programs a while to start, in part because they load so many modules. Each module lookup has Python searching the PYTHONPATH looking for a package, or a module, or a shared library. This can be slow, on a slow network file system. (15 years ago I remember it took Python 1 second to start, when using the Lustre file system, which is known for poor performance with many small files.)
This is much less an issue for most modern Python deployments, because often the package is installed as a "wheel" file, which is zip file containing all of the data. Once the package is found, Python's import mechanism has quick access to the rest of the contents of the wheel file.
NOTE: mmpdb doesn't yet support wheel installs because it includes a few SQL scripts which are currently found by file path name. This can handled by switching to importlib.resources
, which I will work on soon.
I've reorganized commandline.py
and the do_*.py
files so they are under the cli/
subdirectory. This was needed to make it easier to write new mmpdb commands. The downside is that more files are imported at startup time. I think the benefits are worth it, and don't expect this to be a noticeable issue.
from mmpdb.
The transition appears to be done. Probably a few errors remain, but those will likely be resolved during normal pre-release testing/use.
from mmpdb.
Related Issues (20)
- How to get environment smiles? HOT 3
- How to build mmpdb with large data set HOT 4
- cannot fix the AttributeError: module '__main__' has no attribute '__spec__' HOT 2
- store fragments in a SQLite db instead of JSON-Lines HOT 2
- Use SQLAlchemy to work with the database HOT 1
- Remove vendered use of peewee HOT 1
- Use importlib.resources instead of __file__ HOT 1
- Can we specify the environmental radius when generating the mmpdb? HOT 11
- Which "cut-smarts" patterns (i.e. fragmentation parameters) are used by default for Transform function? HOT 1
- mmpdb transform behaves unexpectedly HOT 5
- Error when using "--out mmpa" HOT 4
- [docs] README.md: create installation guide HOT 5
- AttributeError: module '__main__' has no attribute '__spec__'
- Support for CXSMILES HOT 4
- Obtain list of matched pairs with common core from an ID. HOT 4
- Automatically get MMPs for a given data set HOT 4
- Turning on --property flag leading to a smaller number of transformed structures
- Unexpected generated molecule with double cuts MMPDB HOT 4
- Possible SQL injection vulnerability HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mmpdb.