lincc-frameworks / python-project-template
Python project best practices for scientific software
Home Page: https://lincc-ppt.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
It would be good to include some information about what to expect when moving from a setup.py file to a pyproject.toml file.
We should also explain what directory structure is "necessary" for the various components to work together, e.g. pylint expects there to be a ./src directory for linting.
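For illustration, a minimal pyproject.toml using the src/ layout might look like the following. All names, pins, and dependencies here are hypothetical placeholders, not the template's actual values:

```toml
# Hypothetical minimal pyproject.toml replacing a setup.py
[build-system]
requires = ["setuptools>=62", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"          # distribution name; hyphens are fine here
version = "0.0.1"
requires-python = ">=3.8"
dependencies = ["numpy"]

[project.optional-dependencies]
dev = ["pytest", "pre-commit"]

[tool.setuptools.packages.find]
where = ["src"]                   # matches the ./src layout the linters expect
```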
Should the author of our libraries be the individuals, or LINCC-Frameworks?
Disclaimer: I don't know if this exists.
It would be very nice if we could find something that will allow us to record the time to run a test as a proxy for code performance.
Ideally this could be applied to the smoke test automation so that dormant projects will be monitored as well.
A specific example where this would have been valuable: the project FlexCode takes a dependency on xgboost, but didn't define the specific version of the dependency. The xgboost version was updated from 0.9 to 1.0, and that introduced a significant increase in FlexCode run times. This went completely unnoticed until someone asked, "hey, isn't this taking a lot longer to run???"
If there was a smoke test that recorded test run time in addition to pass/fail, then at least there would have been an indicator of the problem.
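pytest can already report the slowest tests with its `--durations=N` flag; persisting those numbers between runs is the missing piece. A rough sketch of the idea (the file name, helper name, and slowdown threshold are all arbitrary choices for illustration):

```python
import json
import time
from pathlib import Path

def record_duration(name, func, history_file=Path("durations.json"), slowdown=1.5):
    """Run `func`, record its wall-clock time in a JSON history file, and
    return True if it ran more than `slowdown` times slower than the last
    recorded run -- a crude signal that a dependency bump hurt performance."""
    history = json.loads(history_file.read_text()) if history_file.exists() else {}
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    previous = history.get(name)
    history[name] = elapsed
    history_file.write_text(json.dumps(history))
    return previous is not None and elapsed > slowdown * previous
```

Wired into a smoke-test workflow, the history file could be stashed as a CI artifact so dormant projects still get a regression signal.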
Want to make sure that readthedocs gets updated automatically.
A makefile should be present to allow users to build docs locally. Currently, if a user wanted to create one for the existing template, they'd likely have to copy one over and modify it as needed. We can protect from potential confusion by just having a predefined file ready to go.
One note, is that a default Makefile (e.g. one made by sphinx-quickstart) expects the documentation source files to live in a source directory. So this default should either be changed in the makefile, or the doc source files should be moved into a source/ folder. The advantage of the latter is that it seems more like the standard.
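For reference, the Makefile that sphinx-quickstart generates is approximately the following, with SOURCEDIR being the setting to change if the doc sources don't move into a source/ folder (note that make recipes must be indented with tabs):

```makefile
# Minimal Sphinx Makefile, approximating the sphinx-quickstart default.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)

.PHONY: help Makefile

# Route all unknown targets (html, latexpdf, ...) to sphinx-build -M.
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)
```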
Check to see if a new issue is automatically added to lincc-frameworks project tracker.
Currently there is a GitHub action that will hydrate two test projects (one uses the default responses to the questions, the other provides some non-default responses). However, those tests were introduced before the template allowed the user to make branching choices (i.e. black vs. pylint) that result in different files or file content.
It would be nice to be able to confirm, via something like integration or unit tests, that the various responses result in hydrated test projects that contain the correct files or file contents.
It's too wordy and complicated. Needs a tl;dr section at the top.
Add pytest coverage to the list of dev dependencies, pre-commit hook, and GitHub workflow.
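A sketch of what the pyproject.toml side of that might look like; the package name is a placeholder:

```toml
# Illustrative additions for coverage reporting (package name hypothetical)
[project.optional-dependencies]
dev = ["pytest", "pytest-cov"]

[tool.pytest.ini_options]
addopts = "--cov=example_package --cov-report=term-missing"
```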
In the hydrated project we install the package with pip install .
But in the template tests, we run pip install -e . and then run the pytest step differently as well.
If we could execute the hydrated template CI as part of the template CI tests that would be ideal, and prevent us from having to keep the two in sync.
Not strictly necessary but a nice to have.
VSCode does this, and it does no harm that I know of.
Moving the template into a subdirectory, ./python_project_template, will allow us to separate metadata and tests for the template from metadata and tests that are the template.
For example, we want this project to have a README.md file, but we don't want to populate a new project with the same README.md file. Instead, we'll use ./python-project-template/README.md as the hydrated README file. Copier has a configuration option that allows defining the template directory, so we can use that to separate the template itself from its metadata.
Need to fix: currently it seems that readthedocs builds don't work as expected. See this build: https://readthedocs.org/projects/hipscat/builds/19413701/
Teach me to use GitHub Projects first thing in the morning. Nothing here!
The pre-commit hook that is meant to clear output from notebooks isn't discovering the notebooks in the nb/ directory.
Currently, isort and pylint look at all files recursively from the root directory. We need to specify that they should only look in the ./src directory.
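Both of these can likely be addressed with per-hook `files` patterns in .pre-commit-config.yaml. A sketch, assuming the nbstripout and isort hooks are the ones in use; the repo revisions are placeholders, not pinned recommendations:

```yaml
# Illustrative .pre-commit-config.yaml fragment
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
        files: ^nb/.*\.ipynb$   # explicitly match notebooks under nb/
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        files: ^src/            # only touch files under ./src
```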
As it stands, the user of the template must pip install . or pip install .[dev] in order for changes to the source code to be discoverable by tests. Thus if a user adds a src method, and then immediately adds a test for it, the test will fail because the new method has not been packaged.
In some sense this is a good thing, because it means that what's being tested is what would be deployed. But at the same time, it means that every small change requires a pip install in order to be tested.
One way around this is to include a .env file in the template with the contents:
PYTHONPATH=src
This .env file will be picked up by VSCode (at least) and presumably other editors as well. It means that tests run by executing the code in the src directory instead of the site-packages directory when using the editor for testing.
Using the terminal still requires that the code be pip installed for the tests to work.
Currently the CI tests only cover Python 3.10. It should be easy enough to expand the test matrix to include Python 3.8-3.10.
GitHub documentation about test matrix: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#example-using-a-multi-dimension-matrix
An example of using multiple different versions of Python for testing: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#using-the-python-starter-workflow
Likely this is the diff that will be necessary, but need to test to be sure:
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e17de1c..0dd5d30 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -13,6 +13,7 @@ jobs:
     strategy:
       fail-fast: true
       matrix:
+        python-version: ['3.8', '3.9', '3.10']
         include:
           - name: Base example
@@ -29,7 +30,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v2
         with:
-          python-version: '3.10'
+          python-version: ${{ matrix.python-version }}
       - name: Install Python dependencies
         run: |
           sudo apt-get update
Users could become apprehensive about using the template if they have to answer a bunch of questions and they don't feel like they know the consequences of the answers, or what they will be used for. i.e. "Am I locking myself into something that I can't undo by answering black vs. pylint vs. none?" "Is it important that I get the package name just right the first time?"
We should provide some additional, preferably structured documentation (in readthedocs) for each of the questions to let the users know things like:
Those are a lot of points, and they may not all be necessary. We also don't want to overwhelm the user with documentation, so conciseness would be good too 🤷
Can we implement this without needing an API key? If we do need one, should we just leave instructions for the users?
This is a great example to work from: https://github.com/JamesALeedham/Sphinx-Autosummary-Recursion#integrating-jupyter-notebooks-with-sphinx
It would be nice to have an example output from using the template as a parallel repo.
It would be really nice if someone who isn't related to the project were to do it. Just as a way to make sure that the project template works as expected. Look out for missing documentation, provide feedback, etc.
Currently the testing is done manually, we should find a way to automate it.
Just to have a simple test that actually runs.
A validator was added in PR#46 that disallows hyphens in project names or package names. This check makes sense for Python module names, but hyphens are common in project names (pytest-cov, pre-commit, and hipscat-import, for a couple of examples).
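A sketch of a validator that distinguishes the two cases — hyphens allowed in the distribution/project name, but the importable package name restricted to a valid identifier. This is illustrative, not the actual code from PR#46:

```python
import re

# PEP 508-style distribution names may contain hyphens, dots, and
# underscores; importable module names must be valid Python identifiers.
PROJECT_NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")

def valid_project_name(name: str) -> bool:
    """Accept names like 'pytest-cov' or 'hipscat-import'."""
    return bool(PROJECT_NAME_RE.match(name))

def valid_package_name(name: str) -> bool:
    """Accept only names that can appear in an `import` statement."""
    return name.isidentifier() and not name.startswith("_")
```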
Want to introduce a pre-commit hook to run cd docs; make html and confirm that it succeeds. Nothing else really beyond that. The documentation will actually be built by ReadTheDocs; we just need to make sure that it can be built.
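pre-commit's documented `repo: local` mechanism should support this; a sketch (hook id and name are made up):

```yaml
# Illustrative local hook that just verifies the docs build
repos:
  - repo: local
    hooks:
      - id: sphinx-build-check
        name: Check that Sphinx docs build
        entry: bash -c 'cd docs && make html'
        language: system
        pass_filenames: false
```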
It would be really nice to have some way to track who is using this template.
Tracking the number of forks of the repo is one way, but most users won't use the template in that way.
One alternative is to include a page in the Sphinx-rendered documentation. But again, not every user will publish a page on read the docs, and even if they did, it would be difficult to search/track.
Another alternative is to include something in the main README, or in a separate README, outside the root folder.
Encouraging the use of a GitHub tag might be good too, but unreliable.
Wait until we get some user feedback/discuss with the team before piling on more linting tools.
For reference:
Expected behavior is that we should be able to create a new tag or release in GitHub, and that setuptools_scm would automatically populate a _version.py
file with version information.
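The corresponding pyproject.toml configuration would be roughly the following; the package path is a placeholder:

```toml
# Illustrative setuptools_scm configuration (package path hypothetical)
[build-system]
requires = ["setuptools>=62", "setuptools_scm"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
write_to = "src/example_package/_version.py"
```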
The pre-commit hooks are great, but it might be a good idea to have a check that refuses a pull request with style errors if a user has the hooks set up incorrectly.
The shorter name is easy to miss. What is the advantage of the short versus long name?
The instructions here (https://git-lfs.com/) make it look like this would be a pretty straightforward operation to perform when the template is hydrated. We could add the git lfs install step as one of the "tasks" in copier.yml, and include a .gitattributes file in the template that covers some common file types.
It would also be wise to add some documentation in the readme about what it is, why it's useful, and pointing to the git-lfs documentation.
Extra bonus would be to make this an optional feature that would be included by default. But I kind of like just making it available out of the box.
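A starter .gitattributes might look like this; the file types listed are guesses at what our scientific projects would want tracked in LFS:

```text
# Illustrative .gitattributes tracking some common binary types with git-lfs
*.fits    filter=lfs diff=lfs merge=lfs -text
*.hdf5    filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.png     filter=lfs diff=lfs merge=lfs -text
```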
We should include information for citing software in publications.
I don't know what this looks like, but once we figure it out, it would be good to include in the template.
Line width is all over the place.
Could we, maybe as part of a smoke test, determine if there's a new version of the template a project was based on, and spit out a warning?
Why? I'd want to keep projects from getting stale and getting far behind what we've determined are best practices.
Alternatively, can we generate a list of projects that have used the template and which version they're on, so that we can run copier update on them after updating the template version?
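Copier records the template version in the hydrated project's .copier-answers.yml (as `_commit`), so the check could compare that tag against the template's latest release. A sketch of just the comparison; fetching the latest tag (e.g. via `git ls-remote --tags`) is omitted, and the function names are made up:

```python
# Compare the template tag recorded by copier against the latest release tag.
def parse_tag(tag: str) -> tuple:
    """Turn a tag like 'v1.2.0' into a comparable tuple (1, 2, 0)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def template_outdated(answers_commit: str, latest_tag: str) -> bool:
    """True if the hydrated project's template version is behind the latest."""
    return parse_tag(answers_commit) < parse_tag(latest_tag)
```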
This will make it more clear if a failed build is due to linting problems or failed packaging or tests.
MIT, GNU, ...?
Verify that this is added to the lincc-frameworks project automatically.
Melissa ported the installation and usage instructions to readthedocs here: https://lincc-ppt.readthedocs.io/en/latest/
This makes most of the main README file duplicative and prone to getting out of date. We should significantly reduce what is in the README file and direct users to the readthedocs page for all the detailed instructions.
For the most part this will only affect people bringing the template into an existing project. But it would be nice to give people the option from the beginning.
I think that the most technically challenging part would be the logical switch between black vs. pylint vs. any other linter.
Capturing some findings from freaky fixit + discussion w/ Drew:
pipx ensurepath was needed to get copier working. It popped up as a warning when running, but may be worth an explicit warning in the docs for naive pipx users.