At Manifold we have a fork of this repo that scaffolds out a Dockerfile and docker compose configuration when instantiating new projects. This has worked really well for us, letting us leverage the DSCC while benefiting from consistent dev environments via Docker. https://github.com/manifoldai/docker-cookiecutter-data-science
from cookiecutter-data-science.
Sorry for the delay here. I think that if we added a Dockerfile it would just do the following:
- Copy the repo to the Docker box
- Install the requirements

This is probably all we need, though we should pick `miniconda` or `miniconda3` based on which Python interpreter the user selects.
```dockerfile
FROM continuumio/miniconda3
COPY . /{{ cookiecutter.repo_name }}
RUN pip install -r /{{ cookiecutter.repo_name }}/requirements.txt
```
I don't think that we'll add support for any particular database to this project since there is so much variation in what people use.
I've also added the Needs Option label to indicate this should be run only if a user indicates they want a Dockerfile (instead of by default).
We probably want to conditionally add some make commands for Docker (note the `-p` flag has to come before the image name, so Docker treats it as a `docker run` option rather than a container argument):

```makefile
docker-build:
	docker build -t {{cookiecutter.project_name}}:latest .

docker-bash:
	docker run -it {{cookiecutter.project_name}}:latest

docker-jupyter:
	docker run -it -p 8888:8888 {{cookiecutter.project_name}}:latest /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"
```
This all needs implementation and testing, if someone wants to give it a shot.
Thanks @pjbull. snakemake looks promising and I might change to it.
Can you explain your use case?
It doesn't make sense to create a docker image that just does a pip install. If you want your analysis to be reproducible, a Docker image is a good way to package up code + environment. If you don't add the code to the docker build as well, then the image is not really useful for reproducibility.
Beyond that (1) you can edit your code locally and build your Docker image to include your local edits. (2) you can edit your code directly on the Docker machine in most IDEs and text editors (e.g., https://blog.jetbrains.com/pycharm/2015/06/feature-spotlight-editing-remote-files/ )
Gotcha, seems like we have two different use cases here:
(1) Docker as deployment/reproducibility mechanism.
(2) Docker as a virtual environment for local development.
I believe the right design here is to use COPY or ADD in the Dockerfile to include the project code in the container. Then, have a make command that uses `docker run` with the `-v` flag to mount the local version in the container. That way, the project code always gets built into the image, but if you're running locally with `-v`, you can edit/debug.
Would that work for you?
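A rough sketch of what that make target might look like (the `docker-dev` target name is illustrative, not something already in the template; it assumes the image was built with the Dockerfile snippet earlier in this thread):

```makefile
# Hypothetical target: run the image with the local working copy
# mounted over the baked-in code, so local edits are visible inside
# the container without rebuilding the image.
docker-dev:
	docker run -it \
		-v $(PWD):/{{ cookiecutter.repo_name }} \
		{{cookiecutter.project_name}}:latest /bin/bash
```

Without the `-v` mount you get the reproducible, baked-in copy; with it you get live local development against the same image.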
Which part are you talking about?
We don't need docker compose since there is only one container. Other than that, it looks like they use two different Dockerfiles: one for deployment, one for development. That seems worse in that you'll have to maintain them separately...
What's the difference between them?
@chananshgong Having used the Docker setup in cookiecutter-django quite a bit, I tend to see it like @pjbull. Namely, in a Django development context, sharing the code directory from host to container is driven entirely by `manage.py runserver` doing livereload. In other words, it's just a local development workaround so that you can keep using your other containers (e.g. Postgres) while avoiding rebuilding the django container each time you make a code change.
Maybe it's just confusion but I'm not seeing the use case for data science projects? Aside from treating a container as a virtualenv, what further purpose would it solve if there are no other containers? And if there are other containers, what might they be? And is that a use case that many people have?
The last one is especially relevant when it comes to making changes to the base cookiecutter; nothing stops you from implementing this on a per-project basis, though -- can you (or any other readers, feel free to weigh in!) make a case for doing this as an option in the cookiecutter?
@chananshgong I agree that Docker is useful for many things. The question wasn't whether Docker can be used for data science projects but rather the following:
Is this use case widespread enough to justify changing everyone's base setup, or should individuals consider changing it for their own projects?
I could potentially see adding an example Dockerfile which is `FROM python` and then `RUN pip install`s the requirements or something, but each project will be so different that I don't see this as an improvement over the user adding a specific Docker configuration for what they are trying to do.
For example:
- Many (most?) users will only be doing exploratory work in this structure and aren't interested in the deployment story
- Many users would want Anaconda or some other Python distribution
- Many users would want a specific Linux base image so that they can install other packages
Can you help us understand the broader argument for doing this by expanding on the question highlighted above?
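For concreteness, the minimal example Dockerfile described above (a plain `python` base image plus a pip install of the requirements) might look something like this sketch; the `/project` path is illustrative:

```dockerfile
# Minimal sketch: plain Python base image, install pinned requirements.
FROM python:3
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
COPY . /project
WORKDIR /project
```

Even something this small bakes in choices (base image, install order, working directory) that vary project to project, which is the concern raised above.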
Again, the question wasn't whether Docker is useful in general but whether adding some specific configuration in this project (i.e. some example file) can benefit everyone.
I'm going to step out of this discussion and let the community weigh in on this issue over the coming weeks/months to see if there is sufficient interest.
I understand, but please be concrete: can you give me an example of a Dockerfile that should be added?
This seems substantively identical to the first comment on this issue:
#62 (comment)
As stated there, adding a Dockerfile will need:
- implementation as an optional feature
- documentation
- testing
Sure, I will work on such a pull request.
But can you please tell me your opinion on a make.py (using click) instead of a Makefile?
We won't switch away from `make` as part of this change. Feel free to open another issue for `make` alternatives. We've been experimenting with snakemake, which may be a good fit.
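For readers unfamiliar with snakemake: it drives a pipeline from a Snakefile of rules, each declaring its inputs and outputs. A minimal illustrative rule (file and script names here are hypothetical, not from this template) looks roughly like:

```
# Illustrative Snakefile rule: rebuild the processed dataset
# whenever the raw file or the script's declared inputs change.
rule process_data:
    input:
        "data/raw/dataset.csv"
    output:
        "data/processed/dataset.csv"
    shell:
        "python src/data/make_dataset.py {input} {output}"
```

Like `make`, it only reruns steps whose inputs changed, but rules are Python-flavored and can carry params, conda environments, and wildcards.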
Closing as dupe of #13 since we don't really support or not support "wrapper" technologies like Docker but could certainly include an example.