At Manifold we have a fork of this repo that scaffolds out a Dockerfile and docker compose configuration when instantiating new projects. This has worked really well for us, letting us leverage the DSCC while benefiting from consistent dev environments via Docker. https://github.com/manifoldai/docker-cookiecutter-data-science
from cookiecutter-data-science.
Sorry for the delay here. I think that if we added a Dockerfile it would just do the following:
- Copy the repo to the Docker box
- Install the requirements

This is probably all we need, though we should pick `miniconda` or `miniconda3` based on which Python interpreter the user selects.
```dockerfile
FROM continuumio/miniconda3
COPY . /{{ cookiecutter.repo_name }}
RUN pip install -r /{{ cookiecutter.repo_name }}/requirements.txt
```
I don't think that we'll add support for any particular database to this project since there is so much variation in what people use.
I've also added the Needs Option label to indicate this should be run only if a user indicates they want a Dockerfile (instead of by default).
We probably want to conditionally add some make commands for Docker (note the `-p` flag has to come before the image name, so Docker treats it as a `docker run` option rather than a container argument):

```makefile
docker-build:
	docker build -t {{cookiecutter.project_name}}:latest .

docker-bash:
	docker run -it {{cookiecutter.project_name}}:latest

docker-jupyter:
	docker run -it -p 8888:8888 {{cookiecutter.project_name}}:latest /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"
```
This all needs implementation and testing, if someone wants to give it a shot.
Thanks @pjbull. snakemake looks promising and I might change to it.
Can you explain your use case?
It doesn't make sense to create a docker image that just does a pip install. If you want your analysis to be reproducible, a Docker image is a good way to package up code + environment. If you don't add the code to the docker build as well, then the image is not really useful for reproducibility.
Beyond that (1) you can edit your code locally and build your Docker image to include your local edits. (2) you can edit your code directly on the Docker machine in most IDEs and text editors (e.g., https://blog.jetbrains.com/pycharm/2015/06/feature-spotlight-editing-remote-files/ )
Gotcha, seems like we have two different use cases here:
(1) Docker as deployment/reproducibility mechanism.
(2) Docker as a virtual environment for local development.
I believe the right design here is to use COPY or ADD in the Dockerfile to include the project code in the container. Then, have a make command that uses `docker run` with the `-v` flag to mount the local version in the container. That way, the project code always gets built into the image, but if you're running locally with `-v`, you can edit/debug.
Would that work for you?
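A rough sketch of what that make target might look like (the `docker-dev` target name is illustrative, not something already in the template; it assumes the image was built with the Dockerfile snippet earlier in this thread):

```makefile
# Hypothetical target: run the image with the local working copy
# mounted over the baked-in code, so local edits are visible inside
# the container without rebuilding the image.
docker-dev:
	docker run -it \
		-v $(PWD):/{{ cookiecutter.repo_name }} \
		{{cookiecutter.project_name}}:latest /bin/bash
```

Without the `-v` mount you get the reproducible, baked-in copy; with it you get live local development against the same image.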
Which part are you talking about?
We don't need docker compose since there is only one container. Other than that, it looks like they use two different Dockerfiles: one for deployment, one for development. That seems worse in that you'll have to maintain them separately...
What's the difference between them?
@chananshgong Having used the Docker setup in cookiecutter-django quite a bit, I tend to see it like @pjbull. Namely, in a Django development context, sharing the code directory from host to container is driven entirely by `manage.py runserver` doing livereload. In other words, it's just a local development workaround so that you can keep using your other containers (e.g. Postgres) while avoiding rebuilding the django container each time you make a code change.
Maybe it's just confusion but I'm not seeing the use case for data science projects? Aside from treating a container as a virtualenv, what further purpose would it solve if there are no other containers? And if there are other containers, what might they be? And is that a use case that many people have?
The last one is especially relevant when it comes to making changes to the base cookiecutter; nothing stops you from implementing this on a per-project basis, though -- can you (or any other readers, feel free to weigh in!) make a case for doing this as an option in the cookiecutter?
@chananshgong I agree that Docker is useful for many things. The question wasn't whether Docker can be used for data science projects but rather the following:
Is this use case widespread enough to justify changing everyone's base setup, or should individuals consider changing it for their own projects?
I could potentially see adding an example Dockerfile which is `FROM python` and then `RUN pip install`s the requirements or something, but each project will be so different that I don't see this as an improvement over the user adding a specific Docker configuration for what they are trying to do.
For example:
- Many (most?) users will only be doing exploratory work in this structure and aren't interested in the deployment story
- Many users would want Anaconda or some other Python distribution
- Many users would want a specific Linux base image so that they can install other packages
Can you help us understand the broader argument for doing this by expanding on the question highlighted above?
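For concreteness, the minimal example Dockerfile described above (a plain `python` base image plus a pip install of the requirements) might look something like this sketch; the `/project` path is illustrative:

```dockerfile
# Minimal sketch: plain Python base image, install pinned requirements.
FROM python:3
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
COPY . /project
WORKDIR /project
```

Even something this small bakes in choices (base image, install order, working directory) that vary project to project, which is the concern raised above.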
Again, the question wasn't whether Docker is useful in general but whether adding some specific configuration in this project (i.e. some example file) can benefit everyone.
I'm going to step out of this discussion and let the community weigh in on this issue over the coming weeks/months to see if there is sufficient interest.
I understand, but please be concrete: can you give me an example of a Dockerfile that should be added?
This seems substantively identical to the first comment on this issue:
#62 (comment)
As stated there, adding a Dockerfile will need:
- implementation as an optional feature
- documentation
- testing
Sure, I will work on such a pull request.
But can you please tell me your opinion on a make.py (using click) instead of a Makefile?
We won't switch away from `make` as part of this change. Feel free to open another issue for `make` alternatives. We've been experimenting with snakemake, which may be a good fit.
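For readers unfamiliar with snakemake: it drives a pipeline from a Snakefile of rules, each declaring its inputs and outputs. A minimal illustrative rule (file and script names here are hypothetical, not from this template) looks roughly like:

```
# Illustrative Snakefile rule: rebuild the processed dataset
# whenever the raw file or the script's declared inputs change.
rule process_data:
    input:
        "data/raw/dataset.csv"
    output:
        "data/processed/dataset.csv"
    shell:
        "python src/data/make_dataset.py {input} {output}"
```

Like `make`, it only reruns steps whose inputs changed, but rules are Python-flavored and can carry params, conda environments, and wildcards.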
Closing as dupe of #13 since we don't really support or not support "wrapper" technologies like Docker but could certainly include an example.