Comments (22)

abng88 avatar abng88 commented on July 20, 2024 3

At Manifold we have a fork of this repo that scaffolds out a Dockerfile and docker compose configuration when instantiating new projects. This has worked really well for us: we can leverage the DSCC while benefiting from consistent dev environments via Docker. https://github.com/manifoldai/docker-cookiecutter-data-science

from cookiecutter-data-science.

pjbull avatar pjbull commented on July 20, 2024 1

Sorry for the delay here. I think that if we added a Dockerfile it would just do the following:

  • Copy the repo to the docker box
  • Install the requirements

This is probably all we need, though we should pick miniconda or miniconda3 based on which Python interpreter the user selects.

FROM continuumio/miniconda3

COPY . /{{ cookiecutter.repo_name }}

RUN pip install -r /{{ cookiecutter.repo_name }}/requirements.txt
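
As a rough illustration of the miniconda/miniconda3 choice mentioned above, here is a minimal Python sketch that renders the same three-step Dockerfile for a given interpreter. The helper name render_dockerfile and the option values "python3" and "python" are assumptions made for this example, not something confirmed in this thread.

```python
# Hypothetical mapping from the interpreter choice to a base image.
# The keys "python3" and "python" are assumed option values.
BASE_IMAGES = {
    "python3": "continuumio/miniconda3",
    "python": "continuumio/miniconda",
}


def render_dockerfile(python_interpreter, repo_name):
    """Return Dockerfile text that copies the repo and installs requirements."""
    try:
        base_image = BASE_IMAGES[python_interpreter]
    except KeyError:
        raise ValueError(f"unsupported interpreter: {python_interpreter!r}")
    lines = [
        f"FROM {base_image}",
        "",
        f"COPY . /{repo_name}",
        "",
        f"RUN pip install -r /{repo_name}/requirements.txt",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("python3", "my_project"))
```

In the template itself, the same choice would more likely be expressed as a Jinja conditional on the FROM line; the function above only makes the branching explicit.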

I don't think that we'll add support for any particular database to this project since there is so much variation in what people use.

I've also added the Needs Option label to indicate this should be run if a user indicates they want a Dockerfile (instead of by default).
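
One way the opt-in behaviour could work is a post-generation step that deletes the Dockerfile when the user declines it. The sketch below assumes a hypothetical yes/no template option (passed in here as include_docker) and a hypothetical helper name; it is not the project's actual hook.

```python
import tempfile
from pathlib import Path

# Files that only make sense when the user asked for Docker support.
DOCKER_FILES = ("Dockerfile",)


def prune_docker_files(project_dir, include_docker):
    """Delete Docker files when the user opted out; return the names removed."""
    removed = []
    if include_docker:
        return removed
    for name in DOCKER_FILES:
        path = Path(project_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real project.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Dockerfile").write_text("FROM python\n")
        print(prune_docker_files(tmp, include_docker=False))
```

In a cookiecutter template, logic like this would typically live in hooks/post_gen_project.py, which runs inside the freshly generated project directory.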

We probably want to conditionally add some make commands for docker:

docker-build:
    docker build -t {{cookiecutter.project_name}}:latest .

docker-bash:
    docker run -it {{cookiecutter.project_name}}:latest

docker-jupyter:
    docker run -i -t -p 8888:8888 {{cookiecutter.project_name}}:latest /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"

This all needs implementation and testing, if someone wants to give it a shot.

chananshgong avatar chananshgong commented on July 20, 2024 1

Thanks @pjbull. snakemake looks promising and I might switch to it.

chananshgong avatar chananshgong commented on July 20, 2024

pjbull avatar pjbull commented on July 20, 2024

Can you explain your use case?

It doesn't make sense to create a docker image that just does a pip install. If you want your analysis to be reproducible, a Docker image is a good way to package up code + environment. If you don't add the code to the docker build as well, then the image is not really useful for reproducibility.

Beyond that: (1) you can edit your code locally and rebuild your Docker image to include your local edits; (2) you can edit your code directly on the Docker machine from most IDEs and text editors (e.g., https://blog.jetbrains.com/pycharm/2015/06/feature-spotlight-editing-remote-files/ ).

chananshgong avatar chananshgong commented on July 20, 2024

pjbull avatar pjbull commented on July 20, 2024

Gotcha, seems like we have two different use cases here:

(1) Docker as deployment/reproducibility mechanism.
(2) Docker as a virtual environment for local development.

I believe the right design here is to use COPY or ADD in the Dockerfile to include the project code in the container. Then, have a make command that uses docker run with the -v flag to mount the local version in the container. That way, the project code always gets built into the image. But, if you're running locally with -v, you can edit/debug.
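
To make the design above concrete, here is a small Python sketch that builds the docker run argument list with and without the -v mount. The function name docker_run_args is hypothetical, and the container path /<repo_name> is assumed to match the COPY destination in the Dockerfile.

```python
from pathlib import Path


def docker_run_args(image, repo_name, local_dir=None):
    """Build an argv list for `docker run`.

    With local_dir set, the host checkout is bind-mounted over the copy
    baked into the image, so local edits show up inside the container.
    """
    args = ["docker", "run", "-it"]
    if local_dir is not None:
        # Docker expects an absolute host path for a bind mount.
        host_path = Path(local_dir).resolve()
        args += ["-v", f"{host_path}:/{repo_name}"]
    # Options go before the image name; anything after it is the command.
    args.append(image)
    return args


if __name__ == "__main__":
    # Reproducible run: uses the code that was copied in at build time.
    print(docker_run_args("my_project:latest", "my_project"))
    # Development run: mounts the current directory for live editing.
    print(docker_run_args("my_project:latest", "my_project", local_dir="."))
```

Either list can be passed to subprocess.run(..., check=True) from a wrapper script, or the equivalent command line can be written directly into a make target.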

Would that work for you?

chananshgong avatar chananshgong commented on July 20, 2024

pjbull avatar pjbull commented on July 20, 2024

Which part are you talking about?

We don't need docker compose since there is only one container. Other than that, it looks like they use two different Dockerfiles: one for deployment and one for development. That seems worse, since you'll have to maintain them separately.

chananshgong avatar chananshgong commented on July 20, 2024

What's the difference between them?

isms avatar isms commented on July 20, 2024

@chananshgong Having used the Docker setup in cookiecutter-django quite a bit, I tend to see it like @pjbull. Namely, in a Django development context, sharing the code directory from host to container is driven entirely by manage.py runserver doing live reload. In other words, it's just a local development workaround so that you can keep using your other containers (e.g. Postgres) while avoiding rebuilding the Django container each time you make a code change.

Maybe it's just confusion, but I'm not seeing the use case for data science projects. Aside from treating a container as a virtualenv, what further purpose would it serve if there are no other containers? And if there are other containers, what might they be? And is that a use case that many people have?

The last one is especially relevant when it comes to making changes to the base cookiecutter; nothing stops you from implementing this on a per-project basis, though. Can you (or any other readers, feel free to weigh in!) make a case for doing this as an option in the cookiecutter?

chananshgong avatar chananshgong commented on July 20, 2024

isms avatar isms commented on July 20, 2024

@chananshgong I agree that Docker is useful for many things. The question wasn't whether Docker can be used for data science projects but rather the following:

Is this use case widespread enough to justify changing everyone's base setup, or should individuals consider changing it for their own projects?

I could potentially see adding an example Dockerfile which is FROM python and then RUN pip installs the requirements or something, but each project will be so different that I don't see this as an improvement over the user adding a specific Docker configuration for what they are trying to do.

For example:

  • Many (most?) users will only be doing exploratory work in this structure and aren't interested in the deployment story
  • Many users would want Anaconda or some other Python distribution
  • Many users would want a specific Linux base image so that they can install other packages

Can you help us understand the broader argument for doing this by expanding on the question highlighted above?

chananshgong avatar chananshgong commented on July 20, 2024

isms avatar isms commented on July 20, 2024

Again, the question wasn't whether Docker is useful in general but whether adding some specific configuration in this project (i.e. some example file) can benefit everyone.

I'm going to step out of this discussion and let the community weigh in on this issue over the coming weeks/months to see if there is sufficient interest.

chananshgong avatar chananshgong commented on July 20, 2024

isms avatar isms commented on July 20, 2024

I understand, but please be concrete - can you give me an example of a Dockerfile that should be added?

chananshgong avatar chananshgong commented on July 20, 2024

pjbull avatar pjbull commented on July 20, 2024

This seems substantively identical to the first comment on this issue:
#62 (comment)

As stated there, adding a Dockerfile will need:

  • implementation as optional feature
  • documentation
  • testing

chananshgong avatar chananshgong commented on July 20, 2024

Sure, I will work on such a pull request.
But can you please tell me your opinion on make.py (using click) instead of a Makefile?

pjbull avatar pjbull commented on July 20, 2024

We won't switch away from make as part of this change. Feel free to open another issue for make alternatives. We've been experimenting with snakemake, which may be a good fit.

isms avatar isms commented on July 20, 2024

Closing as a dupe of #13, since we neither specifically support nor rule out "wrapper" technologies like Docker, but could certainly include an example.
