containers-rules-manuscript's Introduction

Recommendations to package and containerize bioinformatics software

Description

This repository contains the manuscript entitled: Recommendations to package and containerize bioinformatics software. Our aim is to describe how the bioinformatics community can produce better software containers and packages to improve the reproducibility of original results.

  • Pre-print: Pending
  • Final Paper: Pending

Commenting and contributing

Feel free to comment on, fork and send pull requests against the current version of the manuscript. If you want to discuss issues or topics related to the manuscript, please use the issue tracker and/or pull requests.

Feel free to browse through existing and past issues, and if one seems related, comment on it. If no existing issue seems appropriate, open a new issue to discuss the suggestion. In particular, we would appreciate discussing more substantial changes (for example, suggestions for new rules) in a dedicated issue before a pull request is sent.

If you are new to Git, read the manuscript or the Quick Guidelines to Git and GitHub; your input would be most valuable.

If, based on your contribution, you would like to be added as a co-author, please open an issue and provide your name and affiliation and a short description of your contribution or a link to the relevant issue and pull request.

Conversion from AsciiDoc

  • Any modifications to the text should be made to the manuscript.adoc file. This file is then converted to pdf and doc files automatically using pandoc and included in the main tex file.

Build document

Make sure Docker is installed, then run the following command:

bash build.sh

This creates a folder named manuscript-draft.

Disclaimer

The authors have no affiliation with Docker or Conda, nor any commercial entity mentioned in this article. The views described here reflect our own views without input from any third party organisation.

containers-rules-manuscript's People

Contributors

bgruening, biomadeira, blankenberg, hmenager, hroest, manabuishii, mr-c, osallou, pcm32, rajido, susheel, timosachsenberg, vdda, ypriverol

containers-rules-manuscript's Issues

Do not impose hard paths

I would also add a note against imposing hard-coded paths when using containerised software. I've seen many people expect data to be mapped to a specific directory in the container, e.g. /data or /inputs. This is bad practice because it limits the portability of the container and makes it hard to reuse. Along the same lines, a custom WORKDIR should be avoided in the container definition.

As general advice, container execution should be transparent. The software should behave the same whether or not it runs inside a container. In other words, users should be able to use the containerised software without having to adapt to how the container was built.
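The following is a minimal sketch of a Dockerfile in this spirit. It uses samtools from Bioconda purely as an example tool, and the `mambaorg/micromamba:1.5.8` base image and the pinned version are assumptions. The tool ends up on the PATH, and no WORKDIR or VOLUME ties the data to a fixed directory.

```dockerfile
# Sketch: no hard-coded data directory is imposed on the user.
# Base image tag and tool version are illustrative assumptions.
FROM mambaorg/micromamba:1.5.8

# Install the tool into the base environment; it is on the PATH at run time.
RUN micromamba install -y -n base -c conda-forge -c bioconda samtools=1.17 && \
    micromamba clean --all --yes

# Deliberately no "WORKDIR /data" and no "VOLUME /inputs":
# callers decide where their files live and pass paths as arguments.
```

Users then choose where their data lives at run time. For instance, they can mount the current directory at the same path and work from it with `docker run --rm -v "$PWD":"$PWD" -w "$PWD" <image> samtools flagstat input.bam`, where `<image>` and `input.bam` are placeholders. Paths then mean the same thing inside and outside the container.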

Note about "One tool, one container"

Just some thoughts about the first point "One tool, one container".

This is good practice for maximising container usability, and it makes perfect sense in some scenarios, for example the BioContainers project. In other contexts, IMO, it is too abstract a principle. Real-world applications need to compose many tools together; just think of bwa .. | samtools .. | etc.

In this scenario, having one tool per container is a limiting factor. It prevents that idiom and, more generally, the use of multiple tools in the same task.
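When a step genuinely needs several tools, one option is a single image that pins all of them. This is a sketch only: the micromamba base image, the Bioconda package versions and the file names in the usage line are assumptions.

```dockerfile
# Sketch: one image bundling the tools used together in one pipeline step,
# so that "bwa mem ... | samtools sort ..." can run inside a single container.
# Base image tag and package versions are illustrative assumptions.
FROM mambaorg/micromamba:1.5.8

# Pin every tool explicitly so the combination stays reproducible.
RUN micromamba install -y -n base -c conda-forge -c bioconda \
        bwa=0.7.17 samtools=1.17 && \
    micromamba clean --all --yes
```

The pipe then runs inside one container, e.g. `docker run --rm -v "$PWD":"$PWD" -w "$PWD" <image> bash -c "bwa mem ref.fa reads.fq | samtools sort -o aligned.bam -"`, where `<image>` and the file names are placeholders.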

Facetious Issue: GDPR Compliance

This is based on a recent internal EBI conversation about source code authorship in Git repositories. Is our recommendation to add the maintainer label compliant with the new GDPR regulations? The data stored would be:

  • Full name
  • Email Address

This is more a question for the bioinformatics registries (bio.tools and biocontainers.pro).

It could be argued that there is a legitimate interest in asking for this information, for the sake of reproducibility. But would we then need to comply with the right to be forgotten? That would mean deleting all of a user's maintainer labels, even in downstream multi-stage builds.

I'm not assuming that the right to be forgotten trumps legitimate interest, but do we have a legitimate reason for processing this information? We could argue that "we could keep personal information indefinitely in the public interest", but we would then need to define why it is in the public interest.

No Data

No data should be included in the container.

Upload your container image to a public registry or collection

A container should always be distributed along with the Dockerfile or recipe used to create it, for transparency and documentation purposes.

However, the availability of the Dockerfile does not guarantee the reproducibility of the container image and, consequently, of the associated data analyses. When a container image is re-created, one or more software packages may no longer be available.

To protect against software decay, upload your container images to a public registry such as DockerHub or Quay. Even better, use a community collection such as BioContainers, which manages versioning and long-term archiving of container images.
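One way to tie a pushed image back to its recipe is to use the standard OCI annotation keys as labels. In this sketch, samtools is only an example tool, the micromamba base image and tag are assumptions, and the repository URL is a hypothetical placeholder.

```dockerfile
# Sketch: record where the recipe lives and which version the image holds,
# so a pulled image can be traced back to its Dockerfile.
FROM mambaorg/micromamba:1.5.8

RUN micromamba install -y -n base -c conda-forge -c bioconda samtools=1.17 && \
    micromamba clean --all --yes

# Standard OCI annotation keys; the URL below is a hypothetical placeholder.
LABEL org.opencontainers.image.source="https://github.com/example-org/example-recipes"
LABEL org.opencontainers.image.version="1.17"
```

The image could then be tagged with an explicit version and pushed, e.g. `docker build -t quay.io/example-org/samtools:1.17 .` followed by `docker push quay.io/example-org/samtools:1.17`. The `example-org` namespace is a placeholder.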

Multi-stage builds

I think that multi-stage builds [1] address most of these recommendations and make them redundant:
https://github.com/ypriverol/containers-rules-manuscript/blob/master/manuscript.adoc#6-reduce-the-size-of-your-container-as-much-as-possible

I consider some of these recommendations actually harmful: combining multiple RUN commands makes the files hard to read and debug. Since multi-stage builds exist specifically to address point 6, I think we should recommend them, or at least add them to the list of suggestions.

  1. https://docs.docker.com/develop/develop-images/multistage-build/#name-your-build-stages
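A minimal sketch of a multi-stage build follows. It reuses the Comet download URL and archive layout from the "Suggestion to reorder recommendations" issue. The `debian:bookworm-slim` base image, the /opt/comet path and the `COMET_VERSION` name are assumptions for illustration. The pre-built binary is also assumed to run on the slim runtime image without extra shared libraries.

```dockerfile
# Global build argument; each stage that needs it redeclares it.
ARG COMET_VERSION=2016012

# --- Build stage: download tools and intermediate files live only here ---
FROM debian:bookworm-slim AS builder
ARG COMET_VERSION

# Keep apt-get update and install in one RUN so the package index is never stale.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates wget unzip

# Separate, readable steps: these layers never reach the final image.
RUN wget -O /tmp/comet.zip \
    "https://github.com/BioDocker/software-archive/releases/download/Comet/comet_binaries_${COMET_VERSION}.zip"
RUN unzip /tmp/comet.zip -d /tmp/comet
RUN install -D -m 755 \
    "/tmp/comet/comet_binaries_${COMET_VERSION}/comet.${COMET_VERSION}.linux.exe" \
    /opt/comet/comet

# --- Runtime stage: a clean base image plus only the finished binary ---
FROM debian:bookworm-slim
COPY --from=builder /opt/comet/comet /usr/local/bin/comet
```

wget, unzip, the apt lists and the zip archive all stay in the discarded builder stage. The individual RUN steps can therefore be kept separate for readability without enlarging the final image.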

Suggestion to reorder recommendations

This is an enhancement suggestion: reorder the recommendations to follow the logical flow of a typical Dockerfile. The idea would be to have an example (box) for each recommendation, plus a final example box that brings every recommendation together at the thirteenth recommendation, Provide reproducible builds.

New suggested order:

  1. Choose base image wisely
    FROM biocontainers/biocontainers:v1.0.0

  2. Tool and container versions should be explicit
    LABEL base_image="biocontainers:v1.0.0"
    LABEL version="3"
    LABEL software="Comet"
    LABEL software.version="2016012"

  3. [Proposal] Add appropriate LABELs pointing to software documentation, keywords and tags (can be merged into existing recommendations)
    LABEL about.summary="an open source tandem mass spectrometry sequence database search tool"
    LABEL about.home="http://comet-ms.sourceforge.net"
    LABEL about.documentation="http://comet-ms.sourceforge.net/parameters/parameters_2016010"
    LABEL extra.identifiers.biotools="comet"
    LABEL about.tags="Proteomics"

  4. Check the license of the software and add maintainer information
    LABEL about.license_file="http://comet-ms.sourceforge.net"
    LABEL about.license="SPDX:Apache-2.0"
    LABEL maintainer="Felipe da Veiga Leprevost <[email protected]>"

  5. [Proposal] Use ARG for build-time and ENV for runtime environment variables (can be merged into existing recommendations)
    ARG COMMET_VERSION="2016012"
    ENV PATH=/home/biodocker/bin/Comet:$PATH

  6. [Proposal] Add an explicit WORKDIR (can be merged into existing recommendations)
    WORKDIR /data/

  7. Reduce the size of your container as much as possible
    RUN ZIP=comet_binaries_${COMMET_VERSION}.zip && \
      wget https://github.com/BioDocker/software-archive/releases/download/Comet/$ZIP -O /tmp/$ZIP && \
      unzip /tmp/$ZIP -d /home/biodocker/bin/Comet/ && \
      chmod -R 755 /home/biodocker/bin/Comet/* && \
      rm /tmp/$ZIP
    Note the use of the build-time ARG in the RUN instruction. Downloading, unpacking and removing the archive happen in a single layer, so the zip file does not persist in the image.

  8. Relevant tools and software should be executable and in the PATH
    RUN mv /home/biodocker/bin/Comet/comet_binaries_${COMMET_VERSION}/comet.${COMMET_VERSION}.linux.exe /home/biodocker/bin/Comet/comet

  9. Document the build files
    TODO: Mention the possibility of interspersing the Dockerfile with # comments

  10. Add functional testing logic

  11. Avoid using ENTRYPOINT

  12. Provide a helpful usage message via CMD

  13. Provide reproducible builds

  14. Make your package or container discoverable
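The final example box proposed above could look roughly like the following. It assembles the fragments in the suggested order, and the comment lines illustrate "Document the build files". The functional test and the CMD rest on assumptions: the base image is taken to provide wget and unzip, as the fragments already assume, and running comet without arguments is taken to print a usage text mentioning its name.

```dockerfile
# Choose base image wisely: pin an explicit tag.
FROM biocontainers/biocontainers:v1.0.0

# Tool and container versions should be explicit.
LABEL base_image="biocontainers:v1.0.0"
LABEL version="3"
LABEL software="Comet"
LABEL software.version="2016012"

# Point to documentation, keywords and tags.
LABEL about.summary="an open source tandem mass spectrometry sequence database search tool"
LABEL about.home="http://comet-ms.sourceforge.net"
LABEL about.documentation="http://comet-ms.sourceforge.net/parameters/parameters_2016010"
LABEL extra.identifiers.biotools="comet"
LABEL about.tags="Proteomics"

# License and maintainer information.
LABEL about.license_file="http://comet-ms.sourceforge.net"
LABEL about.license="SPDX:Apache-2.0"
LABEL maintainer="Felipe da Veiga Leprevost <[email protected]>"

# ARG exists only while building; ENV persists when the container runs.
ARG COMMET_VERSION="2016012"
ENV PATH=/home/biodocker/bin/Comet:$PATH

# Explicit WORKDIR as proposed (the "Do not impose hard paths" issue argues against it).
WORKDIR /data/

# Download, unpack and clean up in a single layer to keep the image small.
RUN ZIP=comet_binaries_${COMMET_VERSION}.zip && \
  wget https://github.com/BioDocker/software-archive/releases/download/Comet/$ZIP -O /tmp/$ZIP && \
  unzip /tmp/$ZIP -d /home/biodocker/bin/Comet/ && \
  chmod -R 755 /home/biodocker/bin/Comet/* && \
  rm /tmp/$ZIP

# Expose the executable on the PATH under a short name.
RUN mv /home/biodocker/bin/Comet/comet_binaries_${COMMET_VERSION}/comet.${COMMET_VERSION}.linux.exe /home/biodocker/bin/Comet/comet

# Functional test (assumed behaviour): the build fails unless comet is on the
# PATH and its no-argument output mentions its name.
RUN command -v comet && comet 2>&1 | grep -qi comet

# No ENTRYPOINT, so users can run any command in the image.
# Without arguments the container runs comet, assumed to print its usage message.
CMD ["comet"]
```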
