gwas's People

Contributors

bayramakdeniz, espenhgn, ofrei

gwas's Issues

Conceptual question: which tools fit together into one container?

What is our policy for putting multiple tools together into one container? Obviously, not all potential tools will fit into one container. I see pros and cons of distributing tools into multiple containers:

Pros:

  • it's good to keep the size of the container low (in terms of the GB it occupies on disk)
  • it's faster to rebuild the container from its Dockerfile
  • it helps to avoid conflicting dependencies

Cons:

  • if tools share the same prerequisites (e.g. standard Python or R packages), it's good to place them in one container.
  • if the software is just binaries (e.g. plink, king, gcta, metal), it's convenient for the user to have them in one container.

I hope it won't be too much work to re-arrange tools into multiple containers once we learn what fits together.

Develop TSD-specific instructions for singularity containers

It would be great to develop TSD-specific instructions for running these containers on TSD.
For specific projects (e.g. p33 and p697) we can have an official location within each project that hosts an up-to-date version of these singularity containers.

docker build failed

I'm able to docker pull bayramalex/all_analysis and use this container. However, docker build failed:

docker build .
Sending build context to Docker daemon  201.7kB
Step 1/43 : FROM 'ubuntu:18.04'
 ---> 56def654ec22
Step 2/43 : ENV TZ=Europe
 ---> Using cache
 ---> 961ed1ff64bb
Step 3/43 : ENV DEBIAN_FRONTEND noninteractive
 ---> Using cache
 ---> 74d871c6e530
Step 4/43 : RUN apt-get update && apt-get install -y  --no-install-recommends apt-utils    python3     python3-pip     tar     wget     unzip     git    libgsl0-dev    perl     &&     rm -rf /var/lib/apt/lists/*
 ---> Running in 0b176c189532
Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
  Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu bionic InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Reading package lists...
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-backports/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/bionic-security/InRelease  Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
Package perl is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  perl-base

Package apt-utils is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  apt

E: Package 'apt-utils' has no installation candidate
E: Unable to locate package python3
E: Unable to locate package python3-pip
E: Unable to locate package wget
E: Unable to locate package unzip
E: Unable to locate package git
E: Unable to locate package libgsl0-dev
E: Package 'perl' has no installation candidate
The command '/bin/sh -c apt-get update && apt-get install -y  --no-install-recommends apt-utils    python3     python3-pip     tar     wget     unzip     git    libgsl0-dev    perl     &&     rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100

Pin software versions

In the current Dockerfile recipes and bash installer files, versions of the different tools are (usually) not pinned. A (re)built container will therefore likely differ from day to day, particularly if packages are installed from sources like conda-forge, where updates are frequent.
Ideally, versions should be pinned explicitly in the recipes, e.g. like this:

FROM buildpack-deps:focal

RUN apt-get update && \
    apt-get install --no-install-recommends -y \
    cmake=3.16.3-1ubuntu1 \
    python3-dev=3.8.2-0ubuntu2
    ....
RUN pip install h5py==2.10.0 && \
    pip install git+https://github.com/NeuralEnsemble/parameters@b95bac2bd17f03ce600541e435e270a1e1c5a478#egg=parameters \
    ...
RUN git clone --depth 1 -b v3.1 https://github.com/nest/nest-simulator /usr/src/nest-simulator && \
    # compile
    ...

The above is just taken from another project of mine (complete example: https://github.com/LFPy/LFPykernels/blob/main/Dockerfile).

Version pinning is also a best practice suggested by Dockerfile linting tools like Hadolint (https://hadolint.github.io/hadolint/).

Clean folder structure within container

Currently all tools are placed in the root of the container:
[image: screenshot of the container's root directory]
I think it's best to have a separate folder for each tool, perhaps under a common top-level folder such as /tools. This would correspond to a structure like this:

/tools/flashpca-build   
/tools/generic-metal    
/tools/HDL                   
/tools/gctb_2.02_Linux  
/tools/htslib 
/tools/bcftools
/tools/vcftools
/tools/qctool_v2.0.6-Ubuntu16.04-x86_64
/tools/python_convert
/tools/BOLT-LMM_v2.3.4 

Further, I suggest removing things like "generic-", "_Linux", and "-Ubuntu16.04-x86_64" from the path, to have a clean name for each tool. As for the versions, I think it's good to set up a specific nomenclature, e.g. similar to how it's done on TSD:
[image: screenshot of versioned tool directories on TSD]
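
The renaming suggested above could be sketched in Python roughly as follows. The function name clean_tool_name and the affix lists are illustrative assumptions, built only from the three examples mentioned in this issue; version strings are deliberately left in place.

```python
import re

# Affixes named in this issue (assumption: extend as new tools are added).
PREFIXES = ("generic-",)
SUFFIX_PATTERNS = (r"_Linux$", r"-Ubuntu[\d.]+-x86_64$")


def clean_tool_name(name: str) -> str:
    """Strip platform/build affixes from a tool directory name."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
    for pattern in SUFFIX_PATTERNS:
        name = re.sub(pattern, "", name)
    return name


for original in ("generic-metal", "gctb_2.02_Linux", "qctool_v2.0.6-Ubuntu16.04-x86_64"):
    print(f"/tools/{original} -> /tools/{clean_tool_name(original)}")
```

Names without any of these affixes (e.g. htslib) pass through unchanged.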

Automate builds with travis-ci

To avoid breaking docker build, I think it's helpful to set up travis-ci builds. In practice this is done by (1) adding a .travis.yml file in the root of your repository, describing what needs to be done for every commit; (2) creating an account at https://travis-ci.org/ to manage your builds.

For your repo I think it makes sense to run docker build, just to see if it fails.
Here is some docker-specific info from travis-ci:
https://docs.travis-ci.com/user/docker/

Structure README.md file(s)

I suggest splitting the README.md file into several files, for example:

  • intro.md with general info about docker, singularity, interactive / passive modes, mounting data to containers, etc. Basically, everything users need to know about docker and singularity that is not specific to your containers.
  • One file per tool. For now I suggest we pick one tool, e.g. plink, and finalize its readme file, and then proceed to other tools.
  • root-level README.md file, serving as a table of contents and containing links to all other README files.

The readme file for each tool may have:

  • usage example
  • version of the tool
  • link to its original help
  • overview of demo data included (and its location)

add "--no-home" to all singularity commands

I suggest we add --no-home to our singularity commands, or at least clarify this in the README file.
I see there is also --pwd, and some interference between these options ( apptainer/singularity#4077 ).

By default, singularity mounts the user's home directory, so any software installed in the user's home folder may interfere with software provided by the container.
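
One way to make --no-home hard to forget is to assemble the command in a small wrapper. This is only a sketch: the helper name singularity_exec, the image name gwas.sif, and the /data directory are hypothetical, while --no-home, --bind, and --pwd are existing options of singularity exec.

```python
import shlex


def singularity_exec(image, command, bind=None, pwd=None):
    """Build a `singularity exec` argument list that never mounts $HOME."""
    args = ["singularity", "exec", "--no-home"]
    if bind:
        args += ["--bind", bind]  # host_path:container_path
    if pwd:
        args += ["--pwd", pwd]    # working directory inside the container
    return args + [image] + list(command)


# Hypothetical image and data directory, for illustration only.
cmd = singularity_exec("gwas.sif", ["plink", "--version"], bind="/data:/data", pwd="/data")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run as is; given the interference reported in the linked issue, it is worth checking how --pwd behaves together with --no-home on the installed singularity version.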

Test our deep learning software stack

I've made a quick container with tensorflow and other deep learning packages:
https://github.com/comorment/gwas/blob/main/containers/py3ml/Dockerfile
It builds, but I haven't tested it on TSD. For that we can use p33-appn-norment01, which has a GPU installed
(there was a problem with the nvidia drivers on that GPU; I'm not sure if this is already resolved).
Here is a useful link about GPU support in Docker containers:
https://towardsdatascience.com/how-to-properly-use-the-gpu-within-a-docker-container-4c699c78c6d1

MiXeR container

@bayramakdeniz I've tried to put MiXeR (https://github.com/precimed/mixer) into our python3 container - here is a quick change to the Dockerfile: 3888cfc

However, there is a weird issue. The native code of MiXeR is compiled into a shared object (/tools/mixer/src/build/lib/libbgmg.so). This shared object is then used from python via the ctypes library. To test that libbgmg.so is compatible with python, you may start python and try these commands:

import ctypes
ctypes.CDLL('/tools/mixer/src/build/lib/libbgmg.so')

Currently this gives an error.
Could you compile on your machine and test if this works?
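
To see why the load fails, it may help to print the loader's own message rather than just the traceback. A minimal sketch, assuming the library path from this issue; try_load is a hypothetical helper name.

```python
import ctypes
import os


def try_load(path: str):
    """Return (True, handle) if the shared object loads, else (False, error text)."""
    if not os.path.exists(path):
        return False, f"file not found: {path}"
    try:
        return True, ctypes.CDLL(path)
    except OSError as err:
        # The message usually names the missing dependency or undefined symbol.
        return False, str(err)


ok, result = try_load("/tools/mixer/src/build/lib/libbgmg.so")
print("loaded" if ok else f"failed: {result}")
```

If the message mentions a missing shared library or an undefined symbol, running ldd on libbgmg.so inside the container is a common next step.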

Learn how to expose webservices, e.g. Jupyter Notebook

I'm fairly sure both singularity and docker can expose a webservice, i.e. something like Jupyter Notebook that runs within the container but is accessed from the host via a browser. Can we include instructions on how to use this, for example with Jupyter Notebook?

Move this git repo to its permanent location

I don't have a strong opinion about where this should go; feel free to decide.
This could be https://github.com/comorment , to highlight that these tools are developed as part of comorment.
Or this could be in your own github account, i.e. https://github.com/bayramakdeniz/gwas . I think keeping it under your account also helps to promote your package on github, because people trust that you are responsible for maintaining it. This is how it's done for plink ( https://github.com/chrchang/plink-ng/ ).

My only suggestion is to have a short name (e.g. https://github.com/bayramakdeniz/gwas ), avoid underscores, and use only lowercase in the repo name.

MATLAB within docker and/or singularity

It would be good to find out how to package MATLAB software into a docker and/or singularity container. I think this can be achieved with MATLAB Compiler (https://se.mathworks.com/products/compiler.html ) and MATLAB Runtime ( https://se.mathworks.com/products/compiler/matlab-runtime.html ). This would be easier to explore using a simple matlab program and a separate docker container, just to test this particular issue of running matlab within docker / singularity.
