jasen's Issues

Create JSON megafile

Put all the JSON into one megafile.
The work lies in going through the files and assembling them in a sensible way.

Version output format

The output should contain a "version" field that describes the output format version. This would simplify both writing parsers and working with older files.
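A minimal sketch of how such a field could be written and checked. The version number, the function names, and the rule of rejecting unknown versions are assumptions for illustration; the issue does not specify a numbering scheme.

```python
import json

# Hypothetical format version; the issue does not define a numbering scheme.
OUTPUT_FORMAT_VERSION = 1


def wrap_result(result: dict) -> dict:
    """Return a copy of the result with a top-level "version" field."""
    return {**result, "version": OUTPUT_FORMAT_VERSION}


def read_result(text: str) -> dict:
    """Parse a result document, rejecting versions this parser does not know."""
    doc = json.loads(text)
    version = doc.get("version")
    if version != OUTPUT_FORMAT_VERSION:
        raise ValueError(f"unsupported output format version: {version!r}")
    return doc
```

A parser that supports several versions could dispatch on the field instead of raising.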

Clean up old files

Files and comments are going in every direction. Do a quick review of everything and rename/remove whatever isn't necessary anymore.

Establish & Link MRSA dataset

  • Package and upload to NGP (Isak to support)
  • Upload to SRA (requires agreement from each party)
  • Provide easily parsable results so that they can be validated easily

Installation: All Singularity images retrievable without self-builds

I installed Singularity using Conda.
Then I got this error:

(jasen) [JASEN]$ cd container && sudo bash -i build_container.sh && cd ..
[sudo] password for xxx:
Building tool chewbbaca to chewbbaca_2.8.5.sif
sudo: singularity: command not found
(jasen) [container]$ singularity version
3.8.6

Is it possible to install JASEN without sudo?

Input data validation script

Create a script to verify the presence of the fastq files, the runtime settings file and the sample metadata file (hypothetical). The script should also perform basic sanity checks (that a fastq file really is a fastq, etc.).
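A rough sketch of the fastq part of such a script. The function names are hypothetical, and the sanity check only inspects the first record, so it is a cheap first filter rather than a full format validation. The settings and metadata files are not covered here.

```python
import gzip
from pathlib import Path


def open_text(path: Path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def looks_like_fastq(path: Path) -> bool:
    """Check that the first record has the four-line FASTQ layout."""
    with open_text(path) as handle:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
    header, sequence, separator, quality = record
    return (
        header.startswith("@")
        and separator.startswith("+")
        and len(sequence) > 0
        and len(sequence) == len(quality)
    )


def validate_inputs(paths: list[Path]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for path in paths:
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif not looks_like_fastq(path):
            problems.append(f"not a FASTQ file: {path}")
    return problems
```

Returning a list of problems rather than failing on the first one lets the user fix all input issues in a single pass.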

Iontorrent support

Since a few labs, including the Swedish public health institute, are using Iontorrent for sequencing, we need to support it.

  • Add a feature to parse the value "iontorrent" in the .csv input file (the "platform" field).
  • Optionally add a field to the .csv file so that single-ended (SE) Illumina files are supported as well as paired-end (PE) ones.
  • Modify the processes that use fastq files so that they can work with SE data.
  • Modify the parameters of the de novo assemblers to optimize for the different error profile of Iontorrent data. This applies especially to SPAdes, I think.

The error profile of Iontorrent data will likely cause additional problems down the line, but these changes should at least enable us to analyse the data.

Enforce limitation on characters in sample ids?

Should JASEN enforce limitations on what a sample ID may look like, to avoid potential issues with character encoding etc.?

IDs could be limited to:

  • ASCII characters
  • Minimum 5 characters long
  • Maximum 50 characters long
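The proposed limits could be checked with something like the sketch below. The function and constant names are hypothetical, and the limits are only the ones suggested in this issue, not a decided policy.

```python
# Limits taken from the proposal above; not a decided policy.
MIN_LENGTH = 5
MAX_LENGTH = 50


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if the ID is ASCII-only and 5 to 50 characters long."""
    return sample_id.isascii() and MIN_LENGTH <= len(sample_id) <= MAX_LENGTH
```

Note that plain ASCII still allows spaces and slashes, so a stricter character whitelist may be worth considering if IDs end up in file names.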

Results into json format script

Multiple tools don't produce json compatible output. In some cases it's just a matter of translating a tsv into a json, in some cases it is more complicated. Write small scripts to produce json compatible output for the following tools/steps/paths:

  • Decontamination/kraken2
  • Resistance lookup/ariba
  • Assembly QC/quast
  • Reference based QC/samtools, picard
  • SNP lookup/bcftools etc.
  • MLST/tseemann's mlst implementation
  • cgMLST/chewBBACA
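For the simple case where a tool writes a TSV with a header row, the translation could look like the sketch below. The function name is hypothetical, all values are kept as strings, and tool-specific layouts (multiple header lines, comment lines, nested results) would need their own handling.

```python
import csv
import io
import json


def tsv_to_records(text: str) -> list[dict]:
    """Turn TSV text with a header row into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def tsv_to_json(text: str) -> str:
    """Serialize the parsed rows as a JSON array."""
    return json.dumps(tsv_to_records(text), indent=2)
```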

Repository security with reviewing PR and unit testing

Despite being more bureaucratic and slower, I think it would be good to implement reviewer approval rules, i.e. at least one person must approve a PR before it is merged into the main branch. Also, utilising GitHub Actions to run tests (linting and unit tests, similar to nf-core) would be elegant.

ariba prepareref FileNotFoundError

What needs to be done:

Essentially, deploy_references.sh did not report that the ariba prepareref command had failed:
ariba prepareref -f nucleotide_fasta_protein_homolog_model.fasta --all_coding yes --force tmpdir
FileNotFoundError: [Errno 2] No such file or directory: 'card/00.info.txt'

Suggestions on how to get it done:

Catch it in bash and fail the installation process

What are the arguments for getting it done:

Easily obscured installation errors that can cost tons of debug time

Pipeline result processor (prp) unable to deal with novel mlst calls

When a novel MLST call is identified (see below), prp throws an error because it expects an integer. Here is an example of novel MLST output:

[
   {
      "filename" : "GMS18-fohm.fasta",
      "scheme" : "saureus",
      "id" : "GMS18-fohm.fasta",
      "alleles" : {
         "gmk" : "6",
         "glpF" : "8",
         "arcC" : "~862",
         "tpi" : "3",
         "yqiL" : "2",
         "pta" : "10",
         "aroE" : "14"
      },
      "sequence_type" : "-"
   }
]

Here is the error output:

Traceback (most recent call last):
  File "/data/bnf/sw/miniconda3/envs/jasen/bin/prp", line 33, in <module>
    sys.exit(load_entry_point('prp', 'console_scripts', 'prp')())
  File "/data/bnf/sw/miniconda3/envs/jasen/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/bnf/sw/miniconda3/envs/jasen/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/bnf/sw/miniconda3/envs/jasen/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/bnf/sw/miniconda3/envs/jasen/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/bnf/sw/miniconda3/envs/jasen/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/bnf/dev/ryan/JASEN/bin/pipeline_result_processor/prp/cli.py", line 102, in create_output
    res: MethodIndex = parse_mlst_results(mlst)
  File "/data/bnf/dev/ryan/JASEN/bin/pipeline_result_processor/prp/parse/typing.py", line 17, in parse_mlst_results
    result_obj = TypingResultMlst(
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for TypingResultMlst
alleles -> arcC
  value is not a valid integer (type=type_error.integer)
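One possible direction, sketched without pydantic and without reproducing prp's actual TypingResultMlst model: accept either an integer or the raw string for each allele, and treat a non-numeric sequence type such as "-" as missing. The function names are hypothetical.

```python
from typing import Optional, Union


def parse_allele(call: str) -> Union[int, str]:
    """Return an int for a plain numeric call, otherwise the raw string."""
    if call.isascii() and call.isdigit():
        return int(call)
    return call  # e.g. "~862" from the example output is kept as-is


def parse_sequence_type(value: str) -> Optional[int]:
    """Return the sequence type as an int, or None when it is not numeric."""
    if value.isascii() and value.isdigit():
        return int(value)
    return None
```

Applied to the example above, "gmk" becomes 6, "arcC" stays "~862", and the sequence type "-" becomes None. In a pydantic model the same idea would correspond to widening the field type to accept both integers and strings.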

Clean up installation process

  • Ready for others to install/run
  • Merge PRs
  • Add standard reference databases
  • Write deployment scripts
  • Update instructions
  • Test run

Moving PubMLST and BLAST dbs outside of singularity image

Essentially, it would be nice to support keeping the PubMLST and BLAST databases outside of the Singularity image. Scripts to update the respective external databases would make updating them much easier. Alternatively, a script to update them within the image would be neat as well, if this is at all possible. This should be modelled on the db updating process.

Automatic submission function for new MLST types

When encountering novel STs in the 7-locus MLST scheme, it would be nice to be able to submit them directly to PubMLST via the REST API. I think it should at least be possible via https://bigsdb.readthedocs.io/en/latest/rest.html#post-db-submissions

There are two scenarios:

  • Novel allele. This probably needs to be a two-step operation: first submit the new allele sequence and await manual curation, which usually happens within a week; then submit the new allele combination and wait for the next round of manual curation by the PubMLST admins.
  • Novel ST. Here one would skip step 1 of the previous description.

All in all, it is an asynchronous operation where we have to await manual curation at each step before we can pull the new data (also via the API) and then update our local db.
