metamags's People

Contributors

aclum, chienchi, hubin-keio, scanon

metamags's Issues

remove JGI mbin files from Docker folder

We do not have permission to distribute these files. They were added in commit 14ec4da in June 2023, and I have removed them from the master branch. We need to either update the mbin_nmdc task in the WDL so that it only requires Neha's container, njvarghese/mbin:v0.4, or work with Neha to include mbin_stats.py, which I believe is NMDC code, in a new version of the container.

Error in package tasks

We are seeing this error in some runs...

Dumping /pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags/ab68c879-fcb8-412f-af79-8db74291bb31/call-package/execution/stderr
Traceback (most recent call last):
  File "/opt/conda/envs/mags_vis/bin/create_tarfiles.py", line 174, in <module>
    krona_plot(ko_result,prefix)
  File "/opt/conda/envs/mags_vis/bin/create_tarfiles.py", line 127, in krona_plot
    df = pd.read_csv(ko_result,sep="\t")
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 229, in _open_handles
    errors=kwds.get("encoding_errors", "strict"),
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/common.py", line 614, in get_handle
    storage_options=storage_options,
  File "/opt/conda/envs/mags_vis/lib/python3.7/site-packages/pandas/io/common.py", line 396, in _get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>

I've created a snapshot for testing in /global/cfs/cdirs/m3408/squads/mags/package_failure on perlmutter.
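
The traceback shows pd.read_csv receiving None as the path, so one possible mitigation is to skip the Krona step when no KO result file is available. The sketch below is only an illustration of that guard: run_if_input_exists is a hypothetical helper, not code from create_tarfiles.py, and it does not explain why ko_result is None in the failing runs.

```python
import os
from typing import Callable, Optional


def run_if_input_exists(path: Optional[str], action: Callable[[str], None]) -> bool:
    """Call action(path) only when path names a non-empty file.

    Hypothetical helper: returns True if the action ran, False if it was skipped.
    """
    if path is None:
        # This is the case that currently reaches pd.read_csv and raises ValueError.
        return False
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    action(path)
    return True
```

Under that assumption, the call at line 174 of the script could be wrapped as run_if_input_exists(ko_result, lambda p: krona_plot(p, prefix)), so a missing KO table skips the plot instead of failing the package task.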

add eukcc component to binning workflow

JGI's eukcc database has been copied to /refdata/eukcc_db/eukcc2_db_ver_1.2

  • The mbin_nmdc task should be updated to use --eukccdb EUKCCDB to run eukcc.
  • The mbin_nmdc task and the workflow should declare a new output file, eukcc_output/eukcc.csv.final.
  • A new file type enum for eukcc.csv.final should be created in nmdc-schema.
  • The package task should be updated to make sure fungal bins get added to the final tarball.

The following IMG taxon OIDs can be used for testing; they are large assemblies but are known to produce euk bins. Alternatively, an existing metagenome can be spiked with a small eukaryote for testing purposes.
3300067032
3300059473
3300059591 - this project produces just 1 euk bin Eukaryota; Ascomycota; Dothideomycetes; Pleosporales; Phaeosphaeriaceae; Parastagonospora

silent error during call-mbin_nmdc

I noticed that none of our mags workflow results generate any bins, even for projects where we know the JGI processing generated MAGs (e.g., nmdc:wfmgan-11-585bp531.1; the JGI run of this data generated 27 medium- and high-quality MAGs, see IMG taxon oid 3300061644). I find this error in the call-mbin_nmdc stage: 'ERROR: Models must be parsed before identifying HMM hits.'
A quick search suggests an issue with the CheckM installation; see Ecogenomics/CheckM#280.

This issue is not caught because the last process of the task completes correctly, so the task exits with status 0. We should also set pipefail to catch these types of problems going forward.
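
To illustrate why pipefail matters here, the sketch below runs a stand-in pipeline (false | true, not the actual mbin_nmdc command) through bash with and without the option. run_pipeline is a hypothetical helper written for this example, and it assumes bash is on the PATH.

```python
import subprocess


def run_pipeline(command: str, pipefail: bool) -> int:
    """Run a shell pipeline in bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    result = subprocess.run(["bash", "-c", prefix + command])
    return result.returncode


if __name__ == "__main__":
    # Without pipefail, only the last command's status is reported.
    print("without pipefail:", run_pipeline("false | true", pipefail=False))
    # With pipefail, a failure anywhere in the pipeline is reported.
    print("with pipefail:   ", run_pipeline("false | true", pipefail=True))
```

In a WDL command block the usual equivalent is a set -o pipefail (often set -euo pipefail) line at the top of the script, though whether set -e is appropriate for this task is a separate judgment call.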

I can find 600 records on pscratch. This likely doesn't include deleted projects, and the issue dates back to at least early August 2023:
/pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags> grep 'ERROR: Models must be parsed before identifying HMM hits' */call-mbin_nmdc/execution/stdout | wc -l
600

cc @mbthornton-lbl

eukCC failed on quoted node names.

/pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags/f859549b-b820-48f7-ae8d-c1e8abacd6e2/call-mbin_nmdc/execution

raise NewickError("Unexpected newick format '%s' " %subnw[0:50])
ete3.parser.newick.NewickError: Unexpected newick format ''nmdc:wfmgas-11-wpm64z83.1_scf_321_c1_2':0.423943'
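
The node name in the error contains a colon, which is presumably why the tree writer quoted it. One possible workaround, sketched below, is to rewrite quoted labels into unquoted, sanitized ones before the tree is parsed. This is an assumption about the cause, not a confirmed fix, and unquote_newick_labels is a hypothetical helper that is not part of eukCC or ete3. Renaming the contigs before binning would be an alternative.

```python
import re

# A single-quoted Newick label; a doubled quote ('') inside is an escaped quote.
_QUOTED_LABEL = re.compile(r"'((?:[^']|'')*)'")

# Characters that have structural meaning in Newick and cannot appear unquoted.
_UNSAFE = re.compile(r"[\s:;,()\[\]']")


def unquote_newick_labels(newick: str, replacement: str = "_") -> str:
    """Replace quoted labels with unquoted labels made of safe characters."""

    def _fix(match: "re.Match[str]") -> str:
        label = match.group(1).replace("''", "'")
        return _UNSAFE.sub(replacement, label)

    return _QUOTED_LABEL.sub(_fix, newick)


if __name__ == "__main__":
    tree = "('nmdc:wfmgas-11-wpm64z83.1_scf_321_c1_2':0.423943,b:0.1);"
    print(unquote_newick_labels(tree))
```

Note that sanitized names no longer match the original contig IDs, so any downstream step that joins on node names would need the same substitution.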

Add support for passing a contig name mapping file to mbin.py

When NMDC runs workflows where the assembly was generated at JGI, the binning workflow needs a mapping file to convert between the contig names in the assembly FASTA file and those in the annotation files. We need to update the workflow so that there is an option to specify --map.

related to #26

Files that can be used to test this:
https://data.jgi.doe.gov/search?q=1414320&expanded=IMG_AP-1414320

assembly file: assembly.contigs.fasta
mapping file: Ga0597049_contig_names_mapping.tsv
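
For testing, it can help to see what such a mapping does to the FASTA headers. The sketch below assumes the mapping file is a two-column, tab-separated table of original name and new name; that layout, the function names, and the contig names in the example are all assumptions for illustration, and this is not how mbin.py itself applies --map.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def read_contig_map(handle: TextIO) -> Dict[str, str]:
    """Read an assumed two-column TSV (original name, new name) into a dict."""
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment rows
        mapping[row[0]] = row[1]
    return mapping


def rename_fasta_headers(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with the first header token replaced via the mapping."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            name, _, rest = header.partition(" ")
            new_name = mapping.get(name, name)  # unmapped names pass through
            yield ">" + new_name + (" " + rest if rest else "") + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Hypothetical names, used only to show the transformation.
    contig_map = read_contig_map(io.StringIO("scaffold_1\tGa0597049_0001\n"))
    fasta = [">scaffold_1 len=4\n", "ACGT\n"]
    print("".join(rename_fasta_headers(fasta, contig_map)), end="")
```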

Package MAGs with annotations

This would bring the MAG output more in line with what IMG currently generates.

As part of this we need to:

  • Generate the various annotation files filtered for each MAG (scripts exist for this already)
  • Package each of these as a zip bundle per MAG
  • Update the schema to match the new output model
  • Test the changes and validate that the output looks as expected (Emiley and IMG to verify)
  • Make an incremental update workflow to regenerate data from previous runs
  • Do the incremental processing for everything in the system.
  • Re-register the new results.
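
The first two steps above (filter annotations per MAG, then zip per MAG) can be sketched roughly as follows. This is not the existing filtering script; it assumes each annotation file is tab-separated with the contig ID in the first column, and the function names and archive naming scheme are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Dict, List, Set


def filter_by_contigs(lines: List[str], contigs: Set[str]) -> List[str]:
    """Keep non-comment lines whose first tab-separated field is in contigs."""
    return [
        ln for ln in lines
        if not ln.startswith("#") and ln.split("\t", 1)[0] in contigs
    ]


def bundle_mags(
    bins: Dict[str, Set[str]],
    annotations: Dict[str, List[str]],
    out_dir: Path,
) -> List[Path]:
    """Write one zip per MAG holding each annotation file filtered to that MAG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    bundles: List[Path] = []
    for bin_id, contigs in sorted(bins.items()):
        zip_path = out_dir / f"{bin_id}.zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for file_name, lines in annotations.items():
                kept = filter_by_contigs(lines, contigs)
                zf.writestr(f"{bin_id}_{file_name}", "".join(kept))
        bundles.append(zip_path)
    return bundles
```

Holding whole annotation files in memory is only reasonable for small tests; a real implementation would stream the files.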

review MAG output files

The output block below is from Neha, from summer 2023. Specifically, I'm not sure whether we are saving metabat-bins.tar.gz, which would be needed to implement the eukaryotic binning.

    output{
        #retaining the sdb is very important for loading into IMG
        File? sdb = "mbin.sdb"
        #flag file to indicate that no bins were generated
        File? nobins = "mbin.nobins"
        #flag file to indicate that pipeline ran through without error
        File? success = "mbin.success"
        #checkm results
        File? checkm = "checkm_qa.out"
        #gtdbtk results
        File? bacsum = "gtdbtk_output/gtdbtk.bac120.summary.tsv"
        File? arcsum = "gtdbtk_output/gtdbtk.ar122.summary.tsv"
        #retaining the metabat-bins folder is important for downstream Euk pipeline
        #NOTE: if the lineage SDB is provided, change below to 'filtered-metabat-bins.tar.gz'
        File? lqbins = "metabat-bins.tar.gz"
        #hq+mq bins folder
        File? hqmqbins = "hqmq-metabat-bins.tar.gz"
        #optional to retain depth file, only for reprocessing
        File? depth = "metabat.depth"
    } 
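
One way to review this is to check an existing execution directory for each file named in the output block. The sketch below does that; the file names are copied from the block above, while audit_outputs and the idea of pointing it at a call-mbin_nmdc execution directory are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict

# Output names and relative paths as listed in the WDL output block.
EXPECTED_OUTPUTS: Dict[str, str] = {
    "sdb": "mbin.sdb",
    "nobins": "mbin.nobins",
    "success": "mbin.success",
    "checkm": "checkm_qa.out",
    "bacsum": "gtdbtk_output/gtdbtk.bac120.summary.tsv",
    "arcsum": "gtdbtk_output/gtdbtk.ar122.summary.tsv",
    "lqbins": "metabat-bins.tar.gz",
    "hqmqbins": "hqmq-metabat-bins.tar.gz",
    "depth": "metabat.depth",
}


def audit_outputs(execution_dir: Path) -> Dict[str, bool]:
    """Report, per declared output, whether the file exists in execution_dir."""
    return {
        name: (execution_dir / rel_path).is_file()
        for name, rel_path in EXPECTED_OUTPUTS.items()
    }


if __name__ == "__main__":
    import sys

    for name, present in audit_outputs(Path(sys.argv[1])).items():
        print(f"{name}\t{'present' if present else 'MISSING'}")
```

Because every output is declared File?, a missing file does not fail the task, so a check like this only shows what was produced, not what was registered downstream.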
