macrel's People

Contributors

celiosantosjr, dependabot[bot], hiramhe, luispedro, psj1997

macrel's Issues

input file: macrel.out.all_orfs.faa vs macrel.out.smorfs.faa

Hi Luis,

I ran macrel contigs but it got interrupted because of the peptides issue (now solved).

That means I have (I think) the input file ready to feed to macrel peptides without having to run starting from the contigs again.

macrel contigs produced two output files:
macrel.out.all_orfs.faa
macrel.out.smorfs.faa

Should macrel peptides be run on one, or the other, or both?

Example:

macrel peptides \
    --fasta macrel.out.all_orfs.faa \    <- or : macrel.out.smorfs.faa or macrel.out.*.faa   ???
    --output out_peptides \
    -t 8

Thank you!
Dany

paladin issue while running macrel abundance

Hello, I'm currently trying to run:

macrel abundance -1 may10.fq.gz --fasta macrelabundancepeptides.faa --output out_abundancemay15latest --force

However, I am receiving this error:

[main] Version: 1.3.1
[main] CMD: paladin index -r3 /tmp/tmpdrzjismz/paladin.faa
[main] Real time: 0.051 sec; CPU: 0.009 sec
align: invalid option -- 'z'
Output folder already exists, but --force flag was used
Traceback (most recent call last):
  File "/home/user/.conda/envs/macrelabundancemay15/bin/macrel", line 10, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/site-packages/macrel/main.py", line 340, in main
    do_abundance(args, tdir,logfile)
  File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/site-packages/macrel/main.py", line 195, in do_abundance
    subprocess.check_call([
  File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['paladin', 'align', '-t', '1', '-T', '20', '-f', '10', '-z', '11', '-a', '-V', '-M', '/tmp/tmpdrzjismz/paladin.faa', '/tmp/tmpdrzjismz/preproc.fq.gz']' returned non-zero exit status 1.

The reads file is 14 GB.

Looking forward to your help and feedback.
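
The log above comes from paladin 1.3.1, while another report on this page shows paladin 1.4.6 accepting the same `align` flags, including `-z`. Whether the version difference is the actual cause is an assumption, not a confirmed diagnosis. A minimal sketch for pulling the version out of captured paladin output so that environments can be compared; it assumes the `[main] Version:` line format shown in the log, and `paladin_version` is a hypothetical helper name:

```python
import re

def paladin_version(log_text):
    """Return the version from a '[main] Version: X.Y.Z' line as a tuple of ints.

    Returns None when no such line is present in the captured output.
    """
    match = re.search(
        r"^\[main\] Version: (\d+(?:\.\d+)*)\s*$", log_text, flags=re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

# Log excerpts copied from the two reports on this page.
failing = (
    "[main] Version: 1.3.1\n"
    "[main] CMD: paladin index -r3 /tmp/tmpdrzjismz/paladin.faa\n"
)
working = "[main] Version: 1.4.6\n"

# Tuples compare element by element, so this orders versions numerically.
print(paladin_version(failing), paladin_version(working))
print(paladin_version(failing) < paladin_version(working))
```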

clustering macrel's output

Hi Luis,

I am using cd-hit to cluster Macrel's output. Do you have better ideas?
Out of 70k predicted AMPs, only about half of them get clustered. Can you advise on tools to use to cluster Macrel's output?

This is my current use of cd-hit on Macrel's output:

cd-hit -i macrel.out.prediction.all.fasta -o /.../.../cd_hit_onMacrel98 -c 0.98 -n 5 -d 0 -M 30000 -T 10

Thank you
Dany
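
To see how the sequences are distributed across clusters, one option is to count cluster sizes from the `.clstr` file that cd-hit writes alongside the `-o` output. A minimal sketch, assuming the usual `.clstr` layout (a `>Cluster N` header followed by one line per member); the function names and the sample records are made up for illustration:

```python
def cluster_sizes(clstr_lines):
    """Return a list with the number of members in each cluster."""
    sizes = []
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)      # a header line opens a new cluster
        elif sizes:
            sizes[-1] += 1       # any other line is a member of the current cluster
    return sizes

def summarize(sizes):
    """Count clusters, singleton clusters, and total sequences."""
    return {
        "clusters": len(sizes),
        "singletons": sum(1 for size in sizes if size == 1),
        "sequences": sum(sizes),
    }

# Illustrative records only, not real Macrel output.
sample = """>Cluster 0
0\t25aa, >pepA... *
1\t25aa, >pepB... at 98.00%
>Cluster 1
0\t31aa, >pepC... *
""".splitlines()

print(summarize(cluster_sizes(sample)))
```

A large share of singletons at `-c 0.98` would simply mean that many predicted peptides have no neighbour at 98% identity, which is worth checking before switching tools.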

running problem

Hi, I have tested it with my own data. The following warning message was shown:
/opt/conda/lib/python3.10/site-packages/sklearn/base.py:299: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 1.2.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn("
Could you help me fix it? Thanks a lot.

tmp file does not exist error

I encountered an error while running the abundance subcommand:
macrel abundance -1 SRR16178793_1.fastq.gz -2 SRR16178793_2.fastq.gz --fasta peptide.fasta --output ./output/SRR16178793_abun --tag ./outtag/SRR16178793_outtag -t 16

The error is:
......
[M::mem_process_seqs] Processed 3299460 protein sequences in 743.597 CPU sec, 47.076 real sec
[M::process] Read 714582 protein sequences (34476126 AA)...
[M::mem_process_seqs] Processed 3298914 protein sequences in 688.743 CPU sec, 43.647 real sec
[M::mem_process_seqs] Processed 714582 protein sequences in 146.474 CPU sec, 9.278 real sec
[M::renderNumberAligned] Aligned 34710242 out of 51321549 total detected ORF sequences (67.63%)
[main] Version: 1.4.6
[main] CMD: paladin align -t 16 -T 20 -f 10 -z 11 -a -V -M /tmp/tmpdcdn385h/paladin.faa /tmp/tmpdcdn385h/preproc.pair.1.fq.gz
[main] Real time: 8120.267 sec; CPU: 84019.436 sec
NGLess v1.5.0 (C) NGLess authors
https://ngless.embl.de/

When publishing results from this script, please cite the following references:

     - Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
     NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
     Microbiome 7:84 (2019). DOI: https://doi.org/10.1186/s40168-019-0684-8

[Mon 29-04-2024 21:57] Line 9: /tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)
Exiting after fatal error:
/tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)

Traceback (most recent call last):
  File "/path/to/bin/macrel", line 10, in <module>
    sys.exit(main())
  File "/path/to/macrel/main.py", line 371, in main
    do_abundance(args, tdir,logfile)
  File "/path/to/macrel/main.py", line 222, in do_abundance
    subprocess.check_call([
  File "/path/to/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ngless', '--no-create-report', '--quiet', '-j', '16', '/path/to/scripts/count.ngl', '/tmp/tmpdcdn385h/paladin.out.sam', './output/SRR16178793/ ./outtag/.abundance.txt']' returned non-zero exit status 1.

Could you please help me work out what caused the error? Thanks a lot.

conda install -c bioconda macrel not working

Hello, the conda install command doesn't work and gives the following message. Could I ask for help?


Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • macrel

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Contigs mode is not returning AMPs

It seems that excision of the N-terminal methionine is not working when macrel is called through this mode, which may be affecting the predictions. Also, the --keep-negatives option seems to be enabled by default in this mode.

Consider using `pyrodigal` for small ORFs detection

Hi!

I normally don't like making this kind of advertisement out of the blue, but this could be of interest to you: in the last two years I developed Python bindings for Prodigal in a package named pyrodigal, and I just released a new version that supports setting a custom minimum gene length. Maybe this could save you the trouble of having to compile and maintain a customized Prodigal fork just for this feature.

Cheers!

macrel peptides - version problems

Hi Luis,

I am having a problem with running macrel peptides.
macrel contigs is running fine, but I am guessing that is because it hasn't reached the peptide step yet.

Steps I ran:

conda create -n macrel_env
conda activate macrel_env
conda install -c bioconda macrel
macrel get-examples
macrel peptides \
    --fasta example_seqs/expep.faa.gz \
    --output out_peptides \
    -t 4

The error message from the last command was:

rpy2.rinterface_lib.embedded.RRuntimeError: Error: package or namespace load failed for ‘Peptides’:
 package ‘Peptides’ was installed before R 4.0.0: please re-install it

I tried a few re-install attempts, none of which worked. I hope you can suggest a solution.

Thank you,
Dany

No prediction output when using macrel reads

I have been testing out the functions of macrel with some of my own data. I used the contigs subcommand without issues and it produced all the expected files. I then ran the reads subcommand on some metagenomic data, but I have not received the AMP prediction output for any of the sequences I have tested.

Because of this, I decided to run the reads subcommand on the test files that you provide, to see whether I could reproduce all of the correct files. However, I am also not receiving the expected macrel.out.prediction.gz file for the test reads that you provide.

Here is the code that I am using to produce the data:
macrel reads -1 ./test_reads_data/R1.fq.gz -2 ./test_reads_data/R2.fq.gz --output expected_output

With this command I get macrel.out.all_orfs.faa and macrel.out.smorfs.faa, but the reads subcommand never produces the macrel.out.prediction.gz file for any input data I have used.

UserWarning: Trying to unpickle estimator DecisionTreeClassifier

Hi Luis,

Just wanted to know if this is something I should worry about.

I am using the conda installed macrel, using a conda environment.

Warning message:
In options(stringsAsFactors = TRUE) :
  'options(stringsAsFactors = TRUE)' is deprecated and will be disabled
/shared/homes/12705859/miniconda3/envs/macrel_env/lib/python3.6/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
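
The warning itself states which scikit-learn version the model was saved with and which one is running. A minimal sketch that extracts both from the message text, assuming the wording shown above; `parse_unpickle_warning` is a hypothetical helper, not part of macrel or scikit-learn:

```python
import re

def parse_unpickle_warning(message):
    """Extract the estimator name and both versions from the warning text.

    Returns None when the message does not match the expected wording.
    """
    match = re.search(
        r"Trying to unpickle estimator (\w+) "
        r"from version (\d+(?:\.\d+)*) "
        r"when using version (\d+(?:\.\d+)*)",
        message,
    )
    if match is None:
        return None
    estimator, pickled_with, running = match.groups()
    return {"estimator": estimator, "pickled_with": pickled_with, "running": running}

# Message text copied from the warning above.
warning = (
    "UserWarning: Trying to unpickle estimator DecisionTreeClassifier from "
    "version 0.22.1 when using version 0.23.2. This might lead to breaking "
    "code or invalid results. Use at your own risk."
)

info = parse_unpickle_warning(warning)
print(info)
print("versions match:", info["pickled_with"] == info["running"])
```

If the two versions differ, one conservative option is to test in an environment that has the `pickled_with` version installed and compare predictions; whether the mismatch changes results here is not something the warning alone can tell.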

The prediction input file problem

Hi, I have tested gut 16S amplicon data from many hosts, such as duck, black soldier fly, silkworm, and housefly, as well as some environmental metagenomes sequenced on Illumina (paired-end). Unfortunately, the prediction .gz files contain no results. I also tested with the example data, and the result was normal. I don't know what the problem is.

AttributeError: module 'pyrodigal' has no attribute 'OrfFinder'

I encountered an error while running the contigs subcommand:

macrel contigs \
    --fasta example_seqs/excontigs.fna.gz \
    --output out_contigs

The error is as follows.

Traceback (most recent call last):
  File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/bin/macrel", line 10, in <module>
    sys.exit(main())
  File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/main.py", line 331, in main
    do_smorfs(args, tdir,logfile)
  File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/main.py", line 146, in do_smorfs
    predict_genes(args.fasta_file, all_peptide_file)
  File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/ORFs_prediction.py", line 34, in predict_genes
    gorf, morf_finder = create_pyrodigal_orffinder()
  File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/ORFs_prediction.py", line 4, in create_pyrodigal_orffinder
    gorf = pyrodigal.OrfFinder(closed=True,
AttributeError: module 'pyrodigal' has no attribute 'OrfFinder'

How can I solve this problem?

Run terminates with error while on conda

I get this error partway through running macrel in a conda environment:

Traceback (most recent call last):
  File "/user/conda_macrel_env/bin/macrel", line 11, in <module>
    sys.exit(main())
  File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/main.py", line 282, in main
    do_predict(args, tdir)
  File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/main.py", line 241, in do_predict
    fs)
  File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/AMP_predict.py", line 7, in predict
    model1 = pickle.load(gzip.open(model1, 'rb'))
ModuleNotFoundError: No module named 'sklearn'

The command line I used after activating the conda macrel environment was:
macrel contigs --fasta test_bacterium_genome.fasta --output test_contigs

I would appreciate your feedback here.

Retrain Macrel model?

Is it possible to retrain the model used in Macrel with new training data?

I'm trying to optimize specifically for shorter peptides (< 50 aa), but the training data used in the Macrel paper (downloaded from the original Bhadra 2018 paper) contains many much longer peptides. I found that retraining the model in amPEPpy with only shorter peptides improved accuracy on my data, so I was hoping to try the same with Macrel.

Thanks,
Carter
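
For reference, the filtering step described above can be sketched as follows. This does not show how to retrain Macrel; it only restricts a FASTA training set to peptides shorter than 50 aa. The function names and the sample sequences are made up for illustration:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def shorter_than(records, max_len=50):
    """Keep only records whose sequence is strictly shorter than max_len."""
    return [(header, seq) for header, seq in records if len(seq) < max_len]

# Illustrative sequences only, not taken from any training set.
sample = [
    ">short_peptide",
    "GIGKFLHSAKKFGKAFVGEIMNS",
    ">long_peptide",
    "M" * 40,
    "K" * 40,
]

kept = shorter_than(read_fasta(sample))
print([header for header, _ in kept])
```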