crispridentify's People

Contributors

alexander-mitrofanov, amri2k, mefisto57

crispridentify's Issues

Cas gene predict

Hi,
Does the option “--cas True” work only if CRISPRcasIdentifier is installed?
Regards

Why provide precompiled binaries while using conda?

Hello

You provide prebuilt binaries for:

blast
fasta
hmmer
prodigal
clustalo
rnafold

These tools are available via conda, so why not add them to environment.yml and use those?

NB: among the binaries you provide, some are statically linked and some are not. If you wish to provide binaries, please at least try to provide statically linked ones.
E.g. on a stock ubuntu-20.04 Docker image, blastn fails because libidn is missing:

Singularity> ldd /opt/CRISPRidentify/tools/blasting/blastn 
	linux-vdso.so.1 (0x00007ffc78f42000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd60ee73000)
	libbz2.so.1 => /lib/x86_64-linux-gnu/libbz2.so.1 (0x00007fd60ee60000)
	libidn.so.11 => not found
	libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007fd60ee43000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd60ee38000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd60ece7000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd60ecc4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd60ead2000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd60eab7000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd60ee95000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd60eab1000)

regards

Eric
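
A check like the one above can be scripted. The following is a minimal sketch, assuming the text produced by ldd is already available as a string; the helper name missing_libraries is hypothetical and not part of CRISPRidentify.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved dependencies look like: "libidn.so.11 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Three lines taken from the ldd output quoted in the issue.
sample = (
    "    linux-vdso.so.1 (0x00007ffc78f42000)\n"
    "    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd60ee73000)\n"
    "    libidn.so.11 => not found\n"
)
print(missing_libraries(sample))
```

Running it over every bundled binary would show which ones depend on libraries that the target image lacks.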

Conda installation

Hi,
I can't get the environment to install, even after removing all builds from the environment file.
Do you have any idea what I could try?

CRISPRidentify.py must be run from CRISPRidentify directory only

Hello.

If one tries to run CRISPRidentify.py, it has to be done from the directory that contains CRISPRidentify.py.

E.g.

No input was provided
Elapsed time:  2.193450927734375e-05
(crispr_identify_env) > cd ../
(crispr_identify_env) > python3.7 foo/CRISPRidentify.py  
Traceback (most recent call last):
  File "CRISPRidentify/CRISPRidentify.py", line 13, in <module>
    from pipeline import Pipeline
ModuleNotFoundError: No module named 'pipeline'

This is due to the way the module search path is set: 'components/' is added to sys.path relative to the current working directory.
The following patch allows running from anywhere:

--- CRISPRidentify.py.ori   2021-06-21 17:41:51.922067095 +0000
+++ CRISPRidentify.py   2021-06-21 17:46:53.788085259 +0000
@@ -10,7 +10,8 @@
 import subprocess
 import warnings
 warnings.filterwarnings("ignore")
-sys.path.insert(0, 'components/')
+dir_path = os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(dir_path, 'components/'))
 from pipeline import Pipeline
 from components_ml import ClassifierWrapper
 import shutil

regards

Eric

License Terms?

Hi folks! Thanks for putting together this excellent repo. Have you picked a license to release this code under? I checked the repo and I'm not finding a license file with terms anywhere.

Output options

Dear all,
Is it possible to add an output option that generates only the "Complete_summary" files? I ran CRISPRidentify on assemblies containing multiple contigs, and it generated hundreds of folders with empty files.
Best wishes,
Sofia

No output for FASTA files with more than 42 lines

Hi,
Thanks for this good tool. I have noticed that if the FASTA file has more than 42 lines, the tool gives no output. Is that expected?
Best,
Xichuan

Please tag a new release

Hello,

master and the last tagged version differ a lot.
Can you please tag a new release?
Having a tagged version helps package maintainers.

regards

Eric

Single FASTA file TypeError

When executing a single fasta file using the arguments below the following occurs:

python CRISPRidentify.py --file TestInput/NC_019693.fa --json_report output.json      
Traceback (most recent call last):
  File "CRISPRidentify.py", line 265, in <module>
    run_over_one_file(complete_path_file, folder_result, pickle_folder, json_folder)
TypeError: run_over_one_file() takes 3 positional arguments but 4 were given

It seems that run_over_one_file() is called with one argument too many: the call passes json_folder as a fourth argument, but the function accepts only three positional arguments.
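
One possible way to reconcile the two call forms is to make the JSON folder an optional parameter. The sketch below is an assumption about how that could look, not the project's actual fix: the function body is a hypothetical stand-in that only records its arguments, and the "Results" and "Pickles" folder names are placeholders.

```python
from typing import Optional


def run_over_one_file(complete_path_file: str,
                      folder_result: str,
                      pickle_folder: str,
                      json_folder: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the real function: it only records its arguments."""
    return {
        "file": complete_path_file,
        "results": folder_result,
        "pickles": pickle_folder,
        "json": json_folder,  # None means no JSON report was requested
    }


# Three-argument call, as used when --json_report is not given.
without_json = run_over_one_file("TestInput/NC_019693.fa", "Results", "Pickles")
# Four-argument call, the form that raised the TypeError in the traceback.
with_json = run_over_one_file("TestInput/NC_019693.fa", "Results", "Pickles", "output.json")
print(without_json["json"], with_json["json"])
```

With a default of None, existing three-argument callers keep working while the JSON path becomes available where it is needed.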

No CRISPRidentify.py command

Hi,

Thanks for providing such a powerful tool!
I installed it with the suggested command "conda env create -f environment.yml", which finished successfully. However, I cannot run "python CRISPRidentify.py" after activating the environment, and I cannot find a "CRISPRidentify.py" file in any of the directories either. Could you kindly help with this problem?
Thank you very much in advance.

Best,
Ling-Dong

JSON output formatting

When using the JSON option the output format seems to be scrambled in some way.

For example:

"4265926   ......................  GGG                                                      s:0 i:0 d:0\n4265951   .........C..T.-T..--C.  CCCTTCCCTAAGAGGGAAGGGGGCTGGGGGGTTAGGTCTCTTTTTCAAACA      s:4 i:0 d:3\n4266021   ...........A..........                                                           s:1 i:0 d:0\n____________________________________________________________________________________________________\n          GGAATAAATATCGTTGCTGTAC 

If I read it correctly, this is an entry from the results folder, but it is just one long string with all sub-elements glued together. Is this how it is supposed to be?
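
The "\n" sequences in the excerpt are JSON escapes for line breaks, so the text looks glued together only while it is viewed as raw JSON. The sketch below uses a shortened, made-up report with a hypothetical key "array" to show that decoding the JSON and splitting the string restores the rows; the real report's keys are not shown in this issue.

```python
import json

# A shortened, hypothetical JSON report holding one array entry as a single string.
# The doubled backslashes put a literal \n escape into the JSON text.
raw_report = (
    '{"array": "4265926   ......  GGG   s:0 i:0 d:0\\n'
    '4265951   ...C..  CCC   s:4 i:0 d:3\\n"}'
)

report = json.loads(raw_report)       # decoding turns \n escapes into real newlines
rows = report["array"].splitlines()   # one list item per original output row
for row in rows:
    print(row)
```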

Summary

Hi,

First of all, thanks for this pipeline!
I used it on a multi-FASTA file and got a result folder for each contig, but no summary files.
It would be great to generate a single summary file with the results for all contigs, containing DR, spacer, start, end, type, etc. Do you think that would be possible?

Cheers,
Nico

allow CRISPRidentify to be run from anywhere :: not just archive dir

Hello,

I was trying to install CRISPRidentify using standard Unix conventions
and allow our users to run it from anywhere.

The paths to the tools and to the tool-specific data are broken, because they are hard-coded relative to the archive tree. E.g.
tools/vmatch/vmatch (tool hardcoded)
tools/prodigal/prodigal (tool hardcoded)
tools/hmm_search/hmmsearch --tblout result_hmm.out tools/hmm_search/models_tandem.hmm protein_results.fa (hmmsearch and model hardcoded)
tools/hmm_search/models_tandem.hmm (data hardcoded)

Tools and data should be searched for relative to the install directory.

regards

Eric
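
Resolving tools relative to the install directory could look roughly like the sketch below. It assumes the install directory is the one holding the entry script; the helper names install_dir, tool_path and hmmsearch_command are hypothetical, and the example script location under /opt is only an illustration.

```python
from pathlib import Path
from typing import List


def install_dir(script_file: str) -> Path:
    """Directory holding the entry script, with symlinks resolved."""
    return Path(script_file).resolve().parent


def tool_path(script_file: str, relative: str) -> Path:
    """Absolute path of a bundled tool or data file under the install directory."""
    return install_dir(script_file) / relative


def hmmsearch_command(script_file: str) -> List[str]:
    """The hmmsearch call quoted in the issue, with tool and model made absolute."""
    return [
        str(tool_path(script_file, "tools/hmm_search/hmmsearch")),
        "--tblout", "result_hmm.out",
        str(tool_path(script_file, "tools/hmm_search/models_tandem.hmm")),
        "protein_results.fa",
    ]


# In the real script one would pass __file__; a fixed example path is used here.
example_script = "/opt/CRISPRidentify/CRISPRidentify.py"
print(tool_path(example_script, "tools/vmatch/vmatch"))
print(hmmsearch_command(example_script))
```

Passing the command as a list to subprocess.run also avoids depending on the current working directory or on shell quoting.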

CRISPRidentify on conda

Hi

Are you planning on releasing CRISPRidentify on conda?
It would make it much easier to include in other tools.

Cheers,
Russel

TypeError: 'method' object is not subscriptable

  1. Run initial array detection
  2. Refine detected arrays
  3. Evaluate candidates
Traceback (most recent call last):
  File "CRISPRidentify.py", line 249, in <module>
    run_over_one_file(complete_path_file, folder_result, pickle_folder)
  File "CRISPRidentify.py", line 210, in run_over_one_file
    flag_dev_mode=FLAG_DEVELOPER_MODE)
  File "components/pipeline.py", line 30, in __init__
    self._run_evaluation()
  File "components/pipeline.py", line 60, in _run_evaluation
    flag_dev_mode=self.flag_dev_mode)
  File "components/module_evaluation.py", line 27, in __init__
    self._extract_features_and_evaluate()
  File "components/module_evaluation.py", line 65, in _extract_features_and_evaluate
    feature_vector = FeatureExtractor(0, crispr_candidate, list_features).extract()[0]
  File "components/components_evaluation.py", line 957, in extract
    self.list_spacers).output()
  File "components/components_evaluation.py", line 782, in __init__
    self._compute_similarity_repeats_spacers()
  File "components/components_evaluation.py", line 824, in _compute_similarity_repeats_spacers
    self.similarity_score_repeats = self._compute_similarity_repeats()
  File "components/components_evaluation.py", line 797, in _compute_similarity_repeats
    x = kernel_matrix(graphs, r=3, d=4)
  File "components/components_eden.py", line 158, in kernel_matrix
    data_matrix = vectorize(graphs, **opts)
  File "components/components_eden.py", line 136, in vectorize
    return Vectorizer(**opts).transform(graphs)
  File "components/components_eden.py", line 371, in transform
    feature_rows.append(self._transform(graph))
  File "components/components_eden.py", line 459, in _transform
    graph = self._graph_preprocessing(original_graph)
  File "components/components_eden.py", line 447, in _graph_preprocessing
    graph = _edge_to_vertex_transform(original_graph)
  File "components/components_eden.py", line 1005, in _edge_to_vertex_transform
    graph.nodes[n]['node'] = True
TypeError: 'method' object is not subscriptable
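
The last frame indexes graph.nodes, and the message says that nodes is a method on the object it received. A plausible cause, which is an assumption and not confirmed anywhere in this issue, is that the graph comes from networkx and the installed version is 1.x, where nodes is a method, whereas the code expects the indexable node view of networkx 2.x. The toy classes below are hypothetical and only reproduce the two behaviours without needing networkx.

```python
class MethodStyleGraph:
    """Toy graph whose `nodes` is a method, so `graph.nodes[n]` cannot work."""

    def __init__(self):
        self._attrs = {0: {}}

    def nodes(self):
        return list(self._attrs)


class ViewStyleGraph:
    """Toy graph whose `nodes` is a mapping-like attribute that supports indexing."""

    def __init__(self):
        self.nodes = {0: {}}


def mark_node(graph, n):
    # Same access pattern as the failing line in components_eden.py
    graph.nodes[n]['node'] = True


view_graph = ViewStyleGraph()
mark_node(view_graph, 0)  # indexing an attribute works

message = ""
try:
    mark_node(MethodStyleGraph(), 0)  # indexing a bound method fails
except TypeError as err:
    message = str(err)
print(message)
```

If that assumption holds, comparing the installed networkx version with the one pinned in environment.yml would be a reasonable first check.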

mkvtree: Illegal character '>' in file

Hi
I tried to analyze the CRISPR arrays of 6 complete E. coli genomes.
4 worked, but 2 showed mkvtree/vmatch errors.
It seems to be related to the number of contigs in the file.

1/GCF_000005845.2_ASM584v2_genomic.ID.fasta -works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000005845.2/
2/GCF_000007445.1_ASM744v1_genomic.ID.fasta -works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000007445.1/
3/GCF_000008865.2_ASM886v2_genomic.ID.fasta - not working
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000008865.2/
4/GCF_000009565.1_ASM956v1_genomic.ID.fasta -works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000009565.1/
5/GCF_000010245.2_ASM1024v1_genomic.ID.fasta -works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000010245.2/
6/GCF_000010385.1_ASM1038v1_genomic.ID.fasta -not working
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000010385.1/

Executing file 1 out of 6 (GCF_000005845.2_ASM584v2_genomic.fna)
  1. Run initial array detection
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Executing file 2 out of 6 (GCF_000007445.1_ASM744v1_genomic.fna)
  1. Run initial array detection
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Executing file 3 out of 6 (GCF_000008865.2_ASM886v2_genomic.fna)
  1. Run initial array detection
    mkvtree: Illegal character '>' in file "new_input.fa" line 2
    vmatch: cannot open file "new_input.fa.prj": No such file or directory
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Executing file 4 out of 6 (GCF_000009565.1_ASM956v1_genomic.fna)
  1. Run initial array detection
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Executing file 5 out of 6 (GCF_000010245.2_ASM1024v1_genomic.fna)
  1. Run initial array detection
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Executing file 6 out of 6 (GCF_000010385.1_ASM1038v1_genomic.fna)
  1. Run initial array detection
    mkvtree: Illegal character '>' in file "new_input.fa" line 2
    vmatch: cannot open file "new_input.fa.prj": No such file or directory
  2. Refine detected arrays
  3. Evaluate candidates
  4. Enhance evaluated arrays
  5. Complement arrays with additional info
  6. Write down the results

Thank you
G
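
The mkvtree message reports a '>' on line 2 of new_input.fa, an intermediate file whose contents are not shown here. As a first diagnostic, the input FASTA itself can be scanned for places where a '>' appears where sequence is expected. The sketch below is only a heuristic under that assumption; the function name suspicious_fasta_lines and the sample records are made up.

```python
from typing import List, Tuple


def suspicious_fasta_lines(text: str) -> List[Tuple[int, str]]:
    """Report 1-based line numbers where '>' shows up in an unexpected place."""
    problems = []
    previous_was_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            if previous_was_header:
                # Two headers in a row: the first record has no sequence.
                problems.append((number, "header directly after header (empty record)"))
            previous_was_header = True
        else:
            if ">" in line:
                problems.append((number, "'>' inside a sequence line"))
            if line.strip():
                previous_was_header = False
    return problems


broken = ">contig_1\n>contig_2\nACGT\n"
clean = ">contig_1\nACGT\n>contig_2\nTTGA\n"
print(suspicious_fasta_lines(broken))
print(suspicious_fasta_lines(clean))
```

An empty result for the two failing genomes would point toward the way the pipeline writes new_input.fa for multi-record files rather than toward the input files.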
