backofenlab / crispridentify

License: MIT License

Python 100.00%

crispridentify's People

liupfskygre healthvivo hezuogongying niccw omarcabrero senaj

crispridentify's Issues

Cas gene prediction

Hi,
Is the "--cas True" option only valid when CRISPRcasIdentifier is installed?
Regards

Why provide precompiled binaries when using conda?

Hello

You provide prebuilt binaries for:

blast
fasta
hmmer
prodigal
clustalo
rnafold

Those tools are available via conda; why not list them in environment.yml and use them?

NB: among the binaries you provide, some are statically linked and some are not. If you wish to provide binaries, please at least provide statically linked ones.
E.g., on a stock Ubuntu 20.04 Docker image, blastn fails due to a missing libidn:

Singularity> ldd /opt/CRISPRidentify/tools/blasting/blastn 
	linux-vdso.so.1 (0x00007ffc78f42000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd60ee73000)
	libbz2.so.1 => /lib/x86_64-linux-gnu/libbz2.so.1 (0x00007fd60ee60000)
	libidn.so.11 => not found
	libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007fd60ee43000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd60ee38000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd60ece7000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd60ecc4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd60ead2000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd60eab7000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd60ee95000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd60eab1000)

Regards

Eric
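A quick way to see which bundled binaries are affected is to run `ldd` on every executable under the tools directory and collect the libraries reported as `not found`, as in the output above. A minimal sketch, assuming a Linux host with `ldd` on the PATH; `missing_libs` and `check_binaries` are hypothetical helper names, not part of CRISPRidentify.

```python
from __future__ import annotations

import os
import shutil
import subprocess
from pathlib import Path


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binaries(tools_dir: str) -> dict[str, list[str]]:
    """Map each executable under tools_dir to its unresolved shared libraries."""
    if shutil.which("ldd") is None:
        raise RuntimeError("ldd is not available on this system")
    report: dict[str, list[str]] = {}
    for path in sorted(Path(tools_dir).rglob("*")):
        if not (path.is_file() and os.access(path, os.X_OK)):
            continue
        proc = subprocess.run(["ldd", str(path)], capture_output=True, text=True)
        # Statically linked or non-ELF files make ldd exit non-zero; nothing to resolve there.
        if proc.returncode != 0:
            continue
        libs = missing_libs(proc.stdout)
        if libs:
            report[str(path)] = libs
    return report
```

Calling `check_binaries("/opt/CRISPRidentify/tools")` in the container above should list `blastn` together with `libidn.so.11`.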

Conda installation

Hi,
I'm failing to install the environment, even after removing all build specifications from it.
Do you have any idea what I could try?

CRISPRidentify.py can only be run from the CRISPRidentify directory

Hello.

If you try to run CRISPRidentify.py, you must be in the same directory as CRISPRidentify.py.

E.g.:

No input was provided
Elapsed time:  2.193450927734375e-05
(crispr_identify_env) > cd ../
(crispr_identify_env) > python3.7 CRISPRidentify/CRISPRidentify.py  
Traceback (most recent call last):
  File "CRISPRidentify/CRISPRidentify.py", line 13, in <module>
    from pipeline import Pipeline
ModuleNotFoundError: No module named 'pipeline'

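This is due to the way the module search path is set up: 'components/' is inserted into sys.path as a relative path, so it is resolved against the current working directory.
The following patch allows running it from anywhere: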

--- CRISPRidentify.py.ori   2021-06-21 17:41:51.922067095 +0000
+++ CRISPRidentify.py   2021-06-21 17:46:53.788085259 +0000
@@ -10,7 +10,8 @@
 import subprocess
 import warnings
 warnings.filterwarnings("ignore")
-sys.path.insert(0, 'components/')
+dir_path = os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(dir_path, 'components/'))
 from pipeline import Pipeline
 from components_ml import ClassifierWrapper
 import shutil

Regards

Eric

License Terms?

Hi folks! Thanks for putting together this excellent repo. Have you picked a license to release this code under? I checked the repo and I'm not finding a license file with terms anywhere.

Output options

Dear all,
Is it possible to add an output option that generates only the "Complete_summary" files? I tried to use CRISPRidentify on assemblies containing multiple contigs, and hundreds of folders with empty files were generated.
Best wishes,
Sofia
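Until such an option exists, the empty per-contig folders can be found after a run. A minimal sketch, assuming each contig gets its own subfolder under one results directory; `empty_result_dirs` is a hypothetical helper, and it only lists folders, leaving deletion to the caller.

```python
from __future__ import annotations

from pathlib import Path


def empty_result_dirs(results_root: str) -> list[Path]:
    """Return subfolders of results_root in which every file is zero bytes long."""
    empties = []
    for sub in sorted(p for p in Path(results_root).iterdir() if p.is_dir()):
        files = [f for f in sub.rglob("*") if f.is_file()]
        # all() of an empty list is True, so folders without any files count as empty too.
        if all(f.stat().st_size == 0 for f in files):
            empties.append(sub)
    return empties
```

Each returned path could then be passed to `shutil.rmtree`; reviewing the list before deleting anything is the safer order.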

More than 42 lines

Hi,
Thanks for this good tool. I noticed that if the FASTA file has more than 42 lines, the tool gives no output. Is that true?
Best,
Xichuan
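One way to test whether the line count itself matters is to rewrite the same sequences so that each record takes exactly two lines (header plus unwrapped sequence) and run the tool again. A minimal sketch; `linearize_fasta` is a hypothetical helper, not part of CRISPRidentify, and this is a diagnostic rather than a confirmed fix.

```python
def linearize_fasta(text: str) -> str:
    """Join wrapped sequence lines so that every FASTA record occupies two lines."""
    out = []
    header, seq = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                out.extend([header, "".join(seq)])
            header, seq = line, []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            seq.append(line)
    if header is not None:
        out.extend([header, "".join(seq)])
    return "\n".join(out) + "\n" if out else ""
```

If the linearized file produces output while the original does not, that points at line handling rather than sequence length.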

Please tag a new release

Hello,

The master branch and the last tagged version differ a lot.
Could you please tag a new release?
Tagged versions help package maintainers.

Regards

Eric

Single FASTA file TypeError

When running on a single FASTA file with the arguments below, the following error occurs:

python CRISPRidentify.py --file TestInput/NC_019693.fa --json_report output.json      
Traceback (most recent call last):
  File "CRISPRidentify.py", line 265, in <module>
    run_over_one_file(complete_path_file, folder_result, pickle_folder, json_folder)
TypeError: run_over_one_file() takes 3 positional arguments but 4 were given

It seems that run_over_one_file() is called with one more argument (json_folder) than its definition accepts.
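One way to make the call at line 265 match the definition is to give `run_over_one_file()` an optional `json_folder` parameter, so that the existing three-argument call keeps working. A minimal sketch of the signature change only; the parameter names and the body are hypothetical placeholders, not CRISPRidentify's pipeline code.

```python
from typing import Optional


def run_over_one_file(file_path: str, folder_result: str, pickle_folder: str,
                      json_folder: Optional[str] = None) -> dict:
    """Placeholder body: records which outputs would be written for one input file."""
    outputs = {"input": file_path, "results": folder_result, "pickle": pickle_folder}
    # The JSON report is only produced when a folder for it was supplied.
    if json_folder is not None:
        outputs["json"] = json_folder
    return outputs
```

A default of `None` keeps older three-argument call sites valid, which matters because the earlier traceback in this tracker shows such a call at line 249.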

No CRISPRidentify.py command

Hi,

Thanks for providing such a powerful tool!
I installed it with the suggested command "conda env create -f environment.yml", and it finished successfully. However, I cannot run "python CRISPRidentify.py" after activating the environment. In fact, I cannot find CRISPRidentify.py in any directory. Could you kindly help me address this problem?
Thank you very much in advance.

Best,
Ling-Dong

JSON output formatting

When using the JSON option, the output format seems to be scrambled.

For example:

"4265926   ......................  GGG                                                      s:0 i:0 d:0\n4265951   .........C..T.-T..--C.  CCCTTCCCTAAGAGGGAAGGGGGCTGGGGGGTTAGGTCTCTTTTTCAAACA      s:4 i:0 d:3\n4266021   ...........A..........                                                           s:1 i:0 d:0\n____________________________________________________________________________________________________\n          GGAATAAATATCGTTGCTGTAC

If I read it correctly, this is an entry from the results folder, but it is just one long string with all sub-elements glued together. Is this how it is supposed to be?
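In JSON, line breaks inside a string value must be escaped as `\n`, so a raw view of the file shows each alignment block as one long line. Decoding the file turns the escapes back into newlines, and printing the string restores the column layout. A minimal sketch that makes no assumption about the report's key names; `iter_text_blocks` and `print_report` are hypothetical helpers.

```python
import json


def iter_text_blocks(node):
    """Yield every string found in a decoded JSON document (dicts, lists, scalars)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_text_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text_blocks(item)


def print_report(path: str) -> None:
    """Print every text block of a JSON report with its line breaks restored."""
    with open(path) as fh:
        report = json.load(fh)
    for block in iter_text_blocks(report):
        # json.load already converted the "\n" escapes into real newlines.
        print(block)
```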

Summary

Hi,

First of all, thanks for this pipeline!
I used it on a multi-FASTA file and got a result folder for each contig, but no summary files.
It would be great to generate a single summary file of the results for all contigs, containing DR, spacer, start, end, type, etc. Do you think that would be possible?

Cheers,
Nico
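Until the tool produces such a file, per-contig summaries can be concatenated after the run. A minimal sketch, assuming one folder per contig with a CSV summary in each; the `Complete_summary*.csv` pattern is a guess based on the file name mentioned in the "Output options" issue, and `merge_summaries` is a hypothetical helper.

```python
from __future__ import annotations

import csv
from pathlib import Path


def merge_summaries(results_root: str, out_csv: str,
                    pattern: str = "Complete_summary*.csv") -> int:
    """Concatenate per-contig summary CSVs into one file; return the number of data rows."""
    rows: list[dict[str, str]] = []
    fieldnames: list[str] = ["contig"]
    for path in sorted(Path(results_root).rglob(pattern)):
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Tag each row with its folder name (assumed to be one folder per contig).
                rows.append({"contig": path.parent.name, **row})
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Columns that appear in only some summaries are kept and left blank for the other contigs, since `DictWriter` fills missing keys with an empty string.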

Allow CRISPRidentify to be run from anywhere, not just the archive directory

Hello,

I was trying to install CRISPRidentify following standard Unix conventions
and allow our users to run it from anywhere.

Tool and tool-specific data paths are broken because they are hard-coded relative to the archive tree, e.g.:
tools/vmatch/vmatch: tool hard-coded
tools/prodigal/prodigal: tool hard-coded
tools/hmm_search/hmmsearch --tblout result_hmm.out tools/hmm_search/models_tandem.hmm protein_results.fa: hmmsearch and model hard-coded
tools/hmm_search/models_tandem.hmm: data hard-coded

Tools and data should be looked up relative to the install directory.

Regards

Eric
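The approach of the sys.path patch from the earlier issue can be extended to tools and data: build every path from the install directory instead of the working directory. A minimal sketch; `tool_path` and `hmmsearch_command` are hypothetical helpers, and the `tools/hmm_search/...` layout is taken from the paths listed above.

```python
from __future__ import annotations

from pathlib import Path


def tool_path(install_dir: str, *parts: str) -> str:
    """Return an absolute path to a bundled file under <install_dir>/tools/."""
    return str(Path(install_dir).resolve().joinpath("tools", *parts))


def hmmsearch_command(install_dir: str, proteins: str, tblout: str) -> list[str]:
    """Build the hmmsearch call from the issue with binary and model resolved absolutely."""
    return [
        tool_path(install_dir, "hmm_search", "hmmsearch"),
        "--tblout", tblout,
        tool_path(install_dir, "hmm_search", "models_tandem.hmm"),
        proteins,
    ]
```

`install_dir` would come from `os.path.dirname(os.path.realpath(__file__))`, as in that patch, and the returned list can be passed to `subprocess.run` without going through a shell.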

CRISPRidentify on conda

Are you planning on releasing CRISPRidentify on conda?
It would make it much easier to include in other tools.

Cheers,
Russel

TypeError: 'method' object is not subscriptable

Run initial array detection
Refine detected arrays
Evaluate candidates
Traceback (most recent call last):
  File "CRISPRidentify.py", line 249, in <module>
    run_over_one_file(complete_path_file, folder_result, pickle_folder)
  File "CRISPRidentify.py", line 210, in run_over_one_file
    flag_dev_mode=FLAG_DEVELOPER_MODE)
  File "components/pipeline.py", line 30, in __init__
    self._run_evaluation()
  File "components/pipeline.py", line 60, in _run_evaluation
    flag_dev_mode=self.flag_dev_mode)
  File "components/module_evaluation.py", line 27, in __init__
    self._extract_features_and_evaluate()
  File "components/module_evaluation.py", line 65, in _extract_features_and_evaluate
    feature_vector = FeatureExtractor(0, crispr_candidate, list_features).extract()[0]
  File "components/components_evaluation.py", line 957, in extract
    self.list_spacers).output()
  File "components/components_evaluation.py", line 782, in __init__
    self._compute_similarity_repeats_spacers()
  File "components/components_evaluation.py", line 824, in _compute_similarity_repeats_spacers
    self.similarity_score_repeats = self._compute_similarity_repeats()
  File "components/components_evaluation.py", line 797, in _compute_similarity_repeats
    x = kernel_matrix(graphs, r=3, d=4)
  File "components/components_eden.py", line 158, in kernel_matrix
    data_matrix = vectorize(graphs, **opts)
  File "components/components_eden.py", line 136, in vectorize
    return Vectorizer(**opts).transform(graphs)
  File "components/components_eden.py", line 371, in transform
    feature_rows.append(self._transform(graph))
  File "components/components_eden.py", line 459, in _transform
    graph = self._graph_preprocessing(original_graph)
  File "components/components_eden.py", line 447, in _graph_preprocessing
    graph = _edge_to_vertex_transform(original_graph)
  File "components/components_eden.py", line 1005, in _edge_to_vertex_transform
    graph.nodes[n]['node'] = True
TypeError: 'method' object is not subscriptable
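The failing statement `graph.nodes[n]['node'] = True` uses the networkx 2.x API, where `graph.nodes` is a subscriptable view. In networkx 1.x, `nodes` is an ordinary method, and indexing it raises exactly this TypeError, so an older networkx in the environment is a likely, though unconfirmed, cause. A minimal check, assuming that diagnosis; `check_networkx` and `major_version` are hypothetical helpers.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.11' or '2.5.1'."""
    return int(version.split(".")[0])


def check_networkx(minimum_major: int = 2) -> str:
    """Report the installed networkx version; raise if it predates graph.nodes[n]."""
    try:
        import networkx
    except ImportError:
        return "networkx is not installed"
    if major_version(networkx.__version__) < minimum_major:
        raise RuntimeError(
            f"networkx {networkx.__version__} found; graph.nodes[n] needs >= {minimum_major}.0"
        )
    return f"networkx {networkx.__version__} OK"
```

If the check raises, installing networkx 2.x or later in the environment is the first thing to try.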

environment.yaml without build

I am trying to build the environment on an OS system. I am unable to install the dependencies in the YAML file, and I believe this is due to the platform-specific build constraints on the dependencies. Can you share an environment.yaml without these constraints?

https://stackoverflow.com/questions/55554431/conda-fails-to-create-environment-from-yml
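`conda env export --no-builds` produces such a file from an existing environment, but when only the YAML file is available, the build strings can be stripped from it directly. A minimal sketch, assuming conda specs of the form `name=version=build`; `strip_builds` is a hypothetical helper, and pip entries (`pkg==1.0`) are left untouched.

```python
import re
import sys

# Matches a conda spec line "  - name=version=build". A pip entry such as "pkg==1.0"
# does not match, because "==" leaves the version field between the signs empty.
BUILD_PIN = re.compile(r"^(\s*-\s+)([^=\s]+)=([^=\s]+)=(\S+)\s*$")


def strip_builds(yaml_text: str) -> str:
    """Return environment.yml text with build strings removed from conda specs."""
    out = []
    for line in yaml_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        match = BUILD_PIN.match(body)
        if match:
            body = f"{match.group(1)}{match.group(2)}={match.group(3)}"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strip_builds.py environment.yml > environment.nobuilds.yml
    with open(sys.argv[1]) as fh:
        sys.stdout.write(strip_builds(fh.read()))
```

Version pins are kept, so if a pinned version is itself unavailable on the target platform, that entry still has to be relaxed by hand.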

mkvtree: Illegal character '>' in file

Hi
I tried to analyze the CRISPR arrays of 6 complete E. coli genomes.
Four worked, but two showed mkvtree/vmatch errors.
It seems to be related to the number of contigs in the file.

1/GCF_000005845.2_ASM584v2_genomic.ID.fasta: works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000005845.2/
2/GCF_000007445.1_ASM744v1_genomic.ID.fasta: works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000007445.1/
3/GCF_000008865.2_ASM886v2_genomic.ID.fasta: not working
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000008865.2/
4/GCF_000009565.1_ASM956v1_genomic.ID.fasta: works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000009565.1/
5/GCF_000010245.2_ASM1024v1_genomic.ID.fasta: works fine
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000010245.2/
6/GCF_000010385.1_ASM1038v1_genomic.ID.fasta: not working
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000010385.1/

                            Executing file 1 out of 6 (GCF_000005845.2_ASM584v2_genomic.fna)

Run initial array detection
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info

Write down the results

                         Executing file 2 out of 6 (GCF_000007445.1_ASM744v1_genomic.fna)

Run initial array detection
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info

Write down the results

                         Executing file 3 out of 6 (GCF_000008865.2_ASM886v2_genomic.fna)

Run initial array detection
mkvtree: Illegal character '>' in file "new_input.fa" line 2
vmatch: cannot open file "new_input.fa.prj": No such file or directory
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info

Write down the results

                         Executing file 4 out of 6 (GCF_000009565.1_ASM956v1_genomic.fna)

Run initial array detection
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info

Write down the results

                         Executing file 5 out of 6 (GCF_000010245.2_ASM1024v1_genomic.fna)

Run initial array detection
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info

Write down the results

                         Executing file 6 out of 6 (GCF_000010385.1_ASM1038v1_genomic.fna)

Run initial array detection
mkvtree: Illegal character '>' in file "new_input.fa" line 2
vmatch: cannot open file "new_input.fa.prj": No such file or directory
Refine detected arrays
Evaluate candidates
Enhance evaluated arrays
Complement arrays with additional info
Write down the results

Thank you
G
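Since the reporter suspects the number of contigs, a workaround worth trying is to split each multi-FASTA file into one file per record and run CRISPRidentify on each. A minimal sketch; `split_fasta` is a hypothetical helper. It skips records that have no sequence lines, because such a record puts two '>' lines next to each other, which is one possible (unconfirmed) way to end up with a '>' on line 2.

```python
from __future__ import annotations

import re
from pathlib import Path


def split_fasta(fasta_path: str, out_dir: str) -> list[Path]:
    """Write each record of a multi-FASTA file to its own file in out_dir."""
    records: list[tuple[str, list[str]]] = []
    for raw in Path(fasta_path).read_text().splitlines():
        line = raw.strip()
        if line.startswith(">"):
            records.append((line, []))
        elif line and records:  # sequence lines before the first header are ignored
            records[-1][1].append(line)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (header, seq_lines) in enumerate(records, start=1):
        if not seq_lines:
            continue  # header without sequence: skip instead of emitting back-to-back '>' lines
        words = header[1:].split()
        # First word of the header, made filesystem-safe; the index keeps names unique.
        safe = re.sub(r"[^\w.-]", "_", words[0]) if words else "record"
        path = out / f"{index:03d}_{safe}.fa"
        path.write_text(header + "\n" + "".join(seq_lines) + "\n")
        written.append(path)
    return written
```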
