camsa's People

Contributors

aganezov

camsa's Issues

converting points to fasta

After running camsa on two scaffold files generated by a SPAdes run on two samples, I wanted to convert the merged points file to a FASTA file.

I ran the command

camsa_points2fasta.py --points ASM.out/merged/merged.camsa.points --fasta scaffolds.fasta -o merged.fasta

Traceback (most recent call last):
  File "/home/hpage3/miniconda3/bin/camsa_points2fasta.py", line 17, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/hpage3/miniconda3/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
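
The traceback points at the import on line 17 of camsa_points2fasta.py. Bio.Alphabet was removed in Biopython 1.78, so one workaround is to install an older Biopython release. Another is to make the import optional, as in the minimal sketch below. The helper name seq_constructor_args is hypothetical and is not part of CAMSA; how the script actually uses generic_dna is not shown in this report, so treat this as an illustration of the pattern only.

```python
# Guarded import: Bio.Alphabet only exists in older Biopython releases.
try:
    from Bio.Alphabet import generic_dna
except ImportError:  # newer Biopython, or Biopython not installed
    generic_dna = None


def seq_constructor_args(sequence):
    """Return positional arguments for Bio.Seq.Seq (hypothetical helper).

    With an old Biopython the alphabet is passed along; otherwise it is
    dropped, which is what the Biopython error message recommends.
    """
    if generic_dna is None:
        return (sequence,)
    return (sequence, generic_dna)
```

A call such as Seq(*seq_constructor_args("ACGT")) would then work on both sides of the Biopython change.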

Can CAMSA use multithreading or MPI?

Hi aganezov,

First of all thank you for this promising program.

I processed some assemblies and scaffolds with fasta2camsa to create points files.
However, it runs in a single thread and is quite slow: about 24 hours so far.
I ran the tutorials without problems, so the installation itself appears to be fine.

My command:

fasta2camsa_point.py --nucmer-path /my/path/to/nucmer contigs.fasta scaff_Soap_k61.fasta scaff_Soap_k43.fasta scaff_Soap_k23.fasta scaff_Abyss_k61.fasta scaff_Abyss_k43.fasta scaff_Abyss_k23.fasta -o /test

These are several assemblies of a fish with a genome size of roughly 1GB that is somewhat heterozygous.
Can you recommend any additional parameters to improve the merging of these scaffolds and to speed up the analysis?

Thanks in advance
Andre
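
The logs in other reports on this page show that the converter launches one NUCmer run per scaffolds file, with the first FASTA file as the query. As a rough sketch of how those independent alignments could be run side by side outside CAMSA, the code below builds one command per scaffolds file and executes them in a thread pool. The function names, the worker count and the output layout are assumptions; the nucmer arguments mirror the ones printed in the logs, and whether the converter can reuse delta files produced this way is not confirmed by anything in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_nucmer_command(contigs, scaffolds, out_dir, nucmer="nucmer"):
    """Build one nucmer call: scaffolds as reference, contigs as query."""
    prefix = str(Path(out_dir) / Path(scaffolds).stem)
    return [nucmer, "--maxmatch", "-c", "100", "-p", prefix, scaffolds, contigs]


def run_all(contigs, scaffold_files, out_dir, workers=4):
    """Run the alignments concurrently and return their exit codes."""
    commands = [build_nucmer_command(contigs, s, out_dir) for s in scaffold_files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread just waits on an external process.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Threads are enough here because the heavy work happens in the external nucmer processes, not in Python.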

Assembly points classification in reference mode

Add support for the assembly points classification with respect to the given reference.
Current classification:

  • correct
  • inter-chromosomal error
  • orientation error
  • resolution error:
    • non-conflicting resolution error
    • conflicting error
  • missing

PyPI distribution

Create a PyPI distribution of CAMSA, so that a simple pip install camsa works.

.tbl file for NCBI submission

I am hoping to use CAMSA to merge two sets of scaffolds produced by different algorithms. Upon upload, NCBI requires information about what evidence was used to join contigs around gaps. NCBI allows this information in an annotation .tbl file with assembly gaps:

100 201 assembly_gap
gap_type within scaffold
linkage_evidence align-genus
420 521 assembly_gap
gap_type within scaffold
linkage_evidence paired-ends

Since there are possibly complex cases where contigs are joined in the CAMSA assembly (using either scaffolded input or both), is it possible to extract this information automatically from the output for the final CAMSA sequences?

Thanks for the help!
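
Whether CAMSA can report the evidence behind each join is not answered here. If the gap coordinates and evidence types were available from some source, writing them out in the layout quoted above is straightforward. The sketch below assumes the tab-separated feature table layout (a ">Feature" header line, three columns for the feature, and qualifiers indented by three tabs); the function name and the input tuples are hypothetical.

```python
def format_assembly_gaps(seq_id, gaps):
    """Render (start, stop, linkage_evidence) tuples as assembly_gap features."""
    lines = [f">Feature {seq_id}"]
    for start, stop, evidence in gaps:
        lines.append(f"{start}\t{stop}\tassembly_gap")
        lines.append("\t\t\tgap_type\twithin scaffold")
        lines.append(f"\t\t\tlinkage_evidence\t{evidence}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values taken from the example in the issue text.
    print(format_assembly_gaps("scaffold_1", [(100, 201, "align-genus"),
                                              (420, 521, "paired-ends")]))
```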

Installation instructions

Add instructions on how to install the CAMSA package, both with pip and by downloading the compressed source code. The information shall be added to the GitHub wiki, the cblab.org website and the main project README file.

Reference input

Add an argument to CAMSA that, when given, specifies the location of the reference assembly.

camsa2fasta error - scaffold extremity adjacent to more than one other scaffold

Hi aganezov,

Thanks for providing CAMSA! I could successfully merge scaffolds in several assemblies already. I have however encountered the following error for three different assemblies when using the command

~/camsa-env$ camsa_points2fasta.py --points M34_camsa_points --fasta M34.fna -o M34_merged.fasta

that I could not resolve myself:

2019-01-09 12:51:12,120 - CAMSA.utils.camsa_points2fasta - INFO - Starting the converting process
2019-01-09 12:51:12,120 - CAMSA.utils.camsa_points2fasta - INFO - Reading assembly points
2019-01-09 12:51:12,122 - CAMSA.utils.camsa_points2fasta - INFO - A total of 36 assembly points was obtained
2019-01-09 12:51:12,123 - CAMSA.utils.camsa_points2fasta - ERROR - Supplied assembly contained a conflict.
2019-01-09 12:51:12,123 - CAMSA.utils.camsa_points2fasta - ERROR - Scaffold Scaffold_27 by its extremity Scaffold_27t is reported as adjacent to more than one other scaffold's extremity.

How can I solve this? Looking forward to your response!

Best,
-Elif
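
The error says that one extremity takes part in more than one adjacency, so the points file cannot be turned into linear scaffolds as it stands. A small script can list every such extremity so the offending points can be inspected or removed by hand. The sketch below assumes a tab-separated points file with the column names seen in another report on this page (origin, seq1, seq1_or, seq2, seq2_or, gap_size, cw), and it assumes that "+" means a scaffold is read tail to head. Both the function names and that orientation convention are assumptions, not CAMSA's confirmed behaviour.

```python
import csv
import io
from collections import defaultdict


def joined_extremities(seq1, or1, seq2, or2):
    # Assumed convention: "+" reads a scaffold tail -> head, "-" head -> tail.
    left = seq1 + ("h" if or1 == "+" else "t")
    right = seq2 + ("t" if or2 == "+" else "h")
    return left, right


def find_conflicts(points_text):
    """Return extremities that are adjacent to more than one other extremity."""
    partners = defaultdict(set)
    reader = csv.DictReader(io.StringIO(points_text), delimiter="\t")
    for row in reader:
        or1, or2 = row["seq1_or"].strip(), row["seq2_or"].strip()
        if or1 not in ("+", "-") or or2 not in ("+", "-"):
            continue  # unknown orientation: skipped in this sketch
        left, right = joined_extremities(row["seq1"].strip(), or1,
                                         row["seq2"].strip(), or2)
        partners[left].add(right)
        partners[right].add(left)
    return {ext: sorted(others) for ext, others in partners.items()
            if len(others) > 1}
```

Calling find_conflicts(open("M34_camsa_points").read()) would return a dictionary keyed by extremity names such as Scaffold_27t, each mapped to the extremities it is reported adjacent to.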

Error building from source from pypi

Building from source from pypi fails with the following error:

/setup.py", line 43, in <module>
        long_description=open("pypi_full.rst").read()
    FileNotFoundError: [Errno 2] No such file or directory: 'pypi_full.rst'

I think you just need to change the first line of your MANIFEST to include pypi_full.rst.
(the wheel works fine, by the way...)
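
Besides shipping pypi_full.rst in the source distribution as suggested above, setup.py could tolerate the missing file. The following is a minimal sketch of such a guard; the helper name and the fallback text are hypothetical, and this is not the project's actual setup.py.

```python
from pathlib import Path


def read_long_description(path="pypi_full.rst", fallback="CAMSA"):
    """Return the file's text, or a short fallback if the file was not shipped."""
    file = Path(path)
    return file.read_text(encoding="utf-8") if file.is_file() else fallback
```

In setup.py the call would replace the bare open("pypi_full.rst").read() expression from the traceback.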

Error run_camsa.py

Hi,

I am getting an error while executing run_camsa.py
Please suggest!

2020-04-12 20:48:48,149 - CAMSA.main - INFO - Starting the analysis
2020-04-12 20:48:48,149 - CAMSA.main - INFO - Processing input
2020-04-12 20:48:48,149 - CAMSA.main - INFO - Reading assembly points
2020-04-12 20:48:49,078 - CAMSA.main - INFO - Merging assembly points from different sources into a set of unique ones.
2020-04-12 20:48:49,819 - CAMSA.main - INFO - Processing assemblies' subgroups
2020-04-12 20:48:49,819 - CAMSA.main - INFO - Processing assembly points taking both order and orientation into account
2020-04-12 20:48:49,895 - CAMSA.main - INFO - Processing assembly points, taking just order into account
2020-04-12 20:48:50,768 - CAMSA.main - INFO - Computing assembly points conflicts
2020-04-12 20:48:55,288 - CAMSA.main - INFO - Obtaining a merged assembly, using maximal-matching strategy
Traceback (most recent call last):
  File ".camsa-env/bin/run_camsa.py", line 252, in <module>
    min_cw=args.c_merging_cw_min)
  File "camsa-env/lib/python3.7/site-packages/camsa/core/merging.py", line 131, in maximal_matching
    for cc in networkx.connected_component_subgraphs(G=cover_graph, copy=True):
AttributeError: module 'networkx' has no attribute 'connected_component_subgraphs'

Thanks
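
networkx.connected_component_subgraphs was removed in NetworkX 2.4, which explains the AttributeError. Installing an older NetworkX is one possible workaround. Alternatively, the call in merging.py could be replaced by an equivalent built from connected_components and subgraph, as in the sketch below. The helper is a stand-in written for this note, not code from CAMSA.

```python
import networkx as nx


def connected_component_subgraphs(graph, copy=True):
    """Yield one subgraph per connected component of an undirected graph."""
    for nodes in nx.connected_components(graph):
        component = graph.subgraph(nodes)
        # subgraph() returns a read-only view; copy() makes it independent.
        yield component.copy() if copy else component
```

The loop from the traceback would then read: for cc in connected_component_subgraphs(cover_graph, copy=True).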

Assembly points to FASTA translation

Add a utility that, given a set of assembly points and a FASTA file with sequences for the observed scaffolds, translates the assembly points into larger scaffolds.

The set of assembly points must contain no conflicting or semi-conflicting points.
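
To make the requested translation concrete, here is a minimal sketch of the core step: joining an already ordered and oriented chain of scaffolds into one sequence, with runs of N for the gaps. The function names, the input layout and the default gap length of 100 are assumptions for illustration; building the chains from the assembly points is not shown.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def join_chain(chain, sequences, gaps, default_gap=100):
    """Join scaffolds into one sequence.

    chain: list of (name, orientation) with orientation "+" or "-".
    gaps:  one entry per consecutive pair; non-positive or unknown
           values fall back to default_gap Ns (an assumption).
    """
    parts = []
    for index, (name, orientation) in enumerate(chain):
        seq = sequences[name]
        parts.append(seq if orientation == "+" else reverse_complement(seq))
        if index < len(chain) - 1:
            gap = gaps[index]
            length = gap if isinstance(gap, int) and gap > 0 else default_gap
            parts.append("N" * length)
    return "".join(parts)
```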

error in installing camsa

Hi,
I was trying to install camsa, but the installation failed with an error:

(base) hema@hema-Aspire-A715-51G:~$ pip install camsa
Collecting camsa
Using cached CAMSA-1.3-py2.py3-none-any.whl (4.5 MB)
Requirement already satisfied: six>=1.10.0 in ./anaconda3/lib/python3.11/site-packages (from camsa) (1.16.0)
Requirement already satisfied: networkx>=2.1 in ./anaconda3/lib/python3.11/site-packages (from camsa) (3.1)
Requirement already satisfied: Jinja2>=2.8 in ./anaconda3/lib/python3.11/site-packages (from camsa) (3.1.2)
Collecting enum-compat (from camsa)
Using cached enum_compat-0.0.3-py3-none-any.whl (1.3 kB)
Collecting blist>=1.3.6 (from camsa)
Using cached blist-1.3.6.tar.gz (122 kB)
Preparing metadata (setup.py) ... done
Collecting ConfigArgParse>=0.10.0 (from camsa)
Obtaining dependency information for ConfigArgParse>=0.10.0 from https://files.pythonhosted.org/packages/6f/b3/b4ac838711fd74a2b4e6f746703cf9dd2cf5462d17dac07e349234e21b97/ConfigArgParse-1.7-py3-none-any.whl.metadata
Using cached ConfigArgParse-1.7-py3-none-any.whl.metadata (23 kB)
Collecting biopython>=1.67 (from camsa)
Using cached biopython-1.81-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting bg>=1.8.1 (from camsa)
Using cached bg-1.10-py3-none-any.whl (87 kB)
Requirement already satisfied: more-itertools in ./anaconda3/lib/python3.11/site-packages (from camsa) (8.12.0)
Collecting coverage (from bg>=1.8.1->camsa)
Obtaining dependency information for coverage from https://files.pythonhosted.org/packages/e8/bc/4707652867891c1da12759cc1dcdffed539da88e6fd8d32ff2d97b2b5db4/coverage-7.3.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Using cached coverage-7.3.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.1 kB)
Requirement already satisfied: decorator in ./anaconda3/lib/python3.11/site-packages (from bg>=1.8.1->camsa) (5.1.1)
Collecting nose (from bg>=1.8.1->camsa)
Using cached nose-1.3.7-py3-none-any.whl (154 kB)
Collecting marshmallow (from bg>=1.8.1->camsa)
Obtaining dependency information for marshmallow from https://files.pythonhosted.org/packages/ed/3c/cebfdcad015240014ff08b883d1c0c427f2ba45ae8c6572851b6ef136cad/marshmallow-3.20.1-py3-none-any.whl.metadata
Using cached marshmallow-3.20.1-py3-none-any.whl.metadata (7.8 kB)
Collecting ete3 (from bg>=1.8.1->camsa)
Using cached ete3-3.1.3-py3-none-any.whl
Collecting mock (from bg>=1.8.1->camsa)
Obtaining dependency information for mock from https://files.pythonhosted.org/packages/6b/20/471f41173930550f279ccb65596a5ac19b9ac974a8d93679bcd3e0c31498/mock-5.1.0-py3-none-any.whl.metadata
Using cached mock-5.1.0-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: pytest in ./anaconda3/lib/python3.11/site-packages (from bg>=1.8.1->camsa) (7.4.0)
Requirement already satisfied: scipy in ./anaconda3/lib/python3.11/site-packages (from bg>=1.8.1->camsa) (1.10.1)
Requirement already satisfied: numpy in ./anaconda3/lib/python3.11/site-packages (from bg>=1.8.1->camsa) (1.24.3)
Requirement already satisfied: MarkupSafe>=2.0 in ./anaconda3/lib/python3.11/site-packages (from Jinja2>=2.8->camsa) (2.1.1)
Requirement already satisfied: packaging>=17.0 in ./anaconda3/lib/python3.11/site-packages (from marshmallow->bg>=1.8.1->camsa) (23.0)
Requirement already satisfied: iniconfig in ./anaconda3/lib/python3.11/site-packages (from pytest->bg>=1.8.1->camsa) (1.1.1)
Requirement already satisfied: pluggy<2.0,>=0.12 in ./anaconda3/lib/python3.11/site-packages (from pytest->bg>=1.8.1->camsa) (1.0.0)
Using cached ConfigArgParse-1.7-py3-none-any.whl (25 kB)
Using cached coverage-7.3.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (232 kB)
Using cached marshmallow-3.20.1-py3-none-any.whl (49 kB)
Using cached mock-5.1.0-py3-none-any.whl (30 kB)
Building wheels for collected packages: blist
Building wheel for blist (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [45 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-311
creating build/lib.linux-x86_64-cpython-311/blist
copying blist/_sortedlist.py -> build/lib.linux-x86_64-cpython-311/blist
copying blist/__init__.py -> build/lib.linux-x86_64-cpython-311/blist
copying blist/_btuple.py -> build/lib.linux-x86_64-cpython-311/blist
copying blist/_sorteddict.py -> build/lib.linux-x86_64-cpython-311/blist
running build_ext
building 'blist._blist' extension
creating build/temp.linux-x86_64-cpython-311
creating build/temp.linux-x86_64-cpython-311/blist
gcc -pthread -B /home/hema/anaconda3/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/hema/anaconda3/include -fPIC -O2 -isystem /home/hema/anaconda3/include -fPIC -DBLIST_FLOAT_RADIX_SORT=1 -I/home/hema/anaconda3/include/python3.11 -c blist/_blist.c -o build/temp.linux-x86_64-cpython-311/blist/_blist.o
blist/_blist.c: In function ‘unwrap_leaf_array’:
blist/_blist.c:4583:37: warning: implicit declaration of function ‘_PyObject_GC_IS_TRACKED’; did you mean ‘PyObject_GC_IsTracked’? [-Wimplicit-function-declaration]
4583 | if (leafs_n > 1 && !_PyObject_GC_IS_TRACKED(leafs[i]))
| ^~~~~~~~~~~~~~~~~~~~~~~
| PyObject_GC_IsTracked
blist/_blist.c: In function ‘py_blist_dealloc’:
blist/_blist.c:5789:9: warning: ‘UsingDeprecatedTrashcanMacro’ is deprecated [-Wdeprecated-declarations]
5789 | Py_TRASHCAN_SAFE_BEGIN(self)
| ^~~~~~~~~~~~~~~~~~~~~~
blist/_blist.c: In function ‘py_blist_sort’:
blist/_blist.c:6591:25: error: lvalue required as left operand of assignment
6591 | Py_TYPE(&saved) = &PyRootBList_Type;
| ^
blist/_blist.c:6592:27: error: lvalue required as left operand of assignment
6592 | Py_REFCNT(&saved) = 1;
| ^
blist/_blist.c: In function ‘init_blist_types1’:
blist/_blist.c:7378:32: error: lvalue required as left operand of assignment
7378 | Py_TYPE(&PyBList_Type) = &PyType_Type;
| ^
blist/_blist.c:7379:36: error: lvalue required as left operand of assignment
7379 | Py_TYPE(&PyRootBList_Type) = &PyType_Type;
| ^
blist/_blist.c:7380:36: error: lvalue required as left operand of assignment
7380 | Py_TYPE(&PyBListIter_Type) = &PyType_Type;
| ^
blist/_blist.c:7381:43: error: lvalue required as left operand of assignment
7381 | Py_TYPE(&PyBListReverseIter_Type) = &PyType_Type;
| ^
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for blist
Running setup.py clean for blist
Failed to build blist
ERROR: Could not build wheels for blist, which is required to install pyproject.toml-based projects

CAMSA overall description

Add CAMSA overview with a link to the paper and the most important parts taken from the paper.

  • full in wiki

CAMSA - Can't we run in parallel mode

Hi,

Is there any thread option to speed up the jobs ?

I am trying to generate a meta-assembly for fish genomes, and it is taking a long time.

Any suggestions?

fasta2camsa_points - ERROR

Hi @aganezov,

I'm trying to merge two assemblies produced by Supernova with different settings. Since I have FASTA files, I think I have to convert them into points-formatted files by running fasta2camsa_points.py.

This is how I'm executing the command:
(camsa_env) [fc464@login-e-12 Pdid.Merge10xChromium]$ fasta2camsa_points.py ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta ID1090_1_new_ALL_pseudohap_1kb.fasta -o Pdid.10xMerge

And this is the error it returns:

(camsa_env) [fc464@login-e-12 Pdid.Merge10xChromium]$ fasta2camsa_points.py ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta ID1090_1_new_ALL_pseudohap_1kb.fasta -o Pdid.10xMerge
================================================================================
| Sergey Aganezov & Max A. Alekseyev (c)                                       |
| Computational Biology Institute, The George Washington University            |
|                                                                              |
| Converting FASTA formatted scaffolding results for further CAMSA processing. |
|                                                                              |
| For more information refer to github.com/compbiol/camsa/wiki                 |
| With any questions, please, contact Sergey Aganezov [aganezov(at)cs.jhu.edu] |
================================================================================

Command Line Args:   ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta ID1090_1_new_ALL_pseudohap_1kb.fasta -o Pdid.10xMerge
Config File (/home/fc464/software/CAMSA/camsa/logging.ini):
  c-logging-level:   20
  c-logging-formatter-entry:%(asctime)s - %(name)-15s - %(levelname)-7s - %(message)s
Config File (/home/fc464/software/CAMSA/camsa/utils/fasta/fasta2camsa_points.ini):
  c-cov-threshold:   90.0
  c-coords-pairs-strategy:mid-point-sort
  nucmer-cli-arguments:-maxmatch -c 100
  nucmer-path:       nucmer
  show-coords-cli-arguments:-r -c -l
  show-coords-path:  show-coords
  delta-filter-cli-arguments:-r -q
  delta-filter-path: delta-filter
Defaults:
  --ensure-all:      False

2018-11-28 10:04:53,346 - CAMSA.utils.fasta2camsa_points - INFO    - Starting the converting process
2018-11-28 10:04:53,352 - CAMSA.utils.fasta2camsa_points - INFO    - Working with "ID1090_1_new_ALL_pseudohap_1kb.fasta"
2018-11-28 10:04:53,352 - CAMSA.utils.fasta2camsa_points - INFO    - Running NUCmer for "ID1090_1_new_ALL_pseudohap_1kb.fasta" scaffolds file, using "ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta" as query. This might take time.
2018-11-28 10:04:53,352 - CAMSA.utils.fasta2camsa_points - INFO    - 	nucmer -maxmatch -c 100 -p /rds/project/shm37/rds-shm37-helixmbodyw/HeliconiiniProg/Pdid.Merge10xChromium/Pdid.10xMerge/fasta2camsa/ID1090_1_new_ALL_pseudohap_1kb ID1090_1_new_ALL_pseudohap_1kb.fasta ID1090_1_renamed_bcfrac066_pseudohap_1kb.fasta > /rds/project/shm37/rds-shm37-helixmbodyw/HeliconiiniProg/Pdid.Merge10xChromium/Pdid.10xMerge/fasta2camsa/logs/nucmer_ID1090_1_new_ALL_pseudohap_1kb.stdout.txt 2> /rds/project/shm37/rds-shm37-helixmbodyw/HeliconiiniProg/Pdid.Merge10xChromium/Pdid.10xMerge/fasta2camsa/logs/nucmer_ID1090_1_new_ALL_pseudohap_1kb.stderr.txt
2018-11-28 10:07:24,688 - CAMSA.utils.fasta2camsa_points - ERROR   - NUCmer exited with non-zero code, running for "ID1090_1_new_ALL_pseudohap_1kb.fasta" scaffolds file.
2018-11-28 10:07:24,688 - CAMSA.utils.fasta2camsa_points - ERROR   - NUCmer logs are stored in:
2018-11-28 10:07:24,688 - CAMSA.utils.fasta2camsa_points - ERROR   - 	stdout: "/rds/project/shm37/rds-shm37-helixmbodyw/HeliconiiniProg/Pdid.Merge10xChromium/Pdid.10xMerge/fasta2camsa/logs/nucmer_ID1090_1_new_ALL_pseudohap_1kb.stdout.txt"
2018-11-28 10:07:24,688 - CAMSA.utils.fasta2camsa_points - ERROR   - 	stderr: "/rds/project/shm37/rds-shm37-helixmbodyw/HeliconiiniProg/Pdid.Merge10xChromium/Pdid.10xMerge/fasta2camsa/logs/nucmer_ID1090_1_new_ALL_pseudohap_1kb.stderr.txt"
2018-11-28 10:07:24,690 - CAMSA.utils.fasta2camsa_points - ERROR   - Delta file for prefix="ID1090_1_new_ALL_pseudohap_1kb" was not found in the output folder.
2018-11-28 10:07:24,690 - CAMSA.utils.fasta2camsa_points - INFO    - Elapsed time: 0:02:31.344933

What am I doing wrong?
Best,
Francesco
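
The log states where NUCmer's stdout and stderr were saved, and the stderr file is the most direct place to look for the reason behind the non-zero exit code. A tiny helper like the one below prints the end of such a log. The function name is hypothetical; the path to pass in is the stderr path printed in the log above.

```python
from pathlib import Path


def tail(path, count=20):
    """Return the last `count` lines of a text file."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-count:]


def show_log(path, count=20):
    for line in tail(path, count):
        print(line)
```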

Usage instructions

Add detailed usage instructions for CAMSA, including:

  • core
  • report

Information must be added in:

  • full to the wiki
  • sparse to cblab website and main project README

the fasta2camsa_points error

Hi,
I was using camsa to assemble bacterial scaffolds into a chromosome. fasta2camsa_points.py ran successfully, but the assembly points files are empty.
(camsa) hema@hema-Aspire-A715-51G:~/camsa$ fasta2camsa_points.py assembly1.fasta assembly2.fasta assembly3.fasta assembly4.fasta assembly5.fasta assembly6.fasta assembly7.fasta assembly8.fasta assembly9.fasta assembly10.fasta assembly11.fasta assembly12.fasta assembly13.fasta assembly14.fasta assembly15.fasta assembly16.fasta assembly17.fasta assembly18.fasta assembly19.fasta assembly20.fasta assembly21.fasta assembly22.fasta assembly23.fasta assembly24.fasta assembly25.fasta assembly26.fasta assembly27.fasta assembly28.fasta assembly29.fasta assembly30.fasta assembly31.fasta assembly32.fasta assembly33.fasta assembly34.fasta assembly35.fasta assembly36.fasta assembly37.fasta assembly38.fasta assembly39.fasta assembly40 -o output

| Sergey Aganezov & Max A. Alekseyev (c) |
| Computational Biology Institute, The George Washington University |
| |
| Converting FASTA formatted scaffolding results for further CAMSA processing. |
| |
| For more information refer to github.com/compbiol/camsa/wiki |
| With any questions, please, contact Sergey Aganezov [aganezov(at)cs.jhu.edu] |

Command Line Args: assembly1.fasta assembly2.fasta assembly3.fasta assembly4.fasta assembly5.fasta assembly6.fasta assembly7.fasta assembly8.fasta assembly9.fasta assembly10.fasta assembly11.fasta assembly12.fasta assembly13.fasta assembly14.fasta assembly15.fasta assembly16.fasta assembly17.fasta assembly18.fasta assembly19.fasta assembly20.fasta assembly21.fasta assembly22.fasta assembly23.fasta assembly24.fasta assembly25.fasta assembly26.fasta assembly27.fasta assembly28.fasta assembly29.fasta assembly30.fasta assembly31.fasta assembly32.fasta assembly33.fasta assembly34.fasta assembly35.fasta assembly36.fasta assembly37.fasta assembly38.fasta assembly39.fasta assembly40 -o output
Config File (/home/hema/anaconda3/envs/camsa/lib/python3.9/site-packages/camsa/logging.ini):
c-logging-level: 20
c-logging-formatter-entry:%(asctime)s - %(name)-15s - %(levelname)-7s - %(message)s
Config File (/home/hema/anaconda3/envs/camsa/lib/python3.9/site-packages/camsa/utils/fasta/fasta2camsa_points.ini):
c-cov-threshold: 90.0
c-coords-pairs-strategy:mid-point-sort
nucmer-cli-arguments:--maxmatch -c 100
nucmer-path: nucmer
show-coords-cli-arguments:-r -c -l
show-coords-path: show-coords
delta-filter-cli-arguments:-r -q
delta-filter-path: delta-filter
Defaults:
--ensure-all: False

2023-09-20 12:32:50,137 - CAMSA.utils.fasta2camsa_points - INFO - Starting the converting process
2023-09-20 12:32:50,137 - CAMSA.utils.fasta2camsa_points - INFO - Working with "assembly2.fasta"
2023-09-20 12:32:50,137 - CAMSA.utils.fasta2camsa_points - INFO - Running NUCmer for "assembly2.fasta" scaffolds file, using "assembly1.fasta" as query. This might take time.
2023-09-20 12:32:50,137 - CAMSA.utils.fasta2camsa_points - INFO - nucmer --maxmatch -c 100 -p /home/hema/camsa/output/fasta2camsa/assembly2 assembly2.fasta assembly1.fasta > /home/hema/camsa/output/fasta2camsa/logs/nucmer_assembly2.stdout.txt 2> /home/hema/camsa/output/fasta2camsa/logs/nucmer_assembly2.stderr.txt
2023-09-20 12:32:50,199 - CAMSA.utils.fasta2camsa_points - INFO - NUCmer finished running for "{scaffolds_file}" scaffolds file.
2023-09-20 12:32:50,199 - CAMSA.utils.fasta2camsa_points - INFO - Running delta-filter util for "/home/hema/camsa/output/fasta2camsa/assembly2.delta" file.
2023-09-20 12:32:50,199 - CAMSA.utils.fasta2camsa_points - INFO - delta-filter -r -q /home/hema/camsa/output/fasta2camsa/assembly2.delta > /home/hema/camsa/output/fasta2camsa/assembly2.filtered.delta 2> /home/hema/camsa/output/fasta2camsa/logs/delta_filter_assembly2.stderr.txt
2023-09-20 12:32:50,201 - CAMSA.utils.fasta2camsa_points - INFO - delta-filter util has finished running for "/home/hema/camsa/output/fasta2camsa/assembly2.delta"
2023-09-20 12:32:50,201 - CAMSA.utils.fasta2camsa_points - INFO - Running show-coords util for "/home/hema/camsa/output/fasta2camsa/assembly2.filtered.delta" file.
2023-09-20 12:32:50,201 - CAMSA.utils.fasta2camsa_points - INFO - show-coords -r -c -l /home/hema/camsa/output/fasta2camsa/assembly2.filtered.delta > /home/hema/camsa/output/fasta2camsa/assembly2.coords 2> /home/hema/camsa/output/fasta2camsa/logs/show_coords_assembly2.stderr.txt
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - show-coords util has finished running for "/home/hema/camsa/output/fasta2camsa/assembly2.filtered.delta".
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - Parsing "/home/hema/camsa/output/fasta2camsa/assembly2.coords".
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - Writing coords data in terms of CAMSA assembly points in "/home/hema/camsa/output/assembly2.camsa.points"
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - Finished converting data for "assembly2"
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - Working with "assembly3.fasta"
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - Running NUCmer for "assembly3.fasta" scaffolds file, using "assembly1.fasta" as query. This might take time.
2023-09-20 12:32:50,204 - CAMSA.utils.fasta2camsa_points - INFO - nucmer --maxmatch -c 100 -p /home/hema/camsa/output/fasta2camsa/assembly3 assembly3.fasta assembly1.fasta > /home/hema/camsa/output/fasta2camsa/logs/nucmer_assembly3.stdout.txt 2> /home/hema/camsa/output/fasta2camsa/logs/nucmer_assembly3.stderr.txt
2023-09-20 12:32:50,243 - CAMSA.utils.fasta2camsa_points - INFO - NUCmer finished running for "{scaffolds_file}" scaffolds file.
2023-09-20 12:32:50,243 - CAMSA.utils.fasta2camsa_points - INFO - Running delta-filter util for "/home/hema/camsa/output/fasta2camsa/assembly3.delta" file.
2023-09-20 12:32:50,243 - CAMSA.utils.fasta2camsa_points - INFO - delta-filter -r -q /home/hema/camsa/output/fasta2camsa/assembly3.delta > /home/hema/camsa/output/fasta2camsa/assembly3.filtered.delta 2> /home/hema/camsa/output/fasta2camsa/logs/delta_filter_assembly3.stderr.txt
2023-09-20 12:32:50,246 - CAMSA.utils.fasta2camsa_points - INFO - delta-filter util has finished running for "/home/hema/camsa/output/fasta2camsa/assembly3.delta"
2023-09-20 12:32:50,246 - CAMSA.utils.fasta2camsa_points - INFO - Running show-coords util for "/home/hema/camsa/output/fasta2camsa/assembly3.filtered.delta" file.
2023-09-20 12:32:50,246 - CAMSA.utils.fasta2camsa_points - INFO - show-coords -r -c -l /home/hema/camsa/output/fasta2camsa/assembly3.filtered.delta > /home/hema/camsa/output/fasta2camsa/assembly3.coords 2> /home/hema/camsa/output/fasta2camsa/logs/show_coords_assembly3.stderr.txt
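
Two details in the output above stand out: each NUCmer run finishes within a few hundredths of a second according to the timestamps, and the last argument on the command line is assembly40 without a .fasta extension. A quick sanity check of the inputs may therefore be a useful first step. The sketch below counts records and bases per file using only the standard library; the function name and the returned dictionary layout are hypothetical.

```python
from pathlib import Path


def summarize_fasta(path):
    """Report whether a FASTA file exists and how much sequence it holds."""
    file = Path(path)
    summary = {"path": str(path), "exists": file.is_file(), "records": 0, "bases": 0}
    if not summary["exists"]:
        return summary
    with file.open() as handle:
        for line in handle:
            if line.startswith(">"):
                summary["records"] += 1
            else:
                summary["bases"] += len(line.strip())
    return summary
```

Running it over every file passed to the converter would show missing paths and empty files at a glance.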

CAMSA input file cleaner

A utility that takes one or more CAMSA-formatted files with assembly points, performs several checks on them, and adjusts entries that describe situations CAMSA does not support.

Proposed stats:

Proposed signature:

camsaPrepareInput.py POINTS [POINTS ...] 
                     [--error-remove]
                     [--duplicates-{remove,compress}]
                     [--self-adjacent-remove]

where flags mean the following:

  • --error-remove - if this flag is specified, each assembly point that is considered unsuitable for CAMSA to process is removed; otherwise it is kept.
  • --duplicates-remove - when the same assembly point is reported more than once in the input, keep only the first appearance and remove the others.
  • --duplicates-compress - when the same assembly point is reported more than once in the input, keep a single entry whose cw is the sum over all appearances.
  • --self-adjacent-remove - if this flag is specified, assembly points that report an adjacency between extremities of the same scaffold are removed.
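
The duplicate and self-adjacency flags described above can be sketched in a few lines. The code below is only an illustration of the proposed semantics, not the eventual tool: it assumes each assembly point is a dictionary with the keys origin, seq1, seq1_or, seq2, seq2_or and a numeric cw, and it treats two points as duplicates only when all fields except cw are identical.

```python
def clean_points(points, duplicates=None, self_adjacent_remove=False):
    """Filter assembly points.

    duplicates: None (keep all), "remove" (keep first), or "compress" (sum cw).
    """
    result, seen = [], {}
    for point in points:
        if self_adjacent_remove and point["seq1"] == point["seq2"]:
            continue
        key = (point.get("origin"), point["seq1"], point["seq1_or"],
               point["seq2"], point["seq2_or"])
        if duplicates and key in seen:
            if duplicates == "compress":
                seen[key]["cw"] += point["cw"]
            continue
        kept = dict(point)  # copy, so the input list is left untouched
        seen[key] = kept
        result.append(kept)
    return result
```

The same adjacency written in the opposite direction is not recognised as a duplicate here; a real implementation would probably normalise that first.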

Converter from assembly points to DOT describing SAG

Add an automatic converter that takes as input a set of files containing assembly points in CAMSA format and outputs a DOT-formatted file describing the SAG subgraph obtained from those assembly points. Using the bg package is possible, but if the result can be obtained more simply, no new dependency shall be added.

Proposed signature:

camsa2dot.py PAIRS [PAIRS ...]
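
A dependency-free version of such a converter could look roughly like the sketch below. It assumes that SAG vertices are scaffold extremities (tail "t" and head "h"), that every scaffold contributes one edge between its own two extremities, and that every assembly point contributes one edge between the extremities it joins. These modelling choices, the orientation convention and the function name are assumptions made for illustration.

```python
def points_to_dot(points, graph_name="SAG"):
    """Render (seq1, seq1_or, seq2, seq2_or) tuples as an undirected DOT graph."""
    scaffolds = sorted({name for seq1, _, seq2, _ in points
                        for name in (seq1, seq2)})
    lines = [f"graph {graph_name} {{"]
    for name in scaffolds:
        # Scaffold edge: connects the two extremities of one scaffold.
        lines.append(f'  "{name}t" -- "{name}h" [style=bold];')
    for seq1, or1, seq2, or2 in points:
        # Assembly edge; "+" is assumed to read a scaffold tail -> head.
        left = seq1 + ("h" if or1 == "+" else "t")
        right = seq2 + ("t" if or2 == "+" else "h")
        lines.append(f'  "{left}" -- "{right}";')
    lines.append("}")
    return "\n".join(lines) + "\n"
```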

Error computing assembly points conflicts with run_camsa.py

I processed two assemblies and a scaffold with fasta2camsa to create points files. When I use run_camsa.py on the two points files I get the error message:

Run command:

run_camsa.py  scaffolds.all.camsa.points -o cout2

Error message:

2018-05-03 09:22:06,462 - CAMSA.main      - INFO    - Starting the analysis
2018-05-03 09:22:06,462 - CAMSA.main      - INFO    - Processing input
2018-05-03 09:22:06,505 - CAMSA.main      - INFO    - Merging assembly points from different sources into a set of unique ones.
2018-05-03 09:22:06,536 - CAMSA.main      - INFO    - Processing assemblies' subgroups
2018-05-03 09:22:06,539 - CAMSA.main      - INFO    - Computing assembly points conflicts
Traceback (most recent call last):
  File "/home/adam.rivers/anaconda3/envs/camsaenv/bin/run_camsa.py", line 135, in <module>
    compute_and_update_assembly_points_conflicts(assembly_points_by_ids=merged_assembly_points_by_ids)
  File "/home/adam.rivers/anaconda3/envs/camsaenv/lib/python3.6/site-packages/camsa/core/comparative_analysis.py", line 101, in compute_and_update_assembly_points_conflicts
    sag = construct_sag(assembly_points=assembly_points_by_ids.values())
  File "/home/adam.rivers/anaconda3/envs/camsaenv/lib/python3.6/site-packages/camsa/core/comparative_analysis.py", line 18, in construct_sag
    result.add_edge(u=u, v=v, weight=weight, ap_id=ap.self_id)
TypeError: add_edge() missing 2 required positional arguments: 'u_of_edge' and 'v_of_edge'

The first few lines of the two points files are:

origin  seq1    seq1_or seq2    seq2_or gap_size        cw
scaffolds.all   k141_20552      -       k141_15348      +       -105    ? 
scaffolds.all   k141_35239      -       k141_88199      +       -200    ?
scaffolds.all   k141_54810      +       k141_60870      -       -181    ?
scaffolds.all   k141_60870      -       k141_50925      +       -271    ?
origin  seq1    seq1_or seq2    seq2_or gap_size        cw
Contigs_for_extension_clean     k141_79431      +       k141_39819      -       -202    ?
Contigs_for_extension_clean     k141_39819      -       k141_2749       +       -140    ?
Contigs_for_extension_clean     k141_2749       +       k141_6255       -       -128    ?
Contigs_for_extension_clean     k141_6255       -       k141_57986      -       -140    ?

I installed camsa using pip in an isolated conda environment using Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
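
The TypeError comes from calling add_edge with the keyword names u and v, while the installed NetworkX names those parameters u_of_edge and v_of_edge, as the message itself shows. Passing the two endpoints positionally avoids depending on either naming. The helper below is a hypothetical stand-in for the failing line in construct_sag, not CAMSA's own code.

```python
import networkx as nx


def add_assembly_edge(graph, u, v, weight, ap_id):
    """Add an edge with attributes, passing the endpoints positionally."""
    graph.add_edge(u, v, weight=weight, ap_id=ap_id)


if __name__ == "__main__":
    sag = nx.MultiGraph()
    add_assembly_edge(sag, "k141_20552t", "k141_15348t", weight=1.0, ap_id=1)
    print(sag.number_of_edges())
```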

FASTA utils instructions

Instructions on how to use CAMSA utils with respect to FASTA files shall be added in:

  • full to wiki
  • sparse to the main project README file
  • as a mention on the cblab website
