Comments (11)

mnshgl0110 commented on August 13, 2024

Hi Fuyou,
The output file format is described here: https://schneebergerlab.github.io/syri/fileformat.html
If you have not read it yet, I would suggest taking a look. Please feel free to ask more questions, if any.
Best
Manish

from syri.

sunnycqcn commented on August 13, 2024

Hello Manish,
Thank you for replying so quickly. I have read that information, and I also read your paper about syri. If I want to make a figure, what should I do?
By the way, I ran into another problem with genomes that have different numbers of scaffolds. I set the --no-chrmatch parameter, but I still hit this issue. I get good results when the two genomes have the same scaffolds.
syri - WARNING - starting
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - ERROR - Unequal number of chromosomes in the genomes. Exiting
syri - WARNING - starting
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - --no-chrmatch is set. Not matching chromosomes automatically.
Reading Coords - WARNING - UNSE01000006.1, UNSE01000014.1, UNSE01000020.1, UNSE01000002.1, UNSE01000021.1, UNSE01000023.1, UNSE01000007.1, UNSE01000010.1, UNSE01000025.1, UNSE01000012.1, UNSE01000008.1, UNSE01000019.1, UNSE01000026.1, UNSE01000016.1, UNSE01000032.1, UNSE01000009.1, UNSE01000004.1, UNSE01000003.1, UNSE01000018.1, UNSE01000001.1, UNSE01000011.1, UNSE01000013.1, UNSE01000005.1, UNSE01000024.1, UNSE01000015.1, UNSE01000017.1, UNSE01000022.1, UNSE01000033.1, UNSE01000031.1, UNSE01000028.1, scaffold00012, scaffold00011, scaffold00008, scaffold00001, scaffold00015, scaffold00016, scaffold00020, scaffold00005, scaffold00006, scaffold00004, scaffold00002, scaffold00014, scaffold00009, scaffold00013, scaffold00007, scaffold00018, scaffold00017, scaffold00010, scaffold00019, scaffold00003 present in only one genome. Removing corresponding alignments
Traceback (most recent call last):
File "/isilon/saskatoon-rdc/users/fuf/comDIR/syri/syri/bin/syri", line 128, in <module>
chrlink = startSyri(args)
File "syri/pyxFiles/synsearchFunctions.pyx", line 319, in syri.pyxFiles.synsearchFunctions.startSyri
File "syri/pyxFiles/synsearchFunctions.pyx", line 728, in syri.pyxFiles.synsearchFunctions.outSyn
File "/home/AAFC-AAC/fuf/miniconda3/envs/genome/lib/python3.5/site-packages/pandas/core/generic.py", line 4389, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
File "/home/AAFC-AAC/fuf/miniconda3/envs/genome/lib/python3.5/site-packages/pandas/core/generic.py", line 646, in _set_axis
self._data.set_axis(axis, labels)
File "/home/AAFC-AAC/fuf/miniconda3/envs/genome/lib/python3.5/site-packages/pandas/core/internals.py", line 3323, in set_axis
'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements

Thanks,
Fuyou

mnshgl0110 commented on August 13, 2024

Hi Fuyou,

Could you please say more about what kind of plot you need? In the syri output file (TSV), each row represents a genomic feature, so depending on which features you want in the figure, you can select them based on column 11.
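As a rough illustration of selecting features by column 11, here is a minimal Python sketch. It assumes the tab-separated layout described in the file format page linked above, with the annotation type (for example SYN or INV) in the 11th column. The function name `select_features` and the example rows are made up for illustration and are not part of syri.

```python
import csv
import io


def select_features(tsv_handle, wanted_types):
    """Yield rows whose 11th column (assumed annotation type) is in wanted_types."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    for row in reader:
        # Column 11 is index 10; rows with fewer columns are skipped.
        if len(row) >= 11 and row[10] in wanted_types:
            yield row


# Made-up rows in a 12-column layout; the values are illustrative only.
example = (
    "Chr1\t1\t1000\t-\t-\tChr1\t1\t1000\tSYN1\t-\tSYN\t-\n"
    "Chr1\t2000\t3000\t-\t-\tChr1\t5000\t4000\tINV1\t-\tINV\t-\n"
    "Chr1\t4000\t4500\t-\t-\tChr2\t100\t600\tTRANS1\t-\tTRANS\t-\n"
)

inversions = list(select_features(io.StringIO(example), {"INV"}))
print(len(inversions), "inversion row(s) selected")
```

With a real output file, the same function could be given an open file handle instead of the in-memory example, and the selected rows passed on to whatever plotting tool is used.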

Syri was designed with chromosomes in mind, so it gives the best results for them. If you are comparing your assembly against a chromosome-level reference genome, you can generate a homology-based pseudo-chromosome-level assembly (using tools like ntJoin, RaGOO, etc.) and then use that assembly as the query genome (after selecting the pseudo-chromosomes and removing unplaced contigs/scaffolds). If the reference genome is also incomplete, you can use chroder to generate pseudo-chromosome-like molecules by concatenating scaffolds in both the reference and query genomes. In summary, SyRI expects the two genomes to consist of homologous DNA molecules (ideally with the same IDs), and that is best achieved with chromosomes.

Also, using --no-chrmatch stops syri from trying to match chromosome IDs. Use this only when the chromosome IDs in the two genomes match exactly.
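One way to check whether the IDs in two genomes match before running syri is to compare the FASTA header names directly. The sketch below is a standalone helper, not part of syri; the function names and the example sequences are hypothetical, and it assumes the sequence ID is the first whitespace-delimited word after `>`.

```python
import io


def fasta_ids(handle):
    """Return the set of sequence IDs (first word after '>') in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def compare_ids(ref_handle, qry_handle):
    """Return (IDs only in the reference, IDs only in the query), both sorted."""
    ref_ids = fasta_ids(ref_handle)
    qry_ids = fasta_ids(qry_handle)
    return sorted(ref_ids - qry_ids), sorted(qry_ids - ref_ids)


# Made-up example genomes; with real data, pass open file handles instead.
ref = io.StringIO(">scaffold00001 some description\nACGT\n>scaffold00002\nGGCC\n")
qry = io.StringIO(">scaffold00001\nACGA\n>UNSE01000001.1\nTTAA\n")

only_ref, only_qry = compare_ids(ref, qry)
print("only in reference:", only_ref)
print("only in query:", only_qry)
```

If both lists are empty, the two genomes use the same set of IDs.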

Best
Manish

sunnycqcn commented on August 13, 2024

Hello Manish,
Thanks for your suggestions. I will figure out what I need. If I still have problems after that, I will ask you for more suggestions.
I really appreciate your work.
Best,
Fuyou

sunnycqcn commented on August 13, 2024

Hello Manish,
I checked my results. I can never get the TSV-format results. The error is as follows:
syri - WARNING - starting
Traceback (most recent call last):
File "/home/AAFC-AAC/fuf/miniconda3/envs/genome/bin/syri", line 192, in <module>
getTSV(args.dir, args.prefix, args.ref.name)
File "syri/pyxFiles/writeout.pyx", line 396, in syri.writeout.getTSV
File "syri/pyxFiles/writeout.pyx", line 106, in syri.writeout.extractseq
File "/home/AAFC-AAC/fuf/miniconda3/envs/genome/lib/python3.5/site-packages/Bio/Seq.py", line 250, in __getitem__
return self._data[index]
IndexError: string index out of range
Then I checked my syri installation and reinstalled it.
During installation I found the following error:
gcc: error: unrecognized command line option ‘-fno-plt’ ays
In fact, I hit this error the first time I installed syri.
It looks like the gcc version is not correct.
Could you tell me which gcc version is correct for syri? I installed syri in a conda environment.
Thanks,
Fuyou

sunnycqcn commented on August 13, 2024

Hello Manish,
I have installed syri again and did not get any errors during installation. However, I still get the following error with two genomes.
syri - WARNING - starting
Traceback (most recent call last):
File "/isilon/saskatoon-rdc/users/fuf/comDIR/syri/syri/bin/syri", line 192, in <module>
getTSV(args.dir, args.prefix, args.ref.name)
File "syri/pyxFiles/writeout.pyx", line 396, in syri.writeout.getTSV
File "syri/pyxFiles/writeout.pyx", line 106, in syri.writeout.extractseq
File "/home/AAFC-AAC/fuf/miniconda3/envs/syri/lib/python3.5/site-packages/Bio/Seq.py", line 250, in __getitem__
return self._data[index]
IndexError: string index out of range
The results directory looks as follows:
(syri) [fuf@biocluster test]$ ls -ltrh pairwiseWGA/L022/LREF
total 11M
-rwxr-xr-x 1 fuf domain users 1004 Mar 11 18:09 LREF.L022invOut.txt
-rwxr-xr-x 1 fuf domain users 1.1K Mar 11 18:09 LREF.L022TLOut.txt
-rwxr-xr-x 1 fuf domain users 364 Mar 11 18:09 LREF.L022invTLOut.txt
-rwxr-xr-x 1 fuf domain users 12K Mar 11 18:09 LREF.L022dupOut.txt
-rwxr-xr-x 1 fuf domain users 3.4K Mar 11 18:09 LREF.L022invDupOut.txt
-rwxr-xr-x 1 fuf domain users 41K Mar 11 18:09 LREF.L022ctxOut.txt
-rwxr-xr-x 1 fuf domain users 99K Mar 11 18:09 LREF.L022synOut.txt
-rwxr-xr-x 1 fuf domain users 178K Mar 11 18:09 LREF.L022sv.txt
-rwxr-xr-x 1 fuf domain users 38K Mar 11 18:09 LREF.L022notAligned.txt
-rwxr-xr-x 1 fuf domain users 7.3M Mar 11 18:09 LREF.L022snps.txt

#####################################
I get a different error with another two genomes:
syri - WARNING - starting
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/AAFC-AAC/fuf/miniconda3/envs/syri/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/AAFC-AAC/fuf/miniconda3/envs/syri/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "syri/pyxFiles/synsearchFunctions.pyx", line 359, in syri.pyxFiles.synsearchFunctions.syri
File "syri/pyxFiles/synsearchFunctions.pyx", line 697, in syri.pyxFiles.synsearchFunctions.getSynPath
ValueError: max() arg is an empty sequence
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/isilon/saskatoon-rdc/users/fuf/comDIR/syri/syri/bin/syri", line 128, in <module>
chrlink = startSyri(args)
File "syri/pyxFiles/synsearchFunctions.pyx", line 308, in syri.pyxFiles.synsearchFunctions.startSyri
File "syri/pyxFiles/synsearchFunctions.pyx", line 309, in syri.pyxFiles.synsearchFunctions.startSyri
File "/home/AAFC-AAC/fuf/miniconda3/envs/syri/lib/python3.5/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/AAFC-AAC/fuf/miniconda3/envs/syri/lib/python3.5/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: max() arg is an empty sequence
The log file looks as follows:
2020-03-11 18:22:28,746 - syri - WARNING - :128 - starting
2020-03-11 18:22:28,748 - Reading Coords - INFO - :128 - Reading input from .tsv file
2020-03-11 18:22:28,794 - syri - INFO - :128 - Analysing chromosomes: ['scaffold00001', 'scaffold00002', 'scaffold00003', 'scaffold00004', 'scaffold00005', 'scaffold00006', 'scaffold00007', 'scaffold00008', 'scaffold00009', 'scaffold00010', 'scaffold00011', 'scaffold00012', 'scaffold00013', 'scaffold00014', 'scaffold00015', 'scaffold00016', 'scaffold00017', 'scaffold00018', 'scaffold00019', 'scaffold00020']
2020-03-11 18:22:29,032 - syri.scaffold00001 - INFO - mapstar:44 - scaffold00001 (157, 11)
2020-03-11 18:22:29,033 - syri.scaffold00001 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00001
2020-03-11 18:22:29,037 - syri.scaffold00002 - INFO - mapstar:44 - scaffold00002 (171, 11)
2020-03-11 18:22:29,037 - syri.scaffold00002 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00002
2020-03-11 18:22:29,039 - syri.scaffold00003 - INFO - mapstar:44 - scaffold00003 (145, 11)
2020-03-11 18:22:29,040 - syri.scaffold00003 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00003
2020-03-11 18:22:29,041 - syri.scaffold00004 - INFO - mapstar:44 - scaffold00004 (126, 11)
2020-03-11 18:22:29,042 - syri.scaffold00004 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00004
2020-03-11 18:22:29,043 - syri.scaffold00005 - INFO - mapstar:44 - scaffold00005 (2, 11)
2020-03-11 18:22:29,044 - syri.scaffold00005 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00005
2020-03-11 18:22:29,045 - syri.scaffold00006 - INFO - mapstar:44 - scaffold00006 (0, 11)
2020-03-11 18:22:29,045 - syri.scaffold00006 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00006
2020-03-11 18:22:29,047 - syri.scaffold00007 - INFO - mapstar:44 - scaffold00007 (1, 11)
2020-03-11 18:22:29,048 - syri.scaffold00007 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00007
2020-03-11 18:22:29,049 - syri.scaffold00008 - INFO - mapstar:44 - scaffold00008 (0, 11)
2020-03-11 18:22:29,050 - syri.scaffold00008 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00008
2020-03-11 18:22:29,050 - syri.scaffold00009 - INFO - mapstar:44 - scaffold00009 (0, 11)
2020-03-11 18:22:29,050 - syri.scaffold00009 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00009
2020-03-11 18:22:29,052 - syri.scaffold00010 - INFO - mapstar:44 - scaffold00010 (123, 11)
2020-03-11 18:22:29,053 - syri.scaffold00010 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00010
2020-03-11 18:22:29,054 - syri.scaffold00011 - INFO - mapstar:44 - scaffold00011 (0, 11)
2020-03-11 18:22:29,055 - syri.scaffold00011 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00011
2020-03-11 18:22:29,056 - syri.scaffold00012 - INFO - mapstar:44 - scaffold00012 (108, 11)
2020-03-11 18:22:29,057 - syri.scaffold00012 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00012
2020-03-11 18:22:29,058 - syri.scaffold00013 - INFO - mapstar:44 - scaffold00013 (2, 11)
2020-03-11 18:22:29,059 - syri.scaffold00013 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00013
2020-03-11 18:22:29,060 - syri.scaffold00014 - INFO - mapstar:44 - scaffold00014 (0, 11)
2020-03-11 18:22:29,060 - syri.scaffold00014 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00014
2020-03-11 18:22:29,062 - syri.scaffold00015 - INFO - mapstar:44 - scaffold00015 (79, 11)
2020-03-11 18:22:29,062 - syri.scaffold00015 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00015
2020-03-11 18:22:29,063 - syri.scaffold00016 - INFO - mapstar:44 - scaffold00016 (0, 11)
2020-03-11 18:22:29,064 - syri.scaffold00016 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00016
2020-03-11 18:22:29,064 - syri.scaffold00017 - INFO - mapstar:44 - scaffold00017 (0, 11)
2020-03-11 18:22:29,065 - syri.scaffold00017 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00017
2020-03-11 18:22:29,065 - syri.scaffold00018 - INFO - mapstar:44 - scaffold00018 (75, 11)
2020-03-11 18:22:29,065 - syri.scaffold00018 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00018
2020-03-11 18:22:29,069 - syri.scaffold00019 - INFO - mapstar:44 - scaffold00019 (39, 11)
2020-03-11 18:22:29,069 - syri.scaffold00020 - INFO - mapstar:44 - scaffold00020 (50, 11)
2020-03-11 18:22:29,069 - syri.scaffold00019 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00019
2020-03-11 18:22:29,069 - syri.scaffold00020 - INFO - mapstar:44 - Identifying Synteny for chromosome scaffold00020

The results directory for this run is empty.
I do not know the reason.
In fact, all the genome assemblies were produced with the same methods and are 45 Mb in size.
Thanks,
Fuyou

mnshgl0110 commented on August 13, 2024

Hi,
The first error is weird. IndexError: string index out of range implies that there are alignments extending beyond the chromosome length. Could you please check the chromosome IDs and lengths in the input genomes and the alignments for consistency? This seems to be an input issue.
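A consistency check along these lines could be sketched as follows. This is only an illustration under stated assumptions: alignment coordinates are taken to be 1-based and inclusive, the alignments are supplied as simple `(seq_id, start, end)` tuples extracted from whatever alignment file is used, and the helper names are hypothetical rather than part of syri.

```python
import io


def fasta_lengths(handle):
    """Return {sequence ID: length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range(alignments, lengths):
    """Return alignments whose sequence ID is unknown or whose coordinates
    fall outside 1..length (coordinates assumed 1-based, inclusive)."""
    bad = []
    for seq_id, start, end in alignments:
        if seq_id not in lengths:
            bad.append((seq_id, start, end))
        elif min(start, end) < 1 or max(start, end) > lengths[seq_id]:
            bad.append((seq_id, start, end))
    return bad


# Made-up genome and alignment coordinates for illustration.
genome = io.StringIO(">chrA\nACGTACGT\nAC\n")
lengths = fasta_lengths(genome)
alignments = [("chrA", 1, 10), ("chrA", 5, 12), ("chrB", 1, 3)]
problems = out_of_range(alignments, lengths)
print(problems)
```

Any tuple reported here points at an alignment that does not fit the sequence it claims to belong to, which usually means the alignments were produced from a different version of the genome file.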
The second error is expected, as there are scaffolds without any directed alignments (for example, scaffold00006, scaffold00008, scaffold00009, etc. all have 0 forward alignments, and hence no syntenic regions can be identified). As I mentioned earlier, syri was designed with chromosome-level assemblies in mind, so it expects that a large portion of the homologous chromosomes is syntenic. I would suggest that you filter out unplaced contigs and scaffolds and compare only large homologous sequences.
Best
Manish

Edit: Or is it the case that these twenty sequences correspond to 20 chromosomes that have homologous sequences in both assemblies?

mnshgl0110 commented on August 13, 2024

Hi Fuyou,
Have you checked whether the chromosomes are on the same strand in the two genomes? If, for the same chromosome, the two assemblies represent different strands, then most of the alignments would be inverted. In highly conserved chromosomes, this could result in the absence of any directed alignment. I wonder whether this is the reason that there are six scaffolds that do not have any directed alignment with their homologous counterpart. To check this, look at the alignments of the scaffolds; if you see only (or mostly) inverted alignments between homologous sequences, then they are probably from different strands. Reverse-complementing one of the sequences (and then re-aligning) would solve this issue.
Could you please also tell me the lengths of the twenty molecules analysed here? That might give some hints about how to solve this problem.
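The strand check described above can be sketched in Python as follows. This is an illustration only, not syri code: it assumes each alignment is available as a tuple `(ref_id, ref_start, ref_end, qry_id, qry_start, qry_end)`, and that an inverted alignment is one where the query coordinates run in the opposite direction to the reference coordinates. The function names and coordinates are made up.

```python
from collections import defaultdict


def orientation_summary(alignments):
    """Return {(ref_id, qry_id): [forward_bp, inverted_bp]} using reference-side lengths."""
    totals = defaultdict(lambda: [0, 0])
    for ref_id, r_start, r_end, qry_id, q_start, q_end in alignments:
        # Same direction on both sequences means a directed (forward) alignment.
        same_direction = (r_end >= r_start) == (q_end >= q_start)
        length = abs(r_end - r_start) + 1
        totals[(ref_id, qry_id)][0 if same_direction else 1] += length
    return dict(totals)


def mostly_inverted(alignments):
    """Return sequence pairs where inverted alignments cover more bases than forward ones."""
    summary = orientation_summary(alignments)
    return sorted(pair for pair, (fwd, inv) in summary.items() if inv > fwd)


# Made-up coordinates for illustration.
alns = [
    ("scaffold00006", 1, 1000, "scaffold00006", 5000, 4001),
    ("scaffold00006", 2000, 2500, "scaffold00006", 3000, 2501),
    ("scaffold00001", 1, 800, "scaffold00001", 1, 800),
    ("scaffold00001", 900, 1000, "scaffold00001", 1200, 1100),
]

candidates = mostly_inverted(alns)
print(candidates)
```

Pairs returned by `mostly_inverted` are candidates for reverse-complementing one of the two sequences before re-aligning.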
Best
Manish

sunnycqcn commented on August 13, 2024

Hello Manish,
You are right. Some chromosomes are inverted relative to the reference genome.
I am thinking about how to fix it, because I have genome sequences for about 160 isolates, all assembled based on one reference genome.
Thanks,
Fuyou

mnshgl0110 commented on August 13, 2024

Do you know which chromosomes are inverted? If yes, then you can use this script to generate reverse-complemented genomes: https://github.com/schneebergerlab/syri/blob/master/syri/bin/chrrev.py
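For reference, the underlying operation is simple enough to sketch independently. The following is not the chrrev.py script and makes no claim about its interface; it is a minimal standalone illustration that reverse-complements only the named sequences in an in-memory dictionary of records, and it handles only A, C, G, T and N (other characters are left unchanged). The function names are hypothetical.

```python
# Translation table for complementing A/C/G/T/N in either case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_selected(records, ids_to_flip):
    """Return a new {ID: sequence} dict with only the listed IDs reverse-complemented."""
    return {
        name: reverse_complement(seq) if name in ids_to_flip else seq
        for name, seq in records.items()
    }


# Made-up records for illustration.
records = {"scaffold00006": "AACGTN", "scaffold00001": "ACGT"}
flipped = revcomp_selected(records, {"scaffold00006"})
print(flipped)
```

Because the IDs to flip are passed as a set, the same call works for one scaffold or for several at once.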

sunnycqcn commented on August 13, 2024

Can this script handle only one scaffold, or whole genomes? If it handles only one, I can still do it. I will check it.
Thanks,
