yaozhou89 / tgg
tomato graph pangenome
License: MIT License
In your Methods, structural variants (SVs) were genotyped in 706 accessions with the following command:
multigrmpy.py -i /public10/home/sci0011/projects/tomato2/08_paragraph/02_vcf/SV.paragraph.vcf -m /public10/home/sci0011/projects/tomato2/08_paragraph/01_bam/samples_${sample}.txt -r ~/data/ref/SL5.0/SL5.0_chr_number.fa -o . --threads 64
But how do I obtain the file "SV.paragraph.vcf"?
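Besides the input VCF, the command above also takes a sample manifest through `-m samples_${sample}.txt`. The sketch below writes one manifest per sample, assuming the layout described in Paragraph's README (a tab-separated file with the header columns `id`, `path`, `depth` and `read length`); please check this against the Paragraph version you have installed. The sample ID, BAM path, depth and read length in the example are hypothetical placeholders, not values from the TGG pipeline.

```python
import csv
from pathlib import Path


def write_paragraph_manifest(samples, out_path):
    """Write a tab-separated sample manifest for Paragraph's multigrmpy.py.

    `samples` is a list of (sample_id, bam_path, mean_depth, read_length) tuples.
    Assumed column layout (from Paragraph's README): id, path, depth, read length.
    """
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "path", "depth", "read length"])
        for sample_id, bam, depth, read_len in samples:
            writer.writerow([sample_id, bam, f"{depth:.1f}", read_len])


# Hypothetical example: one manifest per sample, mirroring samples_${sample}.txt.
for sid, bam, depth, rl in [("TS-1", "01_bam/TS-1.bam", 23.4, 150)]:
    write_paragraph_manifest([(sid, bam, depth, rl)], Path(f"samples_{sid}.txt"))
```

Mean depth is usually estimated beforehand from the BAM (for example with a coverage tool), since Paragraph uses it when genotyping.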
Hi, the 1.Genome_assembly/1.contig part reports that:
Flye, Hicanu and Hifiasm are used to assemble primary genome. GALA is used to filter the potential misassembled regions. WGS is manual software integrated in pipeline.
However, I am not able to find all of the code and scripts for this step. For example, a script called draft_comp.sh seems to be missing, and I don't see where flye, canu or gala are called.
I should mention that I am new to Snakemake, so I apologize if the problem is on my end.
Hello, Mr. Zhou!
I have a question about the step in your code that deduplicates merged SVs. Is this step necessary? Does it mean that if I do long-read sequencing (HiFi or ONT) on some individuals, I also need to do short-read sequencing (>20x) on them, so that I can use their short reads to deduplicate the SVs after merging?
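The TGG deduplication step itself is not reproduced here. For comparison only, a purely coordinate-based deduplication, which needs no short reads, can be sketched as below: calls of the same type on the same chromosome are treated as duplicates when their start positions lie within `max_dist` bp and their lengths are similar. The `SV` record and both thresholds (500 bp, size ratio 0.7) are hypothetical illustrations, not parameters from the paper.

```python
from dataclasses import dataclass


@dataclass
class SV:
    chrom: str
    pos: int     # 1-based start position
    svtype: str  # e.g. "DEL", "INS"
    svlen: int   # absolute SV length in bp (assumed > 0)


def dedup_svs(svs, max_dist=500, min_size_ratio=0.7):
    """Greedy coordinate-based deduplication of merged SV calls.

    Calls are sorted by (chrom, svtype, pos); a call is dropped when an
    already-kept call of the same chrom and type starts at most `max_dist` bp
    earlier and the two lengths differ by no more than `min_size_ratio`.
    """
    kept = []
    for sv in sorted(svs, key=lambda s: (s.chrom, s.svtype, s.pos)):
        is_dup = any(
            k.chrom == sv.chrom
            and k.svtype == sv.svtype
            and sv.pos - k.pos <= max_dist
            and min(k.svlen, sv.svlen) / max(k.svlen, sv.svlen) >= min_size_ratio
            for k in kept
        )
        if not is_dup:
            kept.append(sv)
    return kept
```

Coordinate matching like this cannot tell whether two nearby calls are truly the same allele, which is one reason read-based validation can be preferred when reads are available.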
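Hello Yaozhou89, section 5.1 of the reference for the paper "Graph pangenome captures missing heritability and empowers tomato breeding" mentions that you used simug to simulate genomes and art_illumina to simulate short reads. However, I could not find a more detailed description in the Benchmark study of graph simulation section, and the shell script for the simug part is empty. Could you update it with this content?
In addition, I could not find the simulated short-read data in the http://solomics.agis.org.cn/tomato/ftp database. Have you deposited these data in another database, or could you provide them?
Thank you for your valuable time and help.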
Hi, Zhou!
I read your article on the tomato pangenome; fantastic job! However, I am confused by the genome-wide association study part of your Methods section. You wrote: "After pruning using PLINK (v.2.0) with the parameter ‘-indep-pairwise’ set to ‘50 5 0.2’, the pruned SNPs were used for the kinship matrix (genetic relationship matrix; GRM). For SNPs and indels, the pruned dataset (-indep-pairwise 100, 1, 0.98) was used". Does this mean the second set of parameters is used when the VCF file contains both SNPs and indels? Your article also analyses other variant sets (SVs and indels separately, or SNPs + indels + SVs). Do these also need LD pruning, and if so, what parameters did you use?
Sorry to bother you. Thanks a million!
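In PLINK, `--indep-pairwise 50 5 0.2` means a window of 50 variants, shifted 5 variants at a time, in which pairs of variants with r² above 0.2 are pruned until none remain. The sketch below illustrates that idea on a 0/1/2 genotype matrix with NumPy. It is a simplification, not PLINK's implementation: it assumes no missing genotypes, drops monomorphic variants (their r is undefined), and always removes the later variant of an offending pair, whereas PLINK applies its own rules to choose which variant to remove.

```python
import numpy as np


def ld_prune(geno, window=50, step=5, r2_threshold=0.2):
    """Simplified sliding-window pairwise LD pruning.

    geno: (n_samples, n_variants) array of 0/1/2 allele counts, no missing data.
    Returns a boolean mask marking the variants that are kept.
    """
    n_samples, n_var = geno.shape
    sd = geno.std(axis=0)
    keep = sd > 0  # monomorphic variants are dropped
    # Standardise each variant so that r = mean(z_i * z_j).
    z = np.zeros(geno.shape, dtype=float)
    z[:, keep] = (geno[:, keep] - geno[:, keep].mean(axis=0)) / sd[keep]
    start = 0
    while start < n_var:
        end = min(start + window, n_var)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j]:
                    r = z[:, i] @ z[:, j] / n_samples
                    if r * r > r2_threshold:
                        keep[j] = False  # simplification: drop the later variant
        if end == n_var:
            break
        start += step
    return keep
```

With a threshold of 0.98, as in the second quoted setting, only near-duplicate variants would be removed, which is a much lighter pruning than 0.2.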
Your pipeline provides heritability estimation with a single variant set, but I want to analyze categories of genetic variants jointly, as in your article. Could you provide more detailed code? Thanks!
Dear Prof. Zhou,
Recently, I chose the SL5.0 tomato genome from "Graph pangenome captures missing heritability and empowers tomato breeding" as my reference genome. I would like to know the centromere position of each SL5.0 chromosome shown in Figure 1a. Thank you!
I look forward to your reply!
Hi, several images are missing from the markdown files. The links are broken because they point to a local folder. Here are a few of them:
in https://github.com/YaoZhou89/TGG/blob/main/1.Genome_assembly/1.contig/Readme.md
![image-20220103182607789](/Users/zhiyangzhang/Library/Application Support/typora-user-images/image-20220103182607789.png)
in https://github.com/YaoZhou89/TGG/blob/main/1.Genome_assembly/2.scaffold/Readme.md
![image-20220103183236565](/Users/zhiyangzhang/Library/Application Support/typora-user-images/image-20220103183236565.png)
in https://github.com/YaoZhou89/TGG/blob/main/2.Genome_annotation/Readme.md
![image-20220101234140544](/Users/zhiyangzhang/Library/Application Support/typora-user-images/image-20220101234140544.png)
in https://github.com/YaoZhou89/TGG/blob/main/4.Graph_pangenome/1.construction_graph_genome/Readme.md
![image-20220102154738977](/Users/zhiyangzhang/Library/Application Support/typora-user-images/image-20220102154738977.png)
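Links like these can be found across a checkout with a small script. The sketch below scans every `*.md` file for markdown image links and reports local targets that do not exist in the repository; it assumes, as GitHub does, that a target starting with `/` is resolved against the repository root, and it skips `http(s)` URLs without checking them. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches ![alt](target); targets may contain spaces, e.g. "Application Support".
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")


def find_broken_images(repo_root):
    """Yield (markdown_file, target) for image links that do not resolve inside the repo."""
    root = Path(repo_root)
    for md in sorted(root.rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        for _alt, target in IMAGE_LINK.findall(text):
            target = target.strip()
            if target.startswith(("http://", "https://")):
                continue  # remote images are not checked here
            # A leading "/" is resolved against the repository root, as on GitHub.
            if target.startswith("/"):
                candidate = root / target.lstrip("/")
            else:
                candidate = md.parent / target
            if not candidate.exists():
                yield md.relative_to(root), target
```

Running it from the repository root (for example `for f, t in find_broken_images("."): print(f, t)`) would list each affected Readme.md with the local path it points to.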
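Dear Prof. Zhou @YaoZhou89,
In the following step, how are multiple individuals combined into one VCF: as multiple samples, or merged into a single sample?
What does cleanSV do here?
Also, in the following step, if my pipeline uses only one SV-calling tool rather than several, can I skip this step of deduplicating SVs?
Sincerely,
强森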
Dear Dr. Zhou,
Thanks a lot for this fantastic work!
I wonder how you quantified structural variants when constructing a GRM for SVs.
Many thanks!
Sincerely,
Stella.
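One common way to build a GRM from SVs, not necessarily the one used in the paper, is to code each biallelic SV like a SNP (0, 1 or 2 copies of the non-reference allele) and apply VanRaden's (2008) first method, G = ZZ' / (2 Σ pᵢ(1 − pᵢ)), where Z is the genotype matrix centred by twice the allele frequency. The sketch below assumes biallelic sites and that missing genotypes have already been imputed; the function names are hypothetical.

```python
import numpy as np


def gt_to_dosage(gt):
    """Convert a biallelic VCF GT string such as '0/1' or '1|1' to an ALT allele count."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        raise ValueError(f"missing genotype {gt!r}; impute before building the GRM")
    return sum(a != "0" for a in alleles)


def vanraden_grm(geno):
    """VanRaden (2008) method-1 GRM from an (n_samples, n_variants) 0/1/2 matrix."""
    geno = np.asarray(geno, dtype=float)
    p = geno.mean(axis=0) / 2.0        # ALT allele frequency per variant
    z = geno - 2.0 * p                 # centre each variant by its expected dosage
    denom = 2.0 * np.sum(p * (1.0 - p))
    return z @ z.T / denom
```

Tools such as GCTA compute a GRM from a 0/1/2 genotype file in a similar spirit, so SVs recoded this way can usually be fed to them alongside SNPs.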
In your pipeline, the prepare_gene_expression.R script is missing from 5.Genetic_analysis. Could you provide it? Thanks!