xunchen85 / vicaller
A software to detect virome-wide integrations
Hi,
We would like to try VIcaller. However, there are no virus files available for download from the website "www.uvm.edu/genomics/software/VIcaller.html".
This is the statement from page 4 of the VIcaller User Manual:
"Obtain and index the virome-wide library using BWA, Bowtie2, and BLAST+ separately:
a) Download the virus_db_090217.fa, virus_db_090217.taxonomy, virus_db_090217.virus_list
and Vector.fa files from the website: www.uvm.edu/genomics/software/VIcaller.html"
Could you let us know where to download them?
Thanks,
Sharon
There is an error at the step that uses RepeatMasker:
"Species 'human' is not known to RepeatMasker."
It looks like I didn't set up the RepBase reference database for RepeatMasker properly.
So I tried to set it up by following the RepeatMasker INSTALL file. However, I found that we need to pay to download the Repbase-derived RepeatMasker libraries:
[RepBaseRepeatMaskerEdition-20181026.tar.gz] (53.48 MB) from https://www.girinst.org/server/RepBase/index.php.
How did you set up the (human) reference database for RepeatMasker?
Dear VIcaller team,
Hi, I'm Oh.
I have a question.
Can I use an alignment file derived from paired-end WGS data as input?
For example: perl VIcaller.pl detect -d WGS -I sample -f .bam -s paired-end
The user manual does not describe how to use a WGS alignment file.
Thanks.
Oh.
Hi Dr. Xun
Thanks for your work.
Could you give some suggestions on how to extract the spanning reads and junction reads around an integration site?
According to the manual, the file seq_1_24020575_24020787_human_papillomavirus_type_218931404.CS3 might be the one I am looking for, but it was not produced.
Here is the command I used to run the example data:
VIcaller.pl detect -d WGS -i seq -f .fastq.gz -m standard -t 12 -Q 20 -a -r -c /VIcaller_v1.1/VIcaller.config
The seq.output looks fine:
Sample_ID: seq
VIcaller_mode: standard
QC: yes
Reciprocal_alignment: yes
Candidate_virus: human_papillomavirus_type
GI: 218931404
Chr.: 1
Start: 23694041
End: 23694297
No._chimeric_reads: 0
No._split_reads: 0
Upstream_breakpoint_on_human: 23694257
Downstream_breakpoint_on_human: 23694211
Upstream_breakpoint_on_virus: 7876
Downstream_breakpoint_on_virus: 7905
Information_of_both_upstream_and_downstream_breakpoints: D(+-);D(-+)
Integration_site_in_the_human_genome: 23694234
Integration_allele_fraction: -
No._reads_supporting_nonVI: -
No._reads_supporting_VI: 38
Average_alignment_score: 64.1578947368421
Is_cell_line_contamination: -
Is_vector: -
Validation_chimeric_confident: -
Validation_chimeric_weak: -
Validation_chimeric_false: -
Validation_split_confident: -
Validation_split_weak: -
Validation_split_false: -
All Outputs:
.
|-- IlluQC_Filtered_files
|-- seq.3
|-- seq.error
|-- seq.fine_mapped
|-- seq.hmapped
|-- seq.hunmapped
|-- seq.output
|-- seq.repeat2
|-- seq.type
|-- seq.virus_f
|-- seq.virus_f2
|-- seq.visualization
|-- seq_1.1fq
|-- seq_1.1fuq
|-- seq_1.fastq.gz
|-- seq_1sf.fastq
|-- seq_1sf.fuq
|-- seq_1sf.othu
|-- seq_2.1fq
|-- seq_2.1fuq
|-- seq_2.fastq.gz
|-- seq_In.virus_v3
|-- seq_f2
|-- seq_h.bam
|-- seq_h1.sort.bam
|-- seq_hpe.sam
|-- seq_pe.bam
|-- seq_sm.bam
|-- seq_soft.fastq.gz
|-- seq_su.bam
|-- seq_vector.hmap
|-- seq_vector.hmapped
|-- seq_vector.hunmapped
|-- seq_vector.sam
|-- seq_vector_sf.hmap
|-- seq_vector_sf.hunmapped
|-- seq_vector_sf.sam
|-- seq_vsoft_sort.bam
`-- seq_vsu.sort.bam
Best Regards,
Yang
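For reading a seq.output record like the one above, a minimal Python sketch may help. It assumes the file is tab-separated with a single header line, which is inferred from the pasted output rather than from the VIcaller manual; the function names `parse_vicaller_output` and `format_record` are hypothetical helpers, not part of VIcaller.

```python
import csv
import io


def parse_vicaller_output(text):
    """Parse seq.output text into a list of {column: value} dicts.

    Assumption: columns are tab-separated and the first line is the header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def format_record(record):
    """Render one record vertically, one 'column  value' pair per line."""
    width = max(len(name) for name in record)
    return "\n".join(
        f"{name.ljust(width)}  {value}" for name, value in record.items()
    )


if __name__ == "__main__":
    # Hypothetical usage: print every record of seq.output vertically.
    with open("seq.output") as handle:
        for record in parse_vicaller_output(handle.read()):
            print(format_record(record))
            print()
```

Printing each record vertically makes it easier to see which value belongs to which of the 29 columns.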
This xlsx file contains three seq.output results.
Row 2 is your seq.output.
Row 3 is the output obtained with the seq_WGS.bam file as input.
Row 4 is the output obtained with the seq_1.fastq.gz and seq_2.fastq.gz files as input.
Why are the three outputs different?
My command: perl VIcaller.pl detect -d WGS -i test/seq_WGS -f .bam -s paired-end
My config:
export PERL5LIB=/etc/perl:/usr/local/lib/x86_64-linux-gnu/perl/5.22.1:/usr/local/share/perl/5.22.1:/usr/lib/x86_64-linux-gnu/perl5/5.22:/usr/share/perl5:/usr/lib/x86_64-linux-gnu/perl/5.22:/usr/share/perl/5.22:/usr/local/lib/site_perl:/usr/lib/x86_64-linux-gnu/perl-base
export PATH=$PATH:/data/program/bowtie2/bowtie2-2.2.9/
human_genome = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
human_genome_tophat = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
virus_genome = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.fa
virus_taxonomy = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.taxonomy
virus_list = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.virus_list
vector_db = /gpfs2/dli5lab/CAVirus/Database/Vector/Vector.fa
cell_line = /data/program/VIcaller_v1.1/Database/cell_line.list
# bowtie_d = /data/program/bowtie2/bowtie2-2.2.9/
# tophat_d = /data/program/tophat/tophat-2.1.0.Linux_x86_64/
# bwa_d = /data/program/bwa/bwa/
# samtools_d = /data/program/samtools/samtools-1.9/
# repeatmasker_d = /data/program/RepeatMasker/
# meme_d = /data/program/meme/
# NGSQCToolkit_d = /data/program/NGS_QC_Toolkit/
# fastuniq_d = /data/program/FastUniq/
# SE_MEI_d = /data/program/VIcaller_v1.1/Tools/SE-MEI/
# hydra_d = /data/program/Hydra-Version-0.5.3/
# blat_d = /home/hikim/bin/x86_64/
# blastn_d = /data/program/blast/ncbi-blast-2.5.0+
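One thing worth ruling out with a config like the one above is an entry that points to a file that does not exist on the local machine; for instance, vector_db uses a different root directory from the other database entries. The following is a minimal Python sketch of such a check. It assumes the config consists of `key = value` lines, with `#` marking commented-out entries and `export` lines meant for the shell, as in the pasted file; `read_config_paths` and `missing_paths` are hypothetical helper names. Whether a missing path explains the differing outputs is not established here.

```python
import os


def read_config_paths(config_text):
    """Return {key: path} for every uncommented 'key = value' line."""
    paths = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks, commented-out entries, and shell 'export' lines.
        if not line or line.startswith("#") or line.startswith("export "):
            continue
        key, sep, value = line.partition("=")
        if sep:
            paths[key.strip()] = value.strip()
    return paths


def missing_paths(paths):
    """Return the subset of entries whose path does not exist locally."""
    return {key: path for key, path in paths.items() if not os.path.exists(path)}


if __name__ == "__main__":
    # Hypothetical usage: report config entries that point nowhere.
    with open("VIcaller.config") as handle:
        for key, path in missing_paths(read_config_paths(handle.read())).items():
            print(f"{key}: {path} not found")
```

This only checks that each configured path exists; it does not verify that the BWA, Bowtie2, or BLAST+ indexes next to the FASTA files were built.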
Hello Xun,
I encountered a problem when trying to access the website you described, http://www.uvm.edu/genomics/software/VIcaller.html. I cannot find the virome-wide library under the Database/ folder. I can only access http://www.uvm.edu/. Do you know how to solve this?
Thank you very much!
Hi!
When running VIcaller, at one stage TopHat suggests running itself in --no-coverage-search mode to make the process much faster. The run does indeed complete much faster when this option is added to VIcaller.pl, so I wanted to ask whether you have tested this option. In your opinion, does it (or could it, in principle) prevent VIcaller from finding or validating some viral integrations?
Sergei
Hi Dr. Xun,
I got several candidate viruses in my RNA-Seq data sets using the "detect" function of VIcaller.
I read the VIcaller user manual.
It provides the "validate" and "calculate" functions, each of which selects only one virus of interest.
Since the "detect" function found numerous viruses, how can I select all of those viruses in a single step?
Hi, I have three issues.
I found a strange line in VIcaller.pl.
I don't know Perl well, but something seems to be wrong with line 1164:
system ("perl ${directory}Scripts/Extract_fasta.pl $GI
Also, a file name is incorrect: "Scripts/Extract_read_informationpl " --> "Scripts/Extract_read_information.pl"
Are these two changes correct?
I also have trouble running the validate function.
I get the following warnings:
Warning: [blastn] Query is Empty!
Warning: [blastn] Query is Empty!
Warning: [blastn] Query is Empty!
but my config and blastn probably have no problem.
So I took a look at VIcaller.pl.
blastn is run with the query "${input_sampleID2}_aligned_both.fas".
That suggests "${input_sampleID2}_aligned_both.fas" is empty or was not created.
${input_sampleID2}_aligned_both.fas is made by this function:
sub v_obtain_seq {
system ("perl ${directory}Scripts/Extract_specific_loci_final_reads.pl ${input_sampleID}_f2 ${input_sampleID2}.virus_f2 >${input_sampleID2}.information");
my $cmd=q(awk '{print$8}');
system ("$cmd ${input_sampleID2}.information |sort |uniq >${input_sampleID2}.id");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_1.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_1.fuq");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_2.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_2.fuq");
system ("perl ${directory}Scripts/Extract_fuq_split.pl -f ${input_sampleID}_1sf.fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_sf.fuq");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl $input_sampleID2");
system ("cat ${input_sampleID2}_aligned.fas ${input_sampleID2}_aligned_sf.fas >${input_sampleID2}_aligned_both.fas");
}
In conclusion, I wonder whether ${input_sampleID2}.virus_f2 is supposed to be a previously generated file.
Or is there something else wrong?
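To narrow down where the empty query comes from, the files that the quoted v_obtain_seq reads and writes can be checked in order. The sketch below is a minimal Python helper under stated assumptions: the file names are taken only from the function quoted above (the `_aligned.fas` and `_aligned_sf.fas` names come from its final `cat` line), the arguments stand for the values of `$input_sampleID` and `$input_sampleID2`, and `validation_files` and `first_missing_or_empty` are hypothetical names, not part of VIcaller.

```python
import os


def validation_files(sample_id, sample_id2):
    """Files read or written by the quoted v_obtain_seq, in pipeline order."""
    return [
        f"{sample_id}_f2",               # input to Extract_specific_loci_final_reads.pl
        f"{sample_id2}.virus_f2",        # second input to the same script
        f"{sample_id2}.information",     # its redirected output
        f"{sample_id2}.id",              # read IDs from column 8
        f"{sample_id}_1.1fuq",           # inputs to the extraction scripts
        f"{sample_id}_2.1fuq",
        f"{sample_id}_1sf.fuq",
        f"{sample_id2}_aligned_1.fuq",   # extracted reads
        f"{sample_id2}_aligned_2.fuq",
        f"{sample_id2}_aligned_sf.fuq",
        f"{sample_id2}_aligned.fas",     # converted FASTA files
        f"{sample_id2}_aligned_sf.fas",
        f"{sample_id2}_aligned_both.fas",  # blastn query
    ]


def first_missing_or_empty(sample_id, sample_id2, workdir="."):
    """Return the first file that is absent or zero bytes, or None if all are fine."""
    for name in validation_files(sample_id, sample_id2):
        path = os.path.join(workdir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return name
    return None
```

Because shell redirection creates its output file even when the command prints nothing, the check treats a zero-byte file the same as a missing one. The first name returned is the earliest point at which the chain breaks.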
Dear VIcaller developer,
I want to know the meaning of some output columns.
In the output of the detect function, No._chimeric_reads and No._split_reads are both zero, but No._reads_supporting_VI has a value greater than 0. Is it right that supporting reads = chimeric reads + split reads? If not, how should these columns be interpreted?
Best,
JY
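To find every record where the columns disagree in this way, a short Python sketch can compare the three counts. It takes no position on what the columns mean; it only flags rows where No._reads_supporting_VI differs from No._chimeric_reads + No._split_reads. The column names are those of the seq.output header posted earlier in this thread, rows are assumed to be dicts keyed by column name, and `to_int` and `count_mismatches` are hypothetical helper names.

```python
def to_int(value):
    """Convert a count field to int; '-' or an empty field means 'not reported'."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def count_mismatches(rows):
    """Return (sample, chimeric, split, supporting) for rows where the counts differ."""
    mismatches = []
    for row in rows:
        chimeric = to_int(row["No._chimeric_reads"])
        split = to_int(row["No._split_reads"])
        supporting = to_int(row["No._reads_supporting_VI"])
        # Skip rows where any of the three counts is not reported.
        if None in (chimeric, split, supporting):
            continue
        if chimeric + split != supporting:
            mismatches.append((row["Sample_ID"], chimeric, split, supporting))
    return mismatches


# Values copied from the example seq.output record posted earlier in this thread.
example = {
    "Sample_ID": "seq",
    "No._chimeric_reads": "0",
    "No._split_reads": "0",
    "No._reads_supporting_VI": "38",
}
print(count_mismatches([example]))
```

A list of such rows may be useful to attach when asking the developers how the counts are derived.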
Dear Developer,
I encountered an error while running VIcaller. The error message is “Modification of non-creatable array value attempted, subscript -1 at Result_visual3-3.pl”.
When I looked into the script “Result_visual3-3.pl”, I found that its input file “$ARGV[0]_sf.fuq” does not seem to be created by any of the scripts.
Could you help me check it?
Thanks!
JY