vicaller's Issues

where to download

Hi,

We would like to try VIcaller. However, there are no virus files available for download on the website "www.uvm.edu/genomics/software/VIcaller.html".

This is the statement from page 4 of the VIcaller User Manual:
"Obtain and index the virome-wide library using BWA, Bowtie2, and BLAST+ separately:
a) Download the virus_db_090217.fa, virus_db_090217.taxonomy, virus_db_090217.virus_list
and Vector.fa files from the website: www.uvm.edu/genomics/software/VIcaller.html"

Could you let us know where to download them?

Thanks,
Sharon

Download RepBase for RepeatMasker

I get an error at the step that uses RepeatMasker:
"Species 'human' is not known to RepeatMasker."
It looks like I didn't set up the RepBase reference database for RepeatMasker properly.

Therefore, I tried to set up RepeatMasker with the RepBase reference database by following the RepeatMasker INSTALL file. However, I found that downloading the Repbase-derived RepeatMasker libraries requires payment:
[RepBaseRepeatMaskerEdition-20181026.tar.gz] (53.48 MB) from https://www.girinst.org/server/RepBase/index.php.

How did you set up the reference database (human) for RepeatMasker?

About input data

Dear VIcaller team,

Hi, I'm Oh.

I have a question.

Can I use an alignment file derived from paired-end WGS data?
For example: perl VIcaller.pl detect -d WGS -I sample -f .bam -s paired-end

The user manual does not describe how to use a WGS alignment file as input.

Thanks.

Oh.

Extract spanning reads and junction reads around integration site

Hi Dr. Xun

Thanks for your work.

Could you suggest how to extract the spanning reads and junction reads around an integration site?

According to the manual, the file seq_1_24020575_24020787_human_papillomavirus_type_218931404.CS3 might be the one I am looking for, but it was not produced.

Here is the command I used to run the example data:
VIcaller.pl detect -d WGS -i seq -f .fastq.gz -m standard -t 12 -Q 20 -a -r -c /VIcaller_v1.1/VIcaller.config

The seq.output looks fine:
Sample_ID: seq
VIcaller_mode: standard
QC: yes
Reciprocal_alignment: yes
Candidate_virus: human_papillomavirus_type
GI: 218931404
Chr.: 1
Start: 23694041
End: 23694297
No._chimeric_reads: 0
No._split_reads: 0
Upstream_breakpoint_on_human: 23694257
Downstream_breakpoint_on_human: 23694211
Upstream_breakpoint_on_virus: 7876
Downstream_breakpoint_on_virus: 7905
Information_of_both_upstream_and_downstream_breakpoints: D(+-);D(-+)
Integration_site_in_the_human_genome: 23694234
Integration_allele_fraction: -
No._reads_supporting_nonVI: -
No._reads_supporting_VI: 38
Average_alignment_score: 64.1578947368421
Is_cell_line_contamination: -
Is_vector: -
Validation_chimeric_confident: -
Validation_chimeric_weak: -
Validation_chimeric_false: -
Validation_split_confident: -
Validation_split_weak: -
Validation_split_false: -

All Outputs:
.
|-- IlluQC_Filtered_files
|-- seq.3
|-- seq.error
|-- seq.fine_mapped
|-- seq.hmapped
|-- seq.hunmapped
|-- seq.output
|-- seq.repeat2
|-- seq.type
|-- seq.virus_f
|-- seq.virus_f2
|-- seq.visualization
|-- seq_1.1fq
|-- seq_1.1fuq
|-- seq_1.fastq.gz
|-- seq_1sf.fastq
|-- seq_1sf.fuq
|-- seq_1sf.othu
|-- seq_2.1fq
|-- seq_2.1fuq
|-- seq_2.fastq.gz
|-- seq_In.virus_v3
|-- seq_f2
|-- seq_h.bam
|-- seq_h1.sort.bam
|-- seq_hpe.sam
|-- seq_pe.bam
|-- seq_sm.bam
|-- seq_soft.fastq.gz
|-- seq_su.bam
|-- seq_vector.hmap
|-- seq_vector.hmapped
|-- seq_vector.hunmapped
|-- seq_vector.sam
|-- seq_vector_sf.hmap
|-- seq_vector_sf.hunmapped
|-- seq_vector_sf.sam
|-- seq_vsoft_sort.bam
`-- seq_vsu.sort.bam

Best Regards,

Yang

The seq.output of the test differs depending on the input.

ss.xlsx

This xlsx file contains three seq.output rows.
Row 2 is your seq.output.
Row 3 is the output I got with the seq_WGS.bam file as input.
Row 4 is the output I got with the seq_1.fastq.gz and seq_2.fastq.gz files as input.
Why are the three outputs different?

My command: perl VIcaller.pl detect -d WGS -i test/seq_WGS -f .bam -s paired-end
My config:
export PERL5LIB=/etc/perl:/usr/local/lib/x86_64-linux-gnu/perl/5.22.1:/usr/local/share/perl/5.22.1:/usr/lib/x86_64-linux-gnu/perl5/5.22:/usr/share/perl5:/usr/lib/x86_64-linux-gnu/perl/5.22:/usr/share/perl/5.22:/usr/local/lib/site_perl:/usr/lib/x86_64-linux-gnu/perl-base
export PATH=$PATH:/data/program/bowtie2/bowtie2-2.2.9/
human_genome = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
human_genome_tophat = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
virus_genome = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.fa
virus_taxonomy = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.taxonomy
virus_list = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.virus_list
vector_db = /gpfs2/dli5lab/CAVirus/Database/Vector/Vector.fa
cell_line = /data/program/VIcaller_v1.1/Database/cell_line.list
# bowtie_d = /data/program/bowtie2/bowtie2-2.2.9/
# tophat_d = /data/program/tophat/tophat-2.1.0.Linux_x86_64/
# bwa_d = /data/program/bwa/bwa/
# samtools_d = /data/program/samtools/samtools-1.9/
# repeatmasker_d = /data/program/RepeatMasker/
# meme_d = /data/program/meme/
# NGSQCToolkit_d = /data/program/NGS_QC_Toolkit/
# fastuniq_d = /data/program/FastUniq/
# SE_MEI_d = /data/program/VIcaller_v1.1/Tools/SE-MEI/
# hydra_d = /data/program/Hydra-Version-0.5.3/
# blat_d = /home/hikim/bin/x86_64/
# blastn_d = /data/program/blast/ncbi-blast-2.5.0+
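
One quick check for a config like the one above is whether every active path exists on the machine where VIcaller runs. The sketch below assumes the `key = value` layout with `#` comments seen in the posted config; `parse_config` and `missing_paths` are hypothetical helper names, not part of VIcaller.

```python
import os


def parse_config(text):
    """Return {key: value} for active 'key = value' lines.

    Blank lines, '#' comments, and 'export ...' lines are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("export "):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_paths(settings):
    """List the keys whose configured path does not exist on this machine."""
    return [key for key, path in settings.items() if not os.path.exists(path)]


# Three lines copied from the config posted above.
sample = """
human_genome = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
vector_db = /gpfs2/dli5lab/CAVirus/Database/Vector/Vector.fa
# bwa_d = /data/program/bwa/bwa/
"""
settings = parse_config(sample)
for key in missing_paths(settings):
    print(f"{key}: path not found -> {settings[key]}")
```

In the posted config, vector_db is the only active entry that does not live under /data/program/, so it may be worth checking first.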

Tophat with --no-coverage-search

Hi!

At one stage of a VIcaller run, TopHat suggests running with --no-coverage-search to make the process much faster. The run does complete much faster when this option is added in VIcaller.pl, so I wanted to ask whether you have tested it. In your opinion, does this option (or could it, in principle) prevent VIcaller from finding or validating some viral integrations?

Sergei

What is the input file for calculate?

[image: calculate parameter]

What is the input file for calculate?
I don't know what to pass to the -F parameter:
inputID_h.bam?
inputID_h1.sort.bam?

Also, does the -N parameter correspond to the No._reads_supporting_VI column?

How to filter the list from the "detect" result

Hi Dr. Xun,
I got several candidate viruses in my RNA-Seq data sets from VIcaller's "detect" function.
I read the VIcaller user manual.
It provides the "validate" and "calculate" functions, which each take a single virus of interest.
Since "detect" reported numerous viruses, how can I select all of those viruses in a single step?
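
Whether VIcaller offers a built-in batch option is not stated here. One possible workaround is to collect the distinct candidates from seq.output and loop over them with whatever per-virus command the manual describes. The sketch below assumes whitespace-separated fields with no embedded spaces; `candidate_viruses` is a hypothetical helper, and the sample keeps only the first seven columns of the seq.output row posted in an earlier issue, repeated once purely to show de-duplication.

```python
def candidate_viruses(table_text):
    """Return the distinct (Candidate_virus, GI) pairs, in first-seen order."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("Candidate_virus")
    gi_col = header.index("GI")
    pairs = []
    for line in lines[1:]:
        fields = line.split()
        pair = (fields[name_col], fields[gi_col])
        if pair not in pairs:
            pairs.append(pair)
    return pairs


# Trimmed illustration: first seven columns only, data row repeated on purpose.
sample = """\
Sample_ID VIcaller_mode QC Reciprocal_alignment Candidate_virus GI Chr.
seq standard yes yes human_papillomavirus_type 218931404 1
seq standard yes yes human_papillomavirus_type 218931404 1
"""
for name, gi in candidate_viruses(sample):
    print(name, gi)
```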

Line 1164 of VIcaller.pl looks strange.

Hi, I have a few issues.

  1. I found a strange line in VIcaller.pl.
    I don't know Perl well, but something seems to be wrong with line 1164.
    system ("perl ${directory}Scripts/Extract_fasta.pl $GI $virus_genome >${directory}Database/GI/${GI}.fa");

  2. A file name is also incorrect: "Scripts/Extract_read_informationpl" --> "Scripts/Extract_read_information.pl"
    Are the two changes correct?

  3. I have trouble running the validate function.
    I get the following warning:
    Warning: [blastn] Query is Empty!
    Warning: [blastn] Query is Empty!
    Warning: [blastn] Query is Empty!

My config and blastn probably have no problem,
so I took a look at VIcaller.pl.
blastn is run with the query "${input_sampleID2}_aligned_both.fas".
The warning suggests that "${input_sampleID2}_aligned_both.fas" is empty or was never created.

${input_sampleID2}_aligned_both.fas is made by this function.

validate function

sub v_obtain_seq {
system ("perl ${directory}Scripts/Extract_specific_loci_final_reads.pl ${input_sampleID}_f2 ${input_sampleID2}.virus_f2 >${input_sampleID2}.information");
my $cmd=q(awk '{print$8}');
system ("$cmd ${input_sampleID2}.information |sort |uniq >${input_sampleID2}.id");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_1.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_1.fuq");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_2.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_2.fuq");
system ("perl ${directory}Scripts/Extract_fuq_split.pl -f ${input_sampleID}_1sf.fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_sf.fuq");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl $input_sampleID2");
system ("cat ${input_sampleID2}_aligned.fas ${input_sampleID2}_aligned_sf.fas >${input_sampleID2}_aligned_both.fas");
}

In conclusion, I wonder whether ${input_sampleID2}.virus_f2 is supposed to be a previously generated file.
Or is something else wrong?

  4. Among the parameter values of the validate function, must the virus name be an abbreviation?
    Can't the full name from the 'Candidate_virus' column of the output file be used?
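
To narrow down where the empty blastn query comes from, one option is to walk the files that the quoted v_obtain_seq reads and writes and report the first one that is missing or empty. This is only a sketch: the helper names are hypothetical, `seq_candidate` is a placeholder for whatever value `$input_sampleID2` takes in a real run, and the `_aligned.fas` and `_aligned_sf.fas` names are inferred from the final `cat` line rather than from the conversion scripts themselves.

```python
import os


def v_obtain_seq_files(sample_id, sample_id2):
    """File names in the order the quoted v_obtain_seq reads or writes them."""
    return [
        f"{sample_id}_f2",                  # read by Extract_specific_loci_final_reads.pl
        f"{sample_id2}.virus_f2",           # read by Extract_specific_loci_final_reads.pl
        f"{sample_id2}.information",        # written by the same step
        f"{sample_id2}.id",                 # written by the awk | sort | uniq step
        f"{sample_id}_1.1fuq",              # read by Extract_fastq.pl
        f"{sample_id2}_aligned_1.fuq",
        f"{sample_id}_2.1fuq",              # read by Extract_fastq.pl
        f"{sample_id2}_aligned_2.fuq",
        f"{sample_id}_1sf.fuq",             # read by Extract_fuq_split.pl
        f"{sample_id2}_aligned_sf.fuq",
        f"{sample_id2}_aligned.fas",        # assumed output of Convert_fastq_to_fasta.pl
        f"{sample_id2}_aligned_sf.fas",     # assumed output of the split-read converter
        f"{sample_id2}_aligned_both.fas",   # the blastn query
    ]


def first_missing_or_empty(paths):
    """Return the first path that is absent or zero bytes, or None if all are fine."""
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return path
    return None


# 'seq_candidate' is a hypothetical stand-in for $input_sampleID2.
stalled = first_missing_or_empty(v_obtain_seq_files("seq", "seq_candidate"))
print("first missing or empty file:", stalled)
```

Run from the working directory of the validate step, the first file reported marks the earliest stage worth inspecting.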

Explanation of output columns

Dear VIcaller developer,

I would like to know the meaning of some output columns.
In the output of the detect function, No._chimeric_reads and No._split_reads are both zero, but No._reads_supporting_VI is greater than 0. Is it right that supporting reads = chimeric reads + split reads? And how should these columns be interpreted?

Best,
JY
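
The relationship asked about above can be tested row by row. The sketch below only checks the poster's hypothesis (supporting reads = chimeric reads + split reads) and does not state how VIcaller actually defines these columns. `support_gap` is a hypothetical helper, and the three values come from the seq.output row posted in an earlier issue on this page.

```python
def support_gap(record):
    """Return No._reads_supporting_VI minus (chimeric + split) reads for one row."""
    chimeric = int(record["No._chimeric_reads"])
    split = int(record["No._split_reads"])
    supporting = int(record["No._reads_supporting_VI"])
    return supporting - (chimeric + split)


# Values copied from the seq.output row shown earlier on this page.
row = {
    "No._chimeric_reads": "0",
    "No._split_reads": "0",
    "No._reads_supporting_VI": "38",
}
print(support_gap(row))  # prints 38, so the hypothesis does not hold for this row
```

A non-zero gap means the supporting-read count cannot be the plain sum of the other two columns for that row.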

Result_visual3-3.pl ERROR

Dear Developer,
I encountered an error while running VIcaller. The error message is “Modification of non-creatable array value attempted, subscript -1 at Result_visual3-3.pl”.

When I looked into the script “Result_visual3-3.pl”, I found that its input file “$ARGV[0]_sf.fuq” does not seem to be created by any of the scripts.

Could you help me check it?

Thanks!
JY
