adaptivegenome / repeatseq
Accurate microsatellite genotypes from high-throughput resequencing data
Home Page: www.mittelmanlab.com
License: Other
Thank you for developing repeatseq. It's a very useful tool for my current research project.
I have a question about the .calls output file.
According to your documentation, there are three types of genotype in the .calls output:
NA, NhM or N (e.g. NA, 7h6, 17)
Example .calls output from your documentation:
[region] [TRF string] [Genotype] [Confidence]
2L:6146-6162 3.8_4_78_21_20_52_0_0_47_1.00_ATTA 17 39.3627
2L:7006-7017 4.0_3_100_0_24_66_0_0_33_0.92_AAT NA NA
2L:10589-10595 7.0_1_100_0_14_0_0_0_100_0.00_T 7h6 17.5857
But in my .calls output file there are also many results of the form nL:50 with no confidence value (e.g. 15L:50, 20L:50).
Example lines from my .calls file:
1:879055-879069 4_3.8_4_879055_879069_30_0_80_0_20_0.72_CCCT 15L:50
1:879887-879906 5_4.0_5_879887_879906_40_20_60_20_0_1.37_CCAGC 20L:50
I checked the .vcf output, and the nL:50 results are not in that file, so I guess I should discard them.
However, I also checked the .repeatseq output, and in that format 15L:50 would mean genotype 15 with a likelihood of 50. By that reading I should keep them, since 50 is a high likelihood.
So I am confused. Does an nL:50 result mean the genotype n is the same as the reference, or is it an invalid result that I should discard?
I am using RepeatSeq v0.8.2, with the command:
repeatseq -calls input.bam \
Homo_sapiens_assembly19.fasta \
/repeatseq/regions/hg19.2014.nochr.regions
Thanks!
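For anyone sorting through similar output, the three documented genotype forms quoted above (NA, N and NhM) can be told apart from anything else with a small check. This is only a sketch under stated assumptions: the names CallKind and classifyGenotype are hypothetical and not part of RepeatSeq, and the code does not say what nL:50 means. It only flags such values as not matching a documented form so they can be set aside for review.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical labels for the genotype column of a .calls line.
// Only the three forms quoted from the documentation are recognised.
enum class CallKind { NoCall, Single, Pair, Undocumented };

// True if s is non-empty and consists only of decimal digits.
static bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Classify one genotype field: "NA", "N" (e.g. 17) or "NhM" (e.g. 7h6).
// Anything else, such as "15L:50", is reported as Undocumented.
CallKind classifyGenotype(const std::string& g) {
    if (g == "NA") return CallKind::NoCall;
    if (allDigits(g)) return CallKind::Single;
    const std::size_t h = g.find('h');
    if (h != std::string::npos &&
        allDigits(g.substr(0, h)) && allDigits(g.substr(h + 1))) {
        return CallKind::Pair;
    }
    return CallKind::Undocumented;
}
```

With this, the rows in question could be counted or written to a separate file rather than silently kept or dropped.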
Hi,
I can get useful results using this command:
repeatseq A47294.bam GRCh37-lite.fa hg19.2014.noChr.regions
But when I try to make my own regions file from a subset of the lines in hg19.2014.noChr.regions, the .vcf reports only one of the regions I specified. I'm trying to keep the same sort order, but I'm not having much luck getting results for more than a region or two from my list. Any ideas?
I followed all the steps:
git clone repeatseq
cd repeatseq
git clone bamtools
git clone fastahack
cd bamtools
mkdir build
cd build
cmake ..
make
cd ../fastahack
make
cd ..
make
But it returns the following error:
repeatseq.cpp:1398:9: error: cannot convert ‘std::ifstream {aka std::basic_ifstream<char>}’ to ‘bool’ in return
return ifile;
^~~~~
makefile:13: recipe for target 'repeatseq.o' failed
make: *** [repeatseq.o] Error 1
Any suggestions?
I am a novice in bioinformatics. I want to get the 555 error matrix in your paper.
I downloaded an exome BAM file from the 1000 Genomes Project and split it into individual BAM files by chromosome. I then downloaded the reference FASTA file for chr1 and tried to generate the .repeatseq file for chr1. However, when I ran repeatseq chr1.bam chr1.fa chr1.region
I got a segfault. I would like to know where I made a mistake.
hi,
I have a problem when building with GCC 6.3
g++ -c -O3 -Ibamtools/src repeatseq.cpp
repeatseq.cpp: In function 'bool fileCheck(std::__cxx11::string)':
repeatseq.cpp:1398:9: error: cannot convert 'std::ifstream {aka std::basic_ifstream<char>}' to 'bool' in return
return ifile;
^~~~~
How can I fix the error?
Thanks
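A likely cause, for anyone hitting the same message: since C++11 the conversion from a stream to bool is explicit, and GCC 6 builds as gnu++14 by default, so a bare `return ifile;` in a function returning bool no longer compiles. Below is a minimal sketch of a possible fix. Only the function name and the failing `return ifile;` line come from the error output above; the parameter name and the rest of the body are assumptions about what repeatseq.cpp does at that point.

```cpp
#include <fstream>
#include <string>

// Returns true if the file at `path` can be opened for reading.
// Assumed reconstruction: only the name fileCheck and the original
// `return ifile;` statement are known from the compiler message.
bool fileCheck(const std::string& path) {
    std::ifstream ifile(path.c_str());
    // Explicit conversion: true when the stream is not in a failed state,
    // which is what the implicit pre-C++11 conversion reported.
    return static_cast<bool>(ifile);
}
```

Returning `ifile.is_open()` should behave the same way for this purpose. Another option that leaves the source untouched may be to add `-std=gnu++98` to the g++ flags in the makefile, although that has not been tested against the rest of the code here.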
Hi,
I am trying to use the repeatseq tool, but I get the following error when using a region file:
improper column two or
terminate called after throwing an instance of 'char const*'
Aborted
How should the region file be formatted, especially column two?
Hi, I ran repeatseq on the following sorted BAM (and BAM index) files, which were aligned to hg19 using bowtie2.
http://wren.omrf.org/data/repeatseq/SRR057346.bam
http://wren.omrf.org/data/repeatseq/SRR057346.bam.bai
I also used UCSC's hg19 chromFa.zip, concatenated into one large hg19.fa file, and the provided hg19.max5.regions.
The program runs for about 20 minutes and then segfaults. It creates the VCF file and writes a header into it, but no other rows. If I specify -counts, it does seem to output a complete counts file, but it still segfaults. This is all on 64-bit Ubuntu, compiled with g++ 4.6.1.
Also a few other miscellaneous things:
Thanks!
Hi, I have a problem installing RepeatSeq. The error is:
/usr/bin/ld: cannot find -lbamtools
collect2: error: ld returned 1 exit status
make: *** [repeatseq] Error 1
Help!