Code Monkey home page Code Monkey logo

rad_haplotyper's People

Contributors

chollenbeck avatar jpuritz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

rad_haplotyper's Issues

Issue with haplotype rescues

It looks the the -z parameter is not functioning properly. Here's a read out of the .hap.log for a particular individual using a -z 0.05 It appears to be failing at both the initial and rescue portion.

dDocent_Contig_4163: Unique Observed Haps:
$VAR1 = [
          'CGATT',
          'CAATT',
          'CGACC'
        ];
dDocent_Contig_4163: Problem- trying to fix...
dDocent_Contig_4163: Corrected Unique Observed Haps:
$VAR1 = [
          'CGACC',
          'CGATT',
          'CAATT'
        ];
dDocent_Contig_4163: Unable to rescue
dDocent_Contig_4163
$VAR1 = {
          'CGACC' => 9,
          'CGATT' => 9,
          'CGATG' => 1,
          'CAATT' => 1
        };
Expected haplotypes: 2
Observed haplotypes: 3

This should only see 2 haplotypes, no?

Failed, trying to recover...
dDocent_Contig_4163: Observed Haps:
$VAR1 = [
          'CGATT',
          'CAATT',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGATC',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGGTT',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGATT',
          'CGATT',
          'CAATT',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGATT',
          'CGATT',
          'CGGTT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CAATT',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATT',
          'CGGTT',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGATG',
          'CGACC',
          'CGACC',
          'CGACC',
          'CGACC'
        ];
dDocent_Contig_4163: Unique Observed Haps:
$VAR1 = [
          'CGATT',
          'CAATT',
          'CGACC',
          'CGATC',
          'CGGTT'
        ];
dDocent_Contig_4163: Problem- trying to fix...
dDocent_Contig_4163: Corrected Unique Observed Haps:
$VAR1 = [
          'CGATC',
          'CGACC',
          'CGGTT',
          'CGATT',
          'CAATT'
        ];
dDocent_Contig_4163: Unable to rescue
dDocent_Contig_4163
$VAR1 = {
          'CGATC' => 1,
          'CGACC' => 61,
          'CGGTT' => 3,
          'CGATT' => 30,
          'CGATG' => 1,
          'CAATT' => 3
        };
Expected haplotypes: 2
Observed haplotypes: 5
Failed again...

Again I would only expect to see 2.

cannot install

See attached screenshot. Is there a solution to this? Thanks!!

rad_haplotyper_issue

Feature request: new format

How difficult would it be to output sequences in a fasta multiple sequence alignment of all sequences concatenated together? IMA format is close but it's a bit of a pain to go from ima to a fasta alignment of all loci... This format would be useful for further conversion to a wide variety of other programs (like pgd spider), enabling a wide variety of downstream analysis. This feature would be awesome! Thanks. -Zac

Genepop output from hexploid vcf data

Hi,
we have analyzed RAD-Seq data from hexaploid plants with dDocent. The Freebayes step was modified with '-p 6' to create a vcf file with hexaploid data.
In order to generate data for later use in STRUCTURE, we used rad_haplotyper to output genepop files. However, the genepop file contains only diplod data in the 3-digit form (see short example below).
Is it possible to generate true hexaploid genotype data for Genepop or Structure?

Best regards

POP
Pop1_LPV000007, 001002 001001 001001 001002 001001

vcf

My stats file has several thousand that passed, but my output vcf file is empty. I used the line below.

rad_haplotyper.pl -x 28 -v BIA.10FinalDPQMA.recode.vcf -b mapped.bed --genomic_ref -r reference.fasta --vcfout HT.10FinalDPQMABIA.recode.vcf

Please advise.
HT.10FinalDPQMABIA.recode.txt

Install problems..

I'm having trouble installing the perl modules on my Ubuntu server following your instructions.. I tried several alternative ways of installing the perl modules and many of them worked but the script won't run. I second Jon's suggestion of a miniconda installation... I've been fighting with this for hours. Thanks. -Zac

Feature Request

Having the script work when there are more than one RAD locus per reference contig.

Potentially erroneous "failed" haplotyping calls

I’ve been noticing the following issues with radhaplotyper for a while and am finally getting around to asking about it. I’ve listed one example for each of several different undesirable decisions made by radhaplotyper (in my opinion) with the explanation by rad hap.

My settings are:

perl $RADHAP_SCRIPT -v ${Indexing}.vcf -x 1 -e -d ${THRESHOLDa} -mp ${THRESHOLDb} -u ${THRESHOLDc} -ml ${THRESHOLDd} -h ${THRESHOLDe} -z ${THRESHOLDf} -m ${THRESHOLDg} -r ${REF_FILE} -bp ${BAM_PATH} -p ${PopMap}$CutoffCode -o ${VCF_OUT}.Fltr${FILTER_ID}.Haplotyped.vcf #-g ${VCF_OUT}.Fltr${FILTER_ID}.${PopMap##/}.haps.genepop -a ${VCF_OUT}.Fltr${FILTER_ID}.${PopMap##/}.haps.ima

where the THRESHOLD variables are set, in order, according to:
19 rad_haplotyper -d 50
19 rad_haplotyper -mp 10
19 rad_haplotyper -u 30
19 rad_haplotyper -ml 10
19 rad_haplotyper -h 100
19 rad_haplotyper -z 0.2
19 rad_haplotyper -m 0.6

I added the -bp option to radhaplotyper for convenience. I’m also wrapping this into a gnu parallel command that divides the vcf into $NumProc pieces (by row), thus the -x 1 option. The rest should be self-explanatory.

*Note: I did branch this in github then sent a req for it to be merged into the official radhaplotyper, but it was not. branched v 1.1.9

HOMOZYG with 1 alt read REJECTED due to detection of no haps
dDocent_Contig_11076: Observed Haps:
$VAR1 = {
'CTCTCTCGGCATGATATAT' => 11,
'CTCTCTCGGCAGGATATAT' => 1
};
dDocent_Contig_11076: Unique Observed Haps:
$VAR1 = [];
dDocent_Contig_11076: Unable to rescue
dDocent_Contig_11076
$VAR1 = {
'CTCTCTCGGCAGGATATAT' => 1,
'CTCTCTCGGCATGATATAT' => 11
};
Expected haplotypes: 1
Observed haplotypes: 0
Failed again...

HOMOZYG with 1 alt read REJECTED due to expectation of HETEROZYG:
dDocent_Contig_11314: Unable to rescue
dDocent_Contig_11314
$VAR1 = {
'ATCGAATCTGTCCAG' => 1,
'ATTGAATCTGTCCGG' => 14
};
Expected haplotypes: 2
Observed haplotypes: 1
Failed, trying to recover...

HOMOZYG with more noise REJECTED due to expectation of HETEROZYG:
dDocent_Contig_11383: Unable to rescue
dDocent_Contig_11383
$VAR1 = {
'TCATATATAGGTGTTTGA' => 1,
'TCATATAAAGGTGTTTGA' => 1,
'TCATATAGAGGTGGTTGA' => 1,
'TCATATAGCGGTGTTTGA' => 1,
'TCATATAGAGGTGTTTGT' => 2,
'TCATATAGAGGTGTTTGA' => 61
};
Expected haplotypes: 2
Observed haplotypes: 1
Failed again...

HOMOZYG low cov REJECTED due to expectation of HETEROZYG:
dDocent_Contig_10963: Observed Haps:
$VAR1 = {
'CAGGATGATTCTAGTCCTGCAAGTCCT' => 5
};
dDocent_Contig_10963: Unique Observed Haps:
$VAR1 = [
'CAGGATGATTCTAGTCCTGCAAGTCCT'
];
dDocent_Contig_10963: Unable to rescue
dDocent_Contig_10963
$VAR1 = {
'CAGGATGATTCTAGTCCTGCAAGTCCT' => 5
};
Expected haplotypes: 2
Observed haplotypes: 1
Failed again...

HETEROZYG low cov called HOMOZYG:
dDocent_Contig_11076: Observed Haps:
$VAR1 = {
'CTCCCTCGGCATGATATAT' => 3,
'CCTCCTCGGCATAATATAT' => 3
};
dDocent_Contig_11076: Unique Observed Haps:
$VAR1 = [
'CTCCCTCGGCATGATATAT'
];
dDocent_Contig_11076: Unable to rescue

HETEROZYG low cov w/ noise called TRIPLOID:
dDocent_Contig_11371: Observed Haps:
$VAR1 = {
'GCGCACAGGGTGGTACGAG' => 4,
'TCGCACAGGGTAGTACGAG' => 4,
'GCGCACAGGGTAGTACGAG' => 1
};
dDocent_Contig_11371: Unique Observed Haps:
$VAR1 = [
'GCGCACAGGGTAGTACGAG',
'TCGCACAGGGTAGTACGAG',
'GCGCACAGGGTGGTACGAG'
];
dDocent_Contig_11371: Problem- trying to fix...
dDocent_Contig_11371: haplotype count threshold:1.8
dDocent_Contig_11371: Corrected Unique Observed Haps:
$VAR1 = [
'GCGCACAGGGTGGTACGAG',
'GCGCACAGGGTAGTACGAG',
'TCGCACAGGGTAGTACGAG'
];
dDocent_Contig_11371: Unable to rescue
dDocent_Contig_11371
$VAR1 = {
'GCGCACAGGGTGGTACGAG' => 4,
'GCGCACAGGGTAGTACGAG' => 1,
'TCGCACAGGGTAGTACGAG' => 4
};
Expected haplotypes: 2
Observed haplotypes: 3
Failed again...

HETEROZYG w/ noise called TRIPLOID
dDocent_Contig_11076: Observed Haps:
$VAR1 = {
'CTCCCTCGGCATGATATAT' => 8,
'TCTACTGGGCATGATGCTC' => 9,
'TCTAGTGGGGTTGATGCTC' => 2
};
dDocent_Contig_11076: Unique Observed Haps:
$VAR1 = [
'CTCCCTCGGCATGATATAT',
'TCTACTGGGCATGATGCTC',
'TCTAGTGGGGTTGATGCTC'
];
dDocent_Contig_11076: Problem- trying to fix...
dDocent_Contig_11076: haplotype count threshold:3.8
dDocent_Contig_11076: Corrected Unique Observed Haps:
$VAR1 = [
'CTCCCTCGGCATGATATAT',
'TCTAGTGGGGTTGATGCTC',
'TCTACTGGGCATGATGCTC'
];
dDocent_Contig_11076: Unable to rescue
dDocent_Contig_11076
$VAR1 = {
'TCTAGTGGGGTTGATGCTC' => 2,
'TCTACTGGGCATGATGCTC' => 9,
'CTCCCTCGGCATGATATAT' => 8
};
Expected haplotypes: 2
Observed haplotypes: 3
Failed again...

HETEROZYG w/ noise called HOMOZYG:
dDocent_Contig_11002: Observed Haps:
$VAR1 = {
'GAGGGACCGCCGAACAGCTCTTCGTCTCG' => 17,
'GAGGGCCCGCAGAACGACTCTTCGTCGCG' => 1,
'GAGGGCCCGCAGAACGGCTCTTCGTCGCG' => 20,
'GAGGGACCGCCGAACAACTCTTCGTCTCG' => 1,
'GAGGGACCGCCGAACAGCTCTTCGGCTCG' => 1
};
dDocent_Contig_11002: Unique Observed Haps:
$VAR1 = [
'GAGGGCCCGCAGAACGGCTCTTCGTCGCG'
];
dDocent_Contig_11002: Unable to rescue
dDocent_Contig_11002
$VAR1 = {
'GAGGGCCCGCAGAACGACTCTTCGTCGCG' => 1,
'GAGGGACCGCCGAACAGCTCTTCGTCTCG' => 17,
'GAGGGACCGCCGAACAACTCTTCGTCTCG' => 1,
'GAGGGCCCGCAGAACGGCTCTTCGTCGCG' => 20,
'GAGGGACCGCCGAACAGCTCTTCGGCTCG' => 1
};
Expected haplotypes: 2
Observed haplotypes: 1
Failed, trying to recover...

HETEROZYG called HOMOZYG:
dDocent_Contig_11132: Observed Haps:
$VAR1 = {
'TCAAGC' => 14,
'GAGAGA' => 13
};
dDocent_Contig_11132: Unique Observed Haps:
$VAR1 = [
'TCAAGC'
];
dDocent_Contig_11132: Unable to rescue
dDocent_Contig_11132
$VAR1 = {
'GAGAGA' => 13,
'TCAAGC' => 14
};
Expected haplotypes: 2
Observed haplotypes: 1
Failed again...

PackagesNotFoundError: The following packages are not available from current channels: - rad_haplotyper

Hi, I am trying to phase some ddrad data. Thank you for any help. I am getting the below error when trying to create the rad_haplotyper conda environment:

(base) Jessicas-MacBook-Pro-2:filtering jessicaoswald$ conda create -n rad_haplotyper_env rad_haplotyper
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • rad_haplotyper

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

"Vcf" perl module changed name to VCF

The package has changed its name, which has broken this script. I went through the code and fixed 3 instances where I thought it was referring to the Vcf Module. Nevertheless, when I try to run this script I get the following error:

usage: GLOB->new() at ./rad_haplotyper.pl line 202.

This has something to do with the VCF package name change I think, but I'm stuck. I've tried using the manual cpan install as well as the conda environment. Both give the same error.

Feature request: phased vcf

I'm hoping this might be a simple request. I'd like an output .vcf that retains information on phase.

I can see that there is already a sub-routine for this, but the comment says "not currently supported".

I tried the sub-routine, and it seemed to give a phased vcf file.

Just wondering what, if any issues I may encounter trying if I try to use this file.

Thanks,
Shea

Script runs but no output files

Hello,

I tried running rad_haplotyper.pl on my data and it works but there are no output files.

Here is my code for running the script:
rad_haplotyper.pl -v SNP.DP3g95p5maf05.HWE.recode.vcf -r reference.fasta -g SNP.recode.genepop -a SNP.recode.ima -p SNP.recode.popmap

These are the last two lines I get after all SNPs have been collapsed into haplotypes and everything is filtered:
Writing Genepop file: SNP.recode.genepop
No such file or directory at rad_haplotyper.pl line 946.

I checked line 946 which mentions a folder named POP, so I created a folder named POP and ran the code again but it still didn't create any output files.

Any help is appreciated! Thanks

Vcf module problem

Hi,

Let me first say I'm a programming roocky, although I'm learning :)
I am having an issue with setting up my laptop (Mac) to run the rad_haplotyper script. I downloaded all Perl modules but when I run the script in the 'snp filtering' tutorial I get this message:

$ perl rad_haplotyper.pl -v SNP.DP3g95p5maf05.HWE.recode.vcf -x 40 -mp 1 -u 20 -ml 4 -n -r reference.fasta
Subroutine validate redefined at /Library/Perl/5.18/Vcf.pm line 94.
Subroutine validate_v32 redefined at /Library/Perl/5.18/Vcf.pm line 117.
Subroutine new redefined at /Library/Perl/5.18/Vcf.pm line 147.
Subroutine throw redefined at /Library/Perl/5.18/Vcf.pm line 175.
Ambiguous call resolved as CORE::warn(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 185.
Subroutine warn redefined at /Library/Perl/5.18/Vcf.pm line 181.
Ambiguous call resolved as CORE::open(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 222.
Subroutine _open redefined at /Library/Perl/5.18/Vcf.pm line 189.
Subroutine open redefined at /Library/Perl/5.18/Vcf.pm line 249.
Ambiguous call resolved as CORE::close(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 269.
Subroutine close redefined at /Library/Perl/5.18/Vcf.pm line 266.
Subroutine next_line redefined at /Library/Perl/5.18/Vcf.pm line 286.
Subroutine _unread_line redefined at /Library/Perl/5.18/Vcf.pm line 316.
Subroutine next_data_array redefined at /Library/Perl/5.18/Vcf.pm line 334.
Subroutine set_samples redefined at /Library/Perl/5.18/Vcf.pm line 358.
Subroutine _set_version redefined at /Library/Perl/5.18/Vcf.pm line 386.
Removed 0 loci (0 SNPs) with more than 20 SNPs at a locus
Can't locate object method "new" via package "Vcf" (perhaps you forgot to load "Vcf"?) at rad_haplotyper.pl line 176.

When I check whether I have the Vcf module, this returns:

$ perl -e "use Vcf "
Subroutine validate redefined at /Library/Perl/5.18/Vcf.pm line 94.
Subroutine validate_v32 redefined at /Library/Perl/5.18/Vcf.pm line 117.
Subroutine new redefined at /Library/Perl/5.18/Vcf.pm line 147.
Subroutine throw redefined at /Library/Perl/5.18/Vcf.pm line 175.
Ambiguous call resolved as CORE::warn(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 185.
Subroutine warn redefined at /Library/Perl/5.18/Vcf.pm line 181.
Ambiguous call resolved as CORE::open(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 222.
Subroutine _open redefined at /Library/Perl/5.18/Vcf.pm line 189.
Subroutine open redefined at /Library/Perl/5.18/Vcf.pm line 249.
Ambiguous call resolved as CORE::close(), qualify as such or use & at /Library/Perl/5.18/Vcf.pm line 269.
Subroutine close redefined at /Library/Perl/5.18/Vcf.pm line 266.
Subroutine next_line redefined at /Library/Perl/5.18/Vcf.pm line 286.
Subroutine _unread_line redefined at /Library/Perl/5.18/Vcf.pm line 316.
Subroutine next_data_array redefined at /Library/Perl/5.18/Vcf.pm line 334.
Subroutine set_samples redefined at /Library/Perl/5.18/Vcf.pm line 358.
Subroutine _set_version redefined at /Library/Perl/5.18/Vcf.pm line 386.

So it seems I do have the module, but it doesn't work.
I was wondering whether people have similar issues with the Vcf module on Mac?

Thanks in advance, Thijs

Ima2 file error

In the ima2 formatted output files, the listed length of each haplotype sequence is generally 30 bp longer than they actually are. However, a very small number of these values are correct.

I figured this out because in writing a script to convert the ima2 out file to arlequin, I used the haplotype length values to determine how many? To insert or the missing haplotypes by locus. This resulted in a file that arlequin couldn't digest.

Once I identified the issue, I used the actual lengths of the haps in each locus to determine the number of? And I now have a working arlequin file.

Truncated names in IMA format still too long

Chris: It looks like the name truncation needs to be one less character. Once the "A" or "B" is added, the name becomes 11 characters and the data starts at position 12. But when names are less than 10 characters, the proper spacing is kept (data starts pos. 11).

e.g.
l_SCW2290BNCATGCTTTTC...
mel_T07642ANCATGCTTTTC...
mel_B159A NCATGCTTTTC...

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.