Comments (5)
Thanks Chris, I've made some change over the week end, might have broken something. Will check this afternoon.
Cheers
Thierry
from radiator.
Hi Chris,
It works on my end, but then I have the latest version of radiator
from this week end (0.0.6).
I've also made a recent push. So might want to have this latest commit.
Best
Thierry
from radiator.
I took the most recent version this morning to be sure, but I'm still having the problem. I've now identified the source of the error in my dataset (see below). The issue involves the change_alleles() function with multiallelic datasets are input, particularly when the genotypes in the input data frame do not have a separator between the alleles (e.g. 001001).
I've also proposed a fix as well if this makes sense to you. I can submit a PR, but this function deals with so many input types that I wanted to make sure with you that I'm not fixing one case but breaking others.
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Warning: package 'dplyr' was built under R version 3.4.2
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag(): dplyr, stats
# A tribble that captures the state of the input df at this point in the function
input <- tibble::tribble(
~INDIVIDUALS, ~POP_ID, ~MARKERS, ~GT_HAPLO,
"IND1", "POP1", "loc1", "000000",
"IND2", "POP1", "loc1", "001002",
"IND3", "POP1", "loc1", "003004"
)
# This is the original bit of code - lines 303-312 (0.0.6)
input <- input %>%
dplyr::mutate(
GT_VCF_NUC = dplyr::if_else(
stringi::stri_detect_fixed(
str = GT_HAPLO, pattern = "/"),
GT_HAPLO,
stringi::stri_join(GT_HAPLO, GT_HAPLO, sep = "/")),
GT_VCF_NUC = stringi::stri_replace_na(str = GT_VCF_NUC, replacement = "./.")
) %>%
dplyr::select(-GT_HAPLO)
# Note that the alleles are not split. Instead the entire 6 character genotype is treated as an allele.
# Also, missing data '000000' is not converted to NA.
input
#> # A tibble: 3 x 4
#> INDIVIDUALS POP_ID MARKERS GT_VCF_NUC
#> <chr> <chr> <chr> <chr>
#> 1 IND1 POP1 loc1 000000/000000
#> 2 IND2 POP1 loc1 001002/001002
#> 3 IND3 POP1 loc1 003004/003004
# Reset the input
input <- tibble::tribble(
~INDIVIDUALS, ~POP_ID, ~MARKERS, ~GT_HAPLO,
"IND1", "POP1", "loc1", "000000",
"IND2", "POP1", "loc1", "001002",
"IND3", "POP1", "loc1", "003004"
)
# A modification that converts missing data to NA and splits the string into the two alleles
input <- input %>%
dplyr::mutate(
GT_HAPLO = replace(GT_HAPLO, which(GT_HAPLO == "000000"), NA),
GT_VCF_NUC = dplyr::if_else(
stringi::stri_detect_fixed(
str = GT_HAPLO, pattern = "/"),
GT_HAPLO,
stringi::stri_join(stringi::stri_sub(str = GT_HAPLO, from = 1, to = 3), stringi::stri_sub(str = GT_HAPLO, from = 4, to = 6), sep = "/")),
GT_VCF_NUC = stringi::stri_replace_na(str = GT_VCF_NUC, replacement = "./.")
) %>%
dplyr::select(-GT_HAPLO)
# This is the desired output, and the function performs as it should after this point
input
#> # A tibble: 3 x 4
#> INDIVIDUALS POP_ID MARKERS GT_VCF_NUC
#> <chr> <chr> <chr> <chr>
#> 1 IND1 POP1 loc1 ./.
#> 2 IND2 POP1 loc1 001/002
#> 3 IND3 POP1 loc1 003/004
from radiator.
Hi Chris, ok now I better understand the problem...
Most of these functions were recently modified but my test files for multi allelic datasets comes exclusively from RADseq data analyzed by DArT or within stacks (haplotypes file and vcf haplotype files) and all of these file format have nucleotide info, not 6 digits integers. But I used to be able to do it, as last year I was working a bit with microsatellite data.
Will have a look at your code and will also make some test with more input file, as you said they are many input file type, probably too many
Best
Thierry
from radiator.
Hi Chris, it should work now. I tested the function with a variety of multiallelic data set with/without nucleotide coding, with/without REF/ALT fields.
Let me know if this fixed your problem.
Best
Thierry
from radiator.
Related Issues (20)
- SeqArray newest version: unused argument (.progress = TRUE) HOT 6
- error in genomic_converter() HOT 2
- Stacks vcf to rubias HOT 4
- genomic_converter Error HOT 4
- filter_rad error: Column `READ_DEPTH` is a `SeqVarDataList` object. HOT 1
- filter_rad issue error/dplyr:::mutate_error ! & DynamicClusterCall()
- error in reading VCF file HOT 1
- The GDS node "$ref" does not exist. HOT 5
- Genomic converter error 'no more individuals in your data' HOT 9
- Error in file(con, "r") : cannot open the connection HOT 1
- Error when using genomic_converter genlight --> pcadapt HOT 3
- Potential help error with DArT HOT 3
- The GDS node "$ref" does not exist. HOT 4
- genomic_converter errors HOT 5
- Error in `dplyr::mutate()` - No Variants Selected HOT 5
- genomic converter() Error:>! `everything()` must be used within a *selecting* function. HOT 1
- Error Converting VCF to Genepop HOT 12
- read_vcf file access error with parallel.core = 1L argument HOT 3
- ERROR with detect_duplicate_genome + tidy format obtained from genomic_converter HOT 3
- Error with genomic_converter HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from radiator.