uqrmaie1 / admixtools Goto Github PK

View Code? Open in Web Editor NEW

71.0 71.0 14.0 62.5 MB

Home Page: https://uqrmaie1.github.io/admixtools

R 92.01% C++ 7.99%

admixtools's People

Stargazers

Watchers

Forkers

razb ulasisik jagarinart jaisonjseq aalbrechtsen clwgg matteobolner rossibarra wook2014 joshuamschmidt itsmestewart

admixtools's Issues

[feature request] Add support for VCF files

Thanks a lot for a great package!

I think adding support for genotypes in VCF format would be hugely helpful. There are a number of issues going from VCF to plink or eigenstrat formats, and sticking with VCF combined with a population file would achieve the same results.

D statistic estimation

Hi,

I am trying to figure out the right order to set up populations for calculating D statistics with the f4 function. The interpretation of these results will depend on how D is defined in the package. In the documentation of the function it suggests that it is doing something equivalent to ABBA - BABA on the numerator. However, in Patterson 2012 the numerator is estimated as BABA - ABBA. Some of our results suggest that the program can be doing BABA - ABBA, as opposed to what is found in the documentation of f4. Is this the case?

Thank you in advance!

Is extract_f2 maxmem per core?

It would be great if the documentation tells you this.

genotype data with read_eigenstrat

Dear Robert,
I'm reading some .geno, .ind, .snp data from Reich's lab. read_eigenstrat reads data OK, however when I try to get the $geno data, then I see something like
$ geno: num [1:597573, 1:4168] -48 -49 -49 -49 -49 -49 -49 -49 -49 -49 ...

what are these negative values?

I'm reading data like this:
library(admixtools)
prefix <- "v44.3_HO_public"
pops <- read.table("european.pops")
data <- read_eigenstrat(prefix, pops=pops[,1])

One more question:
what are the $cm and the $A1 and $A2 values.

snp : spec_tbl_df [597,573 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
..$ SNP: chr [1:597573] "rs3094315" "rs7419119" "rs13302957" "rs6696609" ...
..$ CHR: chr [1:597573] "1" "1" "1" "1" ...
..$ cm : num [1:597573] 0.0201 0.0225 0.0241 0.0245 0.0257 ...
..$ POS: num [1:597573] 752566 842013 891021 903426 949654 ...
..$ A1 : chr [1:597573] "G" "T" "G" "C" ...
..$ A2 : chr [1:597573] "A" "G" "A" "T" ...

all the best,
pavlos

Error in satisfies_constraints(...) when using admix and event constraints with find_graphs

Hi,
I've been getting the error below when I use both admix_constraints and event_constraints with find_graphs(). It typically appears after the constraints have been satisfied.
Do you have any solutions to this?

Error in filter(., satisfies_constraints(g, nzf4, admixc, eventc)) :
ℹ In argument: satisfies_constraints(g, nzf4, admixc, eventc).
ℹ In row 3.
Caused by error in mutate():
ℹ In argument: ok = satisfies_oneevent(...).
ℹ In row 3.
Caused by error in satisfies_oneevent():
! satisfies_oneevent error

Zero drift edges

Hi,

Some of the drift edges after optimization (using GUI) show exactly 0 length drift edges. I know that in the classic qpgraph, such situations were to be avoided as it implied a non-existent edge. Setting the constraints in the shiny GUI to minimum drift length =1 for all edges does not change the outcome.

If qpgraph_multi_resampling is done with 100 bootstraps and it shows a range of say (0-3) for that particular edge, does it mean that the problem is irrelevant?

Is this important, and if yes how can it be fixed?

Thanks.

No SNPs remain!

I keep getting a warning message when running extract_f2 via the command line or using run shiny tools
I am using a eigenstrat file which was converted from a vcf using convertVCFtoEigenstrat.sh
warning:Error in discard_from_aftable: No SNPs remain! Increase maxmiss or select fewer populations

I have tried selecting maxmiss =1
and only 3 populations with around 10 individuals in but for some reason no success, do you have any recommendations?

Do I need to run LD Prune before running ADMIXTOOLS 2?

$f4 missing from qpAdm

Whenever I run qpAdm, the option to view f4 stats is missing; $weights, $popdrop, and $rankdrop are present, but $f4 just returns NULL.

Fst calculation inconsistency

extract_f2(pref = "Sh_FB_AbiNoHyb_gtDP3to100.fTag.ExcHet.atom_Q20_noMiss.noC.IDs.pop",
           pops = unique(mf$V1),
           maxmiss = 0,minmaf=0.02,maxmem=2000,
           format = "plink",
           auto_only = FALSE,
           outdir = "manySNPs_strict",
           apply_corr = FALSE)
my_f2_dir="manySNPs_strict"
f2_blocks = f2_from_precomp(my_f2_dir)

fst_dir=fst(my_f2_dir)

fst_block=fst(f2_blocks)

plot(fst_dir$est~fst_blocks$est)

When I run it using the directory, I get warnings:

Warning message:
In read_f2(dir, pops, pops2, type = type, remove_na = remove_na,  :
  Discarding 2449 block(s) due to missing values!
Discarded block(s): 1, 2, 4, 5, 10, 11, 13, [etc...]

This doesn't make sense as my plink file has no missing data.

What gives?

p.s. This is a truly brilliant package :)

graph_plusone() error: Error in degree(graph, mode = "in") : Not a graph object

Hello,

I'm not sure what is going on here with this error. I'm simply trying to run the example code in the documentation:

newgraphs = graph_plusone(example_graph)

and I'm getting the above error:

Error in degree(graph, mode = "in") : Not a graph object

When I look at example_graph, I see:

# A tibble: 27 x 2
   from  to                  
   <chr> <chr>               
 1 R     Chimp.REF           
 2 R     N4N                 
 3 N0N   Altai_Neanderthal.DG
 4 N0N   N3N0                
 5 N1N   N0N                 
 6 N1N   N3N1                
 7 N2N0  N1N                 
 8 N2N0  N3N6                
 9 N2N1  Vindija.DG          
10 N2N1  N2N3                
# ... with 17 more rows

I've installed admixtools following the commands here:

devtools::[install_github](https://devtools.r-lib.org/reference/remote-reexports.html)("uqrmaie1/admixtools", dependencies = TRUE)

And I'm using RStudio v1.3, R v4.0.3 on a Windows 10 machine.

Error in running the f2_from_msprime function

I run the tutorial of the "admixture graphs" https://uqrmaie1.github.io/admixtools/articles/graphs.html
with the following code:

library(magrittr)
library(tidyverse)
library(admixtools)
# create g object from the identifiability section
g = matrix(c('R', 'O', 'R', 'n1', 'n1', 'n4', 'n1', 'n3',
           'n3', 'a', 'n3', 'C', 'n4', 'a', 'n4', 'D', 'a', 'A'), ncol=2, byrow = T) %>%
  edges_to_igraph()
# run simulation
example_f2sim1 = f2_from_msprime(g, nsnps = 1e5)

The above code resulted in the following error

ℹ Simulating data...
ℹ Reading data...
Error in anygeno_to_aftable(pref, inds = inds, pops = pops, format = format,  : 
  Genotype files not found!
In addition: Warning message:
In max(file.info(paste0(pref, ".geno"))$size, file.info(paste0(pref,  :
  no non-missing arguments to max; returning -Inf

I'm stuck at this step of the tutorial, could you help me with what I'm missing here?

qpgraph

Hello
When I click on qpgraph, the graph is not drawn
But if I click on Evaluate graph, a graph is drawn ?
Warning in f2_from_precomp(global$countdir, inds = inds, pops = pops, apply_corr = input$f2corr, : Some f2-statistic estimates are negative across blocks. This is probably caused by too many missing or rare SNPs in populations with low sample size, which makes the f2 bias correction unreliable. Consider running 'f2_from_precomp' with 'afprod = TRUE'. [1] "get f2 done" [1] "dashboardbody" [1] "expanded!" [1] "qpgraph" [1] "switch away from data" [1] "switch to qpGraph" [1] "observe global$graph" [1] "global$bod change detected!" [1] "graphplot" [1] "plotly_graph" [1] "get_pdat" [1] "get_fit" [1] "navbar detected" [1] "f2_options" [1] "navbar selected" [1] "f2_options" [1] "navbar:" [1] "f2_options" [1] "multiprocess" [1] "dashboardbody" [1] "expanded!" [1] "data" [1] "navbar == data!" [1] "load_data" [1] "global$bod change detected!" Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p`) that you wish to obtain event data from.
[1] "get_fit"
[1] "qpgraphfun"
[1] "qpgraphfun: 10 1e-04 FALSE 1"
[1] "add_graph_to_history"
[1] "graphplot"
[1] "plotly_graph"
[1] "get_pdat"
[1] "get_pdat2"
[1] "init"
[1] "plotly_graph 2"
[1] "plotly_graph 3"
[1] "plotly_graph 4"
[1] "plotly_graph 5"
[1] "graphplot2"

evaluation nested too deeply: infinite recursion / options(expressions=)? in find_graphs() when using initgraph

Hi, I am trying to run find_graphs() but if I submit an initgraph, I keep getting this error

> find_graphs("./data_pops_qpGraph",initgraph=igraph_to_optimize)
i Reading precomputed data for 13 populations...
i Reading f2 data for pair 91 out of 91...
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
In addition: Warning message:
In read_f2(dir, pops, pops2, type = type, remove_na = remove_na,  :
  Discarding 1 block(s) due to missing values!
Discarded block(s): 25

Why is this happening? :(

Thanks!

graph_minusplus() error

Hello,

After running find_graphs() and extracting my winner result as shown in the tutorial, I'm trying to run the graph_minusplus() function, but I'm running into an error that I do not know how to troubleshoot. Here are my commands:

winner_graph <- winner$edges[[1]] %>% select(from, to)
winner_igraph <- edges_to_igraph(winner_graph)

newgraphs <- graph_minusplus(winner_igraph)
Error in `slice_sample()`:
! Problem while computing indices.
Caused by error in `sample.int()`:
! vector size cannot be infinite

I've tried figuring out what I have done wrong, but I'm lost. I'm running this in RStudio v2022.02.3 Build 492 with R v4.0.3 on Windows 10.

what is the output of compare_fits and what does it mean?

Hey,

I tried using Admixtools2 and I used the function compare_fits to test which qpgraph is a betterfit for my data.
I used it in the following way:
graph1=winner$edges[[1]]
graph2=winner2$edges[[1]]
fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100)
compare_fits(fits[[1]]$score_test, fits[[2]]$score_test)

I got the output:

$diff
[1] 69.8988

$se
[,1]
[1,] 33.6833

$z
[,1]
[1,] 2.075177

$p
[1] 0.03797017

$p_emp
[1] 0.04

$p_emp_nocorr
[1] 0.04

$ci_low
[1] 5.121868

$ci_high
[1] 144.4687

What does this mean? I am not able to find any explanation for this function the manuals. It would be very helpful if you could please explain. Thanks for so much the wonderful method.

Anubhab

Error when running eigenstrat_to_plink

Hi,

I'm trying to extract a sample from multi-sample eigenstrat files (.geno, .ind, .snp) from David Reich's lab and convert to plink with $eigenstrat_to_plink, but ran into some difficulties.

the command I use is
$eigenstrat_to_plink("full230", "CHIMP", inds=c("Chimp"))

At first it seems that it's working, but then I get an error:

ℹ full230.geno has 235 samples and 1233632 SNPs.
ℹ Reading data for 1 samples and 1233632 SNPs
ℹ Expected size of genotype data: 148 MB
1233k SNPs read...
✔ 1233632 SNPs read in total
Error in if (nrow(bim) != m) stop("Num loci in X (", m, ") and bim (", :
argument is of length zero

I'm having trouble understanding what is wrong, and it could be fixed.

Many thanks,
Chen

How to determine whether the score is reasonable in qpgraph

I wound like to know how can I make sure the score is reasonable in qpgraph.
My best score is 694.18 after run 10000 generations. Is that reliable？

Feature request: set locale before writing/reading f2-statistics

In our small but international team we keep running into an issue with locale when writing or reading precomputed f2-statistics shared with other team members.

This is because in some locales there are differences in how alphabetical sorting is done. For example, the Czech language has a letter "ch", which follows "h" in the alphabet. This causes "chimp" to be sorted after e.g. "Denisovan", while in e.g. English locale "chimp" would sort before "Denisovan". Users trying to read data produced by user with different locale get errors, similar to the following:

Error in read_f2(dir, pops, pops2, type = type, remove_na = remove_na,  :
  File /some/path/to/project/HGDP_HQLQref_9pop/Chagyrskaya/Denisova11_f2.rds not found!

(In this case f2 statistics were precomputed in an environment with Czech locale and then read in an environment with different locale afterwards. In Czech locale, Chagyrskaya would sort after Denisova11.)

My understanding is that when admixtools2 saves f2-statistics to disk, the directories for each population contain data for the following populations (in alphabetical order) while they don't contain data for populations sorted before them, so the order of reading and writing the populations really matters.

As far as I can tell, this issue applies at least to these functions:

extract_f2
f2_from_precomp
read_f2

My setup

I installed the package inside conda environment:

conda create -n admix admixtools r-base r-irkernel r-rcpp r-tidyverse r-igraph r-plotly r-devtools
source activate admix

Then inside R I used devtools to install the package itself, like this:

devtools::install_github("uqrmaie1/admixtools", dependencies = TRUE)

System info

I use elementaryOS 5.1.7, based on Ubuntu 18.04.4 LTS. I have these browsers in my PATH:

firefox
epiphany
com.github.cassidyjames.ephemeral (default)

I also have org.gnome.Epiphany installed as flatpak. Changing default browser doesn't have any effect.

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: elementary OS 5.1.7 Hera

Matrix products: default
BLAS/LAPACK: /home/jena/miniconda3/envs/admix/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=cs_CZ.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=cs_CZ.UTF-8    
 [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=cs_CZ.UTF-8   
 [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] shinydashboard_0.7.1 shinyWidgets_0.5.4   shinyalert_2.0.0    
 [4] shinyFiles_0.9.0     shinyBS_0.61         shinyjs_2.0.0       
 [7] RColorBrewer_1.1-2   magrittr_2.0.1       forcats_0.5.0       
[10] stringr_1.4.0        dplyr_1.0.2          purrr_0.3.4         
[13] readr_1.4.0          tidyr_1.1.2          tibble_3.0.4        
[16] tidyverse_1.3.0      plotly_4.9.2.1       ggplot2_3.3.2       
[19] shinythemes_1.1.2    htmlwidgets_1.5.2    igraph_1.2.6        
[22] shiny_1.5.0          admixtools_2.0.0    

loaded via a namespace (and not attached):
 [1] httr_1.4.2        jsonlite_1.7.1    viridisLite_0.3.0 modelr_0.1.8     
 [5] assertthat_0.2.1  cellranger_1.1.0  pillar_1.4.7      backports_1.2.0  
 [9] glue_1.4.2        uuid_0.1-4        digest_0.6.27     promises_1.1.1   
[13] rvest_0.3.6       colorspace_2.0-0  htmltools_0.5.0   httpuv_1.5.4     
[17] pkgconfig_2.0.3   broom_0.7.2       haven_2.3.1       xtable_1.8-4     
[21] scales_1.1.1      later_1.1.0.1     generics_0.1.0    ellipsis_0.3.1   
[25] withr_2.3.0       repr_1.1.0        lazyeval_0.2.2    cli_2.2.0        
[29] crayon_1.3.4      readxl_1.3.1      mime_0.9          evaluate_0.14    
[33] ps_1.4.0          fs_1.5.0          fansi_0.4.1       xml2_1.3.2       
[37] tools_4.0.3       data.table_1.13.2 hms_0.5.3         lifecycle_0.2.0  
[41] munsell_0.5.0     reprex_0.3.0      compiler_4.0.3    rlang_0.4.9      
[45] grid_4.0.3        pbdZMQ_0.3-3.1    IRkernel_1.1.1    rstudioapi_0.13  
[49] base64enc_0.1-3   gtable_0.3.0      abind_1.4-5       DBI_1.1.0        
[53] R6_2.5.0          lubridate_1.7.9.2 fastmap_1.0.1     stringi_1.5.3    
[57] IRdisplay_0.7.0   Rcpp_1.0.5        vctrs_0.3.5       dbplyr_2.0.0     
[61] tidyselect_1.1.0

Qpgraph: opt_worst_residual

Running find_graphs with opt_worst_residual = TRUE generates this error message.

Error: Problem with mutate() column res.
i res = map(g, qpgfun).
x 'f2_blocks' and 'f3precomp' can't both be provided!

I wanted to to see if the same issue occurs with find_graphs_old but the function doesn't seem to have exported.

The order of right population impact the qpAdm result

Hi,

It is a very convenient tool to implement "rotation" strategy when running qpAdm.

I notice some detail description about the order of right population in the supplementary materials of this paper "Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture":
"In order to simplify calculations, qpAdm uses the first population that is specified in the right population list as a base for all f4-statistics that it calculates. It is therefore important to choose this population with care."

So firstly, I test the different order of right population input to qpadm() and the result do have a little difference. Then I test qpadm_rotation() and find the logic of leftright like this:

leftright=(A, B, C), rightfix=(D, E, F) #Actually I want to use the D as the first population
#So we get following models
m1: left=(A), right=(B, C, D, E, F) #different result with model left=(A), right=(D, E, F, B, C) 
m2: left=(B), right=(A, C, D, E, F)
...
m4: left=(A, B), right=(C, D, E, F)
...

Does that mean the first of right population change? If so, the result will be impacted.

Do we need to consider this order?

Best wishes.
Xiaobo

slice_sample error in find_graphs

In running find_graphs() I regularly get this slice_sample error. It is not always after the first graph, and sometimes it runs through to completion with no error with exactly the same command/f2/etc. I note that I can calculate f4 stats, use qpgraph to estimate edges/admixtures, and qpgraph_resample_multi to compare trees all with no problem with the same set of populations and f2_blocks.

> bob<-find_graphs(
+            f2_blocks,
+            max_admix=5,
+            outpop="diploperennis",
+            verbose=TRUE
+          ) %>% slice_min(score, with_ties = FALSE)
✔ 1: score 338.978	best score 338.978	none	tot 1
Error in `slice_sample()`:
! `n` must be a round number, not the number 3.5.

running find_graphs with the initgraph parameter

Hi, first of all, many thanks for developing and maintaining this software.
I have been using the find_graphs command in AdmixTools2 for analyzing population data. However, I have noticed significant score differences between repeated runs of find_graphs, and it appears that some results with lower scores align better with my expectations. I recently discovered the initgraph parameter, which I would like to use when adding additional populations to the previous best graph result. However, I encountered difficulties in saving the previous graph as a graph file. Could you please provide guidance on the command line for accomplishing this task?

Thanks.

Error in parse_qpadm_output

Hi Robert,

There is an error in the parse_qpadm_output command because the command is printing the weights again in the se column in $weights.

> test                                                                                                             
$weights
# A tibble: 3 x 6
  target         left                weight  mean    se     z
  <chr>          <chr>                <dbl> <dbl> <dbl> <dbl>
1 STUHG03645_KGP IPC                  0.522 0.522 0.522     1
2 STUHG03645_KGP Onge                 0.386 0.386 0.386     1
3 STUHG03645_KGP Central_Steppe_MLBA  0.092 0.092 0.092     1

$popdrop
# A tibble: 7 x 9
  pat      wt   dof  chisq        p   IPC  Onge Central_Steppe_MLBA feasible
  <chr> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl>               <dbl> <lgl>   
1 000       0     5   11.1 4.88e- 2 0.522 0.386               0.092 TRUE    
2 001       1     6   21.4 1.53e- 3 0.644 0.356               0     TRUE    
3 010       1     6  204.  0        1.11  0                  -0.113 FALSE   
4 100       1     6  148.  2.54e-29 0     0.632               0.368 TRUE    
5 011       2     7  219.  0        1     0                   0     TRUE    
6 101       2     7  517.  0        0     1                   0     TRUE    
7 110       2     7 1574.  0        0     0                   1     TRUE

Best

only estimated f2 in qpGraph result

Dear author,

I noticed that I do not obtain the fitted f2 values(as well as no diff and Z) in the results when running qpGraph() but only the estimated one and the standard error. On the other hand I get both fitted, estimated, se and Z in the f3 and f4; but in the early phases of "graph exploration" it is not easy to use just those to understand what needs to be corrected.
Am I missing some optional parameters to return the fitted f2 values or did you forget to implement it (the help of the function says I should get both)?

Thanks for your help,

Leonardo

Parallel version of extract_f2

Hi Robert,

I really enjoy your new implementation of admixtools! One thing I was hoping one could add is (optional) support for parallel calculation of f-statistics, since that can take a very long time. The attached changes seem to work fine, and for my test case sped up calculations from ~a day to 20min; but some more work would be required to make this dependency optional. Any thoughts on this?

Ben

[16:59:53@bio39:admixtools]: [master] git diff R/fstats.R
diff --git a/R/fstats.R b/R/fstats.R
index af51d96..07c900e 100644
--- a/R/fstats.R
+++ b/R/fstats.R
@@ -34,7 +34,7 @@
 afs_to_f2_blocks = function(afdat, maxmem = 8000, blgsize = 0.05,
                             pops1 = NULL, pops2 = NULL, outpop = NULL, outdir = NULL,
                             overwrite = FALSE, afprod = TRUE, fst = TRUE, poly_only = c('f2'),
-                            apply_corr = apply_corr, verbose = TRUE) {
+                            apply_corr = apply_corr, verbose = TRUE, n_cores=1) {
 
   # splits afmat into blocks by column, computes snp blocks on each pair of population blocks,
   #   and combines into 3d array
@@ -52,7 +52,7 @@ afs_to_f2_blocks = function(afdat, maxmem = 8000, blgsize = 0.05,
   if(verbose) alert_info(paste0('Allele frequency matrix for ', nrow(afmat), ' SNPs and ',
                          length(pops), ' populations is ', round(mem/1e6), ' MB\n'))
 
-  chunks = make_chunks(pops, mem, maxmem, pops1, pops2, verbose = verbose)
+  chunks = make_chunks(pops, mem, maxmem/n_cores, pops1, pops2, verbose = verbose)
   popvecs1 = chunks$popvecs1
   popvecs2 = chunks$popvecs2
 
@@ -100,7 +100,10 @@ afs_to_f2_blocks = function(afdat, maxmem = 8000, blgsize = 0.05,
     block_lengths_fst = block_lengths_n
   }
 
-  for(i in 1:length(popvecs1)) {
+  library(doParallel)
+  registerDoParallel(n_cores)
+  foreach(i=1:length(popvecs1)) %dopar% {
     if(length(popvecs1) > 1 & verbose) cat(paste0('\rpop pair block ', i, ' out of ', length(popvecs1))
)
     s1 = popvecs1[[i]]
     s2 = popvecs2[[i]]

diff --git a/R/io.R b/R/io.R
index 2860df1..0937f6b 100644
--- a/R/io.R
+++ b/R/io.R
@@ -1021,7 +1021,7 @@ extract_f2 = function(pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, ma
                       transitions = TRUE, transversions = TRUE,
                       auto_only = TRUE, keepsnps = NULL, overwrite = FALSE, format = NULL,
                       adjust_pseudohaploid = TRUE, cols_per_chunk = NULL, fst = TRUE, afprod = TRUE,
-                      poly_only = c('f2'), apply_corr = TRUE, verbose = TRUE) {
+                      poly_only = c('f2'), apply_corr = TRUE, verbose = TRUE, n_cores=1) {
 
   if(!is.null(cols_per_chunk)) {
     stopifnot(is.null(pops2))
@@ -1056,7 +1056,7 @@ extract_f2 = function(pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, ma
   arrs = afs_to_f2_blocks(afdat, outdir = outdir, overwrite = overwrite, maxmem = maxmem, poly_only = poly_only,
                           pops1 = pops, pops2 = pops2, outpop = if(outpop_scale) outpop else NULL,
                           blgsize = blgsize, afprod = afprod, fst = fst, apply_corr = apply_corr,
-                          verbose = verbose)
+                          verbose = verbose, n_cores=n_cores)
 
   if(is.null(outdir)) return(arrs)

Stop quickly when package is required but not installed.

Just a simple suggestion, e.g. in packedancestrymap_to_plink(), package {genio} is required

admixtools/R/io.R

Line 2100 in 0a7ca2f

genio::write_plink(normalizePath(outpref, mustWork = F),

but the user get the error that this package is not installed after having read the full data in memory (which can take time).

It would be useful to test if this package is installed at the very beginning of the function.

find.graphs error in 'slice_sample'

Hello,

Could you help me with resolving the following issue?

I am trying to create admixture graphs using the following commands on a >50K biallelic SNP dataset:

extract_f2(pref="snpdata","admixtoolsfolder",maxmiss=0.1,overwrite=TRUE,auto_only=FALSE)
f2_blocks <- f2_from_precomp("admixtools2folder")
find_graphs(data=f2_blocks,numadmix=0)

However, when running the find_graphs function, I encounter the following error:

√ 1: score 7183.76 best score 7183.76 none tot 1
√ 2: score 6500.704 best score 6500.704 rearrange_negadmix3 tot 9
√ 3: score 6427.452 best score 6427.452 swap_negdrift tot 19
√ 4: score 6427.451 best score 6427.451 rearrange_negdrift tot 25
Error in `slice_sample()`:
! Problem while computing indices.
Caused by error in `sample.int()`:
! cannot take a sample larger than the population when 'replace = FALSE'

Backtrace:

global findgraph(nadmixture = 3)
base::local(...)
base::eval.parent(substitute(eval(quote(expr), envir)))
[ base::eval(...) ] with 1 more call
[ base::eval(...) ] with 1 more call
dplyr:::sample_int(n, size(n), replace = replace, wt = weight_by)
base::sample.int(n, size, prob = wt, replace = replace)
Run rlang::last_trace() to see the full context.

I tried removing certain populations, and setting different values for the numadmix parameter, but the error is persistent, sometimes occurring after a few steps only (like in the example above), sometimes after many steps (i.e, >30).

Thanks in advance for your help!

Best,

Menno

Question: The difference of qpAdm/qpWave model fitting between Admixtools 1 and 2.

Hello. My name is Donghee Kim. I’m a phD student of Jeong lab, of which the project instructor is Choongwon Jeong, in Seoul National University.
I have a question about the difference of qpAdm/qpWave model fitting between Admixtools 1 and 2.

I wanted to make a comparison with previous f4 and qpAdm results made by Admixtools1.
First, I modified fstatistics calculation option (poly_only=F, jacknife block=5e6) for making the result same between Admixtools 1 and 2. Then I got a same f4 results with previous ones. However, chi-square values of qpAdm/qpWave model fitting are still different from each other.
I read the manual and paper of Admixtools 1 and 2 but couldn’t find any difference for qpAdm/qpWave modelling mechanism. I guess maybe that R packages for estimating error matrix E could make a difference.

To summarize, I wonder what makes the subtle difference of the qpAdm/qpWave model fit between Admixtools 1 and 2. Also, I wonder if there are any options to make the model fit the same for comparisons with previous results.

Thank you.

Sincerely,
Donghee kim.

qpfstats error

Hello,

I am getting the following error when running qpfstats:

f2_blocks = qpfstats(genotype_data,mypops)
ℹ Reading metadata...
ℹ Computing block lengths for 1099436 SNPs...
ℹ Computing 55 f4-statistics for block 706 out of 706...
ℹ Constructing matrix...
Error in chr_as_locations():
! Can't subset columns that don't exist.
✖ Column name doesn't exist.
Run rlang::last_error() to see where the error occurred.
rlang::last_error()
<error/vctrs_error_subscript_oob>
Error in chr_as_locations():
! Can't subset columns that don't exist.
✖ Column name doesn't exist.

Backtrace:

admixtools::qpfstats(genotype_data, mypops)
tidyr:::pivot_wider.data.frame(., c(1:4), block, values_from = est)
tidyr::build_wider_spec(...)
tidyselect::eval_select(enquo(names_from), data)
tidyselect:::eval_select_impl(...)
tidyselect:::vars_select_eval(...)
tidyselect:::walk_data_tree(expr, data_mask, context_mask, error_call)
tidyselect:::as_indices_sel_impl(...)
tidyselect:::as_indices_impl(x, vars, call = call, strict = strict)
tidyselect:::chr_as_locations(x, vars, call = call)
Run rlang::last_trace() to see the full context.

The error might be in the pivot_wider() function below, or maybe the f4blockdat does not match the format expected by pivot_wider().

ymat = f4blockdat %>%
select(pop1:pop4, block, est) %>%
pivot_wider(c(1:4), block, values_from = est) %>% select(-1:-4) %>% as.matrix

Thanks.

What should I do with "The 'plotly_click' event tied a source ID of 'src' is not registered"?

After I entered "run_shiny_admixtools()" and got admixtools 2 on my web browser. On the R console, there is a message saying "Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p) that you wish to obtain event data from", as shown below. What should I do to fix this?

Thank you very much,

Siripong

######################################################

run_shiny_admixtools()

Listening on http://127.0.0.1:7621
Warning: Good news!
You don't need to call useShinyalert() anymore. Please remove this line from your code.
If you really want to pre-load {shinyalert} to the UI for any reason, use:
useShinyalert(force = TRUE)
[1] "navbar detected"
[1] "f2_options"
[1] "navbar selected"
[1] "f2_options"
[1] "navbar:"
[1] "f2_options"
[1] "multiprocess"
[1] "dashboardbody"
[1] "expanded!"
[1] "data"
[1] "navbar == data!"
[1] "load_data"
[1] "global$bod change detected!"
Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p) that you wish to obtain event data from.

######################################################

error when running extract_f2 and qpfstats set as TRUE

Getting this error when running extract_f2 with qpfstats as TRUE.

i Reading metadata...
i Computing block lengths for 589494 SNPs...
i Computing 3081 f4-statistics for block 549 out of 549...
i Constructing matrix...
i Running regression...
NULL
Warning messages:
1: In extract_f2("SriLanka_HONaka_ALG_posrefine.nohardfiltering.norm.onlybiallelicsnps_forqpgraph",  :
  Most `extract_f2` options will be ignored when using the `qpfstats` option!
2: In is.character(pop1) && file.exists(pop1) :
  'length(x) = 13 > 1' in coercion to 'logical(1)'
3: In is.character(pop1) && file.exists(pop1) :
  'length(x) = 13 > 1' in coercion to 'logical(1)'
4: Specifying the `id_cols` argument by position was deprecated in tidyr 1.3.0.
i Please explicitly name `id_cols`, like `id_cols = 1:4`.
i The deprecated feature was likely used in the admixtools package.
  Please report the issue at <https://github.com/uqrmaie1/admixtools/issues>.

How to constrain find_graphs search?

Hello
I have a set of populations related to wheat, some are domesticated and some not. Therefore, I am almost cretin which populations should be with early split and donors of admixture. However, when I run:
opt_results = find_graphs('./MANCH',initgraph = g, max_admix=3, stop_gen = 100)
using a graph (g)

that I constructed manually based on prior knowledge, the graph with the lowest score is placing the ancestral populations (based on other solid data) at the bottom of the graph like this.

Is there a way I constrain find_graphs to fit prior knowledge?
Does the results that I am getting now suggest that my data may have some bias?

Wish you happy holidays
Hanan

admixtools GUI

Hello
A gray screen appears when running and no files can be added

`> library("admixtools")

run_shiny_admixtools()
Loading required package: shiny

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

decompose, spectrum

The following object is masked from ‘package:base’:

union

Loading required package: ggplot2

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

last_plot

The following object is masked from ‘package:igraph’:

groups

The following object is masked from ‘package:stats’:

filter

The following object is masked from ‘package:graphics’:

layout

-- Attaching packages --------------------------------------- tidyverse 1.3.1 --
<U+221A> tibble 3.1.3 <U+221A> dplyr 1.0.7
<U+221A> tidyr 1.1.3 <U+221A> stringr 1.4.0
<U+221A> readr 2.0.1 <U+221A> forcats 0.5.1
<U+221A> purrr 0.3.4
Warning: package ‘tibble’ was built under R version 4.2.0
Warning: package ‘readr’ was built under R version 4.1.1
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
x purrr::compose() masks igraph::compose()
x tidyr::crossing() masks igraph::crossing()
x dplyr::filter() masks plotly::filter(), stats::filter()
x dplyr::groups() masks plotly::groups(), igraph::groups()
x dplyr::lag() masks stats::lag()
x purrr::simplify() masks igraph::simplify()

Attaching package: ‘magrittr’

The following object is masked from ‘package:purrr’:

set_names

The following object is masked from ‘package:tidyr’:

extract

Find out advanced usage of shinyjs:
https://deanattali.com/shinyjs/advanced

Attaching package: ‘shinyjs’

The following object is masked from ‘package:shiny’:

runExample

The following objects are masked from ‘package:methods’:

removeClass, show

Attaching package: ‘shinyalert’

The following object is masked from ‘package:shinyBS’:

closeAlert

The following object is masked from ‘package:shinyjs’:

runExample

The following object is masked from ‘package:shiny’:

runExample

Attaching package: ‘shinyWidgets’

The following object is masked from ‘package:shinyjs’:

alert

Attaching package: ‘shinydashboard’

The following object is masked from ‘package:graphics’:

box

Listening on http://127.0.0.1:7731
Warning: Error in loadNamespace: there is no package called ‘DT’
55:
Error in loadNamespace(x) : there is no package called ‘DT’
[1] "navbar detected"
[1] "f2_options"
[1] "navbar selected"
[1] "f2_options"
[1] "navbar:"
[1] "f2_options"
[1] "multiprocess"
[1] "dashboardbody"
[1] "expanded!"
[1] "data"
[1] "navbar == data!"
[1] "load_data"
[1] "global$bod change detected!"
Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p) that you wish to obtain event data from.

Errors when reject_f4z applied in find_graphs

Hi,

When the reject_f4z option was applied in find_graphs, I encountered the following error:

find_graphs(f2_blocks, numadmix=5, outpop="Out", stop_gen=200, stop_gen2=30, reject_f4z=3)
✔ 1: score 7065.869 best score 7065.869 none tot 1
Error: Repaired names have length 4 instead of length 8.

Is there something wrong with my arguments?

Thanks!

find_graphs eror

I am able to create and test graphs fine, but when I try to use the find_graphs function I get an error

mypops = c('Pop1', 'Pop2', 'Pop3', 'Pop4')
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, outpop = Pop4)
newgraph = random_admixturegraph(mypops, numadmix = 0, outpop = Pop4)
qpg_results = qpgraph(f2_blocks, newgraph)
qpg_results$score

This all works fine up until now, but when I run:

find_graphs(f2_blocks, out = Pop4)

I get the following error that keeps repeating in the console until R crashes:

"the condition has length >1 and only the first element will be usedthe condition has length >1 and only the first element will be usedthe condition has length >1 and only the first element will be usedthe condition has length >1 and only the first element will be used..."

Update tutorial page 'parallelization'

Hi,

First of all, many thanks for developing and maintaining this software.

I wanted to use batchtools to run multiple find_graphs(). However, it seems that the find_graphs() function used in the example is now called find_graphs_old(). I realized this after getting to know how to work with batchtools, and now it turns out I cannot use the function the tutorial page refers to.

I will try to implement paralellization outside of R, but it would be considerate to edit/take offline that page of tutorial in the meantime.

Many thanks.

Feasibility inconsistency between qpadm_rotate with full_results and without full_results

I have run qpadm_rotate with full_results TRUE and FALSE. After inspecting the results, I get the following output:

full_results = FALSE

1 <chr [4]> <chr [17]> 3 13 16.9 0.204 TRUE

full_results = TRUE
1 0000 0 13 16.9 2.04e- 1 3 0.179 2.57 -1.73 -0.0248 FALSE NA

obviously, the four weights are 0.179 2.57 -1.73 and -0.0248.
Thus, results are not feasible, I guess.
However, when full_results = FALSE, the report is TRUE for feasible.

Counterintuitive sign for z-scores in qpgraph

Hello,

The sign of the z-scores reported by qpgraph() for the various f-stats seems to be incorrect. Generally z-scores are defined as (observed - expected) / se. It appears that in the output of running qpgraph that the z-scores have the opposite sign, as they seem to be: (fit - est) / se. This is in contrast to the output of e.g., f4 where the z-score is just est/se.

Thanks!

extarct_f2_subset does not copy block_lengths file

Hello
I am running:
extract_f2('./SAR_90K','./f2',overwrite = T)
extract_f2_subset('./f2', './f2sub', c('2','3','5','8','9'), verbose = TRUE)
and then:
f2_blocks = f2_from_precomp('./f2sub')
I am getting this error:

Error in read_f2(dir, pops, pops2, type = type, remove_na = remove_na, :
block_lengths file not found. Please run extract_f2() again, as the file format has recently changed.

I see in the f2 directory 3 rds files that are not present in the f2sub directory.
Can i just copy them? or should they be subsetted by the function?
Please help
Thank you
Hanan

Outgroup f3 statistics not working with pseudohaploidized data.

Hi!

I'm running some Ougroup f3 comparisons on a dataset of modern and ancient data. Ancient data, being of varying quality and coverage, has been pseudohaploidized from the start. To match this, I also haploidized the modern genotypes of our reference samples. The issue is that when I run f3(out,pop1,pop2) with the completely pseudohaploid dataset all runs fail (nas in all comparisons). However, when I use the dataset with only ancient pseudohaploid samples it works.

A solution to run Outgroup f3 stats we found is to tun f4(out,pop1;out,pop2), and that works with the completely pseudohaploidized dataset, but I wanted to let you know this issue exist.

Best,
Pedro.

Suggestion: Running extract_f2 for a very large dataset (600 ind. x 18,000,000 snp)

Dear Robert,

I like to extract f2 data from my large genetic data for 9 populations. So far, I am using 8 cpus and not finishing yet. Do you have any suggestions for a run to finish at the quickest time?

Thank you,
Weerachai

saves the image in low resolution

Hello
Dear Maier
This option saves the image in low resolution
Can it be stored at a higher resolution?

Distances between populations.

Good day!
I have a question regarding Admixtools 2. I am interested in the following: how to calculate the genetic distance between populations or individuals so that this distance is linear and relatively additive and proportional to the modulus of the allele frequency difference? For example, we have three populations a, b and c. Suppose the distances between these populations are Dab, Dbc and Dac respectively. By linearity, I also mean that each of these distances does not exceed the sum of the other two. For example: Dac <= (Dab + Dbc). By proportionality I mean that, for example, if the modulus of the difference in allele frequencies between a and c is twice as large as the modulus of the difference between a and b, then the distance ratio should be 2:1.
How to achieve this? If I understand correctly, the Fst is non-linear. Also non-linear is simple f2 (f2(A,B)=(∑ (aj-bj)2 )/M). Maybe something like this is needed?:
(∑|aj-bj| )/M
That is, instead of squares, use modules, as it is used in linear deviation? Are there any corresponding functions in Admixtools 2? Is it possible that I am wrong and other formulas are used for this?

Best regards

Warning: solve(): system is singular

Hello,

There are various instances while running the qpadm() function on Admixtools2 wherein the R console shows me warnings like these, but continues processing and delivers the output as required.

warning: solve(): system is singular (rcond: 2.94665e-17); attempting approx solution

warning: solve(): system is singular (rcond: 3.35258e-17); attempting approx solution

warning: solve(): system is singular (rcond: 3.31962e-17); attempting approx solution

Many times, I do not get such warnings.

What is the reason for this, and how does this impact the validity of the output?

Thank you.

no DLL was created

Trying to compile the latest version results in the following error message.

C:/rtools40/mingw32/bin/../lib/gcc/i686-w64-mingw32/8.3.0/../../../../i686-w64-mingw32/bin/ld.exe: RcppExports.o:RcppExports.cpp:(.text+0xcbf): undefined reference to `timesTwo(Rcpp::Vector<14, Rcpp::PreserveStorage>)'
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'admixtools'

block_size_issues

Hi,
I am exploring this useful package admixtools2, I have a problem/doubt when I run extract_f2 all work well, but when I read my genotype files with f2_from_geno the number of blocks is just one. I have used blgsize in the two ways: 0.05 and 4000000. But the number of blocks is equal to 1. So, I can work with one block, I think that something that I am doing is wrong. Please, I will appreciate any advice.

genotype_data = "/media/hd1/danielrivadeneira/UCES_PhD/6-riv-snps/30_admix2/admix2.all.fs.bi.het.md100.mss75.ld50_10_05.rename.sort"
f2_dir = "/media/hd1/danielrivadeneira/UCES_PhD/6-riv-snps/30_admix2/fstats/"
extract_f2(genotype_data, f2_dir,minac2 = 2, adjust_pseudohaploid = T)
Reading allele frequencies from EIGENSTRAT files...
admix2.all.fs.bi.het.md100.mss75.ld50_10_05.rename.sort.geno has 41 samples and 23574 SNPs
Calculating allele frequencies from 41 samples in 7 populations
! 22916 SNPs remain after filtering. 22916 are polymorphic.
Allele frequency matrix for 22916 SNPs and 7 populations is 4 MB
Computing pairwise f2 for all SNPs and population pairs requires 51 MB RAM without splitting
omputing without splitting since 51 < 8000 (maxmem)...
Data written to /media/hd1/danielrivadeneira/UCES_PhD/6-riv-snps/30_admix2/fstats/
f2_blocks = f2_from_precomp(f2_dir)
Reading precomputed data for 7 populations...
Reading f2 data for pair 28 out of 28...
dim(f2_blocks)
[1] 7 7 1

install

Optimization unsuccessful (constraints inconsistent). Try increasing 'diag', when running qpgraph with a graph and upper and lower bounds

I am getting this error when running qpgraph() and using a graph file with lower and upper bounds.

> qpgraph("./data_pops_qpGraph",graph=graph_nobounds)
Error in multistart(parmat, optimweightsfun, args = arglist, method = "L-BFGS-B",  : 
  Optimization unsuccessful (constraints inconsistent). Try increasing 'diag'.
In addition: Warning message:
In read_f2(dir, pops, pops2, type = type, remove_na = remove_na,  :
  Discarding 1 block(s) due to missing values!
Discarded block(s): 25

This is part of my graph file

from	to	lower	upper
R	N1	178	178
R	N2	178	178
N2	N3	72	72
N2	N4	130	130
N3	Altai_Neanderthal_snpAD.DG	139	139
N4	Denisova3_snpAD.DG	103	103
N21	N4N21	NA	NA
N4	N4N21	0.04	0.04
N1	Mbuti.DG	0	0
N1	N5	78	78
N5	N6	2	2
N5	N3N5	0.92	0.92
N3	N3N5	NA	NA

Thanks a lot for your help!

uqrmaie1 / admixtools Goto Github PK

admixtools's People

Stargazers

Watchers

Forkers

admixtools's Issues

Suggested solution

Disclaimer

Further notes

My setup

System info

Recommend Projects

Recommend Topics

Recommend Org