The rcpi from nanxstats

Issue with the convMolFormat function

Hello,
I have used the convMolFormat function many times with great success. Thank you again for this useful package.
Right now, I am using it with a batch of inputs (10,000 mol files) coming from a commercial in-silico prediction tool from Bruker. I want to generate a SMILES table so I can match the SMILES against other data for further comparison.
However, at some point (after 520 iterations), I get the message "Too many open files". So I tried the common advice given on some forums, which is closeAllConnections(), but that does not seem to be where the problem comes from: I checked with showConnections(all = TRUE), and only connections 0, 1, and 2, the standard ones, are open.

I would really appreciate any idea on how to debug this.

Below is dummy code to reproduce the problem, if necessary.

Thank you very much

Boris

library("Rcpi")

## get file paths (fdir[i] is the current directory of an outer loop, not shown)
fns <- list.files(fdir[i], pattern = "\\.mol$", full.names = TRUE)

for (j in seq_along(fns)) { # mol loop
  # convert the mol file (or other drawing file) to SMILES
  convMolFormat(infile = fns[j], outfile = 'temp.smi',
                from = 'mol', to = 'smiles')
  # read the SMILES text back
  t.smile <- readMolFromSmi(smifile = 'temp.smi', type = "text")
  ## then I put t.smile in a data frame to save it later
}
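showConnections() only lists connections that R itself opened; file handles opened inside compiled code (here the Open Babel library that convMolFormat reaches through ChemmineOB) never show up there, so closeAllConnections() cannot release them. A minimal diagnostic sketch, assuming a Linux system (it reads /proc/self/fd, which other platforms lack): count the process's open file descriptors as the loop runs. If the count grows by roughly one per conversion, the handle is leaked below R. The helper name count_open_fds is hypothetical.

```r
# Hypothetical helper: number of file descriptors currently open by this
# R process. Linux-only: it lists /proc/self/fd; returns NA elsewhere.
count_open_fds <- function() {
  fd_dir <- "/proc/self/fd"
  if (!dir.exists(fd_dir)) return(NA_integer_)
  length(list.files(fd_dir))
}

# Usage inside the mol loop (fns and the conversion call as in the report):
# for (j in seq_along(fns)) {
#   convMolFormat(infile = fns[j], outfile = "temp.smi",
#                 from = "mol", to = "smiles")
#   if (j %% 50 == 0) message(j, ": ", count_open_fds(), " open fds")
# }
```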

getFASTAFromUniProt and getSeqFromUniProt errors

id = c('P00750', 'P00751', 'P00752')

getFASTAFromUniProt(id) 
#gives "" "" ""

The example from the R documentation of getFASTAFromUniProt does not run properly: the second line returns empty character strings, which cause errors in other functions that depend on getFASTAFromUniProt, such as getSeqFromUniProt.

Also,

id = c('P00750', 'P00751', 'P00752')

getSeqFromUniProt(id)
#ERROR

leads to an error:

(Error in FUN(X[[i]], ...) : no line starting with a > character found)

which evidently results from the problem in getFASTAFromUniProt.

It seems that the call to getURLAsynchronous (from RCurl) in the implementation of getFASTAFromUniProt is somehow causing the problem. It would be better to replace it with an equivalent base-R function.
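Along the lines suggested above, a base-R sketch that downloads FASTA records one accession at a time with readLines(), which accepts a URL directly. It assumes the current UniProt REST endpoint pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which is not necessarily the URL Rcpi uses internally; both function names are hypothetical.

```r
# Hypothetical helper: build UniProt REST FASTA URLs for a vector of accessions.
# Assumption: the https://rest.uniprot.org/uniprotkb/<id>.fasta endpoint.
uniprot_fasta_url <- function(id) {
  sprintf("https://rest.uniprot.org/uniprotkb/%s.fasta", id)
}

# Hypothetical base-R replacement for the RCurl-based download: fetch each
# record and return one FASTA string per accession ("" if the download fails).
get_fasta_base <- function(id) {
  vapply(uniprot_fasta_url(id), function(u) {
    txt <- tryCatch(readLines(u, warn = FALSE), error = function(e) character(0))
    if (length(txt) == 0L) "" else paste0(paste(txt, collapse = "\n"), "\n")
  }, character(1), USE.NAMES = FALSE)
}

# Usage (needs network access):
# fasta <- get_fasta_base(c("P00750", "P00751", "P00752"))
# all(startsWith(fasta, ">"))  # each record should start with a '>' header
```

Fetching sequentially is slower than an asynchronous download, but it keeps the failure mode visible: an empty string now means that a specific request failed.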

extractDrugVABC(molecules, silent = TRUE) getting errors

I want to calculate the volume of molecules, but it's not working: all the values returned by your package are NA. Is there any way to fix it?

#biomedR/Rcpi
mols <- parse.smiles(pep[1:30,SMILES])
dat = extrDrugVABC(mols)
#head(dat)
head(dat)
VABC
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(C)C(=O)NC@@(CC(=O)N)C(=O)NC@@(CCC(=O)N)C(=O)NC@@(CCSC)C(=O)NC@@(CCCNC(=N)N)C(=O)O NA
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(CC(=O)O)C(=O)NC@@(CC(=O)O)C(=O)NC@@(C@(O)C)C(=O)NC@@(CC(=CN2)C1=C2C=CC=C1)C(=O)NC@@(CCC(=O)O)C(=O)N1C@@(CCC1)C(=O)NC@@(Cc1ccccc1)C(=O)NC@@(C)C(=O)NC@@(CO)C(=O)NCC(=O)NC@@(CCCCN)C(=O)O NA
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(Cc1ccccc1)C(=O)NCC(=O)NC@@(CCC(=O)N)C(=O)NCC(=O)NC@@(CO)C(=O)NCC(=O)N1C@@(CCC1)C(=O)NC@@(C@(CC)C)C(=O)NC@@(CCSC)C(=O)NC@@(CC(C)C)C(=O)NC@@(CC(=O)O)C(=O)NC@@(CCC(=O)O)C(=O)NC@@(C(C)C)C(=O)NC@@(CCC(=O)N)C(=O)NC@@(CS)C(=O)NC@@(C@(O)C)C(=O)NCC(=O)NC@@(C@(O)C)C(=O)NC@@(CCC(=O)O)C(=O)NC@@(C)C(=O)NC@@(CO)C(=O)NC@@(CC(C)C)C(=O)NC@@(C)C(=O)NC@@(CC(=O)O)C(=O)NC@@(CS)C(=O)NC@@(CCCCN)C(=O)O NA
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(Cc1ccccc1)C(=O)NC@@(C@(O)C)C(=O)NC@@(CCC(=O)O)C(=O)NC@@(CS)C(=O)NC@@(CS)C(=O)NC@@(CCC(=O)N)C(=O)NC@@(C)C(=O)NC@@(C)C(=O)NC@@(CC(=O)O)C(=O)NC@@(CCCCN)C(=O)O NA
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(Cc1ccccc1)C(=O)NC@@(C@(O)C)C(=O)NC@@(CCC(=O)O)C(=O)NC@@(CS)C(=O)NC@@(CS)C(=O)NC@@(CCC(=O)N)C(=O)NC@@(C)C(=O)NC@@(C)C(=O)NC@@(CC(=O)O)C(=O)NC@@(CCCCN)C(=O)NC@@(C)C(=O)NC@@(C)C(=O)NC@@(CS)C(=O)NC@@(CC(C)C)C(=O)NC@@(CC(C)C)C(=O)N1C@@(CCC1)C(=O)NC@@(CCCCN)C(=O)O NA
NC@@(C)C(=O)NC@@(C)C(=O)NC@@(C@(CC)C)C(=O)NC@@(CCC(=O)N)C(=O)NC@@(C)C(=O)NC@@(CC(C)C)C(=O)NC@@(CCCNC(=N)N)C(=O)O NA

Returns empty string when using getSmiFromPubChem

Hi,

The function getSmiFromPubChem returns an empty string ("") when the id is a single string, but when the id is a list of strings, it works.
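Until the single-ID case is fixed, one workaround is to query PubChem's PUG REST service directly. A minimal base-R sketch, assuming the IDs are PubChem CIDs and using the `compound/cid/<cid>/property/<property>/TXT` URL pattern; the default property name `IsomericSMILES` is an assumption to verify against the PUG REST documentation, and both function names are hypothetical.

```r
# Hypothetical helper: PUG REST URL for one property of one or more CIDs.
# Assumptions: IDs are PubChem CIDs and `property` is a valid PUG REST
# property name (verify against the PubChem documentation).
pubchem_smiles_url <- function(cid, property = "IsomericSMILES") {
  sprintf("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/%s/property/%s/TXT",
          cid, property)
}

# Fetch one SMILES string per CID. The result always has the same length as
# `cid`, so a single ID behaves exactly like several (NA on failure).
get_smiles_pubchem <- function(cid, property = "IsomericSMILES") {
  vapply(pubchem_smiles_url(cid, property), function(u) {
    txt <- tryCatch(readLines(u, warn = FALSE), error = function(e) character(0))
    if (length(txt) == 0L) NA_character_ else txt[[1]]
  }, character(1), USE.NAMES = FALSE)
}

# Usage (needs network access; 2244 is the CID of aspirin):
# get_smiles_pubchem("2244")
```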

error with tutorial

Hi, I am learning the Rcpi package. I was able to install and run the package successfully, and I am attempting to replicate section 3.4 (Structure-Based Chemical Similarity Searching). I have loaded the mol and moldb files, DB00530.sdf and tyrphostin.sdf respectively, but when I run the drug similarity search using the code from the tutorial:

rank1 = searchDrug(
mol, moldb, cores = 4, method = "fp",
fptype = "maccs", fpsim = "tanimoto")

I encounter the error:

Error in order(..., decreasing = decreasing) :
unimplemented type 'list' in 'orderVector1'

I did some searching, but I don't understand why this error occurs. Any help would be appreciated, thanks.
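The message itself comes from base R: order() cannot sort a list, so the similarity scores reached order() as a list instead of a numeric vector. Why searchDrug() ends up with a list is not clear from the report; a version mismatch between Rcpi and the packages it uses for fingerprints and similarity is one possibility (an assumption). The sketch below reproduces the error with made-up scores and shows that flattening with unlist() makes the ranking work.

```r
# Made-up similarity scores stored as a list (illustrative data only).
sims <- list(drugA = 0.42, drugB = 0.91, drugC = 0.13)

# Sorting the list directly fails with the same message as in the report:
# "unimplemented type 'list' in 'orderVector1'".
res <- try(order(sims, decreasing = TRUE), silent = TRUE)
print(inherits(res, "try-error"))

# Flattening the list into a named numeric vector first makes order() work.
scores <- unlist(sims)
ranked <- scores[order(scores, decreasing = TRUE)]
print(names(ranked))
```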

extractDrugOBFP4 (or similar) consumes too much RAM

Hi there,
I'm trying to calculate fingerprints for ~50,000 molecules. However, I notice that RAM usage only increases, to the point of exhausting it completely. I don't understand how this is possible, given that the matrix extractDrugOBFP4 uses to store the fingerprints is preallocated with the correct dimensions. On inspection, the size of the matrix stays constant (~1.6 GB) throughout the loop; however, the RAM used by the R session keeps increasing as the loop continues. Furthermore, the process is sequential, molecule by molecule, which should not increase RAM usage.
Below is the code of the function, with the section that increases RAM usage marked. I know this is the problematic section because when I replace it with any vector of size 512 (rather than the fingerprint returned by ChemmineOB), the process does not consume more RAM.
I'm not an R expert; any help will be useful.
Thanks, and sorry about my English.

function (molecules, type = c("smile", "sdf")) 
{
  check_ob()
  if (type == "smile") {
    if (length(molecules) == 1L) {
      molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules, identity)"))
      fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
    }
    else if (length(molecules) > 1L) {
      fp = matrix(0L, nrow = length(molecules), ncol = 512L)
      for (i in 1:length(molecules)) {
        molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules[i], identity)"))
###########################################################
####### This is the step which increases RAM usage in each loop step
        fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
###########################################################
      }
    }
  }
  else if (type == "sdf") {
    smi = eval(parse(text = "ChemmineOB::convertFormat(from = 'SDF', to = 'SMILES', source = molecules)"))
    smiclean = strsplit(smi, "\\t.*?\\n")[[1]]
    if (length(smiclean) == 1L) {
      molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean, identity)"))
      fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
    }
    else if (length(smiclean) > 1L) {
      fp = matrix(0L, nrow = length(smiclean), ncol = 512L)
      for (i in 1:length(smiclean)) {
        molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean[i], identity)"))
        fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
      }
    }
  }
  else {
    stop("Molecule type must be \"smile\" or \"sdf\"")
  }
  return(fp)
}
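One plausible explanation (an assumption, not confirmed in the report): each `molRefs` object wraps memory that Open Babel allocates in C++, which R's garbage collector cannot see. R triggers collection based on its own heap usage, so many unreachable wrappers can pile up before their finalizers free the C++ side. The sketch below processes molecules in chunks and calls gc() after each chunk to give those finalizers a chance to run; fp_in_chunks is a hypothetical helper, and if the leak is inside Open Babel itself, this will not help.

```r
# Hypothetical helper: fill a fingerprint matrix chunk by chunk, forcing a
# garbage collection after each chunk so that finalizers of unreachable
# external pointers (e.g. Open Babel molecule wrappers) can run.
# `f` must return a matrix with `ncol` columns and one row per input element
# (a plain vector of length `ncol` is also accepted for a single element).
fp_in_chunks <- function(x, f, ncol, chunk_size = 1000L) {
  out <- matrix(0L, nrow = length(x), ncol = ncol)
  if (length(x) == 0L) return(out)
  for (start in seq(1L, length(x), by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, length(x))
    out[idx, ] <- f(x[idx])
    invisible(gc())
  }
  out
}

# Usage sketch with Rcpi (assumes `smiles` is a character vector of SMILES):
# fp <- fp_in_chunks(smiles,
#                    function(s) Rcpi::extractDrugOBFP4(s, type = "smile"),
#                    ncol = 512L)
```

Calling gc() once per chunk rather than once per molecule keeps the collection overhead small while still bounding how many unreleased wrappers can accumulate.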

Error when training three classification models

Hi,
I'm trying to run the whole example script.
Unfortunately, I got stuck at the step where we train three classification models.
After I run the command:

svm.fit1 <- train(
x1.tr, y.tr,
method = "svmRadial", trControl = ctrl,
metric = "ROC", preProc = c("center", "scale")
)

I get this error message:
Error: Please use column names for x

I'm quite new to programming and I don't know how to resolve this problem.
Can you help me? I would be grateful.

Best regards,
Arek
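caret's train() stops with exactly this message when the predictor matrix has no column names. A minimal sketch, assuming x1.tr is a numeric matrix whose names were lost earlier in the script: give it syntactically valid, unique column names before calling train(). The helper name ensure_colnames is hypothetical.

```r
# Hypothetical helper: make sure a predictor matrix has valid, unique
# column names, as caret::train() requires.
ensure_colnames <- function(x) {
  if (is.null(colnames(x))) {
    colnames(x) <- paste0("V", seq_len(ncol(x)))
  } else {
    colnames(x) <- make.names(colnames(x), unique = TRUE)
  }
  x
}

# Usage before training (x1.tr, y.tr and ctrl as in the example script):
# x1.tr <- ensure_colnames(x1.tr)
# svm.fit1 <- train(x1.tr, y.tr, method = "svmRadial", trControl = ctrl,
#                   metric = "ROC", preProc = c("center", "scale"))
```

If the test set is transformed the same way, its columns keep matching the training columns at prediction time.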

Some SMILES crash the entire R session

Some SMILES break extractDrugLongestAliphaticChain:

library(rcdk)
library(Rcpi)
library(magrittr)
"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
     parse.smiles() %>% .[[1]] %>%
    extractDrugLongestAliphaticChain()

#> Error: segfault from C stack overflow

Then, if you don't run extractDrugLongestAliphaticChain but instead run other, arbitrary Rcpi functions, the entire R session crashes:

"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
     parse.smiles() %>% .[[1]] %>%
     extractDrugXLogP()

 *** caught segfault ***
address 0x311000006, cause 'memory not mapped'

Traceback:
 1: .jcheck()
 2: .jcall(dval, "Lorg/openscience/cdk/qsar/result/IDescriptorResult;",     "getValue")
 3: FUN(X[[i]], ...)
 4: lapply(descvals, .get.desc.values, nexpected = length(dnames))
 5: eval.desc(molecules, "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor",     verbose = !silent)
 6: extractDrugXLogP(.)
 7: "[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%     parse.smiles() %>% .[[1]] %>% extractDrugXLogP()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The first issue may be a CDK (Java) issue, but can we do something in the second case to prevent R from crashing?
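A segfault inside the JVM cannot be caught with tryCatch(), because it takes down the whole R process. One way to protect the interactive session is to run risky descriptor calls in a separate R process and only read back the result. A base-R sketch (the callr package's callr::r() offers a more complete version of the same idea); run_isolated is a hypothetical helper that returns NULL when the child process errors or crashes.

```r
# Hypothetical helper: evaluate R code (given as a string) in a fresh Rscript
# process. A crash in the child gives a non-zero exit status here instead of
# killing the current session; the result comes back via a temporary .rds file.
run_isolated <- function(code) {
  script <- tempfile(fileext = ".R")
  result <- tempfile(fileext = ".rds")
  on.exit(unlink(c(script, result)), add = TRUE)
  writeLines(c("res <- {", code, "}",
               sprintf("saveRDS(res, %s)", deparse(result))), script)
  status <- system2(file.path(R.home("bin"), "Rscript"), shQuote(script),
                    stdout = FALSE, stderr = FALSE)
  if (!identical(as.integer(status), 0L) || !file.exists(result)) return(NULL)
  readRDS(result)
}

# Usage sketch (`smi` holds the problematic SMILES string from the report):
# xlogp <- run_isolated(sprintf(
#   "library(rcdk); library(Rcpi); extractDrugXLogP(parse.smiles(%s)[[1]])",
#   deparse(smi)))
# if (is.null(xlogp)) message("descriptor calculation failed; skipping molecule")
```

Starting a new process (and JVM) per call is slow, so this is better suited to screening suspicious molecules than to running every descriptor call.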

failed to set 'from' and 'to' formats: inchi inchikey

Hi Nan,
I came across a problem when using the convMolFormat function. It works well with the example data, but errors occur when I try to convert from InChI to InChIKey. The file is attached: test.zip

The code I used:

convMolFormat(infile = './test.inchi', outfile = 'test.inchikey', from = 'inchi', to = 'inchikey')

The error:

Error in ChemmineOB::convertFormatFile(from = from, to = to, fromFile = infile,  : 
  failed to set 'from' and 'to' formats: inchi inchikey

I would greatly appreciate your kind help.

Best,

Zhiwei

Rcpi extractDrugAIO takes a long time

Dear Nan:
I am trying to use Rcpi to calculate descriptors for my 160 small molecules.
However, R seems to hang. I checked your manual, and I did use 3D structures.
Your test dataset OptAA3d.sdf seems to work without problems.
Should I clean up the structures further? I prepared my SD file in ChemFinder and converted it to 3D using ChemAxon.
Please kindly advise,
Xiannghui
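To find out whether a few structures are responsible, one option is to time the descriptor calculation molecule by molecule on a small subset. A base-R sketch; time_each is a hypothetical helper, and the usage lines assume Rcpi's readMolFromSDF() and that extractDrugAIO() accepts a single molecule, both of which should be checked against the Rcpi documentation.

```r
# Hypothetical helper: elapsed seconds for f() on each element of x.
time_each <- function(x, f) {
  vapply(seq_along(x), function(i) {
    system.time(f(x[[i]]))[["elapsed"]]
  }, numeric(1))
}

# Usage sketch (the file name is a placeholder for the user's SD file):
# mols  <- readMolFromSDF("my_molecules.sdf")
# times <- time_each(mols[1:10], extractDrugAIO)
# order(times, decreasing = TRUE)  # indices of the slowest molecules first
```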

Error on Rcpi package

Hi

I got this error when I tried to rerun your tutorial on Rcpi:

drugseq <- getSmiFromKEGG(drugid, parallel = 5)
java.lang.NullPointerException
at org.guha.rcdk.util.Misc.loadMolecules(Misc.java:169)
Error in load.molecules(tmpfile) :
org.openscience.cdk.exception.CDKException: java.lang.NullPointerException

Error in documentation

In the documentation, under
3.1 Regression Modeling in QSRR Study of Retention Indices

There appears to be an error in the following code:

library("Rcpi")

RI.smi = system.file(
"vignettedata/FDAMDD.smi", package = "Rcpi")
RI.csv = system.file(
"vignettedata/RI.csv", package = "Rcpi")

Shouldn't the first file loaded be RI.smi instead of FDAMDD.smi? Otherwise, the train step does not work.

Best wishes.

getSmiFromPubChem error

Hi, thank you for the package. However, I encountered the following issue:

library("Rcpi")
id = c('7847562', '7847563') # Penicillamine
getSmiFromPubChem(id)

and I got this error:
Error in FUN(X[[i]], ...) : argument 'x' must be a raw vector

Could you clarify?

Error in library(Rcpi)

I installed Rcpi with the following:
source("http://bioconductor.org/biocLite.R")
biocLite("Rcpi")
but library(Rcpi) then fails with the error: there is no package called ‘Rcpi’

getFASTAFromUniProt(id)

The example returns empty character strings:

id = c('P00750', 'P00751', 'P00752')

getFASTAFromUniProt(id)
[1] "" "" ""

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rcpi_1.17.2 bindrcpp_0.2.2 knitr_1.20 gridExtra_2.3 umap_0.2.0.0 Rtsne_0.15
[7] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8 purrr_0.2.5 readr_1.1.1 tidyr_0.8.2
[13] tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] nlme_3.1-137 bitops_1.0-6 lubridate_1.7.4 bit64_0.9-7
[5] doParallel_1.0.14 httr_1.3.1 rprojroot_1.3-2 tools_3.5.1
[9] backports_1.1.2 R6_2.3.0 DT_0.5 DBI_1.0.0
[13] lazyeval_0.2.1 BiocGenerics_0.28.0 colorspace_1.3-2 withr_2.1.2
[17] tidyselect_0.2.5 bit_1.1-14 compiler_3.5.1 cli_1.0.1
[21] rvest_0.3.2 Biobase_2.42.0 xml2_1.2.0 labeling_0.3
[25] scales_1.0.0 digest_0.6.18 rmarkdown_1.10 XVector_0.22.0
[29] base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6 itertools_0.1-3
[33] highr_0.7 htmlwidgets_1.3 rlang_0.3.0.1 readxl_1.1.0
[37] rstudioapi_0.8 RSQLite_2.1.1 bindr_0.1.1 jsonlite_1.5
[41] GOSemSim_2.8.0 RCurl_1.95-4.11 magrittr_1.5 GO.db_3.7.0
[45] Matrix_1.2-14 Rcpp_1.0.0 munsell_0.5.0 S4Vectors_0.20.1
[49] reticulate_1.10 stringi_1.2.4 yaml_2.2.0 zlibbioc_1.28.0
[53] plyr_1.8.4 blob_1.1.1 parallel_3.5.1 crayon_1.3.4
[57] rcdklibs_2.0 lattice_0.20-35 Biostrings_2.50.1 haven_1.1.2
[61] hms_0.4.2 pillar_1.3.0 rjson_0.2.20 codetools_0.2-15
[65] reshape2_1.4.3 stats4_3.5.1 ChemmineR_3.34.1 rcdk_3.4.7.1
[69] glue_1.3.0 evaluate_0.12 modelr_0.1.2 foreach_1.4.4
[73] png_0.1-7 cellranger_1.1.0 gtable_0.2.0 assertthat_0.2.0
[77] broom_0.5.0 rsvg_1.3 rJava_0.9-10 fingerprint_3.5.7
[81] iterators_1.0.10 AnnotationDbi_1.44.0 memoise_1.1.0 IRanges_2.16.0
[85] fmcsR_1.24.0

extractDrug...(mol) does _not_ work with one-liner .smi files containing one SMILES

RI2.smi contains only one SMILES string (the first line of vignettedata/RI.smi):
line 1: CCCCCCCCCCCCCCCCCCCCCCC

We got:

smi = system.file('vignettedata/RI2.smi', package = 'Rcpi')
mol = readMolFromSmi(smi, type = 'mol')
fp = extractDrugKR(mol)
Error in get.fingerprint(molecules, type = "kr", verbose = !silent) :
Must supply an IAtomContainer or something coercable to it

Env:
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Platform: x86_64-pc-linux-gnu (64-bit)
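One way to sidestep the single-molecule case is to read the SMILES strings directly and let rcdk's parse.smiles() build the molecules; it returns a list even when the file has only one line. A sketch, assuming the usual .smi layout of one SMILES per line, optionally followed by whitespace and a name; read_smi_column is a hypothetical helper, and whether extractDrugKR() then accepts the list is an assumption to verify.

```r
# Hypothetical helper: return the SMILES column of a .smi file, one string
# per non-empty line (anything after the first whitespace is treated as a name).
read_smi_column <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  sub("[[:space:]].*$", "", lines)
}

# Usage sketch:
# smi  <- system.file("vignettedata/RI2.smi", package = "Rcpi")
# mols <- rcdk::parse.smiles(read_smi_column(smi))  # a list, even for one SMILES
# fp   <- extractDrugKR(mols)
```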
