Code Monkey home page Code Monkey logo

monetdblite-r's Introduction

This repository has been archived. Please see MonetDB/e (code examples here) for the replacement project.

MonetDBLite for R

Build Status Build Status CRAN_Status_Badge

MonetDBLite for R is a SQL database that runs inside the R environment for statistical computing and does not require the installation of any external software. MonetDBLite is based on free and open-source MonetDB, a product of the Centrum Wiskunde & Informatica.

MonetDBLite is similar in functionality to RSQLite, but typically completes queries blazingly fast due to its columnar storage architecture and bulk query processing model. Since both of these embedded SQL options rely on the the R DBI interface, the conversion of legacy RSQLite project syntax over to MonetDBLite code should be a cinch.

MonetDBLite works seamlessly with the dplyr grammar of data manipulation. For a detailed tutorial of how to work with database-backed dplyr commands, see the dplyr databases vignette. To reproduce this vignette using MonetDBLite rather than RSQLite, simply replace the functions ending with *_sqlite with the suffix *_monetdblite instead.

Installation

  • the latest released version from CRAN with

    install.packages("MonetDBLite")
  • the latest development version from github using devtools

    devtools::install_github("hannesmuehleisen/MonetDBLite-R")
    

If you encounter a bug, please file a minimal reproducible example on github. For questions and other discussion, please use stack overflow with the tag monetdblite. The development version of MonetDBLite endures sisyphean perpetual testing on both unix and windows machines.

Speed Comparisons

MonetDBLite outperforms all other SQL databases currently accessible by the R language and ranks competitively among other High Performace Computing options available to R users. For more detail, see Szilard Pafka's benchmarks.

Painless Startup

If you want to store a database permanently (or to reconnect to a previously-initiated one), set the dbdir to some folder path on your local machine. A new database that you would like to store permanently should be directed to an empty folder:

library(DBI)
dbdir <- "C:/path/to/database_directory"
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)

To create a temporary server, create a DBI connection as follows:

library(DBI)
con <- dbConnect(MonetDBLite::MonetDBLite())

Note that the above temporary server command is equivalent to initiating the server in the tempdir() of your R session:

library(DBI)
dbdir <- tempdir()
con <- dbConnect(MonetDBLite::MonetDBLite(), dbdir)

Note that MonetDB may hiccup when using network drives, use servers stored on the same machine as the R session.

Versatile Data Importation

To efficiently copy a data.frame object into a table within the MonetDBLite database, use dbWriteTable:

# directly copy a data.frame object to a table within the database
dbWriteTable(con, "mtcars", mtcars)

To load a CSV file into a table within the database, provide the local file path of a .csv file to dbWriteTable:

# construct an example CSV file on the local disk
csvfile <- tempfile()
write.csv(mtcars, csvfile, row.names = FALSE)

# directly copy a csv file to a table within the database
dbWriteTable(con, "mtcars2", csvfile)

# append the same table to the bottom of the previous table
dbWriteTable(con, "mtcars2", csvfile, append=TRUE)

# overwrite the table with a new table
dbWriteTable(con, "mtcars2", csvfile, overwrite=TRUE)

The SQL interface of MonetDBLite can also be used to manually create a table and import data:

# construct an example CSV file on the local disk
csvfile <- tempfile()
write.csv(mtcars, csvfile, row.names = FALSE)

# start a SQL transaction
dbBegin(con)

# construct an empty table within the database, using a manually-defined structure
dbSendQuery(con, "CREATE TABLE mtcars3 (mpg DOUBLE PRECISION, cyl INTEGER, disp DOUBLE PRECISION, hp INTEGER, drat DOUBLE PRECISION, wt DOUBLE PRECISION, qsec DOUBLE PRECISION, vs INTEGER, am INTEGER, gear INTEGER, carb INTEGER)")

# copy the contents of a CSV file into the database, using the MonetDB COPY INTO command
dbSendQuery(con, paste0("COPY OFFSET 2 INTO mtcars3 FROM '", csvfile, "' USING DELIMITERS ',','\n','\"' NULL as ''"))

# finalize the SQL transaction
dbCommit(con)

Note how we wrap the two commands in a transaction using dbBegin and dbCommit. This creates all-or-nothing semantics. See the MonetDB documentation for details on how to create a table and how to perform bulk input.

Reading and Writing (Queries and Updates)

This section reviews how to pass SQL queries to an embedded server session and then pull those results into R. If you are interested in learning SQL syntax, perhaps review the w3schools SQL tutorial or the MonetDB SQL Reference Manual.

The dbGetQuery function sends a SELECT statement to the server, then returns the result as a data.frame:

# calculate the average miles per gallon, grouped by number of cylinders
dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl" )

# calculate the number of records in the _mtcars_ table
dbGetQuery(con, "SELECT COUNT(*) FROM mtcars" )

The dbSendQuery function can open a connection to some read-only query. Once initiated, the res object below can then be accessed repeatedly with a fetch command:

res <- dbSendQuery(con, "SELECT wt, gear FROM mtcars")
first_sixteen_records <- fetch(res, n=16)
dbHasCompleted(res)
second_sixteen_records <- fetch(res, n=16)
dbHasCompleted(res)
dbClearResult(res)

The dbSendQuery function should also be used to make edits to tables within the database:

# add a new column of kilometers per liter
dbSendQuery(con, "ALTER TABLE mtcars ADD COLUMN kpl DOUBLE PRECISION" )

# populate that new column with kilometers per liter
dbSendQuery(con, "UPDATE mtcars SET kpl = mpg * 0.425144" )

Glamorous Data Export

The contents of an entire table within the database can be transferred to an R data.frame object with dbReadTable. Since MonetDBLite is most useful for the storage and analysis of large datasets, there might be limited utility to copying an entire table into working RAM in R. The dbReadTable function and a SQL SELECT * FROM tablename command are equivalent:

# directly copy a table within the database to an R data.frame object
x <- dbReadTable(con, "mtcars")

# directly copy a table within the database to an R data.frame object
y <- dbGetQuery(con, "SELECT * FROM mtcars" )

Special database informational functions

Certain administrative commands can be sent using either dbSendQuery or with a custom DBI function:

# remove the table `mtcars2` from the database
dbSendQuery(con, "DROP TABLE mtcars2" )

# remove the table `mtcars3` from the database
dbRemoveTable(con, "mtcars3" )

Other administrative commands can be sent using dbGetQuery or with a custom DBI function:

# list the column names of the mtcars table within the database
names(dbGetQuery(con, "SELECT * FROM mtcars LIMIT 1" ))

# list the column names of the mtcars table within the database
dbListFields(con, "mtcars" )

Still other administrative commands are much easier to simply use the custom DBI function:

# print the names of all tables within the current database
dbListTables(con)

Shutdown

MonetDBLite allows multiple concurrent connections to a single database, but does not allow more than one concurrent embedded server session (actively-running database). This is not an issue for most users since a single database can store thousands of individual tables. To switch between databases, however, the first server must be shut down before the second can be opened. To shutdown a server, include the shutdown=TRUE parameter:

dbDisconnect(con, shutdown=TRUE)

To globally shut down the embedded server session without the con connection object, use:

MonetDBLite::monetdblite_shutdown()

MonetDBLite does not allow multiple R sessions to connect to a single database concurrently. As soon as a single R session loads an embedded server, that server is locked down and inaccessible to other R consoles.

monetdblite-r's People

Contributors

ajdamico avatar gergness avatar hannes avatar kutsurak avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

monetdblite-r's Issues

Fix valgrind

==39140== Conditional jump or move depends on uninitialised value(s)
==39140==    at 0x504A73: checkValues (svn/R-devel/src/main/logic.c:424)
==39140==    by 0x504A73: do_logic3 (svn/R-devel/src/main/logic.c:480)
==39140==    by 0x4CB5F7: bcEval (svn/R-devel/src/main/eval.c:6736)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x515B87: dispatchMethod.isra.3 (svn/R-devel/src/main/objects.c:419)
==39140==    by 0x515E7D: Rf_usemethod (svn/R-devel/src/main/objects.c:455)
==39140==    by 0x5162E0: do_usemethod (svn/R-devel/src/main/objects.c:535)
==39140==    by 0x4CB42F: bcEval (svn/R-devel/src/main/eval.c:6756)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D9FCC: Rf_eval (svn/R-devel/src/main/eval.c:747)
==39140==    by 0x4DDE4F: do_set (svn/R-devel/src/main/eval.c:2745)
==39140==  Uninitialised value was created by a client request
==39140==    at 0x5113D7: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2797)
==39140==    by 0x16950942: monetdb_r_dressup (packages/tests-vg/MonetDBLite/src/embeddedr/converters.c.h:147)
==39140==    by 0x169528CA: bat_to_sexp (packages/tests-vg/MonetDBLite/src/embeddedr/converters.c.h:236)
==39140==    by 0x16953549: monetdb_query_R (packages/tests-vg/MonetDBLite/src/embeddedr/embeddedr.c:158)
==39140==    by 0x4D0282: bcEval (svn/R-devel/src/main/eval.c:7243)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D1E95: bcEval (svn/R-devel/src/main/eval.c:6704)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D9FCC: Rf_eval (svn/R-devel/src/main/eval.c:747)
==39140==    by 0x4DCCBC: do_begin (svn/R-devel/src/main/eval.c:2362)

R crashing when trying to establish connection to existing database

I recently updated to R 3.5.1 and when I try to connect to my existing database it causes R to crash.

I can create a new database without a problem:

require(DBI)
require(RMySQL)
require(MonetDBLite)

test_db <- src_monetdblite("test_db_Monet", create = TRUE)

str(test_db)
List of 1
 $ con:Formal class 'MonetDBEmbeddedConnection' [package "MonetDBLite"] with 1 slot
  .. ..@ connenv:<environment: 0x7f9d6c79ab00> 
 - attr(*, "class")= chr [1:3] "src_monetdb" "src_sql" "src"

dbDisconnect(test_db$con, shutdown = TRUE)
rm(test_db)

I can reconnect to it as well without crashing:

test_db <- src_monetdblite("test_db_Monet", create = FALSE)
str(test_db)
List of 1
 $ con:Formal class 'MonetDBEmbeddedConnection' [package "MonetDBLite"] with 1 slot
  .. ..@ connenv:<environment: 0x7f9d6b30e270> 
 - attr(*, "class")= chr [1:3] "src_monetdb" "src_sql" "src"

The only thing that seems to work is to establish a connection to another MonetDBLite database first, and then try connecting to my existing database:

require(DBI)
require(RMySQL)
require(MonetDBLite)

test_db <- src_monetdblite("test_db_Monet", create = FALSE)
dbDisconnect(test_db$con, shutdown = TRUE)
rm(test_db)

statcast_db <- src_monetdblite("/Users/williampetti/statcast_database/statcast_db_Monet", create = FALSE)

dbListTables(statcast_db$con)

[1] "boxscore_gameday_info" "player_info"           "statcast_17"          
[4] "temp"                  "umpires_games"   

I should note the crash only occurs if I run the code in RStudio or from the R GUI. If I source a file, for example, the issue does not seem to occur.

The old database was created with a prior version of MonetDBLite and R.

Any idea what may be causing this?

dplyr::union with mismatch variables

Hi,

I found a minor issue, when taking the union of two tables where variables are not present in one of the two tables using dplyr::union, an error occurs when rendering the query:

suppressPackageStartupMessages({
  require(DBI)
  require(MonetDBLite)
  require(dplyr)
})
conn <- dbConnect(MonetDBLite(), ":memory:")
dbWriteTable(conn = conn, name = "x", value = data.frame(x = 1))
dbWriteTable(conn = conn, name = "y", value = data.frame(y = 1))
union(tbl(conn, "x"), tbl(conn, "y")) %>% show_query

#> Error: Cannot pass NA to dbQuoteIdentifier()

I would expect a similar result as when using RSQLite, namely,

suppressPackageStartupMessages({
  require(DBI)
  require(RSQLite)
  require(dplyr)
})
conn2 <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn = conn2, name = "x", value = data.frame(x = 1))
dbWriteTable(conn = conn2, name = "y", value = data.frame(y = 1))
union(tbl(conn2 , "x"), tbl(conn2 , "y")) %>% show_query
#> <SQL>
#> SELECT `x`, NULL AS `y`
#> FROM (SELECT *
#> FROM `x`)
#> UNION
#> SELECT NULL AS `x`, `y`
#> FROM (SELECT *
#> FROM `y`)

The issue seems to occur because

dplyr::sql_escape_ident(conn, NA_character_)
#> Error: Cannot pass NA to dbQuoteIdentifier()

whereas

dplyr::sql_escape_ident(conn2, NA_character_)
#> [1] "NULL"

Concurrent read access by R processes

Hello,

At the end of the monetDB lite page I read:

"MonetDBLite does not allow multiple R sessions to connect to a single database concurrently. As soon as a single R session loads an embedded server, that server is locked down and inaccessible to other R consoles."

I was wondering what is the technical reason for this limitation and whether this limitation could be removed by, e.g., having the accessed tables be read-only. The reason I am interested is that monetDB with concurrent read access by R processes would be a killer "frontend db" for R powered web apps using, e.g., openCPU, which allows for api calls to an R package which are served by R processes (it would be useful for shiny too, of course). Any thoughts?

Riccardo.

[request] support for integer64 from bit64 package

Would be great to have support for mapping of R's bit64::integer64 to monetdb BIGINT. (I believe this is quite simple since bit64::integer64 is just long long and I'm pretty sure same as BIGINT)

I can't find a solution how to keep integer64. Is there any workaround?

Weird arithmetic (types?)

The following code

library(dplyr)
dbdir <- file.path(tempdir(), "dplyrdir")
my_db <- MonetDBLite::src_monetdblite(dbdir)
my_iris  <- copy_to(my_db, iris)
my_iris <- mutate(my_iris, set=ifelse(Species=="setosa",1,0))
my_iris %>% mutate(x= (set-.3)/987654321) %>% summarise(min(x),max(x)) %>% collect()

gives zero for the maximum and minimum (the entire column x is equal to zero). Since both set and x show up as double when my_iris is printed, x should be non-zero floating point numbers.

I suspect this is related to something I don't have a minimal example of, where I'm getting math overflow errors in summing a (quite long) column of numbers that should also be double.

import txt format data

I noticed that MonetDBLite can only import csv file data, could you add support for txt (fixed width and tab delimited) format data?

compiler error installing MonetDBlite 0.5.0 on Ubuntu 17.10

Hi,

I upgraded Ubuntu to 17.10 recently and installed MonetDBLite 0.4.1 with no problem.
Today, attempting to install 0.5.0 from within RStudio I get a compiler error:

> install.packages("MonetDBLite")
Installing package into ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/MonetDBLite_0.5.0.tar.gz'
Content type 'application/x-gzip' length 1381343 bytes (1.3 MB)
==================================================
downloaded 1.3 MB

* installing *source* package ‘MonetDBLite’ ...
** package ‘MonetDBLite’ successfully unpacked and MD5 sums checked
** libs
gcc -std=gnu99  -g -O2 -fdebug-prefix-map=/build/r-base-8tMiyU/r-base-3.4.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -I/usr/share/R/include -DNDEBUG -fpic -Wno-format -DLIBGDK -DLIBMAL -DLIBOPTIMIZER -DLIBSTREAM -DHAVE_EMBEDDED_R -DMONETDBLITE_COMPILE -Imonetdblite/src/ -Imonetdblite/src/common -Imonetdblite/src/embedded -Imonetdblite/src/gdk -Imonetdblite/src/mal/mal -Imonetdblite/src/mal/modules -Imonetdblite/src/mal/optimizer -Imonetdblite/src/mal/sqlbackend -Imonetdblite/src/sql/include -Imonetdblite/src/sql/common -Imonetdblite/src/sql/server -Imonetdblite/src/sql/storage -Imonetdblite/src/sql/storage/bat -MMD -MF build/deps/monetdblite/common/stream.d -c monetdblite/src/common/stream.c -o build/objects/monetdblite/common/stream.o
cc1: error: -Wformat-security ignored without -Wformat [-Werror=format-security]
cc1: some warnings being treated as errors
Makevars:194: recipe for target 'build/objects/monetdblite/common/stream.o' failed
make: *** [build/objects/monetdblite/common/stream.o] Error 1
ERROR: compilation failed for package ‘MonetDBLite’
* removing ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
* restoring previous ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
Warning in install.packages :
  installation of package ‘MonetDBLite’ had non-zero exit status

The downloaded source packages are in
	‘/tmp/Rtmp7rYzho/downloaded_packages’

I don't have a Makevars file, so presumably the compiler flags are the defaults for installation. I have so far avoided tinkering with the compiler flags as I don't know what the maintainers would regard as essential.

Is this something the maintainers would care about or more likely something idiosyncratic about my set-up?

Install dev version from github fails

Tried to install dev version from here by following the instructions, seems like something is missing when cloning the repo.

davidjankoski@davidjankoski-ubuntu:~⟫ git clone https://github.com/hannesmuehleisen/MonetDBLite-R.git --depth 1 --recursive

Cloning into 'MonetDBLite-R'...
remote: Counting objects: 53, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 53 (delta 0), reused 39 (delta 0), pack-reused 0
Unpacking objects: 100% (53/53), done.
Submodule 'src/monetdblite' (https://github.com/hannesmuehleisen/MonetDBLite-C) registered for path 'src/monetdblite'
Cloning into '/home/davidjankoski/MonetDBLite-R/src/monetdblite'...
remote: Counting objects: 474, done.
remote: Compressing objects: 100% (432/432), done.
remote: Total 474 (delta 90), reused 128 (delta 41), pack-reused 0
Receiving objects: 100% (474/474), 2.24 MiB | 718.00 KiB/s, done.
Resolving deltas: 100% (90/90), done.
error: no such remote ref 86c9c223716fedb4545ee1edc58fc2a92cd63de3
Fetched in submodule path 'src/monetdblite', but it did not contain 86c9c223716fedb4545ee1edc58fc2a92cd63de3. Direct fetching of that commit failed.

davidjankoski@davidjankoski-ubuntu:~⟫ R CMD INSTALL MonetDBLite-R

* installing to library ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4’
* installing *source* package ‘MonetDBLite’ ...
** libs
make: *** No rule to make target 'build/debug/monetdblite/common/stream.o', needed by 'MonetDBLite.so'.  Stop.
ERROR: compilation failed for package ‘MonetDBLite’
* removing ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
* restoring previous ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’

thanks. david.

MonetDBLite installation

Hi all,

I'm trying to install monetdblite in a fresh Linux Mint machine following your pointers at https://www.monetdb.org/blog/monetdblite-r, but lots of issues came up.
I actually solved them, but I'm not sure it was the best solution. I'll sum them up next.
$ sudo -i R

Fresh MonetDB-Lite install with the following errors:

install.packages("MonetDBLite")
---- error: "zutil.h:21:22: fatal error: string.h: No such file or directory"
According to link: https://stackoverflow.com/questions/25184167/
It appears to be missing stdlib.h as well as stdio.h
These are standard C headers, that are found in the libc package.

SOLUTION:
$ sudo apt-get install libc6-dev

back to R
$ sudo -i R

install.packages("dplyr")
---- Error in library(dplyr) : there is no package called ‘dplyr’

Issue solved in: tidyverse/dplyr#2261

install.packages("devtools")
install.packages("Rcpp")
---- ERROR: compilation failed for package ‘Rcpp’ (no g++ at this point. Remember the fresh install)

$ apt-get install zlib1g-dev libssl-dev libssh2-1-dev libcurl4-openssl-dev g++

back to R
$ sudo -i R

install.packages("dplyr")

Now I figured that you meant "dbplyr", instead of "dplyr" when I got this:

ms <- MonetDBLite::src_monetdblite(dbdir)
---- Error in loadNamespace(name) : there is no package called ‘dbplyr’
install.packages("dbplyr")

Now, the example works at monetdblite-r link

Hope this helps you improving monetdb (great project, btw).

cheers,
Eduardo

string manipulations

Monetdblight would be even greater if it would support string manipulations like replacements out of the box. You have implemented a function for the LIKE operator without pcre, how about implementing string replacements as with REPLACE? That would be very handy for ad hoc data cleaning of larger datasets. Now, you would have to setup monetdb with pcre / an r string replacement function.

Data set is getting corrupted when put in MonetDB

I'm trying to put a really large dataset of tweets into MonetDB using MonetDBLite. I have already preprocessed the data and I have over 500M rows and 123 columns, which are a mixture of string, double, logical and date columns. The data are stored in chunks in .rds format which row_bind() just fine, indicating colnames and col_types are uniform. However, when I put the data in MonetDB, I realised it gets corrupted. For instance, id_str are tweet IDs in string format and they should have same nchar() across all tweets. When I count the nchars() before putting the data in MonetDB, it works just fine

> temp_data %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   count(id_nchar)
# A tibble: 1 x 2
  id_nchar     n
     <int> <int>
1       18 34403

However, when I try the same after putting the data in MonetDB, data gets corrupted.


dbdir <- "~/data/monetdb_identity_dataset" #directory of the Monet database, should be empty
con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), dbdir)
dbWriteTable(con, "identity_dataset",temp_data, append=T)

> dplyr::tbl(con, "identity_dataset") %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   count(id_nchar)
# Source:   lazy query [?? x 2]
# Database: MonetDBEmbeddedConnection
   id_nchar     n
      <int> <dbl>
 1       18 32245
 2        0  2088
 3       NA    62
 4        9     1
 5       15     1
 6        2     1
 7        8     1
 8       14     1
 9        1     1
10        7     1

Seeing this, I tried decided to check the id_str to see if everything is OK but unfortunately, I get this, which is super weird:


> dplyr::tbl(con, "identity_dataset") %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   arrange(id_nchar)
# Source:     lazy query [?? x 2]
# Database:   MonetDBEmbeddedConnection
# Ordered by: id_nchar
   id_str     id_nchar
   <chr>         <int>
 1 "\xe0\x8e"       NA
 2 "\xe0\x8e"       NA
 3 "\xe0\x8e"       NA
 4 "\xe0\x8e"       NA
 5 "\xe0\x8e"       NA
 6 "\xe0\x8e"       NA
 7 "\xe0\x8e"       NA
 8 "\xe0\x8e"       NA
 9 "\xe0\x8e"       NA
10 "\xe0\x8e"       NA
# ... with more rows

This is very interesting as I checked the source .rds files and these weird characters in id_str column certainly do not appear in the source data. They only appear after I put the data in MonetDB and query.

Is there anything I can do to debug this problem? There are certain columns (like tweet_text, user_description) which can contain newlines or commas, emojis etc and I am not sure if MonetDB's handling these special characters might be causing the issue or not (drawing on my prior experience with multiple csv parsers). Or could this be an encoding issue. I really like to use MonetDB for this project and hope there is a simple solution I might be missing.

Adding sessioninfo::session_info()


> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.1 (2018-07-02)
 os       Red Hat Enterprise Linux Server 7.4 (Maipo)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/London
 date     2018-12-29

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.0   2017-04-11 [2] CRAN (R 3.5.1)
 bindr         0.1.1   2018-03-13 [2] CRAN (R 3.5.1)
 bindrcpp    * 0.2.2   2018-03-29 [2] CRAN (R 3.5.1)
 cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)
 codetools     0.2-16  2018-12-24 [1] CRAN (R 3.5.1)
 crayon        1.3.4   2017-09-16 [2] CRAN (R 3.5.1)
 DBI         * 1.0.0   2018-05-02 [2] CRAN (R 3.5.1)
 dbplyr        1.2.2   2018-07-25 [1] CRAN (R 3.5.1)
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
 dplyr       * 0.7.8   2018-11-10 [1] CRAN (R 3.5.1)
 fansi         0.4.0   2018-10-05 [1] CRAN (R 3.5.1)
 furrr       * 0.1.0   2018-05-16 [1] CRAN (R 3.5.1)
 future      * 1.10.0  2018-10-17 [1] CRAN (R 3.5.1)
 globals       0.12.4  2018-10-11 [1] CRAN (R 3.5.1)
 glue          1.3.0   2018-07-17 [2] CRAN (R 3.5.1)
 hms           0.4.2   2018-03-10 [1] CRAN (R 3.5.1)
 janitor       1.1.1   2018-07-31 [1] CRAN (R 3.5.1)
 listenv       0.7.0   2018-01-21 [1] CRAN (R 3.5.1)
 lubridate   * 1.7.4   2018-04-11 [1] CRAN (R 3.5.1)
 magrittr      1.5     2014-11-22 [2] CRAN (R 3.5.1)
 MonetDBLite * 0.6.0   2018-07-27 [1] CRAN (R 3.5.1)
 pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.1)
 pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
 purrr         0.2.5   2018-05-29 [2] CRAN (R 3.5.1)
 R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.1)
 Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.1)
 readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.5.1)
 rlang         0.3.0.1 2018-10-25 [1] CRAN (R 3.5.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)
 snakecase     0.9.2   2018-08-14 [1] CRAN (R 3.5.1)
 stringi       1.2.4   2018-07-20 [1] CRAN (R 3.5.1)
 stringr     * 1.3.1   2018-05-10 [2] CRAN (R 3.5.1)
 tibble        1.4.2   2018-01-22 [2] CRAN (R 3.5.1)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.1)
 utf8          1.1.4   2018-05-24 [2] CRAN (R 3.5.1)
 withr         2.1.2   2018-03-15 [2] CRAN (R 3.5.1)

The limit of data size

I am writing a data I am retrieving from an remote Postgresql server to MonetDBLite. I got the error message: Error: cannot allocate vector of size 712.8 Mb. I thought as long as the data is smaller than my disk, MonetDBLite will be able to handle it. Is there any limit for the size of a single data to be able to fit into MonetDBLite?

Error connecting to database when src_monetdb object exists in Global Environment

I've been very intrigued by the promise of the MonetDBLite with dplyr combo. However, problems with fatal error messages have kept me from using it more routinely. Picking it back up today, I thought I had figured out my problem and posted the following StackOverflow answer:

https://stackoverflow.com/a/53307673/3705612

To save people from needing to read that response, the error messages I receive are similar to this one:

Error in UseMethod("db_query_fields") : no applicable method for 'db_query_fields' applied to an object of class "MonetDBEmbeddedConnection"

I've seen more than just "db_query_fields" come up. For example, if I try and run a dplyr query on using the tbl object, I get an Error in UseMethod("db_sql_render") message instead.

After digging into it more, I've lost confidence in my theory that the order of package loading was the culprit. I'm now thinking the error arises simply from the fact that I load an R session with a src_monetdb object already in the Global Environment. As long as I load an R session with an empty Environment, I can load MonetDBLite before dplyr and dbplyr.

I use monet_shutdown() before restarting the R Session. Yet that doesn't seem to help. If the src_monetdb object is in the Global Environment on startup, I get the error.

So while I'm guessing this is not a bug, I was hoping to track down why this is happening. I've been using R, and very gratefully, for a few years now but the magic behind DBI , MonetDBILite, dbplyr, etc. is exactly that to me---I don't understand the intricacies of how they work. Please excuse me if I'm missing something obvious.

My session info is pasted below. I'm been using RStudio Version 1.2.1060 while investigating the issue. I also tried using the standard R Gui, making sure to save the src_monetdb object into the workspace, but could not reproduce the error there.

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dbplyr_1.2.2 dplyr_0.7.8 DBI_1.0.0
[4] MonetDBLite_0.6.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 codetools_0.2-15 crayon_1.3.4 packrat_0.4.9-3
[5] digest_0.6.18 assertthat_0.2.0 R6_2.3.0 magrittr_1.5
[9] pillar_1.3.0 rlang_0.3.0.1 rstudioapi_0.8 bindrcpp_0.2.2
[13] tools_3.5.1 glue_1.3.0 purrr_0.2.5 compiler_3.5.1
[17] pkgconfig_2.0.2 tidyselect_0.2.5 bindr_0.1.1 knitr_1.20
[21] tibble_1.4.2

grep like commands are not translated correctly

I can't get the string detection to work through dbplyr:

tbl(con,"results") %>% 
   filter(grepl("position",name))

'ParseException:SQLparser:42000!SELECT: no such binary operator 'grepl(char,clob)''

or the stringr package:

tbl(con, "results" %>% 
   filter(str_detect(pattern = "position",string = name))

'ParseException:SQLparser:42000!SELECT: no such binary operator 'instr(clob,char)''

While it does work if I create the SQL:

DBI::dbGetQuery(con,"select * from results where name like '%position%'")

Where is this translation implemented? Is this a DBI issue, dbplyr issue? Where should I look?

computer becomes unresponsive with MonetDBLite working hard

On my laptop (MacBook Air, four years old) when I execute a query in MonetDBLite (from R.app) that creates a table much larger than memory, the computer becomes unresponsive while the query executes. About half the time there's the spinning beachball cursor, the other half there's a normal cursor but other applications are extremely slow to respond (eg, typing takes multiple seconds per character).

The behaviour looks similar to what you get with memory overuse and heavy paging, but it isn't: there's still free RAM. The 'Activity Monitor' still indicates 20-40% idle CPU time, but it shows that most of the CPU time being used is in the system, not in user space.

Is there some sort of checking for interrupts that MonetDBLite could do more often, or something?

Support for compressed CSV files

Most CSV readers (notably, base R readers) support compressed (gz, bzip2...) CSV files transparently. It would be a nice addition to MonetDBLite::monetdb.read.csv, because big CSVs are commonly gzip'ed.

failed initialization

I'm trying to re-connect to a previously-initiated permanently-stored database on my local machine.

library(package = DBI)
library(package = MonetDBLite)

dbdir <- "C:/path/to/database_directory"

con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), dbdir)

Establishment of the con fails due to the following error:

Error in monetdb_embedded_startup(embedded, !getOption("monetdb.debug.embedded", : 
Failed to initialize embedded MonetDB !FATAL: 
Fatal error during initialization:
SQLException:SQLinit:Catalogue initialization failed

The sessionInfo is as follows:

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2      zhaoy_0.0.1       purrr_0.2.4       MonetDBLite_0.5.1 lubridate_1.7.2  
[6] dplyr_0.7.4       DBI_0.7          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15     knitr_1.20       bindr_0.1        feather_0.3.1    magrittr_1.5     hms_0.4.1       
 [7] R6_2.2.2         rlang_0.1.6      stringr_1.3.0    tools_3.4.3      readxl_1.0.0     yaml_2.1.16     
[13] assertthat_0.2.0 digest_0.6.15    rprojroot_1.3-2  tibble_1.4.2     codetools_0.2-15 glue_1.2.0      
[19] stringi_1.1.6    cellranger_1.1.0 compiler_3.4.3   pillar_1.1.0     backports_1.1.2  pkgconfig_2.0.1

Thank you!

Option to suppress messages about reserved SQL characters?

While I appreciate the attention to detail in the messages about reserved characters needing to be quoted in SQL queries, it would be great if there was an option to suppress these messages (either via an options() setting and/or function argument).

Many R users will use MonetDBLite as a database backend with dplyr / dbplyr, which already automatically quotes reserved characters, making these messages unnecessary and potentially confusing. I'm also using MonetDBLite as a backend in a few packages (for which I have a huge Thank You! being able to access a modern, fast, column-oriented database system that supports windowing functions without any server-side installation is completely game-changing), and would rather hide these messages from users since the quoting is taken care of.

Also wondering if they should be typed as warnings instead of messages? I suppose a case could be made either way.

Anyway, thanks for considering this request, and thanks again for this excellent package. I recently read your arxiv piece which I now send to pretty much everyone.

Error messages being squelched

Hi!

I'm developing an R package that creates a MonetDBLite database of biological sequence data (https://github.com/ropensci/restez). Users can run automated functions for downloading large amounts of sequence data and store it in the database.

Recently, one user encountered the following error:

: Invalid argument
simpleError in gsub("\n", " ", res, fixed = TRUE): Unable to execute statement 'ROLLBACK'.
Server says 'SQLException:sql.trans:2DM30!ROLLBACK: not allowed in auto commit mode'.>
 
Error in gsub("\n", " ", res, fixed = TRUE) : 
  input string 1 is invalid UTF-8

The error appears to be due to an unexpected encoding for the incoming character string to gsub. I suspected it was due to foreign language symbols (the user's computer is in French). Initially, I was able to recreate the error on a Windows machine in Swedish. I was then able to fix the gsub error by re-encoding the incoming string for gsub in the monetdb_embedded_query function using iconv (see updated function below). Now when re-running the original user's code (you can recreate the error with this script), I get a message about what is exactly failing (partly in Swedish).

ftruncate: Invalid argument
Error in .local(conn, statement, ...) : 
  Unable to execute statement 'COMMIT'.
 ].ver says 'ERROR: HEAPextend: failed to extend for 05\516.theap: GDKmremap() failed ' [#OS: Enheten k‰nner inte igen kommandot.

.... which indicates that we're asking too much of our machine.

If I have diagnosed the problem correctly, could the monetdb_embedded_query function be updated to handle foreign language characters? Below is my updated version of the function, note the addition of iconv lines.

Many thanks,
Dom

monetdb_embedded_query <- function(conn, query, execute=TRUE, resultconvert=TRUE, int64=FALSE) {
  print('My function')
  if (!inherits(conn, MonetDBLite:::classname)) {
    stop("Invalid connection")
  }
  query <- as.character(query)
  if (length(query) != 1) {
    stop("Need a single query as parameter.")
  }
  if (!MonetDBLite:::monetdb_embedded_env$is_started) {
    stop("Call monetdb_embedded_startup() first")
  }
  if (MonetDBLite:::monetdb_embedded_env$started_dir != ":memory:" &&
      !dir.exists(file.path(MonetDBLite:::monetdb_embedded_env$started_dir, "bat"))) {
    stop("Someone killed all the BATs! Call Brigitte Bardot!")
  }
  execute <- as.logical(execute)
  if (length(execute) != 1) {
    stop("Need a single execute flag as parameter.")
  }
  resultconvert <- as.logical(resultconvert)
  if (length(resultconvert) != 1) {
    stop("Need a single resultconvert flag as parameter.")
  }
  int64 <- as.logical(int64)
  if (length(resultconvert) != 1) {
    stop("Need a single int64 flag as parameter.")
  }
  if (int64 && !requireNamespace("bit64", quietly = TRUE)) {
    stop("Need bit64 package for integer64 support")
  }
  
  # make sure the query is terminated
  query <- paste(query, "\n;", sep="")
  res <- .Call(MonetDBLite:::monetdb_query_R, conn, query, execute, resultconvert, interactive() &&
                 getOption("monetdb.progress", FALSE), int64)
  resp <- list()
  if (is.character(res)) { # error
    resp$type <- "!" # MSG_MESSAGE
    res <- iconv(res, to = 'UTF-8')
    resp$message <- gsub("\n", " ", res, fixed=TRUE)
  }
  if (is.numeric(res)) { # no result set, but successful
    resp$type <- 2 # Q_UPDATE
    resp$rows <- res
  }
  if (is.list(res)) {
    resp$type <- 1 # Q_TABLE
    if ("__prepare" %in% names(attributes(res))) {
      resp$type <- Q_PREPARE
      resp$prepare = attr(res, "__prepare")
      attr(res, "__prepare") <- NULL
    }
    attr(res, "row.names") <- c(NA_integer_, as.integer(-1 * attr(res, "__rows")))
    class(res) <- "data.frame"
    names(res) <- gsub("\\", "", names(res), fixed=T)
    resp$tuples <- res
  }
  resp
}

dplyr/dbplyr mutate/case_when

The code

library(DBI)
library(dplyr)
library(dbplyr)
con <- dbConnect(MonetDBLite::MonetDBLite())
#con <- dbConnect(RSQLite::SQLite())
data(iris)
names(iris)<-tolower(sub("\\.","_",names(iris)))
dbWriteTable(con,"iris_table",iris)
irisdb<-tbl(con,"iris_table")

mutate(irisdb, sepal_category = case_when(sepal_length>5.5~"long", sepal_length>5~"medium", TRUE~"short")) 

gives the error

Error: Columns `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species` must be length 1 or 150, not 10, 10, 10, 10, 10

This could also be a dplyr/dbplyr bug, but using RSQLite instead of MonetDBLite (as with the commented-out line above) doesn't give the error, so I thought I should at least mention it here.

Insert vector from R into existing table column

I would like to use monetDB lite as an alternative to SAS for my medium data problems. To that end I need to add new columns consisting of transformations of existing columns done in R. I have searched the internet and documentation for a solution to this and have not found an answer.

I would like to do something like this:

con <- dbConnect( MonetDBLite::MonetDBLite() , ":memory:" )
dbWriteTable(con, "mtcars", mtcars)

dbSendQuery(con, "ALTER TABLE mtcars ADD COLUMN mpg_per_cyl double")
dbListFields(con, "mtcars")

To update the new column I have tried using databinding, but it seems this only works for one value:

## ?? how to populate new column from R?
insert <- dbSendQuery(con, 'PREPARE INSERT INTO mtcars (mpg_per_cyl) VALUES (?)')

> dbBind(insert, list(mpg_per_cyl=mtcars$mpg/mtcars$cyl))
Error in vapply(params, function(x) { : values must be length 1,
 but FUN(X[[1]]) result is length 32
In addition: Warning message:
In if (is.na(x)) "NULL" else if (is.numeric(x) || is.logical(x)) { :
  the condition has length > 1 and only the first element will be used

The documentation for dbSendUpdate has the only reference to placeholders in the MonetDBLite package I could find. I am a bit new to working with DBs so forgive me if I've overlooked something obvious. Does this capability already exist?

String concatenation

Does the library support any string concatenation function?
I have seen that MonetDB offer support to string concatenation methods, but I've tried using paste, paste0, str_c and cat and no one seems to be supported by the library.

As an example, using paste inside a mutate throws an error.

This is the code I'm executing:

elections <- tbl(monet_connection, 'elections') %>%
  distinct()
elections <- elections %>%
  rename(name = election_description)
elections <- elections %>%
  mutate(name = toupper(name))
elections <- elections %>%
  mutate(name = paste('test', year)) %>% 
  collect()

The query generated by explain is:

SELECT "year", PASTE('test', "year") AS "name"
FROM (SELECT "year", UPPER("name") AS "name"
FROM (SELECT "year", "election_description" AS "name"
FROM (SELECT DISTINCT *
FROM "elections") "xjjnkzodlb") "hfvbvasaux") "yhkoodzuan"

And the resulting error is

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT "year", PASTE('test', "year") AS "name" FROM (SELECT "year", UPPER("name") AS "name" FROM (SE...'. Server says 'ParseException:SQLparser:42000!SELECT: no such binary operator 'paste(char,int)''.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.