rojure's Introduction

Welcome to rojure

About

The main difference from the original rincanter is that this project does not require native libraries, because it talks to the R interpreter over a socket connection.

Rincanter offered translation between R and the Incanter 1.5.6 data types (matrix, dataset). These data types have since moved to clojure.core.matrix (Incanter > 1.9.0 now uses them internally), so this project could be made independent of Incanter. I therefore changed the name from rincanter to rojure.

Rojure can be used from the Incanter irepl, GorillaRepl, or the normal Clojure REPL. Apart from JRI for R interoperability, its only dependency is clojure.core.matrix.

Like the original version, it also translates between Clojure and R data types, for example from an R data frame to a core.matrix dataset.

Installation

Install R for your platform

The directions for installing R are outside the scope of this document, but R is well supported on most platforms, and has great documentation: R Project Page

Install and launch Rserve

From R, execute the following lines:

install.packages("Rserve")
library(Rserve)
Rserve()
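Because rojure talks to R over a socket, one quick way to confirm that Rserve is up is to check that its port accepts connections. The sketch below is not part of rojure: `port-open?` is a hypothetical helper, and the usage note assumes Rserve is listening on its default port 6311 on the local machine.

```clojure
(import '(java.net Socket InetSocketAddress))

(defn port-open?
  "Hypothetical helper: returns true if a TCP connection to host:port
  succeeds within timeout-ms milliseconds, false otherwise."
  [host port timeout-ms]
  (try
    (with-open [s (Socket.)]
      (.connect s (InetSocketAddress. ^String host (int port)) (int timeout-ms))
      true)
    ;; connection refused and connect timeouts are both IOExceptions
    (catch java.io.IOException _ false)))

(comment
  ;; assumption: Rserve runs locally on its default port
  (port-open? "127.0.0.1" 6311 500))
```

This only shows that something is listening on the port; it does not verify that the listener is actually Rserve.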

Add rojure dependency to project.clj

[rojure "0.2.0"]

Example Usage

The main entry points are the functions r-eval and with-r-eval, described below.

For the higher-level use case of dataset->dataset transformation in R, I added a function ‘r-transform-ds’, which takes a Clojure dataset and an R file as input, executes the R file, and returns a new dataset.

r-eval

You can play around with Clojure and R in the same REPL session:

(use '(rojure core))

;; define a connection to R (requires a running R with Rserve started)
(def r (get-r))

(r-eval r "data(iris)")

;; evaluates the iris data frame object and converts it into
;; a core.matrix dataset
(r-eval r "iris")

;; create a vector on the R side
(r-eval r "vec_in_r = c(1,2,3)")

;; now retrieve it, converting it to a Clojure vector
(r-get r "vec_in_r")

Plotting:

(use '(rojure core))

;; define a connection to R (requires a running R with Rserve started)
(def r (get-r))

(r-eval r "data(iris)")

;; initialize the R graphics device for your system
;; (run only the line that matches your platform):
;; Mac OS X
(r-eval r "quartz()")
;; Windows
(r-eval r "windows()")
;; Unix/Linux
(r-eval r "x11()")

;; create the plot using values from the iris dataset
(r-eval r "plot(Sepal.Length ~ Sepal.Width, data = iris)")
;; alter this existing plot
(r-eval r "title(main = \"Iris Sepal Measurements\")")

;; close the graphics device
(r-eval r "dev.off()")

with-r-eval

Using with-r-eval, it is even easier. Within this form, all forms enclosed in parentheses are evaluated as normal Clojure forms, while strings are evaluated in R using r-eval:

(use '(rojure core))

(with-r-eval 
  "data(iris)"

  ;; evaluates the iris data frame object and converts it into
  ;; a core.matrix dataset
  "iris"
 
  ;;create vector on R side
  "vec_in_r = c(1,2,3)"

  ;;now retrieve it, converting to Clojure vector
  (r-get "vec_in_r"))
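The dispatch rule described above (strings go to R, parenthesised forms stay in Clojure) can be illustrated with a small macro. This is only a sketch of the idea, not rojure's implementation: `fake-r-eval` and `with-fake-r-eval` are hypothetical names, and the stand-in evaluator merely records the R code instead of sending it to an R session.

```clojure
;; Records every string that would have been sent to R.
(def sent-to-r (atom []))

(defn fake-r-eval
  "Hypothetical stand-in for r-eval: stores the R code and returns a tag."
  [code]
  (swap! sent-to-r conj code)
  (str "R<" code ">"))

(defmacro with-fake-r-eval
  "Hypothetical sketch: string literals in the body are wrapped in a call to
  fake-r-eval, every other form is left as ordinary Clojure code."
  [& forms]
  (let [wrap (fn [form]
               (if (string? form)
                 `(fake-r-eval ~form)
                 form))]
    `(do ~@(map wrap forms))))

(comment
  ;; the two strings are recorded in order, the list form is plain Clojure
  (with-fake-r-eval
    "data(iris)"
    "vec_in_r = c(1,2,3)"
    (+ 1 2)))
```

Because the check happens at macro-expansion time, only string literals written directly in the body are rewritten; a string nested inside a Clojure form is left untouched in this sketch.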

r-transform-ds

This use case is meant to allow seamless editing of Clojure code side by side with R code. Because the R code lives in its own .R file, it can be edited with any R IDE (Emacs, RStudio).

I assume that a lot of use cases for integrating R into Clojure can be expressed as dataframe->dataframe transformations executed in R. I believe this is general enough to do arbitrary computations in R; the result just needs to be converted to a data.frame at the end.

In a future version I might add a similar function for matrix->matrix transformations.

The R script executed by ‘r-transform-ds’ just needs to follow these conventions:

  • It needs to be able to run standalone
  • It assumes that a variable in_ is present in the R session (and nothing else)
  • It needs to set a variable out_ in the R session (probably at the end)

When working with the R script standalone, the user just needs to make sure that ‘in_’ is present in their development R session.

To ease debugging, the r-transform-ds function writes both R variables (“in_” and “out_”) to disk in rds format, so they can easily be read into the development R session with “readRDS(‘in_.rds’)” for inspection. This keeps the workflow for working in Clojure and R together rather smooth.

(use '(rojure core))
(use '(clojure.core.matrix dataset))

;; define a connection to R (requires a running R with Rserve started)
(def r (get-r))

;; define the input dataset to transform
(def ds (dataset [[1 2 3] [4 5 6]]))

;; send the input dataset to R and execute the R script
;; (the R script receives ds in the variable "in_" and must produce a variable "out_")
;; both in_ and out_ are serialised to disk to ease debugging
(def out-ds (r-transform-ds r ds "./count.R"))

;; out-ds is a core.matrix dataset
out-ds

count.R looks like this:

library(tidyverse)

out_ <- in_ %>%
  count

In a separate R session, the user can now test and develop the R code by executing:

in_ <- readRDS("in_.rds")
source("./count.R")   # or step interactively over the lines of the R script
 

For matrices it works in the same way:

(use '(rojure core))
(use '(clojure.core.matrix dataset))

;; define a connection to R (requires a running R with Rserve started)
(def r (get-r))

;; define the matrix to transform
(def m (clojure.core.matrix/matrix [[1 2] [3 4]]))

;; transform the matrix with R
(def eig (r-transform-ds r m "./eigen.R"))
eig

eigen.R looks like this:

out_ <- eigen(in_)$vectors

Documentation

API Documentation

API Documentation for rincanter is located at: Rincanter API

rojure's People

Contributors

behrica, grav, jolby, svarcheg

rojure's Issues

add support for data frame based script execution

A higher level of abstraction over "r-eval" could be to execute a full ".R" file with an
"input data frame" and an "output data frame".

This assumes that some (many?) of the operations you want to perform with R can be expressed
as "transforming a data frame A into another data frame B, using R code".

This could be realized in a function like

(transform-ds-with-R r input-ds r-code-file)
-> returns dataset

with parameters:

  • 'r' - the r session to use
  • 'input-ds' of type clojure.core.matrix dataset - the input data
  • 'r-code-file' - the file to be executed by the R interpreter

This simplest version of the function would then make three assumptions about the R file:

  • it reads its input data set from an R object with a fixed name, "df_in", present in the session
  • it sets a data.frame with a fixed name, "df_out", in the R session at the end
  • it is fully self-contained, so it can run in its own R session

The above function would then return "df_out", transformed back to a Clojure data structure of type core.matrix.dataset.

The function would do the following:

  1. starts a new R interpreter
  2. transforms "input-ds" from a clojure.core.matrix dataset to an R (REXP) data structure and sets it in the R session
  3. "sources" the R code file into the R session
  4. ... the R code does its work, transforming "df_in" to "df_out"
  5. returns the variable df_out, assuming it is an R data frame, and transforms it from R (REXP) back to a Clojure dataset
  6. closes the R session
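The steps above can be sketched as a function that drives a session through its lifecycle. This is a minimal sketch under stated assumptions, not the implementation: `transform-ds-sketch` and the backend keys `:open`, `:assign!`, `:source!`, `:fetch` and `:close!` are hypothetical names, and `fake-backend` imitates an R session in memory so that the control flow can be run without R.

```clojure
(defn transform-ds-sketch
  "Runs the steps against `backend`, a map of hypothetical functions."
  [{:keys [open assign! source! fetch close!]} input-ds r-code-file]
  (let [session (open)]                     ; 1. start a new session
    (try
      (assign! session "df_in" input-ds)    ; 2. send the input data
      (source! session r-code-file)         ; 3./4. run the R file
      (fetch session "df_out")              ; 5. read the result back
      (finally
        (close! session)))))                ; 6. always close the session

(defn fake-backend
  "In-memory stand-in for R. Its 'script' stores the number of rows of
  df_in as df_out. :log records the order of the calls."
  []
  (let [log (atom [])]
    {:log     log
     :open    (fn [] (swap! log conj :open) (atom {}))
     :assign! (fn [s var-name value]
                (swap! log conj :assign)
                (swap! s assoc var-name value))
     :source! (fn [s _file]
                (swap! log conj :source)
                (swap! s assoc "df_out" (count (get @s "df_in"))))
     :fetch   (fn [s var-name] (swap! log conj :fetch) (get @s var-name))
     :close!  (fn [_s] (swap! log conj :close))}))

(comment
  (transform-ds-sketch (fake-backend) [[1 2 3] [4 5 6]] "./count.R"))
```

Wrapping the work in try/finally means the session is closed even when the R script fails, which matters if every call starts its own interpreter.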

This is rather different from working with the function "r-eval" in a long-running R session.

In my view it has a big advantage: it allows working on the standalone R file in another, normal R session and developing it interactively, using whatever tool you like for editing and running the R code (Emacs ESS, RStudio).

For this, the user just needs to make sure that the separate R session has a data frame called "df_in", with (ideally) the same data as the data sent to the file from Clojure.

This synchronization can be eased a lot by having the above function do two additional steps:

  1. Writing "df_in" automatically to disk (as an R "df_in.rds" file) before starting the execution of the R script
  2. Writing "df_out" to another rds file ("df_out.rds")

This can then be used for debugging and should be optional for performance reasons.
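One way to picture the optional dump is as R code generated around the script. The sketch below only builds the R code as a string; `save-rds-line` and `script-with-dumps` are hypothetical helpers, and the only R functions assumed are `saveRDS` and `source`.

```clojure
(require '[clojure.string :as str])

(defn save-rds-line
  "R code that writes the variable var-name to <var-name>.rds."
  [var-name]
  (format "saveRDS(%s, file = \"%s.rds\")" var-name var-name))

(defn script-with-dumps
  "R code that sources r-code-file. When dump? is true, df_in is written
  to disk before the script runs and df_out after it."
  [r-code-file dump?]
  (str/join "\n"
            (concat
              (when dump? [(save-rds-line "df_in")])
              [(format "source(\"%s\")" r-code-file)]
              (when dump? [(save-rds-line "df_out")]))))

(comment
  (println (script-with-dumps "./count.R" true)))
```

The file name is inserted without escaping, so this sketch only suits simple paths that contain no quotes or backslashes.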

In my view, this should allow a rather seamless experience when developing the R and Clojure code together in the same project.

One drawback might be the cost of starting and stopping the R session.
This could be mitigated by reusing an existing R session and trying to "clean" it.

In general, though, this approach is not meant for "R one-liners", but for data transformations that take at least a few seconds.

How this behaves on larger data frames remains to be seen as well.

This approach is an alternative to "make"-file-based multi-language data pipelines, in which the pipeline steps can be written in different languages and communicate with each other via files on disk.
Emacs org-mode with "babel" also has the concept of "blocks" of code
in different languages that can pass (untyped) data frames between each other.

This approach might be extended to work with other data structures:

  • matrix -> matrix
  • vector -> vector
  • list -> list
