rlesur / crrri


157.0 10.0 12.0 1.19 MB

A Chrome Remote Interface written in R

Home Page: https://rlesur.github.io/crrri/

License: Other

R 100.00%

chrome-headless chrome-devtools rstats r r-package

crrri's People

Contributors

Stargazers

Watchers

Forkers

ktaranov johndharrison hrbrmstr aden2018 colinfay da505819 cderv wangbinzjcc tdeenes jimsforks mstei4176

crrri's Issues

separate functions from their R6 set assignment

From #56 (comment)

This could make testing easier and maintenance simpler.

Not so urgent.

Separate perform_with_chrome into two functions

one for multiple functions, returning a list
one for a single function, returning a value

Clarify generator code with templating-like logic?

I think it would not be a heavy dependency, and it may help clarify the generator code.
Clarifying how the R files are generated would help maintenance in the long run.
I'll try to look into it and PR something to see how it looks.

Remove run_now() in CDPSession disconnect() method

later::run_now() must be removed here. Instead, the method should return a promise that is fulfilled once the client is disconnected:

crrri/R/CDPSession.R

Lines 256 to 261 in 260b2bb

    
  disconnect = function() {
    if (self$readyState() < 2L) private$.CDPSession_con$close()
    while (self$readyState() < 3L) {
      later::run_now()
    }
  }

Check whether bin exists in chr_init

If not, an error should be thrown as soon as we know it will fail.
Right now, it fails late.

For example, chr_connect() throws:

Cannot find the websocket entrypoint of Chrome headless.
Closing headless Chrome...
<Promise [rejected: simpleError]>
Warning messages:
1: In open.connection(con, "rb") :
  InternetOpenUrl failed: 'Impossible d'établir une connexion avec le serveur'
2: In chr_kill(chr_process, work_dir) : attempt to apply non-function
Unhandled promise error: Command not found

I would expect a clear message saying that google-chrome was not found. (The French warning above means "Unable to establish a connection with the server".)
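A minimal sketch of such an early check, using base R only. check_chrome_bin() is a hypothetical helper, not part of crrri; it assumes chr_init() receives either a full path or a command name as bin. It relies on Sys.which(), which returns an empty string when a command is not on the PATH.

```r
# Hypothetical helper: fail fast if the Chrome binary cannot be found.
check_chrome_bin <- function(bin) {
  # A path that exists is accepted as is; otherwise look the name up on the PATH.
  # Sys.which() returns "" when the command cannot be found.
  resolved <- if (file.exists(bin)) bin else Sys.which(bin)
  if (!nzchar(resolved)) {
    stop(sprintf(
      "Chrome binary '%s' not found: install Chrome or pass its full path as 'bin'.",
      bin
    ), call. = FALSE)
  }
  unname(resolved)
}

# Illustrative use at the top of chr_init(), before any process is launched:
# bin <- check_chrome_bin(bin)
```

Calling it before processx starts anything means the user sees one explicit error instead of the cascade of websocket and promise errors above.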

Add tests in the package

Next very important task!

Add and improve manual tests, because it is useful to have at least those.
Add non-regression tests where we can, to get some coverage and a better safety net.

Rethink the API toward OOP?

OOP was on the table but not chosen for this first draft (mentioned in #1).

Advantage of OOP

More secure
More suited to a low-level API
Adapted to domains and per-domain methods (e.g. Page$captureSnapshot())

Advantage of current approach

Closest to the protocol and to how it is used in JS
Better documentation in R
Easier to generate "automagically" 😉
Works well with promises and websockets

Main question about OOP:

How would it work with promises and websockets?

Leaving as is for now; we'll see along the way how this first draft holds up.

add an option to deactivate chrome echo_cmd

I think echoing is not needed in all cases, and an option would allow deactivating it during tests.

This is the line concerned:

crrri/R/Chrome.R

Line 258 in 9337a19

tryCatch(processx::process$new(bin, chrome_args, echo_cmd = TRUE),

Instead of hard-coding TRUE, we could use getOption("crrri.echo_cmd", TRUE) or a more generic crrri.verbose option.
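A sketch of the option lookup. chrome_echo_cmd() is a hypothetical helper; falling back to crrri.verbose when crrri.echo_cmd is unset is an assumption about how the two options could interact.

```r
# Hypothetical helper: should processx echo the Chrome command line?
# crrri.echo_cmd wins if set; otherwise crrri.verbose (assumed fallback); default TRUE.
chrome_echo_cmd <- function() {
  isTRUE(getOption("crrri.echo_cmd", getOption("crrri.verbose", TRUE)))
}

# The call in Chrome.R would then read (not run here, requires processx):
# processx::process$new(bin, chrome_args, echo_cmd = chrome_echo_cmd())

# In tests, echoing can then be switched off with:
# options(crrri.echo_cmd = FALSE)
```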

Implement callback functions in commands

Confusion between arguments and values in event man pages?

Hi,

When I look at the man page of a function of type "Event", such as Network.requestIntercepted, it seems to me that the elements described in Arguments are in fact the elements returned as Values. In this case, for example, the interceptionId element is returned after the event is fired, but it is not an argument of Network.requestIntercepted.

It's possible I'm completely misunderstanding, my apologies if this is the case.

Export all promises functions, or Depends?

Currently, we systematically need to run

library(crrri)
library(promises)

in order to access finally(), for example, or promise_race().

crrri relies heavily on promises and requires this other 📦. Should we

leave as is
make crrri depend on promises
export a few more functions we know are required for good use of crrri (like finally)

Do not write Chrome's work dir in the R working directory?

Currently, work_dir is generated with a randomly sampled name in the current working directory. Does it need to be in the R working directory, or can we move it?

I think it would be better to put this folder either in the R session's temporary directory, if no persistence is needed, or in the user's application directory (using rappdirs).

I thought about this because, after playing with the package a few times, I have plenty of folders to remove, scattered among other folders.

A temp directory would be nice, but it was tried in decapitated and there seem to be issues with it; see
https://github.com/hrbrmstr/decapitated#working-around-headless-chrome--os-security-restrictions
We could also borrow decapitated's choice to write everything in a special folder in the user's home directory.

Otherwise, I can work on something with rappdirs.
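A minimal sketch of creating the work dir under tempdir() rather than in the R working directory. chrome_work_dir() is a hypothetical helper; the chrome-data-dir- prefix mirrors the folder name visible in the Chrome command line logged in the debugme issue. If tempdir() turns out to hit the restrictions reported for decapitated, a persistent location such as rappdirs::user_data_dir("crrri") could be passed as root instead.

```r
# Hypothetical helper: create a unique Chrome user data dir outside getwd().
chrome_work_dir <- function(root = tempdir()) {
  # tempfile() only builds a unique path; dir.create() actually creates it
  dir <- tempfile(pattern = "chrome-data-dir-", tmpdir = root)
  if (!dir.create(dir, recursive = TRUE)) {
    stop("Unable to create Chrome work directory: ", dir, call. = FALSE)
  }
  dir
}

# The result would be passed to Chrome as --user-data-dir (illustrative):
# work_dir <- chrome_work_dir()
# chrome_args <- c(chrome_args, paste0("--user-data-dir=", work_dir))
```

Since R deletes the session temporary directory when the session ends normally, folders no longer pile up among the user's files; the trade-off is that nothing persists across sessions.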

add await function in crrri for vignettes with promises

Integrate the functions await() and knit_print.promise() mentioned in #24 into the 📦, so that we can create some vignettes.

Switch to markdown format for Rd documentation

I find it easier to write and cleaner in the end. I thought we were already using it, but not consistently throughout the code. I will clean that up.

reject or promise_reject?

@RLesur I can't find where reject comes from:

crrri/R/chr_connect.R

Line 61 in 0f0b611

reject("Failed to launch Chrome.")

There is a promise_reject() in promises, but no reject(). Is this the same?

I think some namespacing is missing, but I am not sure.

I hope it is not obvious...

Fix ids generation for commands

When I run the example in the README:

r_project <- 
  chrome %>% 
  Page.enable() %>%
  Page.navigate(url = "https://www.r-project.org/")

The same id is used for the Page.enable and for the Page.navigate commands.
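In the Chrome DevTools Protocol, each command message carries an id that its response echoes back, so two commands sharing an id cannot be told apart. A minimal sketch of a per-connection counter held in a closure; make_id_generator() and the message lists below are illustrative, not crrri's actual code.

```r
# Hypothetical id generator: one counter per connection, shared by all commands.
make_id_generator <- function() {
  last_id <- 0L
  function() {
    # <<- updates the counter stored in the enclosing environment
    last_id <<- last_id + 1L
    last_id
  }
}

next_id <- make_id_generator()
enable_msg   <- list(id = next_id(), method = "Page.enable")
navigate_msg <- list(id = next_id(), method = "Page.navigate",
                     params = list(url = "https://www.r-project.org/"))
```

Creating the generator once per connection, rather than once per command, is what guarantees that Page.enable and Page.navigate get different ids.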

Add removeListener for event emitter

Not all the methods of an event emitter object have been implemented yet, in particular removeListener.
See the Node.js spec: https://nodejs.org/api/events.html#events_emitter_removelistener_eventname_listener

It is covered by once(), because the listener is removed at its first occurrence. But when a listener is registered with on(), there is no way yet to remove it.

Easy way:

Consider only one listener per event. Removing the listener is then equivalent to removing the event.

Hard way:

Consider several listeners per event. removeListener should be able to remove the correct listener.

Ideas:

Currently, callback objects return an rm function at registration.
This function could be kept somewhere (in a list or env) and called when needed to remove the listener.
This would require a key so that the correct listener is removed. 🤔 Use digest::digest()?

examples

myEmitter <- EventEmitter$new()
myEmitter$on('event',
    function() {
        message("an event occurred")
    }
)

How can this listener be removed when it is an anonymous function?

myEmitter$removeListener('event', function() { message("an event occurred") }) ?

Only allow it with named functions?

myEmitter <- EventEmitter$new()
my_fun <-  function() {
        message("an event occurred")
    }
myEmitter$on('event', my_fun)

myEmitter$removeListener('event', "my_fun") ?
myEmitter$removeListener('event', my_fun) ?
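One way to answer the anonymous-function question, sketched with plain environments rather than crrri's EventEmitter class: on() stores each listener under a unique key and returns a function that removes exactly that listener, so nothing has to be compared by value. new_emitter(), its methods, and the integer-counter keys (used here instead of digest::digest()) are all hypothetical.

```r
# Minimal emitter sketch (base R, not crrri's R6 EventEmitter).
new_emitter <- function() {
  # event name -> named list of listeners, keyed by registration id
  listeners <- new.env(parent = emptyenv())
  counter <- 0L

  get_listeners <- function(event) {
    if (exists(event, envir = listeners, inherits = FALSE)) listeners[[event]] else list()
  }

  on <- function(event, listener) {
    counter <<- counter + 1L
    key <- as.character(counter)  # unique per registration, even for identical functions
    current <- get_listeners(event)
    current[[key]] <- listener
    assign(event, current, envir = listeners)
    # The returned closure removes exactly this registration
    invisible(function() {
      current <- get_listeners(event)
      current[[key]] <- NULL
      assign(event, current, envir = listeners)
      invisible(NULL)
    })
  }

  emit <- function(event, ...) {
    current <- get_listeners(event)
    for (listener in current) listener(...)
    invisible(length(current) > 0L)
  }

  listener_count <- function(event) length(get_listeners(event))

  list(on = on, emit = emit, listener_count = listener_count)
}

# Usage: the anonymous listener is removed through the handle returned by on()
emitter <- new_emitter()
remove_listener <- emitter$on("event", function() message("an event occurred"))
emitter$emit("event")   # the message is shown
remove_listener()
emitter$emit("event")   # no listener left, nothing happens
```

Because removal goes through a handle, this also fits the separate proposal that on() and once() should no longer return self: they would return the remover instead.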

IDEA: add an onFinally arg in hold

It would be passed to promises::finally() and executed before returning the promise value:

promises::then(
    promise,
    onFulfilled = function(value) {
      state$pending <- FALSE
      state$fulfilled <- TRUE
      state$value <- value
    },
    onRejected = function(error) {
      state$pending <- FALSE
      state$fulfilled <- FALSE
      state$reason <- error
    }
  ) %>%
  promises::finally(onFinally = fun)

I thought about this while working on chrome_execute: Chrome could then be closed at the very end with

invisible(hold(results_available, timeout = total_timeout, onFinally = ~ chrome$close()))

Thoughts ?

Can't access `.$result$root$nodeId` from `DOM.getDocument()`

The following code aims to dump the DOM of a web page:

url <- "https://www.r-project.org/"
chrome <- chr_connect(bin = "google-chrome", headless = FALSE)
gd <- chrome %>% 
  Page.enable() %>% 
  DOM.enable() %>%
  Page.navigate(url) %>% 
  Page.loadEventFired() %>%
  DOM.getDocument()

gd %>%   
  DOM.getOuterHTML(nodeId = .$result$root$nodeId)

But it fails at DOM.getOuterHTML(nodeId = .$result$root$nodeId) with Error: Invalid parameters(code -32602)

It does work when giving a nodeId directly:

gd %>%   
  DOM.getOuterHTML(nodeId = 1)

And the .$result$root$nodeId value is correct:

gd %...>% {
  print(.$result$root$nodeId)
}

Offer functions to set up and find Chrome

Inspired by:

pagedown::find_chrome() (this would remove a Suggests dependency used just for that in a vignette)
the decapitated 📦, which offers to download a portable Chrome. I use it and it works very well.

Alternatively, since crrri is a low-level 📦, we could assume that installation and configuration are done by the user without any help from crrri, with just some documentation in the README.

Add a setTimeout mechanism

It seems useful to have a timeout that rejects a promise. Currently, usage looks like the README example:

promise_race(
    timeout(delay),
    chrome %>% 
      Page.enable() %>%
      Page.navigate(url = url) %>% 
      Page.frameStoppedLoading(frameId = ~ .res$frameId) %>%  
      Page.printToPDF() %...T>% { 
        .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf")) 
      }
  )

We may be able to provide a wrapper function that offers this feature more easily.

support chrome devtools protocol http endpoint

The CDP provides one websocket address per target (a target is, for instance, a new tab).

This is described on the home page of https://chromedevtools.github.io/devtools-protocol/

We should consider at some point allowing the use of these endpoints to control headless Chrome.

This is related to #15, where we currently can't deal with several ws endpoints.

Consider using fastmap instead of environment

https://r-lib.github.io/fastmap/

https://twitter.com/winston_chang/status/1128713388224200705

Errors for Browser domain methods

In the current version, the chr_connect() function connects to the page websocket entrypoint found at http://localhost:9222/json; see

crrri/R/chr_connect.R

Lines 241 to 246 in c873002

    
  open_debuggers <- tryCatch(
    jsonlite::read_json(sprintf("http://localhost:%s/json", debug_port), simplifyVector = TRUE),
    error = function(e) list()
  )
  address <- open_debuggers$webSocketDebuggerUrl[open_debuggers$type == "page"]

However, methods of the Browser domain can only be sent to the browser websocket entrypoint, which can be found at http://localhost:9222/json/version

I don't know whether other domains are concerned.

Add a wrapper for base64 encoded data

Currently, this is possible using jsonlite::base64_dec(), but you need to know how to write it:

Page.printToPDF() %...T>% { 
        .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf")) 
      }

  Page.captureScreenshot(format = "png", fromSurface = TRUE) %...T>% {
    .$result$data %>% jsonlite::base64_dec() %>% writeBin("test.png")
  }

Something like crrri:::write_base64:

write_base64 <- function(promise, con) {
    promise %...T>% {
        .$result$data %>% jsonlite::base64_dec() %>% writeBin(con)
    }
}

or without the pipe:

write_base64 <- function(promise, con) {
    promises::then(promise, function(value) {
        # decode the base64 payload of the CDP result and write it to con
        raw <- jsonlite::base64_dec(value$result$data)
        writeBin(raw, con)
        # pass the original value through, as %...T>% does
        value
    })
}

It would be a side-effect function (like purrr::walk()) that passes the value of the LHS promise through.

Use purrr instead of lapply / sapply to improve readability and stability

Change the name of the await() function

I regret the name I gave to the await() function.

In JS, await is a keyword that can only be used inside an async function (an async function is a function that always returns a promise). Using await outside an async function is illegal.
I think that makes sense.

Here, the await() function is a wrapper over later::run_now(). For instance, the httpuv package also wraps later::run_now() with the httpuv::service() function.

I think we should rename the await() function.
The only proposal I have would be hold(promise). We could add a delay argument here: hold(promise, delay = 30).

[Idea] navigateToFile

Right now, the Page object lets you navigate to a URL; to navigate to a file, you have to call Page$navigate(url = sprintf("file://%s", normalizePath(file_path))).

It would be nice to have a $navigateToFile method that takes a relative path, normalizes it, and does the sprintf("file://%s").
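A sketch of the path-to-URL step that a $navigateToFile() method could wrap; file_url() is a hypothetical helper. It assumes forward slashes are acceptable to Chrome on Windows, and it does not percent-encode special characters such as spaces, which a real implementation may need to handle.

```r
# Hypothetical helper behind a $navigateToFile() method.
file_url <- function(path) {
  # mustWork = TRUE fails early if the file does not exist;
  # winslash = "/" gives forward slashes on Windows as well.
  abs_path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # A Windows path like C:/x needs a leading slash to become file:///C:/x
  if (!startsWith(abs_path, "/")) abs_path <- paste0("/", abs_path)
  paste0("file://", abs_path)
}

# Illustrative use with the existing method (not run, requires Chrome):
# Page$navigate(url = file_url("report.html"))
```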

Better identify files created automatically

I find it difficult to identify easily which files are created by the generator and which files are for our functions. We can't use subdirectories inside the R folder so it needs to be in the name.

This is a small improvement, I know, but what do you think?

We could reverse the naming scheme of the generated files:

***_commands ➡️ command_***
***_events ➡️ events_***

That way, the files will be grouped together in the file explorer.

Chrome doesn't pick up proxy settings automatically

See if there is something wrong.
It would be nice to not have to deal with that in the 📦

Give information about the connection state to Chrome

We create a connection using chrome <- chr_connect(). If we use it and later call chr_disconnect(chrome), the connection is closed. Currently, the chrome object does not reflect this new state. It would be nice to know quickly whether the Chrome connection we created is still usable.

A parallel can be drawn with DBI and its con object: after dbDisconnect(), con is in a new DISCONNECTED state.
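The disconnect() code quoted in another issue compares readyState() with 2L and 3L, which matches the WebSocket readyState values (0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED). A sketch of turning that value into a DBI-like status, assuming the chrome object can expose its readyState; connection_status() and is_usable() are hypothetical names.

```r
# Map a WebSocket readyState (0-3) to a status string a print method could show.
connection_status <- function(ready_state) {
  states <- c("CONNECTING", "OPEN", "CLOSING", "CLOSED")
  stopifnot(is.numeric(ready_state), length(ready_state) == 1L,
            ready_state %in% 0:3)
  states[ready_state + 1L]
}

# Only an OPEN connection can still be used to send commands
is_usable <- function(ready_state) identical(connection_status(ready_state), "OPEN")
```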

Use rlang to manage environment stuff

Example:

get("method_to_be_sent", envir = parent.env(environment())) will become rlang::env_get(nm = "method_to_be_sent", inherit = TRUE)

Since we already import rlang, I think we should use it where needed, to benefit from its consistency and helper functions.

Use of Network.setRequestInterception

Hi,

If I want to check a page and get some information about network operations, I can do something like the following:

url <- "https://www.r-project.org/"

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)

However, I'd like to use Network.setRequestInterception to be able to capture only certain requests. I tried to do it this way, but it doesn't seem to work :

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)

Would you have any idea of what I'm doing wrong ?

Thanks !

eventemitter on and once should no longer return self

Returning self makes removing a listener difficult to implement, and we don't really need it...
It was useful for method chaining, but we can live without it.

Offer examples based on CRI wiki

There are a lot of examples in the CRI wiki.

The aim is to be able to reproduce them all (more or less) using crrri.

How to do it is still to be determined. Ideas:

Another repo, like knitr examples
A demo folder, as in the R package structure, like httr
Vignettes, but with a workaround to use promises
just a folder in the GitHub repo, ignored by .Rbuildignore

We'll begin with a choice easy to change if needed.

Message when loading crrri (R CMD check NOTE)

There is a NOTE in R CMD check: https://travis-ci.org/RLesur/crrri/builds/524212680#L1012-L1017

* checking R code for possible problems ... NOTE
File ‘crrri/R/zzz.R’:
  .onLoad calls:
    packageStartupMessage("It seems you have this package in your DEBUGME variable,",     "but you have not install the debugme package yet.\n", "You need to install it to the the log messages.")
See section ‘Good practice’ in '?.onAttach'.

I have no idea for a workaround.

pkgdown website

Implement an API for event listeners

Implementing event listeners is not so easy.

Assume that we want to navigate on a website and ensure that the frame is loaded. Since multiple frames can be opened, we want to ensure that the main frame is loaded.
In order to achieve this task, we have to send Page.navigate, retrieve the frameId of the response and register a listener on the event Page.frameStoppedLoading for the given frameId.

With the current version of crrri, we have to write this (ugly) code:

library(crrri)
library(promises)
con <- chr_connect()

page <- con %>% Page.enable() # activate the events

google_loaded <- page %>%
  Page.navigate(url="http://google.fr") %>% 
  then(onFulfilled = function(value) {promise(function(resolve, reject) {
    ws <- value$cnx$ws
    ws$onMessage(function(event) {
      data <- jsonlite::fromJSON(event$data)
    if (!is.null(data$method) && !is.null(data$params$frameId))
      if (data$method == "Page.frameStoppedLoading" && data$params$frameId == value$result$frameId)
          resolve(list(cnx = value$cnx, result = data$params))
    })
  })})

So, we need to implement high level functions in order to register callbacks on events.
For instance, we could do that:

Page.navigate(url="http://google.fr") %>%
  Page.frameStoppedLoading(frameId = ~frameId) # or frameId = ~ result$frameId

I wonder what the best API would be.

WIP: Rewrite crrri based on EventEmitter, toward a puppeteer-like 📦

This issue tracks the work on rewriting crrri to move its API toward a more puppeteer-like 📦.

This follows and relates to #8, #15, #27.

The first idea is to have a 📦 that does not use promises at all.

The steps are, in order:

1. Implement an EventEmitter class (done in branch feature/eventemitters)
2. Use EventEmitter to implement a connection class called CDPSession (see branch feature/CDPSession-with-eventemitter)
3. Use this CDPSession class to connect to Chrome and as the base for crrri usage (ongoing in branch build-on-cdp-session)

We still have to choose which puppeteer features we would like to include in crrri. The puppeteer code is rather complex, with a mix of classes inheriting from EventEmitter and use of promises.
This is something to discuss.

Refactor add_names function

In relation with #41

Command results are not always lists

Steps to reproduce:

Create a second tab
client$Target$getTargets()

Solution: always use from_json()

IDEA: Separate the EventEmitter class into an independent package

As discussed before, the eventemitter feature is generic and does not yet exist in the R ecosystem. We could separate this class to make it portable, so that it can live in its own 📦.

No timeline for this, but I am opening this issue to record that we thought about it.

Wrong code to stop httpuv server

In utils.R, in the is_available_port function, this line:
on.exit(srv$stop())

should be:
on.exit(httpuv::stopServer(srv))

Otherwise, it gives an error like "$ operator is invalid for atomic vectors", because srv is a string.

After fixing this one, I got an error that says:
Error: 'current_env' is not an exported object from 'namespace:rlang'
Called from: getExportedValue(pkg, name)

No idea how to fix.

Implement a verbose option

In the current version, a lot of messages are written to the log. It could be annoying for higher level development. We have to implement a verbose option.

Improve verbosity for user about what is happening

We have DEBUGME output for developer. See what we can do to improve verbosity on what is going on for the user and where to put it.

Async mode and promises are not the easiest to follow, so more verbosity may help.

Idea on how:

A wrapper with a global option, so that it can be deactivated easily

get_verbose <- function(msg) {
    is_verbose <- getOption("crrri.verbose", FALSE)
    if (is_verbose) message(msg)
}

Debugme logs are sometimes duplicated

Following #10, there seems to be an issue with how !DEBUG lines are called.

> chrome <- chr_connect() 
crrri Trying to launch Chrome  
crrri Trying to launch Chrome in headless mode ... +2ms 
Running "C:/Users/chris/Documents/Chrome/chrome-win32/chrome.exe" \
  --no-first-run --headless "--user-data-dir=chrome-data-dir-jtfwycqz" \
  "--remote-debugging-port=9222" --disable-gpu --no-sandbox
crrri +-Chrome succesfully launched  +15ms 
crrri Chrome succesfully launched in headless mode. +1ms 
crrri +-It should be accessible at http://localhost: +0ms 
crrri It should be accessible at http://localhost:9222 +1ms 
crrri Trying to find  +1ms 
crrri Trying to find http://localhost:9222 +1ms 
crrri +-attempt  +0ms 
crrri attempt 1... +1ms 
crrri +- +310ms 
crrri ... http://localhost:9222 found +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost: +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost:9222/json ... +1ms 
crrri +-...found websocket entrypoint  +1387ms 
crrri ...found websocket entrypoint ws://localhost:9222/devtools/page/AB2AC45BB31BB240C8B28DCF725F331B +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri +-...websocket connexion configured. +6ms 
crrri ...websocket connexion configured. +0ms 
crrri Connecting R to Chrome... +4ms 
crrri Connecting R to Chrome... +0ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1110ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1ms

Some messages are duplicated and I don't know why... Some are printed before the value in backticks is evaluated. It does not really matter for how the 📦 works, but it is an improvement to look into in the long run, and to check whether it happens to other people too...

There are other solutions if debugme has an issue...

Use an env var to store chrome bin

Like decapitated 📦 - I find it useful and it does not require to set up in PATH.
Should we use the same env var?
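A sketch of the lookup order: environment variable first, then the PATH. find_chrome_bin() and the candidate executable names are hypothetical; HEADLESS_CHROME is assumed to be the variable decapitated uses and should be checked before crrri reuses it.

```r
# Hypothetical lookup: env var first, then a few executable names on the PATH.
# var defaults to HEADLESS_CHROME (assumed to match decapitated; to be verified).
find_chrome_bin <- function(var = "HEADLESS_CHROME",
                            candidates = c("google-chrome", "chromium-browser", "chromium")) {
  from_env <- Sys.getenv(var, unset = "")
  if (nzchar(from_env)) return(from_env)
  # Sys.which() returns "" for every candidate that is not on the PATH
  on_path <- Sys.which(candidates)
  on_path <- on_path[nzchar(on_path)]
  if (length(on_path) == 0L) {
    stop("Chrome not found: set the ", var, " environment variable.", call. = FALSE)
  }
  unname(on_path[[1L]])
}
```

With this order, a user only has to set the variable once (for instance in .Renviron) and never needs to touch the PATH.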

Remove stringi dependency

It is only used here:

crrri/R/chr_connect.R

Line 181 in e1118a3

urls <- do.call(c, stringi::stri_split_fixed(env_var, ";"))

I think we could use a base R alternative.
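A base R sketch of the replacement; split_env_var() is a hypothetical name. strsplit() with fixed = TRUE splits on a literal ";", and unlist() flattens the result like do.call(c, ...). One difference: strsplit() drops a trailing empty piece while stringi keeps it, so this version filters out all empty entries, a small behavior change to confirm.

```r
# Base R replacement for do.call(c, stringi::stri_split_fixed(env_var, ";"))
split_env_var <- function(env_var) {
  # fixed = TRUE treats ";" literally, like stri_split_fixed()
  urls <- unlist(strsplit(env_var, ";", fixed = TRUE), use.names = FALSE)
  # drop empty entries, e.g. from "a;;b" or a trailing ";"
  urls[nzchar(urls)]
}
```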

Create other Remotes classes

It shouldn't be difficult now to create classes inheriting from CDPRemote for Opera, Node.js, Safari and Edge.

Add parameters to the events listeners

Once #52 is resolved, we should be able to implement parameters in the event listeners, as in the previous API.
