crrri's People

Contributors

cderv, colinfay, rlesur

crrri's Issues

Clarify generator code with templating-like logic?

I think it is not a heavy dependency, and it might help clarify the generator code.
Clarifying how the R files are generated could help maintenance in the long run.
I'll try to look into it and open a PR to see how it looks.

check if bin exist chr_init

If not, an error should be thrown as soon as we know it will fail.
Right now, it fails late:

chr_connect()

throws

Cannot find the websocket entrypoint of Chrome headless.
Closing headless Chrome...
<Promise [rejected: simpleError]>
Warning messages:
1: In open.connection(con, "rb") :
  InternetOpenUrl failed: 'Impossible d'établir une connexion avec le serveur'
2: In chr_kill(chr_process, work_dir) : attempt to apply non-function
Unhandled promise error: Command not found

I would expect a clear error message saying that google-chrome was not found.
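An early check along these lines could produce that message before any process is launched. This is only a sketch: check_chrome_bin() is a hypothetical helper name, not part of crrri, and it relies on base R only.

```r
# Hypothetical helper (not part of crrri): fail early when the binary is missing.
check_chrome_bin <- function(bin) {
  # Sys.which() returns "" when the command cannot be found on the PATH
  resolved <- unname(Sys.which(bin))
  found_on_path <- !is.na(resolved) && nzchar(resolved)
  if (!found_on_path && file.exists(bin)) {
    # bin may also be given as a direct path to the executable
    return(normalizePath(bin))
  }
  if (!found_on_path) {
    stop(
      sprintf("Cannot find Chrome binary '%s'. Is it installed and on the PATH?", bin),
      call. = FALSE
    )
  }
  resolved
}
```

Calling it at the very start of the connection function would turn the late failure into an immediate, readable error.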

Add tests in the package

Next very important task!

  • Add and improve manual tests, because it is useful to have at least those.
  • Add non-regression tests where we can, to get some coverage and a better safety net.

Rethink API toward OOP ?

OOP was on the table but not chosen for this first draft. Mentioned in #1

Advantages of OOP

  • More secure
  • More suited to a low-level API
  • Adapted to domains and methods per domain (e.g. Page$captureSnapshot())

Advantages of the current approach

  • Closest to the protocol and to how it is used in JS
  • Better documentation in R
  • Easier to generate "automagically" 😉
  • Works well with promises and websockets

Main question about OOP:

  • How would it work with promises and websockets?

Leaving as is for now. We'll see along the way how this first draft is doing.

add an option to deactivate chrome echo_cmd

I think this is not required in all cases, and it would allow us to deactivate echoing during tests.

We are talking about this:

tryCatch(processx::process$new(bin, chrome_args, echo_cmd = TRUE),

Instead of TRUE, we could use getOption("crrri.echo_cmd", TRUE) or a more generic crrri.verbose option.
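As a rough sketch of how the option lookup could behave (echo_cmd() is a hypothetical wrapper name; only the option name comes from the proposal above):

```r
# Read the option at launch time; echoing stays on unless it is explicitly disabled.
echo_cmd <- function() isTRUE(getOption("crrri.echo_cmd", TRUE))

# In a test setup one could then switch echoing off:
old <- options(crrri.echo_cmd = FALSE)
echo_cmd()      # FALSE
options(old)    # restore the previous setting
```

The value returned by echo_cmd() would then be passed as the echo_cmd argument of processx::process$new().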

Confusion between arguments and values in events man pages ?

Hi,

When I look at the man page of a function of type "Event", such as Network.requestIntercepted, it seems to me that the elements described in Arguments are in fact the elements returned as Values. In this case, for example, the interceptionId element is returned after the event is fired, but it is not an argument of Network.requestIntercepted.

It's possible I'm completely misunderstanding, my apologies if this is the case.

Export all promises functions or Depends ?

Currently, we systematically need to do

library(crrri)
library(promises)

in order to access finally(), for example, or promise_race().

crrri works heavily with promises and requires this other 📦. Should we

  • leave as is
  • make crrri depend on promises
  • export a few more functions we know are required for good use of crrri (like finally)

Do not write the Chrome work dir in the R working directory?

Currently, work_dir is generated with a randomly sampled name in the current working directory. Does it need to be in the R working directory, or can we move it?

I think it would be better to put this folder either in the temporary directory of the R session, if no persistence is needed, or in the user's application data directory (using rappdirs).

I am thinking about this because, after playing with the package a few times, I have plenty of folders to remove, scattered among other folders.

A temp directory would be nice, but it was tried for decapitated and there seem to be issues with it. See
https://github.com/hrbrmstr/decapitated#working-around-headless-chrome--os-security-restrictions
We could also borrow decapitated's choice of writing everything in a special folder in the user's home directory.

Otherwise, I can work on something with rappdirs.
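To make the two options concrete, here is a minimal sketch. chrome_work_dir() and its persistent argument are hypothetical names; the rappdirs branch is used only when that package is installed, and the temp-directory branch may run into the restrictions described in the decapitated link above.

```r
# Hypothetical helper: create the Chrome work dir outside the R working directory.
chrome_work_dir <- function(persistent = FALSE) {
  base_dir <- if (persistent && requireNamespace("rappdirs", quietly = TRUE)) {
    # per-user application data directory
    rappdirs::user_data_dir("crrri")
  } else {
    # removed automatically when the R session ends
    tempdir()
  }
  dir.create(base_dir, recursive = TRUE, showWarnings = FALSE)
  # tempfile() only builds a unique path; the directory is created just below
  work_dir <- tempfile(pattern = "chrome-data-dir-", tmpdir = base_dir)
  dir.create(work_dir)
  work_dir
}
```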

Fix ids generation for commands

When I run the example in the README:

r_project <- 
  chrome %>% 
  Page.enable() %>%
  Page.navigate(url = "https://www.r-project.org/")

The same id is used for the Page.enable and for the Page.navigate commands.
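One common way to get a distinct id for each command is a counter kept in a closure, with one generator per connection. The sketch below only illustrates that idea; new_id_generator() is a hypothetical name and this is not necessarily how crrri fixes the problem.

```r
# Hypothetical sketch: each call to the returned function yields the next integer id.
new_id_generator <- function() {
  id <- 0L
  function() {
    id <<- id + 1L  # update the counter stored in the enclosing environment
    id
  }
}

next_id <- new_id_generator()
next_id()  # 1
next_id()  # 2
```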

Add removeListener for event emitter

Not all the functions of an event emitter object have been implemented yet. In particular, removeListener is missing.
See spec: https://nodejs.org/api/events.html#events_emitter_removelistener_eventname_listener

Removal is already handled in once(), because the listener is removed at the first occurrence. But when a listener is registered with on(), there is no way yet to remove it.

Easy way:

Consider one listener per event only. Removing the listener is equivalent to removing the event.

Hard way:

Consider several listeners per event. removeListener should be able to remove the correct listener.

Ideas:

  • Currently, callback objects return a rm function at registration.
  • This function could be kept somewhere (in a list or env) and called when needed to remove the listener.
  • This would require a key so that the correct listener is removed. 🤔 Use digest::digest()?

Examples

myEmitter <- EventEmitter$new()
myEmitter$on('event',
    function() {
        message("an event occurred")
    }
)

How to remove this listener when it is an anonymous function?

  • myEmitter$removeListener('event', function() { message("an event occurred") }) ?

Only allow it with a named function?

myEmitter <- EventEmitter$new()
my_fun <- function() {
    message("an event occurred")
}
myEmitter$on('event', my_fun)

  • myEmitter$removeListener('event', "my_fun") ?
  • myEmitter$removeListener('event', my_fun) ?
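To explore the "hard way", here is a small base R sketch of an emitter that keeps several listeners per event and removes one by comparing function objects with identical(). It is not the crrri EventEmitter class: new_emitter() and its on, remove_listener and emit members are hypothetical names used for illustration only.

```r
# Hypothetical minimal emitter: several listeners per event, removal by function object.
new_emitter <- function() {
  listeners <- list()  # event name -> list of listener functions

  on <- function(event, listener) {
    listeners[[event]] <<- c(listeners[[event]], list(listener))
    invisible(NULL)
  }

  remove_listener <- function(event, listener) {
    current <- listeners[[event]]
    # keep every registered function that is not the one passed in
    keep <- !vapply(current, identical, logical(1), listener)
    listeners[[event]] <<- current[keep]
    invisible(NULL)
  }

  emit <- function(event, ...) {
    for (f in listeners[[event]]) f(...)
    invisible(length(listeners[[event]]))
  }

  list(on = on, remove_listener = remove_listener, emit = emit)
}

em <- new_emitter()
my_fun <- function() message("an event occurred")
em$on("event", my_fun)
em$emit("event")                    # runs my_fun
em$remove_listener("event", my_fun)
em$emit("event")                    # nothing is registered any more
```

With this design the listener is passed as an object (the my_fun form above), so no separate key such as a digest is needed as long as the caller keeps a reference to the function.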

IDEA: add an onFinally arg in hold

It would be passed to promises::finally, to be executed before returning the promise value:

promises::then(
    promise,
    onFulfilled = function(value) {
      state$pending <- FALSE
      state$fulfilled <- TRUE
      state$value <- value
    },
    onRejected = function(error) {
      state$pending <- FALSE
      state$fulfilled <- FALSE
      state$reason <- error
    }
  ) %>%
  promises::finally(onFinally = fun)

I thought about this while working on chrome_execute, thinking that chrome could be closed at the very end with

invisible(hold(results_available, timeout = total_timeout, onFinally = ~ chrome$close()))

Thoughts ?

Can't access `.$result$root$nodeId` from `DOM.getDocument()`

The following code aims to dump the DOM of a web page :

url <- "https://www.r-project.org/"
chrome <- chr_connect(bin = "google-chrome", headless = FALSE)
gd <- chrome %>% 
  Page.enable() %>% 
  DOM.enable() %>%
  Page.navigate(url) %>% 
  Page.loadEventFired() %>%
  DOM.getDocument()

gd %>%   
  DOM.getOuterHTML(nodeId = .$result$root$nodeId)

But it fails at DOM.getOuterHTML(nodeId = .$result$root$nodeId) with Error: Invalid parameters(code -32602)

It does work when giving a nodeId directly :

gd %>%   
  DOM.getOuterHTML(nodeId = 1)

And the .$result$root$nodeId value is correct :

gd %...>% {
  print(.$result$root$nodeId)
}

Offer functions to setup and find chrome

Inspired by

  • pagedown::find_chrome() (this would remove a Suggests dependency needed only for that in the vignette)
  • the decapitated 📦, which offers to download a portable Chrome. I use that and it works very well.

For crrri, as a low-level 📦, we could also assume that installation and configuration must be done by the user without any help from crrri, with just some documentation in the README.
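For reference, a very small lookup could be sketched with base R alone. find_chrome_bin() is a hypothetical name and the candidate command names are assumptions; this only searches the PATH and does not attempt anything platform-specific.

```r
# Hypothetical helper: return the first candidate found on the PATH, or NA.
find_chrome_bin <- function(candidates = c("google-chrome", "chromium-browser",
                                           "chromium", "chrome")) {
  paths <- Sys.which(candidates)
  found <- paths[!is.na(paths) & nzchar(paths)]
  if (length(found) == 0L) NA_character_ else unname(found[[1]])
}
```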

Add a setTimeout mechanism

It seems useful to have a timeout that rejects a promise. Currently, this seems to be done as in the README example:

promise_race(
    timeout(delay),
    chrome %>% 
      Page.enable() %>%
      Page.navigate(url = url) %>% 
      Page.frameStoppedLoading(frameId = ~ .res$frameId) %>%  
      Page.printToPDF() %...T>% { 
        .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf")) 
      }
  )

We may be able to provide a wrapper function that offers this feature more easily.
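Such a wrapper could look roughly like this. with_timeout() is a hypothetical name; the sketch builds its own timer promise with later::later() rather than relying on any crrri function, and it only uses promise(), promise_race() and then() from the promises package.

```r
library(promises)

# Hypothetical wrapper: reject if `promise` has not settled after `delay` seconds.
with_timeout <- function(promise, delay) {
  timer <- promise(function(resolve, reject) {
    later::later(function() {
      reject(simpleError(paste0("Timed out after ", delay, " seconds")))
    }, delay)
  })
  # whichever promise settles first decides the outcome
  promise_race(promise, timer)
}
```

The pipeline from the example above would then be passed as the first argument, for instance with_timeout(chrome %>% Page.enable() %>% Page.navigate(url = url), delay = 30).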

Errors for Browser domain methods

In the current version, the chr_connect() function connects to the websocket entrypoint page found at http://localhost:9222/json, see

crrri/R/chr_connect.R

Lines 241 to 246 in c873002

open_debuggers <- tryCatch(
  jsonlite::read_json(sprintf("http://localhost:%s/json", debug_port), simplifyVector = TRUE),
  error = function(e) list()
)
address <- open_debuggers$webSocketDebuggerUrl[open_debuggers$type == "page"]

However, methods of the Browser domain can only be sent at the browser websocket entrypoint that can be found at http://localhost:9222/json/version

I don't know whether other domains are concerned.
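For illustration, the browser-level entrypoint is exposed as the webSocketDebuggerUrl field of the JSON returned by /json/version. The sketch below parses a sample payload rather than querying a live Chrome; browser_ws_url() is a hypothetical helper name and the payload is trimmed, with a placeholder id.

```r
# Extract the browser-level websocket URL from the body of /json/version.
browser_ws_url <- function(version_json) {
  jsonlite::fromJSON(version_json)$webSocketDebuggerUrl
}

# Illustrative payload, trimmed to the relevant fields (the id is a placeholder)
sample_json <- '{
  "Browser": "HeadlessChrome",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/some-id"
}'
browser_ws_url(sample_json)
```

Against a running instance, the JSON would come from the /json/version URL mentioned above instead of a string.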

Add a wrapper for base64 encoded data

Currently, this is possible using jsonlite::base64_dec, but you need to know how to write this:

Page.printToPDF() %...T>% {
  .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf"))
}

Page.captureScreenshot(format = "png", fromSurface = TRUE) %...T>% {
  .$result$data %>% jsonlite::base64_dec() %>% writeBin("test.png")
}

Something like crrri:::write_base64

write_base64 <- function(promise, con) {
    promise %...T>% {
        .$result$data %>% jsonlite::base64_dec() %>% writeBin(con)
    }
}

or without pipe

write_base64 <- function(promise, con) {
    # Decode and write once the promise resolves, then pass the value through
    promises::then(
        promise,
        onFulfilled = function(value) {
            raw <- jsonlite::base64_dec(value$result$data)
            writeBin(raw, con)
            value
        }
    )
}

It would be a side-effect function (like purrr::walk) and would return the promise from the LHS.

Change the name of the await() function

I regret the name I gave to the await() function.

In JS, await is a keyword that can only be used inside an async function (an async function is a function that always returns a promise). Using await outside an async function is illegal.
I think that makes sense.

Here, the await() function is a wrapper over later::run_now(). For instance, the httpuv package also wraps later::run_now() with the httpuv::service() function.

I think we should rename the await() function.
The only proposal I have would be hold(promise). We could add a delay argument here: hold(promise, delay = 30).

[Idea] navigateToFile

Right now the Page object lets you navigate to a URL, and if you want to navigate to a file you have to call Page$navigate(url = sprintf("file://%s", normalizePath(file_path))).

It would be nice to have a $navigateToFile method that takes a relative path, normalizes it, and does the sprintf("file://%s").
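The path handling itself could be sketched as a small helper. file_url() is a hypothetical name, and the sketch simply reproduces the sprintf() shown above, so the exact URL form on Windows would need checking.

```r
# Hypothetical helper: turn a (possibly relative) file path into a file:// URL.
file_url <- function(file_path) {
  # mustWork = TRUE makes a missing file fail here instead of inside Chrome
  full_path <- normalizePath(file_path, winslash = "/", mustWork = TRUE)
  sprintf("file://%s", full_path)
}
```

A $navigateToFile method could then delegate to the existing navigate method with url = file_url(file_path).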

Better Identify file created automatically

I find it difficult to tell easily which files are created by the generator and which files hold our own functions. We can't use subdirectories inside the R folder, so it needs to be in the name.

This is a small improvement, I know, but what do you think?

We could reverse the naming scheme of the generated files:

  • ***_commands ➡️ commands_***
  • ***_events ➡️ events_***

That way the files would be grouped together in the file explorer.

Other solution:

  • add a new prefix like DP_*** to identify files with functions from the DevTools Protocol.
  • add a prefix like crrri_*** to files that are not generated automatically...

What do you think?

Give information about the state of the connection to Chrome

We create a connection using chrome <- chr_connect(). If we use it and later call chr_disconnect(chrome), the connection is closed. Currently, the chrome object does not carry this new information. It would be nice to know quickly whether the Chrome connection we created is still usable.

A parallel can be drawn with DBI and the con object: after dbDisconnect(), con has a new DISCONNECTED state.

Use rlang to manage environment stuff

Example:

  • get("method_to_be_sent", envir = parent.env(environment())) will become rlang::env_get(nm = "method_to_be_sent", inherit = TRUE)

As we are importing rlang, I think it could be better to use it where needed to take advantage of consistency and helper functions.

Use of Network.setRequestInterception

Hi,

If I want to check a page and get some information about network operations, I can do something like the following:

url <- "https://www.r-project.org/"

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)

However, I'd like to use Network.setRequestInterception to be able to capture only certain requests. I tried to do it this way, but it doesn't seem to work :

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)

Would you have any idea of what I'm doing wrong ?

Thanks !

Offer examples based on CRI wiki

There are a lot of examples in the CRI wiki.

The aim is to be able to reproduce them all (more or less) using crrri.

How to do it is still to be determined. Ideas:

  • Another repo, like knitr examples
  • A demo folder, following the R package structure, as httr does
  • Vignettes, but with a workaround to use promises
  • Just a folder in the GitHub repo, ignored via .Rbuildignore

We'll begin with a choice that is easy to change if needed.

Implement an API for event listeners

Implementing event listeners is not so easy.

Assume that we want to navigate on a website and ensure that the frame is loaded. Since multiple frames can be opened, we want to ensure that the main frame is loaded.
In order to achieve this task, we have to send Page.navigate, retrieve the frameId of the response and register a listener on the event Page.frameStoppedLoading for the given frameId.

With the current version of crrri, we have to write this (ugly) code:

library(crrri)
library(promises)
con <- chr_connect()

page <- con %>% Page.enable() # activate the events

google_loaded <- page %>%
  Page.navigate(url="http://google.fr") %>% 
  then(onFulfilled = function(value) {promise(function(resolve, reject) {
    ws <- value$cnx$ws
    ws$onMessage(function(event) {
      data <- jsonlite::fromJSON(event$data)
      if (!is.null(data$method) && !is.null(data$params$frameId))
        if (data$method == "Page.frameStoppedLoading" && data$params$frameId == value$result$frameId)
          resolve(list(cnx = value$cnx, result = data$params))
    })
  })})

So, we need to implement high level functions in order to register callbacks on events.
For instance, we could do that:

Page.navigate(url="http://google.fr") %>%
  Page.frameStoppedLoading(frameId = ~frameId) # or frameId = ~ result$frameId

I wonder what would be the best API?

WIP: Rewrite crrri based on EventEmitter - a puppeteer-like

This issue is here to track the work on rewriting crrri to move the API toward a more puppeteer-like 📦.

This follows and relates to #8, #15, #27.

The first idea is to have a 📦 that does not use promises at all.

The steps are, in order:

There is still a choice to make about which puppeteer features we would like to include in crrri. The puppeteer code is rather complex, with a mix of classes inheriting from EventEmitter and use of promises.
This is something to discuss.

IDEA: Separate the EventEmitter class into an independent package

As discussed before, the EventEmitter feature is generic and does not yet exist in the R ecosystem. We could separate this class to make it portable, so that it could live in its own 📦.

No timing on this, but I am opening this issue to record that we thought about it.

Wrong code to stop httpuv server

In utils.R, in the is_available_port function, this line:
on.exit(srv$stop())

should be:
on.exit(httpuv::stopServer(srv))

Otherwise, it gives an error saying that $ can't be used on an atomic vector, because srv is a string.

After fixing this one, I got an error that says:
Error: 'current_env' is not an exported object from 'namespace:rlang'
Called from: getExportedValue(pkg, name)

No idea how to fix it.

Implement a verbose option

In the current version, a lot of messages are written to the log. This can be annoying for higher-level development. We have to implement a verbose option.

Improve verbosity for user about what is happening

We have DEBUGME output for developers. Let's see what we can do to improve verbosity for the user about what is going on, and where to put it.

Async mode and promises are not the easiest things to follow, so improving verbosity may help.

Idea on how:

  • A wrapper with a global option, to deactivate it easily

get_verbose <- function(msg) {
    is_verbose <- getOption("crrri.verbose", FALSE)
    if (is_verbose) message(msg)
}

Debugme logs are sometimes duplicated

Following #10, there seems to be an issue with how !DEBUG lines are called.

> chrome <- chr_connect() 
crrri Trying to launch Chrome  
crrri Trying to launch Chrome in headless mode ... +2ms 
Running "C:/Users/chris/Documents/Chrome/chrome-win32/chrome.exe" \
  --no-first-run --headless "--user-data-dir=chrome-data-dir-jtfwycqz" \
  "--remote-debugging-port=9222" --disable-gpu --no-sandbox
crrri +-Chrome succesfully launched  +15ms 
crrri Chrome succesfully launched in headless mode. +1ms 
crrri +-It should be accessible at http://localhost: +0ms 
crrri It should be accessible at http://localhost:9222 +1ms 
crrri Trying to find  +1ms 
crrri Trying to find http://localhost:9222 +1ms 
crrri +-attempt  +0ms 
crrri attempt 1... +1ms 
crrri +- +310ms 
crrri ... http://localhost:9222 found +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost: +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost:9222/json ... +1ms 
crrri +-...found websocket entrypoint  +1387ms 
crrri ...found websocket entrypoint ws://localhost:9222/devtools/page/AB2AC45BB31BB240C8B28DCF725F331B +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri +-...websocket connexion configured. +6ms 
crrri ...websocket connexion configured. +0ms 
crrri Connecting R to Chrome... +4ms 
crrri Connecting R to Chrome... +0ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1110ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1ms 

Some messages are duplicated and I don't know why... Some are printed before the value in backticks is evaluated. This is not really important for how the 📦 works, but it is an improvement to look into in the long run. We should also see whether it happens to other people too...

There are other solutions to this if debugme has an issue...

Create other Remotes classes

It shouldn't be difficult now to create classes inheriting from CDPRemote for Opera, Node.js, Safari and Edge.
