Comments (6)
You can choose the recipe's return value yourself, for example a parsed document instead of a file.
Example:
library(promises)
library(crrri)
dump_DOM <- function(url) {
  perform_with_chrome(function(client) {
    Network <- client$Network
    Page <- client$Page
    Runtime <- client$Runtime
    Network$enable() %...>% {
      Page$enable()
    } %...>% {
      Network$setCacheDisabled(cacheDisabled = TRUE)
    } %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
    } %...>% (function(result) {
      html <- result$result$value
      # Return the parsed document instead of writing the HTML to a file
      rvest::read_html(html)
    })
  })
}
html <- dump_DOM(url = "http://www.ardata.fr/post/")
library(rvest)
html %>% html_node("title") %>% html_text()
#> [1] "Blog | ArData "
Created on 2021-03-11 by the reprex package (v1.0.0.9002)
You could also return the text directly and parse it afterwards:
library(promises)
library(crrri)
dump_DOM <- function(url) {
  perform_with_chrome(function(client) {
    Network <- client$Network
    Page <- client$Page
    Runtime <- client$Runtime
    Network$enable() %...>% {
      Page$enable()
    } %...>% {
      Network$setCacheDisabled(cacheDisabled = TRUE)
    } %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
    } %...>% (function(result) {
      # Return the raw HTML string; the caller parses it
      result$result$value
    })
  })
}
html <- dump_DOM(url = "http://www.ardata.fr/post/")
library(rvest)
read_html(html) %>% html_node("title") %>% html_text()
#> [1] "Blog | ArData "
Created on 2021-03-11 by the reprex package (v1.0.0.9002)
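In both variants above, the last callback in the promise chain decides what comes back. Here is a minimal sketch of that idea using only promises and later, without Chrome. fake_evaluate() is a hypothetical stand-in for Runtime$evaluate(), and the HTML string is made up for illustration. It assumes xml2 is installed.

```r
library(promises)

# Hypothetical stand-in for Runtime$evaluate(): a promise resolving to a list
# shaped like the result used above (result$result$value).
fake_evaluate <- function() {
  promise_resolve(list(result = list(
    value = "<html><head><title>Demo</title></head><body></body></html>"
  )))
}

# Variant 1: the last callback returns the raw HTML string.
as_text <- fake_evaluate() %...>% (function(result) result$result$value)

# Variant 2: the last callback returns a parsed document instead.
as_doc <- fake_evaluate() %...>% (function(result) {
  xml2::read_html(result$result$value)
})

# Collect the resolved values, then run the event loop until both callbacks
# have fired (bounded, so the loop cannot spin forever).
out <- list()
invisible(as_text %...>% (function(x) out$text <<- x))
invisible(as_doc %...>% (function(x) out$doc <<- x))
for (i in seq_len(100)) {
  if (length(out) == 2) break
  later::run_now(0.1)
}

class(out$text)
class(out$doc)
```

The manual event loop is only needed because this sketch runs outside perform_with_chrome().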
This works ok with rvest. Here is an example:
library(promises)
library(crrri)
dump_DOM <- function(url, file = "") {
  perform_with_chrome(function(client) {
    Network <- client$Network
    Page <- client$Page
    Runtime <- client$Runtime
    Network$enable() %...>% {
      Page$enable()
    } %...>% {
      Network$setCacheDisabled(cacheDisabled = TRUE)
    } %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
    } %...>% (function(result) {
      html <- result$result$value
      cat(html, "\n", file = file)
    })
  })
}
html <- dump_DOM(url = "http://www.ardata.fr/post/", "test.html")
#> Running "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" \
#> --no-first-run --headless \
#> "--user-data-dir=C:\Users\chris\AppData\Local\r-crrri\r-crrri\chrome-data-dir-xbpnjxhj" \
#> "--remote-debugging-port=9222" --disable-gpu --no-sandbox
library(rvest)
#> Loading required package: xml2
html <- read_html("test.html")
html %>% html_node("title") %>% html_text()
#> [1] "Blog | ArData "
Created on 2021-03-11 by the reprex package (v1.0.0.9002)
Got it, thanks. Please find my code below:
z <- b$Runtime$evaluate('document.documentElement.outerHTML')
mydf <- z$result$value
Last question: can we use rvest on this? It does not seem to be XML, so the following does not work:
mydf %>% rvest::html_nodes("[id$='_hcontainer']")
For now crrri is rather low-level, and you need to create the recipe yourself.
I believe chrome_read_html() is equivalent to the dumpDOM() function we gave as an example in the README: https://github.com/RLesur/crrri#transpose-chrome-remote-interface-js-scripts-dump-the-dom
It evaluates the same expression you found.
The result should be HTML, so rvest or xml2 can be used on it. An example would make it easier to see the issue.
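One possible cause, offered as an assumption since no reproducible example was given: the value returned by the evaluation is a plain character string, and rvest selectors work on a parsed document, not on text. The sketch below uses a made-up HTML string and a made-up id (quote_hcontainer) in place of the real page.

```r
library(rvest)

# Hypothetical HTML string standing in for the value returned by
# Runtime$evaluate(); the id is invented for illustration.
html_string <- '<html><body><div id="quote_hcontainer">42</div></body></html>'

# Selecting on the raw character string fails: html_nodes() has no method
# for character vectors.
res <- try(html_nodes(html_string, "[id$='_hcontainer']"), silent = TRUE)
inherits(res, "try-error")

# Parsing the string first gives a document that rvest can query.
doc <- read_html(html_string)
nodes <- html_nodes(doc, "[id$='_hcontainer']")
html_text(nodes)
```

If the string is parsed first and the selector still finds nothing, the content may have been added to the page after the load event fired, which would need a different recipe.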
To clarify my thoughts: putting these recipes in crrri directly does not feel like the best option if the package is to stay centered on the Chrome Remote Interface.
We had the idea of creating a separate package that would contain recipes like dumpDOM(), but we have not yet found the time to start it.
Thanks, this is great. Is it possible to do this without saving to the HTML file "test.html"?