Comments (10)
@petermeissner It works, thank you! Though I do get warnings.
library(robotstxt)
packageVersion("robotstxt")
#> [1] '0.7.2'
# works, but warning
paths_allowed("https://www.google.com")
#> www.google.com
#> Warning in FUN(X[[i]], ...): partial argument match of 'x' to 'xp'
#> [1] TRUE
# also works, but also warning
paths_allowed("https://google.com")
#> google.com
#> Warning in FUN(X[[i]], ...): partial argument match of 'x' to 'xp'
#> [1] TRUE
Created on 2020-05-04 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os macOS Catalina 10.15.4
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_GB.UTF-8
#> ctype en_GB.UTF-8
#> tz Europe/London
#> date 2020-05-04
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.6 2020-04-05 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> curl 4.3 2019-12-02 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> future 1.17.0 2020-04-18 [1] CRAN (R 4.0.0)
#> future.apply 1.5.0 2020-04-17 [1] CRAN (R 4.0.0)
#> globals 0.12.5 2019-12-07 [1] CRAN (R 4.0.0)
#> glue 1.4.0 2020-04-03 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0.9003 2020-05-01 [1] Github (rstudio/htmltools@984b39c)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.7 2020-04-25 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> robotstxt * 0.7.2 2020-05-04 [1] Github (ropensci/robotstxt@891f1d4)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> spiderbar 0.2.2 2019-08-19 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> triebeard 0.3.0 2016-08-04 [1] CRAN (R 4.0.0)
#> urltools 1.7.3 2019-04-14 [1] CRAN (R 4.0.0)
#> usethis 1.6.1.9000 2020-05-01 [1] Github (r-lib/usethis@4487260)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.13 2020-04-13 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
from robotstxt.
fixed
from robotstxt.
Thanks for reporting.
I have not checked what is going on in any depth, but I tried the PR you linked and it makes the error vanish.
Still, this needs some further investigation.
from robotstxt.
So, it's {httr} not being very forthcoming ...
- using www.google.com will break since the protocol (http or https) is not specified
- using google.com will not break despite the protocol not being specified
... I would expect both to work or neither.
Checking the linked pull request and its comments, I expect the issue to be resolved within httr, so I will make no changes in robotstxt.
from robotstxt.
There will be no httr release in the near future ... so I am working on a workaround within robotstxt.
from robotstxt.
Thank you!
from robotstxt.
@mine-cetinkaya-rundel could you please check if the latest dev version works for you?
from robotstxt.
That's weird, I cannot reproduce the warning on my Windows machine - neither with R 3.6.2 nor with R 4.0.0 (fresh install). Can you spot a difference, do you have an idea? Do you know where this comes from?
library(robotstxt)
packageVersion("robotstxt")
#> [1] '0.7.2'
paths_allowed("https://www.google.com")
#> www.google.com
#> [1] TRUE
paths_allowed("https://google.com")
#> google.com
#> [1] TRUE
Created on 2020-05-07 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2020-05-07
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.6 2020-04-05 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> codetools 0.2-16 2018-12-24 [2] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> curl 4.3 2019-12-02 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> future 1.17.0 2020-04-18 [1] CRAN (R 4.0.0)
#> future.apply 1.5.0 2020-04-17 [1] CRAN (R 4.0.0)
#> globals 0.12.5 2019-12-07 [1] CRAN (R 4.0.0)
#> glue 1.4.0 2020-04-03 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.2 2020-02-13 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> robotstxt * 0.7.2 2020-05-07 [1] local
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> spiderbar 0.2.2 2019-08-19 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> triebeard 0.3.0 2016-08-04 [1] CRAN (R 4.0.0)
#> urltools 1.7.3 2019-04-14 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.13 2020-04-13 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/peter/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0/library
from robotstxt.
I have the following in my .Rprofile:
options(
warn = 1,
warnPartialMatchArgs = TRUE,
warnPartialMatchDollar = TRUE,
warnPartialMatchAttr = TRUE
)
If you have warnPartialMatchArgs set to NULL, this might be the reason for the difference.
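For illustration, here is a minimal sketch of how the warnPartialMatchArgs option produces this kind of warning. The function f and its argument xp are hypothetical stand-ins chosen to mirror the message above; they are not taken from the robotstxt or spiderbar sources.

```r
# Hypothetical function whose only formal argument is named `xp`.
f <- function(xp) xp

# Enable partial-match warnings, keeping the previous settings for later.
old <- options(warnPartialMatchArgs = TRUE)

# Passing `x = ` partially matches `xp`; capture the warning text
# instead of letting it print.
msg <- tryCatch(f(x = 1), warning = function(w) conditionMessage(w))
print(msg)

# Restore the previous option values.
options(old)
```

With the option unset or FALSE, the same call returns 1 silently, which would explain why the warning shows up on one machine and not on another.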
from robotstxt.
"thanks, package spiderbar_0.2.3.tar.gz is on its way to CRAN." 🎉
from robotstxt.