databrickslabs / databricks-sdk-r
Databricks SDK for R (Experimental)
Home Page: https://databrickslabs.github.io/databricks-sdk-r/
License: Apache License 2.0
I'm pretty certain it's not necessary.
I'm happy to do a PR for this if you show me how to run the code generator.
It looks like the code generator is converting headings to:
#' **Release status**
Instead of
## Release status
Additionally, in statement_execution.R, it's generating this:
#' ----
#'
#' ### **Warning: We recommend you protect the URLs in the EXTERNAL_LINKS.**
#'
#' When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is
#' generated, which the client can use to download the result chunk directly
#' from cloud storage. As the short-lived credential is embedded in a pre-signed
#' URL, this URL should be protected.
#'
#' Since pre-signed URLs are generated with embedded temporary credentials, you
#' need to remove the authorization header from the fetch requests.
#'
#' ----
#'
and the sequences of dashes are causing this warning when documenting:
Warning: [statement_execution.R:12] @details markdown translation failed
✖ Internal error: unknown xml node thematic_break
ℹ Please file an issue at https://github.com/r-lib/roxygen2/issues
If you can tell me what you're trying to achieve here, I can suggest how you might best express that in R's documentation.
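One way to express that warning block in roxygen without a thematic break is roxygen2's `@section` tag, which renders as a proper Rd section instead of relying on `----` separators. This is a sketch; only the wording is taken from the generated file:

```r
#' @section Warning:
#' We recommend you protect the URLs in the EXTERNAL_LINKS.
#'
#' When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is
#' generated, which the client can use to download the result chunk directly
#' from cloud storage. As the short-lived credential is embedded in a
#' pre-signed URL, this URL should be protected.
#'
#' Since pre-signed URLs are generated with embedded temporary credentials,
#' you need to remove the authorization header from the fetch requests.
```

Top-level headings like `**Release status**` could similarly become `@section Release status:` rather than bold text.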
This is really easy to do, so I think it's worth it. You can get a basic website up and running by calling usethis::use_pkgdown_github_pages().
Hi Team,
I need to build a Shiny app where the user is authenticated against Databricks before fetching data. I read that authentication is not yet supported by this repo. Are you planning to add it, or is there an alternative way to implement this in R? Any feedback is appreciated.
There is a common syntax error across multiple functions:
try(list(, 1))
#> Error in list(, 1) : argument 1 is empty
try(list(, a = 1))
#> Error in list(, a = 1) : argument 1 is empty
Created on 2024-03-22 with reprex v2.1.0
A few examples:
Lines 17 to 20 in 5e37300
Lines 65 to 69 in 5e37300
Currently, these can be found in this search:
https://github.com/search?q=repo%3Adatabrickslabs%2Fdatabricks-sdk-r%20%22list(%2C%20%22&type=code
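The fix is mechanical: an empty leading argument in a `list()` call is rejected at evaluation time, so the generator just needs to stop emitting the dangling comma. A minimal demonstration:

```r
# The generated code contains calls of the form list(, a = 1).
# R parses this, but evaluation fails with "argument 1 is empty".
bad <- try(list(, a = 1), silent = TRUE)
inherits(bad, "try-error")
#> [1] TRUE

# Corrected: simply omit the empty slot.
fixed <- list(a = 1)
fixed$a
#> [1] 1
```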
If I want to get the host URL for a workspace, I have to do some error-prone string manipulation, for example (see the lines between START HERE and END HERE):
require(databricks)
client <- DatabricksClient()
response <- clustersCreate(
client = client,
cluster_name = "my-cluster",
spark_version = "12.2.x-scala2.12",
node_type_id = "i3.xlarge",
autotermination_minutes = 15,
num_workers = 1
)
# ##########
# START HERE
# ##########
# Get the workspace URL to be used in the following results message.
get_client_debug <- strsplit(client$debug_string(), split = "host=")
get_host <- strsplit(get_client_debug[[1]][2], split = ",")
host <- get_host[[1]][1]
# Make sure the workspace URL ends with a forward slash.
if (!endsWith(host, "/")) {
  host <- paste0(host, "/")
}
# ########
# END HERE
# ########
print(paste(
"View the cluster at ",
host,
"#setting/clusters/",
response$cluster_id,
"/configuration",
sep = "")
)
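The nested `strsplit()` calls above could at least be collapsed into a single regex. This is a hypothetical helper, and the `host=` key/value format of the debug string is an assumption taken from the snippet above:

```r
# Hypothetical helper: extract the workspace host from the client debug
# string with one regex instead of nested strsplit() calls.
host_from_debug <- function(debug_string) {
  # Capture everything between "host=" and the next comma (assumed format).
  host <- sub(".*host=([^,]+).*", "\\1", debug_string)
  # Make sure the workspace URL ends with a forward slash.
  if (!endsWith(host, "/")) host <- paste0(host, "/")
  host
}

host_from_debug("auth=pat, host=https://dbc-12345.cloud.databricks.com")
#> [1] "https://dbc-12345.cloud.databricks.com/"
```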
Ideally, I'd like to see something more along the lines of this (see the line that ends with # <-- DO THIS INSTEAD):
require(databricks)
client <- DatabricksClient()
response <- clustersCreate(
client = client,
cluster_name = "my-cluster",
spark_version = "12.2.x-scala2.12",
node_type_id = "i3.xlarge",
autotermination_minutes = 15,
num_workers = 1
)
print(paste(
"View the cluster at ",
client$host, # <-- DO THIS INSTEAD
"#setting/clusters/",
response$cluster_id,
"/configuration",
sep = "")
)
Also, similar to the other Databricks SDKs, I'd like to be able to get other client settings, such as:
client$account_id
client$auth_type
client$azure_* (multiple settings)
client$client_* (multiple settings)
client$config_file
client$debug_headers
client$debug_truncate_bytes
client$google_* (multiple settings)
client$host
client$http_timeout_seconds
client$password
client$profile
client$rate_limit
client$retry_timeout_seconds
client$token
client$username
There might be a few more that I missed.
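One way such accessors could work is to store the resolved configuration in the client object and dispatch `$` on it. This is a hypothetical sketch, not the actual databricks-sdk-r implementation; the constructor and field names here are assumptions modeled on the other SDKs:

```r
# Hypothetical client: resolved config kept in a named list, exposed
# read-only through a $ method on the class.
new_client <- function(host, token = NULL, auth_type = "pat") {
  cfg <- list(host = host, token = token, auth_type = auth_type)
  structure(list(config = cfg), class = "DatabricksClient")
}

`$.DatabricksClient` <- function(x, name) {
  # unclass() avoids re-dispatching this method recursively.
  unclass(x)$config[[name]]
}

cl <- new_client(host = "https://dbc-12345.cloud.databricks.com/")
cl$host
#> [1] "https://dbc-12345.cloud.databricks.com/"
```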
The jobsRunNow function throws the following error, even though the job run completes successfully:
jobsRunNow(job_id=1234, client=client)
Error : C stack usage 15941956 is too close to the limit
Error: C stack usage 15941956 is too close to the limit
Error: C stack usage 15941956 is too close to the limit
dplyr is a user-facing package, so it is fairly dependency-heavy in order to provide the whole toolkit to users. That makes it a poor dependency for an SDK, which wants to stay low-level. Fortunately, it should be pretty easy to swap dplyr for vctrs, its low-level equivalent, replacing dplyr::bind_rows() with vctrs::vec_rbind().
httr is mostly in maintenance mode, so I'd recommend using httr2 instead. It includes nice features for automatically retrying on failure, e.g. https://httr2.r-lib.org/reference/req_retry.html.
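A sketch of what a request could look like with httr2's retry support; the workspace URL and endpoint path here are illustrative, not taken from the SDK:

```r
library(httr2)

# Build a Databricks REST request that retries transient failures.
# req_retry() handles backoff and respects Retry-After headers.
req <- request("https://dbc-12345.cloud.databricks.com") |>
  req_url_path("/api/2.0/clusters/list") |>
  req_auth_bearer_token(Sys.getenv("DATABRICKS_TOKEN")) |>
  req_retry(max_tries = 3, backoff = ~ 2^.x)

# resp <- req_perform(req)  # performs the call with retries
```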
Looking at the following example from the readme, I'm guessing your native language has nested namespaces 😄
library(dplyr)
running <- databricks::clusters$list() %>% filter(state == 'RUNNING')
context <- databricks::command_execution$create(cluster_id=running$cluster_id, language='python')
res <- databricks::command_execution$execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')
You can make it a bit more idiomatic just by attaching databricks:
library(dplyr)
library(databricks)
running <- clusters$list() %>% filter(state == 'RUNNING')
context <- command_execution$create(cluster_id=running$cluster_id, language='python')
res <- command_execution$execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')
But I think it would be even more R-like if you eliminated the intermediate lists in favour of a naming convention (R is a bit more like C in this sense):
library(dplyr)
library(databricks)
running <- clusters_list() %>% filter(state == 'RUNNING')
context <- command_execution_create(cluster_id=running$cluster_id, language='python')
res <- command_execution_execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')
Fortunately it looks like this will be pretty easy to do since you're already generating these functions then wrapping them in a list 😄
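The flattening step could be a small loop over the generated lists. In this sketch, `command_execution` is a stand-in for one of the generated lists of closures, and `flatten_api` is a hypothetical helper, not part of the SDK:

```r
# Stand-in for a generated API list of closures.
command_execution <- list(
  create = function(cluster_id, language) list(id = "ctx-1"),
  execute = function(cluster_id, context_id, language, command) "ok"
)

# Bind each element as a flat prefix_name function in the caller's
# environment, e.g. command_execution$create -> command_execution_create.
flatten_api <- function(prefix, api, envir = parent.frame()) {
  for (nm in names(api)) {
    assign(paste0(prefix, "_", nm), api[[nm]], envir = envir)
  }
}

flatten_api("command_execution", command_execution)
command_execution_create(cluster_id = "c1", language = "python")$id
#> [1] "ctx-1"
```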
The default behavior of jobsRunNow is to block the console from running any commands. This is inconvenient if a user wants to kick off a remote job and then continue to use the console for other work.
In addition, the cli_reporter seems to be broken, as nothing is printed to the console during job runs.
Suggested change: have jobsRunNow not tie up the R console and simply return the active run ID.
Does the R SDK support using a DATABRICKS_HOST without https://?
When I try to connect to a Databricks instance without the https://, I get an error from curl::curl_fetch_memory(), but when I add the https:// I am able to connect and list clusters. Here are some example commands:
> Sys.setenv(DATABRICKS_HOST="dbc-12345.cloud.databricks.com")
> client <- DatabricksClient()
> clustersList(client)[, "cluster_name"]
Error in curl::curl_fetch_memory(url, handle = handle) :
Bad URL, colon is first character
> Sys.setenv(DATABRICKS_HOST="https://dbc-12345.cloud.databricks.com")
> client <- DatabricksClient()
> clustersList(client)[, "cluster_name"]
[1] "Cluster A"
If this is not currently supported, can support for it be added? A number of the other Databricks libraries support setting DATABRICKS_HOST both with and without https://.
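The normalisation the other SDKs do can be sketched in a few lines of base R; `normalize_host` is a hypothetical helper name:

```r
# Accept DATABRICKS_HOST with or without the scheme: prepend https://
# when missing and strip any trailing slashes.
normalize_host <- function(host) {
  if (!grepl("^https?://", host)) host <- paste0("https://", host)
  sub("/+$", "", host)
}

normalize_host("dbc-12345.cloud.databricks.com")
#> [1] "https://dbc-12345.cloud.databricks.com"
```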