
databricks-sdk-r's People

Contributors

atheriel, falaki, hadley, nfx, rafikurlansik


databricks-sdk-r's Issues

Drop `@include`

I'm pretty certain it's not necessary.

I'm happy to do a PR for this if you show me how to run the code generator.
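For context, `@include` only exists to populate the Collate field in DESCRIPTION, so a minimal sketch of the change would be (the file name is hypothetical):

# Before: a generated file carries a collation hint it probably doesn't need.
#' @include api_client.R
NULL

# After: delete the tag and rely on the default alphabetical collation;
# the Collate field in DESCRIPTION can then usually be dropped as well.
NULL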

Documentation headings

It looks like the code generator is converting headings to:

#' **Release status**

Instead of

## Release status

Additionally, in statement_execution.R it's generating this:

#' ----
#'
#' ### **Warning: We recommend you protect the URLs in the EXTERNAL_LINKS.**
#'
#' When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is
#' generated, which the client can use to download the result chunk directly
#' from cloud storage. As the short-lived credential is embedded in a pre-signed
#' URL, this URL should be protected.
#'
#' Since pre-signed URLs are generated with embedded temporary credentials, you
#' need to remove the authorization header from the fetch requests.
#'
#' ----
#'

and the sequences of dashes are causing this warning when documenting:

Warning: [statement_execution.R:12] @details markdown
translation failed
✖ Internal error: unknown xml node thematic_break
ℹ Please file an issue at
  https://github.com/r-lib/roxygen2/issues

If you can tell me what you're trying to achieve here, I can suggest how you might best express that in R's documentation.
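For what it's worth, a hedged sketch of what the generator could emit instead, assuming markdown support is enabled for the package: use a markdown heading for the warning and drop the `----` rules entirely, since roxygen2 has no equivalent of a thematic break (the exact wording below is illustrative):

#' ## Warning: protect the URLs in the EXTERNAL_LINKS
#'
#' When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is
#' generated, which the client can use to download the result chunk directly
#' from cloud storage. Because the temporary credential is embedded in the
#' pre-signed URL, protect the URL and remove the authorization header from
#' the fetch requests.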

Add package website

This is really easy to do, so I think it's worth it. You can get a basic website up and running by calling usethis::use_pkgdown_github_pages().
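A minimal sketch, assuming usethis and pkgdown are installed and the package lives in a GitHub repository:

# One-time setup: configures pkgdown, adds a GitHub Actions workflow,
# and enables GitHub Pages for the repository.
usethis::use_pkgdown_github_pages()

# Preview the site locally before pushing.
pkgdown::build_site()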

SSO from Databricks

Hi Team,

I need to build a Shiny app where the user is authenticated with Databricks before they fetch any data. I read that this kind of authentication is not yet supported by this repo. Are you planning to add it, or is there an alternative way to implement this in R? Any feedback is appreciated.
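In case it's useful in the meantime, a hedged workaround sketch that authenticates with a personal access token supplied to the Shiny process rather than per-user SSO (the MY_DATABRICKS_PAT variable is hypothetical; DATABRICKS_HOST and DATABRICKS_TOKEN follow the convention the other Databricks SDKs use):

library(databricks)

# Assumption: a service-level personal access token is provided via environment
# variables, so every app user shares the same Databricks identity.
Sys.setenv(
  DATABRICKS_HOST  = "https://dbc-12345.cloud.databricks.com",
  DATABRICKS_TOKEN = Sys.getenv("MY_DATABRICKS_PAT")  # hypothetical variable
)

client <- DatabricksClient()
clustersList(client)[, "cluster_name"]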

syntax errors `list(, `

There is a common syntax error across multiple functions:

try(list(, 1))
#> Error in list(, 1) : argument 1 is empty
try(list(, a = 1))
#> Error in list(, a = 1) : argument 1 is empty

Created on 2024-03-22 with reprex v2.1.0

A few examples:

grantsGet <- function(client, securable_type, full_name, principal = NULL) {
  query <- list(, principal = principal)
  client$do("GET", paste("/api/2.1/unity-catalog/permissions/", securable_type,
    "/", full_name, sep = ""), query = query)
}

lakeviewPublish <- function(client, dashboard_id, embed_credentials = NULL, warehouse_id = NULL) {
  body <- list(, embed_credentials = embed_credentials, warehouse_id = warehouse_id)
  client$do("POST", paste("/api/2.0/lakeview/dashboards/", dashboard_id, "/published",
    , sep = ""), body = body)
}

Currently, these can be found in this search:

https://github.com/search?q=repo%3Adatabrickslabs%2Fdatabricks-sdk-r%20%22list(%2C%20%22&type=code
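For comparison, this is presumably what the generator was meant to emit, with the spurious leading comma removed (a sketch of the intended output, not the fix in the generator itself):

grantsGet <- function(client, securable_type, full_name, principal = NULL) {
  # No leading comma: when `principal` is NULL it is simply a NULL list element.
  query <- list(principal = principal)
  client$do("GET", paste("/api/2.1/unity-catalog/permissions/", securable_type,
    "/", full_name, sep = ""), query = query)
}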

Getting client settings, such as host or account_id, is ugly

If I want to get the host URL for a workspace, I have to do some error-prone string manipulation, for example (see the lines between START HERE and END HERE):

require(databricks)

client <- DatabricksClient()

response <- clustersCreate(
  client = client,
  cluster_name = "my-cluster",
  spark_version = "12.2.x-scala2.12",
  node_type_id = "i3.xlarge",
  autotermination_minutes = 15,
  num_workers = 1
)

# ##########
# START HERE
# ##########
# Get the workspace URL to be used in the following results message.
get_client_debug <- strsplit(client$debug_string(), split = "host=")
get_host <- strsplit(get_client_debug[[1]][2], split = ",")
host <- get_host[[1]][1]

# Make sure the workspace URL ends with a forward slash.
if (!endsWith(host, "/")) {
  host <- paste(host, "/", sep = "")
}
# ########
# END HERE
# ########

print(paste(
  "View the cluster at ",
  host,
  "#setting/clusters/",
  response$cluster_id,
  "/configuration",
  sep = "")
)

Ideally, I'd like to see something more along the lines of this (see the line that ends with # <-- DO THIS INSTEAD):

require(databricks)

client <- DatabricksClient()

response <- clustersCreate(
  client = client,
  cluster_name = "my-cluster",
  spark_version = "12.2.x-scala2.12",
  node_type_id = "i3.xlarge",
  autotermination_minutes = 15,
  num_workers = 1
)

print(paste(
  "View the cluster at ",
  client$host, # <-- DO THIS INSTEAD
  "#setting/clusters/",
  response$cluster_id,
  "/configuration",
  sep = "")
)

Also, similar to the other Databricks SDKs, I'd like to be able to get other client settings, such as:

  • client$account_id
  • client$auth_type
  • client$azure_* (multiple settings)
  • client$client_* (multiple settings)
  • client$config_file
  • client$debug_headers
  • client$debug_truncate_bytes
  • client$google_* (multiple settings)
  • client$host
  • client$http_timeout_seconds
  • client$password
  • client$profile
  • client$rate_limit
  • client$retry_timeout_seconds
  • client$token
  • client$username

There might be a few more that I missed.
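Until such accessors exist, here is a hedged workaround sketch that parses the debug string into a named list, assuming it really is the comma-separated key=value format the code above splits apart by hand:

# Workaround (not part of the SDK): turn the client's debug string into a
# named list so settings can be read as settings$host, settings$auth_type, etc.
client_settings <- function(client) {
  fields <- strsplit(client$debug_string(), ",", fixed = TRUE)[[1]]
  parts <- strsplit(trimws(fields), "=", fixed = TRUE)
  stats::setNames(
    lapply(parts, function(p) paste(p[-1], collapse = "=")),  # value may itself contain "="
    vapply(parts, `[[`, character(1), 1)                      # key
  )
}

settings <- client_settings(client)
settings$host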

Bug: jobsRunNow fails with "C stack usage is too close to the limit" error

The jobsRunNow function throws the following error, even though the job run completes successfully:

jobsRunNow(job_id=1234, client=client)
Error : C stack usage  15941956 is too close to the limit
Error: C stack usage  15941956 is too close to the limit
Error: C stack usage  15941956 is too close to the limit

Switch out dplyr dependency for vctrs

dplyr is a user-facing package, so it is fairly dependency-heavy in exchange for giving users the whole toolkit. That makes it a poor fit as a dependency for an SDK, which wants to stay low-level. Fortunately, it should be pretty easy to swap dplyr out for vctrs, its low-level equivalent, replacing dplyr::bind_rows() with vctrs::vec_rbind().
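A hedged before-and-after sketch of the swap, with illustrative data frames:

a <- data.frame(cluster_id = "a", state = "RUNNING")
b <- data.frame(cluster_id = "b", state = "TERMINATED")

# Current: pulls in all of dplyr just for row-binding.
# dplyr::bind_rows(a, b)

# Proposed: the low-level vctrs equivalent, with a much smaller dependency footprint.
vctrs::vec_rbind(a, b)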

Embrace R's namespacing

Looking at the following example from the readme, I'm guessing your native language has nested namespaces 😄

library(dplyr)
running <- databricks::clusters$list() %>% filter(state == 'RUNNING')
context <- databricks::command_execution$create(cluster_id=running$cluster_id, language='python')
res <- databricks::command_execution$execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')

You can make it a bit more idiomatic just by attaching databricks:

library(dplyr)
library(databricks)

running <- clusters$list() %>% filter(state == 'RUNNING')
context <- command_execution$create(cluster_id=running$cluster_id, language='python')
res <- command_execution$execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')

But I think it would be even more R-like if you eliminated the intermediate lists in favour of a naming convention (R is a bit more like C in this sense):

library(dplyr)
library(databricks)

running <- clusters_list() %>% filter(state == 'RUNNING')
context <- command_execution_create(cluster_id=running$cluster_id, language='python')
res <- command_execution_execute(cluster_id=running$cluster_id, context_id=context$id, language='sql', command='show tables')

Fortunately it looks like this will be pretty easy to do since you're already generating these functions then wrapping them in a list 😄

jobsRunNow ties up console and doesn't print status

The default behavior of jobsRunNow is to block the console from running any further commands. This is inconvenient if a user wants to kick off a remote job and then continue to use their console for additional work.

In addition, the cli_reporter seems to be broken, as nothing is printed to the console during job runs.

Suggested changes (a rough sketch follows the list):

  1. Make the default behavior for jobsRunNow not tie up the R console and simply return the active run id.
  2. Print the URL to the current job run in the console for convenience.
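A rough sketch of what that could look like; the wait argument and the run URL pattern are hypothetical and only illustrate the suggestion:

# Hypothetical API: return immediately with the run id instead of polling.
run <- jobsRunNow(client, job_id = 1234, wait = FALSE)

# Print a link to the run for convenience (URL pattern assumed from the
# workspace UI; `host` obtained as in the other issues).
cat(sprintf("%s#job/%s/run/%s\n", host, "1234", run$run_id))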

support DATABRICKS_HOST without `https://`

Does the R SDK support using a DATABRICKS_HOST without https://?

When I try to connect to a Databricks instance without the https://, I get an error from curl::curl_fetch_memory(), but when I add the https:// I am able to connect and list clusters. Here are some example commands:

> Sys.setenv(DATABRICKS_HOST="dbc-12345.cloud.databricks.com")
> client <- DatabricksClient()
> clustersList(client)[, "cluster_name"]
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Bad URL, colon is first character
> Sys.setenv(DATABRICKS_HOST="https://dbc-12345.cloud.databricks.com")
> client <- DatabricksClient()
> clustersList(client)[, "cluster_name"]
[1] "Cluster A"                            

If this is not currently supported, can support for this be added? A number of the other Databricks libraries support setting the DATABRICKS_HOST with and without https://.
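Until then, a small workaround sketch that normalizes the variable before creating the client:

# Workaround: prepend the scheme if DATABRICKS_HOST was set without it.
host <- Sys.getenv("DATABRICKS_HOST")
if (!grepl("^https?://", host)) {
  Sys.setenv(DATABRICKS_HOST = paste0("https://", host))
}

client <- DatabricksClient()
clustersList(client)[, "cluster_name"]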
