CueLake's Introduction




With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse.

You write Spark SQL statements in Zeppelin notebooks. You then schedule these notebooks using workflows (DAGs).

To extract and load incremental data, you write simple select statements. CueLake executes these statements against your databases and then merges incremental data into your data lakehouse (powered by Apache Iceberg).
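
For illustration, here is a minimal sketch of that pattern in Spark SQL. The source, catalog, table, and column names are placeholders, not CueLake defaults; the MERGE INTO form is Iceberg's standard Spark syntax.

```sql
-- Hypothetical incremental extract: pull only rows changed since the last
-- run (the watermark column and value are placeholders).
CREATE OR REPLACE TEMPORARY VIEW orders_increment AS
SELECT * FROM source_db.orders
WHERE updated_at > TIMESTAMP '2021-06-01 00:00:00';

-- Merge the increment into the Iceberg table: update rows that already
-- exist, insert the rest.
MERGE INTO lakehouse.db.orders t
USING orders_increment s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```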

To transform data, you write SQL statements to create views and tables in your data lakehouse.
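
A transform step is just more Spark SQL. For example, a sketch with placeholder names:

```sql
-- A view over the Iceberg table loaded above.
CREATE OR REPLACE VIEW daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM lakehouse.db.orders
GROUP BY order_date;

-- Or materialize the same transform as an Iceberg table.
CREATE TABLE lakehouse.db.daily_revenue_tbl USING iceberg AS
SELECT order_date, SUM(amount) AS revenue
FROM lakehouse.db.orders
GROUP BY order_date;
```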

CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger Zeppelin notebooks. Zeppelin auto-starts and stops the Spark cluster for every scheduled run of notebooks.

To learn why we are building CueLake, read our viewpoint.


Getting started

CueLake is installed on Kubernetes using kubectl. Create a namespace and then install using the cuelake.yaml file. Creating a namespace is optional; you can install in the default namespace or in any existing namespace.

In the commands below, we use cuelake as the namespace.

kubectl create namespace cuelake
kubectl apply -f https://raw.githubusercontent.com/cuebook/cuelake/main/cuelake.yaml -n cuelake
kubectl port-forward services/lakehouse 8080:80 -n cuelake

Now visit http://localhost:8080 in your browser.

If you don’t want to use Kubernetes and instead want to try it out on your local machine first, we’ll soon have a docker-compose version. Let us know if you’d like that sooner.

Features

  • Upsert incremental data. CueLake uses Iceberg’s MERGE INTO query to automatically merge incremental data.
  • Create views in the data lakehouse. CueLake enables you to create views over Iceberg tables.
  • Create DAGs. Group notebooks into workflows and create DAGs of these workflows.
  • Elastically scale cloud infrastructure. CueLake uses Zeppelin to auto-create and delete the Kubernetes resources required to run data pipelines.
  • In-built scheduler to schedule your pipelines.
  • Automated maintenance of Iceberg tables. CueLake expires snapshots, removes old metadata and orphan files, and compacts data files (see the sketch after this list).
  • Monitoring. Get Slack alerts when a pipeline fails. CueLake maintains detailed logs.
  • Versioning in GitHub. Commit and maintain versions of your Zeppelin notebooks in GitHub.
  • Data security. Your data always stays within your cloud account.
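
These maintenance tasks map onto Iceberg's standard Spark procedures. A hedged sketch of what an automated maintenance run might execute (the catalog and table names are placeholders; the exact calls CueLake makes are not documented here):

```sql
-- Expire old snapshots and drop the metadata files they reference.
CALL lakehouse.system.expire_snapshots(table => 'db.orders', older_than => TIMESTAMP '2021-06-01 00:00:00');

-- Delete files in the table location that no snapshot references.
CALL lakehouse.system.remove_orphan_files(table => 'db.orders');

-- Compact small data files into fewer, larger ones.
CALL lakehouse.system.rewrite_data_files(table => 'db.orders');
```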

Current Limitations

  • Supports AWS S3 as a destination. Support for ADLS and GCS is on the roadmap.
  • Uses Apache Iceberg as the open table format. Delta support is on the roadmap.
  • Uses Celery for scheduling jobs. Support for Airflow is on the roadmap.

Support

For general help using CueLake, read the documentation or go to GitHub Discussions.

To report a bug or request a feature, open an issue.

Contributing

We'd love contributions to CueLake. Before you contribute, please first discuss the change you wish to make via an issue or a discussion. Contributors are expected to adhere to our code of conduct.

CueLake's People

Contributors

ankitkpandey, prabhu31, praveencuebook, sachinkbansal, vikrantcue, vincue


CueLake's Issues

Syntax error in interpreter.json

There is a syntax error (missing comma) on line 1275 in https://raw.githubusercontent.com/cuebook/cuelake/main/zeppelinConf/interpreter.json

Also, there is a \t on lines 1271 and 1272 that I suspect is incorrect.

And finally, if you use less to view the content, the C in the word Comma on line 201 is displayed as a multi-byte character (it is a Cyrillic С rather than a Latin C, as the diff below shows).

Below is a diff of the changes that I made to the file.

201c201
<           "description": "Сomma separated schema (schema \u003d catalog \u003d database) filters to get metadata for completions. Supports \u0027%\u0027 symbol is equivalent to any set of characters. (ex. prod_v_%,public%,info)"
---
>           "description": "Comma separated schema (schema \u003d catalog \u003d database) filters to get metadata for completions. Supports \u0027%\u0027 symbol is equivalent to any set of characters. (ex. prod_v_%,public%,info)"
1271,1272c1271,1272
<         "spark.executor.extraJavaOptions\t": {
<           "name": "spark.executor.extraJavaOptions\t",
---
>         "spark.executor.extraJavaOptions": {
>           "name": "spark.executor.extraJavaOptions",
1275c1275
<         }
---
>         },

Fix hive metastore issues

Test the following scenarios on the hive metastore for Iceberg, Delta, and Parquet tables:

  • Data should be created in the warehouse directory given as an env variable
  • When a table is dropped, its data should be deleted

Test and fix the behaviour of the hive metastore on both S3 and GCS; a minimal smoke test is sketched below.

Use the latest versions of the Iceberg and Delta jars, and upgrade the Spark version if required.
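
A Spark SQL smoke test for both scenarios on an Iceberg table might look like this sketch; the catalog, database, and table names are placeholders, not anything CueLake defines.

```sql
-- On CREATE/INSERT, data files should appear under the warehouse
-- directory configured through the env variable.
CREATE TABLE hive_catalog.test_db.metastore_check (id BIGINT, val STRING)
USING iceberg;
INSERT INTO hive_catalog.test_db.metastore_check VALUES (1, 'x');

-- On DROP, the table's data files should be deleted as well
-- (the behaviour under test).
DROP TABLE hive_catalog.test_db.metastore_check;
```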

Workspaces v1

Ask for the following info while creating a workspace:

  • Name & Description
  • Storage (S3, GCS, AZFS, PV)
  • Storage credentials if required
  • Inactivity Timeout to shut down resources
  • Spark and Interpreter docker images (show CueLake's default values and a link for creating custom images)

Can we use MinIO as S3-compatible storage for Apache Iceberg?

Is your feature request related to a problem? Please describe.
Can we use MinIO as S3-compatible storage for Apache Iceberg?

Describe the solution you'd like
Can we use MinIO as S3-compatible storage for Apache Iceberg?

Describe alternatives you've considered
If we can use MinIO, we need the steps to configure MinIO with CueLake.

Additional context
Can we use MinIO as S3-compatible storage for Apache Iceberg?

Improve logs UI

Currently, logs are just JSON dumps. Copy the parser code from Zeppelin and implement it in CueLake so that the logs look the same as they do in Zeppelin.

Support for Jupyter Notebook

Is your feature request related to a problem? Please describe.
Your current system supports Zeppelin notebooks. We have a lot of notebooks designed with Jupyter, and tons of tooling built around them. It's a tremendous effort to shift these. Requesting support for Jupyter notebooks besides Zeppelin.

Describe the solution you'd like
Ability to run Jupyter notebooks.

Describe alternatives you've considered
Tools to convert from Jupyter to Zeppelin, but that's a lot of work internally.

Dashboard V1

The dashboard will show all the workspaces and their resources.

CueLake will start with 0 workspaces.

Users can add a workspace from the dashboard.

For each workspace, the following info will be shown:

  • Resources currently running for the workspace (Zeppelin server, all interpreters)
  • Restart button for the Zeppelin server
  • Name & Description of the workspace

The default RBAC role is missing pods as a resource

Describe the bug
The default RBAC role is missing pods as a resource, which causes exceptions in the lakehouse service, as shown below.

```
127.0.0.1 - - [27/May/2021:06:14:14 +0000] "GET /api/genie/notebooks/0 HTTP/1.1" 200 68 "http://127.0.0.1:8080/notebooks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
Internal Server Error: /api/genie/driverAndExecutorStatus/
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/code/genie/views.py", line 243, in get
    res = KubernetesServices.getDriversCount()
  File "/code/genie/services/services.py", line 657, in getDriversCount
    ret = v1.list_namespaced_pod(POD_NAMESPACE, watch=False)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 15302, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 15427, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '96c45951-281d-41d5-908d-b6429974a4dd', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 27 May 2021 06:14:14 GMT', 'Content-Length': '282'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:cuelake:default\" cannot list resource \"pods\" in API group \"\" in the namespace \"cuelake\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
```

***Workaround***

A workaround is to add "pods" as a resource in the default-role in cuelake.yaml.

```
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-role
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps"]
  verbs: ["create", "get", "update", "patch", "list", "delete", "watch"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["bind", "create", "get", "update", "patch", "list", "delete", "watch"]
```

Rename models

Some model names are not apt. Change the following model names:
RunStatus -> NotebookRunLogs
WorkflowRuns -> WorkflowRunLogs
