
nuclio-jupyter's Introduction


nuclio

Nuclio - "Serverless" framework for Real-Time Events and Data Processing


Overview

Nuclio is a high-performance "serverless" framework focused on data, I/O, and compute intensive workloads. It is well integrated with popular data science tools, such as Jupyter and Kubeflow; supports a variety of data and streaming sources; and supports execution over CPUs and GPUs. The Nuclio project began in 2017 and is constantly and rapidly evolving; many start-ups and enterprises are now using Nuclio in production.

You can use Nuclio as a standalone Docker container or on top of an existing Kubernetes cluster; see the deployment instructions in the Nuclio documentation. You can also use Nuclio through a fully managed application service (in the cloud or on-prem) in the Iguazio Data Science Platform, which you can try for free.

If you wish to create and manage Nuclio functions through code - for example, from Jupyter Notebook - see the Nuclio Jupyter project, which features a Python package and SDK for creating and deploying Nuclio functions from Jupyter Notebook. Nuclio is also an integral part of the new open-source MLRun library for data science automation and tracking and of the open-source Kubeflow Pipelines framework for building and deploying portable, scalable ML workflows.

Nuclio is extremely fast: a single function instance can process hundreds of thousands of HTTP requests or data records per second. This is 10-100 times faster than some other frameworks. To learn more about how Nuclio works, see the Nuclio architecture documentation, read this review of Nuclio vs. AWS Lambda, or watch the Nuclio serverless and AI webinar. You can find links to additional articles and tutorials on the Nuclio web site.

Nuclio is secure: Nuclio is integrated with Kaniko to allow a secure and production-ready way of building Docker images at run time.

For further questions and support, click to join the Nuclio Slack workspace.

Why another "serverless" project?

None of the existing cloud and open-source serverless solutions addressed all the desired capabilities of a serverless framework:

  • Real-time processing with minimal CPU/GPU and I/O overhead and maximum parallelism
  • Native integration with a large variety of data sources, triggers, processing models, and ML frameworks
  • Stateful functions with data-path acceleration
  • Portability across low-power devices, laptops, edge and on-prem clusters, and public clouds
  • Open-source but designed for the enterprise (including logging, monitoring, security, and usability)

Nuclio was created to fulfill these requirements. It was intentionally designed as an extendable open-source framework, using a modular and layered approach that supports constant addition of triggers and runtimes, with the hope that many will join the effort of developing new modules, developer tools, and platforms for Nuclio.

Quick-start steps

The simplest way to explore Nuclio is to run its graphical user interface (GUI), the Nuclio dashboard. All you need to run the dashboard is Docker:

docker run -p 8070:8070 -v /var/run/docker.sock:/var/run/docker.sock --name nuclio-dashboard quay.io/nuclio/dashboard:stable-amd64

[Screenshot: the Nuclio dashboard]

Browse to http://localhost:8070, create a project, and add a function. When run outside of an orchestration platform (for example, Kubernetes), the dashboard will simply deploy to the local Docker daemon.

Assuming you are running Nuclio with Docker, create a project and deploy the pre-existing template "dates (nodejs)" as an example. With docker ps, you should see that the function was deployed in its own container. You can then invoke your function with curl (check that the port number is correct by using docker ps or the Nuclio dashboard):

curl -X POST -H "Content-Type: application/text" -d '{"value":2,"unit":"hours"}' http://localhost:37975

For a complete step-by-step guide to using Nuclio over Kubernetes, either with the dashboard UI or the Nuclio command-line interface (nuctl), explore the learning pathways in the Nuclio documentation.

How it works

"When this happens, do that". Nuclio tries to abstract away all the scaffolding around taking an event that occurred (e.g. a record was written into Kafka, an HTTP request was made, a timer expired) and passing this information to a piece of code for processing. To do this, Nuclio expects the users to provide (at the very least) information about what can trigger an event and the code to run when such an event happens. Users provide this information to Nuclio either via the command line utility (nuctl), a REST API or visually through a web application.

[Diagram: Nuclio architecture]

Nuclio takes this information (namely, the function handler and the function configuration) and sends it to a builder. This builder will craft the function's container image holding the user's handler and a piece of software that can execute this handler whenever events are received (more on that in a bit). The builder will then "publish" this container image by pushing it to a container registry.

Once published, the function container image can be deployed. The deployer crafts orchestrator-specific configuration from the function's configuration. For example, when deploying to Kubernetes, the deployer takes configuration parameters such as the number of replicas, auto-scaling timing parameters, and how many GPUs the function requests, and converts them to Kubernetes resource configuration (e.g., Deployment, Service, Ingress).
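To make the translation step concrete, here is a minimal sketch of the kind of mapping the deployer performs. This is illustrative only and not Nuclio's actual deployer code (which is written in Go); the config keys and helper name are hypothetical:

```python
# Illustrative sketch only -- not Nuclio's actual deployer code.
# Shows the kind of translation performed from a function
# configuration to a minimal Kubernetes Deployment manifest.

def function_config_to_deployment(name, config):
    """Translate a (hypothetical) function config dict into a
    minimal Kubernetes Deployment manifest."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"nuclio-{name}"},
        "spec": {
            "replicas": config.get("minReplicas", 1),
            "template": {
                "spec": {
                    "containers": [{
                        "name": "processor",
                        "image": config["image"],
                        "resources": {
                            "limits": {
                                # GPU requests become resource limits
                                "nvidia.com/gpu": config.get("gpus", 0),
                            }
                        },
                    }]
                }
            },
        },
    }

deployment = function_config_to_deployment(
    "dates", {"image": "nuclio/processor-dates:latest", "minReplicas": 2})
print(deployment["spec"]["replicas"])  # 2
```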

Note: The deployer does not create Kubernetes-native resources directly; rather, it creates a "NuclioFunction" custom resource (CRD). A Nuclio service called the "controller" listens for changes on the NuclioFunction CRD and creates/modifies/destroys the applicable Kubernetes-native resources (Deployment, Service, etc.). This follows the standard Kubernetes operator pattern.

The orchestrator will then spin up containers from the published container images and execute them, providing them the function configuration. The entrypoint of these containers is the "processor", responsible for reading the configuration, listening to event triggers (e.g. connecting to Kafka, listening for HTTP), reading events when they happen and calling the user's handler. The processor is responsible for many, many other things including handling metrics, marshaling responses, gracefully handling crashes, etc.
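The core of the processor's job can be sketched as an event-dispatch loop. The snippet below is a deliberately simplified illustration (the real processor is written in Go and handles metrics, crashes, response marshaling, and much more); the class and function names are invented for the example:

```python
# Minimal, illustrative sketch of a processor-style dispatch loop.
# The real Nuclio processor is written in Go and does far more.

class Event:
    """Stand-in for a trigger event (e.g. an HTTP request or Kafka record)."""
    def __init__(self, path, body):
        self.path = path
        self.body = body

def run_processor(handler, event_source):
    """Read events from a trigger-like source and call the user's handler."""
    responses = []
    for event in event_source:
        responses.append(handler(event))
    return responses

# A trivial handler and a fake trigger feeding two events:
events = [Event("/a", b"1"), Event("/b", b"2")]
results = run_processor(lambda e: f"handled {e.path}", events)
print(results)  # ['handled /a', 'handled /b']
```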

Scaling to Zero

Once built and deployed to an orchestrator like Kubernetes, Nuclio functions (namely, processors) can process events, scale up and down based on performance metrics, ship logs and metrics - all without the help of any external entity. Once deployed, you can terminate the Nuclio Dashboard and Controller services and Nuclio functions will still run and scale perfectly.

However, scaling to zero is not something they can do on their own. Rather - once scaled to zero, a Nuclio function cannot scale itself up when a new event arrives. For this purpose, Nuclio has a "Scaler" service. This handles all matters of scaling to zero and, more importantly, from zero.

Function examples

The following sample function implementations use the Event and Context interfaces to handle inputs and logs, returning a structured HTTP response (it's also possible to return a simple string).

In Go

package handler

import (
    "github.com/nuclio/nuclio-sdk-go"
)

func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
    context.Logger.Info("Request received: %s", event.GetPath())

    return nuclio.Response{
        StatusCode:  200,
        ContentType: "application/text",
        Body: []byte("Response from handler"),
    }, nil
}

In Python

def handler(context, event):
    response_body = f'Got {event.method} to {event.path} with "{event.body}"'

    # log with debug severity
    context.logger.debug('This is a debug level message')

    # just return a response instance
    return context.Response(body=response_body,
                            headers=None,
                            content_type='text/plain',
                            status_code=201)
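When developing handlers like the one above, it can help to exercise them locally with stand-in Context and Event objects before deploying. The sketch below is illustrative only; the mock classes are invented for the example and do not reproduce the real Nuclio SDK types:

```python
import logging
from types import SimpleNamespace

# Stand-ins for the real Nuclio Context/Event (illustrative only).
class MockResponse:
    def __init__(self, body=None, headers=None, content_type='text/plain',
                 status_code=200):
        self.body = body
        self.headers = headers
        self.content_type = content_type
        self.status_code = status_code

mock_context = SimpleNamespace(logger=logging.getLogger('test'),
                               Response=MockResponse)
mock_event = SimpleNamespace(method='POST', path='/', body=b'hello')

def handler(context, event):
    response_body = f'Got {event.method} to {event.path} with "{event.body}"'
    context.logger.debug('This is a debug level message')
    return context.Response(body=response_body,
                            content_type='text/plain',
                            status_code=201)

# Invoke the handler directly, no deployment needed:
resp = handler(mock_context, mock_event)
print(resp.status_code)  # 201
```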

More examples can be found on the Examples page.

Further reading

For support and additional product information, join the active Nuclio Slack workspace.

nuclio-jupyter's People

Contributors

alonmr, bcrant, dependabot[bot], dinal, eliyahu77, greydoubt, hedingber, liranbg, mckim27, omesser, quaark, rokatyy, sahare92, sharon-iguazio, tankilevitch, tebeka, tomershor, yaelgen, yanburman, yaronha, zilbermanor

nuclio-jupyter's Issues

Embed code in YAML configuration

By default we'd like to have the Python code embedded in the YAML file.
If source is something else (git repo ...) - don't embed.

%nuclio build / deploy: Notebook HTTP 404 not found on JupyterHub

Hello,

The build and deploy functions are throwing HTTP 404 errors on JupyterHub. The error is thrown by the notebook_file_name method when attempting to parse the notebook file URL from the IPython server.

Thanks to Yaron's assistance, a temporary workaround was provided: using the filename as the parameter, %nuclio build <filename>.ipynb <flags>.

def notebook_file_name(ikernel):
    """Return the full path of the jupyter notebook."""
    # Check that we're running under notebook
    if not (ikernel and ikernel.config['IPKernelApp']):
        return

    kernel_id = re.search('kernel-(.*).json',
                          ipykernel.connect.get_connection_file()).group(1)
    servers = list_running_servers()
    for srv in servers:
        query = {'token': srv.get('token', '')}
        url = urljoin(srv['url'], 'api/sessions') + '?' + urlencode(query)
        for session in json.load(urlopen(url)):
            ...

The following diagnostic snippet reproduces the failing request:

from notebook.notebookapp import list_running_servers
from urllib.parse import urljoin, urlencode
from urllib.request import urlopen

for srv in list_running_servers():
    query = {'token': srv.get('token', '')}
    url = urljoin(srv['url'], 'api/sessions') + '?' + urlencode(query)
    print(srv)
    print(url)
    print(urlopen(url).read())

returns:

{'base_url': '/user/dither/', 'hostname': '0.0.0.0', 'notebook_dir': '/home/jovyan', 'password': False, 'pid': 6, 'port': 8888, 'secure': False, 'token': '', 'url': 'http://0.0.0.0:8888/user/jdavis/'}

http://0.0.0.0:8888/user/dither/api/sessions?token=

HTTP 404 ERROR

For reference: the suffix /user/.. is from running Jupyter via JupyterHub. We currently use the helm deployment on Kubernetes.

Package versions:
Nuclio version: 0.7.3
JupyterHub: 0.9.4
JupyterLab version: 1.0.4
Jupyter Notebook: 6.0.0

Move away from travis

Travis is flaky and just laid off a lot of people. Move to CircleCI or maybe Jenkins.

%nuclio: error: cannot deploy

Hi there,

following https://www.kubeflow.org/docs/components/misc/nuclio/ I deployed Nuclio in my GKE cluster. The following errors are triggered when testing with the nlp example in my Kubeflow notebook:

%nuclio deploy -d <dashboard-external-ip> -n nlp -p default 


[14:08:55.311] (I) Deploying function 
[14:08:55.311] (I) Building 
[14:08:55.381] (I) Staging files and preparing base images 
[14:08:55.382] (I) Building processor image [imageName: "nuclio/processor-nlp:latest"]
[14:09:10.728] (W) Create function failed, setting function status [errorStack: "
Error - exit status 1
    .../nuclio/nuclio/pkg/cmdrunner/cmdrunner.go:131

Call stack:
stdout:
The push refers to repository [localhost:5000/nuclio/processor-nlp]

stderr:
Get http://localhost:5000/v2/: dial tcp [::1]:5000: connect: connection refused

    .../nuclio/nuclio/pkg/cmdrunner/cmdrunner.go:131
Failed to push image
    .../nuclio/nuclio/pkg/dockerclient/shell.go:157
Failed to push docker image into registry
    .../pkg/containerimagebuilderpusher/docker.go:55
Failed to build processor image
    .../nuclio/nuclio/pkg/processor/build/builder.go:242
"]

Moreover, deploying a function to the above dashboard with the nuctl CLI works well. Any ideas where I should start to address this issue? Thanks.

Specifying Ingress Rules in Spec for nuclio.deploy

Hi,
How do I specify Ingress rules in Python? I tried the following but please correct me.

spec_data = {
        "spec.triggers.http-trial1.class": "",
        "spec.triggers.http-trial1.kind": "http",
        "spec.triggers.http-trial1.maxWorkers": 2,
        "spec.triggers.http-trial1.workerAvailabilityTimeoutMilliseconds": 30,
        "spec.triggers.http-trial1.attributes.ingresses.0.paths": ['/python-trial-odin']
    }
import nuclio
spec = nuclio.ConfigSpec(config=spec_data)
nuclio.deploy_file(   "https://raw.githubusercontent.com/nuclio/nuclio/master/hack/examples/python/helloworld/helloworld.py", dashboard_url=dashboard_url, name=func_name, project=project_name, output_dir='/Users/rabraham/output',spec=spec)

The error I get is:

Traceback (most recent call last):
  File "/Users/rabraham/odin/nuclio_server.py", line 155, in <module>
    deploy_file_py(dashboard_url, func_name, project_name, y_d)
  File "/Users/rabraham/odin/nuclio_server.py", line 95, in deploy_file_py
    spec=spec)
  File "/Users/rabraham/odin/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 82, in deploy_file
    tag=tag, verbose=verbose, create_new=create_project)
  File "/Users/rabraham/odin/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 200, in deploy_config
    state, address = deploy_progress(api_address, name, verbose)
  File "/Users/rabraham/odin/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 247, in deploy_progress
    ip = get_address(api_address)
  File "/Users/rabraham/odin/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 259, in get_address
    raise OSError('nuclio API call failed')
OSError: nuclio API call failed

Process finished with exit code 1

How do I deploy a directory (or archive)?

Hi,
Thanks to @yaronha, I'm able to deploy a single file.
How do I deploy a directory or archive? My directory will contain the handler.py and other user modules.
I tried:

import nuclio
spec = nuclio.ConfigSpec(config=spec_data)
nuclio.deploy_file(
        "https://s3.amazonaws.com/fifteenrock-odin/nuclio.zip",
        dashboard_url=dashboard_url, 
        name='a-function10', 
       project=project_name,
        output_dir='./output',
        spec=spec,
        archive=True)

The error I get is

Traceback (most recent call last):
  File "/Users/myproject/nuclio_lib/nuclio_server.py", line 197, in <module>
    deploy_archive_py(dashboard_url, func_name, project_name, spec_data, "https://s3.amazonaws.com/fifteenrock-odin/nuclio.zip")
  File "/Users/myproject/nuclio_lib/nuclio_server.py", line 119, in deploy_archive_py
    archive=True)
  File "/Users/myproject/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 69, in deploy_file
    create_project=create_project)
  File "/Users/myproject/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 107, in deploy_zip
    tag=tag, verbose=verbose, create_new=create_project)
  File "/Users/myproject/venv/lib/python3.6/site-packages/nuclio/deploy.py", line 197, in deploy_config
    raise DeployError('failed {} {}'.format(verb, name))
nuclio.utils.DeployError: failed creating a-function10

Also, how can I deploy a local zip file? Going through the code, it seems like it needs a remote URL?
The S3 link is publicly accessible.
I added a debug statement and I got this:

{"error":"\nError - Timed out waiting for creation state to be set\nCall stack:\nTimed out waiting for creation state to be set\n"}

conda package

Create a conda package so users will be able to install nuclio-jupyter with conda.

Add %nuclio magic

We'd like to have %nuclio magic with the following commands. See RFC

  • env: Set environment variables; update env in function.yaml
  • env_file: Set environment variables from a YAML file; update env in function.yaml
  • cmd: Run a command; update build.Commands in function.yaml
  • export: Export the notebook to a handler as a zip file with code & function.yaml
  • deploy: Export, then deploy the function

ReadTheDocs Integration Update

Hi,

Got this mail from readthedocs:

Previously, manually configured webhooks from integrations did not have a secret attached to them. In order to improve security, we have deployed an update so that all new integrations will be created with a secret, and we are deprecating old integrations without a secret. You must migrate your integration by January 31, 2024, when they will stop working without a secret.

We are contacting you because you have at least one integration that does not have a secret set. These integrations are:

https://readthedocs.org/dashboard/nuclio-jupyter/integrations/43843/
...

If you aren't using an integration, you can delete it. Otherwise, we recommend clicking on "Resync webhook" to generate a new secret, and then update the secret in your provider's settings as well. You can check [our documentation](https://docs.readthedocs.io/en/stable/guides/setup/git-repo-manual.html) for more information on how to do this.

You can read more information about this in our blog post: [https://blog.readthedocs.com/](https://blog.readthedocs.com/security-update-on-incoming-webhooks/)

LMK if you need any help.

Add a SECURITY.md file

Description

Adding a SECURITY.md file to this repository would help document the security policies and guidelines for reporting vulnerabilities. This file will guide contributors and users on how to handle and report security issues responsibly.

Suggested Content

The SECURITY.md file could include the following sections:

  • Contact Information: How to reach the security team.
  • PGP Key: If applicable, provide a PGP key for encrypted communication.
  • Supported Versions: Which versions of the project are currently supported.
  • Reporting a Vulnerability: Step-by-step instructions on how to report a vulnerability.
  • Response Time: Expected time frame for addressing reported vulnerabilities.

Better example notebook

The example notebook has commands like ls, which confuse people; write a more realistic example.

Feature Request: Context logging should pass **kwargs to underlying _logger

According to the official Python 3 documentation, the main way to pass stack/traceback information is through keyword arguments. Right now, the only thing passed to the logger beyond the message is *args, which doesn't allow stack information to be passed (class referenced here). This should be as simple as adding **kwargs to each method. I'm willing to submit a pull request if you're interested in merging this.
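As a sketch of the proposed change, forwarding **kwargs lets callers pass exc_info/stack_info through to the underlying stdlib logger. The wrapper and handler classes below are hypothetical illustrations, not the actual nuclio-jupyter Logger:

```python
import logging

class ListHandler(logging.Handler):
    """Collects log records so we can inspect them."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

class Logger:
    """Hypothetical wrapper illustrating the proposed change:
    forward **kwargs so exc_info/stack_info reach the stdlib logger."""
    def __init__(self, name='nuclio-demo'):
        self._logger = logging.getLogger(name)

    def error(self, message, *args, **kwargs):
        self._logger.error(message, *args, **kwargs)

handler = ListHandler()
log = Logger()
log._logger.addHandler(handler)
log._logger.setLevel(logging.DEBUG)

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=True now reaches the underlying logger,
    # so the traceback is attached to the log record
    log.error('division failed', exc_info=True)

print(handler.records[0].exc_info is not None)  # True
```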

%nuclio config

Add option to set function configuration

%nuclio config spec.maxReplicas = 5

Or

%%nuclio config
spec.maxReplicas = 5
spec.runtime = "python2.7"
build.commands +=  "apk --update --no-cache add ca-certificates"

Feature Request: Build+Deploy from a local directory containing source code and config yaml

Requested feature

As a user, ideally I would be able to use the Python integration to build+deploy a Nuclio function by pointing toward a local directory with source code and a config YAML file (or a zip file containing those).

Currently there is similar functionality available for git or a remote zip archive:

Git python example

addr = nuclio.deploy_file('git://github.com/nuclio/nuclio#master:/hack/examples/python/helloworld',name='hw', project='myproj')

Remote Zip Jupyter example

%nuclio deploy https://myurl.com/projects/myfunc-v1.zip -n myfunc -p myproj

Current Functionality

I can build source code from a local file (e.g. .py file) like so:

my_code, _, _ = build_file('path/to/python_function.py')

I can build the config using the ConfigSpec object like so:

build_commands = ["pip install requests"]
spec = nuclio.ConfigSpec(cmd=build_commands).set_env('foo', 'bar')

And combining these two can deploy like so:

addr = deploy_code(my_code, spec=spec)

Desired Functionality

It would be great to be able to eliminate the need to use ConfigSpec to build a configuration and instead point to a local directory with a yaml file similar to the functionality for git or remote zip files.

# Build and deploy in one step
addr = deploy_code('path/to/code_and_config/', name='hw', project='myproj')
