Cedana

Build systems with real-time adaptiveness and elasticity baked in using Cedana.

Cedana-client serves as the client code for the larger Cedana system. It leverages CRIU to provide checkpoint and restore functionality for most Linux processes (including Docker containers).

Cedana can monitor, migrate, and automate checkpoints across a real-time network and compute configuration, enabling ephemeral, hardware-agnostic compute. See our website for more information about our managed product.

Some problems Cedana can help solve include:

  • Cold-starts for containers/processes
  • Keeping a process running independent of hardware/network failure
  • Managing multiprocess/multinode systems

Beyond the base checkpoint/restore functionality, you can get started with Cedana today by trying our CLI tool, which uses this system to arbitrage compute across clouds.

Build

go build

Usage

To use Cedana in a standalone context, you can checkpoint and restore processes directly with the cedana client. Running cedana bootstrap creates the configuration at ~/.cedana/cedana_config.json. Before running any other command, start the daemon, a simple gRPC server listening on port 8080:

sudo cedana daemon start 

All further commands interact with the daemon over RPC.

Launching Work

Cedana can checkpoint PIDs already running on the system, but you may run into issues with process groups, file descriptors, and network sockets. To bridge this gap and make jobs more migratable, launch processes through cedana exec instead. For example:

cedana exec 'python3 example.py' example_job 

where example_job is a job ID associated with your task. To list the tasks managed by Cedana, run:

cedana ps

which also shows any local or remote checkpoints associated with each job ID. There are additional arguments you can pass to exec (such as a file of environment variables to launch the process with), which you can explore with --help.

Checkpointing

To checkpoint a running job, you can run:

cedana client dump job JOBID -d DIR 

A successful dump creates a process_name_datetime.tar file in the directory specified with -d. Alternatively, you can omit the flag by specifying a folder to store checkpoints in within the config:

"shared_storage": {
  "dump_storage_dir": "/home/johnAdams/cedana_dumps/"
}

See the configuration section for more toggles.
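The choice between -d and the config setting can be sketched in Go as follows. This assumes the config file is JSON with a top-level shared_storage object shaped like the fragment above, and that an explicit -d takes precedence when both are present; the type and function names are hypothetical, and other keys in the file are ignored.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cedanaConfig models only the part of ~/.cedana/cedana_config.json shown
// above. This is an assumed shape; unknown keys are ignored when decoding.
type cedanaConfig struct {
	SharedStorage struct {
		DumpStorageDir string `json:"dump_storage_dir"`
	} `json:"shared_storage"`
}

// resolveDumpDir returns flagDir if set (assumed to take precedence),
// otherwise shared_storage.dump_storage_dir from the config JSON.
func resolveDumpDir(flagDir string, configJSON []byte) (string, error) {
	if flagDir != "" {
		return flagDir, nil
	}
	var cfg cedanaConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", fmt.Errorf("parsing config: %w", err)
	}
	if cfg.SharedStorage.DumpStorageDir == "" {
		return "", errors.New("no -d flag and no shared_storage.dump_storage_dir in config")
	}
	return cfg.SharedStorage.DumpStorageDir, nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	data, err := os.ReadFile(filepath.Join(home, ".cedana", "cedana_config.json"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading config:", err)
		os.Exit(1)
	}
	dir, err := resolveDumpDir("", data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("checkpoints will be written to", dir)
}
```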

Restoring

To restore a checkpointed job, run:

cedana client restore job JOBID

Cedana also currently supports checkpointing runc (and, by extension, Docker) and containerd containers, with support for more container runtimes planned. Note that container checkpointing is generally orchestrated externally, which makes the CLI options somewhat clunky.

Checkpointing these is as simple as adding the runtime name after the dump/restore command. For example, to checkpoint a containerd container:

sudo cedana dump containerd -i test -p test 

where -i is the imageRef and -p is the containerID.

For a Docker container (which generally wraps a runc runtime):

sudo cedana dump runc -i runcID -d DIRECTORY

where runcID is the ID of the runc container (separate from the ID the Docker daemon uses), which you can find with runc list. To restore, you'll need the container bundle, which you pass to restore with --bundle. You can make a copy from a running container with docker export CONTAINER_ID -o container_bundle.tar and then run:

sudo cedana restore --bundle container_bundle.tar -i new_runc_id -d DIRECTORY

Contributing

See CONTRIBUTING.md for guidelines.

Contributors

nravic, bsmithai, brandonsmith738, wfoy