
accelerated-container-image's Introduction



containerd is an industry-standard container runtime with an emphasis on simplicity, robustness, and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.

containerd is a member of CNCF with 'graduated' status.

containerd is designed to be embedded into a larger system, rather than being used directly by developers or end-users.

(architecture diagram)

Announcements

Now Recruiting

We are a large, inclusive OSS project that welcomes help of any kind, shape, or form:

  • Documentation help is needed to make the product easier to consume and extend.
  • We need OSS community outreach/organizing help to get the word out; manage and create messaging and educational content; and help with social media, community forums/groups, and Google Groups.
  • We are actively inviting new security advisors to join the team.
  • New subprojects are being created, core and non-core that could use additional development help.
  • Each of the containerd projects has a list of issues currently being worked on or that need help resolving.
    • If the issue has not already been assigned to someone or has not made recent progress, and you are interested, please inquire.
    • If you are interested in starting with a smaller/beginner-level issue, look for issues with an exp/beginner tag, for example containerd/containerd beginner issues.

Getting Started

See our documentation on containerd.io.

To get started contributing to containerd, see CONTRIBUTING.

If you are interested in trying out containerd, see our example at Getting Started.

Nightly builds

There are nightly builds available for download here. Binaries are generated from the main branch every night for Linux and Windows.

Please be aware: nightly builds might have critical bugs; they are not recommended for production use, and no support is provided.

Kubernetes (k8s) CI Dashboard Group

The k8s CI dashboard group for containerd contains test results regarding the health of Kubernetes when run against main and a number of containerd release branches.

Runtime Requirements

Runtime requirements for containerd are very minimal. Most interactions with the Linux and Windows container feature sets are handled via runc and/or OS-specific libraries (e.g. hcsshim for Microsoft). The current required version of runc is described in RUNC.md.

There are specific features used by containerd core code and snapshotters that will require a minimum kernel version on Linux. With the understood caveat of distro kernel versioning, a reasonable starting point for Linux is a minimum 4.x kernel version.

The overlay filesystem snapshotter, used by default, uses features that were finalized in the 4.x kernel series. If you choose to use btrfs, there may be more flexibility in kernel version (the minimum recommended is 3.18), but it will require the btrfs kernel module and btrfs tools to be installed on your Linux distribution.

To use Linux checkpoint and restore features, you will need criu installed on your system. See more details in Checkpoint and Restore.

Build requirements for developers are listed in BUILDING.

Supported Registries

Any registry which is compliant with the OCI Distribution Specification is supported by containerd.

For configuring registries, see the registry host configuration documentation.

Features

Client

containerd offers a full client package to help you integrate containerd into your platform.

import (
  "context"
  "log"

  containerd "github.com/containerd/containerd/v2/client"
  "github.com/containerd/containerd/v2/pkg/cio"
  "github.com/containerd/containerd/v2/pkg/namespaces"
)


func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
}

Namespaces

Namespaces allow multiple consumers to use the same containerd without conflicting with each other. They have the benefit of sharing content while maintaining separation between containers and images.

To set a namespace for requests to the API:

context := context.Background()
// create a context for docker
docker := namespaces.WithNamespace(context, "docker")

container, err := client.NewContainer(docker, "id")

To set a default namespace on the client:

client, err := containerd.New(address, containerd.WithDefaultNamespace("docker"))

Distribution

// pull an image
image, err := client.Pull(context, "docker.io/library/redis:latest")

// push an image
err = client.Push(context, "docker.io/library/redis:latest", image.Target())

Containers

In containerd, a container is a metadata object. Resources such as an OCI runtime specification, image, root filesystem, and other metadata can be attached to a container.

redis, err := client.NewContainer(context, "redis-master")
defer redis.Delete(context)

OCI Runtime Specification

containerd fully supports the OCI runtime specification for running containers. We have built-in functions to help you generate runtime specifications based on images as well as custom parameters.

When creating a container, you can specify options that modify the specification.

redis, err := client.NewContainer(context, "redis-master", containerd.WithNewSpec(oci.WithImageConfig(image)))

Root Filesystems

containerd allows you to use overlay or snapshot filesystems with your containers. It comes with built-in support for overlayfs and btrfs.

// pull an image and unpack it into the configured snapshotter
image, err := client.Pull(context, "docker.io/library/redis:latest", containerd.WithPullUnpack)

// allocate a new RW root filesystem for a container based on the image
redis, err := client.NewContainer(context, "redis-master",
	containerd.WithNewSnapshot("redis-rootfs", image),
	containerd.WithNewSpec(oci.WithImageConfig(image)),
)

// use a readonly filesystem with multiple containers
for i := 0; i < 10; i++ {
	id := fmt.Sprintf("id-%d", i)
	container, err := client.NewContainer(context, id,
		containerd.WithNewSnapshotView(id, image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
	)
}

Tasks

Taking a container object and turning it into a runnable process on a system is done by creating a new Task from the container. A task represents the runnable object within containerd.

// create a new task
task, err := redis.NewTask(context, cio.NewCreator(cio.WithStdio))
defer task.Delete(context)

// the task is now running and has a pid that can be used to setup networking
// or other runtime settings outside of containerd
pid := task.Pid()

// start the redis-server process inside the container
err = task.Start(context)

// wait for the task to exit and get the exit status
status, err := task.Wait(context)

Checkpoint and Restore

If you have criu installed on your machine you can checkpoint and restore containers and their tasks. This allows you to clone and/or live migrate containers to other machines.

// checkpoint the task then push it to a registry
checkpoint, err := task.Checkpoint(context)

err = client.Push(context, "myregistry/checkpoints/redis:master", checkpoint)

// on a new machine pull the checkpoint and restore the redis container
checkpoint, err := client.Pull(context, "myregistry/checkpoints/redis:master")

redis, err = client.NewContainer(context, "redis-master", containerd.WithNewSnapshot("redis-rootfs", checkpoint))
defer redis.Delete(context)

task, err = redis.NewTask(context, cio.NewCreator(cio.WithStdio), containerd.WithTaskCheckpoint(checkpoint))
defer task.Delete(context)

err = task.Start(context)

Snapshot Plugins

In addition to the built-in snapshot plugins in containerd, additional external plugins can be configured over gRPC. An external plugin is made available under the configured name and appears as a plugin alongside the built-in ones.

To add an external snapshot plugin, add the plugin to containerd's config file (by default at /etc/containerd/config.toml). The string following proxy_plugins. will be used as the name of the snapshotter, and the address should refer to a socket with a gRPC listener serving containerd's snapshot gRPC API. Remember to restart containerd for any configuration changes to take effect.

[proxy_plugins]
  [proxy_plugins.customsnapshot]
    type = "snapshot"
    address = "/var/run/mysnapshotter.sock"
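
After restarting containerd, you can sanity-check that the proxy plugin registered under the configured name (customsnapshot here); the expected output shape matches the ctr plugin ls output shown elsewhere on this page:

$ ctr plugin ls | grep customsnapshot
io.containerd.snapshotter.v1    customsnapshot           -              ok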

See PLUGINS.md for how to create plugins.

Releases and API Stability

Please see RELEASES.md for details on versioning and stability of containerd components.

Downloadable 64-bit Intel/AMD binaries of all official releases are available on our releases page.

For other architectures and distribution support, you will find that many Linux distributions package their own containerd and provide it across several architectures, such as Canonical's Ubuntu packaging.

Enabling command auto-completion

Starting with containerd 1.4, the urfave client feature for auto-creation of bash and zsh autocompletion data is enabled. To use the autocomplete feature in a bash shell, for example, source the autocomplete/ctr file in your .bashrc, or source it manually:

$ source ./contrib/autocomplete/ctr

Distribution of ctr autocomplete for bash and zsh

For bash, copy the contrib/autocomplete/ctr script into /etc/bash_completion.d/ and rename it to ctr. The zsh_autocomplete file is also available and can be used similarly for zsh users.
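
For example:

$ sudo cp contrib/autocomplete/ctr /etc/bash_completion.d/ctr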

If you don't place the autocomplete file in a location where it is automatically loaded by the user's shell environment, provide documentation telling users to source the file in their shell.

CRI

cri is a containerd plugin implementation of the Kubernetes container runtime interface (CRI). With it, you are able to use containerd as the container runtime for a Kubernetes cluster.

(cri plugin architecture diagram)

CRI Status

cri is a native plugin of containerd. Since containerd 1.1, the cri plugin is built into the release binaries and enabled by default.

The cri plugin has reached GA status.

See results on the containerd k8s test dashboard

Validating Your cri Setup

A Kubernetes incubator project, cri-tools, includes programs for exercising CRI implementations. More importantly, cri-tools includes the program critest which is used for running CRI Validation Testing.

CRI Guides

Communication

For async communication and long-running discussions please use issues and pull requests on the GitHub repo. This will be the best place to discuss design and implementation.

For sync communication catch us in the #containerd and #containerd-dev Slack channels on Cloud Native Computing Foundation's (CNCF) Slack - cloud-native.slack.com. Everyone is welcome to join and chat. Get Invite to CNCF Slack.

Security audit

Security audits for the containerd project are hosted on our website. Please see the security page at containerd.io for more information.

Reporting security issues

Please follow the instructions at containerd/project.

Licenses

The containerd codebase is released under the Apache 2.0 license. The README.md file and files in the "docs" folder are licensed under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.

Project details

containerd is the primary open source project within the broader containerd GitHub organization. However, all projects within the repo have common maintainership, governance, and contributing guidelines which are stored in a project repository commonly for all containerd projects.

Please find all of these core project documents in our containerd/project repository.

Adoption

Interested in seeing who is using containerd? Are you using containerd in a project? Please add yourself via pull request to our ADOPTERS.md file.

accelerated-container-image's People

Contributors

0xflotus, akihirosuda, alibaba-oss, apostasie, austinvazquez, beef9999, bigvan, coldwings, dependabot[bot], dmcgowan, estebanreyl, fuweid, hileqaq, hsiangkao, liulanzheng, northtyphoon, salvete, sazzy4o, taoting1234, testwill, waberzhuang, wheat2018, wxx213, yuchen0cc, zchee, zlseu-edu


accelerated-container-image's Issues

Unable to use rpull from container registry

I followed the instructions at https://github.com/alibaba/accelerated-container-image/blob/main/docs/EXAMPLES.md

However, I got this error:

➜  accelerated-container-image git:(main) sudo bin/ctr rpull staging-registry.yuri.moe/redis:6.2.1_obd && sudo ctr run --net-host --snapshotter=overlaybd --rm -t staging-registry.yuri.moe/redis:6.2.1_obd demo
staging-registry.yuri.moe/redis:6.2.1_obd:                                        resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:23d8acc1c468e678019c12784bac514b09908c0accc7bf2a56ae8fe7fea9e1d6: downloading    |--------------------------------------|    0.0 B/3.3 KiB 
elapsed: 0.2 s                                                                    total:   0.0 B (0.0 B/s)                                         
done
ctr: failed to attach and mount for snapshot 7: failed to enable target for /sys/kernel/config/target/core/user_999999999/dev_7, failed:failed to open remote file https://staging-registry.yuri.moe/v2/redis/blobs/sha256:5b8ddc4be300c03f643ace1d74a62a3614224569b7d2ef46d69f4a3e96fcb856: unknown


These are the commands I ran to get the OBD image and upload it to my own registry.

sudo ctr content fetch registry.hub.docker.com/library/redis:6.2.1
sudo bin/ctr obdconv registry.hub.docker.com/library/redis:6.2.1 localhost:5000/redis:6.2.1_obd
sudo ctr i push staging-registry.yuri.moe/redis:6.2.1_obd

I am able to lazy-pull and run using the image registry.hub.docker.com/overlaybd/redis:6.2.1_obd.

The registry is running and open to the public in case you would like to test it out. It is the latest image from https://hub.docker.com/_/registry if you want to set up your own registry.

Support Overlaybd image pre-warm with ctr pull

Current Overlaybd images are lazily loaded. In case we need to do some background image pre-warming or caching, it's a little tricky, because we need to start a dummy container with the accelerated image and maybe read all files, or set the downloadDelay to a small number and let the background download pull the full content.

Is it possible to support a mode where an overlaybd image can be pre-warmed with ctr pull? It would also be perfectly fine if it's not ctr but some other tool.

Thanks!

Test: Add Unit Tests

This is an issue open for the purpose of tracking unit test additions to the project.

push fails after using layer deduplication for obdconv

Steps:

  1. ./ctr obdconv --dbstr "username:password@tcp(db_host:port)/db_name" A B -> success
  2. ./ctr obdconv --dbstr "username:password@tcp(db_host:port)/db_name" A C -> success
  3. ./ctr image push B -> success
  4. ./ctr image push C -> failure
    report an error -> ctr: "" failed validation: invalid argument

manifest: (screenshot)

DADI image support for "No image conversion"

TL;DR: we'd love to ask about the feasibility of adding DADI blobs as part of the original OCI v1 image.

Use cases:

  1. Easier management of a single image for both accelerated and original
  2. More transparent onboarding, since it does not require workloads to change their deployment config; workloads could use the same image as OCI v1, and later the node could decide when to start switching to the accelerated format for a seamless user experience.
  3. Quick rollback: similar to 2, in case we encounter issues with the accelerated image, we can switch the containerd config snapshotter to overlayfs for a quick rollback that stays transparent to workloads.
  4. Easier security scanning and auditing

AFAIK, there was previously a gap in the image spec to achieve this. I'm wondering if DADI could leverage the new addition to the image spec for reference types: https://github.com/opencontainers/wg-reference-types

An example can be found here https://github.com/awslabs/soci-snapshotter#no-image-conversion

Just a random thought, please feel free to correct me:

  1. Pull the image; it is an OCI v1 image with DADI blobs as additional artifacts.
  2. Containerd calls overlaybd-snapshotter.Prepare.
  3. overlaybd-snapshotter checks the image: if it has DADI blobs, stream with overlaybd; otherwise fall back to overlayfs (see the sketch below).
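
A rough sketch of the check in step 3 (illustrative only; the helper is hypothetical, and the annotation key is the one visible in the converted manifests later on this page):

import ocispec "github.com/opencontainers/image-spec/specs-go/v1"

// hasDADIBlobs is a hypothetical helper: treat the image as DADI-capable if any
// layer carries the overlaybd blob annotation present in converted manifests.
func hasDADIBlobs(manifest ocispec.Manifest) bool {
	for _, layer := range manifest.Layers {
		if _, ok := layer.Annotations["containerd.io/snapshot/overlaybd/blob-digest"]; ok {
			return true
		}
	}
	return false
}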

fail to use 'record-trace'

When I tried to record the trace of an image (obd format), it failed:

sudo bin/ctr record-trace registry.hub.docker.com/overlaybd/redis:6.2.6_obd redis_trace

ctr: failed to setup network for namespace: plugin type="loopback" failed (add): failed to find plugin "loopback" in path [/opt/cni/bin/]

I followed the docs strictly to build the environment; I downloaded containerd-1.6.0-rc.1-linux-amd64.tar.gz from containerd.io.
All related information I could find on Google was about k8s. What should I do to resolve this problem?
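
For reference, this error usually means the standard CNI plugins are missing from /opt/cni/bin. A typical remedy (the release version below is just an example) is to install them from the containernetworking/plugins releases:

$ sudo mkdir -p /opt/cni/bin
$ curl -sSL https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz | sudo tar -xz -C /opt/cni/bin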

Add OCI Artifact Conversion and Push Support

Proposed Functionality Additions:

  • Add CLI support in the obdconv command for converting DADI OCI Image to a new DADI OCI Artifact
  • Add original unconverted image as a subject referrer in the Artifact
  • Push the Artifact to specified OCI Artifact compliant registry

Fast pull the full images in parallel without lazy loading

What is the version of your Accelerated Container Image

No response

What would you like to be added?

Overlaybd is great at accelerating container image pulling; we've enjoyed the benefits and appreciate all the support from the community!

Why is this needed for Accelerated Container Image?

Problems

The on-demand data transfer and trace-based prefetch are great tools; however, we see another gap that can be filled in between: fast prefetch of all blobs.

The following are the reasons:

  1. For some applications, lazy loading changes application behavior. One example is a K8s workload with startup/liveness/readiness probes: before lazy pulling, it starts up with no issue; after onboarding to lazy pull, it fails to start because the previous probe period is no longer long enough. This makes some applications hard to onboard to OverlayBD without config changes.
  2. It's not easy to observe the overall latency of image pull, as the time is attributed to application startup time. It also introduces a new failure mode: previously, we wouldn't run an application unless the image pull succeeded; with lazy pull, this could result in runtime IO hangs or other errors that are hard to debug (this is especially hard when different teams own the application and the container/image runtime infra).
  3. Downloading full blobs can also be fast; only decompression is slow, and OverlayBD image decompression is super fast. With high concurrency, we were able to saturate the VM bandwidth and download a multi-GB OverlayBD image in several seconds.

I am aware that trace-based prefetch would make this much better, but it can be costly to add trace recording to the CI/CD build system in a large-scale infra with many dependencies.

Therefore, I feel that if OverlayBD had a feature between lazy loading and trace-based prefetch (let's just call it Prefetch), it would be a perfect solution without requiring too steep a learning curve or too much courage to adopt (Problem 2 is a pretty big mindset shift that can slow down adoption).

Options

We propose some options here; please feel free to add more.

  • Option 1 (what we are trying now): an external_image_puller pulls blobs from the registry in parallel; this can be quite fast when the VM network is saturated. Afterwards, we put the blobs into the registry_cache directory.
    • Pros:
      • Relatively easy to implement; no OverlayBD-side changes required
      • Flexible for users (us) to tune performance
    • Cons:
      • Will it be thread safe, given that both overlaybd-tcmu and the external_image_puller might write to registry_cache? Will overlaybd-tcmu be able to detect new blob caches added by the external_image_puller?
      • Not an OverlayBD feature, so it cannot be reused by the community
      • Is there a good way to validate the integrity of the image?
  • Option 2: OverlayBD supports prefetch with parallelism. (OverlayBD already supports rpull --download-blobs to prefetch the full image. However, the performance is pretty slow because 1) it performs an unnecessary apply, which is part of the containerd pull-image library code, and 2) the blobs are pulled sequentially.)
  • To make it fast, rpull could also download blobs in chunks in parallel, and only return success once the image is fully downloaded.

Please feel free to contribute ideas as well. Again, we appreciate all the great work from the OverlayBD community. By contributing real-world use cases and requirements, hopefully we can also help drive OverlayBD adoption.

Thanks!

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.

Record and replay block traces at runtime

Current trace-based prefetch happens at build time, which requires running a dummy container using the image, adding an accelerated layer, and pushing a new image. This approach makes it difficult to use in the following ways:

  1. It changes the semantics of an image for an application. Now, instead of many applications using a single image, each application needs to use its own image. The application-level image is quite intrusive to the runtime; e.g., we need to ask different teams to update their workloads to use different images that are actually the same tar/gzip image, which seems unconventional in the container ecosystem.
  2. It is difficult to build an accurate trace at build time. In large organizations, the runtime environment can be quite complex (many dependencies on databases, cloud resources, internal services, different networking, firewalls); it's very hard and costly to reproduce such an environment at build time (which only has Docker etc., but no K8s or other dependencies).

My thinking is that the trace should be at the application level instead of the image level; it's better maintained by the workload/application owners rather than as part of an image. They decide when to record/replay. The interface could be:

  • To record:
  1. At runtime, the application owner puts a lock file in place or uses some other means (e.g. a new binary overlaybd-record {image}) to start recording; the input contains the trace filename.
  2. Overlaybd starts recording the traces for the image once it receives such a signal.
  3. Stopping the trace recording could be time- or signal-based; the output will be a trace file.
  4. The application owner collects and stores the trace file.
  • To replay:
  1. The application owner simply puts the trace file in a configured location or calls a binary overlaybd-replay {image}, etc.

I'm totally open to suggestions and discussion. I think trace-based prefetch is a super awesome feature and would love to adopt and contribute to make it even better and easier to adopt, thanks!

cc @lihuiba @liulanzheng @BigVan

Question: why does DADI add an extra layer to the original image?

For instance,

For the image registry.hub.docker.com/library/redis:6.2.1, which originally has 6 layers, conversion produced a new image with 7 layers.

Original manifest (6 layers):

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 7698,
    "digest": "sha256:de974760ddb2f32dbddb74b7bb8cff4c1eee06d43d36d11bbca1dc815173916e"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 27139373,
      "digest": "sha256:f7ec5a41d630a33a2d1db59b95d89d93de7ae5a619a3a8571b78457e48266eba"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1730,
      "digest": "sha256:a36224ca8bbdc65bae63504c0aefe0881e4ebdf1ec362830b347a2ab5120a8de"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1417921,
      "digest": "sha256:7630ad34dcb2a98e15daa425b6e076c06f9f810c7714803a3bdbcf750a546ea0"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 10097197,
      "digest": "sha256:dd0ea236b03be5127b0b7fb44f1d39f757c5381a88d650fae0a0d80b0c493020"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 133,
      "digest": "sha256:ed6ed4f2f5a63e3428ddd456805d71dc55f68b763ae8d5350f29b8cea46373f2"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 409,
      "digest": "sha256:8788804112c6c0d3e91725e26d741c9b8e4ad910e63136e129043220eb8d11d4"
    }
  ]
}

Here's the converted manifest (7 layers):

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:6db6dfe0a795fc32c94a60a1279ab73c0290b0d2ec5d11e6ab0f14cd86a0b3a6",
    "size": 3445
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:a8b5fca80efae55088290f3da8110d7742de55c2a378d5ab53226a483f390e21",
      "size": 4739584,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:a8b5fca80efae55088290f3da8110d7742de55c2a378d5ab53226a483f390e21",
        "containerd.io/snapshot/overlaybd/blob-size": "4739584"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:f29becf307f5f56a88da52737bd2edadf6e3c923f17c73da83769cb74777cadf",
      "size": 43542528,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:f29becf307f5f56a88da52737bd2edadf6e3c923f17c73da83769cb74777cadf",
        "containerd.io/snapshot/overlaybd/blob-size": "43542528"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:6f67550bdd254a15fd1fa6e8ceb7accce4a9333b0e10c450f8fba3addface07e",
      "size": 26624,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:6f67550bdd254a15fd1fa6e8ceb7accce4a9333b0e10c450f8fba3addface07e",
        "containerd.io/snapshot/overlaybd/blob-size": "26624"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:6ee7fdd7cdb1b84b828a99eb6939ee549360a08736c496884fa70ecded417cb9",
      "size": 2617856,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:6ee7fdd7cdb1b84b828a99eb6939ee549360a08736c496884fa70ecded417cb9",
        "containerd.io/snapshot/overlaybd/blob-size": "2617856"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:cec340e3de29a417e9518c96c7d0b97155367a2cb33323c43905b68e7e48fc24",
      "size": 17311744,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:cec340e3de29a417e9518c96c7d0b97155367a2cb33323c43905b68e7e48fc24",
        "containerd.io/snapshot/overlaybd/blob-size": "17311744"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:f87b0dc44fe35166cbfc50a3e2f51a80bcea899ebd16db8f22777db20248d6a7",
      "size": 8192,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:f87b0dc44fe35166cbfc50a3e2f51a80bcea899ebd16db8f22777db20248d6a7",
        "containerd.io/snapshot/overlaybd/blob-size": "8192"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:bbf67319df63e0f0c7f655e9c01d2ec7cb45c91a7fd9c60f795241d8ba2f7280",
      "size": 12288,
      "annotations": {
        "containerd.io/snapshot/overlaybd/blob-digest": "sha256:bbf67319df63e0f0c7f655e9c01d2ec7cb45c91a7fd9c60f795241d8ba2f7280",
        "containerd.io/snapshot/overlaybd/blob-size": "12288"
      }
    }
  ],
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json"
}

Please help us understand how we can do a 1:1 mapping between original layers and converted layers.
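
One way to line the two manifests up for comparison (assuming jq is available; the converted tag here follows the EXAMPLES doc) is to list just each manifest's layer sizes and digests:

$ docker manifest inspect registry.hub.docker.com/library/redis:6.2.1 | jq -r '.layers[] | "\(.size)\t\(.digest)"'
$ docker manifest inspect localhost:5000/redis:6.2.1_obd | jq -r '.layers[] | "\(.size)\t\(.digest)"'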

install overlaybd snapshotter failed

Reproduce:

  1. Just run INSTALL.sh; a shell error occurs.
bash INSTALL.sh
Compile overlaybd-snapshotter...
bin/overlaybd-snapshotter
bin/ctr
copy config.json to /etc/overlaybd-snapshotter/
create service...
Would you like make containerd support overlaybd-snapshotter [Y/N]? (this will change /etc/containerd/config.toml and restart containerd) Y
INSTALL.sh: line 41: conditional binary operator expected
INSTALL.sh: line 41: syntax error near `/etc/containerd/config.toml'
INSTALL.sh: line 41: `        if [[ !-f /etc/containerd/config.toml ]]; then'
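
For reference, the reported failure is a bash syntax error: [[ !-f ... ]] is parsed as a single word, so the negation must be separated from the -f test operator. A minimal sketch of what line 41 presumably intends:

# note the space between "!" and "-f"; without it bash reports
# "conditional binary operator expected"
if [[ ! -f /etc/containerd/config.toml ]]; then
        : # original install logic for creating the config goes here
fi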

can not lazy pull

I can only pull the whole image and then run it.
/var/log/overlaybd.log prints:
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|main.cpp:301|dev_open:create image file failed
2022/01/26 20:21:11|INFO |th=0000000002AA8F60|main.cpp:291|dev_open:dev open /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/116/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|config_util.h:53|ParseJSON:error open json file: /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/116/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|image_service.cpp:273|create_image_file:error parse image config
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|main.cpp:301|dev_open:create image file failed
2022/01/26 20:21:11|INFO |th=0000000002AA8F60|main.cpp:291|dev_open:dev open /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/117/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|config_util.h:53|ParseJSON:error open json file: /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/117/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|image_service.cpp:273|create_image_file:error parse image config
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|main.cpp:301|dev_open:create image file failed
2022/01/26 20:21:11|INFO |th=0000000002AA8F60|main.cpp:291|dev_open:dev open /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/118/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|config_util.h:53|ParseJSON:error open json file: /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/118/block/config.v1.json
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|image_service.cpp:273|create_image_file:error parse image config
2022/01/26 20:21:11|ERROR|th=0000000002AA8F60|main.cpp:301|dev_open:create image file failed
2022/01/26 20:23:07|INFO |th=00007F25202143C0|main.cpp:291|dev_open:dev open /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/22/block/config.v1.json
2022/01/26 20:23:07|INFO |th=00007F24DEA0D840|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 50092870, idx_bytes: 94604, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DE206C00|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 8396, idx_bytes: 92, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DD1FEC80|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 14234, idx_bytes: 164, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DC9F7BC0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 29786352, idx_bytes: 50648, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DC1F67C0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 18883, idx_bytes: 176, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DB9F4080|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 17718, idx_bytes: 180, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DB1EC780|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 11272123, idx_bytes: 11888, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DA1E5040|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 23701322, idx_bytes: 62140, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D81D2400|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 28651120, idx_bytes: 55368, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D71C8840|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 10817, idx_bytes: 104, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D69C1800|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 10993, idx_bytes: 100, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D61BDBC0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 68894, idx_bytes: 448, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D59BCC40|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 28635014, idx_bytes: 62144, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D51B7C80|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 13938, idx_bytes: 104, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D49AF3C0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 11050, idx_bytes: 92, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DF211400|zfile.cpp:509|load_jump_table:trailer_offset: 4737183, idx_offset: 4207947, idx_bytes: 529236, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DDA03800|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 150863644, idx_bytes: 284256, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D79CE840|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 19540623, idx_bytes: 45988, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24DA9EB040|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 9392, idx_bytes: 92, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D99DEC80|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 13495, idx_bytes: 104, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D91D6FC0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 10207, idx_bytes: 100, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F24D89D4BC0|zfile.cpp:516|load_jump_table:read overwrite header. idx_offset: 10073, idx_bytes: 92, dict_size: 0, use_dict: 0
2022/01/26 20:23:07|INFO |th=00007F25202143C0|image_file.cpp:262|open_lowers:LSMT::open_files_ro(files, 22) success
2022/01/26 20:23:07|INFO |th=00007F25202143C0|image_file.cpp:362|init_image_file:RW layer path not set. return RO layers.
2022/01/26 20:23:07|INFO |th=00007F25202143C0|image_file.cpp:148|start_bk_dl_thread:no need to download
2022/01/26 20:23:07|INFO |th=00007F25202143C0|image_file.h:50|ImageFile:new imageFile, bs: 512, size: 68719476736

I can't find the directory /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/x mentioned in the log on the host.
I can't tell which component is in trouble.
Any help will be appreciated.

Converted images show random diffs in rootfs

We found that sometimes there can be random diffs in the rootfs between the original and converted image:

Running diff on container rootfs: /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs, /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450-overlaybd/rootfs
Only in /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs/etc/alternatives: python3
Only in /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs/etc/apt/preferences.d: blocked_packages
Symbolic links /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs/usr/bin/python3 and /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450-overlaybd/rootfs/usr/bin/python3 differ
Only in /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs/var/lib/dpkg/alternatives: python3
Binary files /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450/rootfs/var/log/alternatives.log and /run/containerd/io.containerd.runtime.v2.task/default/2783b311f0831276-1682629450-overlaybd/rootfs/var/log/alternatives.log differ

These diffs can disappear the next time we convert the image. Is this expected behavior?

obdconverted image fails to run for me

Hi,

I have been following the documentation to convert an OCI image into an overlaybd-friendly image, based on https://github.com/alibaba/accelerated-container-image/blob/main/docs/EXAMPLES.md#convert-oci-image-into-overlaybd

But I get the following error when trying to run it. Note that instead of localhost:5000/redis:6.2.1_obd, I use myreg.azurecr.io/test/redis:6.2.1; that probably shouldn't make any difference?

ctr run --net-host --snapshotter=overlaybd --rm -t myreg.azurecr.io/test/redis:6.2.1 demo
ctr: failed to prepare extraction snapshot "extract-164412284-SC8e sha256:23e0fe431efc04eba59e21e54ec38109f73b5b5df355234afca317c0b32f7b0e": failed to attach and mount for snapshot 33: failed to mount /dev/sdh to /var/lib/overlaybd/snapshots/33/block/mountpoint: read-only file system: unknown

What should I check?

Environment:

root@agentpool1:/var/lib/waagent# ctr plugin ls | grep overlaybd
io.containerd.snapshotter.v1    overlaybd                -              ok

root@agentpool1:/var/lib/waagent# ctr snapshot --snapshotter overlaybd ls
KEY PARENT KIND

root@agentpool1:/var/lib/waagent# ctr images ls
REF                                         TYPE                                                      DIGEST                                                                  SIZE     PLATFORMS                                                                                               LABELS
myreg.azurecr.io/test/redis:6.2.1           application/vnd.docker.distribution.manifest.v2+json      sha256:d448b24bc45ae177ba279d04ea53ec09421dd5bee66b887d3106e0d380d6cc6b 65.0 MiB linux/amd64                                                                                             -
registry.hub.docker.com/library/redis:6.2.1 application/vnd.docker.distribution.manifest.list.v2+json sha256:08e282682a708eb7f51b473516be222fff0251cdee5ef8f99f4441a795c335b6 36.9 MiB linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x -

Question: what is the acceleration-layer?

I inspected the sample image provided in https://github.com/alibaba/accelerated-container-image/blob/main/docs/EXAMPLES.md#ondemand-pulling-image and noticed that one layer at the end was marked as an "acceleration-layer". However, when I followed the doc and converted registry.hub.docker.com/library/redis:6.2.1, I didn't find that layer in my converted image. May I know what the layer is?

                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:c32be80bd739307433858c79994ffc050393691d77453bd93bee1cc89c7473c1",
                        "size": 11264,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/acceleration-layer": "yes",
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:c32be80bd739307433858c79994ffc050393691d77453bd93bee1cc89c7473c1",
                                "containerd.io/snapshot/overlaybd/blob-size": "11264"
                        }
                }

docker manifest inspect registry.hub.docker.com/overlaybd/redis:6.2.1_obd

{
        "schemaVersion": 2,
        "config": {
                "mediaType": "application/vnd.docker.container.image.v1+json",
                "digest": "sha256:8e3bd5ee258431fd6bb25652afc666f7582777229c5147e096ccf9b20c06ac1c",
                "size": 3688
        },
        "layers": [
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:a8b5fca80efae55088290f3da8110d7742de55c2a378d5ab53226a483f390e21",
                        "size": 4739584,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:a8b5fca80efae55088290f3da8110d7742de55c2a378d5ab53226a483f390e21",
                                "containerd.io/snapshot/overlaybd/blob-size": "4739584"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:87763befd4f3289905d709bd03c969db43e512502be7e1132b625bdef487d01f",
                        "size": 43458048,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:87763befd4f3289905d709bd03c969db43e512502be7e1132b625bdef487d01f",
                                "containerd.io/snapshot/overlaybd/blob-size": "43458048"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:5bf55fa8550c47a1054c7a02138b9f79b5f574f040b1e444ad717d320d3afc67",
                        "size": 25600,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:5bf55fa8550c47a1054c7a02138b9f79b5f574f040b1e444ad717d320d3afc67",
                                "containerd.io/snapshot/overlaybd/blob-size": "25600"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:62a999219eb529a2403f2b5849869d3253bf1014721293333f7f66be54308b94",
                        "size": 2610688,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:62a999219eb529a2403f2b5849869d3253bf1014721293333f7f66be54308b94",
                                "containerd.io/snapshot/overlaybd/blob-size": "2610688"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:f2d33f598db59a8a4fcb490764cdfca3157ec6a742870378154cbef93acefce9",
                        "size": 17303040,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:f2d33f598db59a8a4fcb490764cdfca3157ec6a742870378154cbef93acefce9",
                                "containerd.io/snapshot/overlaybd/blob-size": "17303040"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:8d77203e222f30ab4b8ba2e232fd9d71880dd80f6f24fa18e45d1d578e40eb57",
                        "size": 8192,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:8d77203e222f30ab4b8ba2e232fd9d71880dd80f6f24fa18e45d1d578e40eb57",
                                "containerd.io/snapshot/overlaybd/blob-size": "8192"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:8bdb50d0eb5ec766ba235c06ac8c8a6f44ab1beeed756efa532e73b79786e36a",
                        "size": 11776,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:8bdb50d0eb5ec766ba235c06ac8c8a6f44ab1beeed756efa532e73b79786e36a",
                                "containerd.io/snapshot/overlaybd/blob-size": "11776"
                        }
                },
                {
                        "mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": "sha256:c32be80bd739307433858c79994ffc050393691d77453bd93bee1cc89c7473c1",
                        "size": 11264,
                        "annotations": {
                                "containerd.io/snapshot/overlaybd/acceleration-layer": "yes",
                                "containerd.io/snapshot/overlaybd/blob-digest": "sha256:c32be80bd739307433858c79994ffc050393691d77453bd93bee1cc89c7473c1",
                                "containerd.io/snapshot/overlaybd/blob-size": "11264"
                        }
                }
        ],
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json"
}

why is overlaybd faster than overlay2 in a warm startup scenario?

In my opinion, Overlaybd is another lazy-pulling container image snapshotter for containerd. It's based on a block device and an iSCSI target driver. It redirects IO from kernel virtual block devices to the user-mode overlaybd backend, and finally resends it to the kernel's local file system. I think overlaybd has a longer IO path than overlayfs, since it switches twice between user mode and kernel mode when a container reads an image file (not in cache), while overlayfs only switches once. Theoretically, if container images are already downloaded, container file-read IO would be slower with overlaybd than with overlayfs.

container start error when host reboot

Reproduce:

  1. create container
    ctr -n k8s.io run --runtime io.containerd.runc.v2 -t -d --snapshotter=overlaybd $dadi_image test_dadi bash
  2. after reboot the host, start the container
    ctr -n k8s.io t start -d test_dadi
  3. then the error occur
    ctr: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "bash": executable file not found in $PATH: unknown

This is because the block device is not created and mounted after the host reboots.

As I can see in the code, the function attachAndMountBlockDevice that creates and mounts the block device is only called in the Prepare and View interfaces. But for container start, only the Mounts interface is called, so the block device cannot be created and mounted.

I think the function attachAndMountBlockDevice should also be called in the Mounts interface to create and mount the block device (see the sketch below)?
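
A hypothetical sketch of that suggestion (the method shape follows containerd's snapshots.Snapshotter interface; the helper names are illustrative, not the actual snapshotter code):

// Illustrative only: re-attach the overlaybd block device in Mounts if it has
// gone missing (e.g. after a host reboot), instead of relying on Prepare/View.
func (o *snapshotter) Mounts(ctx context.Context, key string) ([]mount.Mount, error) {
	id, info, err := o.lookup(ctx, key) // hypothetical snapshot lookup
	if err != nil {
		return nil, err
	}
	if !o.deviceAttached(id) { // hypothetical check, e.g. stat the device node
		if err := o.attachAndMountBlockDevice(ctx, id, info); err != nil { // helper named in this report
			return nil, err
		}
	}
	return o.buildMounts(id, info), nil // hypothetical mount assembly
}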

Integration with firecracker-containerd

I am trying to integrate overlaybd images with firecracker-containerd, which uses Firecracker as the runtime instead of runc in containerd. It is usually used where an isolated environment is required and security is important.
However, Firecracker does not support virtio-fs, so we cannot simply pass the container's rootfs via overlayfs, etc. Fortunately, Firecracker supports virtio-blk, which allows us to pass a block device to the microVM, so we can pass the container rootfs as a block device.

That's why I want to use the overlaybd-snapshotter in rwDev mode.
I labeled the snapshotter with writable=dev and then rpulled the image, and this occurred:

ctr: failed to commit snapshot extract-199052258-yDl8 sha256:fd84eb22532fcbe372941242bb3ebc762860b26392aaad92fd3b4342a887a66c: failed to commit writable overl
aybd: failed to open file '/var/lib/overlaybd/snapshots/524/block/writable_data', 2: No such file or directory

When I unlabeled the snapshotter and rpulled again, it worked well. It seems we should only label the snapshotter in "rwDev" mode when creating a container, not when pulling the image.

Could you please explain why we cannot label it before rpull?

Another question: does prefetching work well in "rwDev" mode? And how can we verify this?

Proposal: A trace-based prefetching mechanism to accelerate cold start

Overview

Cache plays an important role in the overall architecture of ACI's I/O flow. When there is no cache (container cold start), however, the backend storage engine will still need to visit the Registry frequently, if only temporarily.

Prefetch is a common mechanism to avoid this situation. As the name suggests, the key is to retrieve data in advance and save it into the cache.

There are many ways to prefetch. For instance, we can simply read extra data beyond the designated range of a Registry blob. That might be called Expand Prefetch, and the expand ratio could be 2x, 4x, or even higher if our network bandwidth is sufficient.

Another way is to prioritize files and use landmarks, which is already adopted in stargz. The storage engine runtime prefetches the range where prioritized files are contained, and this information is ultimately leveraged to increase the cache hit ratio and mitigate read overhead.

In this article we discuss a new prefetch mechanism based on the container's startup I/O pattern. This mechanism should work with ACI and the overlaybd image format.

Trace Prefetch

Since every single I/O request that happens on the user's own filesystem is eventually mapped onto one of overlaybd's layer blobs, we can record all I/Os from the layer blob's perspective and replay them later. That's why we call it Trace Prefetch.

Trace prefetch is time based, and it has finer granularity and better prediction accuracy than stargz. We don't mark a file, because the user app might only need to read a small part of it in the beginning; simply prefetching the whole file would be less efficient. Instead, we replay the trace using the exact I/O records that happened before. Each record contains only the necessary information, such as the offset and length of the blob being read.

The trace is stored as an independent image layer, and MUST always be the uppermost one. Neither the image manifest nor the container snapshotter needs to know whether it is a trace layer; the snapshotter just downloads and extracts it as usual. The overlaybd backstore MUST recognize the trace layer and replay it accordingly.
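
To make the record format concrete, here is a minimal sketch of what one I/O record could contain, based only on the properties named above (per layer blob, time based, offset and length); all field names are illustrative:

// Illustrative trace record: one read against one layer blob.
type TraceRecord struct {
	LayerDigest string // which layer blob the read hit
	AtMs        uint64 // ms since container start (replay is time based)
	Offset      uint64 // byte offset of the read within the blob
	Length      uint32 // number of bytes read
}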

Terminology

Record

Recording is to run a container based on the target image, persist I/O records during startup, and then dump them into a trace blob. The trace blob will be chained, and become the top layer.

Recording functionality SHOULD be integrated into the container's build (compose) system, and MUST have a parameter to indicate how long the user wishes to record. After the timeout, the build system MUST stop the running container, so the recording terminates as well.

It is the user's responsibility to ensure the container is idempotent or stateless; in other words, it SHOULD be able to start anywhere and anytime without causing unexpected consequences.

When building a new image from a base image, the old trace layer (if it exists in the base image) MUST be removed. A new trace layer might be added later, if recording is desired.

Push

The push command will save both the data layers and the trace layer to the Registry. The trace layer is transparent to the push command.

Replay

After recording and pushing, users can pull and run the specific image somewhere else. The snapshotter's storage backend SHOULD load the trace blob and replay the I/O records for each layer blob.

Example Usage

Suppose we have a docker runtime of containerd + overlaybd snapshotter. The example usage of recording a trace layer would be as follows:

docker pull <overlaybd_image>

docker build --record-trace=10s .

docker push <remote> <local>

Please take a look at this proposal and see if there are feasible ways to implement it.

Setting up with k8s generates lot of error logs

When I do the cri environment setup, I see a lot of failure logs in kubelet:

journalctl -u kubelet

....
Aug 20 02:45:20 k8s-master-17276307-0 kubelet[4845]: E0820 02:45:20.407176    4845 cri_stats_provider.go:376] Failed to get the info of the filesystem with mountpoint "/var/lib/containerd/io.containerd.snapshotter.v1.overlaybd": failed to get device for dir "/var/lib/containerd/io.containerd.snapshotter.v1.overlaybd": stat failed on /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd with error: no such file or directory.
.....

That's probably because the overlaybd snapshot storage is not in /var/lib/containerd/io.containerd.snapshotter.v1.overlaybd but rather in /var/lib/overlaybd.

That said, overlaybd itself did not fail; it just generates a lot of error logs in kubelet.

BUG: can't support writable label in Cri

A sandbox container can't start with the annotation 'containerd.io/snapshot/overlaybd.writable: dir'.

Here is the deploy yaml:

### cat ./pod-config.yaml
metadata:
  attempt: 1
  uid: 1
  name: redis-obd
  namespace: default-1
log_directory: /tmp
linux:
  security_context:
    namespace_options:
      network: 2
annotations:
  containerd.io/snapshot/overlaybd.writable: dir

### cat ./container-config.yaml 
metadata:
  name: redis-obd-container
image:
  image: registry.hub.docker.com/overlaybd/redis:6.2.1_obd
log_path: redis.container.log

It returns an error when using the command 'crictl run container-config.yaml pod-config.yaml':

E0620 22:24:50.095847   12072 remote_runtime.go:212] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd container: failed to read config(path=/var/lib/overlaybd/snapshots/12/block/config.v1.json) of snapshot 12: open /var/lib/overlaybd/snapshots/12/block/config.v1.json: no such file or directory: unknown"
DEBU[0000] RunPodSandboxResponse:                       
FATA[0000] running container: run pod sandbox: rpc error: code = Unknown desc = failed to create containerd container: failed to read config(path=/var/lib/overlaybd/snapshots/12/block/config.v1.json) of snapshot 12: open /var/lib/overlaybd/snapshots/12/block/config.v1.json: no such file or directory: unknown 

`ctr run` runs into `Authentication failed`

I successfully managed to pull, convert, push, and then rpull my image, but when I try to run the image I hit this error (registry hostname obfuscated):

(error screenshot)

Am I missing an authentication step?

ctr image push error after convert image where duplicated layers exist

The reproduce step:

./ctr obdconv --dbstr  $db -u $u:$p $src_image $dest_image

./ctr image rm $dest_image
./ctr obdconv --dbstr  $db -u $u:$p $src_image $dest_image

./ctr image push -u $u:$p $dest_image

Then the error is like:

manifest-sha256:046152104c87d2debcdc941adceb57a26ceb04ad1a4f12dcc25f4ceeabb3695a: waiting        |--------------------------------------|
elapsed: 0.1 s                                                                    total:   0.0 B (0.0 B/s)
ctr: content digest sha256:5eb52d402f717a0f1148e15131faf9f8e696a2d9756c85939623e9bca5354581: not found

It seems that when the remote layer is found in the database, ctr obdconv doesn't save the layer into the local content store.

Overlaybd observability support

When using OverlayBD in production, we will need to monitor the health of OverlayBD components using popular cloud-native instrumentation tooling.

A similar issue was brought up here: containerd/overlaybd#101. There are certain things users could try, but it would be great if this were supported by the DADI service so it can be standardized and reused. I believe this is key to helping DADI adoption.

The following metrics are a rough idea of what we'd like to monitor:

  • Overlaybd:

    1. Healthcheck ping for the Overlaybd daemon
    2. Number of failed blob reads, grouped by HTTP status (500 for registry errors, 404 for blob not found, 403 for auth failures, etc.)
    3. blob read latency for each block (e.g. 1M)
    4. Other unexpected errors such as failed to write to local cache or online decompression failures.
    5. Virtual block device IO hang monitoring
    6. Virtual block device IO latency
  • Overlaybd-snapshotter:

    1. Healthcheck ping for the snapshotter daemon
    2. Error count of all GRPC APIs (prepare, commit etc.)
    3. Latency for all GRPC APIs

Ideally, the above metrics would be exposed in Prometheus format so that it is easy to monitor DADI in cloud-native environments; a sketch of what this could look like follows below.
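
For illustration, a minimal sketch of how the snapshotter side might expose a couple of these metrics with the standard Prometheus Go client. The metric names and the observe wrapper are invented for this example, not an existing overlaybd API:

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metrics mirroring the list above; names are illustrative.
var (
	grpcErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "overlaybd_snapshotter_grpc_errors_total",
		Help: "Error count of snapshotter GRPC APIs, labeled by method.",
	}, []string{"method"})

	grpcLatency = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "overlaybd_snapshotter_grpc_duration_seconds",
		Help:    "Latency of snapshotter GRPC APIs.",
		Buckets: prometheus.DefBuckets,
	}, []string{"method"})
)

// observe wraps a snapshotter method call, recording latency and errors.
func observe(method string, fn func() error) error {
	start := time.Now()
	err := fn()
	grpcLatency.WithLabelValues(method).Observe(time.Since(start).Seconds())
	if err != nil {
		grpcErrors.WithLabelValues(method).Inc()
	}
	return err
}

func main() {
	prometheus.MustRegister(grpcErrors, grpcLatency)
	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}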

Some similar monitoring support already exists in other projects.

Please let me know your thoughts; the metrics mentioned above are just some quick ideas, and I'd be happy to discuss them further.

concurrent image pull causes overlaybd-snapshotter to report an error

With the latest code, executing the following concurrent image pull script produces an error.

#!/bin/bash

num=5
image='dadi_image'

crictl rmi $image

for ((i = 0; i < num; i++)); do
	crictl pull "$image" &
done
wait # let all concurrent pulls finish before the script exits

The error is like:

FATA[0001] pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image "$dadi_image": failed to prepare extraction snapshot "extract-708662541-AFsb sha256:360a3fa07d9163d0ccae65846abc495b56720ba9f381bd1da752bb31cf4b72e1": failed to rename: rename /opt/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/new-147977752 /opt/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/2130: file exists: unknown

I added some debug logging and found that the github.com/containerd/containerd/snapshots/storage.CreateSnapshot function, called in createSnapshot, returns the same snapshot ID across two calls (concurrent image pulls cause multiple calls).

So far I'm not sure how this happens; I'll provide more info when I find something else. A sketch of the suspected race is below.
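
For illustration only: the symptom is consistent with snapshot-ID allocation not being serialized with the directory rename. A toy Go sketch of the serialized version (this is not the snapshotter's actual code, which allocates IDs inside a metadata-store transaction):

package main

import (
	"fmt"
	"sync"
)

// snapshotter is a toy stand-in for the real overlaybd-snapshotter; it only
// models ID allocation, which is where the duplicate IDs show up.
type snapshotter struct {
	mu     sync.Mutex
	nextID int
}

// createSnapshot allocates the next ID and "commits" it while holding the
// lock, so two concurrent pulls can never observe the same ID.
func (s *snapshotter) createSnapshot() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	return s.nextID // the rename new-XXXX -> snapshots/<id> would happen here
}

func main() {
	s := &snapshotter{}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // five concurrent pulls, as in the repro script
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("snapshot id:", s.createSnapshot())
		}()
	}
	wg.Wait()
}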

Error while running busybox obd image: "failed to attach and mount for snapshot 80: failed to enable target for"

Hi! We have been able to run overlaybd containers (the Redis server and WordPress ones from the examples) on AKS nodes, but I hit an error when attempting to run a busybox obd image. It would be great if you could help identify the issue. Thanks!

Steps to repro:
.accelerated-container-image/script/performance/clean-env.sh

sudo nerdctl run --net host --rm --pull=always docker.io/library/busybox # Works

bin/ctr obdconv docker.io/library/busybox:latest docker.io/aganeshkumar/daditest:busybox_test_obd # Works

Output:

docker.io/aganeshkumar/daditest:busybox_test_obd:                                 resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:dfcd0a2ff1e99bcb845919322698e1e7a0a11d517e812f8af75b2cc61e90fc11: exists         |++++++++++++++++++++++++++++++++++++++| 
config-sha256:e85ab4f2f7c417565e4cf68c848b59e3d78e29c2fb96196d208180c2f3fb049f:   exists         |++++++++++++++++++++++++++++++++++++++| 
elapsed: 0.6 s                                                                    total:   0.0 B (0.0 B/s)  

nerdctl image push docker.io/aganeshkumar/daditest:busybox_obd # No error, can see image in this public dockerhub repo

.accelerated-container-image/script/performance/clean-env.sh # To remove previously pulled images

sudo nerdctl run -it --net host --rm --snapshotter=overlaybd docker.io/aganeshkumar/daditest:busybox_test_obd # Errors

Error message:

FATA[0000] failed to attach and mount for snapshot 80: failed to enable target for /sys/kernel/config/target/core/user_999999999/dev_80, failed:failed to open switch file `https://docker.io/v2/aganeshkumar/daditest/blobs/sha256:58e554736a9008721c7a0918428315cce2678f6440bb39dc9689ef22a809b7ac: unknown 

For additional context:

  • This is the script to set up a node on AKS with the overlaybd snapshotter: https://github.com/ganeshkumar5699/container-acceleration. I would assume the same issue would be hit even on a regular Linux VM.
  • If we don't remove the converted obd image locally, push the image, and then attempt to run it, it works (presumably because the layers already exist locally).

Potential cause of the issue:
Not all the layers of the obd image are being converted locally or being pushed properly.

It would be great to know how to make this work, as we want to run benchmarking tests with different converted obd images (starting with busybox). Thank you!

obdconv runs into "writeable file stat failed"

I'm running into this when trying out the examples:

[screenshot: obdconv error output showing "writeable file stat failed"]

Logs don't seem to have much information, just a bunch of start-up output but no errors.

My env is:

Go 1.17.2
Ubuntu 20.04 (Hyper-V VM on Windows 11)
runc 1.0
containerd 1.6.6

Everything seems to be compiling, installing, and starting up, but I haven't been able to get obdconv working. If there is any other information I can provide, please let me know. Thanks!

How to configure skipping HTTPS verification?

I'm using a private self-signed registry for testing; when I try to run a container, it prints:

FATA[0003] failed to attach and mount for snapshot 141: failed to enable target for /sys/kernel/config/target/core/user_999999999/dev_141, failed:failed to open remote file https://10.31.144.9/v2/redis/blobs/sha256:17f939605cddb2d59a95acc98d2b81a19ba4230c7999dfdcd60564ab8d133b0d: No such file or directory: unknown

And the logs on the registry side are:

2023-03-23 18:29:09 2023/03/23 10:29:09 http: TLS handshake error from 172.17.0.1:37112: local error: tls: bad record MAC

I've met this problem before and fixed it by using a config to skip verification. I want to know how to skip HTTPS verification when using DADI.
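
For context, at the Go HTTP layer "skip verify" boils down to the standard library's InsecureSkipVerify flag, as in the sketch below. Whether and where overlaybd exposes an equivalent switch for its blob fetcher is exactly the open question here; this is not an overlaybd config:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// InsecureSkipVerify accepts any certificate, including self-signed
	// ones; only use it against private test registries.
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get("https://10.31.144.9/v2/")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}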

crictl pull image error

With the latest code, image download fails with the 'crictl pull' command.

The error is like:
pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image "$dadi_image": failed to prepare extraction snapshot "extract-545136048-IAjX sha256:c3b59cdbc3b160347a81159bc96aca9ceed3995c3d3814304e252e7853c4354f": failed to rename: rename /opt/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/new-061002457 /opt/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/3479: file exists: unknown

When I clear the containerd root (which includes the overlaybd-snapshotter root dir), this issue disappears. Maybe there are some compatibility problems with an old version of overlaybd-snapshotter.

Enable test code coverage report on pull request

What is the version of your Accelerated Container Image

No response

What would you like to be added?

Onboard https://codecov.io/ to provide test code coverage reports on pull requests.

Why is this needed for Accelerated Container Image?

Improve code quality

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.

How to configure p2p and cache

    > @bengbeng-pp Currently in Alibaba Cloud, only Function Compute uses trace prefetching, because it's relatively easier for them to record traces. Some businesses are reluctant to do such a thing.

I think what you need is Cache + P2P distribution. For each of them, DADI has an open-source implementation. By setting up a large-scale SSD cluster, you basically distribute/cache every hot piece of data in the network, and thus a mighty network filesystem is formed :-)

Hello, is there any documentation on how to configure the cache and P2P? When I pull an obd-format image from the registry, I cannot see anything in /opt/overlaybd/registry_cache.

Originally posted by @dbfancier in #120 (comment)

Will DADI support device mapper later?

A rootfs size limit would prevent any single container on a host from consuming too much disk space.

It seems that overlayfs cannot provide this. Device mapper may help, but I don't know whether DADI could support it.

cannot run container

I used the rpull subcommand to pull the overlaybd-format image, but when I execute the following command:
ctr run --net-host --snapshotter=overlaybd --rm -t registry.hub.docker.com/overlaybd/redis:6.2.1_obd demo
I get this error:
ctr: failed to attach and mount for snapshot 8: failed to enable target for /sys/kernel/config/target/core/user_999999999/dev_8, : unknown
How can I solve this problem?

Overlaybd versioning strategy

Want to start a thread to discuss the general versioning strategy.

We need a way to know whether converted images can work with previous or future versions of the overlaybd driver. If the image converter introduces any breaking change, we need a way to check the version to know whether an image can work with the current runtime setup.

We are wondering if we can start including a version annotation in the manifest, for example:

"annotations": {
"containerd.io/snapshot/overlaybd/version": "0.1.0"
}

"annotations": {
"containerd.io/snapshot/fastoci/version": "0.1.0"
}

Overlaybd blobs integrity check

What is the version of your Accelerated Container Image

No response

What would you like to be added?

Hi Overlaybd community, is there a way to verify the integrity of downloaded OverlayBD blobs? The context: we are trying to pre-pull all the blobs into the registry_cache directory before we start running containers. We need to do this to avoid lazy pulling (for better stability and observability); at the same time, we download blobs in parallel with range reads and high concurrency, so we can finish downloading multi-GB images in seconds. We still benefit from Overlaybd's fast online decompression. The overall POC result is pretty promising.

However, we need to run an integrity check on the downloaded blobs; is there a fast way to do the validation? Running sha256sum over the blobs is too slow and defeats the purpose.
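
One partial mitigation, sketched below, is to at least parallelize the sha256 verification across blobs, mirroring how the downloads are already concurrent. The cache path and digest in main are placeholders, and this does not address the per-blob hashing cost itself:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"

	"golang.org/x/sync/errgroup"
)

// verifyBlobs checks each downloaded blob file against its expected hex
// sha256 digest, hashing all blobs concurrently.
func verifyBlobs(expected map[string]string) error {
	var g errgroup.Group
	for path, want := range expected {
		path, want := path, want // capture loop variables (pre-Go 1.22)
		g.Go(func() error {
			f, err := os.Open(path)
			if err != nil {
				return err
			}
			defer f.Close()
			h := sha256.New()
			if _, err := io.Copy(h, f); err != nil {
				return err
			}
			if got := hex.EncodeToString(h.Sum(nil)); got != want {
				return fmt.Errorf("%s: digest mismatch (got %s)", path, got)
			}
			return nil
		})
	}
	return g.Wait()
}

func main() {
	// Placeholder path and digest; substitute the real registry_cache layout.
	err := verifyBlobs(map[string]string{
		"/opt/overlaybd/registry_cache/<blob>": "<expected sha256 hex>",
	})
	fmt.Println("verify result:", err)
}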

Why is this needed for Accelerated Container Image?

For faster and more reliable image pull.

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.

Convert to accelerated image within a docker container?

Is it possible to build overlaybd accelerated images from within a container?

I see that buildkit is experimentally supported (https://github.com/data-accelerator/buildkit) and can be run in a container (https://github.com/data-accelerator/buildkit#containerizing-buildkit), but I also see that the accelerator layer is not supported there.

Is there another path for building or converting overlaybd images within a container?
