
Comments (36)

knight42 commented on August 16, 2024

I dug a little deeper into buildx, and I am sure that docker load is not strictly required by buildx.
Excerpt from the official documentation: https://docs.docker.com/engine/reference/commandline/buildx_create/#driver

docker driver
Uses the builder that is built into the docker daemon. With this driver, the --load flag is implied by default on buildx build. However, building multi-platform images or exporting cache is not currently supported.

A PoC is available as well: https://gist.github.com/knight42/6c128a2edf7cebcb6816343da833295a. The built image is present in docker images without docker load.

After learning that, I have been trying to get rid of docker load in envd, but unfortunately the bundled buildkitd in docker engine 20.10.14 is v0.8.3-4-gbc07b2b8, while MergeOp was introduced in v0.10.3.

That said, even if the bundled buildkitd in docker becomes new enough to support MergeOp in the future, I think we still need a fallback mechanism, such as using the docker-container driver as we do now.

from envd.

gaocegege commented on August 16, 2024

https://github.com/tensorchord/buildkit/pull/1/files

I am working on it. It is not easy... 😟

Things we may need to change:

  • source/containerimage/pull.go to use docker/docker/image.Store
  • worker

gaocegege commented on August 16, 2024

It is possible! https://github.com/gaocegege/buildkit/pull/1/files reuses the docker image store when caching the docker image in buildkit.

The BuildKit instance in the container owns its own image cache; this PR reuses /var/lib/docker/overlay2/image/ instead of a separate cache.

gaocegege commented on August 16, 2024

buildx does not load into the local docker daemon by default either. We need to specify --load to load the artifact into docker.

But there is an optimization we could use:

$ docker volume inspect buildx_buildkit_amazing_albattani0_state
[
    {
        "CreatedAt": "2022-05-05T17:30:23+08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/buildx_buildkit_amazing_albattani0_state/_data",
        "Name": "buildx_buildkit_amazing_albattani0_state",
        "Options": null,
        "Scope": "local"
    }
]

$ docker inspect container
        {
          "Type": "volume",
          "Source": "buildx_buildkit_amazing_albattani0_state",
          "Target": "/var/lib/buildkit"
        }

We could create a volume to keep the cache persistent.

gaocegege commented on August 16, 2024

Now we use diff and merge to reduce the size, but docker load is still slow:

A 7GB image load takes ~17s.

gaocegege commented on August 16, 2024

Yeah, it comes from containerd diff, I think.

Maybe we can have a look at how docker build loads the image into its local image store.

gaocegege commented on August 16, 2024

The Docker daemon was explicitly designed to have exclusive access to /var/lib/docker. Nothing else should touch, poke, or tickle any of the Docker files hidden there.

Why is that? It’s one of the hard learned lessons from the dotCloud days. The dotCloud container engine worked by having multiple processes accessing /var/lib/dotcloud simultaneously. Clever tricks like atomic file replacement (instead of in-place editing), peppering the code with advisory and mandatory locking, and other experiments with safe-ish systems like SQLite and BDB only got us so far; and when we refactored our container engine (which eventually became Docker) one of the big design decisions was to gather all the container operations under a single daemon and be done with all that concurrent access nonsense.

This means that if you share your /var/lib/docker directory between multiple Docker instances, you're gonna have a bad time. Of course, it might work, especially during early testing. "Look ma, I can docker run ubuntu!" But try to do something more involved (pull the same image from two different instances…) and watch the world burn.

gaocegege commented on August 16, 2024

A new exporter envd is introduced in the buildkit container.

The image is loaded into the docker host successfully, but a dockerd restart is required before the new image shows up. It seems that dockerd does not watch the filesystem; I will figure it out.

buildctl build ... --output type=envd,name=gaoce
[+] Building 1.5s (4/4) FINISHED                                                                                                                                      
 => docker-image://docker.io/library/python:3.8                                                                                                                  1.5s
 => => resolve docker.io/library/python:3.8                                                                                                                      1.5s
 => CACHED ls                                                                                                                                                    0.0s
 => CACHED pip install -i https://mirror.sjtu.edu.cn/pypi/web/simple jupyter                                                                                     0.0s
 => exporting to image                                                                                                                                           0.0s
 => => exporting layers                                                                                                                                          0.0s
 => => writing image sha256:470747d54520023ee32931048063d1f383d52046ba95625a3d41411805850893                                                                     0.0s
 => => naming to gaoce                                                                                                                                           0.0s

gaocegege commented on August 16, 2024

We merged multiple base layers into the image, so the size is large. diff should be used to reduce the size. Ref e64786b

gaocegege commented on August 16, 2024

The base image nvidia/cuda:11.2-devel-ubuntu2004 is 4GB, but our base image is 7GB. We should figure out where the extra 3GB comes from.

knight42 commented on August 16, 2024

Perhaps we could make MIDI work in a way similar to docker-buildx, which uses the BuildKit library bundled in the Docker daemon via the docker driver, so that the image is actually built by dockerd and we don't need to load the image manually.

gaocegege commented on August 16, 2024

Cool! I think it is a great idea. Let's investigate how the docker buildx plugin does it.

gaocegege commented on August 16, 2024

Ref https://github.com/docker/buildx/blob/3adca1c17db9439fb94499ccef62f6a40dcaf899/build/build.go#L1354:#L1375

Buildx also relies on docker load.

	w = &waitingWriter{
		PipeWriter: pw,
		f: func() {
			resp, err := c.ImageLoad(ctx, pr, false)
			defer close(done)
			if err != nil {
				pr.CloseWithError(err)
				w.mu.Lock()
				w.err = err
				w.mu.Unlock()
				return
			}
			prog := progress.WithPrefix(status, "", false)
			progress.FromReader(prog, "importing to docker", resp.Body)
		},
		done:   done,
		cancel: cancel,
	}
	return w, func() {
		pr.Close()
	}, nil
}

knight42 commented on August 16, 2024

7GB image load takes ~17s

Wow! It is really awesome! 💯

VoVAllen commented on August 16, 2024

I found that sending the tarball still takes ~30s on my machine. Is this expected?

gaocegege commented on August 16, 2024

Yeah, it is expected in the current design. "send tarball" is the docker load process.

gaocegege commented on August 16, 2024

@knight42 Thanks for the research!

but it is unfortunate that the version of bundled buildkitd in docker engine 20.10.14 is v0.8.3-4-gbc07b2b8, while mergeop is introduced in v0.10.3

I am wondering why we should use docker 20.10.14. Is it the version that supports built-in load?

gaocegege commented on August 16, 2024

while mergeop is introduced in v0.10.3.

Currently, we use buildkit v0.10.1, and merge op is supported in this version. I am not sure if it only works after v0.10.3 🤔

gaocegege commented on August 16, 2024

Got the problem here.

failed to solve LLB: failed to solve: failed to load LLB: unknown API capability mergeop

If we eliminate docker load (i.e., build against the bundled buildkitd), the client returns this error because merge op is not supported there.
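One possible fallback, sketched minimally: catch the capability error and retry with the docker-container driver. The helper `pickDriver`, the error-text matching, and the driver names are illustrative assumptions, not envd's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errMergeOp mirrors the error a bundled buildkitd returns when it does not
// understand MergeOp (see the message above).
var errMergeOp = errors.New("failed to solve LLB: failed to solve: failed to load LLB: unknown API capability mergeop")

// pickDriver is a hypothetical helper: fall back to the docker-container
// driver when the daemon's buildkitd rejects a capability we rely on.
func pickDriver(solveErr error) string {
	if solveErr != nil && strings.Contains(solveErr.Error(), "unknown API capability") {
		return "docker-container"
	}
	return "docker"
}

func main() {
	fmt.Println(pickDriver(errMergeOp)) // docker-container
	fmt.Println(pickDriver(nil))        // docker
}
```

Probing once at startup and caching the choice would avoid paying the failed solve on every build.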

gaocegege commented on August 16, 2024

Ref docker/buildx#1132

knight42 commented on August 16, 2024

I am wondering why we should use docker 20.10.14. Is it the version that supports built-in load?

Nope, it is just the version of dockerd on my laptop.

while mergeop is introduced in v0.10.3.

Currently, we use buildkit v0.10.1, and merge op is supported in this version. I am not sure if it only works after v0.10.3 🤔

Sorry, I double-checked the MergeOp PR; merge op was actually introduced in v0.10.0.

The client returns the error that we cannot use merge op if we eliminate docker load.

Yeah, since we heavily leverage merge op in envd, if we want to get rid of docker load we need to make sure the bundled buildkitd in dockerd supports merge op.

gaocegege commented on August 16, 2024

20.10.16 still bundles v0.8.3-4-gbc07b2b8. I am afraid we need to wait for the next docker milestone.

https://github.com/moby/moby/blob/v20.10.16/vendor.conf#L36

gaocegege commented on August 16, 2024

Things that we need to confirm:

gaocegege commented on August 16, 2024

Using docker's bundled buildkitd directly is not possible for now. We will figure out whether we can mount some directory into the envd_buildkitd container to achieve a similar experience.

gaocegege commented on August 16, 2024

Here, https://github.com/moby/moby/blob/master/builder/builder-next/controller.go#L44:#L220, docker creates a buildkit daemon (control.Controller).

And the most important part is https://github.com/moby/moby/blob/master/builder/builder-next/worker/worker.go#L83: Docker has a new worker type, moby.

gaocegege commented on August 16, 2024
	bk, err := buildkit.New(buildkit.Opt{
		SessionManager:      sm,
		Root:                filepath.Join(config.Root, "buildkit"),
		Dist:                d.DistributionServices(),
		NetworkController:   d.NetworkController(),
		DefaultCgroupParent: cgroupParent,
		RegistryHosts:       d.RegistryHosts(),
		BuilderConfig:       config.Builder,
		Rootless:            d.Rootless(),
		IdentityMapping:     d.IdentityMapping(),
		DNSConfig:           config.DNSConfig,
		ApparmorProfile:     daemon.DefaultApparmorProfile(),
	})

buildkit (builder-next.Controller) uses dockerd's DistributionServices, so the images' blobs and metadata are stored directly in the docker image store; there is no need to load images.

gaocegege commented on August 16, 2024

moby/moby#9935 (comment)

The maintainers said it is not possible to run multiple docker daemons on one data root (/var/lib/docker).

gaocegege commented on August 16, 2024

But I still think it is possible to run a (minimal) docker daemon in our envd_buildkitd container; only /var/lib/docker/image is needed in envd_buildkitd.

The main concern above is that multiple daemons on the same data root (/var/lib/docker/) may break consistency. Let's look at the directory layout of the image part of /var/lib/docker:

/var/lib/docker/image/overlay2
├── distribution
│   ├── diffid-by-digest
│   └── v2metadata-by-diffid
├── imagedb
│   ├── content
│   └── metadata
├── layerdb
│   ├── mounts
│   ├── sha256
│   └── tmp
└── repositories.json

distribution is used to communicate with the OCI image registry, so it is not used in envd_buildkitd. imagedb and layerdb are essentially key-value stores whose keys are file names (hex digests), so they should not be affected by concurrent daemons.

The last one, repositories.json, stores the map from image tag to image ID:

{
    "ubuntu": {
        "ubuntu:20.04": "sha256:53df61775e8856a464ca52d4cd9eabbf4eb3ceedbde5afecc57e417e7b7155d5",
        "ubuntu@sha256:47f14534bda344d9fe6ffd6effb95eefe579f4be0d508b7445cf77f61a0e5724": "sha256:53df61775e8856a464ca52d4cd9eabbf4eb3ceedbde5afecc57e417e7b7155d5"
    }
}

It may be affected by concurrent daemons, but there are workarounds: for example, we can rename the image through the docker API instead of tagging it at the low level, so we never manipulate this JSON file directly.

Thus in the buildkit exporter code, we should remove logic like this:

	if e.opt.ReferenceStore != nil {
		targetNames := strings.Split(e.targetName, ",")
		for _, targetName := range targetNames {
			tagDone := oneOffProgress(ctx, "naming to "+targetName)
			tref, err := distref.ParseNormalizedNamed(targetName)
			if err != nil {
				return nil, err
			}
			if err := e.opt.ReferenceStore.AddTag(tref, digest.Digest(id), true); err != nil {
				return nil, tagDone(err)
			}
			_ = tagDone(nil)
		}
	}

Or we can set ReferenceStore to nil.

gaocegege commented on August 16, 2024

We can create the image service outside of docker. Next I will try whether it is possible to embed it into the buildkitd process.

package main

import (
	"context"
	"path/filepath"

	"github.com/docker/docker/api/types"
	_ "github.com/docker/docker/daemon/graphdriver/overlay2"
	"github.com/docker/docker/daemon/images"
	dmetadata "github.com/docker/docker/distribution/metadata"
	"github.com/docker/docker/image"
	"github.com/docker/docker/layer"
	"github.com/docker/docker/pkg/idtools"
	refstore "github.com/docker/docker/reference"
)

func main() {
	root := "/var/lib/docker"
	graphDriver := "overlay2"
	layerStore, err := layer.NewStoreFromOptions(layer.StoreOptions{
		Root:                      root,
		MetadataStorePathTemplate: filepath.Join(root, "image", "%s", "layerdb"),
		GraphDriver:               graphDriver,
		GraphDriverOptions:        []string{},
		IDMapping:                 idtools.IdentityMapping{},
		ExperimentalEnabled:       false,
	})
	if err != nil {
		panic(err)
	}
	m := layerStore.Map()
	for k, v := range m {
		println(k, v)
	}
	imageRoot := filepath.Join(root, "image", graphDriver)
	ifs, err := image.NewFSStoreBackend(filepath.Join(imageRoot, "imagedb"))
	if err != nil {
		panic(err)
	}

	imageStore, err := image.NewImageStore(ifs, layerStore)
	if err != nil {
		panic(err)
	}
	im := imageStore.Map()
	for k, v := range im {
		println(k, v.Size)
	}

	refStoreLocation := filepath.Join(imageRoot, `repositories.json`)
	rs, err := refstore.NewReferenceStore(refStoreLocation)
	if err != nil {
		panic(err)
	}
	_ = rs
	distributionMetadataStore, err := dmetadata.NewFSMetadataStore(filepath.Join(imageRoot, "distribution"))
	if err != nil {
		panic(err)
	}
	_ = distributionMetadataStore
	imgSvcConfig := images.ImageServiceConfig{
		DistributionMetadataStore: distributionMetadataStore,
		ImageStore:                imageStore,
		LayerStore:                layerStore,
		ReferenceStore:            rs,
	}
	imageService := images.NewImageService(imgSvcConfig)
	is, err := imageService.Images(context.TODO(), types.ImageListOptions{})
	if err != nil {
		panic(err)
	}
	// DistributionServices is what builder-next hands to buildkit.
	_ = imageService.DistributionServices()
	for _, i := range is {
		println(i.ID)
	}
}

VoVAllen commented on August 16, 2024

nerdctl uses a newer buildkit https://github.com/containerd/nerdctl/blob/e77e05b5fd252274e3727e0439e9a2d45622ccb9/Dockerfile.d/SHA256SUMS.d/buildkit-v0.10.3. Can we leverage this?

gaocegege commented on August 16, 2024

nerdctl uses a newer buildkit https://github.com/containerd/nerdctl/blob/e77e05b5fd252274e3727e0439e9a2d45622ccb9/Dockerfile.d/SHA256SUMS.d/buildkit-v0.10.3. Can we leverage this?

We are already using a newer buildkit than nerdctl.

gaocegege commented on August 16, 2024

Found the root cause: docker/docker/layer.Store loads /var/lib/docker/image/overlay2/layerdb in its New func, so layers added afterwards cannot be found. 😟

// newStoreFromGraphDriver creates a new Store instance using the provided
// metadata store and graph driver. The metadata store will be used to restore
// the Store.
func newStoreFromGraphDriver(root string, driver graphdriver.Driver) (Store, error) {
	caps := graphdriver.Capabilities{}
	if capDriver, ok := driver.(graphdriver.CapabilityDriver); ok {
		caps = capDriver.Capabilities()
	}

	ms, err := newFSMetadataStore(root)
	if err != nil {
		return nil, err
	}

	ls := &layerStore{
		store:       ms,
		driver:      driver,
		layerMap:    map[ChainID]*roLayer{},
		mounts:      map[string]*mountedLayer{},
		locker:      locker.New(),
		useTarSplit: !caps.ReproducesExactDiffs,
	}

	ids, mounts, err := ms.List()
	if err != nil {
		return nil, err
	}

	for _, id := range ids {
		l, err := ls.loadLayer(id)
		if err != nil {
			logrus.Debugf("Failed to load layer %s: %s", id, err)
			continue
		}
		if l.parent != nil {
			l.parent.referenceCount++
		}
	}

	for _, mount := range mounts {
		if err := ls.loadMount(mount); err != nil {
			logrus.Debugf("Failed to load mount %s: %s", mount, err)
		}
	}

	return ls, nil
}
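As background for the ids being loaded above: the directory names under layerdb/sha256 are layer chain IDs, derived from diff IDs by the Moby/OCI rule ChainID(L0) = DiffID(L0) and ChainID(Ln) = SHA256(ChainID(Ln-1) + " " + DiffID(Ln)). A small sketch (the helper `chainID` is mine, not a Moby API):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainID computes the layer chain ID for a stack of diff IDs, bottom first,
// following the Moby/OCI chain-ID rule described above.
func chainID(diffIDs []string) string {
	if len(diffIDs) == 0 {
		return ""
	}
	id := diffIDs[0] // the bottom layer's chain ID is its diff ID
	for _, d := range diffIDs[1:] {
		id = fmt.Sprintf("sha256:%x", sha256.Sum256([]byte(id+" "+d)))
	}
	return id
}

func main() {
	fmt.Println(chainID([]string{"sha256:aaa"})) // sha256:aaa
	fmt.Println(chainID([]string{"sha256:aaa", "sha256:bbb"}))
}
```

Since the ids are content-derived, two daemons writing the same layer would at least agree on the key; the problem is the in-memory layerMap built once at startup, not the naming.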

gaocegege commented on August 16, 2024

It's not really possible without significant changes in the code-base (and adding a lot of complexity); as mentioned: those daemon's won't know what's still being used by other daemons, so if one daemon pulls an image, the other daemon's won't know it's being pulled (so don't "see" the image in the list of images that's available locally), and if a daemon removes an image, the other daemons will fail (because an image that they expected to be there is suddenly gone).

moby/moby#37008

gaocegege commented on August 16, 2024

I think that is the endgame. I am closing the issue since it is not possible.

gaocegege commented on August 16, 2024

docker/buildx#1132 (comment)

docker 22.06-beta supports merge op. We can add a check in envd:

If the docker version is 20.xx, we use the runc worker; if it is 22.xx, we use the moby worker.
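That check could look roughly like this; the helper `workerFor` and the major-version threshold are illustrative assumptions based on the comment above, not a released envd API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// workerFor picks a buildkit worker from the dockerd server version string:
// moby for 22.06+ (whose bundled buildkitd supports merge op), runc otherwise.
func workerFor(serverVersion string) string {
	major, err := strconv.Atoi(strings.SplitN(serverVersion, ".", 2)[0])
	if err != nil || major < 22 {
		return "runc" // also the safe fallback for unparseable versions
	}
	return "moby"
}

func main() {
	fmt.Println(workerFor("20.10.16"))     // runc
	fmt.Println(workerFor("22.06.0-beta")) // moby
}
```

In practice the version string would come from the docker API's server version endpoint rather than being hard-coded.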

kemingy commented on August 16, 2024

