pdtpartners / nix-snapshotter
Brings native understanding of Nix packages to containerd
License: MIT License
It should be possible to use nix-snapshotter even with managed Kubernetes from cloud providers like EKS and GKE. See this blog article about using another snapshotter on EKS: https://blog.realvarez.com/using-estargz-to-reduce-container-startup-time-on-amazon-eks/
I'd absolutely love to use nix-snapshotter with podman's Quadlets for local, rootless .container and .kube systemd units. (I don't know of an equivalent systemd generator for k3s.) This would require nix-snapshotter to implement a CRI-O driver, AFAIK.
The CRI-O container runtime's default overlay storage (graph) driver can be configured with additional image stores and additional layer stores, which the overlay driver then uses for image lookup and layer lookup, respectively. This additional layer store functionality is used by nydus-storage-plugin to support Nydus images, and by stargz-snapshotter's Stargz Store CRI-O plugin to support lazy pulling of eStargz images.
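As a point of reference, this is roughly how Stargz Store hooks into CRI-O's overlay driver via /etc/containers/storage.conf; the store path below is illustrative, so check the stargz-snapshotter docs for the exact value:

```toml
# /etc/containers/storage.conf (sketch)
[storage]
driver = "overlay"

[storage.options]
# Extra read-only layer store consulted during layer lookup; the ":ref"
# suffix tells the overlay driver the store is indexed by layer reference.
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
```

A nix-snapshotter CRI-O integration would presumably plug in the same way, exposing nix layers through such a store.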
First of all, thanks for the great project! I've always wanted K8S to be declarative all the way down to container images. This is awesome!
I tried running the examples on k3s by commenting out ./kubernetes.nix in favor of ./k3s.nix in nix-snapshotter/modules/nixos/vm.nix (lines 11 to 12 in 6eb21bd), then ran nix run ".#vm" to get a VM.
Although it successfully boots a VM with k3s, I hit an error when pulling a Nix image like the preloaded one (e.g. kubectl apply -Rf /etc/kubernetes/redis results in a pod that fails with an image pull error). Doing the same steps as above but using ./kubernetes.nix works flawlessly.
The Push tests do not currently cover NewInmemoryProvider, so it would be good to add tests for it.
We want to verify that nix-snapshotter works with kubernetes. It would be nice to also include kubernetes in the try-it-out VM to easily test it out. Rootless mode is probably not worth the effort.
Let's take a look at some good READMEs around GitHub and write a proper README. I'll add some examples below as I find them.
At a high level we want:
We don't have to put everything in a single README.md; a common pattern is to have a docs/ directory and have the README link to it. We'll have to evaluate whether to split it up when we start writing.
Just a heads up, we have a k8s SIG-Node WG that is considering significant changes to the CRI around image services.
Security image access policies, authentication with in-process local key rings vs. over the RPC, GC cache policies, support for runtime handlers in the image service layer that choose which image to unpack from the image index (Windows platform versions, etc.) and which snapshotter to use (one per runtime handler), and more.
For these and a number of other reasons we should chat about other potential ways to hook into the image services api.
Thought:
Rootless nix-snapshotter + containerd will be especially great to use with home-manager, as this allows usage of our modules outside of a NixOS system, as long as it's systemd-based.
Note that containers still require linux namespaces, so a linux kernel is still needed. We should note that it won't work elsewhere.
Since Kubernetes is complex, writing a NixOS module for rootless Kubernetes seems difficult. Though there is usernetes, I'm not sure what they use underneath.
k3s is a single binary, and much simpler to configure. It is missing plumbing for the kubelet flag --image-service-endpoint here, but otherwise there are no known blockers: k3s-io/k3s#8279
Ideally both rootless k3s and rootless containerd modules should be upstreamed into Home-manager and/or nixpkgs.
As of #23 we have nix-snapshotter running in a NixOS VM. Let's leverage the NixOS test framework to write integration tests.
Initially we should create the scaffolding to make this possible, but later we'll want to flesh it out with various integration tests for default.nix exported functions like buildImage and copyToRegistry, as well as for using containerd + nix-snapshotter with kubernetes.
I'm trying to run nix-snapshotter using the home-manager setup from the readme, but the containerd systemd service doesn't start, and gives the following error:
containerd-rootless[316090]: [rootlesskit:parent] error: failed to setup UID/GID map: newuidmap 316098 [0 1000 1 1 100000 65536] failed: : exec: "newuidmap": executable file not found in $PATH
Currently we have our own pushing code under pkg/nix2container, but we still lack support for exporting as an OCI tarball, directly to containerd, and so forth.
So using skopeo will let us do the following exports: containers-storage (for podman/buildah), dir (non-standardized format), docker (registry), docker-archive, docker-daemon, oci (non-archive), and oci-archive.
However, skopeo will not support direct export to containerd: containers/image#1572
Perhaps we should write our own after all. Much of the utility is in nerdctl, but it's a pretty big package that I want to avoid depending on (e.g. nix-snapshotter would transitively depend on stargz-snapshotter):
We need to complete docs/manual-install.md covering how to set up nix-snapshotter manually, with examples of the systemd units and documentation around the TOML config for containerd, nix-snapshotter, kubernetes, and nerdctl.
Running NixOS 23.11, with home-manager.
The following additions to home-manager work fine:
imports = [
nix-snapshotter.homeModules.default
];
nixpkgs.overlays = [ nix-snapshotter.overlays.default ];
virtualisation.containerd.rootless = {
enable = true;
nixSnapshotterIntegration = true;
};
However, when I attempt the final change (and nixos-rebuild switch):
services.nix-snapshotter.rootless = {
enable = true;
};
I get the error: error: nix-snapshotter cannot be found in pkgs
I've tried this both with and without flakes enabled at the nixos level, and the error is the same.
Curiously, removing the problematic final change above and adding home.packages = with pkgs; [ nix-snapshotter ] does not give an error. So clearly pkgs is being correctly extended with nix-snapshotter, but for some reason it's not appearing for services.nix-snapshotter.rootless.
Any thoughts?
Investigate whether GH's container registry is free. If it is, we can move our prebuilt images like hinshun/hello:nix to it and also wire it up with CI.
I'm trying out the Flake instructions. I get this error when building nix-snapshotter:
--- FAIL: TestSnapshotter (0.01s)
--- FAIL: TestSnapshotter/no_opt (0.01s)
--- FAIL: TestSnapshotter/no_opt/TestSnapshotterView (0.00s)
snapshotter_overlay_test.go:389: []string{
"lowerdir=/build/TestSnapshotterno_optTestSnapshotterView20029197"...,
+ "userxattr",
"volatile",
}
--- FAIL: TestSnapshotter/no_opt/TestSnapshotterOverlayMount (0.00s)
snapshotter_overlay_test.go:389: []string{
"lowerdir=/build/TestSnapshotterno_optTestSnapshotterOverlayMount"...,
"upperdir=/build/TestSnapshotterno_optTestSnapshotterOverlayMount"...,
+ "userxattr",
"workdir=/build/TestSnapshotterno_optTestSnapshotterOverlayMount2"...,
}
--- FAIL: TestSnapshotter/AsynchronousRemove (0.00s)
--- FAIL: TestSnapshotter/AsynchronousRemove/TestSnapshotterView (0.00s)
snapshotter_overlay_test.go:389: []string{
"lowerdir=/build/TestSnapshotterAsynchronousRemoveTestSnapshotter"...,
+ "userxattr",
"volatile",
}
--- FAIL: TestSnapshotter/AsynchronousRemove/TestSnapshotterOverlayMount (0.00s)
snapshotter_overlay_test.go:389: []string{
"lowerdir=/build/TestSnapshotterAsynchronousRemoveTestSnapshotter"...,
"upperdir=/build/TestSnapshotterAsynchronousRemoveTestSnapshotter"...,
+ "userxattr",
"workdir=/build/TestSnapshotterAsynchronousRemoveTestSnapshotterO"...,
}
I'm not sure of the root cause of the error; the machine is running k3s on NixOS, with kernel 5.15.114.
Looks like KVM is not allowed on standard runners, and we'll have to upgrade to larger runners to run NixOS tests on GH Actions. If we are able to get them, then here's how to enable KVM. See this example repo, but it's using a third-party GH runner:
- name: Enable KVM group perms
run: |
echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
Use cachix/install-nix-action to enable kvm:

- uses: cachix/install-nix-action@v16
  with:
    extra_nix_config: "system-features = nixos-test benchmark big-parallel kvm"

Make /dev/kvm accessible:

sudo chmod o+rw /dev/kvm
nix flake check
Currently nix flake check is failing since nixosConfigurations.vm has no fileSystems."/" or boot.loader.grub.device defined. This is because the default options for the qemu-vm.nix module from upstream start the VM without a boot loader to speed up startup. However, there are also assertions that assume these should be set, even though nixos-rebuild build-vm is happy with it.
Default nerdctl via config to use nix-snapshotter, so usage can be simplified without passing --snapshotter nix.
So far we have only tested nix-snapshotter on NixOS, so it would be good to make sure it works on other operating systems like Ubuntu.
So at a high level, nix-snapshotter is basically the overlayfs snapshotter, but with additional support for nix layers.
Currently there's a lot of code duplication between pkg/nix/nix.go and the overlayfs snapshotter from containerd: https://github.com/containerd/containerd/blob/main/snapshots/overlay/overlay.go
This is because we use private members like o.ms:
ctx, t, err := o.ms.TransactionContext(ctx, false)
to handle the metadata layer's bolt transactions.
If you look around the remote snapshotter ecosystem, you'll see this duplicated in the same way:
https://github.com/containerd/stargz-snapshotter/blob/main/snapshot/snapshot.go
We should consider upstreaming a refactor to make it possible to embed it, thus deleting a lot of code from pkg/nix/nix.go.
It would be worth looking into how much code coverage we have with our tests, and improving it in areas that are lacking.
This is a super cool project, and I can't wait to play around with it :)) but I believe that discoverability will suffer due to the poor repository name. May I suggest a name along the lines of nix-native-oci-images?
The ceremony around setting up rootless containerd and nix-snapshotter running in the same user namespace is a bit complex. We should complete docs/rootless.md with a diagram or two explaining this, as well as the advanced module options around bindMounts and nsenter for extending rootless containerd with other sibling services (like fuse-overlayfs, other snapshotters, etc.).
Nomad is Hashicorp's container orchestrator. They have a containerd driver, which means it's very likely that nix-snapshotter is usable with Nomad without much effort. If there is a Nomad NixOS module, we should probably try it out and add documentation for it.
Error message:
rpc error: code = InvalidArgument desc = unable to initialize unpacker: no unpack platforms defined: invalid argument
Source of error:
nix-snapshotter/pkg/nix2container/load.go
Line 51 in 6eb21bd
$ sudo nano /var/snap/microk8s/current/args/kubelet
# Add this to the end of the file:
# --image-service-endpoint=unix:///run/nix-snapshotter/nix-snapshotter.sock
Starting nix-snapshotter:
$ git clone https://github.com/pdtpartners/nix-snapshotter
$ cd nix-snapshotter
$ sudo "$(nix-build)/bin/nix-snapshotter"
We deploy to microk8s using:
kubectl apply -f "$(nix-build image.nix)"
# image.nix
{ pkgs ? import (builtins.fetchTarball {
url = "https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz";
sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
}) {}
, nix-snapshotter ? import (builtins.fetchTarball {
url = "https://github.com/pdtpartners/nix-snapshotter/archive/6eb21bd3429535646da4aa396bb0c1f81a9b72c6.tar.gz";
sha256 = "11sfy3kf046p8kacp7yh8ijjpp6php6q8wxlbya1v5q53h3980v1";
})
}:
let
redis-image = nix-snapshotter.default.buildImage {
name = "abc123-redis";
tag = "latest";
config.entrypoint = [ "${pkgs.redis}/bin/redis-server" ];
};
in
pkgs.writeText "pod.json" (builtins.toJSON rec {
apiVersion = "v1";
kind = "Pod";
metadata.name = "redis";
metadata.labels.name = metadata.name;
spec.containers = [{
inherit (metadata) name;
args = [ "--protected-mode" "no" ];
image = "nix:0${redis-image}";
ports = [{
name = "client";
containerPort = 6379;
}];
}];
})
Ideally, compile it with Nix instead of plain Go tooling, so tutorials for normal Go projects might not be a good fit for this project.
Try doing some research, but this is likely a good idea:
Let's keep it to compile only, and then later we can add static-analysis/linting, unit-tests, and integration tests.
Currently nix-snapshotter is configured via two optional positional args to main.go. Let's refactor this into a TOML file, and allow a --config arg to be passed to main.go.
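A sketch of what such a config could look like once refactored; both field names and paths here are hypothetical and would need to match whatever the refactor defines:

```toml
# Hypothetical nix-snapshotter config.toml; field names are illustrative.
address = "/run/nix-snapshotter/nix-snapshotter.sock"
root = "/var/lib/containerd/io.containerd.snapshotter.v1.nix"
```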
I prefer using urfave/cli/v2. See this project as an example of organizing the commands: https://github.com/openllb/hlb/blob/master/cmd/hlb/main.go#L7
Test that we can run nix-snapshotter against a non-conventional /nix/store dir path. We should default to /nix/store and allow overriding via a WithNixStoreDir snapshotter opt, with an accompanying NixOS test to verify everything works as expected.
Currently we are using crictl to pull our images and ctr to run our containers. As ctr is quite low level, it would make more sense to combine these commands into a single nerdctl run command.
Hi, I've tried to run the example but got:
error: flake 'github:pdtpartners/nix-snapshotter' does not provide attribute 'apps.aarch64-linux.vm', 'packages.aarch64-linux.vm', 'legacyPackages.aarch64-linux.vm' or 'vm'
I can see systems = [ "x86_64-linux" ]; in flake.nix. Is there any way to get around that?
Currently nix-snapshotter is pretty quiet. We should add some appropriate logs to help debug issues when they arise and to generally confirm that it's working the way we expect.
Some ideas:
- In main.go, log when we start listening, e.g.:
  Initialized Nix Snapshotter
  Registered GRPC snapshots server
  Listening on unix://<addr>
- Log the nix build --out-link ... invocation used for substitution before running the exec.Command
Since remotes.Pusher is an interface, we can implement one that mocks out the networking and use it to help us validate test expectations. This will probably involve adding a variadic option pattern to Push so that one can override which remotes.Pusher it should use.
A table test makes sense in this case, because there are many variants of images to test: for example, vanilla images, nix2container images, hybrid images, empty images, and so on.
Hello! First, thank you so much for this, I love the idea! And the feedback loop is soooooo much faster!
I'm struggling to find the reason behind this error message, which only appears in this particular setup: docteurklein/nixok@208201b#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R114
When I'm shipping the same config but using nix2container directly, my pod starts, but when I use nix-snapshotter, I get this error message:
Error: failed to create containerd container: failed to mount /var/lib/containerd/tmpmounts/containerd-mount…: no such file or directory
Any idea?
sudo ctr -n k8s.io -a /run/containerd/containerd.sock c info d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3
{
"ID": "d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3",
"Labels": {
"app": "s1",
"io.cri-containerd.kind": "sandbox",
"io.kubernetes.pod.name": "s1-648769894-xs96h",
"io.kubernetes.pod.namespace": "default",
"io.kubernetes.pod.uid": "0b8c18e0-87ca-41ac-a0ab-e58df9d79a14",
"pod-template-hash": "648769894"
},
"Image": "docker.io/library/pause:latest",
"Runtime": {
"Name": "io.containerd.runc.v2",
"Options": {
"type_url": "containerd.runc.v1.Options",
"value": "SAE="
}
},
"SnapshotKey": "d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3",
"Snapshotter": "nix",
"CreatedAt": "2023-10-03T13:32:31.850481864Z",
"UpdatedAt": "2023-10-03T13:32:31.850481864Z",
"Extensions": {
"io.cri-containerd.sandbox.metadata": {
"type_url": "github.com/containerd/cri/pkg/store/sandbox/Metadata",
"value": "eyJWZXJzaW9uIjoidjEiLCJNZXRhZGF0YSI6eyJJRCI6ImQ0MGIyMjg0ZGU2NjlkMTY2YThhMWZhYzMyOTcxY2RiODhjNWUzODJiMWQ0ZDE3MmNhYjZmNTg1NDI3OGZlZDMiLCJOYW1lIjoiczEtNjQ4NzY5ODk0LXhzOTZoX2RlZmF1bHRfMGI4YzE4ZTAtODdjYS00MWFjLWEwYWItZTU4ZGY5ZDc5YTE0XzAiLCJDb25maWciOnsibWV0YWRhdGEiOnsibmFtZSI6InMxLTY0ODc2OTg5NC14czk2aCIsInVpZCI6IjBiOGMxOGUwLTg3Y2EtNDFhYy1hMGFiLWU1OGRmOWQ3OWExNCIsIm5hbWVzcGFjZSI6ImRlZmF1bHQifSwiaG9zdG5hbWUiOiJzMS02NDg3Njk4OTQteHM5NmgiLCJsb2dfZGlyZWN0b3J5IjoiL3Zhci9sb2cvcG9kcy9kZWZhdWx0X3MxLTY0ODc2OTg5NC14czk2aF8wYjhjMThlMC04N2NhLTQxYWMtYTBhYi1lNThkZjlkNzlhMTQiLCJkbnNfY29uZmlnIjp7InNlcnZlcnMiOlsiMTAuMC4wLjI1NCJdLCJzZWFyY2hlcyI6WyJkZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwic3ZjLmNsdXN0ZXIubG9jYWwiLCJjbHVzdGVyLmxvY2FsIl0sIm9wdGlvbnMiOlsibmRvdHM6NSJdfSwibGFiZWxzIjp7ImFwcCI6InMxIiwiaW8ua3ViZXJuZXRlcy5wb2QubmFtZSI6InMxLTY0ODc2OTg5NC14czk2aCIsImlvLmt1YmVybmV0ZXMucG9kLm5hbWVzcGFjZSI6ImRlZmF1bHQiLCJpby5rdWJlcm5ldGVzLnBvZC51aWQiOiIwYjhjMThlMC04N2NhLTQxYWMtYTBhYi1lNThkZjlkNzlhMTQiLCJwb2QtdGVtcGxhdGUtaGFzaCI6IjY0ODc2OTg5NCJ9LCJhbm5vdGF0aW9ucyI6eyJrdWJlcm5ldGVzLmlvL2NvbmZpZy5zZWVuIjoiMjAyMy0xMC0wM1QxNTozMjozMS41MjI0NDEzNjQrMDI6MDAiLCJrdWJlcm5ldGVzLmlvL2NvbmZpZy5zb3VyY2UiOiJhcGkiLCJwcm9maWxlcy5ncmFmYW5hLmNvbS9jcHUucG9ydF9uYW1lIjoiaHR0cC1tZXRyaWNzIiwicHJvZmlsZXMuZ3JhZmFuYS5jb20vY3B1LnNjcmFwZSI6InRydWUiLCJwcm9maWxlcy5ncmFmYW5hLmNvbS9tZW1vcnkucG9ydF9uYW1lIjoiaHR0cC1tZXRyaWNzIiwicHJvZmlsZXMuZ3JhZmFuYS5jb20vbWVtb3J5LnNjcmFwZSI6InRydWUifSwibGludXgiOnsiY2dyb3VwX3BhcmVudCI6Ii9rdWJlcG9kcy5zbGljZS9rdWJlcG9kcy1iZXN0ZWZmb3J0LnNsaWNlL2t1YmVwb2RzLWJlc3RlZmZvcnQtcG9kMGI4YzE4ZTBfODdjYV80MWFjX2EwYWJfZTU4ZGY5ZDc5YTE0LnNsaWNlIiwic2VjdXJpdHlfY29udGV4dCI6eyJuYW1lc3BhY2Vfb3B0aW9ucyI6eyJwaWQiOjF9LCJzdXBwbGVtZW50YWxfZ3JvdXBzIjpbMTAwMF0sInNlY2NvbXAiOnt9fSwib3ZlcmhlYWQiOnt9LCJyZXNvdXJjZXMiOnsiY3B1X3BlcmlvZCI6MTAwMDAwLCJjcHVfc2hhcmVzIjoyfX19LCJOZXROU1BhdGgiOiIvdmFyL3J1bi9uZXRucy9jbmktNTg3Y2I2ZjQtODE1NS1iNWIyLTEwMmQtZjVjOGI5MDRkMWRlIiwiSVAiOiIxMC4xLjAuMjciLCJBZGRpdGlvbmFsSVBzIjpudWxsLCJSdW50aW1lSGFuZGxlciI6Ii
IsIkNOSVJlc3VsdCI6eyJJbnRlcmZhY2VzIjp7ImV0aDAiOnsiSVBDb25maWdzIjpbeyJJUCI6IjEwLjEuMC4yNyIsIkdhdGV3YXkiOiIxMC4xLjAuMSJ9XSwiTWFjIjoiZmU6NmY6ZDI6ZGI6YjU6MTMiLCJTYW5kYm94IjoiL3Zhci9ydW4vbmV0bnMvY25pLTU4N2NiNmY0LTgxNTUtYjViMi0xMDJkLWY1YzhiOTA0ZDFkZSJ9LCJsbyI6eyJJUENvbmZpZ3MiOlt7IklQIjoiMTI3LjAuMC4xIiwiR2F0ZXdheSI6IiJ9LHsiSVAiOiI6OjEiLCJHYXRld2F5IjoiIn1dLCJNYWMiOiIwMDowMDowMDowMDowMDowMCIsIlNhbmRib3giOiIvdmFyL3J1bi9uZXRucy9jbmktNTg3Y2I2ZjQtODE1NS1iNWIyLTEwMmQtZjVjOGI5MDRkMWRlIn0sIm15bmV0Ijp7IklQQ29uZmlncyI6bnVsbCwiTWFjIjoiYTI6OTE6NDY6NTM6MjU6NzciLCJTYW5kYm94IjoiIn0sInZldGg3NmIwY2VkNyI6eyJJUENvbmZpZ3MiOm51bGwsIk1hYyI6IjQyOjFiOjJhOjgzOjFkOjgyIiwiU2FuZGJveCI6IiJ9fSwiRE5TIjpbe30se31dLCJSb3V0ZXMiOlt7ImRzdCI6IjEwLjEuMC4wLzE2In0seyJkc3QiOiIwLjAuMC4wLzAiLCJndyI6IjEwLjEuMC4xIn1dfSwiUHJvY2Vzc0xhYmVsIjoiIn19"
}
},
"SandboxID": "",
"Spec": {
"ociVersion": "1.1.0",
"process": {
"user": {
"uid": 0,
"gid": 0,
"additionalGids": [
1000
]
},
"args": [
"/bin/pause"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
]
},
"noNewPrivileges": true,
"oomScoreAdj": -998
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "s1-648769894-xs96h",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/dev/shm",
"type": "bind",
"source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3/shm",
"options": [
"rbind",
"ro",
"nosuid",
"nodev",
"noexec"
]
},
{
"destination": "/etc/resolv.conf",
"type": "bind",
"source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3/resolv.conf",
"options": [
"rbind",
"ro",
"nosuid",
"nodev",
"noexec"
]
}
],
"annotations": {
"io.kubernetes.cri.container-type": "sandbox",
"io.kubernetes.cri.sandbox-cpu-period": "100000",
"io.kubernetes.cri.sandbox-cpu-quota": "0",
"io.kubernetes.cri.sandbox-cpu-shares": "2",
"io.kubernetes.cri.sandbox-id": "d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3",
"io.kubernetes.cri.sandbox-log-directory": "/var/log/pods/default_s1-648769894-xs96h_0b8c18e0-87ca-41ac-a0ab-e58df9d79a14",
"io.kubernetes.cri.sandbox-memory": "0",
"io.kubernetes.cri.sandbox-name": "s1-648769894-xs96h",
"io.kubernetes.cri.sandbox-namespace": "default",
"io.kubernetes.cri.sandbox-uid": "0b8c18e0-87ca-41ac-a0ab-e58df9d79a14"
},
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"cpu": {
"shares": 2
}
},
"cgroupsPath": "kubepods-besteffort-pod0b8c18e0_87ca_41ac_a0ab_e58df9d79a14.slice:cri-containerd:d40b2284de669d166a8a1fac32971cdb88c5e382b1d4d172cab6f5854278fed3",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "network",
"path": "/var/run/netns/cni-587cb6f4-8155-b5b2-102d-f5c8b904d1de"
}
],
"seccomp": {
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"names": [
"accept",
"accept4",
"access",
"adjtimex",
"alarm",
"bind",
"brk",
"capget",
"capset",
"chdir",
"chmod",
"chown",
"chown32",
"clock_adjtime",
"clock_adjtime64",
"clock_getres",
"clock_getres_time64",
"clock_gettime",
"clock_gettime64",
"clock_nanosleep",
"clock_nanosleep_time64",
"close",
"close_range",
"connect",
"copy_file_range",
"creat",
"dup",
"dup2",
"dup3",
"epoll_create",
"epoll_create1",
"epoll_ctl",
"epoll_ctl_old",
"epoll_pwait",
"epoll_pwait2",
"epoll_wait",
"epoll_wait_old",
"eventfd",
"eventfd2",
"execve",
"execveat",
"exit",
"exit_group",
"faccessat",
"faccessat2",
"fadvise64",
"fadvise64_64",
"fallocate",
"fanotify_mark",
"fchdir",
"fchmod",
"fchmodat",
"fchown",
"fchown32",
"fchownat",
"fcntl",
"fcntl64",
"fdatasync",
"fgetxattr",
"flistxattr",
"flock",
"fork",
"fremovexattr",
"fsetxattr",
"fstat",
"fstat64",
"fstatat64",
"fstatfs",
"fstatfs64",
"fsync",
"ftruncate",
"ftruncate64",
"futex",
"futex_time64",
"futex_waitv",
"futimesat",
"getcpu",
"getcwd",
"getdents",
"getdents64",
"getegid",
"getegid32",
"geteuid",
"geteuid32",
"getgid",
"getgid32",
"getgroups",
"getgroups32",
"getitimer",
"getpeername",
"getpgid",
"getpgrp",
"getpid",
"getppid",
"getpriority",
"getrandom",
"getresgid",
"getresgid32",
"getresuid",
"getresuid32",
"getrlimit",
"get_robust_list",
"getrusage",
"getsid",
"getsockname",
"getsockopt",
"get_thread_area",
"gettid",
"gettimeofday",
"getuid",
"getuid32",
"getxattr",
"inotify_add_watch",
"inotify_init",
"inotify_init1",
"inotify_rm_watch",
"io_cancel",
"ioctl",
"io_destroy",
"io_getevents",
"io_pgetevents",
"io_pgetevents_time64",
"ioprio_get",
"ioprio_set",
"io_setup",
"io_submit",
"io_uring_enter",
"io_uring_register",
"io_uring_setup",
"ipc",
"kill",
"landlock_add_rule",
"landlock_create_ruleset",
"landlock_restrict_self",
"lchown",
"lchown32",
"lgetxattr",
"link",
"linkat",
"listen",
"listxattr",
"llistxattr",
"_llseek",
"lremovexattr",
"lseek",
"lsetxattr",
"lstat",
"lstat64",
"madvise",
"membarrier",
"memfd_create",
"memfd_secret",
"mincore",
"mkdir",
"mkdirat",
"mknod",
"mknodat",
"mlock",
"mlock2",
"mlockall",
"mmap",
"mmap2",
"mprotect",
"mq_getsetattr",
"mq_notify",
"mq_open",
"mq_timedreceive",
"mq_timedreceive_time64",
"mq_timedsend",
"mq_timedsend_time64",
"mq_unlink",
"mremap",
"msgctl",
"msgget",
"msgrcv",
"msgsnd",
"msync",
"munlock",
"munlockall",
"munmap",
"name_to_handle_at",
"nanosleep",
"newfstatat",
"_newselect",
"open",
"openat",
"openat2",
"pause",
"pidfd_open",
"pidfd_send_signal",
"pipe",
"pipe2",
"pkey_alloc",
"pkey_free",
"pkey_mprotect",
"poll",
"ppoll",
"ppoll_time64",
"prctl",
"pread64",
"preadv",
"preadv2",
"prlimit64",
"process_mrelease",
"pselect6",
"pselect6_time64",
"pwrite64",
"pwritev",
"pwritev2",
"read",
"readahead",
"readlink",
"readlinkat",
"readv",
"recv",
"recvfrom",
"recvmmsg",
"recvmmsg_time64",
"recvmsg",
"remap_file_pages",
"removexattr",
"rename",
"renameat",
"renameat2",
"restart_syscall",
"rmdir",
"rseq",
"rt_sigaction",
"rt_sigpending",
"rt_sigprocmask",
"rt_sigqueueinfo",
"rt_sigreturn",
"rt_sigsuspend",
"rt_sigtimedwait",
"rt_sigtimedwait_time64",
"rt_tgsigqueueinfo",
"sched_getaffinity",
"sched_getattr",
"sched_getparam",
"sched_get_priority_max",
"sched_get_priority_min",
"sched_getscheduler",
"sched_rr_get_interval",
"sched_rr_get_interval_time64",
"sched_setaffinity",
"sched_setattr",
"sched_setparam",
"sched_setscheduler",
"sched_yield",
"seccomp",
"select",
"semctl",
"semget",
"semop",
"semtimedop",
"semtimedop_time64",
"send",
"sendfile",
"sendfile64",
"sendmmsg",
"sendmsg",
"sendto",
"setfsgid",
"setfsgid32",
"setfsuid",
"setfsuid32",
"setgid",
"setgid32",
"setgroups",
"setgroups32",
"setitimer",
"setpgid",
"setpriority",
"setregid",
"setregid32",
"setresgid",
"setresgid32",
"setresuid",
"setresuid32",
"setreuid",
"setreuid32",
"setrlimit",
"set_robust_list",
"setsid",
"setsockopt",
"set_thread_area",
"set_tid_address",
"setuid",
"setuid32",
"setxattr",
"shmat",
"shmctl",
"shmdt",
"shmget",
"shutdown",
"sigaltstack",
"signalfd",
"signalfd4",
"sigprocmask",
"sigreturn",
"socketcall",
"socketpair",
"splice",
"stat",
"stat64",
"statfs",
"statfs64",
"statx",
"symlink",
"symlinkat",
"sync",
"sync_file_range",
"syncfs",
"sysinfo",
"tee",
"tgkill",
"time",
"timer_create",
"timer_delete",
"timer_getoverrun",
"timer_gettime",
"timer_gettime64",
"timer_settime",
"timer_settime64",
"timerfd_create",
"timerfd_gettime",
"timerfd_gettime64",
"timerfd_settime",
"timerfd_settime64",
"times",
"tkill",
"truncate",
"truncate64",
"ugetrlimit",
"umask",
"uname",
"unlink",
"unlinkat",
"utime",
"utimensat",
"utimensat_time64",
"utimes",
"vfork",
"vmsplice",
"wait4",
"waitid",
"waitpid",
"write",
"writev"
],
"action": "SCMP_ACT_ALLOW"
},
{
"names": [
"socket"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 40,
"op": "SCMP_CMP_NE"
}
]
},
{
"names": [
"personality"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 0,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"personality"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 8,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"personality"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 131072,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"personality"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 131080,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"personality"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 4294967295,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"process_vm_readv",
"process_vm_writev",
"ptrace"
],
"action": "SCMP_ACT_ALLOW"
},
{
"names": [
"arch_prctl",
"modify_ldt"
],
"action": "SCMP_ACT_ALLOW"
},
{
"names": [
"chroot"
],
"action": "SCMP_ACT_ALLOW"
},
{
"names": [
"clone"
],
"action": "SCMP_ACT_ALLOW",
"args": [
{
"index": 0,
"value": 2114060288,
"op": "SCMP_CMP_MASKED_EQ"
}
]
},
{
"names": [
"clone3"
],
"action": "SCMP_ACT_ERRNO",
"errnoRet": 38
}
]
},
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
}
Looks like the kubernetes module takes over the /etc/cni/net.d directory, making it read-only, since it needs to configure flannel, a layer 3 network fabric for k8s. See: https://github.com/NixOS/nixpkgs/blob/53bbb203e013e8fbbcddd9f205e73674475f129a/nixos/modules/services/cluster/kubernetes/kubelet.nix#L250
This stops nerdctl from working properly, since it wants to write /etc/cni/net.d/nerdctl-bridge.conflist.
Our options:
We should update this as we think of more things to do:
Depends on #10
Since the snapshotter is mostly stateless (besides creating directories and metadata), and the outputs are just golang []mount.Mount in-memory structs, we should be able to write good unit tests that don't actually mount. (We do want integration tests in the future to test that the mount.Mount structs are valid arguments to the mount syscall, though.)
See these snapshotter tests for inspiration:
When writing an overlay-based snapshotter, you typically want to maintain compatibility with regular images. It would be nice to also run the overlay test suite against your own snapshotter, so we should upstream a PR that refactors the tests into snapshots/testsuite/overlay.go as an exported function testsuite.OverlaySnapshotterSuite.
In order to make sure the examples in the README are well maintained, we should add automated testing for the examples and installation instructions. There are two approaches we can take:
I think it's probably cleaner to do (2), so that it's easier to maintain the README with no generation step. Let's find out what solutions there are for (2) and whether any nix project does something similar.
For testing against home-manager, we likely want a development flake, as we don't want to introduce home-manager as a top-level dependency.
Now that we've refactored out a WithNixBuilder, it's becoming increasingly obvious that we should pass the full nix store path to the builder, allowing it to decide how to handle that path. For example, if you have multiple nix store dirs, you'd want to run only one nix-snapshotter instead of a kubernetes + nix-snapshotter pair for each nix store dir, or add a proxy (too complicated).
Instead, the much simpler solution is to pass the nix store path directly to the builder, allowing it to choose a nix binary built for another nix store dir.
This will change the image specification as well, so we need to re-push hinshun/hello:nix until we have kubernetes working with loaded images.
Spinning off a new issue from: #6
Another avenue to make nix-snapshotter easy to consume is to use NixOS's great support for qemu VMs, which will let someone quickly try out a NixOS VM with containerd + nix-snapshotter configured with the root nix store.
We'll need to write a few modules to set up containerd and nix-snapshotter as systemd services, then expose the modules as nixosModules.default as a flake output.
The Snapshotter specification requires that Prepare, View, and Mounts return the same mounts. Due to an adjustment to Prepare, this is no longer true, so Mounts and View should be fixed accordingly.
Technically speaking, the code in pkg/nix2container/build.go doesn't necessarily need nix store paths, just arbitrary paths. This means we can write effective unit tests without involving nix.
We need to investigate how to make nix-snapshotter easier to try out. Not everyone is familiar with configuring containerd or comfortable running services as root, so there should be a better way.
For example, we can investigate rootless containers.
Another is #19, which exposes nix-snapshotter as NixOS modules usable with either a NixOS VM or NixOS directly.
Not everyone uses flakes, so it'll be nice for non-flake users to be able to utilize nix-snapshotter. Normally this is done by using flake-compat in `default.nix`, so we'll need to move the existing `default.nix` somewhere else.
Originally the derivation was kept in `default.nix` so that people could at least use the package, but non-flake use cases also cover NixOS modules and home-manager, so we should just move the nix-snapshotter derivation into flake-parts and let `flake-compat` provide the outputs.
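For reference, the conventional flake-compat shim reduces `default.nix` to a few lines (the pin is left unhashed here for brevity; a real shim should pin a specific revision with a `sha256`):

```nix
# default.nix: forward the flake's outputs to non-flake users via flake-compat.
(import
  (fetchTarball
    "https://github.com/edolstra/flake-compat/archive/master.tar.gz")
  { src = ./.; }
).defaultNix
```

With this in place, `import ./default.nix` exposes the same packages, NixOS modules, and overlays that flake users get, without requiring flakes to be enabled.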
Consider the following image:
{ writeShellScript, runtimeShell, nix-snapshotter }:
let
  hello-world = writeShellScript "hello-world" ''
    #!${runtimeShell}
    echo "Hello, world!"
  '';
in (nix-snapshotter.buildImage {
  name = "repro";
  resolvedByNix = true;
  config.entrypoint = [ hello-world ];
})
Built and loaded by this procedure:
$ drv=$(nix eval --raw --apply 'p: let nix-snapshotter = (builtins.getFlake "github:pdtpartners/nix-snapshotter/6eb21bd"); pkgs = import (builtins.getFlake "nixpkgs/f292b49") { overlays = [ nix-snapshotter.overlays.default ]; }; container = (pkgs.callPackage p {}).copyToContainerd {}; in container.drvPath' --file ./repro.nix)
$ nix build "$drv^out"
$ sudo result/bin/copy-to-containerd
When this container is run with `nerdctl run nix:0/nix/store/<path>:latest`, an error occurs:
FATA[0002] failed to mount {Type:bind Source:/nix/store/c2lyvs0iz8b3l6ijk1f9fz8ma8khxcxm-hello-world Target:/nix/store/c2lyvs0iz8b3l6ijk1f9fz8ma8khxcxm-hello-world Options:[ro rbind]} on "/tmp/initialC595249715": no such file or directory
This issue seems to be related to how the Nix closure layer is generated. Directory mountpoints are generated correctly, but file mountpoints don't appear in the final image.
See nix-snapshotter/pkg/nix2container/generate.go, lines 255 to 259 at 6eb21bd.
As of Docker Desktop 4.12.0, the Docker Engine has been slowly replacing its internals with containerd, and there is now experimental support for using containerd snapshotters for image storage. It may be possible to hook up Docker Engine / Desktop with nix-snapshotter so we can `docker run --rm ghcr.io/pdtpartners/hello`.
Flake-parts lets you compose your flake outputs as NixOS modules. This is not to be confused with using modules to define a NixOS system: NixOS modules can be used outside of NixOS thanks to the module system's type checking and composability.
This will help organize our different package outputs, rootless configuration, and nixos modules for the NixOS vm.
Ideally, we can run a rootless stack with just `nix run .#rootless`. We should investigate systemd user services to see if we can have a single entrypoint that runs several services (rootless containerd, nix-snapshotter with fuse-overlayfs).
The ideal UX is that Ctrl+C on the `nix run .#rootless` process tears down all the services. All the state and root directories should probably live in a gitignored `tmp` directory in this repository, so everything stays contained within the repository.
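One way this could be wired up, sketched as home-manager-style systemd user units (unit names, binaries, and ordering here are illustrative assumptions, not a working configuration), is to order nix-snapshotter after rootless containerd so one target can start and stop both:

```nix
{
  # Sketch: systemd user units for the rootless stack. Paths and
  # ExecStart values are assumptions; fuse-overlayfs setup is omitted.
  systemd.user.services = {
    containerd = {
      Unit.Description = "Rootless containerd";
      Service.ExecStart =
        "containerd --config %h/.config/containerd/config.toml";
    };
    nix-snapshotter = {
      Unit = {
        Description = "Rootless nix-snapshotter";
        Requires = [ "containerd.service" ];
        After = [ "containerd.service" ];
      };
      Service.ExecStart = "nix-snapshotter";
    };
  };
}
```

A `nix run .#rootless` entrypoint could then either drive these units or run the equivalent processes directly in the foreground, so that Ctrl+C stops the whole stack.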