kubernetes / registry.k8s.io Goto Github PK

This project is the repo for registry.k8s.io, the production OCI registry service for Kubernetes' container image artifacts

Home Page: https://registry.k8s.io

License: Apache License 2.0

Makefile 0.66% Go 95.62% Shell 3.72%

k8s-sig-infra kubernetes

registry.k8s.io's Introduction

registry.k8s.io

This project implements the backend for registry.k8s.io, Kubernetes's container image registry.

Known user-facing issues will be pinned at the top of our issue tracker.

For details on the implementation see cmd/archeio

The community deployment configs are documented at in the k8s.io repo with the rest of the community infra deployments, but primarily here.

For publishing to registry.k8s.io, refer to the docs at in k8s.io under registry.k8s.io/.

Stability

registry.k8s.io is GA and we ask that all users migrate from k8s.gcr.io as soon as possible.

However, unequivocally: DO NOT depend on the implementation details of this registry.

Please note that there is NO uptime SLA as this is a free, volunteer managed service. We will however do our best to respond to issues and the system is designed to be reliable and low-maintenance. If you need higher uptime guarantees please consider mirroring images to a location you control.

Other than registry.k8s.io serving an OCI compliant registry: API endpoints, IP addresses, and backing services used are subject to change at anytime as new resources become available or as otherwise necessary.

If you need to allow-list domains or IPs in your environment, we highly recommend mirroring images to a location you control instead.

The Kubernetes project is currently sending traffic to GCP and AWS thanks to their donations but we hope to redirect traffic to more sponsors and their respective API endpoints in the future to keep the project sustainable.

Privacy

This project abides by the Linux Foundation privacy policy, as documented at https://registry.k8s.io/privacy

Background

Previously all of Kubernetes' image hosting has been out of gcr.io ("Google Container Registry").

We've incurred significant egress traffic costs from users on other cloud providers in particular in doing so, severely limiting our ability to use the GCP credits from Google for purposes other than hosting end-user downloads.

We're now moving to shift all traffic behind a community controlled domain, so we can quickly implement cost-cutting measures like serving the bulk of the traffic for AWS-users from AWS-local storage funded by Amazon, or potentially leveraging other providers in the future.

For additional context on why we did this and what we're changing about kubernetes images see: https://kubernetes.io/blog/2022/11/28/registry-k8s-io-faster-cheaper-ga

Essentially, this repo implements the backend sources for the steps outlined there.

For a talk with more details see: "Why We Moved the Kubernetes Image Registry"

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Slack in channel #sig-k8s-infra
Mailing List

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

registry.k8s.io's People

Contributors

Stargazers

Watchers

Forkers

ameukam bobymcbobs bentheelder justinsb aojea dims cndoit18 justina777 harry1064 isabella232 llgoo nnhansg ajayk linuxsuren xiaoruiguo ehashman pravakar15031986 epk phoenixhadoop sujiiith lakeshsharma jithurjacob dabaooline terryhowe estesp rjsadow tzneal donhuvy ykakarap aditya09k somasekharab aliozturkseksen yunlongflower monsterspirit jcordovaj ye-hbonemyat cccc5 onceagain12 deepank-yadav webstean preeti07021994 arunavagit amaragodwin newlikeda1989 dkbyo ptxl benmaduz lmitchell19125 reddyconcept mikedow5672 zhoneym aryansharma9917 jenkinsgituser omahs isgasho swaroopska01 dearborn-open-ai nashub2012 bahtijar qwert510 xiaogao688

registry.k8s.io's Issues

OCI Distribution Conformance Testing

The OCI Distribution spec has a conformance test that you can run to see if a registry meets the OCI spec.

I ran this for archeio and we are doing great with blobs but not so well with the manifests.

https://github.com/opencontainers/distribution-spec/tree/main/conformance

Set the following variables:

export OCI_NAMESPACE=kubernetes/pause
export OCI_TAG_NAME=3.7
export OCI_TEST_PULL=1
export OCI_ROOT_URL=https://registry.k8s.io
export OCI_TEST_PUSH=0
export OCI_MANIFEST_DIGEST=sha256:f81611a21cf91214c1ea751c5b525931a0e2ebabe62b3937b6158039ff6f922d
export OCI_BLOB_DIGEST=sha256:221177c6082a88ea4f6240ab2450d540955ac6f4d5454f0e15751b653ebda165

We need to change a few things to make it comply with the spec.

Report link: https://htmlpreview.github.io/?https://github.com/upodroid/oci-proxy/blob/compliance-fixes/oci-compliance.html

/sig k8s-infra
/priority important-soon
/kind feature

Stress of registry-sandbox.k8s.io

oci-proxy is currently deployed over the 2 GCP regions: us-central1 and us-west1. We should do some stress tests to build confidence on what we are building and also ensure we can handle k8s.gcr.io traffic.

We should probably ask to GCR team some outdated metrics.

/sig k8s-infra
/priority important-longterm

determine if redirector needs to be AWS-region aware

We have two scenarios still on the table for handing off traffic to AWS for AWS clients:

We route to cloudfront + s3 or ecr or otherwise some other endpoint that is global and handles serving regional copies internally.
We ourselves have regional data storage (e.g. an s3 bucket per continent, region ...) and we need to perform region mapping ourselves before redirecting.

Both of these should be doable in the redirect service, it's mostly a matter of what makes the most sense on the AWS architecture side from a cost perspective. Both have been discussed but we haven't settled on which we'll provide.

Once we know this, we know what the redirection logic needs to look like in the redirector.
Parsing the AWS data is very easy, but the implementation and optimization will look different depending on if we need to detect regions or not.

/assign @jaypipes

I have a branch locally with benchmarks for 1) that I can easily clean up and turn into 1) or 2), but they look fairly different and I'd like to wait until we know this first.

Unable to pull coredns:v1.9.3 at Uganda.

Not exactly sure what happened,

but we are unable to pull coredns:v1.9.3 from registry.k8s.io,
while we can pull coredns:v1.9.3 from k8s.gcr.io.

We can successfully pull coredns:v1.9.3 from registry.k8s.io at Boston, MA though.

The dns lookup from Uganda shows that the IP of registry.k8s.io is 34.107.244.51.

workaround broken cloud run X-Forwarded-For header

See details debugged in #61 (comment)

/assign

been looking into the best forward plan for this, given it seems likely the bug will eventually be fixed.
will send a PR soonish hopefully.
/kind bug
/sig k8s-infra

why kubernetes images url format is `<registry_url>/<image_name>:<image_tag>`

what happend

here is kubespray images.list , and generate by manage-offline-container-images.sh

$ cat temp/images.list |grep registry.k8s.io
registry.k8s.io/pause:3.7
registry.k8s.io/coredns/coredns:v1.9.3
registry.k8s.io/dns/k8s-dns-node-cache:1.21.1
registry.k8s.io/cpa/cluster-proportional-autoscaler-amd64:1.8.5
registry.k8s.io/metrics-server/metrics-server:v0.6.2
registry.k8s.io/sig-storage/local-volume-provisioner:v2.5.0
registry.k8s.io/ingress-nginx/controller:v1.5.1
registry.k8s.io/sig-storage/csi-attacher:v3.3.0
registry.k8s.io/sig-storage/csi-provisioner:v3.0.0
registry.k8s.io/sig-storage/csi-snapshotter:v5.0.0
registry.k8s.io/sig-storage/snapshot-controller:v4.2.1
registry.k8s.io/sig-storage/csi-resizer:v1.3.0
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.4.0
registry.k8s.io/kube-apiserver:v1.25.5
registry.k8s.io/kube-controller-manager:v1.25.5
registry.k8s.io/kube-scheduler:v1.25.5
registry.k8s.io/kube-proxy:v1.25.5

why only kubernetes images url format is <registry_url>/<image_name>:<image_tag>

$ cat temp/images.list |grep registry.k8s.io/kube
registry.k8s.io/kube-apiserver:v1.25.5
registry.k8s.io/kube-controller-manager:v1.25.5
registry.k8s.io/kube-scheduler:v1.25.5
registry.k8s.io/kube-proxy:v1.25.5

$ cat temp/images.list |grep registry.k8s.io/pause
registry.k8s.io/pause:3.7

i don't know why container images can have two format, but I think Kubernetes image should also comply with the common image address standard---format2.

format1: <registry_url>/<image_name>:<image_tag>

format2: <registry_url>/<project_name>/<image_name>:<image_tag>

kubespray offline design

in offline environment, kubespray let user use a local registry url with kube_image_repo: "{{ registry_host }}" variable

However, because the kubernetes image uses a chaotic image format, it is difficult to align：

eg : i can't config kube_image_repo: "192.168.72.16", if i push registry.k8s.io/pause:3.7-->192.168.72.16/kube/pause:3.7（harbor）, Kubeadm will not find this image address, it only find 192.168.72.16/pause:3.7

what the problems

when i want push all images to offline harbor registry, This mix of image lists in different formats can lead to confusion， and it's hard for push shell scripts to Handle this situation.

and final it cause me must use docker registry (registry:2), because harbor only support format2.

suggest

registry.k8s.io/kube/pause:3.7
registry.k8s.io/kube/kube-apiserver:v1.25.5
registry.k8s.io/kube/kube-controller-manager:v1.25.5
registry.k8s.io/kube/kube-scheduler:v1.25.5
registry.k8s.io/kube/kube-proxy:v1.25.5

Enable Automated Builds

We need to finish the ko build script 🙃
We'll need to file a request in k8s.io for a staging registry etc.
Then we can enable auto-push on postsubmit with prow/cloudbuild (https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io)

regionalize GCR in oci-proxy

Since k8s.gcr.io regionalizes to {eu,us,asia} GCR registries today, if we have the GCR team help us reroute the traffic to this domain to registry.k8s.io / oci-proxy, then we will need to regionalize ourselves.

With that team and then with the community in the SIG k8s infra meeting we've discussed the approach of simply doing:

GCLB routes to the local cloud run instance
local cloud run instance introspects region or has per-region config to identify the nearest GCR/AR registry

This should be pretty straightforward, but we should look into this a bit more before moving forward.

Age Out Old Registry Content

We migrated forward all existing images since the beginning of hosting Kubernetes images, from gcr.io/google-containers to k8s.gcr.io to registry.k8s.io

We should consider establishing an intended retention policy for images hosted by registry.k8s.io, and communicating that early.

Not retaining all images indefinitely could help the container-image-promoter avoid dealing with an indefinitely growing set of images to synchronize and sign and may also have minor hosting cost benefits.

Even if we set a very lengthy period, we should probably consider not hosting content indefinitely.

Firewall filtering by FQDN

Is there a list of FQDNS used in the hosting of registry.k8s.io that should be allowed by enterprise security teams to ensure access?

serve privacy policy

We determined in #62 that we should be operating under the linux foundation privacy policy

However, archeio does not itself serve the privacy policy in any way.

We should add an endpoint for humans like /privacy (this is fine, registry API calls are all under /v2/) that redirects to the LF privacy page to start.

It might be worth considering a landing page with some context and a link to the LF page in the future instead.

get image tag from registry.k8s.io

i want search registry.k8s.io/dns/k8s-dns-node-cache image's tag ,how can i do that?

Enable CI

We should enable CI to run roughly:

make verify
make test
make images build
(each of these should probably be a distinct job in parallel)

We can probably use github actions or prow. make verify will require docker or the current shellcheck binary. the rest merely require bash/posix/make.

increase log level in sandbox deployment

we should log lots of stuff in the sandbox so we can verify what is happening, but not in prod with the scale of requests we expect there.

we already have klog with verbosity levels, let's increase it a bit in the sandbox deployment, and then add some more verbose-only logging.

gate or temporarily compile out s3 redirect

since the s3 buckets are still not ready, we should not block #77 rollout to registry.k8s.io on production s3 bucket availability.

I didn't think it would take this long, so I went with the simpler option of not gating s3 redirection / maintaining two modes, figuring we'd promote to production soon enough, and the initial production app is literally just bog simple 3XX to k8s.gcr.io so we can get users on registry.k8s.io and shouldn't need patches.

since it is taking a while, we should make it possible to continue iterating on other aspects, like redirecting to the individual GCR regions backing k8s.gcr.io ourselves, and later artifact registry instances instaed.

Update build service account permissions

Recently #40 was merged,
the build IAM user must have sufficient permissions.

Here's the job not succeeding
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-oci-proxy-push-images/1512412123450314752

In my experimentation I set up the roles something like this.

PROJECT=k8s-infra-ii-sandbox
ACCOUNT_NUM="$(gcloud iam service-accounts list | grep 'Compute Engine' | grep -oE '[0-9]+')"
DEFAULT_SERVICE_ACCOUNT="$(gcloud iam service-accounts list | grep 'Compute Engine' | grep -oE '[a-zA-Z0-9-]+@[a-z.]+')"
REGIONS=(
    us-central1
    us-west1
)

for REGION in ${REGIONS[*]}; do
    # add cloudrun admin rolebind for build service account to
    gcloud run services add-iam-policy-binding coolest-and-fastest-service-everrr-${REGION} \
      --member=serviceAccount:$ACCOUNT_NUM@cloudbuild.gserviceaccount.com \
      --role=roles/run.admin \
      --project=$PROJECT \
      --region=${REGION}
done


gcloud iam service-accounts add-iam-policy-binding $DEFAULT_SERVICE_ACCOUNT \
  --member=serviceAccount:$ACCOUNT_NUM@cloudbuild.gserviceaccount.com \
  --role roles/iam.serviceAccountUser \
  --project=$PROJECT

Where might the permissions be managed?

china can not pull from registry.k8s.io

registry.k8s.io is blocked ?

# docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
Error response from daemon: Head "https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-state-metrics/kube-state-metrics/manifests/v2.7.0": dial tcp 108.177.97.82:443: i/o timeout
# docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
Error response from daemon: Head "https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-state-metrics/kube-state-metrics/manifests/v2.7.0": dial tcp 64.233.188.82:443: i/o timeout

Collecting log data for Data Studio reports to show registry.k8s.io success

Once traffic is redirected via oci-proxy it would make sense to create Data Studio reports to show the success of the project.

Initially the cost saving should be visible in the monthly K8s Infra GCP Billing report. However, as soon as head room has been created in monthly expenditure work could continue on mitigating the remaining K8s Infra componentes to community infa, which would again increase spending.

This would created a moving target, making it difficult to show the real cost saving achieved over the next months

For this we would need the logs for oci-proxy, reg.k8s.io as well as die log data from the AWS S3 buckets. Further processing of the data would allow us to match SHA and K8s release version, which would give insight into cost per release.

@BobyMCbobs @hh

Privacy policy for registry.k8s.io ?

Just questioning the need of the privacy policy. From https://groups.google.com/a/kubernetes.io/g/dev/c/DYZYNQ_A6_c/m/oD9_Q8Q9AAAJ, we may have individuals using their personal IP address.

registry.k8s.io currently runs on GCP infrastructure. This means all GCP services we use are covered by Google privacy policy

I took a quick look on what other projects are doing:
Golang: https://proxy.golang.org/privacy
Pypi: https://www.python.org/privacy/
Debian: https://www.debian.org/legal/privacy

/sig k8s-infra
/assign @dims
cc @BenTheElder @cblecker

Feel free to close if it's not needed.

drop registry v1 support

This is pointless code bloat at this point apparently, GCR doesn't even serve it and it's unlikely clients were trying to use it / it doesn't support multi-arch images so the past many k8s releases probably would not really work through it ...

ref: #28 (comment)

should be a clean, simple PR. we can do further refactors in the future.

/help
/good-first-issue

oci-proxy should redirect per-AWS region

Here are the regions (and ASNs) we need to oci-proxy to actively redirect.

@ameukam is creating the AWS infra in kubernetes/k8s.io#3568.

If we treat Amazon ASNs not in ip-ranges.json as a type of region, then it’s #4 at 11.12% of Amazons total:

21.5% : us-west-2
16.7% : us-west-1
13% : us-east-1
11.1% : Other Amazon ASNs (not in ip-ranges.json, but collected in k8s.io/meta/asns/amazon.yaml
11.1% : eu-central-1
9.2% : eu-central-1
6.39% : us-east-2
5.32% : ap-southeast-1
2.99% : us-west-1
2.6% : ap-northeast-1
2.12% : ap-south-1
=====^^Roughly ~80%^^=====

FQDN's for registry.k8s.io

Is there a list of FQDN's that could be whitelisted so that the image url's of registry.k8s.io from version 4.2.0 can be accessed through a firewall?

error pulling image on GKE in Singapore

Following #115 (comment) (which seems unrelated to the original issue, which was seemingly an issue with the loadbalancer when removing europe-west3), I couldn't reproduce locally, but I can reproduce the issue on a standard GKE cluster in asia-southeast1-a

create a standard cluster in asia-southeast1-a
kubectl run --image=registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47 test
kubectl describe pod test

observe events:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m53s                  default-scheduler  Successfully assigned default/test to gke-sg-test-default-pool-0f72f8d8-znhx
  Normal   Pulling    3m24s (x4 over 4m52s)  kubelet            Pulling image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47"
  Warning  Failed     3m24s (x4 over 4m51s)  kubelet            Failed to pull image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": rpc error: code = Unknown desc = failed to pull and unpack image "registry.k8s.io/ingress-nginx/kube-webhook-certgen@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": failed to resolve reference "registry.k8s.io/ingress-nginx/kube-webhook-certgen@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": failed to do request: Head "https://asia-southeast1-docker.pkg.dev/v2/k8s-artifacts-prod/images/ingress-nginx/kube-webhook-certgen/manifests/sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47": failed to authorize redirect: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed     3m24s (x4 over 4m51s)  kubelet            Error: ErrImagePull
  Warning  Failed     3m12s (x6 over 4m50s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m59s (x7 over 4m50s)  kubelet            Back-off pulling image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0@sha256:549e71a6ca248c5abd51cdb73dbc3083df62cf92ed5e6147c780e30f7e007a47"

cc @winggundamth

/assign
/priority critical-urgent

determine which AWS regions should serve the image layers

Part of:

kubernetes/k8s.io#3411

Follow up of:

Determine if we should push the image layers to all the AWS regions or pick the regions where we want to push them.
List of AWS S3 regions: https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

document stance on allow-listing registry.k8s.io traffic

We cannot afford to commit to the backing endpoints and details of registry.k8s.io being stable, the project needs to be able to take advantage of whatever resources we have available to us at any given point in time in order to keep the project afloat.

As-is, we're very close to running out of funds and in an emergency state, exceeding our $3M/year GCP credits with container image hosting being a massively dominant cost in excess of 2/3 of our spend. Even in the future when we shift traffic to other platforms using the registry.k8s.io system, we need to remain flexible and should not commit to specific backing details.

E.G. we may be receiving new resources from other vendors, following the current escalation with the CNCF / Governing Board.

We should clearly document, prominently, bolded in this repo's README that we point https://registry.k8s.io to, an explicit stance on this.

We've already had requests to document the exact list of endpoints to allowlist, which is not an expectation we can sustain.

We should also consider giving pointers regarding how end-users can run their own mirrors to:

insulate themselves from shifting implementation details of registry.k8s.io affecting their egress allow-lists
reduce costs to the project
improve reliability for their own clusters (I.E. not depend on the uptime of a volunteer-staffed free registry)

/sig k8s-infra
/priority important-soon
/kind documentation

standup sandbox deployment

@ameukam started this here kubernetes/k8s.io#3318

We still need:

Loadbalancer
NEGs
Cloud Run deployment of the image from this repo (at least for the initial setup)

The infra configs should all live in https://github.com/kubernetes/k8s.io, but that issue tracker is noisier 🙃

document request handling flow

/assign

Monitoring of registry.k8s.io

archeio is currently deployed in sandbox mode using Cloud Run:

We have built-in metrics for Cloud Run but we may need some custom metrics to gain more visibility in the system we are building.
Possible metrics to collect:

HTTP code for every response from k8s.gcr.io

Alerting: #75

ASN lookups

Background

Using collected+collated ASN data from aggregators, the published IP range information from the cloud credits vendors, the source IP can be resolved all the way up to the provider's name and publisher-verified ASNs.

Currently this data is being used effectively by the public-log-asn-matcher job found in k8s.io/image/public-log-asn-matcher, which provides the BigQuery dataset for a detailed DataStudio report to display the (PII) Kubernetes public artifact traffic information, it uses:

This rich data will provide everything needed to redirect the traffic to the closest provider which will save costs.

Implementation

There are a few potential ways this could be implemented, but I think the easiest path should be:

pull in separately pushed & rendered data, like that is produced from
https://github.com/kubernetes/k8s.io/blob/97379d1e1b03265af62269063b35fe8114f4d81d/images/public-log-asn-matcher/pg-init.d/lib-01-prepare-company-asn-data.sh#L30 and
https://github.com/kubernetes/k8s.io/blob/97379d1e1b03265af62269063b35fe8114f4d81d/images/public-log-asn-matcher/pg-init.d/lib-01-prepare-company-asn-data.sh#L42
(flatten the data out even more?)
on a schedule as a go-routine
load the data into memory after changes
make simple look ups against it (very rough pseudo code as example)

// IP range cidr -> providerName
type Lookup map[string]string
lookup := getLookups()
// from https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io/infra/meta/asns
verifiedASNMetadata := getVerifiedASNMetadata()
// defaults to k8s.gcr.io
registryProvider := getDefaultRegistryProvider()
sourceIP := req.Header.Get("X-Real-Ip")
ipRanges := getIPRanges()
for _, ipRange := range ipRanges {
  _, subnet, _ := net.ParseCIDR(ipRange)
  if !subnet.Contains(sourceIP) {
    continue
  }
  providerName, ok := lookup[ipRange]
  if !ok {
    continue
  }
  registryProvider = verifiedASNMetadata[providerName].RedirectsTo.Registry
  break
}
http.Redirect(w, r, "https://"+registryProvider+path, http.StatusPermanentRedirect)

the registryProvider backends should also be pinged periodically in the background to ensure no client errors after redirection

outage in europe-west3 (Frankfurt)

My cluster is failing, unable to pull an image, and http://registry.k8s.io also spills out errors.
Is something going on here?

Add `preset-use-new-registry` to various jobs

In order to add some production load testing on the registry.k8s.io redirection, a request from @dims is to add the label preset-use-new-registry=true so that some e2e tests that pull images can pull through the new registry

ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1647896105383489

consider a more useful `/` redirect

http://registry-sandbox.k8s.io/ redirects to https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io which doesn't actually contain anything we're using or any docs.

https://github.com/kubernetes/k8s.io/wiki/New-Registry-url-for-Kubernetes-(registry.k8s.io) is probably a better URL for now?

cc @dims @ameukam

We should also probably apply #79 to paths like this, since we'll update it in the future.

document testing regimen

/assign
/sig k8s-infra

Random download failures - 403 errors [hetzner]

Hi,

Attepting to build a 3 node kubernetes cluster, using kubespray (latest) on hetzner cloud instances running Debian 11.

First attempt failed due to download failure for kubeadm on 1 of the 3 instances. Confirmed using local download, 1 fail, 2 sucess.
Swapped in a replacement instance, and moved past this point, assumed possible ip blacklisting, though not confirmed.

All 3 instances then downloaded 4 calico networking containers, and came to the pause 3.7 download, using a command like this.

root@kube-3:# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
root@kube-3:# nerdctl images
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
registry.k8s.io/pause 3.7 bb6ed397957e 4 seconds ago linux/amd64 700.0 KiB 304.0 KiB
bb6ed397957e 4 seconds ago linux/amd64 700.0 KiB 304.0 KiB

on the failing instance, we see the following error if applied by hand, using kubespray it tries 4 times and then fails the whole install at that point.

root@kube-2:~# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
FATA[0000] failed to resolve reference "registry.k8s.io/pause:3.7": unexpected status from HEAD request to https://registry.k8s.io/v2/pause/manifests/3.7: 403 Forbidden

Do you have any idea why the download from this registry might be failing, and is there any alternative source I could try ?

The ip address starts and ends as shown below, and was run a couple of minutes ago

Thu 12 Jan 2023 02:52:21 PM UTC

65.x.x.244

Many thanks

Mike

integration testing

We should perform an integration test in presubmit. As any additional code complexity is added and we start to roll out registry.k8s.io to projects, it is very important that we test thoroughly, and unit testing alone will not be sufficient.

We should start by having a tool like crane pull an image through a locally running copy of archeio in an integration test target, and adding this to CI.

Rename this repo to registry.k8s.io

This repo is for an important piece of software that will be heavily used by the community when running.
I don't believe the name oci-proxy currently chosen reflects the behaviour or function of the webserver, as it doesn't directly implement the OCI spec and does not act as a proxy but functions as a redirect with handling particular paths and headers.

I would suggest that the project live in kubernetes-sigs/registry.k8s.io, so that it is very clear it's purpose.

Keen to hear thoughts!

Mapping AWS regions and S3 bucket region

Part of:

kubernetes/k8s.io#3411

From #39, we identified which region the traffic come the most but we need to ensure redirection for traffic coming from regions not listed.
We could also assume the rest of regions are served from only 1 bucket but they is a risk of high latency during image pulling.

Draft:

regions	bucket name	bucket location
us-east-1, sa-east-1		us-east-1
ca-central-1, us-east-2		us-east-2
us-west-1		us-west-1
us-west-2		us-west-2
ap-south-1, me-south-1		ap-south-1
ap-northeast-1, ap-northeast-2, ap-northeast-3		ap-northeast-1
ap-east-1, ap-southeast-1, ap-southeast-2. ap-southeast-3		ap-southeast-1
eu-central-1, eu-south-1		eu-central-1
eu-west-1, af-south-1		eu-west-1
eu-north-1, eu-west-2, eu-west-3		eu-west-2

/sig k8s-infra
/area artifacts
/priority important-soon
/milestone v1.25
/assign @BenTheElder

re-enable linters when golangci-lint is go1.18 ready

ref: 14eeea7

golangci/golangci-lint#2649

golangci-lint is also disabling some linters itself, we will need to update versions when there is a fixed version. see above linked issue

/help

primary upstream backend should be overridable with runtime config

We should add an environment variable to allow override k8s.gcr.io. This could be useful in testing, and importantly it will allow us to potentially move to some new *.gcr.io as the upstream if we ever manage to update k8s.gcr.io to point to our redirector, which will still be backed by the registries behind k8s.gcr.io in part.

don't use permanent redirect

these are temporary redirects

change the redirect code and tests

Alerting for production deployment?

kubernetes/kubernetes#109938 (review)

Serve container image layers from AWS by default (make exception when clients are from Google)

Our current logic is to default to Google for traffic not from AWS.
We should update the logic to default to AWS if not Google.

This will directly address our top two priorities from our meeting last week.

-1 : GCP spend HAS to go down (So we stay within budget and make room for other things we need the $$'s for)
0 : AWS spend HAS to go up (If we don't use it, we will end up not getting more)

Our main logic for handling redirects is here:
https://github.com/kubernetes/registry.k8s.io/blob/main/cmd/archeio/app/handlers.go#L123-L131

		// check if client is known to be coming from an AWS region
		awsRegion, ipIsKnown := regionMapper.GetIP(clientIP)
		if !ipIsKnown {
			// no region match, redirect to main upstream registry
			redirectURL := upstreamRedirectURL(rc, rPath)
			klog.V(2).InfoS("redirecting blob request to upstream registry", "path", rPath, "redirect", redirectURL)
			http.Redirect(w, r, redirectURL, http.StatusTemporaryRedirect)
			return
		}

I'm suggesting the following or similar:

	// initialize map of clientIP to AWS region
	regionMapper := gcp.NewGCPRegionMapper()
       //... snip ...//
		// check if client is known to be coming from an GCP region
		gcpRegion, ipIsKnown := regionMapper.GetIP(clientIP) // 
		if !ipIsKnown {
			// no region match at GCP, redirect to main upstream registry
			redirectURL := upstreamRedirectURL(rc, rPath)
			klog.V(2).InfoS("redirecting blob request to upstream registry", "path", rPath, "redirect", redirectURL)
			http.Redirect(w, r, redirectURL, http.StatusTemporaryRedirect)
			return
		}

We will need to create a net/cidrs/gcp similar to main/pkg/net/cidrs/aws

It should be nearly the same code, with minor changes to main/pkg/net/cidrs/aws/internal/ranges2go/genrawdata.sh

Swapping out the AWS ranges with GCP ranges:

production deployment should be gitops

We auto deploy the registry-sandbox instance, and everything for that is configured either in k8s.io terraform, or in the make rule backing make deploy / the cloudbuild.yml for this repo.

The production configuration currently seems to only be partially source controlled, the production (registry.k8s.io) cloud run deployment appears to still be manually submitted by @ameukam

Now that kubernetes has migrated, we really ought to fix this kubernetes/kubernetes#109938 (comment)

tag archeio following semver

We should tag archeio with semanting versioning : https://semver.org/

/priority important-soon
/milestone v1.24

fix broken kops-AWS registry-sandbox e2e CI

https://testgrid.k8s.io/sig-k8s-infra-oci-proxy#kops-grid-gcr-mirror-canary

xref: kubernetes-sigs/aws-ebs-csi-driver#1251, https://kubernetes.slack.com/archives/C8MKE2G5P/p1651005000692509

possible workaround: disable the driver and disable the CSI tests, we can do with other coverage.

bake version info into binary for logging

we have version info in the tags we push, but we should probably also bake it into the binary for logging. we can use a golang x-def and update the build commands.

ideally this should be fully sortable git info, see kubernetes-sigs/kind#2618, and also a starting non-fully-sortable version of embedding this info in a similar makefile in that repo as a starting point.

Docker client fails to download from the sandbox

After a auto-deployment of archeio, e2e-kops-grid-gcr-mirror-canary is failing:

See https://testgrid.k8s.io/sig-k8s-infra-oci-proxy#kops-grid-gcr-mirror-canary

❯ docker pull registry-sandbox.k8s.io/pause:3.1
3.1: Pulling from pause
67ddbfb20a22: Pulling fs layer 
error pulling image configuration: error parsing HTTP 400 response body: invalid character 'P' looking for beginning of value: "ParseAddr(\" 34.110.128.221\"): unexpected character (at \" 34.110.128.221\")\n"

Seems like an issue with IP address parsing

/kind bug
/sig k8s-infra
/area artifacts
/milestone v1.24

full e2e testing

Analyze registry.k8s.io logs

Continuing from kubernetes/k8s.io#1343 (comment)

Cloud Run logs requests btw. It looks like this:

{
  "insertId": "62c2d53e0003371fdc7d0e63",
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "https://oci-proxy-4txm7cz3ca-ew.a.run.app/v2/pause/manifests/latest",
    "requestSize": "359",
    "status": 308,
    "responseSize": "823",
    "userAgent": "curl/7.79.1",
    "remoteIp": "62.3.X.X",
    "serverIp": "216.239.36.53",
    "latency": "0.002888185s",
    "protocol": "HTTP/1.1"
  },
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "revision_name": "oci-proxy-00007-rus",
      "service_name": "oci-proxy",
      "location": "europe-west1",
      "configuration_name": "oci-proxy",
      "project_id": "coen-mahamed-ali"
    }
  },
  "timestamp": "2022-07-04T11:55:42.210719Z",
  "severity": "INFO",
  "labels": {
    "instanceId": "00c527f6d4c262c0c5e52e02b9e0598455af84a75be4db07302d51c05ad943142fb40a1eedd66cc7d8fb9fed14ac8349f4a06236b5f8235e83c02c64ac3b494fa3"
  },
  "logName": "projects/coen-mahamed-ali/logs/run.googleapis.com%2Frequests",
  "trace": "projects/coen-mahamed-ali/traces/3c44b9996e13e00f68afa0b8dad552aa",
  "receiveTimestamp": "2022-07-04T11:55:42.217327936Z",
  "spanId": "6672162191736732715",
  "traceSampled": true
}

We can sink these logs to bigquery and start crunching data.

@BobyMCbobs @hh

Cut 0.0.2 release

We should cut a new tag for archeio. Lot of changes landed after the 0.0.1

kubernetes / registry.k8s.io Goto Github PK

registry.k8s.io's Introduction

registry.k8s.io

Stability

Privacy

Background

Community, discussion, contribution, and support

Code of conduct

registry.k8s.io's People

Contributors

Stargazers

Watchers

Forkers

registry.k8s.io's Issues

what happend

kubespray offline design

what the problems

suggest

Background

Implementation

Recommend Projects

Recommend Topics

Recommend Org