neuro-inc / platform-registry-api
License: Other
@dalazx has added pagination for GCP, but it does not work on Amazon.
We need this fix to pass our e2e tests on the client.
Got this in logs:
2021-07-22 12:58:34,725 - aiohttp.server - ERROR - Error handling request
Traceback (most recent call last):
File "/root/.local/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
resp = await self._request_handler(request)
File "/root/.local/lib/python3.8/site-packages/sentry_sdk/integrations/aiohttp.py", line 123, in sentry_app_handle
reraise(*_capture_exception(hub))
File "/root/.local/lib/python3.8/site-packages/sentry_sdk/_compat.py", line 54, in reraise
raise value
File "/root/.local/lib/python3.8/site-packages/sentry_sdk/integrations/aiohttp.py", line 113, in sentry_app_handle
response = await old_handle(self, request)
File "/root/.local/lib/python3.8/site-packages/aiohttp/web_app.py", line 499, in _handle
resp = await handler(request)
File "/root/.local/lib/python3.8/site-packages/aiohttp/web_middlewares.py", line 119, in impl
return await handler(request)
File "/root/.local/lib/python3.8/site-packages/aiohttp_remotes/x_forwarded.py", line 94, in middleware
return await handler(request)
File "/root/.local/lib/python3.8/site-packages/aiohttp/web_middlewares.py", line 119, in impl
return await handler(request)
File "/root/.local/lib/python3.8/site-packages/platform_registry_api/api.py", line 437, in handle_repo_tags_list
response = await self._handle_aws_ecr_tags_list(
File "/root/.local/lib/python3.8/site-packages/platform_registry_api/api.py", line 534, in _handle_aws_ecr_tags_list
"tags": [image["imageTag"] for image in data["imageIds"]],
KeyError: 'imageIds'
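The `KeyError` above is raised because the handler indexes `data["imageIds"]` directly. A minimal sketch of a more tolerant extraction follows, assuming the upstream may return a payload without `imageIds` (for example an error body) and that untagged images carry only a digest. The function name is hypothetical, and whether an empty list or an explicit error is the right response remains a design decision.

```python
from typing import Any, Dict, List


def extract_image_tags(data: Dict[str, Any]) -> List[str]:
    """Collect tags from an ECR ListImages-style payload.

    Tolerates a payload without "imageIds" and entries without "imageTag"
    (untagged images are identified by digest only).
    """
    images = data.get("imageIds") or []
    return [image["imageTag"] for image in images if "imageTag" in image]


payload = {"imageIds": [{"imageTag": "v1"}, {"imageDigest": "sha256:abc"}]}
print(extract_image_tags(payload))                # ['v1']
print(extract_image_tags({"message": "error"}))   # []
```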
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 390, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 366, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 106, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp_remotes/x_forwarded.py", line 73, in middleware
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 106, in impl
return await handler(request)
File "/neuromation/platform_registry_api/api.py", line 259, in handle_catalog
client_response.raise_for_status()
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 853, in raise_for_status
headers=self.headers)
aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden'
2019-06-11 15:02:44,096 - aiohttp.access - INFO - 10.138.0.8 [11/Jun/2019:15:02:43 +0000] "GET /v2/_catalog HTTP/1.1" 500 330 "-" "Python/3.6 aiohttp/3.5.4"
neuro image push synthetic-reverse
Using local image 'synthetic-reverse:latest'
Using remote image 'image://adavydow/synthetic-reverse:latest'
ERROR: Docker API error: received unexpected HTTP status: 500 Internal Server Error
Just got this error.
Pushed the image again and it was pushed without any problems.
Image was pretty large (7.5 GB).
Message from @mariyadavydova:
Team, I observe weird behaviour with tags. I remember that we discussed the problem with AWS, but I don’t remember the current state of this issue. It works as follows: if you have more than 30 tags, you get an error (when asking for the next 30 tags):
# an image with ton of tags
(neuro) mariyadavydova@Mariyas-MacBook-Pro platform-web-ui % neuro image tags image://neuro-public/cookiecutter-e2e/extras-e2e-custom-dockerfile
ERROR: Cannot authenticate (Not Authorized)
# an image with one tag
(neuro) mariyadavydova@Mariyas-MacBook-Pro platform-web-ui % neuro image tags image:image-1
image://neuro-public/mrsmariyadavydova/image-1:v2
Example of failed CI: https://github.com/neuromation/neuro-extras/actions/runs/259203480
Example of a message that has the upstream project name in its payload.
{'errors': [{
'code': 'NAME_UNKNOWN',
# TODO: this has to be fixed ASAP:
'detail': {'name': 'testproject/neuromation/unknown'},
'message': 'repository name not known to registry',
}]}
Run tests/e2e/test.sh. When it does docker pull localhost:5000/1ce6fb32-e097-426b-a140-f3136c5b2e93/ubntu:latest,
there will be an error in the server logs:
registry_1 | 2018-12-28 12:09:42,401 - aiohttp.server - ERROR - Error handling request
registry_1 | Traceback (most recent call last):
registry_1 | File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 242, in data_received
registry_1 | messages, upgraded, tail = self._request_parser.feed_data(data)
registry_1 | File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
registry_1 | aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
registry_1 | 2018-12-28 12:09:42,407 - platform_registry_api.api - DEBUG - registry request: <Request GET /v2/ >; headers: <CIMultiDictProxy('Host': 'localhost:5000', 'User-Agent': 'docker/18.09.0-ce go/go1.11.2 git-commit/4d60db472b kernel/4.19.4-arch1-1-ARCH os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.0-ce \\(linux\\))', 'Accept-Encoding': 'gzip', 'Connection': 'close')>
In PR #34 we hard-coded the number of images (1000) that the platform receives when it requests _catalog
(discussion in Russian: https://neuromation.slack.com/archives/CE76BT03G/p1547576598103500).
This means that when this limit is exceeded, only the first 1000 images will be seen. Instead, the registry-api should query the docker-registry iteratively for as long as there are images left to list. Note: we should use the GET params n, b and last
(see https://docs.docker.com/registry/spec/api/#catalog).
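The iterative querying described above can be sketched as follows. This is only an illustration: `fetch_page` is a hypothetical callable standing in for one `GET /v2/_catalog?n=<n>&last=<last>` request, and `make_fake_registry` is an in-memory stand-in for the upstream. The spec signals further pages with a `Link` header; this sketch uses the page length instead to stay self-contained.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical callable: fetches one page of GET /v2/_catalog?n=<n>&last=<last>
# and returns the "repositories" list from the response.
FetchPage = Callable[[int, Optional[str]], List[str]]


def iter_catalog(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[str]:
    """Yield every repository by requesting pages until one comes back short."""
    last: Optional[str] = None
    while True:
        page = fetch_page(page_size, last)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the upstream has nothing more.
            return
        last = page[-1]


def make_fake_registry(names: List[str]) -> FetchPage:
    """In-memory stand-in for the upstream registry, for illustration only."""
    ordered = sorted(names)

    def fetch_page(n: int, last: Optional[str]) -> List[str]:
        remaining = [name for name in ordered if last is None or name > last]
        return remaining[:n]

    return fetch_page


repos = [f"user/image-{i:04d}" for i in range(2500)]
all_repos = list(iter_catalog(make_fake_registry(repos)))
print(len(all_repos))  # 2500
```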
We cannot build images using Kaniko with the newest version of platformregistryapi
on AWS clusters.
The problem appears when Kaniko pulls a cached layer from our registry to speed up the image build.
At this step, our registry fails with the error:
2021-06-29 07:55:44,393 - aiohttp.server - ERROR - Error handling request
Traceback (most recent call last):
File "/root/.local/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 422, in _handle_request
resp = await self._request_handler(request)
File "/root/.local/lib/python3.7/site-packages/sentry_sdk/integrations/aiohttp.py", line 123, in sentry_app_handle
reraise(*_capture_exception(hub))
File "/root/.local/lib/python3.7/site-packages/sentry_sdk/_compat.py", line 54, in reraise
raise value
File "/root/.local/lib/python3.7/site-packages/sentry_sdk/integrations/aiohttp.py", line 113, in sentry_app_handle
response = await old_handle(self, request)
File "/root/.local/lib/python3.7/site-packages/aiohttp/web_app.py", line 499, in _handle
resp = await handler(request)
File "/root/.local/lib/python3.7/site-packages/aiohttp/web_middlewares.py", line 119, in impl
return await handler(request)
File "/root/.local/lib/python3.7/site-packages/aiohttp_remotes/x_forwarded.py", line 94, in middleware
return await handler(request)
File "/root/.local/lib/python3.7/site-packages/aiohttp/web_middlewares.py", line 119, in impl
return await handler(request)
File "/root/.local/lib/python3.7/site-packages/platform_registry_api/api.py", line 599, in handle
auth_headers=auth_headers,
File "/root/.local/lib/python3.7/site-packages/platform_registry_api/api.py", line 711, in _proxy_request
async for chunk in client_response.content.iter_any():
File "/root/.local/lib/python3.7/site-packages/aiohttp/streams.py", line 39, in __anext__
rv = await self.read_func()
File "/root/.local/lib/python3.7/site-packages/aiohttp/streams.py", line 386, in readany
raise self._exception
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
This means that the image layer was not downloaded completely and, as a result, Kaniko fails the layer checksum verification.
Example of Kaniko error message:
INFO[0153] Found cached layer, extracting to filesystem
error building image: error building stage: failed to execute command: extracting fs from image: error verifying sha256 checksum; got "9a593afe07a07cc862136409824a9b391a352945a4dcbbd7de8084ab8b13d572", want "92c6c2f9925e03fe28f4e343e6aa128aeb78eccc59c50589cefa15db665857c8"
This problem is not reproducible with platform registry api v21.2.11.
As a result, we downgraded our platformregistryapi on AWS clusters from v21.4.26 to v21.2.11.
I don't know yet what the reason for this might be.
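To illustrate why an incomplete payload ends in a checksum error on the client side, here is a small sketch that hashes a layer chunk by chunk and compares a complete stream with a truncated one. The chunk contents are made up; this is not Kaniko's or the registry's actual code.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream chunk by chunk, the way a proxied layer would arrive."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


layer = [b"layer-part-1", b"layer-part-2", b"layer-part-3"]
expected = sha256_of_chunks(layer)

# A stream that is cut short yields a different digest, so verification fails.
truncated = layer[:-1]
print(sha256_of_chunks(layer) == expected)      # True
print(sha256_of_chunks(truncated) == expected)  # False
```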
When you delete the last tag of an image, the image itself should not be displayed in neuro image ls.
This holds only for cluster deployments where we do not use a managed image registry.
@anayden you said we have enabled the garbage collector, but I couldn't find where it is.
Could you elaborate, please?
We were observing significant memory leaks while putting relatively heavy load on the registry.
cc @asvetlov
Currently we request a GCR oauth2 token each time we receive a request. We should try to cache tokens per scope and invalidate according to the expiration information in the token payload.
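A minimal sketch of per-scope caching, under stated assumptions: `fetch` is a hypothetical callable that returns a token together with its lifetime in seconds (however that is derived from the token payload), and the 30-second leeway is an arbitrary choice. The real service would likely need an async variant.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical fetcher: returns (token, lifetime_in_seconds) for a scope.
FetchToken = Callable[[str], Tuple[str, float]]


class TokenCache:
    """Cache one token per scope until shortly before it expires."""

    def __init__(
        self,
        fetch: FetchToken,
        clock: Callable[[], float] = time.monotonic,
        leeway: float = 30.0,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._leeway = leeway  # refresh this many seconds before expiry
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def get(self, scope: str) -> str:
        cached = self._tokens.get(scope)
        now = self._clock()
        if cached is not None and now < cached[1] - self._leeway:
            return cached[0]
        token, lifetime = self._fetch(scope)
        self._tokens[scope] = (token, now + lifetime)
        return token
```

Passing the clock in makes expiry easy to exercise in tests without sleeping.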
It is not clear how to force the docker client to pass the specified token as the Bearer token value.
It might be possible to respond with 401 and WWW-Authenticate: Bearer realm=Docker Registry instead of WWW-Authenticate: Basic realm=Docker Registry, as we do now.
this is what Azure sends:
{"repositories":null}
rpc error: code = Unknown desc = Error response from daemon: error parsing HTTP 403 response body: invalid character \':\' after top-level value: "403: Forbidden"
Purpose: improve reproducibility, give user more control over images.
$ neuro pull tomcat@sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f
...
$ neuro images --digests
image:tomcat:latest sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f
Idea: by @adavydow
Dependabot couldn't authenticate with https://devpi-dev.neu.ro/testuser/dev/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
In AWS, a repo can exist after the last tag was removed. This leads to "unremovable" images in neuro image ls. To overcome this problem, this fix was introduced, but it can lead to problems if the user tries to list tags before they upload the first image, as it will drop the repo.
The solution is to add a special endpoint to remove the repo and use it in the client when neuro image rm is called for an AWS ECR repo.
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
Closes #354
The AWS ECR API has two methods: batch_delete_image and delete_repository. The latter should be called on image delete when there are no images left in the repo.
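The flow described above could look roughly like this. It is a sketch only: `ecr` is assumed to behave like a boto3 ECR client (the keyword arguments follow boto3's `batch_delete_image`, `list_images` and `delete_repository`), `FakeECR` is an in-memory stand-in for illustration, and the check-then-delete sequence is not atomic, so a concurrent push could race with it.

```python
from typing import Any, Dict, List


def delete_image_tag(ecr: Any, repo: str, tag: str) -> bool:
    """Delete one tag; drop the repository if no images remain.

    Returns True when the repository itself was removed.
    """
    ecr.batch_delete_image(repositoryName=repo, imageIds=[{"imageTag": tag}])
    remaining = ecr.list_images(repositoryName=repo).get("imageIds", [])
    if remaining:
        return False
    ecr.delete_repository(repositoryName=repo)
    return True


class FakeECR:
    """Tiny in-memory stand-in for the client, for illustration only."""

    def __init__(self, repos: Dict[str, List[str]]) -> None:
        self.repos = {name: list(tags) for name, tags in repos.items()}

    def batch_delete_image(self, repositoryName: str, imageIds: List[dict]) -> dict:
        doomed = {image["imageTag"] for image in imageIds}
        kept = [t for t in self.repos[repositoryName] if t not in doomed]
        self.repos[repositoryName] = kept
        return {"imageIds": imageIds, "failures": []}

    def list_images(self, repositoryName: str) -> dict:
        return {"imageIds": [{"imageTag": t} for t in self.repos[repositoryName]]}

    def delete_repository(self, repositoryName: str) -> dict:
        del self.repos[repositoryName]
        return {}


ecr = FakeECR({"user/image": ["v1", "v2"]})
print(delete_image_tag(ecr, "user/image", "v1"))  # False
print(delete_image_tag(ecr, "user/image", "v2"))  # True
```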
From @zubenkoivan:
In our tests we pushed an image with the name neuro-f7b0ba1a9278974e-1/platform-e2e--6ad98d85-5516-492c-bd99-13f438aff0c1-date202106241254-date.
This test is failing in AWS compute clusters, since the AWS registry changed its API. It now forbids two consecutive dashes (--) in the image name.
Therefore, we need:
The following areas need to be covered:
/v2;
/v2/_catalog/;
/v2/<name>/....
curl -vv -u reg:$REGISTRY_TOKEN https://registry-staging.neu.ro/v2/dalazx/python/tags/list | jq
{
"child": [],
"manifest": {
"sha256:32e8ae6d055cd64e02655d2617c0863c948d204a7ed5f1c8a149a34020c5b210": {
"imageSizeBytes": "352124901",
"layerId": "",
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"tag": [
"3.6.6-stretch"
],
"timeCreatedMs": "1532541907437",
"timeUploadedMs": "1558971642550"
}
},
"name": "light-reality-205619/dalazx/python",
"tags": [
"3.6.6-stretch"
]
}
We should be removing the light-reality-205619/ prefix from the name field above.
Note that the fix may require implementing a dedicated handler.
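A sketch of the payload rewrite such a handler might perform, assuming the upstream project name is known from configuration. The function name is hypothetical, and only the `name` field is touched.

```python
from typing import Any, Dict


def strip_project_prefix(payload: Dict[str, Any], project: str) -> Dict[str, Any]:
    """Return a copy of a tags/list payload with the upstream project removed."""
    prefix = project.rstrip("/") + "/"
    result = dict(payload)
    name = result.get("name")
    if isinstance(name, str) and name.startswith(prefix):
        result["name"] = name[len(prefix):]
    return result


upstream = {"name": "light-reality-205619/dalazx/python", "tags": ["3.6.6-stretch"]}
print(strip_project_prefix(upstream, "light-reality-205619"))
# {'name': 'dalazx/python', 'tags': ['3.6.6-stretch']}
```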
We want to add support of extended ACL where cluster names are in the authority component of permission URIs.
This functionality should be optional. We need to introduce a feature flag set by an environment variable.
The cluster name should also be passed via an env var and should be required if the feature flag is on.
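One way to express these requirements in configuration loading is sketched below. Both environment variable names are hypothetical placeholders, as is the set of values accepted as "on".

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Both variable names below are hypothetical placeholders.
FLAG_VAR = "NP_REGISTRY_EXTENDED_ACL"
CLUSTER_VAR = "NP_CLUSTER_NAME"


@dataclass(frozen=True)
class ACLConfig:
    extended_acl: bool
    cluster_name: Optional[str]


def load_acl_config(environ: Mapping[str, str]) -> ACLConfig:
    """Read the feature flag; require the cluster name only when it is on."""
    enabled = environ.get(FLAG_VAR, "").strip().lower() in ("1", "true", "yes")
    cluster_name = environ.get(CLUSTER_VAR) or None
    if enabled and cluster_name is None:
        raise ValueError(f"{CLUSTER_VAR} is required when {FLAG_VAR} is on")
    return ACLConfig(extended_acl=enabled, cluster_name=cluster_name)
```

In the service this would be called with `os.environ`; taking a mapping keeps the function easy to test.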
We set sock_read to 30 seconds, but for a push request we perform almost no reads, yet the timeout is still applied. We should set the sock_read timeout to None in this case.
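A small sketch of that selection logic. Treating POST, PUT and PATCH as "push" requests is an assumption made here for illustration, and the helper name is hypothetical.

```python
from typing import Optional

# Assumption: these methods carry the request body during a push.
UPLOAD_METHODS = frozenset({"POST", "PUT", "PATCH"})
DEFAULT_SOCK_READ = 30.0


def sock_read_timeout(method: str) -> Optional[float]:
    """Pick the sock_read value: no limit for uploads, 30 s otherwise."""
    if method.upper() in UPLOAD_METHODS:
        return None
    return DEFAULT_SOCK_READ


# The result would then be passed on, for example as
# aiohttp.ClientTimeout(sock_read=sock_read_timeout(request.method)).
print(sock_read_timeout("PATCH"))  # None
print(sock_read_timeout("GET"))    # 30.0
```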
Use some hardcoded credentials for now.
E2E tests have to be updated.
Let's implement the following syntax:
neuro image remove image://user/image1:tag1
neuro image remove --all-tags image://user/image1
Looks like our ingress (traefik) is losing some important headers prior to delivering requests to our services.
Specifically, the Authorization header in PATCH requests is missing.
I used the same docker client, but different servers:
the server I ran locally was receiving the required header successfully:
2018-09-05 19:41:12,005 - platform_registry_api.api - DEBUG - registry request:
<Request PATCH /v2/ubuntu/blobs/uploads/APy59-BfDUiFCq_0jeEL1aldt-l56VrWlk9kCtkgNCPw9QuHBjyDpw3IYi-Zg6Oo6ZiyydOKZ5jxtX0Ki5qddC4 >;
headers: <CIMultiDictProxy('Host': 'localhost:5000',
'User-Agent': 'docker/18.03.1-ce go/go1.9.5 git-commit/9ee9f40 kernel/4.9.87-linuxkit-aufs os/linux arch/amd64 UpstreamClient(Docker-Client/18.03.1-ce \\(darwin\\))',
'Transfer-Encoding': 'chunked',
'Authorization': 'Basic TOKEN',
'Accept-Encoding': 'gzip',
'Connection': 'close')>
But the server deployed in K8S (in both envs) was not:
2018-09-05 19:34:27,523 - platform_registry_api.api - DEBUG - registry request: <Request PATCH /v2/ubuntu/blobs/uploads/APy59-Aj-oUGoSQzTfIpE0QTJWrDLKF-pramJVVXdqlThGedltaAe_TdATNyyVsNrz3AcOR69HH37ZIFmgtL5rw >;
headers: <CIMultiDictProxy('Host': 'registry.dev.neuromation.io',
'User-Agent': 'docker/18.03.1-ce go/go1.9.5 git-commit/9ee9f40 kernel/4.9.87-linuxkit-aufs os/linux arch/amd64 UpstreamClient(Docker-Client/18.03.1-ce \\(darwin\\))',
'Transfer-Encoding': 'chunked',
'Accept-Encoding': 'gzip',
'X-Forwarded-For': '10.128.0.9',
'X-Forwarded-Host':'registry.dev.neuromation.io',
'X-Forwarded-Port': '80',
'X-Forwarded-Proto': 'http',
'X-Forwarded-Server': 'traefik-548fc44c6c-hpj2w',
'X-Real-Ip': '10.128.0.9')>
HEAD /v2/dalazx/python/blobs/sha256:c4eb586021290cde57d793b8b8a1248272bc7ff438999cecafc37cf2c09650ad
is failing with
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 757, in start
message, payload = await self._protocol.read()
File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 543, in read
await self._waiter
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_proto.py", line 195, in data_received
messages, upgraded, tail = self._parser.feed_data(data)
File "aiohttp/_http_parser.pyx", line 523, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid character in chunk size header'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 390, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 366, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 106, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp_remotes/x_forwarded.py", line 73, in middleware
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 106, in impl
return await handler(request)
File "/neuromation/platform_registry_api/api.py", line 307, in handle
request, url_factory=url_factory, url=upstream_repo_url.url, token=token
File "/neuromation/platform_registry_api/api.py", line 352, in _proxy_request
timeout=timeout,
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 855, in __aenter__
self._resp = await self._coro
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 391, in _request
await resp.start(conn)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 762, in start
message=exc.message, headers=exc.headers) from exc
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid character in chunk size header'
Currently our catalog route returns image URIs, which is inconsistent with the API spec:
{
"repositories": [
"image://adavydow-ssh",
"image://adavydow/adavydow-ssh",
"image://adavydow/artem_sftp_client",
"image://adavydow/artem_ssh",
see https://docs.docker.com/registry/spec/api/#listing-repositories
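A sketch of the conversion that would bring the catalog payload in line with the spec, assuming the only difference is the `image://` scheme prefix. The helper name is hypothetical.

```python
from typing import List

IMAGE_SCHEME = "image://"


def to_repository_names(uris: List[str]) -> List[str]:
    """Strip the image:// scheme so entries match the spec's plain names."""
    return [
        uri[len(IMAGE_SCHEME):] if uri.startswith(IMAGE_SCHEME) else uri
        for uri in uris
    ]


catalog = {
    "repositories": to_repository_names(
        ["image://adavydow/adavydow-ssh", "image://adavydow/artem_ssh"]
    )
}
print(catalog)
# {'repositories': ['adavydow/adavydow-ssh', 'adavydow/artem_ssh']}
```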
The method AWSECRUpstream.create_repo uses the variable _existing_repos to cache already created repositories. But as it is a simple in-memory set, it can contain a repo that was deleted. For example, suppose the following:
foo -> it's created in ECR -> 'foo' is stored in _existing_repos
foo is deleted from ECR
foo -> it's in _existing_repos, so no repo is created on ECR -> request fails with error.
Example error from CI run:
name unknown: The repository with name 'dev/neuro-cli-e2e/e2e-banana-image' does not exist in the registry with id '771188043543'
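The stale-cache scenario above can be reproduced with a toy model. The sketch below is not the actual `AWSECRUpstream` code: `RepoCreator`, `create_upstream` and `forget` are hypothetical names, and dropping the cache entry when the upstream reports an unknown repository is just one possible mitigation.

```python
from typing import Callable, Set


class RepoCreator:
    """Remembers created repos, but lets callers drop a stale entry."""

    def __init__(self, create_upstream: Callable[[str], None]) -> None:
        self._create_upstream = create_upstream  # hypothetical ECR call
        self._existing_repos: Set[str] = set()

    def create_repo(self, name: str) -> None:
        if name in self._existing_repos:
            return
        self._create_upstream(name)
        self._existing_repos.add(name)

    def forget(self, name: str) -> None:
        """Call when the upstream reports that the repo does not exist."""
        self._existing_repos.discard(name)


upstream: Set[str] = set()
creator = RepoCreator(upstream.add)
creator.create_repo("foo")
upstream.discard("foo")     # repo deleted directly in ECR
creator.create_repo("foo")  # cache hit: nothing is created
print("foo" in upstream)    # False
creator.forget("foo")
creator.create_repo("foo")
print("foo" in upstream)    # True
```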
use ServerConfig.name.
We need to enforce repo namespacing so that a user is allowed to manage their images only within the namespace corresponding to their username.
If a user has the username testuser, the following requests should be accepted:
docker pull registry.staging.neuromation.io/testuser/name:tag
docker push registry.staging.neuromation.io/testuser/name:tag
whereas
docker pull registry.staging.neuromation.io/anotheruser/name:tag
docker push registry.staging.neuromation.io/anotheruser/name:tag
are not.
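The rule above reduces to comparing the first path segment of the repository name with the username. A minimal sketch, with a hypothetical function name and without any handling of shared images:

```python
def is_repo_allowed(username: str, repo: str) -> bool:
    """Allow only repositories under the user's own namespace."""
    namespace, sep, name = repo.partition("/")
    return bool(sep) and bool(name) and namespace == username


print(is_repo_allowed("testuser", "testuser/name"))     # True
print(is_repo_allowed("testuser", "anotheruser/name"))  # False
```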
Here we set up Basic authorization:
https://github.com/neuromation/platform-registry-api/blob/de8ab633bb79fbbfecc7e5e5894b0c044ba4c9fa/platform_registry_api/user.py#L24
However, most of our services use Bearer authorization.
The functionality is crucial for writing CLI E2E tests.
Without deletion, tests will clutter the dev registry with temporary images.
Implement the missing catalog request:
The task can be implemented step by step: the very first version lists only user-specific images, and the next step (an enhancement) implements shared containers. The latter can be useful because it would give a powerful way to share pre-defined/pre-built images that contain all required libs, and would thus reduce the amount of work required for a typical research lifecycle.
It is possible to upload and download images but it is not possible to remove them.
See
https://docs.docker.com/registry/spec/api/#pagination
We cannot afford to retrieve the complete list of images from GCR.
401, 403 etc
Currently we have implemented pagination only for the list of repos. The list of tags is just cut off after 100 items.
https://docs.docker.com/registry/spec/api/#listing-image-tags
It is easier to implement pagination for the list of tags because we do not need to filter it.
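Since no filtering is needed, paging a tag list comes down to slicing a sorted list after the `last` marker. The sketch below works on an in-memory list and returns the marker for the next page; the function name and return shape are assumptions, and a real handler would also emit the `Link` header described in the spec.

```python
from typing import List, Optional, Tuple


def paginate_tags(
    tags: List[str], n: int, last: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return up to n tags after `last`, plus the marker for the next page."""
    if n <= 0:
        raise ValueError("n must be positive")
    ordered = sorted(tags)
    if last is not None:
        ordered = [tag for tag in ordered if tag > last]
    page = ordered[:n]
    has_more = len(ordered) > n
    return page, (page[-1] if has_more else None)


tags = [f"v{i:03d}" for i in range(250)]
page, marker = paginate_tags(tags, 100)
print(len(page), marker)  # 100 v099
```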
The Oracle Cloud Registry requires the scope actions to be passed explicitly.
Instead of * they require push,pull for repositories.
They also do not support the registry scopes whatsoever.
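A sketch of the scope rewrite this implies, assuming scopes follow the usual `type:name:actions` form. The function name is hypothetical, and non-repository scopes are passed through unchanged because how to handle the unsupported registry scopes is not settled here.

```python
def expand_wildcard_actions(scope: str) -> str:
    """Rewrite "repository:<name>:*" into "repository:<name>:push,pull"."""
    resource_type, _, rest = scope.partition(":")
    name, _, actions = rest.rpartition(":")
    if resource_type == "repository" and name and actions == "*":
        return f"{resource_type}:{name}:push,pull"
    return scope


print(expand_wildcard_actions("repository:testuser/name:*"))
# repository:testuser/name:push,pull
print(expand_wildcard_actions("repository:testuser/name:pull"))
# repository:testuser/name:pull
```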