redboxllc / scuttle
This project forked from monzo/envoy-preflight
A wrapper for applications to help with running Istio Sidecars
License: MIT License
We have a pod that runs tests (main container) and an Istio sidecar. Once scuttle is added to the flow, even if the test container has failed tests, the entire pod enters a 'Completed' state as opposed to an 'Error' state (which occurs if scuttle is not present).
Is it possible to propagate the status of the failed test container to the pod?
e.g.
NAME READY STATUS RESTARTS AGE
test 0/2 Error 0 38m
and not:
NAME READY STATUS RESTARTS AGE
test 0/2 Completed 0 38m
Error as seen in the pod:
containerStatuses:
- containerID: docker://...
  image: istio/proxyv2:1.5.8
  name: istio-proxy
  state:
    terminated:
      containerID: docker://...
      exitCode: 0
      finishedAt: "2021-01-19T09:36:40Z"
      reason: Completed
      startedAt: "2021-01-19T09:07:19Z"
- containerID: docker://...
  name: test
  state:
    terminated:
      containerID: docker://...
      exitCode: 120
      finishedAt: "2021-01-19T09:36:40Z"
      reason: Error
      startedAt: "2021-01-19T09:07:19Z"
Lean pod YAML:
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  restartPolicy: Never
  containers:
  - name: test
    command: ["scuttle", "/bin/sh", "-c"]
    env:
    - name: ENVOY_ADMIN_API
      value: "http://127.0.0.1:15000"
    - name: ISTIO_QUIT_API
      value: "http://127.0.0.1:15000"
Hi,
We sometimes (maybe every 100th run) get logs like this for our CronJob-created pods:
2020-06-13T14:15:06.871856039Z scuttle: Logging is now enabled
2020-06-13T14:15:06.871942104Z scuttle: Blocking until envoy starts
2020-06-13T14:15:09.820310817Z scuttle: Blocking finished, envoy has started
2020-06-13T14:15:09.832039127Z scuttle: Received exit signal, exiting
When this happens it leaves istio-proxy running for some reason. This results in the job never finishing, which means a new job will never start (because we don't allow more than one running job).
As you can see from the log messages above, it seems that scuttle exits pretty much immediately after "envoy has started" is logged.
We're running Scuttle 1.2.3, Istio 1.5 and Kubernetes 1.16.
Looking in main.go I noticed this comment:
// Signal received before the process even started. Let's just exit.
And that does indeed seem to be the case.
I can't find any relevant istio-proxy logs beyond this line: 2020-06-13T14:15:08.986202Z info Envoy proxy is ready
Does anyone have similar experiences? What can be done to fix this?
I'm not fluent in Golang, so I don't feel comfortable digging further into the code / submitting a PR, but here are some suggestions that might help with debugging these kinds of issues:
- Log the err == nil case and the errors.Is(err, context.Canceled) case separately, adding as much context as possible to the logging (a sketch follows below).
- Turn the code comment ("Signal received before the process even started. Let's just exit.") into a log message.
@linjmeyer What are your thoughts? Do you have experience debugging this? (I noticed the same log message in #23)
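A minimal sketch of the suggested log disambiguation (logExit is a hypothetical helper; scuttle's actual control flow may differ):

package main

import (
	"context"
	"errors"
	"log"
)

// logExit splits the exit cases the suggestion above asks for,
// instead of emitting one generic "Received exit signal, exiting".
func logExit(err error) {
	switch {
	case err == nil:
		log.Println("scuttle: Child process exited cleanly")
	case errors.Is(err, context.Canceled):
		log.Println("scuttle: Signal received before the process even started, exiting")
	default:
		log.Printf("scuttle: Exiting due to error: %v", err)
	}
}

func main() {
	logExit(context.Canceled)
}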
I suggest defaulting ENVOY_ADMIN_API to http://localhost:15000 to satisfy the most common case.
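For illustration, a sketch of the suggested defaulting (getEnv is a hypothetical helper, not part of scuttle):

package main

import (
	"fmt"
	"os"
)

// getEnv returns the value of key, or fallback when the variable is unset.
func getEnv(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func main() {
	fmt.Println(getEnv("ENVOY_ADMIN_API", "http://localhost:15000"))
}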
A security scan found that the Scuttle 1.3.6 binary is using version 0.3.0 of x/text, which is vulnerable to CVE-2020-14040. Could x/text be updated to at least version 0.3.3?
We are seeing a strange issue with scuttle in which a mysterious "urgent I/O condition" signal is sent to scuttle, causing the Envoy/Istio sidecar to stop as soon as it has started. The issue is summarized by the lines below: the first shows that istio-proxy is ready, the second shows scuttle acknowledging this, the third shows the signal arriving, and the rest show scuttle quitting.
2021-03-25 11:06:37.934 istio-proxy 2021-03-25T11:06:37.934856Z info Envoy proxy is ready
2021-03-25 11:06:38.267 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Blocking finished, Envoy has started
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', exiting
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Kill received: (Action: Stopping Istio with API, Reason: ISTIO_QUIT_API is set, Exit Code: 1)
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15000' (intended for Istio >v1.2)
2021-03-25 11:06:38.316 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Sent quitquitquit to Istio, status code: 200
Here is a full export of the logs from the node and container, as well as our istiod pods:
@timestamp Node / container logs Log
2021-03-25 11:06:21.123 ec2node-x ena 0000:00:07.0 eth2: Local page cache is disabled for less than 16 channels
2021-03-25 11:06:26.893 ec2node-x http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
2021-03-25 11:06:32.581 ec2node-x I0325 11:06:32.581736 7921 setters.go:77] Using node IP: "10.234.36.215"
2021-03-25 11:06:33.862 ec2node-x {"level":"info","ts":"2021-03-25T11:06:33.862Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.5 ..."}
2021-03-25 11:06:33.933 ec2node-x I0325 11:06:33.932823 7921 prober.go:124] Readiness probe for "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09):istio-proxy" failed (failure): Get http://100.66.110.107:15021/healthz/ready: dial tcp 100.66.110.107:15021: connect: connection refused
2021-03-25 11:06:35.933 ec2node-x I0325 11:06:35.932833 7921 prober.go:124] Readiness probe for "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09):istio-proxy" failed (failure): Get http://100.66.110.107:15021/healthz/ready: dial tcp 100.66.110.107:15021: connect: connection refused
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.735981Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736016Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736506Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736543Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.765 istio-proxy 2021-03-25T11:06:36.765549Z info xdsproxy Envoy ADS stream established
2021-03-25 11:06:36.765 istio-proxy 2021-03-25T11:06:36.765651Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-03-25 11:06:36.767 istio-proxy 2021-03-25T11:06:36.767262Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2021-03-25 11:06:36.777 discovery 2021-03-25T11:06:36.777047Z info ads ADS: new connection for node:sidecar~100.66.110.107~e24a559ffa452bb7e284a4f3690fa5d3.namespace~namespace.svc.cluster.local-5
2021-03-25 11:06:36.780 discovery 2021-03-25T11:06:36.780741Z info ads CDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:196
2021-03-25 11:06:36.846 discovery 2021-03-25T11:06:36.846468Z info ads EDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:133 empty:0 cached:133/133
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854774Z info sds resource:default new connection
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854786Z info sds resource:ROOTCA new connection
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854832Z info sds Skipping waiting for gateway secret
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854837Z info sds Skipping waiting for gateway secret
2021-03-25 11:06:36.933 istio-proxy 2021-03-25T11:06:36.933776Z info cache Root cert has changed, start rotating root cert for SDS clients
2021-03-25 11:06:36.933 istio-proxy 2021-03-25T11:06:36.933801Z info cache GenerateSecret default
2021-03-25 11:06:36.934 istio-proxy 2021-03-25T11:06:36.934135Z info sds resource:default pushed key/cert pair to proxy
2021-03-25 11:06:37.055 istio-proxy 2021-03-25T11:06:37.054956Z info cache Loaded root cert from certificate ROOTCA
2021-03-25 11:06:37.055 istio-proxy 2021-03-25T11:06:37.055140Z info sds resource:ROOTCA pushed root cert to proxy
2021-03-25 11:06:37.083 discovery 2021-03-25T11:06:37.082941Z info ads LDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:163
2021-03-25 11:06:37.336 discovery 2021-03-25T11:06:37.332927Z info ads RDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:60
2021-03-25 11:06:37.934 istio-proxy 2021-03-25T11:06:37.934856Z info Envoy proxy is ready
2021-03-25 11:06:38.267 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Blocking finished, Envoy has started
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', exiting
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Kill received: (Action: Stopping Istio with API, Reason: ISTIO_QUIT_API is set, Exit Code: 1)
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15000' (intended for Istio >v1.2)
2021-03-25 11:06:38.316 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Sent quitquitquit to Istio, status code: 200
2021-03-25 11:06:38.318 discovery 2021-03-25T11:06:38.318887Z info ads ADS: "100.66.110.107:58984" sidecar~100.66.110.107~e24a559ffa452bb7e284a4f3690fa5d3.namespace~namespace.svc.cluster.local-5 terminated with stream closed
2021-03-25 11:06:38.318 istio-proxy 2021-03-25T11:06:38.318183Z warning envoy config StreamAggregatedResources gRPC config stream closed: 13,
2021-03-25 11:06:38.318 istio-proxy 2021-03-25T11:06:38.318470Z info xdsproxy disconnected from XDS server: istiod.istio-system.svc:15012
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319013Z warning envoy config StreamSecrets gRPC config stream closed: 13,
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319029Z warning envoy config StreamSecrets gRPC config stream closed: 13,
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319067Z info sds resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319086Z error sds Remote side closed connection
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319067Z info sds resource:default connection is terminated: rpc error: code = Canceled desc = context canceled
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319115Z error sds Remote side closed connection
2021-03-25 11:06:38.402 istio-proxy 2021-03-25T11:06:38.402249Z info Epoch 0 exited normally
2021-03-25 11:06:38.402 istio-proxy 2021-03-25T11:06:38.402272Z info No more active epochs, terminating
2021-03-25 11:06:38.460 ec2node-x time="2021-03-25T11:06:38.460784673Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
2021-03-25 11:06:38.680 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', passing to child
2021-03-25 11:06:38.872 ec2node-x {"level":"info","ts":"2021-03-25T11:06:38.872Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.5 ..."}
2021-03-25 11:06:38.991 ec2node-x I0325 11:06:38.990897 7921 kubelet.go:1960] SyncLoop (PLEG): "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09)", event: &pleg.PodLifecycleEvent{ID:"4e64d45d-52ea-4c19-a4b2-cda5b1b72c09", Type:"ContainerDied", Data:"80a75a8a226d881053607260c8837ac9c1627a888b1443aa4186528c22219898"}
2021-03-25 11:06:38.991 ec2node-x I0325 11:06:38.991714 7921 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 80a75a8a226d881053607260c8837ac9c1627a888b1443aa4186528c22219898
We are using Scuttle v1.3.1.
My understanding is that this signal is SIGURG ("Urgent condition on socket", 4.2BSD; default action: Ignore) from the man pages.
I am trying to gather as much information as I can to understand this issue, but ultimately I would like to know: is it possible to ignore this signal? Or to increase logging to determine where it's coming from?
Any background on why this would occur is appreciated.
Please let me know,
Thank you!
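For context, Go's runtime (1.14 and later) uses SIGURG internally for asynchronous goroutine preemption, so a wrapper process that subscribes to all signals can observe SIGURG deliveries that were never meant as an exit request. A minimal sketch of filtering it out when forwarding signals to a child (not scuttle's actual code):

package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Wrap a child process, forwarding all signals except SIGURG.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs) // with no signal arguments: subscribe to all signals
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	go func() {
		for sig := range sigs {
			if sig == syscall.SIGURG {
				continue // runtime-internal preemption signal, not an exit request
			}
			cmd.Process.Signal(sig)
		}
	}()
	cmd.Wait()
}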
Since I put scuttle in front of my program, it would make sense that it does its job assuming an up-to-date version of Istio.
Executing a Docker image built for the amd64 (i.e. Intel) architecture on arm64 (i.e. M1/M2) Macs is very slow, to the point of being impractical.
Docker allows a single image to contain binaries for multiple architectures and picks the appropriate one at build/run time.
#56 offers a solution.
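For reference, such multi-arch images are typically produced with Docker buildx; an illustrative command (image name and tag are placeholders, not the project's actual release pipeline):

docker buildx build --platform linux/amd64,linux/arm64 -t <image>:<tag> --push .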
In a CronJob where the application ran under scuttle, the Istio Envoy proxy started and the job ran. The primary application exited, and scuttle logged that it was posting the quitquitquit API command to Envoy. However, it appears that the Istio proxy had, for whatever reason, already terminated. Scuttle continued to probe the missing Envoy API for hours, keeping the Job running and blocking further executions.
Hi all,
I want to install the scuttle package, but it produced an error like the one below:
What happened to my R?
Why is its header file beachmat3/beachmat.h not found? This confuses me.
I have tried every way I can, but the question remains unsolved.
Any kind help would be appreciated.
Best
This line is needed in the Dockerfile you build for your service: https://github.com/redboxllc/scuttle/pull/4/files#diff-b8f85395a93f7ac963c933e9d844fd93R10; otherwise you get an obscure error about "standard-init".
Are there plans to create a release in the future with the latest version of Go?
These env vars:
env:
- name: ISTIO_QUIT_API
  value: http://127.0.0.1:15020
- name: ENVOY_ADMIN_API
  value: http://127.0.0.1:15000
work on Istio 1.3.3.
But without ENVOY_ADMIN_API, it hangs. Note the ISTIO_QUIT_API port, which differs from the port of Envoy's admin API given in the README.
We encountered a very interesting case the other day at work.
*This happened using Istio 1.3.3, but it also affects other versions of Istio.
In a big cluster with lots of namespaces and services, when a Job was started using scuttle to wrap the main process, the Envoy process of the istio-proxy sidecar was OOM-killed. Despite this, scuttle kept waiting forever. So I suggest adding an environment variable that instructs scuttle to exit if Envoy hasn't become available within a configured timeframe.
I tried setting the environment variable QUIT_WITHOUT_ENVOY_TIMEOUT, however it appears that it is not being used. This is done in a Dockerfile like the following:
ENV QUIT_WITHOUT_ENVOY_TIMEOUT=15s
ENTRYPOINT ["/path/to/scuttle", "..."]
All I get is the following messages indefinitely (NB: Envoy is not running):
2020-11-10T17:11:40Z scuttle: Scuttle 1.3.1 starting up, pid 1
2020-11-10T17:11:40Z scuttle: Logging is now enabled
2020-11-10T17:11:40Z scuttle: Blocking until Envoy starts
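For reference, a minimal sketch of the proposed timeout semantics (envoyReady is a hypothetical probe, not scuttle's actual implementation):

package main

import (
	"log"
	"net/http"
	"os"
	"time"
)

// envoyReady is a hypothetical check: does the Envoy admin API answer at all?
func envoyReady(adminAPI string) bool {
	resp, err := http.Get(adminAPI + "/server_info")
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	timeout, err := time.ParseDuration(os.Getenv("QUIT_WITHOUT_ENVOY_TIMEOUT"))
	if err != nil {
		timeout = 0 // unset or unparsable: block forever, the historical default
	}
	deadline := time.Now().Add(timeout)
	for !envoyReady(os.Getenv("ENVOY_ADMIN_API")) {
		if timeout > 0 && time.Now().After(deadline) {
			log.Fatal("scuttle: Envoy did not start within timeout, exiting")
		}
		time.Sleep(time.Second)
	}
	log.Println("scuttle: Blocking finished, Envoy has started")
}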
Proposal:
- Switch the readiness check to Istio's readiness endpoint (deprecating ENVOY_ADMIN_API).
- Add the shareProcessNamespace requirement when using pkill on Kubernetes to the docs.
Currently the ENVOY_ADMIN_API is used to prevent application startup until Istio/Envoy is ready. Istio itself has a readiness probe (used by k8s healthchecks) at http://127.0.0.1:15020/healthz/ready. Looking at the master branch of Istio, this endpoint verifies that the Istio Admin API is ready (or it will time out), and internally verifies that Envoy is operational.
I think using Istio's readiness endpoint makes more sense, as we can be more confident that not only Envoy is ready, but Istio itself is too. Most likely this would mean deprecating ENVOY_ADMIN_API, as the name of the variable would no longer make sense.
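A minimal sketch of what blocking on that readiness endpoint could look like (illustrative only, not scuttle's actual implementation):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Poll Istio's readiness probe until it reports 200 OK.
	const readyURL = "http://127.0.0.1:15020/healthz/ready"
	for {
		resp, err := http.Get(readyURL)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				break // Istio, and therefore Envoy, is ready
			}
		}
		time.Sleep(time.Second)
	}
	log.Println("scuttle: Istio reports ready, starting application")
}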
Hi folks,
Do you have contribution guidelines for things like:
Thanks!
I have a use case that may be pertinent for others interested in using scuttle for their project. My use case revolves around software that wants to use scuttle conditionally, depending on whether Istio sidecar injection is installed or not, without the software (Kubernetes resources, Helm chart, etc.) having any knowledge about Istio being present or not.
The desired behavior:
- The scuttle ENV vars (ENVOY_ADMIN_API and ISTIO_QUIT_API) are always provided.
- Scuttle itself detects whether an istio-proxy container is present.
- If an istio-proxy container is present in the Pod, scuttle blocks and waits for ENVOY_ADMIN_API to respond, per its current behavior.
- If an istio-proxy container is not present in the Pod, scuttle effectively disables itself, as if the ENV vars were never provided, per its current behavior.
This, I feel, makes scuttle more robust in environments that cannot dynamically configure their ENV vars or command arguments based on pre-determined knowledge that Istio is present or not, and yet need a solution to properly wait for the istio-proxy sidecar to be up and ready before they begin network activity.
I hope this makes sense. This could be a good first contribution by someone (like myself), and this heuristic check can itself be conditional for starters. It will involve importing the client-go Kubernetes client and doing intra-cluster inspection of its own Pod. Technically, shareProcessNamespace would allow for a solution without a kube-apiserver client, but I think it's more elegant to inspect the workload metadata versus the OS-level namespace.
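A minimal sketch of that heuristic using client-go (assumptions: POD_NAME and POD_NAMESPACE are injected via the downward API, and the sidecar container is named istio-proxy):

package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// istioSidecarPresent inspects the Pod the process runs in and reports
// whether an istio-proxy container is part of its spec.
func istioSidecarPresent() (bool, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return false, err // not running inside Kubernetes at all
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return false, err
	}
	pod, err := client.CoreV1().Pods(os.Getenv("POD_NAMESPACE")).
		Get(context.Background(), os.Getenv("POD_NAME"), metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	for _, c := range pod.Spec.Containers {
		if c.Name == "istio-proxy" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	present, err := istioSidecarPresent()
	fmt.Println(present, err)
}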
BONUS: This same heuristic could be used to intelligently enable/disable default values for ENVOY_ADMIN_API and ISTIO_QUIT_API. As noted in those two issues (#9 and #12), there is a concern that having those variables default to a value will make it unpleasant to run the container in a testing or development setting. Introspecting the Kubernetes workload (if Kubernetes even exists, and the istio-proxy sidecar exists) to flip on the default value might be a good approach to the problem.
For open sourcing this project and providing a blog post about how to use it! It's very appreciated! ❤️
This is with Scuttle v1.2.1 (the latest as of this writing).
https://github.com/redboxllc/scuttle/blob/master/main.go#L135
scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15020' (intended for Istio >v1.2)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x7c2c07]
goroutine 1 [running]:
main.killIstioWithAPI()
/home/runner/work/scuttle/scuttle/main.go:135 +0x287
main.kill(0x0)
/home/runner/work/scuttle/scuttle/main.go:110 +0xcc
main.main()
/home/runner/work/scuttle/scuttle/main.go:84 +0x31f
Attempting to dereference resp, which is nil.
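The likely shape of the fix, sketched below (killIstioWithAPI here is a reconstruction, not the actual source): http.Post returns a nil *http.Response whenever err != nil, so err must be checked before resp is touched.

package main

import (
	"log"
	"net/http"
)

// killIstioWithAPI posts quitquitquit, checking the error before using resp.
func killIstioWithAPI(url string) {
	resp, err := http.Post(url+"/quitquitquit", "text/plain", nil)
	if err != nil {
		log.Printf("scuttle: Error sending quitquitquit to Istio: %v", err)
		return
	}
	defer resp.Body.Close()
	log.Printf("scuttle: Sent quitquitquit to Istio, status code: %d", resp.StatusCode)
}

func main() {
	killIstioWithAPI("http://127.0.0.1:15020")
}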
Istio 1.3 introduced the /quitquitquit endpoint for triggering an Istio shutdown. Prior to that, a pkill command had to be used. Istio 1.3 is now end-of-life; I think we can remove the pkill shutdown feature, as all supported versions of Istio have the better option of using the quit endpoint.
When something goes wrong with scuttle, Istio, or the underlying executable being run, you can get logs like:
scuttle: Logging is now enabled
scuttle: Blocking until envoy starts
scuttle: Blocking finished, envoy has started
scuttle: Received exit signal, exiting
Timestamps would help in understanding what happened with logs like these. In the above, it's not possible to determine whether scuttle was hung, how long it ran, etc.
Scuttle doesn't use a real logging package/framework right now; it may be better long-term to investigate what logging options exist in the Go ecosystem.
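As a starting point, Go's standard library logger can already emit timestamps via flags; a minimal sketch:

package main

import (
	"log"
	"os"
)

func main() {
	// LstdFlags adds date/time, LUTC keeps it in UTC, and Lmsgprefix moves
	// the "scuttle: " prefix after the timestamp, matching the existing format.
	logger := log.New(os.Stdout, "scuttle: ", log.LstdFlags|log.LUTC|log.Lmsgprefix)
	logger.Println("Blocking until envoy starts")
	// e.g. 2020/06/13 14:15:06 scuttle: Blocking until envoy starts
}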
Istio 1.5 comes with a lot of changes, particularly around its architecture and the merging of several services into one binary.
It should be verified that Scuttle is compatible with Istio 1.5; if it is not, any bugs/changes should be raised as new GitHub issues.