redboxllc / scuttle
This project forked from monzo/envoy-preflight
A wrapper for applications to help with running Istio Sidecars
License: MIT License
We have a pod that runs tests (main container) and an Istio sidecar. Once scuttle is added to the flow, even if the test container has failed tests, the entire pod enters a 'Completed' state as opposed to an 'Error' state (which occurs if scuttle is not present).
Is it possible to propagate the status of the failed test container to the pod?
e.g.
NAME READY STATUS RESTARTS AGE
test 0/2 Error 0 38m
and not:
NAME READY STATUS RESTARTS AGE
test 0/2 Completed 0 38m
Error as seen in the pod:
containerStatuses:
- containerID: docker://...
  image: istio/proxyv2:1.5.8
  name: istio-proxy
  state:
    terminated:
      containerID: docker://...
      exitCode: 0
      finishedAt: "2021-01-19T09:36:40Z"
      reason: Completed
      startedAt: "2021-01-19T09:07:19Z"
- containerID: docker://...
  name: test
  state:
    terminated:
      containerID: docker://...
      exitCode: 120
      finishedAt: "2021-01-19T09:36:40Z"
      reason: Error
      startedAt: "2021-01-19T09:07:19Z"
Lean pod YAML:
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  restartPolicy: Never
  containers:
  - name: test
    command: ["scuttle", "/bin/sh", "-c"]
    env:
    - name: ENVOY_ADMIN_API
      value: "http://127.0.0.1:15000"
    - name: ISTIO_QUIT_API
      value: "http://127.0.0.1:15000"
Hi,
We sometimes (maybe every 100th run) get logs like this for our CronJob-created pods:
2020-06-13T14:15:06.871856039Z scuttle: Logging is now enabled
2020-06-13T14:15:06.871942104Z scuttle: Blocking until envoy starts
2020-06-13T14:15:09.820310817Z scuttle: Blocking finished, envoy has started
2020-06-13T14:15:09.832039127Z scuttle: Received exit signal, exiting
When this happens it leaves istio-proxy running for some reason. This results in the job never finishing, which means a new job will never start (because we don't allow more than one running job).
As you can see from the log messages above, it seems that scuttle exits pretty much immediately after "envoy has started" is logged.
We're running Scuttle 1.2.3, Istio 1.5 and Kubernetes 1.16.
Looking in main.go I noticed this comment:
// Signal received before the process even started. Let's just exit.
And that does indeed seem to be the case.
I can't find any relevant istio-proxy logs beyond this line: 2020-06-13T14:15:08.986202Z info Envoy proxy is ready
Does anyone have similar experiences? What can be done to fix this?
I'm not fluent in Golang, so I don't feel comfortable digging further into the code / submitting a PR, but here are some suggestions that might help with debugging these kinds of issues:
- Log the err == nil case and the errors.Is(err, context.Canceled) case separately, adding as much context as possible to the logging (a sketch follows below).
- Turn the code comment ("Signal received before the process even started. Let's just exit.") into a log message.
@linjmeyer What are your thoughts? Do you have experience debugging this? (I noticed the same log message in #23)
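A minimal sketch of the suggested log disambiguation (logExit is a hypothetical helper; scuttle's actual control flow may differ):

package main

import (
	"context"
	"errors"
	"log"
)

// logExit splits the exit cases the suggestion above asks for,
// instead of emitting one generic "Received exit signal, exiting".
func logExit(err error) {
	switch {
	case err == nil:
		log.Println("scuttle: Child process exited cleanly")
	case errors.Is(err, context.Canceled):
		log.Println("scuttle: Signal received before the process even started, exiting")
	default:
		log.Printf("scuttle: Exiting due to error: %v", err)
	}
}

func main() {
	logExit(context.Canceled)
}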
I suggest defaulting ENVOY_ADMIN_API to http://localhost:15000 to satisfy the most common case.
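For illustration, a sketch of the suggested defaulting (getEnv is a hypothetical helper, not part of scuttle):

package main

import (
	"fmt"
	"os"
)

// getEnv returns the value of key, or fallback when the variable is unset.
func getEnv(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func main() {
	fmt.Println(getEnv("ENVOY_ADMIN_API", "http://localhost:15000"))
}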
A security scan found that the Scuttle 1.3.6 binary is using version 0.3.0 of x/text, which is vulnerable to CVE-2020-14040. Could x/text be updated to at least version 0.3.3?
We are seeing a strange issue with scuttle in which a mysterious "urgent I/O condition" signal is sent to scuttle, causing the Envoy/Istio sidecar to stop as soon as it has started. The issue is summarized by the lines below: the first shows that istio-proxy is ready, the second shows scuttle acknowledging this, the third shows the signal arriving, and the rest show scuttle quitting.
2021-03-25 11:06:37.934 istio-proxy 2021-03-25T11:06:37.934856Z info Envoy proxy is ready
2021-03-25 11:06:38.267 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Blocking finished, Envoy has started
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', exiting
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Kill received: (Action: Stopping Istio with API, Reason: ISTIO_QUIT_API is set, Exit Code: 1)
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15000' (intended for Istio >v1.2)
2021-03-25 11:06:38.316 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Sent quitquitquit to Istio, status code: 200
Here is a full export of the logs from the node and container, as well as our istiod pods:
@timestamp Node / container logs Log
2021-03-25 11:06:21.123 ec2node-x ena 0000:00:07.0 eth2: Local page cache is disabled for less than 16 channels
2021-03-25 11:06:26.893 ec2node-x http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
2021-03-25 11:06:32.581 ec2node-x I0325 11:06:32.581736 7921 setters.go:77] Using node IP: "10.234.36.215"
2021-03-25 11:06:33.862 ec2node-x {"level":"info","ts":"2021-03-25T11:06:33.862Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.5 ..."}
2021-03-25 11:06:33.933 ec2node-x I0325 11:06:33.932823 7921 prober.go:124] Readiness probe for "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09):istio-proxy" failed (failure): Get http://100.66.110.107:15021/healthz/ready: dial tcp 100.66.110.107:15021: connect: connection refused
2021-03-25 11:06:35.933 ec2node-x I0325 11:06:35.932833 7921 prober.go:124] Readiness probe for "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09):istio-proxy" failed (failure): Get http://100.66.110.107:15021/healthz/ready: dial tcp 100.66.110.107:15021: connect: connection refused
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.735981Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736016Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736506Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.736 istio-proxy 2021-03-25T11:06:36.736543Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-03-25 11:06:36.765 istio-proxy 2021-03-25T11:06:36.765549Z info xdsproxy Envoy ADS stream established
2021-03-25 11:06:36.765 istio-proxy 2021-03-25T11:06:36.765651Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-03-25 11:06:36.767 istio-proxy 2021-03-25T11:06:36.767262Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2021-03-25 11:06:36.777 discovery 2021-03-25T11:06:36.777047Z info ads ADS: new connection for node:sidecar~100.66.110.107~e24a559ffa452bb7e284a4f3690fa5d3.namespace~namespace.svc.cluster.local-5
2021-03-25 11:06:36.780 discovery 2021-03-25T11:06:36.780741Z info ads CDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:196
2021-03-25 11:06:36.846 discovery 2021-03-25T11:06:36.846468Z info ads EDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:133 empty:0 cached:133/133
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854774Z info sds resource:default new connection
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854786Z info sds resource:ROOTCA new connection
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854832Z info sds Skipping waiting for gateway secret
2021-03-25 11:06:36.854 istio-proxy 2021-03-25T11:06:36.854837Z info sds Skipping waiting for gateway secret
2021-03-25 11:06:36.933 istio-proxy 2021-03-25T11:06:36.933776Z info cache Root cert has changed, start rotating root cert for SDS clients
2021-03-25 11:06:36.933 istio-proxy 2021-03-25T11:06:36.933801Z info cache GenerateSecret default
2021-03-25 11:06:36.934 istio-proxy 2021-03-25T11:06:36.934135Z info sds resource:default pushed key/cert pair to proxy
2021-03-25 11:06:37.055 istio-proxy 2021-03-25T11:06:37.054956Z info cache Loaded root cert from certificate ROOTCA
2021-03-25 11:06:37.055 istio-proxy 2021-03-25T11:06:37.055140Z info sds resource:ROOTCA pushed root cert to proxy
2021-03-25 11:06:37.083 discovery 2021-03-25T11:06:37.082941Z info ads LDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:163
2021-03-25 11:06:37.336 discovery 2021-03-25T11:06:37.332927Z info ads RDS: PUSH for node:e24a559ffa452bb7e284a4f3690fa5d3.namespace resources:60
2021-03-25 11:06:37.934 istio-proxy 2021-03-25T11:06:37.934856Z info Envoy proxy is ready
2021-03-25 11:06:38.267 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Blocking finished, Envoy has started
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', exiting
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Kill received: (Action: Stopping Istio with API, Reason: ISTIO_QUIT_API is set, Exit Code: 1)
2021-03-25 11:06:38.295 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15000' (intended for Istio >v1.2)
2021-03-25 11:06:38.316 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Sent quitquitquit to Istio, status code: 200
2021-03-25 11:06:38.318 discovery 2021-03-25T11:06:38.318887Z info ads ADS: "100.66.110.107:58984" sidecar~100.66.110.107~e24a559ffa452bb7e284a4f3690fa5d3.namespace~namespace.svc.cluster.local-5 terminated with stream closed
2021-03-25 11:06:38.318 istio-proxy 2021-03-25T11:06:38.318183Z warning envoy config StreamAggregatedResources gRPC config stream closed: 13,
2021-03-25 11:06:38.318 istio-proxy 2021-03-25T11:06:38.318470Z info xdsproxy disconnected from XDS server: istiod.istio-system.svc:15012
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319013Z warning envoy config StreamSecrets gRPC config stream closed: 13,
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319029Z warning envoy config StreamSecrets gRPC config stream closed: 13,
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319067Z info sds resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319086Z error sds Remote side closed connection
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319067Z info sds resource:default connection is terminated: rpc error: code = Canceled desc = context canceled
2021-03-25 11:06:38.319 istio-proxy 2021-03-25T11:06:38.319115Z error sds Remote side closed connection
2021-03-25 11:06:38.402 istio-proxy 2021-03-25T11:06:38.402249Z info Epoch 0 exited normally
2021-03-25 11:06:38.402 istio-proxy 2021-03-25T11:06:38.402272Z info No more active epochs, terminating
2021-03-25 11:06:38.460 ec2node-x time="2021-03-25T11:06:38.460784673Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
2021-03-25 11:06:38.680 e24a559ffa452bb7e284a4f3690fa5d3 2021-03-25T11:06:38Z scuttle: Received signal 'urgent I/O condition', passing to child
2021-03-25 11:06:38.872 ec2node-x {"level":"info","ts":"2021-03-25T11:06:38.872Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.5 ..."}
2021-03-25 11:06:38.991 ec2node-x I0325 11:06:38.990897 7921 kubelet.go:1960] SyncLoop (PLEG): "e24a559ffa452bb7e284a4f3690fa5d3_namespace(4e64d45d-52ea-4c19-a4b2-cda5b1b72c09)", event: &pleg.PodLifecycleEvent{ID:"4e64d45d-52ea-4c19-a4b2-cda5b1b72c09", Type:"ContainerDied", Data:"80a75a8a226d881053607260c8837ac9c1627a888b1443aa4186528c22219898"}
2021-03-25 11:06:38.991 ec2node-x I0325 11:06:38.991714 7921 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 80a75a8a226d881053607260c8837ac9c1627a888b1443aa4186528c22219898
We are using Scuttle v1.3.1.
My understanding is that this signal is SIGURG ("Urgent condition on socket", 4.2BSD; default action: Ignore) from the man pages.
I am trying to gather as much information as I can to understand this issue, but ultimately I would like to know: is it possible to ignore this signal? Or to increase logging to determine where it's coming from?
Any background on why this would occur is appreciated.
Please let me know,
Thank you!
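For context, Go's runtime (1.14 and later) uses SIGURG internally for asynchronous goroutine preemption, so a wrapper process that subscribes to all signals can observe SIGURG deliveries that were never meant as an exit request. A minimal sketch of filtering it out when forwarding signals to a child (not scuttle's actual code):

package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Wrap a child process, forwarding all signals except SIGURG.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs) // with no signal arguments: subscribe to all signals
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	go func() {
		for sig := range sigs {
			if sig == syscall.SIGURG {
				continue // runtime-internal preemption signal, not an exit request
			}
			cmd.Process.Signal(sig)
		}
	}()
	cmd.Wait()
}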
Since I put scuttle in front of my program, it would make sense that it does its job assuming an up-to-date version of Istio.
Executing a Docker image built for the amd64 (i.e. Intel) architecture on arm64 (i.e. M1/M2) Macs is very slow, to the point of being impractical.
Docker allows a single image to contain binaries for multiple architectures and picks the appropriate one at build/run time.
#56 offers a solution.
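For reference, such multi-arch images are typically produced with Docker buildx; an illustrative command (image name and tag are placeholders, not the project's actual release pipeline):

docker buildx build --platform linux/amd64,linux/arm64 -t <image>:<tag> --push .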
In a CronJob where the application ran under scuttle, the Istio Envoy proxy started and the job ran. The primary application exited, and scuttle logged that it was posting the quitquitquit API command to Envoy. However, it appears that the Istio proxy had, for whatever reason, already terminated. Scuttle continued to probe the missing Envoy API for hours, keeping the Job running and blocking further executions.
Hi all,
I want to install the scuttle package, but it produced an error like the one below:
What happened to my R?
Why is its header file beachmat3/beachmat.h not found? This confuses me.
I have tried every way I can, but the question remains unsolved.
Any kind help would be appreciated.
Best
This line is needed in the Dockerfile you build for your service: https://github.com/redboxllc/scuttle/pull/4/files#diff-b8f85395a93f7ac963c933e9d844fd93R10; otherwise you get an obscure error about "standard-init".
Are there plans to create a release in the future with the latest version of Go?
These env vars:
env:
- name: ISTIO_QUIT_API
  value: http://127.0.0.1:15020
- name: ENVOY_ADMIN_API
  value: http://127.0.0.1:15000
work on Istio 1.3.3.
But without ENVOY_ADMIN_API, it hangs. Note the ISTIO_QUIT_API port, which differs from the port of Envoy's admin API given in the README.
We encountered a very interesting case the other day at work.
*This happened using Istio 1.3.3, but it also affects other versions of Istio.
In a big cluster with lots of namespaces and services, when a Job was started using scuttle to wrap the main process, the Envoy process of the istio-proxy sidecar was OOM-killed. Despite this, scuttle kept waiting forever. So I suggest adding an environment variable that instructs scuttle to exit if Envoy hasn't become available within a configured timeframe.
I tried setting the environment variable QUIT_WITHOUT_ENVOY_TIMEOUT, however it appears that it is not being used. This is done in a Dockerfile like the following:
ENV QUIT_WITHOUT_ENVOY_TIMEOUT=15s
ENTRYPOINT ["/path/to/scuttle", "..."]
All I get is the following messages indefinitely (NB: Envoy is not running):
2020-11-10T17:11:40Z scuttle: Scuttle 1.3.1 starting up, pid 1
2020-11-10T17:11:40Z scuttle: Logging is now enabled
2020-11-10T17:11:40Z scuttle: Blocking until Envoy starts
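For reference, a minimal sketch of the proposed timeout semantics (envoyReady is a hypothetical probe, not scuttle's actual implementation):

package main

import (
	"log"
	"net/http"
	"os"
	"time"
)

// envoyReady is a hypothetical check: does the Envoy admin API answer at all?
func envoyReady(adminAPI string) bool {
	resp, err := http.Get(adminAPI + "/server_info")
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	timeout, err := time.ParseDuration(os.Getenv("QUIT_WITHOUT_ENVOY_TIMEOUT"))
	if err != nil {
		timeout = 0 // unset or unparsable: block forever, the historical default
	}
	deadline := time.Now().Add(timeout)
	for !envoyReady(os.Getenv("ENVOY_ADMIN_API")) {
		if timeout > 0 && time.Now().After(deadline) {
			log.Fatal("scuttle: Envoy did not start within timeout, exiting")
		}
		time.Sleep(time.Second)
	}
	log.Println("scuttle: Blocking finished, Envoy has started")
}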
Proposal:
- Switch the readiness check to Istio's readiness endpoint (deprecating ENVOY_ADMIN_API).
- Add the shareProcessNamespace requirement when using pkill on Kubernetes to the docs.
Currently the ENVOY_ADMIN_API is used to prevent application startup until Istio/Envoy is ready. Istio itself has a readiness probe (used by k8s healthchecks) at http://127.0.0.1:15020/healthz/ready. Looking at the master branch of Istio, this endpoint verifies that the Istio Admin API is ready (or it will time out), and internally verifies that Envoy is operational.
I think using Istio's readiness endpoint makes more sense, as we can be more confident that not only Envoy is ready, but Istio itself is too. Most likely this would mean deprecating ENVOY_ADMIN_API, as the name of the variable would no longer make sense.
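A minimal sketch of what blocking on that readiness endpoint could look like (illustrative only, not scuttle's actual implementation):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Poll Istio's readiness probe until it reports 200 OK.
	const readyURL = "http://127.0.0.1:15020/healthz/ready"
	for {
		resp, err := http.Get(readyURL)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				break // Istio, and therefore Envoy, is ready
			}
		}
		time.Sleep(time.Second)
	}
	log.Println("scuttle: Istio reports ready, starting application")
}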
Hi folks,
Do you have contribution guidelines for things like:
Thanks!
I have a use case that may be pertinent for others interested in using scuttle for their project. My use case revolves around software that wants to use scuttle conditionally, depending on whether Istio sidecar injection is installed or not, without the software (Kubernetes resources, Helm chart, etc.) having any knowledge about Istio being present or not.
The desired behavior:
- The scuttle ENV vars (ENVOY_ADMIN_API and ISTIO_QUIT_API) are always provided.
- Scuttle itself detects whether an istio-proxy container is present.
- If an istio-proxy container is present in the Pod, scuttle blocks and waits for ENVOY_ADMIN_API to respond, per its current behavior.
- If an istio-proxy container is not present in the Pod, scuttle effectively disables itself, as if the ENV vars were never provided, per its current behavior.
This, I feel, makes scuttle more robust in environments that cannot dynamically configure their ENV vars or command arguments based on pre-determined knowledge that Istio is present or not, and yet need a solution to properly wait for the istio-proxy sidecar to be up and ready before they begin network activity.
I hope this makes sense. This could be a good first contribution by someone (like myself), and this heuristic check can itself be conditional for starters. It will involve importing the client-go Kubernetes client and doing intra-cluster inspection of its own Pod. Technically, shareProcessNamespace would allow for a solution without a kube-apiserver client, but I think it's more elegant to inspect the workload metadata versus the OS-level namespace.
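A minimal sketch of that heuristic using client-go (assumptions: POD_NAME and POD_NAMESPACE are injected via the downward API, and the sidecar container is named istio-proxy):

package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// istioSidecarPresent inspects the Pod the process runs in and reports
// whether an istio-proxy container is part of its spec.
func istioSidecarPresent() (bool, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return false, err // not running inside Kubernetes at all
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return false, err
	}
	pod, err := client.CoreV1().Pods(os.Getenv("POD_NAMESPACE")).
		Get(context.Background(), os.Getenv("POD_NAME"), metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	for _, c := range pod.Spec.Containers {
		if c.Name == "istio-proxy" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	present, err := istioSidecarPresent()
	fmt.Println(present, err)
}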
BONUS: This same heuristic could be used to intelligently enable/disable default values for ENVOY_ADMIN_API and ISTIO_QUIT_API. As noted in those two issues (#9 and #12), there is a concern that having those variables default to a value will make it unpleasant to run the container in a testing or development setting. Introspecting the Kubernetes workload (if Kubernetes even exists, and the istio-proxy sidecar exists) to flip on the default value might be a good approach to the problem.
For open sourcing this project and providing a blog post about how to use it! It's very appreciated! ❤️
This is with Scuttle v1.2.1 (the latest as of this writing).
https://github.com/redboxllc/scuttle/blob/master/main.go#L135
scuttle: Stopping Istio using Istio API 'http://127.0.0.1:15020' (intended for Istio >v1.2)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x7c2c07]
goroutine 1 [running]:
main.killIstioWithAPI()
/home/runner/work/scuttle/scuttle/main.go:135 +0x287
main.kill(0x0)
/home/runner/work/scuttle/scuttle/main.go:110 +0xcc
main.main()
/home/runner/work/scuttle/scuttle/main.go:84 +0x31f
Attempting to dereference resp, which is nil.
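The likely shape of the fix, sketched below (killIstioWithAPI here is a reconstruction, not the actual source): http.Post returns a nil *http.Response whenever err != nil, so err must be checked before resp is touched.

package main

import (
	"log"
	"net/http"
)

// killIstioWithAPI posts quitquitquit, checking the error before using resp.
func killIstioWithAPI(url string) {
	resp, err := http.Post(url+"/quitquitquit", "text/plain", nil)
	if err != nil {
		log.Printf("scuttle: Error sending quitquitquit to Istio: %v", err)
		return
	}
	defer resp.Body.Close()
	log.Printf("scuttle: Sent quitquitquit to Istio, status code: %d", resp.StatusCode)
}

func main() {
	killIstioWithAPI("http://127.0.0.1:15020")
}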
Istio 1.3 introduced the /quitquitquit endpoint for triggering an Istio shutdown. Prior to that, a pkill command had to be used. Istio 1.3 is now end-of-life; I think we can remove the pkill shutdown feature, as all supported versions of Istio have the better option of using the quit endpoint.
When something goes wrong with scuttle, Istio, or the underlying executable being run, you can get logs like:
scuttle: Logging is now enabled
scuttle: Blocking until envoy starts
scuttle: Blocking finished, envoy has started
scuttle: Received exit signal, exiting
Timestamps would help in understanding what happened with logs like these. In the above, it's not possible to determine whether scuttle was hung, how long it ran, etc.
Scuttle doesn't use a real logging package/framework right now; it may be better long-term to investigate what logging options exist in the Go ecosystem.
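As a starting point, Go's standard library logger can already emit timestamps via flags; a minimal sketch:

package main

import (
	"log"
	"os"
)

func main() {
	// LstdFlags adds date/time, LUTC keeps it in UTC, and Lmsgprefix moves
	// the "scuttle: " prefix after the timestamp, matching the existing format.
	logger := log.New(os.Stdout, "scuttle: ", log.LstdFlags|log.LUTC|log.Lmsgprefix)
	logger.Println("Blocking until envoy starts")
	// e.g. 2020/06/13 14:15:06 scuttle: Blocking until envoy starts
}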
Istio 1.5 comes with a lot of changes, particularly around its architecture and the merging of several services into one binary.
It should be verified that Scuttle is compatible with Istio 1.5; if it is not, any bugs/changes should be raised as new GitHub issues.