
of-watchdog

Reverse proxy for HTTP microservices and STDIO


The of-watchdog implements an HTTP server listening on port 8080, and acts as a reverse proxy for running functions and microservices. It can be used independently, or as the entrypoint for a container with OpenFaaS.

This version of the OpenFaaS watchdog adds support for HTTP proxying as well as STDIO, which enables reuse of memory and very fast serving of requests. It does not aim to replace the Classic Watchdog, but offers another option for those who need these features.

Binaries can be downloaded via GitHub releases, but the watchdog is meant to be copied from the container image published to ghcr.io in a multi-stage build:

FROM --platform=${TARGETPLATFORM:-linux/amd64} ghcr.io/openfaas/of-watchdog:0.9.11 as watchdog
FROM --platform=${TARGETPLATFORM:-linux/amd64} node:18-alpine as ship

COPY --from=watchdog /fwatchdog /usr/bin/fwatchdog

See example templates

Goals

  • Keep function process warm for lower latency / caching / persistent connections through using HTTP
  • Enable streaming of large responses from functions, beyond the RAM or disk capacity of the container
  • Cleaner abstractions for each "mode"

Modes

There are several modes available for the of-watchdog, each of which changes how it interacts with your microservice or function code.

Modes for of-watchdog

A comparison of three watchdog modes. Top left: Classic Watchdog; top right: afterburn (deprecated); bottom left: HTTP mode from of-watchdog.

  1. HTTP mode - the most efficient option; all template authors should consider it if the target language has an HTTP server implementation.
  2. Serializing mode - for when an HTTP server implementation doesn't exist; STDIO is read into memory then sent into a forked process.
  3. Streaming mode (the default) - as per serializing mode, however the request and response are both streamed instead of being buffered completely into memory before the function starts running.

API

Private endpoints, served by watchdog:

  • /_/health - returns true once the process has started, or, if a lock file is in use, once that file exists
  • /_/ready - as per /_/health, but if max_inflight is configured to a non-zero value and the maximum number of connections is met, it will return a 429 status

Any other HTTP requests:

  • /* - any other path and HTTP verb is sent to the function
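Assuming the watchdog is running locally on its default port of 8080, these endpoints can be exercised with curl (a sketch; the status codes are as described above):

```shell
# Liveness check - 200 OK once the process has started
curl -i http://127.0.0.1:8080/_/health

# Readiness check - 429 if max_inflight connections are already in use
curl -i http://127.0.0.1:8080/_/ready

# Any other path and verb is proxied through to the function
curl -i -X POST -d '{"hello": "world"}' http://127.0.0.1:8080/
```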

1. HTTP (mode=http)

1.1 Status

HTTP mode is recommended for all templates where the target language has an HTTP server implementation available.

See a few different examples of templates; more are available via faas-cli template store list, such as golang-middleware, python3-http and node*.

To get the repository for a specific template use faas-cli template store describe NAME.

1.2 Description

A process is forked when the watchdog starts; any request incoming to the watchdog is then forwarded to an HTTP port within the container.

Pros:

  • Fastest option for high concurrency and throughput
  • More efficient concurrency and RAM usage vs. forking model
  • Database connections can be persisted for the lifetime of the container
  • Files or models can be fetched and stored in /tmp/ as a one-off initialization task and used for all requests after that
  • Does not require new/custom client libraries like afterburn but makes use of a long-running daemon such as Express.js for Node or Flask for Python

Example usage for testing:

  • Forward to an Nginx container:
$ go build ; mode=http port=8081 fprocess="docker run -p 80:80 --name nginx -t nginx" upstream_url=http://127.0.0.1:80 ./of-watchdog
  • Forward to a Node.js / Express.js hello-world app:
$ go build ; mode=http port=8081 fprocess="node expressjs-hello-world.js" upstream_url=http://127.0.0.1:3000 ./of-watchdog
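Once either example is running, the watchdog (listening on port 8081 in these sketches) will proxy any request through to the upstream process:

```shell
curl -i http://127.0.0.1:8081/
```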

Cons:

  • One more HTTP hop in the chain between the client and the function
  • Daemons such as express/flask/sinatra can be unpredictable when used in this way, so may need additional configuration
  • Additional memory may be occupied between invocations vs. forking model

1.3 Structured logging

It is not currently possible to have the watchdog's own messages output as JSON:

2024/04/25 17:29:06 Listening on port: 8080
2024/04/25 17:29:06 Writing lock-file to: /tmp/.lock
2024/04/25 17:29:06 Metrics listening on port: 8081
2024/04/25 17:29:08 GET / - 301 Moved Permanently - ContentLength: 39B (0.0049s) [test]

However, you can write your own log lines in JSON. Just set the prefix_logs environment variable to false to remove the default prefix that the watchdog would otherwise emit.

With prefix_logs on:

2024-04-24T21:00:04Z {"msg": "unable to connect to database"}

With prefix_logs off:

{"msg": "unable to connect to database"}

1.4 Tracing / correlation IDs

The gateway sends an X-Call-Id header which should be used in your own logger to correlate and trace requests.

In HTTP mode, the watchdog will append the X-Call-Id to its own HTTP log messages in square brackets if you set the log_call_id environment variable to true:

2024/04/25 17:29:58 GET / - 301 Moved Permanently - ContentLength: 39B (0.0037s) [079d9ff9-d7b7-4e37-b195-5ad520e6f797]
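To try this locally, log_call_id can be combined with the HTTP-mode example shown earlier (a sketch; the Express.js app and ports are assumptions carried over from that example):

```shell
go build
mode=http port=8081 log_call_id=true \
  fprocess="node expressjs-hello-world.js" \
  upstream_url=http://127.0.0.1:3000 ./of-watchdog
```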

2. Serializing fork (mode=serializing)

2.1 Status

This mode is designed to replicate the behaviour of the original watchdog for backwards compatibility.

2.2 Description

Forks one process per request. Multi-threaded. Ideal for retro-fitting a CGI-style application handler, e.g. for Flask.

Limited to processing payloads that fit within available memory.

Reads the entire request body into memory from the HTTP request, at which point it is serialized or modified if required, then written into the stdin pipe of the forked process.

  • Stdout pipe is read into memory and then serialized or modified if necessary before being written back to the HTTP response.
  • A static Content-type can be set ahead of time.
  • HTTP headers can be set even after executing the function (not implemented).
  • Exec timeout: supported.
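A minimal way to try serializing mode is to use cat as the function process, which reads STDIN and writes it back to STDOUT, so the watchdog behaves like an echo server (a sketch; any STDIN/STDOUT program can be substituted):

```shell
go build
mode=serializing port=8081 fprocess="cat" ./of-watchdog &

# The request body is buffered, piped through cat, and returned as the response
curl -d "hello" http://127.0.0.1:8081/
```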

3. Streaming fork (mode=streaming) - default.

Forks a process per request and can deal with a request body larger than memory capacity - i.e. a 512 MB VM can process multiple GB of video.

HTTP headers cannot be sent after the function starts executing, because input/output are hooked up directly to the response for streaming efficiency. The response code is always 200 unless there is an issue forking the process; an error mid-flight has to be picked up by the client. Multi-threaded.

  • Output is sent back to the client as soon as it's printed to stdout by the executing process.
  • A static Content-type can be set ahead of time.
  • Exec timeout: supported.
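Streaming mode can be tried the same way as serializing mode, but here the body is never fully buffered, so even a file larger than the container's RAM can make the round trip (a sketch using cat as the process; the file name is a placeholder):

```shell
go build
mode=streaming port=8081 fprocess="cat" ./of-watchdog &

# Stream a large file through the function without loading it all into memory
curl --data-binary @large-video.mp4 -o round-trip.mp4 http://127.0.0.1:8081/
```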

4. Static (mode=static)

This mode starts an HTTP file server for serving static content found at the directory specified by static_path.

See an example in the Hugo blog post.
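A sketch of testing static mode locally, assuming a ./public directory containing an index.html:

```shell
go build
mode=static static_path=./public port=8081 ./of-watchdog &

curl -i http://127.0.0.1:8081/index.html
```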

Metrics

Name Description Type
http_requests_total Total number of requests Counter
http_request_duration_seconds Duration of requests Histogram
http_requests_in_flight Number of requests in-flight Gauge
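The startup logs shown in section 1.3 indicate the metrics server listening on port 8081 while the main server is on 8080. Assuming the conventional Prometheus /metrics path, they can be scraped locally:

```shell
# Scrape the watchdog's Prometheus metrics (path assumed to be /metrics)
curl -s http://127.0.0.1:8081/metrics | grep http_requests_total
```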

Configuration

Environmental variables:

Note: timeouts should be specified as Golang durations i.e. 1m or 20s.

Option Usage
buffer_http (Deprecated) Alias for http_buffer_req_body, will be removed in future version
content_type Force a specific Content-Type response for all responses - only in forking/serializing modes.
exec_timeout Exec timeout for process exec'd for each incoming request (in seconds). Disabled if set to 0.
fprocess / function_process Process to execute a server in http mode or to be executed for each request in the other modes. For non http mode the process must accept input via STDIN and print output via STDOUT. Also known as "function process".
healthcheck_interval Interval (in seconds) for HTTP healthcheck by container orchestrator i.e. kubelet. Used for graceful shutdowns.
http_buffer_req_body http mode only - buffers request body in memory before forwarding upstream to your template's upstream_url. Use if your upstream HTTP server does not accept Transfer-Encoding: chunked, for example WSGI tends to require this setting. Default: false
http_upstream_url http mode only - where to forward requests i.e. http://127.0.0.1:5000
jwt_auth For OpenFaaS for Enterprises customers only. When set to true, the watchdog will require a JWT token to be passed as a Bearer token in the Authorization header. This token can only be obtained through the OpenFaaS gateway using a token exchange using the http://gateway.openfaas:8080 address as the authority.
jwt_auth_debug Print out debug messages from the JWT authentication process (OpenFaaS for Enterprises only).
jwt_auth_local When set to true, the watchdog will attempt to validate the JWT token using a port-forwarded or local gateway running at http://127.0.0.1:8080 instead of attempting to reach it via an in-cluster service name (OpenFaaS for Enterprises only).
log_buffer_size The number of bytes to read from stderr/stdout for log lines. When exceeded, the user will see a "bufio.Scanner: token too long" error. The default value is bufio.MaxScanTokenSize
log_call_id In HTTP mode, when printing a response code, content-length and timing, include the X-Call-Id header at the end of the line in brackets i.e. [079d9ff9-d7b7-4e37-b195-5ad520e6f797] or [none] when it's empty. Default: false
max_inflight Limit the maximum number of requests in flight, and return a HTTP status 429 when exceeded
mode The mode in which of-watchdog operates. Default: streaming. Options are: http, serializing, streaming, static
port Specify an alternative TCP port for testing. Default: 8080
prefix_logs When set to true, the watchdog will add a prefix of "Date Time" + "stderr/stdout" to every line read from the function process. Default: true
read_timeout HTTP timeout for reading the payload from the client caller (in seconds)
ready_path When non-empty, requests to /_/ready will invoke the function handler with this path. This can be used to provide custom readiness logic. When max_inflight is set, the concurrency limit is checked first before proxying the request to the function.
static_path Absolute or relative path to the directory that will be served if mode="static"
suppress_lock When set to false, the watchdog will attempt to write a lock file to /tmp/.lock for health checks. Default: false
upstream_url Alias for http_upstream_url
write_timeout HTTP timeout for writing a response body from your function (in seconds)
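As an illustration, several of these options can be combined on a single invocation; a hedged sketch for an HTTP-mode function with a 60-second execution limit and a concurrency cap (the fprocess and upstream_url values are placeholders):

```shell
mode=http port=8080 \
  exec_timeout=60s read_timeout=1m5s write_timeout=1m5s \
  max_inflight=10 \
  fprocess="python index.py" \
  upstream_url=http://127.0.0.1:5000 \
  ./of-watchdog
```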

Unsupported options from the Classic Watchdog:

Option Usage
write_debug In the classic watchdog, this prints the response body out to the console
read_debug In the classic watchdog, this prints the request body out to the console
combined_output In the classic watchdog, this returns STDOUT and STDERR in the function's HTTP response, when off it only returns STDOUT and prints STDERR to the logs of the watchdog

of-watchdog's People

Contributors

alexellis, burtonr, cconger, coolbeevip, csakshaug, dmrub, hilariocoelho, itsksaurabh, ivanayov, jcollie, johnmccabe, lucasroesler, matipan, omerzamir, omrishtam, reitermarkus, retgits, rgee0, sargun, shikachuu, telackey, viveksyngh, waterdrips


of-watchdog's Issues

HTTP Path is not passed in to handler in http mode

Expected Behaviour

Paths should be forwarded to the upstream function.

Current Behaviour

I tried to run a SimpleHTTPServer in Python and noticed the file-browser wasn't passing through the path.

Possible Solution

Update the proxying code to pass the requestURI

Steps to Reproduce (for bugs)

  1. port=8081 mode=http fprocess="python -m SimpleHTTPServer" upstream_url="http://127.0.0.1:8000" ./of-watchdog

Browse into a directory and see it doesn't work

Context

Running as a Go binary outside of Docker.

We should be careful to test this change, since the templates only bind to a single path of / anyway

https://github.com/openfaas-incubator/node8-express-template/blob/master/template/node8-express/index.js#L83

Serializing mode sends nothing in stdin

How does serializing work?

Expected Behaviour

Hi all, I use serializing mode, and I get nothing on STDIN (while streaming works).
Overall, how do I use serializing mode? I want to manipulate the request from stdin and set the status code, body, headers etc. inside my handler, and write out to stdout. I think it's the best-suited mode, but I get nothing.

Current Behaviour

POST http://openfaas_gateway:8000/function/name
{"foo": "bar"}

With a PHP script :

<?php

$stdin = file_get_contents("php://stdin");

// $stdin is empty ...

fwrite(STDOUT, $stdin);

// How to write a HTTP response correctly with stdout ?

Steps to Reproduce (for bugs)

This Dockerfile with the script above.

FROM php:alpine
RUN apk add --no-cache git curl
RUN curl -sSLf https://github.com/openfaas-incubator/of-watchdog/releases/download/0.4.6/of-watchdog > /usr/bin/fwatchdog && chmod +x /usr/bin/fwatchdog
COPY index.php /usr/src/function
WORKDIR /usr/src/function
ENV function_process="php index.php"
ENV mode="serializing"
HEALTHCHECK --interval=3s CMD [ -e /tmp/.lock ] || exit 1
CMD ["fwatchdog"]

Context

I want to manipulate the stdin request and set status code, body, headers ... inside my function with stdout.

Your Environment

Docker 18.09.1
Docker swarm
macOs
PHP 7.3

Update Docker build layer to go 1.11.13

Support for structured logging

Expected Behaviour

In order to integrate with my org's log aggregator I need to provide pure JSON logs from all deployed apps. That means that the logs will need to be parseable as JSON, such as:

{"level":"info","ts":"2019-09-05T15:33:26.302Z","caller":"function/file.go:64","msg":"Parsing the payload from the handler", taskId: 14}
...
{"ts":"2019-09-05T15:35:35.212Z","method":"POST","path":"/","status":"200 OK","ContentLength": 121}

Current Behaviour

  1. The current behavior is that there is a prefix for the logs coming from the function
  2. The log of the function response status is unstructured.
2019/09/05 15:33:26 stdout: {"level":"info","ts":"2019-09-05T15:33:26.302Z","caller":"function/file.go:64","msg":"Parsing the payload from the handler", taskId: 14}

2019/09/05 15:33:26 POST / - 200 OK - ContentLength: 121

Possible Solution

I'm raising two issues here and I don't think they should have the same solution. So let's tackle one issue at a time:

Removing all the prefix from the log (including the timestamp)

This could be achieved either by simply using fmt.Println instead of log, or by setting the flags on the current logger to: log.SetFlags(0).

This could be made optional by setting an environment variable like disable_logger_prefix

Formatted printing for the function's response status

We need an option to print the response status as a formatted log. Adding a structured JSON log could be as simple as mocking the JSON format with Printf, so it can be performant, or importing one of the available loggers which is able to print the JSON format.

Again this should be optional and set by an environment variable like logger_format.

Context

We are aggregating all the logs across all of our products into a log aggregator to be indexed and made queryable. At the moment, the watchdog uses Golang's default logger, which doesn't support a JSON structure and adds a timestamp prefix. That means we aren't able to use the watchdog as it is now, and will need to fork the project and make the necessary adjustments.

Error deploying node8-express template to raspberrypi

I tried to create a sample express app from:

https://github.com/openfaas-incubator/node8-express-template

But I see the following error in the logs after deployment. I am new to docker/openfaas, so sincere apologies if this is due to some setup issue on my end.

faas-node-express.1.dp85hxbcd6i3@pi03    | Forking - node [index.js]
faas-node-express.1.dp85hxbcd6i3@pi03    | 2018/06/07 03:54:17 Started logging stderr from function.
faas-node-express.1.dp85hxbcd6i3@pi03    | 2018/06/07 03:54:17 Started logging stdout from function.
faas-node-express.1.dp85hxbcd6i3@pi03    | 2018/06/07 03:54:17 Error reading stdout: EOF

Expected Behaviour

Current Behaviour

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

  • Docker version: 18.05.0-ce

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)? Docker Swarm

  • Operating System and version (e.g. Linux, Windows, MacOS): Raspbian Lite

  • Link to your project or a code example to reproduce issue:

Add a new `static` mode for serving static content

Expected Behaviour

Template creators would specify a new mode called static that would allow them to serve static content that they specified over http.

Current Behaviour

Users cannot use the watchdog for serving static content. They would have to create their own static server or use an existing solution like nginx.

Possible Solution

We would have a new mode called static and a new variable called publish, which would hold the relative path to the directory that the user wants to serve.

Something that we could add in the future is analytics, so that users can know which blog posts or pages were the most visited.

Steps to Reproduce (for bugs)

Context

While creating a template for static sites I was not able to reuse the watchdog.

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • Link to your project or a code example to reproduce issue:

Support request on scaling from zero

If you invoke a function (using of-watchdog HTTP mode) after it has been scaled to zero, a new function pod is successfully created, but you get a Server returned unexpected status code: 500 - result.

If you look into the function pod log below:

Forking - node [bootstrap.js]
2020/03/10 11:23:19 Started logging stderr from function.
2020/03/10 11:23:19 Started logging stdout from function.
2020/03/10 11:23:19 OperationalMode: http
2020/03/10 11:23:19 Timeouts: read: 1m5s, write: 1m5s hard: 1m0s.
2020/03/10 11:23:19 Listening on port: 8080
2020/03/10 11:23:19 Writing lock-file to: /tmp/.lock
2020/03/10 11:23:19 Metrics listening on port: 8081
2020/03/10 11:23:21 Upstream HTTP request error: Post http://127.0.0.1:3000/: dial tcp 127.0.0.1:3000: connect: connection refused
2020/03/10 11:23:27 stdout: OpenFaaS Node.js listening on port: 3000

you can tell that of-watchdog threw an Upstream HTTP request error 6 seconds before the express Node.js server started listening on port 3000.

Expected Behaviour

When invoking a function that has been scaled to zero, the function needs to be successfully executed and return the correct response.

Extra response time due to the pod initialisation is acceptable.

Current Behaviour

When invoking a function that has been scaled to zero, a Server returned unexpected status code: 500 - response is returned.

Possible Solution

Possible Solution 1:

  • Make of-watchdog retry the connection to the internal HTTP server (under HTTP mode) during a certain timeframe. During that timeframe, of-watchdog should not throw an Upstream HTTP request error

Possible Solution 2:

  • Implement a readiness probe in addition to the current liveness probe (served at /_/health). Before the internal HTTP server is up, the readiness probe should not return a 200 status code.

Steps to Reproduce (for bugs)

  1. Turn on the scale to zero feature of faas-idler and deploy the function with com.openfaas.scale.zero=true label
  2. Wait 30 mins and make sure the function pod is terminated by the faas-idler
  3. Invoke the function and watch the result

Context

This issue makes the scale-to-zero feature unusable, as the first request after scaling to zero will always fail.

Your Environment

CLI:
 commit:  ea687659ecf14931a29be46c4d2866899d36c282
 version: 0.11.8

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.18.10
 sha:     80b6976c106370a7081b2f8e9099a6ea9638e1f3
 commit:  Update Golang versions to 1.12


Provider
 name:          openfaas-operator
 orchestration: kubernetes
 version:       0.14.1 
 sha:           e747b6ace86bc54184d899fa10cf46dada331af1

Suggestion: Implement Go linting

Brief summary including motivation/context:

This is more of a meta-issue. I'm curious as to whether there's any interest in adding a linter to the project. I tested golangci-lint, and it runs in 1.685 seconds. I used the following config:

linters:
  enable:
    - golint
    - gosec
    - interfacer
    - unconvert
    - dupl
    - goconst
    - gocyclo
    - goimports
    - misspell
    - scopelint
    - gofmt

It showed the following output. Although most of these lint identifications aren't super important, some of them might catch places where the code could be simpler, or where the code might be ambiguous.

config/config_test.go:164:32: Using the variable on range scope `testCase` in function literal (scopelint)
			actual, err := New([]string{testCase.env})
			                            ^
config/config_test.go:170:18: Using the variable on range scope `testCase` in function literal (scopelint)
			if process != testCase.wantProcess {
			              ^
config/config_test.go:171:42: Using the variable on range scope `testCase` in function literal (scopelint)
				t.Errorf("Want process %v, got: %v", testCase.wantProcess, process)
				                                     ^
executor/afterburn_runner.go:96:10: Error return value of `w.Write` is not checked (errcheck)
		w.Write(bodyBytes)
		       ^
executor/http_runner.go:94:21: Error return value of `cmd.Process.Signal` is not checked (errcheck)
		cmd.Process.Signal(syscall.SIGTERM)
		                  ^
executor/http_runner.go:187:10: Error return value of `w.Write` is not checked (errcheck)
		w.Write(bodyBytes)
		       ^
executor/serializing_fork_runner.go:26:10: Error return value of `w.Write` is not checked (errcheck)
		w.Write([]byte(err.Error()))
		       ^
main.go:168:23: Error return value of `functionInvoker.Start` is not checked (errcheck)
	functionInvoker.Start()
	                     ^
main.go:298:23: Error return value of `functionInvoker.Start` is not checked (errcheck)
	functionInvoker.Start()
	                     ^
config/config.go:130:2: should use 'return <expr>' instead of 'if <expr> { return <bool> }; return <bool>' (gosimple)
	if env[key] == "true" {
	^
executor/http_runner.go:155:3: should use a simple channel send/receive instead of select with a single case (gosimple)
		select {
		^
main.go:31:26: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
		fmt.Fprintf(os.Stderr, configErr.Error())
		                       ^
main.go:108:5: should omit comparison to bool constant, can be simplified to !suppressLock (gosimple)
	if suppressLock == false {
	   ^
main.go:129:3: redundant break statement (gosimple)
		break
		^
main.go:132:3: redundant break statement (gosimple)
		break
		^
main.go:135:3: redundant break statement (gosimple)
		break
		^
main.go:258:2: should merge variable declaration with assignment on next line (gosimple)
	var envs []string
	^
main.go:335:55: should omit comparison to bool constant, can be simplified to !lockFilePresent() (gosimple)
			if atomic.LoadInt32(&acceptingConnections) == 0 || lockFilePresent() == false {

Any design changes

In the Dockerfile where we run the tests, we would have to install the linter, and run the linters.

Pros + Cons

Pros

  • It makes it easier to avoid "dumb" code issues
  • It simplifies the code

Cons

  • It takes time to run
  • Sometimes linters can be annoying

Effort required up front

It would require that we take the current repo, fix, or ignore, the linter issues, and then add a linter configuration, as well as configuration to invoke the linter at CI time.

Effort required for CI/CD, release, ongoing maintenance

Basically keeping up to date the linter, and the linter configuration.

Migration strategy / backwards-compatibility

See the effort required up front.

Parse integer, non-Duration timeout values

Expected Behaviour

The classic watchdog supports parsing integer values i.e. 60 and interpreting that as 60s whenever a non-golang duration is given. of-watchdog should do that too, but @bmcstdio mentioned he didn't see that.

Current Behaviour

Only supporting Golang duration.

https://github.com/openfaas-incubator/of-watchdog/blob/master/config/config.go#L118-L127

Possible Solution

Update the parsing code to use the following:

https://github.com/openfaas/faas/blob/master/watchdog/readconfig.go#L31

Steps to Reproduce (for bugs)

  1. Create a new function using of-watchdog, i.e. node10-express
  2. Set the timeout value without s i.e. 5
  3. Observe this being interpreted as something like 5ms when the user may expect it to be 5s
  4. Update the code, and test again.

Context

openfaas/golang-http-template#22 (comment)

[HTTP] Handle forked process exit

When the forked process exits, the watchdog could log the exit code and release the lock file to trigger a container restart.

Expected Behaviour

If the function process exits, the watchdog should log the exit code/stdio and release the lock file.

Current Behaviour

Currently the watchdog only logs that it couldn't read the stdio.

write_debug isn't used

From my experimenting with the of-watchdog and looking at the code, it seems the write_debug env variable is currently not used.

Expected Behaviour

Either write_debug is removed from the docs, or it is implemented as specified in the README.md

Current Behaviour

Possible Solution

Temporarily remove write_debug from the docs until it is implemented.

Steps to Reproduce (for bugs)

Context

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • Link to your project or a code example to reproduce issue:

Add Lambda Custom Runtime mode

Add Lambda Custom Runtime mode

Expected Behaviour

The of-watchdog has several modes including http, streaming and a classic mode which simulates the classic watchdog.

I carried out a PoC that showed we can support functions directly from AWS Lambda using the new custom runtime and published the first solution of its kind on GitHub. The code was just a PoC and needs some refactoring to be hardened.

https://github.com/alexellis/lambda-on-openfaas-poc

I was able to show that given a file-system created by the docker-lambda project we can take code written for a custom runtime on AWS Lambda and run it on OpenFaaS with Kubernetes or any of the other back-ends.

Rather than using channels on their own, for synchronization we should also use a SyncMap or a Map with a RWMutex as demonstrated in my inlets project: https://github.com/alexellis/inlets/compare/sync

I'm looking for a volunteer to add this mode to the of-watchdog and port across the example shown in the repo alexellis/lambda-on-openfaas-poc. This will then allow us to create a template and base Docker images to allow people to run their AWS Lambda Node/Python/Go etc projects on OpenFaaS.

Alex

Override to disable chunked encoding in HTTP mode

Expected Behaviour

We could offer an env-var override to disable chunked encoding in HTTP mode. The side-effect is that we would have to cache the request in memory before proxying it to the upstream_url - but this also adds greater compatibility with frameworks like PHP Swoole.

Current Behaviour

PHP Swoole for instance cannot support transfer-encoding of chunked.

Possible Solution

If an env-var is set then buffer the response to gauge the length and then forward on to the upstream_url.

Steps to Reproduce (for bugs)

  1. Try to do a POST to a function using this template -> https://github.com/alexellis/php7-swoole-template
  2. Then exec into the function and do a post with an explicit content-length set (not chunked) and it will work

Make mode=serializing be fully backwards compatible with the original watchdog

For of-watchdog to take the place of the original watchdog, it should be fully backwards compatible with the original watchdog.

Expected Behavior

Support environmental variables cgi_headers, marshal_request, and combine_output with their corresponding feature.

Current Behavior

The above features are not fully supported.

Possible Solution

Migrate over the tests and code from the original watchdog.

Context

Maintaining two versions of watchdog doubles the amount of work one needs to do. Once of-watchdog in serializing mode becomes backwards compatible, it can replace the original watchdog while the other modes can get worked on.

Add one-shot mode

Expected Behaviour

To enable the "batch job" use-case, users should be able to specify a "one shot" mode or parameter. This would allow unlimited requests to /healthz and /metrics, but only a single request to /, after which the binary process would shut down.

This is partially to work around limitations in Kubernetes jobs with daemons, web-servers and side-cars which keep the job in a "running" status.

kubernetes/kubernetes#25908
kubernetes/enhancements#753

Argo workflows does appear to work in "sidecar" mode without any additional changes to the watchdog, but I suspect building on Kubernetes Jobs would be cleaner from a dependencies point of view.

Example with figlet container:

https://twitter.com/alexellisuk/status/1148239010034311169

Example in Argo docs on sidecars:

https://github.com/argoproj/argo/blob/master/examples/README.md#sidecars

Information about which address the HTTP server listens on

Give information about where the HTTP server listens

Expected Behaviour

It would be nice to see which addr/port the HTTP server is listening on

Current Behaviour

You need to know that port 8080 is the default, and that it binds to 0.0.0.0.

Possible Solution

Give this information before binding the HTTP server, or when it is actually bound

[HTTP] Handle forked process timeout

When the forked process doesn't respond to http calls, the watchdog could release the lock file to trigger a container restart.

Expected Behaviour

If the function's HTTP server times out, the watchdog should log the timeout exception and release the lock file.
Maybe implement a retry mechanism and exit after several timeouts?

Current Behaviour

Currently the watchdog only logs the HTTP error without detecting a timeout.

Suggestion: Make default timeouts clearer

Hey all,

In the documentation, read_timeout, write_timeout, and exec_timeout are explained, but their defaults are not mentioned. This can be a huge headache while troubleshooting, as the error raised by a timeout doesn't always specify that a timeout is to blame.

Expected Behaviour

Mention that the default timeouts are "10s"

Current Behaviour

Pretty self-explanatory.

Context

In my case, I had a faas function that took ~10s to execute. I had set the exec_timeout but not the read_timeout or write_timeout. My function was attempting to return, but the caller was receiving an unexplained 502 error (honestly still not sure why I was getting a 502 instead of a 408, that one might be on faas, not watchdog). It took a while to realize that a timeout might be to blame.

Better Support for Structured Logging

Expected Behaviour

Structured Logging messages should not be split between lines. I want my message to appear:

stdout: {"level":"info", "msg":"No module named my-module", "pipe":"stderr", "time":"2019-08-08T01:47:55Z", "context": "this is a contrived long message of more than 256 bytes", "invoker": "cconger", "extra bits": ["there", "are", "so", "many", "things", "on", "this", "error"]}

Current Behaviour

Currently the behavior is to slurp 256 bytes per log line. This can cause long messages like the one above to be split:

stdout: {"level":"info", "msg":"No module named my-module", "pipe":"stderr","time":"2019-08-08T01:47:55Z",  "context": "this is a contrived long message of more than 256 bytes", "invoker": "cconger", "extra bits": ["there", "are", "so", "many", "things", "on
stdout: ", "this", "error"]}

Possible Solution

Use the token Scanner for newlines from bufio instead of a fixed buffer size.

Context

My functions use a structured logging library for rich structured logging, however due to the log lines being split by the watchdog it becomes tricky to parse them properly when they exceed 256 bytes.

Support streaming multipart/formdata with of-watchdog

Expected Behaviour

It should be possible to post a multipart request to the watchdog and work with the parts in such a way that Content-Disposition header information (such as the filename) is available to the function.

If there is only one part, feed it to stdin.

If possible, allow to access multiple parts by their names as stream, too.

Make content-disposition information available to the function similar to the usual request headers.
(The Content-Disposition header is defined as a response header only, but it may occur in multipart/formdata for requests).

Current Behaviour

There is no special support for multipart requests.

Possible Solution

Context

The related issue openfaas/faas#344 asked for multipart support by means of a JSON object containing all parts base64-encoded; that might not be ideal in terms of memory requirements for big multipart requests.
In openfaas/faas#345 it was concluded that support for multipart should go into of-watchdog.

Original HTTP Host header is not passed to the handler.

Expected Behaviour

Host header should be forwarded to the upstream function.

Current Behaviour

Since Host header is not a part of request.Header field it is not copied by the copyHeaders function.

Possible Solution

Do request.Host = r.Host before copyHeaders(request.Header, &r.Header) in http_runner.go

Steps to Reproduce (for bugs)

Context

We need to construct URI links that depends on the Host header. This information is lost when function is called. Also seems that the same modification need to be done for the gateway as well.

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):
    17.12.1-ce

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    FaaS-netes

  • Operating System and version (e.g. Linux, Windows, MacOS):
    CentOS 7

  • Link to your project or a code example to reproduce issue:

[Feature Request] Static Mode With Catch-all Capability

This issue is a feature request asking to add a catch-all capability to the static mode.
The static server should redirect every possible url pointed to the function to the root path.
Example: http://127.0.0.1:8080/function/hello/subpath -> http://127.0.0.1:8080/function/hello

In Nginx, this is accomplished with the try_files directive.
https://docs.nginx.com/nginx/admin-guide/web-server/serving-static-content/#trying-several-options

Current Behaviour

Currently, requesting any function subpath, i.e. http://127.0.0.1:8080/function/hello/subpath, will return a 404 error.

Context

This capability is mandatory in the context of Single Page Applications, such as React.js apps.
Using hash history (adding # to the URL) brings other issues and is not a valid solution for professional modern web applications.
A Single Page Application technically has only a single index.html, and the server should redirect every URL under the domain to that index.html at the root.

In Nginx, we implement a catch-all rule for SPA like so:

location / {
    try_files  $uri  $uri/  /index.html;
}

Link to your project or a code example to reproduce issue

Minimal React.js Single Page Application packaged for OpenFaaS with the dockerfile template and the watchdog in static mode: https://github.com/Janaka-Steph/react-spa-openfaas

Add RED metrics


As an operator I want to enable HPAv2 auto-scaling with custom Pod metrics on Kubernetes.

Task / testing

Syncing the patch from:

openfaas/faas#1151

Publish Docker multi-arch images

Expected Behaviour

We should publish a Docker image for each binary we produce and then create a multi-arch manifest to collect them all together.

This gives a speed boost, but also makes the Dockerfiles easier to manage and update.

Current Behaviour

Download via curl in each Dockerfile

Possible Solution

See how this was done in the openfaas/faas project by myself and @rgee0 and how it was subsequently applied to the templates in openfaas/templates.

Update CI:

  • Enable experimental Docker CLI
  • Create a Docker image for each binary from scratch
  • Push those images
  • Create the multi-arch (manifest) image and push it

Update of-watchdog templates

  • Send PRs to change from getting of-watchdog from curl to from the new base Docker image.

empty log line

bindLoggingPipe prints an empty line (even without a timestamp) when printing stdout/stderr from the function

Expected Behaviour

I would expect empty lines to be ignored or properly timestamped to keep the line structure (for log collectors)

Current Behaviour

With this in a function handler:

fmt.Println("hello")
fmt.Print(" there")
fmt.Println(", world")

the function container output is

Forking - ./handler []
2019/09/21 16:10:53 OperationalMode: http
2019/09/21 16:10:53 Started logging stderr from function.
2019/09/21 16:10:53 Started logging stdout from function.
2019/09/21 16:10:53 Writing lock-file to: /tmp/.lock
2019/09/21 16:10:53 Metrics server. Port: 8081
2019/09/21 16:12:01 stdout: hello

2019/09/21 16:12:01 stdout:  there
2019/09/21 16:12:01 stdout: , world

(2 blank lines due to \r\n of the Println)

Possible Solution

Right-strip scanner.Text()?
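The right-strip suggestion could be sketched like this; `trimLogLine` is an illustrative helper, not the watchdog's code:

```go
package main

import (
	"fmt"
	"strings"
)

// trimLogLine strips trailing CR/LF and reports whether anything printable
// remains, so a bare newline never becomes an empty, untimestamped log line.
func trimLogLine(s string) (string, bool) {
	s = strings.TrimRight(s, "\r\n")
	return s, len(s) > 0
}

func main() {
	for _, raw := range []string{"hello\n", " there", ", world\r\n", "\r\n"} {
		if line, ok := trimLogLine(raw); ok {
			fmt.Printf("stdout: %s\n", line)
		}
		// the bare "\r\n" is skipped instead of printing a blank line
	}
}
```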

Steps to Reproduce (for bugs)

Provided in Current behaviour

Provide X-Duration-Seconds in HTTP response

Expected Behaviour

Similar behaviour to the classic watchdog:

HTTP/1.1 200 OK
Server: nginx/1.13.12
Date: Wed, 07 Nov 2018 14:37:08 GMT
Content-Type: text/html
Content-Length: 176
Connection: keep-alive
X-Call-Id: 6afdc9b1-ac15-47fa-ba83-3a546ab9193d
X-Duration-Seconds: 0.047277
X-Start-Time: 1541601428214993189
Strict-Transport-Security: max-age=15724800; includeSubDomains

Current Behaviour

HTTP/2 200 
server: nginx/1.13.12
date: Wed, 07 Nov 2018 14:36:22 GMT
content-type: application/cloudevents+json
content-length: 267
x-call-id: 0e8128ca-d51f-4aa1-b2f2-2fdc1c04f615
x-start-time: 1541601382939613945
strict-transport-security: max-age=15724800; includeSubDomains

Context

Consistent experience across classic and of-watchdog. Greater compatibility for users once they migrate to of-watchdog

Enhancement proposal: process controlled HTTP response code and headers

Brief summary including motivation/context

This change is to the streaming mode of of-watchdog, adding a named pipe through which the running subprogram can send control messages back to of-watchdog to specify the HTTP response code and headers. This design will be done in such a way as to not impact the behavior of any called functions which do not make use of the pipe provided. The motivation behind this change is that I need the functions I call to be able to return specific response codes other than 200.

Any design changes

The of-watchdog process will create a new named pipe when a function is called, then set the environment variable CONTROL_PIPE before calling the subprocess. The of-watchdog will then listen on the control pipe file; if it receives a JSON blob of the following format, it will use it to set the response code and headers before writing the output of the subcommand. If no JSON blob is received before the subprocess begins to write to its standard output, an HTTP response code of 200 will be sent with the first message.

Pros + Cons

Pros: Functions will now be able to return custom response codes and headers per request.
Cons: None.

Effort required up front

A few days of coding time.

Effort required for CI/CD, release, ongoing maintenance

Minimal, if any.

Migration strategy / backwards-compatibility

This change will be 100% backwards compatible with the existing program.

Mock-up screenshots or examples of how the CLI would work

N/A

fprocess cuts out command options which contain '='

A command set via fprocess which contains a '=' character is cut off after the '='.

Expected Behaviour

When setting fprocess="node --max_old_space_size=4096 index.js" it should execute the index.js script with the given --max_old_space_size option set to whatever value is given to it (4096 in this example).

Current Behaviour

When setting fprocess="node --max_old_space_size=4096 index.js" it fails to execute the command and gives the following output when sending a request to the function:

$ fprocess="node --max_old_space_size=4096 index.js" ./of-watchdog
2018/12/28 15:29:40 OperationalMode: streaming
2018/12/28 15:29:40 Writing lock-file to: /tmp/.lock
2018/12/28 15:29:45 Running node
2018/12/28 15:29:45 Started logging stderr from function.
2018/12/28 15:29:45 stderr: Error: missing value for flag --max_old_space_size of type int
Try --help for options
node: bad option: --max_old_space_size

2018/12/28 15:29:45 Error reading stderr: read |0: file already closed
2018/12/28 15:29:45 Took 0.008821 secs
2018/12/28 15:29:45 exit status 9

Cause of the issue

The issue is caused by the mapEnv function in config.go which is called with a slice of strings of the environment variables in key=value form.
It splits the fprocess environment variable on the '=' separator to map fprocess as a key to the command as a value.
When the command itself contains a '=' sign, the string is split into an unknown number of substrings, but the mapped value (the command run by of-watchdog) is based only on the first substring.
This results in fprocess=node --max_old_space_size=4096 index.js being mapped as mapped["fprocess"]="node --max_old_space_size", which causes the error.

Possible Solution

Possible fixes for the mapEnv function in https://github.com/openfaas-incubator/of-watchdog/blob/85505a7210cf413e455f8a03d74ba94d9a9fcd30/config/config.go#L89-L101

1. strings.Join in line 97:

mapped[parts[0]] = strings.Join(parts[1:], "=")

2. strings.SplitN in line 93:

parts := strings.SplitN(val, "=", 2)

Both would correctly result in the mapping mapped["fprocess"]="node --max_old_space_size=4096 index.js".
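Solution 2 can be sketched as a self-contained mapEnv, simplified from the function in config.go:

```go
package main

import (
	"fmt"
	"strings"
)

// mapEnv parses key=value environment entries. SplitN with a limit of 2
// keeps every '=' after the first one inside the value.
func mapEnv(env []string) map[string]string {
	mapped := map[string]string{}
	for _, val := range env {
		parts := strings.SplitN(val, "=", 2)
		if len(parts) < 2 {
			continue // ignore entries with no '=' at all
		}
		mapped[parts[0]] = parts[1]
	}
	return mapped
}

func main() {
	env := []string{`fprocess=node --max_old_space_size=4096 index.js`}
	fmt.Println(mapEnv(env)["fprocess"])
	// The value survives intact: node --max_old_space_size=4096 index.js
}
```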

3. Setting an environment variable for the function (temporary workaround):

It's possible to set environment variables in the function's YAML file as described here: https://github.com/openfaas-incubator/node10-express-template
For example node can set NODE_OPTIONS="--max-old-space-size=4096" to use the option, though this may not be suitable for all use cases.

Benchmarks

Benchmark for solution 1 (strings.Join):

func BenchmarkJoin(b *testing.B) {
	val := "fprocess=node --max_old_space_size=4096 index.js"
	parts := strings.Split(val, "=")
	for n := 0; n < b.N; n++ {
		strings.Join(parts[1:], "=")
	}
}

Result:

Running tool: C:\Go\bin\go.exe test -benchmem -run=^$ benchmarks -bench ^(BenchmarkJoin)$

goos: windows
goarch: amd64
pkg: benchmarks
BenchmarkJoin-8   	30000000	        45.8 ns/op	      48 B/op	       1 allocs/op
PASS
ok  	benchmarks	1.557s
Success: Benchmarks passed.

Benchmark for solution 2 (strings.SplitN):

func BenchmarkSplitn(b *testing.B) {
	val := "fprocess=node --max_old_space_size=4096 index.js"
	for n := 0; n < b.N; n++ {
		strings.SplitN(val, "=", 2)
	}
}

Result:

Running tool: C:\Go\bin\go.exe test -benchmem -run=^$ benchmarks -bench ^(BenchmarkSplitn)$

goos: windows
goarch: amd64
pkg: benchmarks
BenchmarkSplitn-8   	20000000	        63.0 ns/op	      32 B/op	       1 allocs/op
PASS
ok  	benchmarks	1.495s
Success: Benchmarks passed.

Overall the use of strings.SplitN (solution 2) to solve this issue seems better.

Steps to Reproduce (for bugs)

  1. git clone https://github.com/openfaas-incubator/of-watchdog.git
  2. go get -u github.com/openfaas-incubator/of-watchdog/config
  3. go get -u github.com/openfaas-incubator/of-watchdog/executor
  4. go build
  5. fprocess="node --max_old_space_size=4096 index.js" ./of-watchdog

Context

I've deployed a Node.js function based on https://github.com/openfaas-incubator/node10-express-template which required an increase of the maximum memory allocation to run correctly.
Setting the --max_old_space_size=4096 option triggered this issue and forced me to use the temporary workaround (solution 3) of setting it as an environment variable for the function in the YAML file.

Your Environment

  • docker version 18.06

  • I'm using Docker Swarm (FaaS-netes)

  • Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 16.04, Windows 10 Build 17134

What is of-watchdog used for?

I've been reading through the documentation of OpenFaaS, watchdog and of-watchdog, and I was wondering something. I know that watchdog is a lightweight HTTP server that is used to route requests to the function it was deployed with, but I'm wondering why you need it exactly? Can't the function just expose an HTTP server of its own and be deployed by itself?

Please correct me if I'm wrong, but Knative, another K8S serverless framework, doesn't seem to provide any solution similar to watchdog. Why not?

Don't take my question the wrong way, I am just interested in understanding the idea behind this project.

fprocess started as a daemon doesn't work

I built an image based on openresty (build script at https://github.com/feifeiiiiiiiiii/faas_template), and a fatal error occurs.

Expected Behaviour

I think of-watchdog should support running fprocess as a daemon process.

Current Behaviour

Forking - openresty []
2018/09/12 19:53:51 Started logging stdout from function.
2018/09/12 19:53:51 Started logging stderr from function.
2018/09/12 19:53:51 OperationalMode: http
2018/09/12 19:53:51 Writing lock file at: /var/folders/90/txbmh4fj7qb8p9_7kpzwnx4s43bnfk/T/.lock
2018/09/12 19:53:51 Error reading stdout: EOF

Possible Solution

I think the watchdog should inspect the error: if it is EOF, it should be ignored.

Steps to Reproduce (for bugs)

  1. try https://github.com/feifeiiiiiiiiiii/faas_template to build openresty image or use docker pull feifeiiiiiiiiiii/openresty-openfaas
  2. use openfaas ui to create function
  3. use kubectl logs you_pods --namespace=openfaas-fn

Context

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ): 18.05.0-ce

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)? Kubernetes

  • Operating System and version (e.g. Linux, Windows, MacOS): MacOS

  • Link to your project or a code example to reproduce issue:
    https://github.com/feifeiiiiiiiiiii/faas_template

Healthcheck for `FROM scratch` images.

Currently, we are able to run fwatchdog in a FROM scratch image, but we cannot run the default healthcheck, since there is no `[` (test) binary to check for a file's existence, as far as I can tell.

Possible Solution

We could add a subcommand to fwatchdog which performs the health check itself, instead of relying on a `[` (test) binary being present, i.e.

HEALTHCHECK --interval=2s CMD ["/fwatchdog", "healthcheck"]

HTTP mode - Pass QueryString along to upstream_url

Expected Behaviour

QueryString should be available in the function when using HTTP mode

Current Behaviour

The QueryString is not forwarded or consumed by templates

Possible Solution

  • Pass along to upstream_url.
  • Update templates to read/pass the value
diff --git a/executor/http_runner.go b/executor/http_runner.go
index c73d5cf..fa15cf4 100644
--- a/executor/http_runner.go
+++ b/executor/http_runner.go
@@ -96,7 +96,13 @@ func (f *HTTPFunctionRunner) Start() error {
 // Run a function with a long-running process with a HTTP protocol for communication
 func (f *HTTPFunctionRunner) Run(req FunctionRequest, contentLength int64, r *http.Request, w http.ResponseWriter) error {
 
-	request, _ := http.NewRequest(r.Method, f.UpstreamURL.String(), r.Body)
+	upstreamURL := f.UpstreamURL.String()
+
+	if len(r.URL.RawQuery) > 0 {
+		upstreamURL += "?" + r.URL.RawQuery
+	}
+
+	request, _ := http.NewRequest(r.Method, upstreamURL, r.Body)
 	for h := range r.Header {
 		request.Header.Set(h, r.Header.Get(h))
 	}

Support question on unlimited timeouts

Dear maintainers,

README says that execution timeout will be disabled if exec_timeout is set to 0.

Exec timeout for process exec'd for each incoming request (in seconds). Disabled if set to 0.

However, when I set exec_timeout to 0, my function is terminated as soon as it starts.

Expected Behaviour

Function never terminates while it is running.

Current Behaviour

Function terminates as soon as it starts.

docker swarm log says:

$ docker service logs -f myfunction
Upstream HTTP request error: Post http://127.0.0.1:5000/: context deadline exceeded
Upstream HTTP killed due to exec_timeout: 0s

Possible Solution

Maybe this line causes immediate timeout.
https://github.com/openfaas-incubator/of-watchdog/blob/bae373954932a07d89ab926457d237f03f6c60dc/executor/http_runner.go#L127

You should set the timeout only when ExecTimeout is a non-zero value.

ctx := context.Background()
if f.ExecTimeout != 0 {
	var cancel context.CancelFunc
	ctx, cancel = context.WithTimeout(ctx, f.ExecTimeout)
	defer cancel()
}

Steps to Reproduce (for bugs)

  1. fetch this template https://github.com/openfaas-incubator/python-flask-template/tree/master/template/python3-flask
  2. add ENV exec_timeout="0s" to Dockerfile
  3. create new function from the template
  4. build, push, deploy and invoke the function

Your Environment

Docker version 18.09.1, build 4c52b90
Docker swarm
Linux (vagrant, vm.box = "bento/centos-7.4")

Thank you.

Websockets and HTTP mode

Is it possible to use of-watchdog in HTTP mode to support websockets?

Your Environment

  • Docker 18.03.1
  • Docker Swarm (locally) and Kubernetes (on Azure, in the future)
  • Windows and MacOS locally, Linux on the server.

of-watchdog can't be auto scaled?

I created a function using the go-http template and set max_inflight: 5.
With 80 concurrent requests, there was still only one pod instance on k8s.

Upstream HTTP request error

I am using OpenFaaS for a long running process. (Currently 3 minute jobs but plan to scale this up to 2 hour jobs).

I am using async-function to queue these jobs with a callback url.
I am also using the golang-http template.

2019/02/20 01:04:16 stderr: 2019/02/20 01:04:16 Starting job
2019/02/20 01:06:55 Upstream HTTP request error: Post http://127.0.0.1:8081/: EOF

It is not clear to me why this is set to 127.0.0.1:8081, as no Docker containers are running on this port. I assume that this is the port the of-watchdog Go process uses within the container, hence I am opening the issue here.

The callback url is receiving an empty POST with 503 as the function status code.
This does not occur when the handler returns a response earlier (e.g. if there is a 4xx error before processing).

Expected Behaviour

Callback URL receives the data the handler returns.

Current Behaviour

It appears the Upstream HTTP request causes an error, which then causes the POST to the callback url to be empty.

Steps to Reproduce (for bugs)

  1. Start with a golang-http template.
  2. Long running process in Handler
  3. Set environment: exec_timeout: 86400s in the stack.yml file (0s doesn't work)
  4. Invoke function with a callback Url
  5. tail the function logs and observe the POST error
  6. Observe empty POST on callback server

Context

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:47 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Docker Swarm

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Ubuntu 18.04

Add Travis / Go & OpenFaaS GitHub badges to README.md

Expected Behaviour

We should have Travis / Go & OpenFaaS GitHub badges in our README.md file just like openfaas/faas openfaas/faas-netes openfaas/faas-cli and etc.

Current Behaviour

Not shown, manually have to discover the build if it's passing/failing or a PR etc.

Possible Solution

See the repos mentioned above and retrofit for this project.

JVM never receives SIGTERM (shutdownhook never called)

Expected Behaviour

I would expect the JVM to receive SIGTERM and then terminate.
I am running a HTTP4S scala webserver in this project:
https://github.com/hejfelix/fp-exercises-and-grading/blob/master/http4s_faas/openfaas/Dockerfile

Current Behaviour

JVM never shuts down before docker container is killed

Possible Solution

Not sure, it seems like watchdog is not forwarding the shutdown hook?

Steps to Reproduce (for bugs)

Add a shutdown hook to any function running in http mode

Context

I want to be able to clean up resources, e.g. database connections, unfinished operations, etc.

Your Environment

Running on docker swarm on my macbook pro

Client: Docker Engine - Community
 Version:           18.09.0-ce-beta1
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        78a6bdb
 Built:             Thu Sep  6 22:41:53 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0-ce-beta1
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       78a6bdb
  Built:            Thu Sep  6 22:49:35 2018
  OS/Arch:          linux/amd64
  Experimental:     true

Serializing mode should capture stderr (optionally)

To match the classic watchdog 1-for-1 the serializing mode should capture stderr (optionally)

When combine_output is set to false, stderr goes to the container logs; when true, it comes back in the function response.

Example for testing: port=8081 mode=serializing fprocess="stat x" ./of-watchdog

When calling localhost:8081 you should see the output of stderr either in the container logs or in the response - currently we see it in neither place.

HTTP Executor has unnecessary channel select in the error handling

When reviewing a support issue related to the Gateway, I ended up reviewing the HTTP executor to find any context timeouts. It does have a timeout during the proxy to the function implementation, which is fine, but the processing of the error case from the HTTP client does not need the select statement it currently has.

Expected Behaviour

The client error check can be simplified: if err is already non-nil, then either the context has already failed or some other error has occurred, and we do not need to wait for the timeout. It should be equivalent to use this

if err != nil {

    log.Printf("Upstream HTTP request error: %s\n", err.Error())
    if reqCtx.Err() == context.DeadlineExceeded {
        log.Printf("Upstream HTTP killed due to exec_timeout: %s\n", f.ExecTimeout)
        w.Header().Set("X-Duration-Seconds", fmt.Sprintf("%f", time.Since(startedTime).Seconds()))

        w.WriteHeader(http.StatusGatewayTimeout)
        return nil
    }

    // Error unrelated to context / deadline
    w.Header().Set("X-Duration-Seconds", fmt.Sprintf("%f", time.Since(startedTime).Seconds()))

    w.WriteHeader(http.StatusInternalServerError)

    return nil
}

This is, in fact, more explicit and accurate, because it only checks for timeouts and ignores cancels, so the HTTP response code is more accurate. According to the context docs, the context error can be DeadlineExceeded or Canceled, per https://golang.org/pkg/context/#pkg-variables. We either want to handle Canceled separately or treat it as a generic error.

Current Behaviour

When the client errors, we potentially wait for the context timeout, like this

if err != nil {
    log.Printf("Upstream HTTP request error: %s\n", err.Error())


    // Error unrelated to context / deadline
    if reqCtx.Err() == nil {
        w.Header().Set("X-Duration-Seconds", fmt.Sprintf("%f", time.Since(startedTime).Seconds()))


        w.WriteHeader(http.StatusInternalServerError)


        return nil
    }


    select {
    case <-reqCtx.Done():
        {
            if reqCtx.Err() != nil {
                // Error due to timeout / deadline
                log.Printf("Upstream HTTP killed due to exec_timeout: %s\n", f.ExecTimeout)
                w.Header().Set("X-Duration-Seconds", fmt.Sprintf("%f", time.Since(startedTime).Seconds()))


                w.WriteHeader(http.StatusGatewayTimeout)
                return nil
            }


        }
    }

is there a way to configure redirects?

When in HTTP mode, redirects seem to be proxied, not actually redirected

Expected Behaviour

When I

  1. create a server in http mode, and
  2. send a redirect, for example, res.redirect('http://google.com')

I expect when I visit my cloud function in the browser to redirect to google.com

Current Behaviour

  1. I visit the cloud function in the browser
  2. the URL doesn't change (e.g. http://127.0.0.1:3112/function/myfunction)
  3. the content rendered looks like the content of google.com (but the URL bar doesn't show google)

From my intuition, it looks like a proxy is happening here.

Possible Solution

Is there a way to add a flag named something like redirects or allow_redirects?

Steps to Reproduce (for bugs)

  1. Create a server using http mode.
  2. Create a redirect response in a handler.
  3. Visit the cloud function in a browser.

Context

Would be ideal to use cloud functions for things like OAuth or other web based services

Your Environment

0.4.0/of-watchdog

Serializing, streaming mode should return 500 when exit code is non-zero


Unlike the classic watchdog it is returning 200 despite a non-zero exit code.

Example:

port=8081 mode=serializing fprocess="stat x" ./of-watchdog

curl localhost:8081 -i
HTTP/1.1 200 OK

stat x should clearly return a non-zero exit-code and a message to stderr.

Chunked response from function is blocked

Dear maintainers,

I wrote my template that is similar to https://github.com/openfaas-incubator/python-flask-template/tree/master/template/python3-flask.

The difference from the original is that my template uses a streaming (chunked) response, as follows.

@app.route("/", defaults={"path": ""}, methods=["POST", "GET"])
@app.route("/<path:path>", methods=["POST", "GET"])
def main_route(path):
    ...
    def gen():
        yield "1"
        time.sleep(1)
        yield "2"
        time.sleep(1)
        yield "3"

    return Response(gen())

When I invoke a function from the template, the function does not return the response chunk by chunk.
It blocks for 3 seconds, and then returns the whole response (1, 2 and 3) at once.

Expected Behaviour

Function returns response chunk by chunk.

Current Behaviour

Function blocks for 3 seconds, and then returns the whole response at once.

Possible Solution

The following code blocks until the whole response has been returned.

https://github.com/openfaas-incubator/of-watchdog/blob/bae373954932a07d89ab926457d237f03f6c60dc/executor/http_runner.go#L173

The following code in openfaas/faas may also block:

https://github.com/openfaas/faas/blob/04d240aac26acdaf49863dbab768b4f5cb311d84/gateway/handlers/forwarding_proxy.go#L130

Steps to Reproduce (for bugs)

  1. fetch this template https://github.com/openfaas-incubator/python-flask-template/tree/master/template/python3-flask
  2. edit the template as described above
  3. create new function from the template
  4. build, push, deploy and invoke the function

Your Environment

Docker version 18.09.1, build 4c52b90
Docker swarm
Linux (vagrant, vm.box = "bento/centos-7.4")

Thank you.

Proposal / Question: Concurrency Limiting

Expected Behaviour

It would be nice to be able to limit the number of concurrent requests in flight. For example, if I have a workload where the amount of memory per request uses up to 2GB (let's say we disable GC), and I have 9 GB of memory, I can safely keep 4 requests in parallel. If I accidentally invoke the 5th request, my container will explode, killing everything.

Current Behaviour

The watchdog will continue to accept requests, and pass them on. It's up to the workload to figure out how to handle this.

Possible Solution

Add a concurrency limit to the number of in-flight requests. This could just be middleware added in buildRequestHandler. If more than N requests are in flight, it would just return an HTTP 429.

One thing we may want to consider is to make it so that rather than returning a 429 immediately, we queue. I believe that if we do this, it's best we pull in https://godoc.org/golang.org/x/sync/semaphore, but there are tradeoffs here.

Watchdog should be passthrough for function process for logging

Expected Behavior

In the cases where invocation is not utilizing the stdout/err pipes for communication with the watchdog the watchdog should not modify or impose structure on the logs passing through from these event streams. This allows me as an application author to manage how my logs are presented in order to be processed downstream, for example if I wanted to use structured logging such as with JSON or CSV. Ideally the Watchdog should not mangle the output of my application except to make it available to the platform log driver.

There are some logs that need to be emitted from the watchdog itself:
• Errors parsing config
• Forked process termination
• Issues reading from the pipe files
• Errors communicating with metrics services

These of-watchdog logs should be easy to optionally skip or select on by log parsers/shippers, so that I can have an unobscured event trail of logs as emitted by my application.

Current Behavior

There are several undesirable behaviors that exist in the current implementation that highlight the mangling that is currently being done by the Watchdog.

Lines longer than 256 bytes are split

There is a hard limit on the number of bytes that can be in a single line from the wrapped function process. For instance, a message of 300 bytes emitted by the underlying function would be split into a line of 256 bytes with the remaining 44 bytes on the line below. Even worse, it can be split across output from other goroutines (such as the watchdog itself).

Example:

2019/08/09 00:01:42 stderr: 2019/08/09 00:01:42 {"abexi":"yxgdndacmbhgwhadjnba","aclra":"dbgmdiytcwloabveajbe","aczai":"nlqlibaatlyhdaopovfo","aereu":"nuzjqxmzotarlutmygms","afgie":"hlgizmhgzptxtfrgkaqq","ahugb":"qxifgcyvgcazaefizhgw","ajxwk":"afikzruuywsuwkobbuor","akgza":"altlhtuzh
2019/08/09 00:01:42 GET / - 200 OK - ContentLength: 2
2019/08/09 00:01:42 stderr: oidz","zxbqo":"tukdfklasvyafstrdpoj","zynxg":"bfrwcxacobgabksdrjdi","zzawj":"lvncjvkyrwrlixxqcahq","zzhms":"ouyykvnikbudiryewvos","zzttv":"wvtuerakfsxlplgaftry"}

Stderr and Stdout are redirected to Stderr for HttpRunner

Both files are piped into the same output file, Stderr, with a prefix. This is somewhat surprising: for the HttpRunner both files are written to, yet Stdout contains no output from my wrapped application. The prefix itself isn't bad, since it lets me easily determine whether a log line came from the wrapped process or from of-watchdog itself. Combined with the hard 256-byte limit splitting lines up, however, a structured log line becomes extremely difficult to parse consistently.

Example:

stdout: My function's original log message to stdout
stderr: My function's original log message to stderr

Null bytes and extra new lines in output.

The current implementation calls Println with the full 256-byte buffer. This writes out every byte in the buffer, producing doubled new lines (one copied from the original output plus the one Println appends) and trailing null bytes. If you're using a viewer that doesn't strip null bytes (such as a tool that writes to a file), those null bytes clutter the output.

Example log file:

2019/08/26 23:11:21 stderr: Starting application server
������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

Possible Solution

I originally proposed a solution using a line-based tokenizer from bufio to align the mirrored logs with those of the wrapped function application. I believe this addresses the problems best: log lines from the underlying wrapped application are preserved and re-emitted, with the watchdog's common prefixing, exactly as they were originally written.

Another possible solution is to make the read buffer size a tunable parameter of the of-watchdog, so it can be set larger than the expected output. Applications doing verbose structured logging could then configure a much larger buffer, while terse applications could stay near the current default.

Steps to Reproduce (for bugs)

I tested this using a contrived HTTP-mode Go function named chatty-fn that logs each request it receives along with plenty of extra information. I built the chatty-fn Docker image wrapping it with different watchdogs, booted the container directly, and issued network requests to it.

# Within chatty-fn
docker build -t chatty-fn:latest .
docker run -p 8080:8080 chatty-fn:latest > stdout.log 2> stderr.log
curl -X POST localhost:8080 -H 'Content-Type: application/json' --data '{"hello": "world"}'

Context

I have been attempting to add structured logging to an application running in a container, using a tool like Filebeat to parse, ship, and extract those structured logs into a searchable system. The small buffer size currently implemented makes it difficult to handle JSON log lines that extend beyond 256 bytes.

Your Environment

  • Docker version (e.g. Docker 17.0.05):
    Docker 19.03.1

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    N/A

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Linux and MacOS

  • Link to your project or a code example to reproduce issue:
    https://github.com/cconger/chatty-fn
