Code Monkey home page Code Monkey logo

nats-server's Introduction

NATS Logo

NATS is a simple, secure and performant communications system for digital systems, services and devices. NATS is part of the Cloud Native Computing Foundation (CNCF). NATS has over 40 client language implementations, and its server can run on-premise, in the cloud, at the edge, and even on a Raspberry Pi. NATS can secure and simplify design and operation of modern distributed systems.

License Build Release Slack Coverage Docker Downloads CII Best Practices Artifact Hub

Documentation

Contact

  • Twitter: Follow us on Twitter!
  • Google Groups: Where you can ask questions
  • Slack: Click here to join. You can ask question to our maintainers and to the rich and active community.

Contributing

If you are interested in contributing to NATS, read about our...

Roadmap

The NATS product roadmap can be found here.

Adopters

Who uses NATS? See our list of users on https://nats.io.

Security

Security Audit

A third party security audit was performed by Cure53, you can see the full report here.

Reporting Security Vulnerabilities

If you've found a vulnerability or a potential vulnerability in the NATS server, please let us know at nats-security.

License

Unless otherwise noted, the NATS source files are distributed under the Apache Version 2.0 license found in the LICENSE file.

nats-server's People

Contributors

adamkorcz avatar andyxning avatar aricart avatar bruth avatar cdevienne avatar colinsullivan1 avatar dependabot[bot] avatar derekcollison avatar devfacet avatar gcolliso avatar jarema avatar jnmoyne avatar kozlovic avatar krobertson avatar levb avatar matthiashanel avatar mauricevanveen avatar mcuadros avatar mprimi avatar neilalexander avatar nsurfer avatar philpennock avatar reubenmathew avatar ripienaar avatar scottf avatar svenfoo avatar tbeets avatar tylertreat avatar variadico avatar wallyqs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nats-server's Issues

Extract config parser?

Have you considered extracting the config package into it's own project. The format of the files appears to be really good.

Docker go get will need an additional flag for 1.4x+

If you are planning to upgrade the docker build script to utilize 1.4x golang compiles, you will most likely need to add an additional flag to the go get command:

from

RUN CGO_ENABLED=0 go get -a -ldflags '-s' github.com/apcera/gnatsd

to

RUN CGO_ENABLED=0 go get -a -ldflags '-s' --installsuffix cgo github.com/apcera/gnatsd

Although I didn't test this with your code, I did use your build scripts for my own model. I found issues with both FROM scratch and FROM busybox images. Binaries would not launch and docker would report back "file or directory" missing etc.

This is noted as a workaround:

golang/go#9344

See context in golang build:

http://golang.org/pkg/go/build/#Context

and is implemented in this script:

https://github.com/CenturyLinkLabs/golang-builder

which implements everything as published in this article:

http://blog.xebia.com/2014/07/04/create-the-smallest-possible-docker-container/

Client Types - Language

Be good if clients starting sending type information as well as names (e.g. Go, Node, Java, Ruby)

gnatsd HTTP monitoring has some issues

First, gnatsd takes a parameter to use as the HTTP monitoring port, however it doesn't actually use the passed parameter. The port is hardcoded to 6062.

Second, gnatsd always starts the HTTP server, but only registers the handlers for the server if the monitoring port is specified. The goroutine to launch the HTTP monitoring should be moved within the if statement to see if a monitoring port was specified.

Sublist multithreaded performance

I noticed that the Sublist tree which manages subscriptions uses a mutex for thread-safety, so every lookup and insert locks the tree. While I haven't looked through the gnatsd code extensively to see what the workload might normally look like, I imagine there is a lot of lock contention on the tree.

I recently implemented a similar, lock-free mechanism for managing subscriptions with wildcards so, out of curiosity, I benchmarked the two to see how they compare. Again, I'm not sure how the workloads compare in practice, but for my purposes, it's fairly parallel.

Here are the results of the benchmarks I ran. The first is a 1:1 insert-to-lookup workload. The second is a 1:3 insert-to-lookup workload. Each insertion goroutine is inserting 1000 items. Each lookup goroutine is looking up 1000 items. The timings correspond to the complete insert and lookup workload. GOMAXPROCS was set to 8; 2.6 GHz Intel Core i7.

benchmark_1_1

goroutines sublist (ms) matchbox (ms)
1 0.140295 0.375199
2 34.164212 2.427285
3 35.300207 3.018856
4 67.377877 3.166839
5 71.946041 3.908559
6 104.820086 4.147633
7 102.255676 4.930233
8 141.170446 4.895257
12 206.446077 7.528176
16 270.715069 10.086031

benchmark_1_3

goroutines sublist (ms) matchbox (ms)
1 0.425787 0.506438
2 34.532406 2.403753
3 35.66657 2.951643
4 36.015616 3.622817
5 69.265923 3.847979
6 69.915697 4.684678
7 70.707467 6.088789
8 70.569563 6.408932
12 107.247048 10.814583
16 146.281991 15.254943

As you can see, the lock-free approach scales a lot better under contention. This is also without a cache in front of its tree. I'm curious to hear your thoughts and if such a technique could be beneficial to gnatsd.

What guarantees are made by clustered mode?

It would be nice to know things like message order and delivery guarantees. For example, if there are 300 gnatsd nodes and I send a STOP and a START to subject X to two different nodes, will subscribers to X receive them in some "order"?

How does one test this code?

Please excuse if it's obvious, but I haven't done much with go,
don't know the conventions beyond setting GOROOT and
GOPATH, and running 'go build'.

Published messages are sent to consumers twice, once, or zero times when sent through certain members

This is on master, as of 6eb1dd2.

Buckle up.

We're seeing weird behavior when sending to a particular machine in our cluster of two.

Three clients subscribe to diego.staging.start. One is just nats-sub '>' for debugging. The others are actual consumers taking in messages and handling them. They normally subscribe with a queue, but we collected data with and without it.

We're doing a NATS request on diego.staging.start and expect to see a response from one (with queue) or both (without) consumers.

There are two NATS machines in our cluster, and two consumers.

Without queueing

We're seeing that, without a queue, if we send it to a particular machine, all clients receive the message twice. Sending to the other NATS machine is fine, ands both consumers receive it once.

Publish command:

nats-request -n 2 -s 'nats://nats:[email protected]:4222' diego.staging.start '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
nats-request -n 2 -s 'nats://nats:[email protected]:4222' diego.staging.start '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'

Sending to 10.244.0.6:

The message is received by each connection twice.

Logs on 10.244.0.6:

"Client connection created", [192.168.50.1, 56959], 61]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\"}"], "c: 61"]
[["PING OP"], "c: 61"]
[["PING OP"], "c: 61"]
[["SUB OP", "SUB _INBOX.646f990eb6a4b7079b510ee383  2"], "c: 61"]
[["PUB OP", "PUB diego.staging.start _INBOX.646f990eb6a4b7079b510ee383 64"], "c: 61"]
[["Processing Client msg: 1", "diego.staging.start", "_INBOX.646f990eb6a4b7079b510ee383", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 61"]
["Ignoring route, already processed", 61]
["Ignoring route, already processed", 61]
["Ignoring route, already processed", 61]
["Ignoring route, already processed", 61]
["Ignoring route, already processed", 61]
[["MSG OP", "MSG _INBOX.646f990eb6a4b7079b510ee383 RSID:61:2 2"], "c: 1"]
[["Processing Router msg: 43", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 1"]
[["MSG OP", "MSG _INBOX.646f990eb6a4b7079b510ee383 RSID:61:2 2"], "c: 1"]
[["Processing Router msg: 44", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 1"]
[["MSG OP", "MSG _INBOX.646f990eb6a4b7079b510ee383 RSID:61:2 2"], "c: 1"]
[["Processing Router msg: 45", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 1"]
[["MSG OP", "MSG _INBOX.646f990eb6a4b7079b510ee383 RSID:61:2 2"], "c: 1"]
[["Processing Router msg: 46", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 1"]
["Client connection closed", [192.168.50.1, 56959], 61]

Logs on 10.244.2.6:

[["SUB OP", "SUB _INBOX.646f990eb6a4b7079b510ee383  RSID:61:2"], "c: 1"]
[["SUB OP", "SUB _INBOX.646f990eb6a4b7079b510ee383  RSID:61:2"], "c: 6"]
[["MSG OP", "MSG diego.staging.start RSID:4:2 _INBOX.646f990eb6a4b7079b510ee383 64"], "c: 1"]
[["Processing Router msg: 82", "diego.staging.start", "_INBOX.646f990eb6a4b7079b510ee383", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 1"]
[["MSG OP", "MSG diego.staging.start QRSID:2:1 _INBOX.646f990eb6a4b7079b510ee383 64"], "c: 6"]
[["Processing Router msg: 34", "diego.staging.start", "_INBOX.646f990eb6a4b7079b510ee383", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 6"]
[["PUB OP", "PUB _INBOX.646f990eb6a4b7079b510ee383 2"], "c: 30"]
[["Processing Client msg: 88", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 30"]
["Ignoring route, already processed", 30]
[["PUB OP", "PUB _INBOX.646f990eb6a4b7079b510ee383 2"], "c: 30"]
[["Processing Client msg: 89", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 30"]
["Ignoring route, already processed", 30]
[["PUB OP", "PUB _INBOX.646f990eb6a4b7079b510ee383 2"], "c: 31"]
[["Processing Client msg: 88", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 31"]
["Ignoring route, already processed", 31]
[["PUB OP", "PUB _INBOX.646f990eb6a4b7079b510ee383 2"], "c: 31"]
[["Processing Client msg: 89", "_INBOX.646f990eb6a4b7079b510ee383", "", "{}"], "c: 31"]
["Ignoring route, already processed", 31]

nats-sub '>'

[#418] Received on [diego.staging.start] : '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
[#419] Received on [diego.staging.start] : '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
[#420] Received on [_INBOX.646f990eb6a4b7079b510ee383] : '{}'
[#421] Received on [_INBOX.646f990eb6a4b7079b510ee383] : '{}'
[#422] Received on [_INBOX.646f990eb6a4b7079b510ee383] : '{}'
[#423] Received on [_INBOX.646f990eb6a4b7079b510ee383] : '{}'

Sending to 10.244.2.6:

The message is received by each client once (as expected).

Logs on 10.244.0.6:

[["SUB OP", "SUB _INBOX.b27c82c8c1f1b7562a166b2b95  RSID:68:2"], "c: 3"]
[["SUB OP", "SUB _INBOX.b27c82c8c1f1b7562a166b2b95  RSID:68:2"], "c: 1"]

Logs on 10.244.2.6 (which we DID send to):

["Client connection created", [192.168.50.1, 56933], 68]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\"}"], "c: 68"]
[["PING OP"], "c: 68"]
[["PING OP"], "c: 68"]
[["SUB OP", "SUB _INBOX.b27c82c8c1f1b7562a166b2b95  2"], "c: 68"]
[["PUB OP", "PUB diego.staging.start _INBOX.b27c82c8c1f1b7562a166b2b95 64"], "c: 68"]
[["Processing Client msg: 1", "diego.staging.start", "_INBOX.b27c82c8c1f1b7562a166b2b95", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 68"]
[["PUB OP", "PUB _INBOX.b27c82c8c1f1b7562a166b2b95 2"], "c: 31"]
[["Processing Client msg: 87", "_INBOX.b27c82c8c1f1b7562a166b2b95", "", "{}"], "c: 31"]
[["PUB OP", "PUB _INBOX.b27c82c8c1f1b7562a166b2b95 2"], "c: 30"]
[["Processing Client msg: 87", "_INBOX.b27c82c8c1f1b7562a166b2b95", "", "{}"], "c: 30"]
["Client connection closed", [192.168.50.1, 56933], 68]

With Queueing

With a queue, our nats-sub always sees the request message, but if we send it to the same particular machine, no consumers receive the message.

Publish command:

nats-request -n 1 -s 'nats://nats:[email protected]:4222' diego.staging.start '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
nats-request -n 1 -s 'nats://nats:[email protected]:4222' diego.staging.start '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'

I've captured the full logs of both machines for the full sequence; comments are inline showing the timing of the command.

Sending to 10.244.0.6

nats-sub

[#424] Received on [diego.staging.start] : '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
# no response from consumers

Logs on 10.244.0.6

[&{Host:10.244.0.6 Port:4222 Trace:true Debug:true NoLog:false NoSigs:false Logtime:false MaxConn:65536 Username:nats Password:nats Authorization: PingInterval:2m0s MaxPingsOut:2 HTTPPort:0 SslTimeout:0.5 AuthTimeout:15 MaxControlLine:1024 MaxPayload:1048576 ClusterHost:10.244.0.6 ClusterPort:4223 ClusterUsername:nats ClusterPassword:nats ClusterAuthTimeout:15 Routes:[nats-route://nats:[email protected]:4223] ProfPort:0 PidFile:/var/vcap/sys/run/nats/nats.pid LogFile:/var/vcap/sys/log/nats/nats.log}]
["DEBUG is on"]
["TRACE is on"]
["Starting gnatsd version 0.4.6"]
["Listening for route connections on 10.244.0.6:4223"]
["Listening for client connections on 10.244.0.6:4222"]
["gnatsd is ready"]
["Trying to connect to route on 10.244.2.6:4223"]
["Error trying to connect to route: dial tcp 10.244.2.6:4223: connection refused"]
["Route connection created", [10.0.2.15, 43494], 1]
["Route sent local subscriptions", 1]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\",\"ssl_required\":false,\"name\":\"2319669868152a7ed5f753d50352ba41\"}"], "c: 1"]
["Registering remote route", "2319669868152a7ed5f753d50352ba41"]
[["SUB OP", "SUB diego.staging.start diego.stagers QRSID:2:1"], "c: 1"]
[["SUB OP", "SUB >  RSID:3:2"], "c: 1"]
["Trying to connect to route on 10.244.2.6:4223"]
["Route connection created", [10.244.2.6, 4223], 2]
["Route connect msg sent", [10.244.2.6, 4223], 2]
["Route sent local subscriptions", 2]
["Detected duplicate remote route", "2319669868152a7ed5f753d50352ba41", [10.244.2.6, 4223], 2]
["Router connection closed", [10.244.2.6, 4223], 2]
["Attempting reconnect for solicited route", 2]
[["SUB OP", "SUB diego.staging.start diego.stagers QRSID:2:1"], "c: 2"]
[["SUB OP", "SUB >  RSID:3:2"], "c: 2"]
["Trying to connect to route on 10.244.2.6:4223"]
["Route connection created", [10.244.2.6, 4223], 3]
["Route connect msg sent", [10.244.2.6, 4223], 3]
["Route sent local subscriptions", 3]
["Registering remote route", "2319669868152a7ed5f753d50352ba41"]
[["SUB OP", "SUB diego.staging.start diego.stagers QRSID:2:1"], "c: 3"]
[["SUB OP", "SUB >  RSID:3:2"], "c: 3"]

# Sending to 10.244.0.6 (consumers did not receive anything):
[["SUB OP", "SUB diego.staging.start diego.stagers QRSID:6:1"], "c: 1"]
[["SUB OP", "SUB diego.staging.start diego.stagers QRSID:6:1"], "c: 3"]
["Client Ping Timer", [10.0.2.15, 43494], 1]
[["PING OP"], "c: 1"]
[["PONG OP"], "c: 1"]
["Client Ping Timer", [10.244.2.6, 4223], 3]
[["PING OP"], "c: 3"]
[["PONG OP"], "c: 3"]
["Client connection created", [192.168.50.1, 57202], 4]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\"}"], "c: 4"]
[["PING OP"], "c: 4"]
[["PING OP"], "c: 4"]
[["SUB OP", "SUB _INBOX.542fde2bc4851d06c7d46ac09e  2"], "c: 4"]
[["PUB OP", "PUB diego.staging.start _INBOX.542fde2bc4851d06c7d46ac09e 64"], "c: 4"]
[["Processing Client msg: 1", "diego.staging.start", "_INBOX.542fde2bc4851d06c7d46ac09e", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 4"]
["Ignoring route, already processed", 4]
["Ignoring route, already processed", 4]

# Sending to 10.244.2.6 (functioned properly):
["Client connection closed", [192.168.50.1, 57202], 4]
[["SUB OP", "SUB _INBOX.dd5458fa34dca53c342e7834df  RSID:7:2"], "c: 3"]
[["SUB OP", "SUB _INBOX.dd5458fa34dca53c342e7834df  RSID:7:2"], "c: 1"]

Sending to 10.244.2.6

nats-sub

[#425] Received on [diego.staging.start] : '{"app_id":"a","task_id":"b","download_uri":"http://example.com"}'
# consumer responds:
[#426] Received on [_INBOX.dd5458fa34dca53c342e7834df] : '{}'

Logs on 10.244.2.6

[&{Host:10.244.2.6 Port:4222 Trace:true Debug:true NoLog:false NoSigs:false Logtime:false MaxConn:65536 Username:nats Password:nats Authorization: PingInterval:2m0s MaxPingsOut:2 HTTPPort:0 SslTimeout:0.5 AuthTimeout:15 MaxControlLine:1024 MaxPayload:1048576 ClusterHost:10.244.2.6 ClusterPort:4223 ClusterUsername:nats ClusterPassword:nats ClusterAuthTimeout:15 Routes:[nats-route://nats:[email protected]:4223] ProfPort:0 PidFile:/var/vcap/sys/run/nats/nats.pid LogFile:/var/vcap/sys/log/nats/nats.log}]
["DEBUG is on"]
["TRACE is on"]
["Starting gnatsd version 0.4.6"]
["Listening for route connections on 10.244.2.6:4223"]
["Listening for client connections on 10.244.2.6:4222"]
["gnatsd is ready"]
["Trying to connect to route on 10.244.0.6:4223"]
["Route connection created", [10.244.0.6, 4223], 1]
["Route connect msg sent", [10.244.0.6, 4223], 1]
["Route sent local subscriptions", 1]
["Registering remote route", "6d10583879befb490c2b15e4bbdb1259"]
["Client connection created", [10.0.2.15, 45175], 2]
[["CONNECT OP", "CONNECT {\"user\":\"nats\",\"pass\":\"nats\",\"verbose\":true,\"pedantic\":true}"], "c: 2"]
[["SUB OP", "SUB diego.staging.start diego.stagers 1"], "c: 2"]
["Client connection created", [192.168.50.1, 57086], 3]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\"}"], "c: 3"]
[["SUB OP", "SUB >  2"], "c: 3"]
[["PING OP"], "c: 3"]
["Route connection created", [10.0.2.15, 43845], 4]
["Route sent local subscriptions", 4]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\",\"ssl_required\":false,\"name\":\"6d10583879befb490c2b15e4bbdb1259\"}"], "c: 4"]
["Detected duplicate remote route", "6d10583879befb490c2b15e4bbdb1259", [10.0.2.15, 43845], 4]
["Router connection closed", [10.0.2.15, 43845], 4]
["Route connection created", [10.0.2.15, 43852], 5]
["Route sent local subscriptions", 5]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\",\"ssl_required\":false,\"name\":\"6d10583879befb490c2b15e4bbdb1259\"}"], "c: 5"]
["Registering remote route", "6d10583879befb490c2b15e4bbdb1259"]

# Sending to 10.244.0.6 (consumers did not receive anything):
["Client connection created", [10.0.2.15, 55657], 6]
[["CONNECT OP", "CONNECT {\"user\":\"nats\",\"pass\":\"nats\",\"verbose\":true,\"pedantic\":true}"], "c: 6"]
[["SUB OP", "SUB diego.staging.start diego.stagers 1"], "c: 6"]
["Client Ping Timer", [10.244.0.6, 4223], 1]
[["PING OP"], "c: 1"]
[["PONG OP"], "c: 1"]
["Client Ping Timer", [10.0.2.15, 45175], 2]
[["PONG OP"], "c: 2"]
[["PING OP"], "c: 3"]
["Client Ping Timer", [192.168.50.1, 57086], 3]
[["PONG OP"], "c: 3"]
["Client Ping Timer", [10.0.2.15, 43852], 5]
[["PING OP"], "c: 5"]
[["PONG OP"], "c: 5"]
[["SUB OP", "SUB _INBOX.542fde2bc4851d06c7d46ac09e  RSID:4:2"], "c: 1"]
[["SUB OP", "SUB _INBOX.542fde2bc4851d06c7d46ac09e  RSID:4:2"], "c: 5"]
[["MSG OP", "MSG diego.staging.start RSID:3:2 _INBOX.542fde2bc4851d06c7d46ac09e 64"], "c: 1"]
[["Processing Router msg: 1", "diego.staging.start", "_INBOX.542fde2bc4851d06c7d46ac09e", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 1"]



["Client Ping Timer", [10.0.2.15, 55657], 6]
[["PONG OP"], "c: 6"]


# Sending to 10.244.2.6 (worked properly):
["Client connection created", [192.168.50.1, 57203], 7]
[["CONNECT OP", "CONNECT {\"verbose\":false,\"pedantic\":false,\"user\":\"nats\",\"pass\":\"nats\"}"], "c: 7"]
[["PING OP"], "c: 7"]
[["PING OP"], "c: 7"]
[["SUB OP", "SUB _INBOX.dd5458fa34dca53c342e7834df  2"], "c: 7"]
[["PUB OP", "PUB diego.staging.start _INBOX.dd5458fa34dca53c342e7834df 64"], "c: 7"]
[["Processing Client msg: 1", "diego.staging.start", "_INBOX.dd5458fa34dca53c342e7834df", "{\"app_id\":\"a\",\"task_id\":\"b\",\"download_uri\":\"http://example.com\"}"], "c: 7"]
[["PUB OP", "PUB _INBOX.dd5458fa34dca53c342e7834df 2"], "c: 2"]
[["Processing Client msg: 1", "_INBOX.dd5458fa34dca53c342e7834df", "", "{}"], "c: 2"]
["Client connection closed", [192.168.50.1, 57203], 7]

Change the project description to optimize searching in Github

I was trying to search "message queue" built with Golang on Github but only get a few client wrappers (ordered by most stars). Later a friend on social network recommends NATS and I realize what's wrong with the searching:

Please just change the project description to something like: "NATS: an open-source, high-performance, lightweight cloud message queue system".

How to actively connect to a cluster port?

It looks as if the only way to register new routes is to add them to the config file and then restart the main gnatsd server (that which was launched with this sample file).

Reading the source (the routeAcceptLoop function), I understand that new routes (gnatsd servers) can actively connect to the cluster port to register themselves, however I don't see an option to do so in gnatds -h or server.go.

Is it yet to be implemented?

remove route subscribe when close client connection.

client.go:

srv := c.srv
typ := c.typ
c.mu.Unlock()

           // Remove clients subscriptions.
    for _, s := range subs {
        sub, ok := s.(*subscription)
        if ok {
            if typ == CLIENT {
                srv.broadcastUnSubscribe(sub)
            }
            srv.sl.Remove(sub.subject, sub)
        }
    }

Persistence

Does gnatsd support messaging persistence, if not, how can I implement it.

Crash gnatsd

I tried to create a Haskell NATS binding and made gnatsd crash quite easily - I happened to send a command in several TCP packets. TCP is a stream oriented protocol, so that should be perfectly correct. However, I can crash NATS quite easily. This program causes parse errors:

#!/usr/bin/python
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 4222))
print s.recv(1024)
s.send('CONNECT ')
time.sleep(0.1)
s.send('{"verbose":true,"ssl_required":false,"user":"test","pedantic":true,"pass":"password"}')
time.sleep(0.1)
s.send('\r\n')
print s.recv(1024)

And this one crashes gnatsd:

#!/usr/bin/python
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 4222))
print s.recv(1024)
s.send('CONNECT {"verbose":true,"ssl_required":false,"user":"test","pedantic":true,"pass":"password"}')
time.sleep(0.1)
s.send('\r\n')
print s.recv(1024)

BTW: the protocol design isn't nice. The INBOXes should not be assigned randomly, the responses should be somehow tagged (i.e. IMAP style). The cluster behaviour is undocumented; I guess there are no guarantees as to what would happen on crashes/network partitions etc.

go test fails if port 4242 is already taken

If port 4242 is already bind by other software test server does not fail because it can't use that port.

$ go test ./...
?       github.com/apcera/gnatsd    [no test files]
?       github.com/apcera/gnatsd/auth   [no test files]
ok      github.com/apcera/gnatsd/conf   0.004s
ok      github.com/apcera/gnatsd/hash   0.005s
ok      github.com/apcera/gnatsd/hashmap    0.007s
ok      github.com/apcera/gnatsd/logger 0.524s
ok      github.com/apcera/gnatsd/server 2.019s
ok      github.com/apcera/gnatsd/sublist    0.076s
--- FAIL: TestSendRouteSubAndUnsub (0.01s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:88
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestRouteForwardsMsgFromClients (0.34s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage\xb6\xa2\x00\x00\x00\""'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:151
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestRouteForwardsMsgToClients (0.01s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:183
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestRouteOnlySendOnce (0.01s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:230
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestRouteQueueSemantics (0.01s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:259
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestMultipleRoutesSameId (0.22s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:392
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestRouteResendsLocalSubsOnReconnect (0.01s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:428
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
--- FAIL: TestAutoUnsubPropogation (0.02s)
    test.go:145: Response did not match expected:
            Received:'"\x80c\x00\x00\x00A-18782|com.code42.messaging.security.SecurityProviderReadyMessage"'
            Expected:'INFO\s+([^\r\n]+)\r\n'

        6 - /Users/parkerkane/prj/Personal/gnatsd/src/github.com/apcera/gnatsd/test/routes_test.go:479
        7 - /usr/local/Cellar/go/1.4.2/libexec/src/testing/testing.go:447
        8 - /usr/local/Cellar/go/1.4.2/libexec/src/runtime/asm_amd64.s:2232
FAIL
FAIL    github.com/apcera/gnatsd/test   8.757s

Details

$ (cd src/github.com/apcera/gnatsd && git rev-parse HEAD)
7f794ad78fcea09114b3bf7dd6b7079dc353ac79

$ go version
go version go1.4.2 darwin/amd64

wire protocol?

Where i can find the wire protocols specs?
Its needed for any who want build a driver for unsupported languages (python, php, etc)

comment typo

Know it's a nit-pic, but line 27 of hash.go should reference "Fowler-..." not "Fowle-..."

Thanks for sharing the package. I was looking for something EXACTLY like this.

Priority over Wildcards ?

Hi,

Is there any mechanism for sending a message to a node who listen specificly to a topic, but if nobody has subscribed to this specifi topic, then we will try to query subscribers with wildcard.

Bellow is desired behaviour :

NATS.subscribe( "foo.bar.>" , :queue => 'job.workers') { |msg| puts "I do wildcard!" }
NATS.subscribe( "foo.bar.qux" , :queue => 'job.workers') { |msg| puts "I do qux!" }


NATS.request('foo.bar.qux') { |response| puts "QUX got a response: '#{response}'" }
NATS.request('foo.bar.baz') { |response| puts "BAZ got a response: '#{response}'" }

Desired output :

QUX got a response: I do qux!
BAZ got a response: I do wildcard!

Multi-node gnatsd fails to reconnect

Just wanted to give you a heads up.

We have hit a certain situation where the connection between 2 gnatsd breaks and fails to reconnect.
I am running more tests to identify the exact situation that causes this behavior.
I have already run across a few issues, so of which I will provide pull requests for.

  1. Add routes and remotes to varz. This just helps us understand when we are in a bad way.
  2. The remove self-referencing code is a bit broken. It disallows you from starting 2 gnatsd on the same box. It should also ignore routes that are using a different port. Something else looks funny but it will all be fixed at the same time.
  3. If you have 2 gnatsd and the cluster config has defined each other, on startup, there is a slight race to processRouteInfo between the solicitRoute path and the non-solicited path. A sleep in route.go:317 will cause a looping effect. Normally it will resolve itself eventually.

Another piece of code that I found troubling is client.processRouteInfo:client.goL206
If two routines execute this, they can both return false and then the last one will win as the "remote" client.

i have do some test for ruby nats server vs go nats server

i use your benchmark/pub.rb and sub.rb for test.

after my test ,i find that
1.when message's byte length larger than 3k ,there is no different between ruby nats server and go nats server in cpu usage.but less than 3k,go nats server is more better than ruby nats server.i will continue to test this and report.
2.ruby nats server use 20M memory vs go natst server use 3M
3.when i change message nums from 100000 to 1000000 ,in ruby nats server mode,there will be error if i choose 16bytes message.
thanks.

Error occurs if queue name contains white space

Hi,
If the queue name contains white space, e.g. "job worker", the gnatsd will return an error "processSub Parse Error". There's nowhere in the document mentions queue name.

...
nc.QueueSubscribe("foo", "job worker", func(msg *nats.Msg) {
        log.Printf("Received on [%s]: '%s'\n", msg.Subject, string(msg.Data))
    })
...

Performance question

I'm working on some benchmarks for various messaging systems by measuring throughput. I've run into a strange problem where messages just stop being received after a certain point with a single publisher and consumer. For example, if I send 2000 messages consecutively, I don't receive any messages after 805.

Am I hitting a max queue size? 805 doesnt seem like very many messages, especially when they're only 1kb. If it helps, the code I'm using to test this is here: https://github.com/tylertreat/mq-benchmarking/blob/gnatsd/benchmark/mq/gnatsd.go

How to stop an embedded server

Once gnatsd is embedded and running in a different routine. How do you stop the server so that the program can exit? I saw a previous post on how to run a gnatsd server:

#64

panic when connecting using ruby nats

the dea_ng nats client can crash gnatsd, only when it is run with cluster configuration specified:

[...]
["Slow Consumer Detected", [127.0.0.1, 57637], 2]
panic: runtime error: slice bounds out of range

goroutine 10 [running]:
github.com/apcera/gnatsd/server.(*client).parse(0xc200134800, 0xc200145000, 0x8000, 0x8000, 0x8000, ...)
        /s/go/src/github.com/apcera/gnatsd/server/parser.go:446 +0x264f
github.com/apcera/gnatsd/server.(*client).readLoop(0xc200134800)
        /s/go/src/github.com/apcera/gnatsd/server/client.go:144 +0x15c
created by github.com/apcera/gnatsd/server.(*client).initClient
        /s/go/src/github.com/apcera/gnatsd/server/client.go:122 +0x18b

i'll see if there is a direct repo for this.

Minor Suggestion: cacheline padding

type stats struct {
inserts uint64
removes uint64
matches uint64
cacheHits uint64
since time.Time
}

For your consideration: cacheline padding. Not tested.

ex:

type stats struct {
cachepad1 [8]int64
inserts uint64
cachepad2 [7]int64
removes uint64
cachepad3 [7]int64
matches uint64
cachepad4 [7]int64
cacheHits uint64
cachepad5 [7]int64
since time.Time
cachepad6 int32
cachepad7 [5]int64
}

for any high reference counters which might be shared between thread/procs. Improves performance.

yum

Is it possible to embed gnatsd into my application?

I'm working on distributed database and want to use NATS as a messaging middleware. The problem is that gnatsd is a separate deployment step. Client needs to deploy gnatsd first and only after that he or she will be able to deploy my system. I want to embed it into my application so client can only deploy once.

Another point is that gnats endpoints list will be managed using distributed configuration service (consul) so I will need to update gnatsd config at runtime. I have no idea how to do this if gnatsd will work in separate process.

Increasing memory usage with clustered NATS servers

The Cloud Foundry Runtime team has been using gnatsd in clustered mode again in cf-release, and we've noticed increased memory usage on some of the nodes in the cluster. Specifically, when we have two nodes in the cluster, we're seeing the memory usage on one of the nodes increase roughly linearly over time. We've seen this in at least three different environments running this two-node configuration. In our largest environment, this usage has gone from a baseline of 200M to nearly 1G in about 2 days.

We'd of course be happy to coordinate with you on exploring and resolving this issue, especially if we can provide more extensive testing of different clustering configurations, or more information in general.

Also, we did notice that server/server.go includes the net/http/pprof package for profiling support, but it doesn't seem possible to turn it on when using gnatsd with a config file: the config file parser always leaves the options struct's ProfPort set to 0, and MergeOptions won't let the CLI flags override that. Would you be receptive to a pull request making that jointly configurable?

Thanks very much,
@ematpl and the CF Runtime team

Support durable messages for disaster recovery

Most message queues have durable messages for a good reason. If something happens to a machine you can turn one of the replicated nodes into a primary node (the process is known as failover and is implemented in gnatsd if I understand correctly).
I suggest in a cluster a node can have to modes: primary and slave.
slaves are write only, they dedicate all their resources to writing as many messages to disk as possible and clients are unable to connect to them.
This design decision allows the node to dedicate it's CPU and I/O resources completely to writing messages to the disk.
The primary nodes are not durable and cannot be turned to be durable.
This design decision allows the node to dedicate all of it's resources to receiving and sending messages.

When a primary node fails one of the slave nodes that replicates this primary node is turned to a primary node.
This means that messages will not be durable anymore which is desired for reasons stated above.
When the primary node comes back to live it becomes the new slave of the old slave node.

The election of the primary node can be done using a consensus algorithm like Raft.

If a cluster does not have any slave nodes a big red warning should appear which can be disabled using a configuration option.

I hope I did not miss anything.

Dockerfile not works

The current Dockerfile fails when you try to run it because the docker bin not is available at the google/golang:1.3

core@front ~/gnatsd/docker $ docker build -t scratch .
...

core@front ~/gnatsd/docker $ docker run -t -i scratch
/bin/sh: 1: docker: not found

Why make this complex nested Dockerfile? I made my own as follows:

FROM google/golang:1.3

MAINTAINER Derek Collison <[email protected]>

RUN CGO_ENABLED=0 go get -a -ldflags '-s' github.com/apcera/gnatsd

ENTRYPOINT ["gnatsd", "-p", "4222", "-m", "8333"]

And works like a charm

gnatsd crashes on Windows i386 (and probably on any Intel 32bit platform)

The crash happens as soon as a client connects, and it's a null pointer exception on the atomic add:

// Lock should be held
func (c *client) initClient() {
s := c.srv
c.cid = atomic.AddUint64(&s.gcid, 1) //////// crash here

There are no null pointers but AddUint64() crashes if the pointer is not 64bit aligned ("fix" added on March 12: https://code.google.com/p/go/source/detail?r=09cc9661f4ee29dca7da92ae8916cefded775bb5&path=/src/pkg/sync/atomic/asm_386.s)

Suggestion: align the pointer or make s.gcid (and c.cid) int32 ?

publishes "foo.bar" messages to "foo" subscribers (but not first subscriber)

to reproduce:

  1. start gnatsd locally: gnatsd -p 4223
  2. publish a message: nats-pub "foo.bar" -s nats://nats:nats@localhost:4223
  3. start subscriber: nats-sub "foo" -s nats://nats:nats@localhost:4223
  4. publish the message again: nats-pub "foo.bar" -s nats://nats:nats@localhost:4223

expected: the message in step 4 does not show up

actual: the message in step 4 shows up

interesting note: if you exit the subscriber, then restart it, it will no longer receive these messages.

We see this with origin/master ddb54bb

Windows - runtime error - invalid memory pointer

windows 7, 64-bit, go v1.42, current and 1.4 branch of gnats fails with following:

C:\Users\jmatthew\Development\gnatsd>gnatsd.exe
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x30 pc=0x4021cc]

goroutine 1 [running]:
main.configureLogger(0xc0820256c0, 0xc08207a3c0)
        C:/Users/jmatthew/Development/gnatsd/gnatsd.go:139 +0x23c
main.main()
        C:/Users/jmatthew/Development/gnatsd/gnatsd.go:103 +0xcc3

goroutine 7 [runnable]:
github.com/apcera/gnatsd/server.func┬╖007()
        C:/Go/src/github.com/apcera/gnatsd/server/server.go:148
created by github.com/apcera/gnatsd/server.(*Server).handleSignals
        C:/Go/src/github.com/apcera/gnatsd/server/server.go:155 +0x163

goroutine 6 [syscall]:
os/signal.loop()
        c:/go/src/os/signal/signal_unix.go:21 +0x26
created by os/signal.init┬╖1
        c:/go/src/os/signal/signal_unix.go:27 +0x3c

C:\Users\jmatthew\Development\gnatsd>

Information about windows, go and git branch

C:\Users\jmatthew\Development\gnatsd>go version
go version go1.4.2 windows/amd64

C:\Users\jmatthew\Development\gnatsd>git branch
* 1.4
  master

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.