
Comments (11)

shinji62 commented on July 30, 2024

Hi,

Which version are you using?
30,000/min is not that big: it is 500/sec, and if you have 6 nozzles that means around 80 logs per second per nozzle.
I am not sure whether loggregator sends to all 6 in a round-robin way.

Btw, when this happens, what does cf app yourapp give you? We use a small on-disk database, so maybe this DB is full.

Even if you capture only "LogMessage", the nozzle receives all traffic, because the filtering is done at the nozzle level.

from firehose-to-syslog.

ahahtyler commented on July 30, 2024

I'm using version 1.3.1

Looking at the "Log Event Totals" messages, the incoming logs seem to be evenly distributed among the instances. Whenever log volume increases, the number of events processed by each instance increases; similarly, when the volume decreases, so does the number of events processed across all the instances.

While examining the logs it looks like the issue is related to the instance not responding to a ping in time.

Tue Apr 05 2016 08:58:53 GMT-0400 (Eastern Standard Time) [APP] ERR [2016-04-05 12:58:53.564120254 +0000 UTC] Exception occurred! Message: Firehose Error! Details: websocket: close 1008 Client did not respond to ping before keep-alive timeout expired.

Tue Apr 05 2016 08:58:53 GMT-0400 (Eastern Standard Time) [APP] OUT We have processed 0 events from the firehose at wss://doppler.domain.com:443 over the last 9.9 seconds and 77575 total events since startup

#0 running 2016-04-05 08:33:09 AM 5.3% 23.6M of 512M 14.8M of 512M
#1 running 2016-04-05 08:33:09 AM 0.0% 25.8M of 512M 14.8M of 512M
#2 running 2016-04-05 08:33:09 AM 5.0% 21.6M of 512M 14.8M of 512M
#3 running 2016-04-05 08:33:09 AM 14.5% 21.6M of 512M 14.8M of 512M
#4 running 2016-04-05 08:33:09 AM 10.7% 23.6M of 512M 14.8M of 512M
#5 running 2016-04-05 08:33:09 AM 5.0% 19.6M of 512M 14.8M of 512M

After an instance fails to respond to a ping before the keep-alive timeout, it's hit or miss whether it will come back. From the error above, instance 1 started processing logs again within 10-15 minutes. However, I've seen cases where all of the instances went down and stopped processing logs for 24+ hours.

Is there a way to increase the keep-alive timeout?


shinji62 commented on July 30, 2024

Thanks.
It seems the noaa lib has been updated, so I will update firehose-to-syslog too.
How many loggregator / doppler instances do you have?

Out of curiosity, do you have any network equipment (e.g. a firewall) preventing the keep-alive?


ahahtyler commented on July 30, 2024

In each of the Cloud Foundry foundations we have 2 Doppler and 2 Loggregator VMs (1 vCPU / 2 GB mem).

There shouldn't be any network equipment preventing the keep-alive. The firehose-to-syslog nozzle is running as an application. From the garden container it's running in to the traffic controller, there's nothing blocking its path.


shinji62 commented on July 30, 2024

Ok, I will try to update the middleware.


shinji62 commented on July 30, 2024

@ahahtyler Please compile from this branch, feature/update_nooa_keepalive, using go1.6:

$ godep go build

There is a new option:

  --fh-keep-alive=25s            Keep Alive duration for the firehose consumer

Let me know


ahahtyler commented on July 30, 2024

Built and running. Will let it run over the weekend and see if it still times out.


shinji62 commented on July 30, 2024

Is it still happening?


ahahtyler commented on July 30, 2024

Sorry for the late reply. Extending the keep alive seems to have done the trick. Thanks for the help!


shinji62 commented on July 30, 2024

OK, I will cut a new release this week.

Thanks


chenziliang commented on July 30, 2024

@shinji62 I don't follow why this is a keep-alive setting on the client. After digging into the source code of the server-side and client-side websocket implementations, it appears the keep-alive is initiated by the server rather than the client. When the server doesn't receive a Pong response from the client, it reports a policy violation and a timeout error.

--fh-keep-alive is used as the websocket client's timeout for reading packets; it doesn't seem related to keep-alive to me.

https://github.com/cloudfoundry-attic/loggregatorlib/blob/master/server/websocket_keepalive.go

Thank you!

