sd-agent's Issues

Uncaught exception in getCPUStats

2013-01-10 11:23:43,596 - main - DEBUG - getCPUStats: start
2013-01-10 11:23:43,596 - main - DEBUG - getCPUStats: linux2
2013-01-10 11:23:43,602 - main - DEBUG - Process already terminated
2013-01-10 11:23:43,603 - main - ERROR - getCPUStats: exception = Traceback (most recent call last):
  File "/home/ubuntu/sd-agent/checks.py", line 407, in getCPUStats
    proc.kill()
UnboundLocalError: local variable 'proc' referenced before assignment
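The traceback suggests that `proc.kill()` sits in a cleanup path that can run before `subprocess.Popen` has assigned `proc`, for example when the command fails to start. Below is a minimal Python 3 sketch of the usual guard. It uses a hypothetical helper, `read_command_output`, not the agent's actual code. The name is bound to `None` first, and the helper only kills a process that exists and is still running.

```python
import logging
import subprocess
import sys

log = logging.getLogger(__name__)


def read_command_output(cmd):
    """Run cmd and return its stdout bytes, or None if it could not be run.

    Hypothetical helper illustrating the cleanup pattern; not sd-agent's code.
    """
    proc = None  # bound before the try, so the finally block can always test it
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, close_fds=True)
        out, _ = proc.communicate()
        return out
    except OSError:
        # e.g. the binary is missing; log it instead of crashing the check
        log.exception("read_command_output: could not run %r", cmd)
        return None
    finally:
        # Only kill a process that was actually started and is still alive.
        if proc is not None and proc.poll() is None:
            proc.kill()


if __name__ == "__main__":
    print(read_command_output([sys.executable, "-c", "print('ok')"]))
```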

sd-agent should have tests

It took us multiple days to create an upstart script for sd-agent, due to obscure logic in the argument parsing. I would have written tests, but the current code isn't suitable for unit testing.

Refactoring should give:

  • Module __main__ code converted to functions
  • Large functions converted to small functions (single responsibility principle)
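As an illustration of the first point, argument parsing can live in plain functions that a test calls with an explicit argument list, instead of module-level code that reads `sys.argv`. This is a sketch using `argparse`. The command set is an assumption drawn from the commands mentioned in these issues (`start`, `stop`, `restart`, `status`, `update`, `foreground --use-local-forwarder`), not a copy of the agent's real parser.

```python
import argparse

# Assumed command set, based on the commands mentioned in these issues.
COMMANDS = ("start", "stop", "restart", "status", "update", "foreground")


def build_parser():
    """Create the CLI parser; kept separate so tests can inspect it."""
    parser = argparse.ArgumentParser(prog="agent.py")
    parser.add_argument("command", choices=COMMANDS)
    parser.add_argument("--use-local-forwarder", action="store_true")
    return parser


def parse_args(argv):
    """Parse an explicit argument list (no hidden dependency on sys.argv)."""
    return build_parser().parse_args(argv)
```

The entry point would then shrink to something like `if __name__ == "__main__": main(sys.argv[1:])`. In that sketch, a hypothetical `main()` hands each command to its own small function, so both the parsing and the dispatch can be unit-tested.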

Postback connections not being closed

A customer has reported in https://serverdensity.uservoice.com/admin/tickets/501 that packets from our load balancer are arriving at their servers when they shouldn't be.

Jul 9 21:00:10 log01 kernel: [1210070.740609] IPTABLES_REJECT IN=eth0 OUT= MAC=40:40:80:9e:41:8d:64:00:f1:cd:1f:7f:08:00 SRC=208.43.117.99 DST=IP LEN=40 TOS=0x00 PREC=0x00 TTL=246 ID=25142 DF PROTO=TCP SPT=443 DPT=50606 WINDOW=7 RES=0x00 RST URGP=0
Jul 9 21:01:13 log01 kernel: [1210133.001049] IPTABLES_REJECT IN=eth0 OUT= MAC=40:40:80:9e:41:8d:64:00:f1:cd:1f:7f:08:00 SRC=208.43.117.99 DST=IP LEN=40 TOS=0x00 PREC=0x00 TTL=245 ID=16712 DF PROTO=TCP SPT=443 DPT=50607 WINDOW=7 RES=0x00 RST URGP=0
Jul 9 21:02:16 log01 kernel: [1210195.863136] IPTABLES_REJECT IN=eth0 OUT= MAC=40:40:80:9e:41:8d:64:00:f1:cd:1f:7f:08:00 SRC=208.43.117.99 DST=IP LEN=40 TOS=0x00 PREC=0x00 TTL=246 ID=11370 DF PROTO=TCP SPT=443 DPT=50608 WINDOW=7 RES=0x00 RST URGP=0

These appear to be coming from our load balancer - they're small packets sourced from port 443, which indicates it's part of the SSL handshake.

Having spoken to the load balancer vendor, they have reported:

> Our vendor has reviewed the load balancer and observed the traffic generated to the client's IP. The traffic being dropped is due to RST packets being sent to the client attempting to close the session. The client is not closing the session after the LB sends a FIN, and it goes idle after 1 minute, at which point the LB begins to send the RST packets seen. You will have to review with the client why the sessions are not closing after the LB sends a FIN and it continues to send ACKs.

The packets match the servers in the customer's account and link up with agent postbacks so we need to see why the agent isn't closing connections.
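One plausible but unconfirmed cause is a postback response object that is not closed on every code path. If so, the client socket could linger after the load balancer's FIN. Below is a minimal Python 3 sketch of a postback that always closes its response using `contextlib.closing`. The names `post_back` and `opener` are hypothetical. The `opener` parameter is injectable so the behaviour can be tested without a network.

```python
import urllib.request
from contextlib import closing


def post_back(url, payload, opener=urllib.request.urlopen, timeout=30):
    """POST payload (bytes) to url and return the body, always closing the response."""
    request = urllib.request.Request(url, data=payload)
    # closing() calls response.close() even if read() raises, releasing the socket.
    with closing(opener(request, timeout=timeout)) as response:
        return response.read()
```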

Error with HTTP API request

I'm getting an error from sd-agent 2.1.5. I have updated to 2.1.6, but it is still reporting the error. The Server Density status page does not report any problems. After 12 hours of this, I think the problem could be related to the agent.

Nov 12 19:28:28 node105 sd.forwarder[33295]: ERROR (sdagent.py:274): Response: HTTPResponse(_body=None,buffer=None,code=599,effective_url='https://cobrowser.agent.serverdensity.io/intake/?agent_key=REMOVED',error=HTTPError('HTTP 599: Timeout',),headers={},reason='Unknown',request=<tornado.httpclient.HTTPRequest object at 0x7f1aec694e10>,request_time=20.00111699104309,time_info={})

Can you help me out?

Uncaught exception in getIOStats

2013-01-10 11:23:43,587 - main - DEBUG - getIOStats: start
2013-01-10 11:23:43,587 - main - DEBUG - getIOStats: linux2
2013-01-10 11:23:43,595 - main - ERROR - getIOStats: exception = Traceback (most recent call last):
  File "/home/ubuntu/sd-agent/checks.py", line 529, in getIOStats
    proc = subprocess.Popen(['iostat', '-d', '1', '2', '-x', '-k'], stdout=subprocess.PIPE, close_fds=True)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
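`OSError: [Errno 2]` raised from `Popen` here means the `iostat` binary itself could not be found. The sketch below adds a pre-flight check with `shutil.which` (Python 3.3+), so the check is skipped with a clear message instead of raising. `read_iostat` is a hypothetical name. The note that `iostat` ships in the `sysstat` package holds for the common Linux distributions.

```python
import logging
import shutil
import subprocess

log = logging.getLogger(__name__)


def read_iostat(binary="iostat"):
    """Return `iostat -d 1 2 -x -k` output as text, or None if iostat is unavailable."""
    path = shutil.which(binary)
    if path is None:
        # On most Linux distributions iostat is provided by the sysstat package.
        log.warning("getIOStats: %s not found; install sysstat to enable IO stats", binary)
        return None
    proc = subprocess.Popen([path, "-d", "1", "2", "-x", "-k"],
                            stdout=subprocess.PIPE, close_fds=True)
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace")
```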

DNS iteration code causes an agent hang if the connection is refused

2013-03-09 02:00:46,830 - main - INFO - doPostBack: attempting postback:
2013-03-09 02:00:46,838 - main - ERROR - doPostBack: URLError = <urlopen error [Errno 111] Connection refused>
2013-03-09 02:00:46,839 - main - INFO - doPostBack: Retrying postback with DNS lookup iteration

Agent stops.

Location of Log File when Running Agent Manually

On Mac and Raspberry Pi, the agent has to be run manually with "python agent.py start". This puts the log in /tmp/, which can be a small partition that fills quickly.

As a workaround, you can create a symlink from /tmp/sd-agent.log to /dev/null.

"IndexError: list index out of range" on Mac OS Yosemite 10.10.2 (14C78c)

CTFs-MacBook-Pro:sd-agent cesar$ python agent.py start
Starting...
Started
CTFs-MacBook-Pro:sd-agent cesar$
Traceback (most recent call last):
  File "agent.py", line 406, in <module>
    daemon.start()
  File "/Users/cesar/Server Density/sd-agent/daemon.py", line 125, in start
    self.run()
  File "agent.py", line 338, in run
    c.doChecks(s, True, systemStats) # start immediately (case 28315)
  File "/Users/cesar/Server Density/sd-agent/checks.py", line 2624, in doChecks
    memory = self.getMemoryUsage()
  File "/Users/cesar/Server Density/sd-agent/checks.py", line 1181, in getMemoryUsage
    data = {'physUsed': physParts[0], 'physFree': physParts[2], 'swapUsed': swapParts[1], 'swapFree': swapParts[2], 'cached': 'NULL'}
IndexError: list index out of range

Cannot disable log_to_syslog

Hello.

I've found that it's currently impossible to disable log forwarding to syslog. Even with log_to_syslog: no, I still get sd-agent logs in /var/log/syslog.

I tested this on Ubuntu 18.04, but I think it affects any Linux distribution that uses systemd.

After a little research, I found that this happens because, under systemd, StandardOutput goes to journald by default, and journald forwards it to syslog. As a result, I have sd-agent logs both in /var/log/sd-agent/* and in /var/log/syslog.
https://www.freedesktop.org/software/systemd/man/systemd.exec.html

I propose adding these two lines to the systemd units sd-agent.service and sd-agent-forwarder.service (and perhaps to the other .service files too) to suppress log mirroring:

[Service]
...
...
StandardOutput = null
StandardError = null

init.d script needs some love

At present the init.d script has some odd behaviour: start actually restarts, restart attempts to stop the service twice, and there is no exposed status subcommand (even though the agent code base has one).

It'd be great to give this an update.

EDIT

In addition, it would be excellent if the status command returned correct exit codes. At present it always exits with code 0, when it should probably conform to these:

code meaning
0 program is running or service is OK
1 program is dead and /var/run pid file exists
2 program is dead and /var/lock lock file exists
3 program is not running
4 program or service status is unknown
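The sketch below shows how a status subcommand could map a pid file onto the exit codes in this table. It covers codes 0, 1, 3 and 4 and leaves out the lock-file case (code 2). The function names are hypothetical. It assumes a POSIX system, where `os.kill(pid, 0)` checks for a process without signalling it. The script would then end with `sys.exit(lsb_status(pidfile))`.

```python
import errno
import os

# LSB init-script status codes (subset of the table in this issue).
STATUS_RUNNING = 0
STATUS_DEAD_PIDFILE_EXISTS = 1
STATUS_NOT_RUNNING = 3
STATUS_UNKNOWN = 4


def pid_is_running(pid):
    """True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence/permission check only
    except OSError as exc:
        # EPERM: the process exists but is owned by another user.
        return exc.errno == errno.EPERM
    return True


def lsb_status(pidfile):
    """Map the agent's pid file onto an LSB status exit code."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return STATUS_NOT_RUNNING
    except (OSError, ValueError):
        return STATUS_UNKNOWN  # unreadable or garbled pid file
    if pid <= 0:
        return STATUS_UNKNOWN
    return STATUS_RUNNING if pid_is_running(pid) else STATUS_DEAD_PIDFILE_EXISTS
```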

Agent crash on time change on CentOS

A customer reported in https://serverdensity.uservoice.com/admin/tickets/602 that their agent crashed and had to be restarted after a time change on CentOS 5.8, when ntpdate was used to fetch the latest NTP time and set the clock.

The ticket contains a log showing the agent apparently stopping after the time change, with several stop/start cycles. The customer reported that they issued the restart command only once.

The clock went backwards, from 2012-07-17 15:20:49,681 to 2012-07-17 15:17:56,803, so this may have broken sched.
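Suppose the scheduler is driven by wall-clock time, for example `sched.scheduler(time.time, time.sleep)`; Python 2's `sched` requires a time function to be passed explicitly. Then stepping the clock backwards pushes every queued event further into the future, which could make the agent appear stalled. The sketch below drives `sched` from `time.monotonic()` instead, which is also the default `timefunc` in Python 3.3+. `run_periodically` is a hypothetical helper, not the agent's loop.

```python
import sched
import time


def run_periodically(func, interval, repetitions):
    """Run func every `interval` seconds, `repetitions` times, unaffected by clock steps.

    time.monotonic() never goes backwards, so an ntpdate adjustment of the
    wall clock cannot delay (or bunch up) the queued events.
    """
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for i in range(repetitions):
        scheduler.enter(i * interval, 1, func)
    scheduler.run()
```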

Postback response: Invalid hash or HTTP Error 500

On aws-prod-apac-singapore-exm-a1 the agent is logging many of these:
INFO - Postback response: Invalid hash
ERROR - doPostBack: URLError =
so a successful (OK) postback happens only infrequently:
2012-10-17 06:38:21,677 - main - INFO - Postback response: OK
2012-10-17 06:54:27,131 - main - INFO - Postback response: OK
2012-10-17 07:03:29,697 - main - INFO - Postback response: OK
2012-10-17 07:05:59,815 - main - INFO - Postback response: OK
2012-10-17 07:08:31,264 - main - INFO - Postback response: OK
2012-10-17 07:09:46,696 - main - INFO - Postback response: OK
2012-10-17 07:14:52,905 - main - INFO - Postback response: OK
2012-10-17 07:23:52,995 - main - INFO - Postback response: OK
2012-10-17 07:26:22,670 - main - INFO - Postback response: OK
2012-10-17 07:28:55,169 - main - INFO - Postback response: OK
2012-10-17 07:30:02,186 - main - INFO - Postback response: OK
2012-10-17 07:32:40,963 - main - INFO - Postback response: OK
2012-10-17 07:35:38,577 - main - INFO - Postback response: OK
2012-10-17 07:43:05,483 - main - INFO - Postback response: OK

On rax-uk-lon-exm-a1 the problem occurs much less frequently:
2012-10-17 07:27:40,094 - main - ERROR - doPostBack: HTTPError = HTTP Error 500: Internal Server Error

2012-10-17 06:37:37,922 - main - INFO - Postback response: Invalid hash
2012-10-17 06:38:54,186 - main - INFO - Postback response: Invalid hash
2012-10-17 06:53:53,426 - main - INFO - Postback response: Invalid hash
2012-10-17 07:08:32,205 - main - INFO - Postback response: Invalid hash
2012-10-17 07:10:50,174 - main - INFO - Postback response: Invalid hash
2012-10-17 07:14:28,047 - main - INFO - Postback response: Invalid hash
2012-10-17 07:18:00,284 - main - INFO - Postback response: Invalid hash
2012-10-17 07:20:31,738 - main - INFO - Postback response: Invalid hash
2012-10-17 07:24:06,037 - main - INFO - Postback response: Invalid hash
2012-10-17 07:26:25,935 - main - INFO - Postback response: Invalid hash
2012-10-17 07:45:01,988 - main - INFO - Postback response: Invalid hash

This may be related to:
https://serverdensity.uservoice.com/admin/tickets/1595

Could it be some network/connectivity issue?

This is probably not an agent problem, but the agent is currently the only place where it's visible, so let's start here to track it down.

Make agent ignore helper files in plugins dir

Sometimes we need to place helper Python files right next to the plugins, but sd-agent then raises an exception like this:

2014-11-17 07:35:49,229 - main - ERROR - getPlugins (base_plugin): exception = Traceback (most recent call last):
  File "/usr/bin/sd-agent/checks.py", line 2406, in getPlugins
    pluginClass = getattr(importedPlugin, pluginName)
AttributeError: 'module' object has no attribute 'base_plugin'

In our case, base_plugin.py contains the parent class for all the plugins we use.

It would be great if there were some way to handle this case gracefully.
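The traceback shows that the loader assumes every module in the plugins directory defines a class with the same name as the file. Below is a Python 3 sketch of a loader, using `importlib`, that treats modules without such a class as helpers and skips them. `load_plugins` is a hypothetical name. The same-name convention is inferred from `getattr(importedPlugin, pluginName)` in the traceback.

```python
import importlib.util
import inspect
import logging
import os
import sys

log = logging.getLogger(__name__)


def load_plugins(plugin_dir):
    """Return {name: class} for modules that define a same-named class."""
    # Let plugins import shared helpers such as base_plugin.py.
    if plugin_dir not in sys.path:
        sys.path.insert(0, plugin_dir)
    plugins = {}
    for filename in sorted(os.listdir(plugin_dir)):
        name, ext = os.path.splitext(filename)
        if ext != ".py" or name.startswith("_"):
            continue
        path = os.path.join(plugin_dir, filename)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        plugin_class = getattr(module, name, None)
        if inspect.isclass(plugin_class):
            plugins[name] = plugin_class
        else:
            # No same-named class: treat the module as a helper, not an error.
            log.debug("getPlugins: skipping helper module %s", filename)
    return plugins
```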

16% of system memory is taken by the agent

Are there any optimizations I can make to limit the agent's memory usage? I am on a 1 GB VM, and sd-agent consistently uses 16% of the memory.

Broken Updater on OSX

I noticed that 1.11.4 for OS X was released and went to update my 1.11.3 install on my one OS X box I'm monitoring. When I ran /usr/bin/python /usr/local/sd-agent/agent.py update I was given the following output:

Checking if there is a new version
A new version is available.
Downloading agent.py
Downloading checks.py
Downloading daemon.py
Downloading LICENSE
Downloading LICENSE-minjson
Downloading minjson.py
Downloading plugins.py
Downloading sd-agent.init
Updating agent.py
Updating checks.py
Updating daemon.py
Updating LICENSE
Updating LICENSE-minjson
Updating minjson.py
Updating plugins.py
Updating sd-agent.init
Update completed. Please restart the agent (python agent.py restart).

After this I restarted the agent to discover that it had not actually updated. Digging a little further, I found that the files were all placed in my working directory instead of where the tool was. So I cleaned those up, cd'd into the /usr/local/sd-agent/ directory and re-ran the update command. It updated successfully. As I was logged in as root, I don't think permissions could have been the issue.

It would be great if the OS X version of the daemon decided where to put the updated files based on where the tool is installed, rather than where it is called from.
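Below is a minimal sketch of the requested behaviour. It derives the target directory from the location of the running script (`__file__`) rather than from the current working directory. The helper names are hypothetical. `os.path.realpath` is used so that a symlinked install still resolves to the real directory.

```python
import os


def install_dir(module_file):
    """Directory holding the running agent, independent of the caller's cwd."""
    return os.path.dirname(os.path.realpath(module_file))


def update_target(filename, module_file):
    """Path an updated file should be written to, e.g. update_target("checks.py", __file__)."""
    return os.path.join(install_dir(module_file), filename)
```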

Minor bug for sure, but something I'll have to remember when updating until fixed. :)

Support virtualenv

The more plugins a box has, and the more code goes into generating alerts, the harder it becomes to maintain dependencies, especially when they conflict with globally installed libraries (such as lxml or snmp).

Since the agent runs as a Python process, it could run in its own environment, or at least have the option to do so.

Maybe you could consider this as a feature for a future release.

DeprecationWarning when starting sd-agent-2.1.0-1.el6.x86_64

Hi,

We have updated our 2.0.5 agent to 2.1.0, and see the following notice below:

/etc/init.d/sd-agent restart
Stopping Server Density Agent (stopping supervisord) sd-agent
/usr/share/python/sd-agent/lib/python2.6/site-packages/sd_agent-2.1.0-py2.6.egg/utils/dockerutil.py:67: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  log.debug(ex.message)
Starting Server Density Agent (using supervisord) sd-agent

We have seen this on CentOS release 6.8 (Final). Nothing appears to be broken, but it would be nice if this message were gone in the next release.
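The warning comes from reading `ex.message`, which `BaseException` deprecated in Python 2.6 and Python 3 removed. The usual replacement is to format the exception itself, either with `str(ex)` or as a `%s` argument to the logger, as sketched below. `call_quietly` and the logger name are hypothetical, not names from `dockerutil.py`.

```python
import logging

log = logging.getLogger("sd_agent.dockerutil")  # hypothetical logger name


def call_quietly(func):
    """Call func(); on failure, log the error text at DEBUG and return None."""
    try:
        return func()
    except Exception as ex:
        # Previously: log.debug(ex.message). Passing ex as an argument lets
        # logging render str(ex) lazily, with no deprecated attribute access.
        log.debug("docker call failed: %s", ex)
        return None
```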

Mongo 2.2.0 reporting not working

MongoDB reporting is not working for me.

From /var/log/sd-agent/sd-agent.log

main - ERROR - getMongoDBStatus: globalLock KeyError exception - 'ratio'

There's no such ratio key in globalLock.

> db.serverStatus()

       "globalLock" : {
                "totalTime" : NumberLong("448684019000"),
                "lockTime" : NumberLong("4085636965"),
                "currentQueue" : {
                        "total" : 0,
                        "readers" : 0,
                        "writers" : 0
                },
                "activeClients" : {
                        "total" : 0,
                        "readers" : 0,
                        "writers" : 0
                }
        }, 

Here's my config, the connection works OK.

mongodb_server: mongodb://admin:******@localhost
mongodb_dbstats: yes
mongodb_replset: no
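A possible fallback is sketched below. It uses `ratio` when the server reports it, and otherwise derives it from `lockTime / totalTime`, which is how older `serverStatus` documentation described the field. `global_lock_ratio` is a hypothetical helper. It would be called as `global_lock_ratio(status["globalLock"])`.

```python
def global_lock_ratio(global_lock):
    """Return the globalLock ratio, computing it if the 'ratio' key is missing.

    Assumes ratio == lockTime / totalTime, as in older serverStatus output.
    Returns None if neither form is available.
    """
    if "ratio" in global_lock:
        return float(global_lock["ratio"])
    total = global_lock.get("totalTime")
    locked = global_lock.get("lockTime")
    if not total or locked is None:
        return None
    return float(locked) / float(total)
```

With the `totalTime` and `lockTime` values from the output in this issue, the derived ratio comes to roughly 0.0091.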

Agent crash on postback timeout

The agent appears to crash when the postback times out, e.g.:

2012-07-20 07:11:48,614 - main - ERROR - doPostBack: URLError =

It should handle the error instead of crashing.
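Below is a minimal Python 3 sketch of a postback that reports a failure instead of letting it escape. In Python 3, refused connections, DNS failures and timeouts all surface as `OSError` subclasses, including `urllib.error.URLError` and `socket.timeout`, so one handler covers them. The caller can then simply try again on the next cycle. `try_post_back` and its `opener` parameter are hypothetical names used to keep the example testable.

```python
import logging
import urllib.request

log = logging.getLogger(__name__)


def try_post_back(url, payload, opener=urllib.request.urlopen, timeout=30):
    """Return the postback response body, or None if the request failed."""
    try:
        with opener(urllib.request.Request(url, data=payload), timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, socket.timeout and connection resets all derive from OSError.
        log.error("doPostBack: postback failed: %r", exc)
        return None
```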

Fails to authenticate fetching stats for RabbitMQ 3.0.1

2013-01-10 14:01:08,439 - main - DEBUG - getRabbitMQStatus: start
2013-01-10 14:01:08,440 - main - DEBUG - getRabbitMQStatus: config set
2013-01-10 14:01:08,440 - main - DEBUG - getRabbitMQStatus: attempting authentication setup
2013-01-10 14:01:08,440 - main - DEBUG - getRabbitMQStatus: attempting urlopen
2013-01-10 14:01:08,442 - main - ERROR - Unable to get RabbitMQ status - HTTPError = HTTP Error 401: Unauthorized

Isolating the code and running it from the command line:

>>> request = urllib2.urlopen(req)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 438, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized
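The traceback shows the request being redirected (`http_error_302`) before the 401 is raised. One possible explanation, not confirmed here: by default, `HTTPBasicAuthHandler` sends credentials only after a challenge and attaches them with `add_unredirected_header`, so they are not carried across a redirect. A possible workaround is to send the `Authorization` header up front with `add_header`, which urllib's redirect handler does copy to the new request. This is a Python 3 sketch; `build_rabbitmq_request` and the URL are illustrative.

```python
import base64
import urllib.request


def build_rabbitmq_request(url, username, password):
    """Request carrying HTTP Basic credentials on the first attempt."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    # add_header (unlike add_unredirected_header) is copied onto redirected requests.
    request.add_header("Authorization", "Basic " + token)
    return request
```

Because the credentials now follow redirects, this should only be used against a management URL you trust.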

Please add systemd unit files

Hello there!

We use sd-agent for one of our customers and plan to offer it as a BYOL integration for others very soon. We recently switched to Ubuntu 16.04 LTS. Since systemd is included for this version of Ubuntu, we do not use supervisor. The sd-agent package includes a supervisor config, but not a systemd unit file. I wanted to reach out to request that this be added to the official packages for Ubuntu 16.04 LTS. I have the basic systemd unit files already made. They are two separate .service files instead of the combined format you have for supervisor, but it seems to work.

Since I wasn't sure how to organize this into your repo, I will simply provide the code snippets here:

/etc/systemd/system/serverdensity-collector.service

[Unit]
Description=serverdensity-collector.service
After=networking.service

[Service]
Type=simple
ExecStart=/usr/share/python/sd-agent/bin/python /usr/share/python/sd-agent/agent.py foreground --use-local-forwarder
User=sd-agent
ExecStop=/bin/kill -s QUIT $MAINPID
PIDFile=/run/sd-agent/sd-agent.pid
Restart=always
RestartSec=5
Environment="PYTHONPATH=/usr/share/python/sd-agent" "LANG=POSIX"


[Install]
WantedBy=multi-user.target

/etc/systemd/system/serverdensity-forwarder.service

[Unit]
Description=serverdensity-forwarder.service
After=networking.service

[Service]
Type=simple
ExecStart=/usr/share/python/sd-agent/bin/python /usr/share/python/sd-agent/sdagent.py
User=sd-agent
ExecStop=/bin/kill -s QUIT $MAINPID
PIDFile=/run/sd-agent/sd-agent-forwarder.pid
Restart=always
RestartSec=5
Environment="PYTHONPATH=/usr/share/python/sd-agent"


[Install]
WantedBy=multi-user.target

After these units are installed, sudo systemctl daemon-reload must be executed.

I will leave the final implementation and integration decisions to you. Thank you for having this project on GitHub; it made it very easy for us to send you this info!

Thanks,

Arman.

APT package

The sd-agent package does not appear to respect /usr/sbin/policy-rc.d, possibly as a result of not using invoke-rc.d.

As this is a packaging issue, I appreciate this may not be the best place for it, but I wasn't sure where else to report it.

IndexError when running agent on Open Elec on a Pi

2015-09-24 22:33:56,164 - main - ERROR - getIOStats: exception = Traceback (most recent call last):
  File "/storage/sd-agent/checks.py", line 576, in getIOStats
    recentStats = stats.split('Device:')[2].split('\n')
IndexError: list index out of range
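`stats.split('Device:')[2]` assumes the `iostat` output contains two reports, each with a `Device:` header. The first is the since-boot report, and the second covers the 1-second interval. Output with only one report, or with a differently worded header, leaves too few pieces. The sketch below returns an empty list in that case instead of raising. `interval_report_lines` is a hypothetical name.

```python
def interval_report_lines(output, header="Device:"):
    """Lines of the second (interval) iostat report, or [] if it is missing."""
    sections = output.split(header)
    # sections[0] is the preamble, [1] the since-boot report,
    # [2] the 1-second interval report that the original code indexes.
    if len(sections) < 3:
        return []
    return sections[2].split("\n")
```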

OSX Yosemite Agent Issue

The agent is reporting properly and appears to be functioning, but I'm getting this error about once every 60 seconds:

sar: drivepath sync code error -4

uname -a

Darwin tunnels-Mac-mini.local 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64

All links in the README file are dead (404)!

It seems that all the links contained in the README file are dead (404)!

[Screenshot from 2018-08-27 14-45-33]

For instance:

This is the source code for the Server Density agent (v2). If you're looking to install the agent, we also provide pre-packaged binaries for most operating systems.
https://support.serverdensity.com/hc/en-us/articles/213625957-Officially-supported-Linux-distros

See agent release notes.
https://support.serverdensity.com/hc/en-us/articles/213513688-Agent-release-notes

C.

Source does not conform to PEP-8

Hi!

Your code is hard for customers to read because it does not adhere to current Python standards. See below for the output of a PEP 8 run.

Would you be interested in a pull request that fixes all this? It is only useful if you also start using PEP 8 as a check in your own builds 😄

Why fix this?

  1. Customer can review your code more easily.
  2. Bugs are easier to spot if you adhere to standards.
  3. You'll get more pull requests if you make it easier to contribute.

For example: If I try to contribute now using my standard toolchain (PyCharm), I have to be careful not to use spaces (as recommended in PEP8) instead of tabs, because otherwise Python fails.

Allard

4 E101 indentation contains mixed spaces and tabs
3 E111 indentation is not a multiple of four
5 E122 continuation line missing indentation or outdented
1 E125 continuation line with same indent as next logical line
2 E126 continuation line over-indented for hanging indent
1 E128 continuation line under-indented for visual indent
1 E201 whitespace after '{'
2 E202 whitespace before '}'
40 E203 whitespace before ':'
49 E231 missing whitespace after ','
24 E251 unexpected spaces around keyword / parameter equals
48 E261 at least two spaces before inline comment
29 E265 block comment should start with '# '
23 E302 expected 2 blank lines, found 1
3 E303 too many blank lines (2)
1 E401 multiple imports on one line
362 E501 line too long (87 > 79 characters)
1 E701 multiple statements on one line (colon)
5 E703 statement ends with a semicolon
25 E711 comparison to None should be 'if cond is None:'
25 E712 comparison to False should be 'if cond is False:' or 'if not cond:'
1 E713 test for membership should be 'not in'
2319 W191 indentation contains tabs
25 W291 trailing whitespace
1 W292 no newline at end of file
81 W293 blank line contains whitespace
1 W391 blank line at end of file
8 W602 deprecated form of raising exception

Incompatibility with Ubuntu 20.04

Hello,

The current package for Ubuntu 20.04 still relies on Python 2, which is officially deprecated, and I'm getting strange errors when trying to update my servers:

dpkg: error processing package python2-minimal (--configure):
 installed python2-minimal package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 python2-minimal
E: Sub-process /usr/bin/dpkg returned an error code (1)

Are you going to provide your customers with a Python 3 compliant package?

Thanks

Uncaught error if network goes away

2012-10-20 06:57:28,609 - main - ERROR - doPostBack: Exception = Traceback (most recent call last):
  File "/path/to/src/checks.py", line 2291, in doPostBack
    response.close()
UnboundLocalError: local variable 'response' referenced before assignment

GPG key has expired

I'm seeing this error when I try to update my apt repositories:

# apt-get update | grep -i serverdensity
Get:18 https://archive.serverdensity.com/ubuntu all InRelease [3.997 B]
Err:18 https://archive.serverdensity.com/ubuntu all InRelease
  The following signatures were invalid: EXPKEYSIG 3B2F6FF074371316 Server Density <[email protected]>
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://archive.serverdensity.com/ubuntu all InRelease: The following signatures were invalid: EXPKEYSIG 3B2F6FF074371316 Server Density <[email protected]>
W: Failed to fetch https://archive.serverdensity.com/ubuntu/dists/all/InRelease  The following signatures were invalid: EXPKEYSIG 3B2F6FF074371316 Server Density <[email protected]>
W: Some index files failed to download. They have been ignored, or old ones used instead.

I also tried to reinstall the keys with curl -Ls https://archive.serverdensity.com/sd-packaging-public.key | sudo apt-key add - but nothing changed.

Can you please fix it?

Server DNS failure, agent didn't fall back to secondary DNS

The primary DNS server that all of our servers use failed completely (i.e. ;; Got SERVFAIL reply from 216.187.XX.XX, trying next server).

All other software and services switched to the secondary server (as defined in /etc/resolv.conf), but sd-agent kept querying only the failed primary DNS server, and even restarting sd-agent didn't fix it. The only way to make sd-agent report back again was to swap the primary and secondary lines in /etc/resolv.conf on all of my servers.

Reported on RHEL6, python 2.6.6

Error when logging while running in foreground mode

python sd-agent/agent.py foreground

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
    if self.shouldRollover(record):
  File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
    msg = "%s\n" % self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 723, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
    msg = msg % self.args

sd-agent crashes when memory is full

...which is somewhat understandable; however, it would be nice if the init script restarted it after some time.

Better yet, sd-agent should keep on reporting what it can, and not crash (because of fork() errors).

init.d script missing some LSB options

The init script is missing the options needed to make it LSB-complete.

Running

/usr/sbin/update-rc.d sd-agent defaults

on the script from the github checkout gives the following output:

update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
insserv: Script sd-agent is broken: incomplete LSB comment.
insserv: missing `Required-Start:' entry: please add even if empty.
insserv: missing `Required-Stop:'  entry: please add even if empty.
insserv: missing `Default-Start:'  entry: please add even if empty.
insserv: missing `Default-Stop:'   entry: please add even if empty.
insserv: Default-Start undefined, assuming empty start runlevel(s) for script `sd-agent'
insserv: Default-Stop  undefined, assuming empty stop  runlevel(s) for script `sd-agent'

Add missing options to stop this message appearing.

RabbitMQ Credentials

The agent config has example RabbitMQ credentials by default. If these are removed by a user who does not wish to monitor RabbitMQ the agent throws an error. The agent needs to be capable of handling default and blank credentials.

sd-agent-elastic timeout should be configurable

Hi,

We are testing the sd-agent-elastic plugin, and most things work as expected. However, on a big cluster with some latency, the default timeout of 5 seconds can be a problem. The timeout is set here:

https://github.com/serverdensity/sd-agent/blob/master/checks.d/elastic.py#L44

Is it possible to make this configurable in elastic.yaml, i.e. a timeout option that overrides the default of 5 seconds?
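Below is a minimal sketch of what the check could do. It assumes a new optional `timeout` key in each `elastic.yaml` instance; the key name is hypothetical. The current default of 5 seconds is kept when the key is unset or invalid.

```python
DEFAULT_TIMEOUT = 5.0  # seconds; the current default mentioned in this issue

# Hypothetical elastic.yaml instance using the new key:
#   instances:
#     - url: http://localhost:9200
#       timeout: 15


def request_timeout(instance, default=DEFAULT_TIMEOUT):
    """Timeout for an instance, falling back to the default if unset or invalid."""
    value = instance.get("timeout", default)
    try:
        timeout = float(value)
    except (TypeError, ValueError):
        return float(default)
    return timeout if timeout > 0 else float(default)
```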

Better default install location

By default, the agent and its related files appear to be installed into a directory at /usr/bin/sd-agent. *nix platforms typically do not have subdirectories under /usr/bin, because bin directories usually contain only executables; the Filesystem Hierarchy Standard actually requires this.

A better location might be /opt/sd-agent.
