
pystatsd's Introduction

Introduction

pystatsd is a client and server implementation of Etsy's brilliant statsd server, a front end/proxy for the Graphite stats collection and graphing server.

pystatsd is tested on Python 2.7 and 3.8.

Status

Reviewing and merging pull requests, bringing stuff up to date, with tests!


Usage

See statsd_test for sample usage:

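from pystatsd import Client, Server

# Server.serve() blocks in its receive loop, so run the server and the
# client in separate processes (or threads).
srvr = Server(debug=True)
srvr.serve()

# In another process: point the client at the statsd host and port.
sc = Client('example.org', 8125)

sc.timing('python_test.time', 500)      # duration in milliseconds
sc.increment('python_test.inc_int')     # or sc.incr()
sc.decrement('python_test.decr_int')    # or sc.decr()
sc.gauge('python_test.gauge', 42)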
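Each client call above ends up as one small UDP datagram in the statsd line format name:value|type (ms for timers, c for counters, g for gauges). The sketch below sends such a line by hand with the standard socket module, which can help check that a server is reachable without pystatsd installed. send_metric is a hypothetical helper, and pystatsd's Client may format values differently (its timing() appears to use %f, for example).

```python
import socket


def send_metric(host, port, name, value, metric_type):
    """Send one statsd line ("name:value|type") as a single UDP datagram.

    Hypothetical helper for illustration, not part of pystatsd.
    Returns the line that was sent.
    """
    line = "%s:%s|%s" % (name, value, metric_type)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is fire-and-forget: no error is reported if nobody listens.
        sock.sendto(line.encode("utf-8"), (host, port))
    finally:
        sock.close()
    return line
```

For example, send_metric('127.0.0.1', 8125, 'python_test.inc_int', 1, 'c') should show up in the ngrep output described under Troubleshooting.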

Building a Debian Package

To build a Debian package, run: dpkg-buildpackage -rfakeroot

Upstart init Script

Upstart was the daemon management system for Ubuntu until Ubuntu 15.04, which switched to systemd.

A basic upstart script has been included for the pystatsd server. It's located under init/, and will be installed to /usr/share/doc if you build/install a .deb file. The upstart script should be copied to /etc/init/pystatsd.conf and will read configuration variables from /etc/default/pystatsd. By default the pystatsd daemon runs as user 'nobody' which is a good thing from a security perspective.

Troubleshooting

You can see the raw values received by pystatsd by packet sniffing:

$ sudo ngrep -qd any . udp dst port 8125

You can see the raw values dispatched to carbon by packet sniffing:

$ sudo ngrep -qd any stats tcp dst port 2003
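If ngrep is not available, one alternative for the second case is a throwaway TCP listener that stands in for Carbon on a spare port, with the server's Graphite port (options.graphite_port in server.py) pointed at it. This is a minimal sketch: open_fake_carbon and read_one_connection are hypothetical helpers, and it assumes the server opens one connection per flush and sends Carbon's plaintext lines ("metric.path value timestamp").

```python
import socket


def open_fake_carbon(host="127.0.0.1", port=2003):
    """Listen on TCP the way a Carbon plaintext receiver would (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv


def read_one_connection(srv):
    """Accept one connection and return the plaintext lines it sent."""
    conn, _ = srv.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()
```

Calling print(read_one_connection(srv)) in a loop then prints each flush as it arrives.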

pystatsd's People

Contributors

02strich, 0xdec0de, cclauss, directionless, fennb, heckj, jbuchbinder, jburnham, jfred, joeshaw, jtsoi, kastner, maralla, matterkkila, mlongtin0, mrwacky42, northisup, pneff, rjbs, robbyt, samuel, sivy, themartorana, vhermecz, vuksanv, vvuksan


pystatsd's Issues

sending "self" as an argument is causing some errors

Everywhere you call self.update_stats(...), you seem to pass self explicitly, which causes an argument-count error:
self.update_stats(self, stats, time, sample_rate)

should be:
self.update_stats(stats, time, sample_rate)
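The reason is that a bound method call already supplies self, so passing it again shifts every argument by one and adds an extra one. The minimal sketch below reproduces this with a hypothetical ExampleClient class (not pystatsd's own).

```python
class ExampleClient(object):
    """Hypothetical stand-in showing the bug pattern, not pystatsd's class."""

    def __init__(self):
        self.calls = []

    def update_stats(self, stats, time, sample_rate):
        self.calls.append((stats, time, sample_rate))

    def buggy(self):
        # self is passed twice: 5 positional arguments for 4 parameters,
        # so Python raises TypeError before update_stats runs.
        self.update_stats(self, "stat", 10, 1)

    def fixed(self):
        # The bound method supplies self automatically.
        self.update_stats("stat", 10, 1)
```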

Add timeout to the graphite socket

Hi,

As far as I can tell, no timeout is currently set on the socket, which always comes back to bite you. I had issues (with the previous version) with the daemon process receiving metrics but not sending anything to Graphite. My fix (which I have not extensively tested) was to set a timeout of half the flush_interval on the Graphite socket.

Cheers
Dimo
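A minimal sketch of that suggestion, using socket.create_connection, which applies its timeout to connect() and to later blocking calls such as sendall(). The helper name is hypothetical, and flush_interval is assumed to be in seconds here; callers would catch OSError (socket.timeout is a subclass on Python 3) and drop or retry the flush.

```python
import socket


def send_to_graphite(host, port, payload, flush_interval):
    """Send one flush worth of plaintext lines to Graphite with a timeout.

    Hypothetical helper. The timeout is half the flush interval (assumed to be
    in seconds), so a stalled Graphite cannot block the flusher past the next
    flush. Returns the timeout that was used.
    """
    timeout = flush_interval / 2.0
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(payload.encode("utf-8"))
    finally:
        sock.close()
    return timeout
```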

occasional receive socket failures

We have had to continually monitor and restart pystatsd. After an apparently random interval, pystatsd starts sending zero for all values.

It appears that the server portion does not maintain a robust network connection. An strace showed that the failed pystatsd was not making any recvfrom calls. I am guessing that a socket-handling exception was raised and not handled.
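If that guess is right, one mitigation is a receive loop that catches errors per packet, so one bad datagram cannot end the loop while the flush side keeps reporting zeros. This is a sketch under that assumption: receive_loop and its handle callback are hypothetical names, and other socket errors are deliberately left to propagate so a real failure kills the process visibly (and can be restarted) instead of leaving it silently idle.

```python
import logging
import socket

log = logging.getLogger("statsd-sketch")


def receive_loop(sock, handle, max_packets=None, bufsize=8192):
    """Read datagrams and pass each to handle(), surviving bad packets.

    handle(data) is a hypothetical stand-in for the server's packet
    processing. max_packets only exists so the loop can be stopped in tests.
    Returns the number of packets received.
    """
    seen = 0
    while max_packets is None or seen < max_packets:
        try:
            data, _addr = sock.recvfrom(bufsize)
        except socket.timeout:
            continue  # nothing arrived within the socket timeout; keep listening
        seen += 1
        try:
            handle(data)
        except Exception:
            # Log and drop this packet instead of letting the loop die.
            log.exception("failed to process %r", data)
    return seen
```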

How to combine pystatsd and inotify ?

Hi, I use pystatsd to read an Nginx log file and send metrics (HTTP responses) to StatsD, and it works well. Now the log is split into many files that are dropped into a directory every minute, named log1, log2, and so on. I want to read the logs from these files and remove each file once it has been read. Can we do this with StatsD?
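StatsD itself only receives metrics, so the file handling would live in the reading script. A minimal polling sketch using only the standard library is below (inotify would need a third-party binding, not shown). drain_directory and handle_line are hypothetical names; handle_line would turn a log line into client calls such as sc.increment(...). It assumes files are complete when they appear, for example written elsewhere and then moved into the directory.

```python
import os


def drain_directory(path, handle_line):
    """Process every regular file in path once, then delete it.

    Files are handled in sorted name order (note "log10" sorts before "log2").
    Returns the names of the files processed.
    """
    done = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        with open(full) as f:
            for line in f:
                handle_line(line.rstrip("\n"))
        os.remove(full)  # only after every line has been handled
        done.append(name)
    return done
```

Calling drain_directory in a loop with time.sleep(60) between passes matches the once-a-minute drop described above.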

README.md is missing on pip package

Hi,
It seems that README.md is missing from the latest version of the pip package.

user1@vm-538:~$ pip install pystatsd
Downloading/unpacking pystatsd
  Downloading pystatsd-0.1.8.tar.gz
  Running setup.py egg_info for package pystatsd
    Traceback (most recent call last):
      File "", line 14, in 
      File "/home/user1/build/pystatsd/setup.py", line 18, in 
        long_description=read('README.md'),
      File "/home/user1/build/pystatsd/setup.py", line 6, in read
        return open(os.path.join(os.path.dirname(__file__), fname)).read()
    IOError: [Errno 2] No such file or directory: '/home/user1/build/pystatsd/README.md'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
  File "", line 14, in 
  File "/home/user1/build/pystatsd/setup.py", line 18, in 
    long_description=read('README.md'),
  File "/home/user1/build/pystatsd/setup.py", line 6, in read
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
IOError: [Errno 2] No such file or directory: '/home/user1/build/pystatsd/README.md'

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /home/user1/build/pystatsd
Storing complete log in /home/user1/.pip/pip.log
user1@vm-538:~$ ls -l /home/user1/build/pystatsd
total 28
drwxrwxr-x 2 user1 user1 4096 Jun 26 08:58 bin
drwxrwxr-x 2 user1 user1 4096 Jun 26 08:58 pip-egg-info
-rw-rw-r-- 1 user1 user1 2820 Jun 26 08:58 PKG-INFO
drwxrwxr-x 2 user1 user1 4096 Jun 26 08:58 pystatsd
drwxrwxr-x 2 user1 user1 4096 Jun 26 08:58 pystatsd.egg-info
-rw-rw-r-- 1 user1 user1   59 Jun 26 08:58 setup.cfg
-rw-rw-r-- 1 user1 user1  754 Jun 26 08:58 setup.py

Regards,
J
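The root cause is that README.md is not shipped in the sdist. Listing it in MANIFEST.in (an "include README.md" line) fixes that. A defensive read() helper like the hypothetical sketch below would also let installation continue without a long description if the file is ever missing again. The base_dir parameter is an addition for testability, not part of the setup.py shown in the traceback.

```python
import os


def read(fname, base_dir=None):
    """Return the contents of fname next to setup.py, or "" if it is absent."""
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dir, fname)
    try:
        with open(path) as f:
            return f.read()
    except IOError:  # on Python 3, IOError is OSError and covers FileNotFoundError
        return ""
```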

pystatsd server doesn't process floating point values

Using Client.update_stats() to send in a floating point value over UDP -

i.e. client.update_stats("example.value", 0.04)

When the pystatsd server receives this, it dies with an exception because it expects only integer values.

No license for code

The COPYING file is empty. The only license I could find was in the debian folder, but that only states the license of the files in the debian folder. So what license is this code under?

using counters causes py-statsd to crash

pystatsd had been working fine until today, when I started to send increments. If I sent one, everything was fine. If I sent a second, I'd get a type error. There was code trying to use an int as an iterator.

Only being able to count one item on the counter... not so useful! :)

Race condition between __record_timer() and flush()

Occasionally our pystatsd instance freezes, maybe in a similar way as in #40. Strace showed a KeyError in the following line:

See server.py:118

if key not in self.timers:
    self.timers[key] = [[], ts]
self.timers[key][0].append(float(value or 0))
self.timers[key][1] = ts  # <--- KeyError

It seems that the check ensures that the key is always present in self.timers. However, there's a second thread running, flushing the contents of self.timers periodically, and also deleting the contents of self.timers.

See server.py:228

del(self.timers[k])

When the flush happens right between the check in line 115 and the access in lines 117 and 118 then a KeyError will be raised. We think this is exactly what happens.

We haven't seen it happen in self.__record_counter, but it looks vulnerable in the same way.

We haven't had time yet to come up with a patch, but maybe something like this would work:

Use a threading.Lock() to synchronize access to self.timers and self.counters. The self.flush() involves network communication and can therefore take a long time. We don't want to block the receiving code during that time. Therefore maybe just use the lock to (deep-)copy self.timers and reset it. Then release the lock and flush from the copy. (Or not copy at all, but just switch references, i.e. assign self.timers to a second variable pointing to the same dictionary and then assign self.timers an empty dict and continue flushing from the new variable.)
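A minimal sketch of the lock-and-swap variant proposed above, assuming timers are stored as [values, timestamp] pairs as in the quoted server.py snippet. TimerStore, record_timer and swap are hypothetical names: the receive path takes the lock for each update, and the flusher holds it only long enough to swap in a fresh dict, then formats and sends the detached one without the lock.

```python
import threading


class TimerStore(object):
    """Hypothetical sketch of the proposed locking scheme, not pystatsd's code."""

    def __init__(self):
        self._lock = threading.Lock()
        self.timers = {}

    def record_timer(self, key, value, ts):
        # Check-and-update happens under the lock, so a concurrent flush
        # can no longer delete the key between the check and the write.
        with self._lock:
            entry = self.timers.setdefault(key, [[], ts])
            entry[0].append(float(value or 0))
            entry[1] = ts

    def swap(self):
        """Detach the current timers for flushing and start a new dict."""
        with self._lock:
            pending, self.timers = self.timers, {}
        return pending
```

The same pattern would apply to self.counters.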

Importing in sitecustomize.py breaks python command line

This is related to a setproctitle issue (which unfortunately is marked closed): dvarrazzo/py-setproctitle#49

Importing pystatsd imports the pystatsd.server module, which attempts to import setproctitle, if it is available.

As a consequence of the linked issue, importing pystatsd within a sitecustomize.py will cause python to fail to execute scripts with command line arguments (if setproctitle is available):

$ touch test.py
$ echo "import pystatsd" > sitecustomize.py
$ export PYTHONPATH=.:$PYTHONPATH
$ python test.py --some-arg
python test.py --some-arg: can't open file 'test.py --some-arg': [Errno 2] No such file or directory

Absent an upstream fix, it may be worth somehow adding an option to disable use of setproctitle, and/or lazily importing it to work around the issue.
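Both workarounds can be combined, as in the sketch below. The import is deferred to the moment the server actually wants to set its title, so "import pystatsd" alone never loads setproctitle, and an environment variable can switch it off. maybe_set_proctitle and PYSTATSD_NO_SETPROCTITLE are hypothetical names, not existing pystatsd options.

```python
import os


def maybe_set_proctitle(title):
    """Set the process title lazily, unless explicitly disabled.

    Returns True if the title was set, False if disabled or unavailable.
    """
    if os.environ.get("PYSTATSD_NO_SETPROCTITLE"):  # hypothetical opt-out
        return False
    try:
        # Imported here, at server start-up, not at package import time.
        from setproctitle import setproctitle
    except ImportError:
        return False
    setproctitle(title)
    return True
```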

Thanks!

Error while easy-installing pystatsd

There seems to be an issue with installing pystatsd using easy_install (distribute 0.6.45). Error below:

Writing /var/folders/jd/g7fjfbn57695542zy0j_84b40000gn/T/easy_install-osZU9C/pystatsd-0.1.7/setup.cfg
Running pystatsd-0.1.7/setup.py -q bdist_egg --dist-dir /var/folders/jd/g7fjfbn57695542zy0j_84b40000gn/T/easy_install-osZU9C/pystatsd-0.1.7/egg-dist-tmp-Ri9Icm
error: /var/folders/jd/g7fjfbn57695542zy0j_84b40000gn/T/easy_install-osZU9C/pystatsd-0.1.7/README.md: No such file or directory

Client.update_stats can send floats, but server cannot handle

When I send a float value to a server using a call like:
sc.update_stats('key', 300.0)

The client sends the message, but on the server side I see:

Traceback (most recent call last):
  File "/usr/bin/pystatd-server", line 5, in
    run_server()
  File "/usr/lib/pymodules/python2.6/pystatsd/server.py", line 179, in run_server
    daemon.run(options)
  File "/usr/lib/pymodules/python2.6/pystatsd/server.py", line 153, in run
    options.graphite_port)
  File "/usr/lib/pymodules/python2.6/pystatsd/server.py", line 140, in serve
    self.process(data)
  File "/usr/lib/pymodules/python2.6/pystatsd/server.py", line 63, in process
    self.counters[key] += int(fields[0] or 1) * (1 / sample_rate)
ValueError: invalid literal for int() with base 10: '300.0'
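The failing line parses the value with int(), which rejects "300.0". A sketch of the same computation using float(), which accepts both "300" and "300.0", is below. counter_increment is a hypothetical name; the empty-value default of 1 and the 1 / sample_rate scaling mirror the quoted server line.

```python
def counter_increment(raw_value, sample_rate=1.0):
    """Return the amount to add to a counter for one received value.

    An empty value counts as 1, as in "int(fields[0] or 1)" above; float()
    replaces int() so fractional values no longer raise ValueError.
    """
    value = float(raw_value or 1)
    return value * (1.0 / float(sample_rate))
```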

New Release

Hello,

I was wondering if we can get a new release of pystatsd that does not require argparse (as it is currently in master)

Perhaps include argparse conditionally for Python <= 2.6, e.g.:

extras_require={
    ':python_version == "2.6"': [
        'argparse',
    ],
},

Thanks a lot.

Carlos

Failed to install from sdist

When distributing version 0.1.7 using sdist, it fails with a message like 'File README.md not found'.

Cheers,
Ralph

Can the sample rate be > 1?

Hi all,
I have a variable, count, that holds the number of unique IPs in a log file, and I want to send it to the py-statsd server using increment. Can I use increment("ipunique", count) or not? I don't understand the sample rate well, so I'd appreciate your advice. Thanks!
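In the statsd protocol the sample rate is the fraction of events actually sent, so it lies in (0, 1]. The server multiplies each received value by 1 / sample_rate (as in the server line int(fields[0] or 1) * (1 / sample_rate) quoted in another issue) to estimate the true total. The amount to add is a separate delta. If Client.increment takes sample_rate as its second argument, as the question suggests, then increment("ipunique", count) would pass count as a rate rather than an amount. The hypothetical sketch below shows how a counter line carries both values; sampled_counter_line is not pystatsd's API.

```python
import random


def sampled_counter_line(name, delta=1, sample_rate=1.0, rand=random.random):
    """Return the statsd counter line to send, or None if sampled out.

    delta is how much to add; sample_rate is the probability of sending.
    Line format: "name:delta|c", plus "|@rate" when sampling.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1], got %r" % (sample_rate,))
    if sample_rate < 1:
        if rand() >= sample_rate:
            return None  # dropped; the server's 1 / rate scaling compensates on average
        return "%s:%s|c|@%s" % (name, delta, sample_rate)
    return "%s:%s|c" % (name, delta)
```

For an already computed total such as a per-file count of unique IPs, a gauge may also be a better fit than a counter.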

Set type is implemented incorrectly (it actually behaves like a counter)

I've been working at an organisation running pystatsd for a while. No one could explain to me why set() was the right way to implement absolute-value counters (as opposed to rate counters), and every time I've read the original statsd docs I've been baffled by why our implementation is so contradictory to them.

I even read the nodejs source and couldn't understand the behaviour I was actually seeing. I've just realised we //actually// use pystatsd even though it's always just referred to as 'statsd' and this implementation is totally different to the original on a couple of important counts (pun intended).

  1. The original definition of a set is that it counts 'unique' things. That is, if you do:
statsd.set('uniques', 1234);
statsd.set('uniques', 1234);
statsd.set('uniques', 5678);

and nothing else in one time window, you should see 2 as the value recorded to graphite/backend. See https://github.com/etsy/statsd/blob/master/backends/graphite.js#L147 in original implementation - it takes set.values().length as the count which is https://github.com/etsy/statsd/blob/master/lib/set.js#L23-L29.

values: function() {
    var values = [];
    for (var value in this.store) {
      values.push(value);
    }
    return values;
  }

In other words for each //key// in the object, push it to values array, then the total number of these (distinct) keys is counted.

  2. Often you actually want an absolute count of something, not just the averaged rate that statsd counters give. Your set implementation actually does that (incorrectly). The original statsd provides this ability through the flush_counts option for counters: https://github.com/etsy/statsd/blob/master/backends/graphite.js#L111-L114
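The original semantics described in point 1 can be sketched as a per-key collection of distinct values that is counted and reset at each flush. SetStore is a hypothetical name; values are stored as strings, mirroring the object keys used in the quoted set.js.

```python
class SetStore(object):
    """Original-statsd-style sets: count distinct values per flush window."""

    def __init__(self):
        self.sets = {}

    def add(self, key, value):
        # Repeated values collapse into one entry, unlike a counter.
        self.sets.setdefault(key, set()).add(str(value))

    def flush(self):
        """Return {key: number of distinct values} and start a new window."""
        counts = dict((k, len(v)) for k, v in self.sets.items())
        self.sets = {}
        return counts
```

With the three set() calls from point 1 in one window, flush() reports 2 for 'uniques'.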

Overall, we end up using set('foo', 1) for all our "absolute" counters, but today I finally worked out why that works for us despite every bit of statsd-related documentation describing sets totally differently.

Honestly I'm not even sure what to request at this point - we have tons of code now relying on this broken/inconsistent behaviour, and have never had any real need for the original version of set so changing it would actually be pretty bad for us.

But perhaps you could put something prominent in the README that explains this to the next poor soul who tries to make sense of it all...

Thanks for a great project nonetheless!

Cannot recover from data parse failure

If a metric name contains invalid characters such as spaces or possibly certain punctuation characters, the py-statsd daemon reports an error [1] and thereafter continually sends a value of 0.0 for all known stats, indicating that it is no longer listening or accepting new metric data. Given the same input, the Node.js implementation writes a "Bad line" error [2] but continues running and processing.

[1]
Traceback (most recent call last):
  File "/usr/bin/pystatd-server", line 5, in
    pkg_resources.run_script('pystatsd==0.1.4', 'pystatd-server')
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 489, in run_script
    if dist.key not in keys2:
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 1214, in run_script
    # of multiple eggs; that's why we use module_path instead of .archive
  File "/usr/lib/python2.6/site-packages/pystatsd-0.1.4-py2.6.egg/EGG-INFO/scripts/pystatd-server", line 5, in
    pkg_resources.run_script('pystatsd==0.1.4', 'pystatd-server')
  File "build/bdist.linux-x86_64/egg/pystatsd/server.py", line 179, in run_server
  File "build/bdist.linux-x86_64/egg/pystatsd/server.py", line 153, in run
  File "build/bdist.linux-x86_64/egg/pystatsd/server.py", line 140, in serve
  File "build/bdist.linux-x86_64/egg/pystatsd/server.py", line 48, in process
ValueError: too many values to unpack

[2] https://github.com/etsy/statsd/blob/master/stats.js#L40

2 Aug 18:42:04 - Bad line: Coding.Code
2 Aug 18:42:04 - Bad line: cacheHits
2 Aug 18:42:04 - Bad line: Coding.Code
2 Aug 18:42:04 - Bad line: fetchOperations
2 Aug 18:42:09 - Bad line: system.code.gladiator.GladiatorCodeStore$CodeCacheLoader#load(Object)

ipv6 is not supported

In most cases bind does the right thing if you set AF_INET6 as the address family; maybe try that as a fallback?

diff --git a/pystatsd/server.py b/pystatsd/server.py
index 40118c8..0bb9f9a 100644
--- a/pystatsd/server.py
+++ b/pystatsd/server.py
@@ -300,8 +300,23 @@ class Server(object):
     def serve(self, hostname='', port=8125):
         assert type(port) is int, 'port is not an integer: %s' % (port)
         addr = (hostname, port)
-        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
-        self._sock.bind(addr)
+        try:
+            self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+            self._sock.bind(addr)
+        except socket.gaierror as e:
+            if e.errno not in (
+                    # Address family for hostname not supported
+                    socket.EAI_ADDRFAMILY,
+                    # Name or service not known
+                    socket.EAI_NONAME):
+                raise
+            # IPv6 address calls for (host, port, flowinfo, scopeid)
+            # although, flowinfo and scopeid would default to 0 in
+        # socketmodule.c we're explicit here
+            flowinfo = scopeid = 0
+            addr = (hostname, port, flowinfo, scopeid)
+            self._sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
+            self._sock.bind(addr)

         import signal
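An alternative to trying AF_INET and then falling back to AF_INET6 is to let socket.getaddrinfo choose: it returns candidate addresses with the matching family and the right sockaddr shape (a 2-tuple for IPv4, a 4-tuple with flowinfo and scope id for IPv6). This is a Python 3 sketch with a hypothetical bind_udp helper, not the patch above; an empty hostname is mapped to None with AI_PASSIVE, which means the wildcard address.

```python
import socket


def bind_udp(hostname, port):
    """Bind a UDP socket on the first address getaddrinfo offers that works."""
    host = hostname or None  # "" means "all interfaces"
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM, 0, socket.AI_PASSIVE):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:  # e.g. IPv6 disabled on this host
            last_error = exc
            continue
        try:
            sock.bind(sockaddr)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    if last_error is None:
        raise OSError("no usable address for %r" % (hostname,))
    raise last_error
```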

Remove argparse in install_requires

https://github.com/sivy/py-statsd/blob/master/setup.py#L17

This seems unnecessary, since argparse has been part of the standard library since Python 2.7.

If you need to maintain backwards compatibility, can there be a check for the Python version?

It causes some annoying issues on Ubuntu:

/usr/local/lib/python2.7/dist-packages/pytz/__init__.py:35: UserWarning: Module argparse was already imported from /usr/lib/python2.7/argparse.pyc, but /usr/local/lib/python2.7/dist-packages is being added to sys.path
from pkg_resources import resource_stream

This happens when using other packages that import argparse.
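A version check along those lines could look like the sketch below, which builds the list passed to setup(install_requires=...). install_requires is a hypothetical helper name. setuptools also accepts the environment-marker form 'argparse; python_version < "2.7"' directly in install_requires, which avoids the runtime check.

```python
import sys


def install_requires(version_info=sys.version_info):
    """Return install_requires, adding argparse only before Python 2.7."""
    requires = []
    if tuple(version_info[:2]) < (2, 7):
        requires.append("argparse")
    return requires
```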

Timing in timing_since is being converted to microseconds then passed to timing as milliseconds

I suspect the statsd.py code:
def timing_since(self, stat, start, sample_rate=1):
    """
    Log timing information as the number of microseconds since the provided time float
    >>> start = time.time()
    >>> # do stuff
    >>> statsd_client.timing_since('some.time', start)
    """
    self.timing(stat, int((time.time() - start) * 1000000), sample_rate)

should be converting to ms rather than usec (ie, multiply by 1000 instead of 1000000). Especially since timing pretty explicitly marks the input (ie, the value which was converted to usec) as ms:
def timing(self, stat, time, sample_rate=1):
    """
    Log timing information for a single stat
    >>> statsd_client.timing('some.time',500)
    """
    stats = {stat: "%f|ms" % time}
    self.send(stats, sample_rate)
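A sketch of the suggested fix on a hypothetical stand-in class: send() only records what would go on the wire, and a now parameter (also an addition) makes the clock injectable for testing. Elapsed seconds are multiplied by 1000, so the value matches the "|ms" suffix that timing() writes. The duration parameter is renamed time_ms so it no longer shadows the time module.

```python
import time


class TimingClient(object):
    """Hypothetical stand-in for the client methods quoted above."""

    def __init__(self):
        self.sent = []

    def send(self, stats, sample_rate=1):
        self.sent.append((stats, sample_rate))  # the real client sends UDP

    def timing(self, stat, time_ms, sample_rate=1):
        """Log a duration given in milliseconds."""
        self.send({stat: "%f|ms" % time_ms}, sample_rate)

    def timing_since(self, stat, start, sample_rate=1, now=time.time):
        """Log the milliseconds elapsed since start (a time.time() value)."""
        self.timing(stat, int((now() - start) * 1000), sample_rate)
```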

Send metric from log file Nginx to statsd

Hi, I'm using pystatsd to send my metrics to statsd. My metrics are the HTTP responses in an Nginx log file. I don't know how to extract these metrics and send them to statsd, as the example doesn't cover this. Can you help me?
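One approach is to pull the status code out of each log line and turn it into a counter name. The sketch below assumes Nginx's default "combined" log format, where the status follows the quoted request. STATUS_RE, status_metric and the nginx.status prefix are hypothetical choices.

```python
import re

# In the combined format the status code follows the quoted request, e.g.
# ... "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def status_metric(line, prefix="nginx.status"):
    """Return a counter name like 'nginx.status.200' for one log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return "%s.%s" % (prefix, match.group(1))
```

Each non-None name could then be passed to the client from the Usage section, e.g. sc.increment(name) for every line read.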

How is statsd.update_stats different from statsd.timing?

I'm reading over the statsd client code, and I don't quite understand the difference between update_stats and timing. Should update_stats be considered a private method? What is the difference between the method args 'delta' and 'sample_rate'?

Crash when adding a timer

I'm somewhat unclear how this can happen since that particular dictionary should have been created. What's odd is that it doesn't happen immediately. It happens after a while seemingly randomly.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 377, in
    run_server()
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 374, in run_server
    daemon.run(options)
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 333, in run
    server.serve(options.name, options.port)
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 306, in serve
    self.process(data)
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 107, in process
    if (mtype == 'ms'): self.__record_timer(key, value, rest)
  File "/usr/local/lib/python2.7/dist-packages/pystatsd/server.py", line 118, in __record_timer
    self.timers[key][1] = ts
KeyError: 'production.cache.memcached.get'

py-statsd hangs in FUTEX_WAIT

root@metrics01-eu:~# strace -p 2934
Process 2934 attached - interrupt to quit
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f41e40174b0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 2934
Reading symbols from /usr/bin/python2.6...(no debugging symbols found)...done.

warning: .dynamic section for "/lib/libc.so.6" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib/libnss_files.so.2" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /usr/lib/libssl.so.0.9.8...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libnss_files.so.2

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f41ec9c6330 in sem_unlink () from /lib/libpthread.so.0
(gdb) bt
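The gdb session above shows only C-level frames. On Python 3.3+ the standard library's faulthandler module can dump every thread's Python stack instead, which would show which Python line each thread is blocked on (for example a lock shared by the receive and flush paths). This is a sketch for a Python 3 deployment on Unix; install_stack_dumper and dump_stacks are hypothetical helper names. After install_stack_dumper(open("/tmp/pystatsd-stacks.txt", "w")), running kill -USR1 on the hung process writes the stacks to that file.

```python
import faulthandler
import signal


def install_stack_dumper(log_file, sig=signal.SIGUSR1):
    """On sig, write every thread's Python stack to log_file (Unix only).

    log_file must stay open for the life of the process, because
    faulthandler keeps using its file descriptor.
    """
    faulthandler.register(sig, file=log_file, all_threads=True)


def dump_stacks(path):
    """Write all thread stacks to path right now and return the text."""
    with open(path, "w") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
    with open(path) as f:
        return f.read()
```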
