
analytics-python's People

Contributors

alexlouden, andrii-sherepa, calvinfo, cjhenck, dailycoding, danieljackins, davidpal, dependabot[bot], f2prateek, graingert, hadrien, hamstah, heitorsampaio, jgershen, jhh3000, kehn, lubird, michaelghseg, mvinogradov-wavefin, nd4p90x, nhi-nguyen, pberganza-applaudostudios, pooyaj, poswald, rohitpaulk, samueldg, shughes-uk, tenzer, tinu-ops, wojcikstefan


analytics-python's Issues

Fails to install in a clean virtualenv

Hi,

Steps to reproduce

virtualenv venv
source venv/bin/activate
pip install analytics-python

Error message:

$ pip install analytics-python
Downloading/unpacking analytics-python
  Downloading analytics-python-0.3.5.tar.gz
  Running setup.py egg_info for package analytics-python
    analytics-python requires that you have a Python "requests" library installed. Try running "pip install requests"
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/tmp/venv2/build/analytics-python/setup.py", line 1, in <module>
        from analytics import VERSION
      File "analytics/__init__.py", line 17, in <module>
        from client import Client
      File "analytics/client.py", line 7, in <module>
        from dateutil.tz import tzutc
    ImportError: No module named dateutil.tz

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /tmp/venv2/build/analytics-python
Storing complete log in /tmp/tmpq6IXRz

This is caused by setup.py importing VERSION from the analytics package. When the module is imported, it checks for the presence of required dependencies before setup has had a chance to install them.

This makes it a problem to deploy cleanly.
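
One common fix, sketched here under the assumption that VERSION is assigned as a string literal in analytics/__init__.py, is for setup.py to parse the version out of the source file instead of importing the package:

# Hypothetical setup.py helper: read VERSION without importing the
# package, so installation doesn't require the runtime dependencies.
import os
import re

def get_version():
    init_path = os.path.join(os.path.dirname(__file__), 'analytics', '__init__.py')
    with open(init_path) as f:
        match = re.search(r"VERSION\s*=\s*['\"]([^'\"]+)['\"]", f.read())
    if match is None:
        raise RuntimeError('Unable to find VERSION in analytics/__init__.py')
    return match.group(1)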

easy_install not working

Not sure how many people will use easy_install, but I assume some do :)

ivolo:~ ivolo$ easy_install analytics-python
Searching for analytics-python
Reading http://pypi.python.org/simple/analytics-python/
Best match: analytics-python 1.0.0
Downloading https://pypi.python.org/packages/source/a/analytics-python/analytics-python-1.0.0.tar.gz#md5=89d4025...
Processing analytics-python-1.0.0.tar.gz
Running analytics-python-1.0.0/setup.py -q bdist_egg --dist-dir /var/folders/zl/db8k00cd6q387kg7nht3vn4h0000gn/T/easy_install-y5ctsJ/analytics-python-1.0.0/egg-dist-tmp-YlUYW2
zip_safe flag not set; analyzing archive contents...
analytics.test.__init__: module references __path__
Adding analytics-python 1.0.0 to easy-install.pth file

Installed /Library/Python/2.7/site-packages/analytics_python-1.0.0-py2.7.egg
Processing dependencies for analytics-python
Finished processing dependencies for analytics-python
ivolo:~ ivolo$ python
Python 2.7.5 (default, Mar  9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import analytics
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/analytics_python-1.0.0-py2.7.egg/analytics/__init__.py", line 3, in <module>
    from analytics.client import Client
  File "/Library/Python/2.7/site-packages/analytics_python-1.0.0-py2.7.egg/analytics/client.py", line 5, in <module>
    import six
ImportError: No module named six
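
The ImportError suggests the package's runtime dependencies aren't being declared to setuptools. A minimal sketch of the usual remedy (the exact dependency list and version pins here are assumptions):

from setuptools import setup, find_packages

setup(
    name='analytics-python',
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        'requests>=2.0',        # assumed minimum versions
        'six>=1.4',
        'python-dateutil>=2.1',
    ],
)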

<type 'exceptions.ValueError'>: list.remove(x): x not in list

Hi,
we signed up for your service and are trying it out for the first time.
I wrote a simple script (not threaded) that sends some data, but it crashes with this error, which seems to be caused by your library.

Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  File "/usr/local/lib/python2.7/dist-packages/analytics/consumer.py", line 28, in run
  File "/usr/local/lib/python2.7/dist-packages/analytics/consumer.py", line 39, in upload
  File "/usr/local/lib/python2.7/dist-packages/analytics/consumer.py", line 62, in next
  File "/usr/local/lib/python2.7/dist-packages/analytics/consumer.py", line 78, in next_item
  File "/usr/lib/python2.7/Queue.py", line 177, in get
  File "/usr/lib/python2.7/threading.py", line 363, in wait
<type 'exceptions.ValueError'>: list.remove(x): x not in list
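
This is the classic symptom of a background daemon thread still waiting on a queue while the interpreter tears down. Until the library handles shutdown itself, a hedged workaround is to drain the queue before exiting:

import atexit
import analytics

analytics.write_key = 'YOUR_WRITE_KEY'  # placeholder
atexit.register(analytics.flush)  # drain the queue before interpreter teardown

analytics.track('user_id', 'Sent Email')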

Allow specifying sessionId rather than just userId

Sometimes we want to log metrics server-side for users who are not logged in. Currently we could pass a random userId, but that's really not the intended meaning of userId (which, according to the docs, is a persistent database identifier for the user).
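
Later versions of the library expose an anonymous_id parameter for exactly this case (the keyword follows the anonymousId rename in the API v2 issue below; verify it against your installed version):

import uuid
import analytics

analytics.write_key = 'YOUR_WRITE_KEY'  # placeholder
# anonymous_id stands in for userId when there is no persistent user
analytics.track(anonymous_id=str(uuid.uuid4()), event='Viewed Landing Page')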

errors when calling the client

When I call

 client.identify(userid)

or

analytics.identify(userid)

I get this error:
No handlers could be found for logger "segment"
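
"No handlers could be found for logger" is Python 2's logging module complaining that no handler is configured; it's a symptom of unconfigured logging rather than a failure in the library. Either configure logging globally or attach a NullHandler to the 'segment' logger:

import logging

logging.basicConfig(level=logging.WARNING)  # simplest: configure root logging
# or, to silence just this library's logger:
logging.getLogger('segment').addHandler(logging.NullHandler())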

Date objects fail json serialization

Date objects fail json serialization:

>>> import json
>>> from datetime import date
>>> from analytics.request import DatetimeSerializer
>>> json.dumps(date.today(), cls=DatetimeSerializer)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hadrien/.pyenv/versions/3.5.2/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/Users/hadrien/.pyenv/versions/3.5.2/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/hadrien/.pyenv/versions/3.5.2/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/Users/hadrien/.virtualenvs/coredata/lib/python3.5/site-packages/analytics/request.py", line 53, in default
    return json.JSONEncoder.default(self, obj)
  File "/Users/hadrien/.pyenv/versions/3.5.2/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.date(2017, 1, 23) is not JSON serializable
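
A hedged sketch of an encoder that also handles plain dates, assuming DatetimeSerializer only special-cases datetime. Since datetime is a subclass of date, one isinstance check covers both:

import json
from datetime import date

class DateAwareSerializer(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, date):  # covers datetime instances too
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)

print(json.dumps(date.today(), cls=DateAwareSerializer))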

Django Stop/Restart flush events?

Hi,

Sorry this isn't a bug but more of a question.

I would like to integrate segmentio in a Django app, but I have concerns about what it does when Django is restarted or stopped under uwsgi.
Since it uses background threads, will it flush all the messages before being stopped or restarted?

Weird error while running on AWS Lambda

When trying to flush the event queue I get the following error

error uploading: [Errno 2] No such file or directory

I added some print debugging to get the full stacktrace and this is it

Traceback (most recent call last):
  File "/var/task/analytics/consumer.py", line 51, in upload
    self.request(batch)
  File "/var/task/analytics/consumer.py", line 87, in request
    self.request(batch, attempt+1)
  File "/var/task/analytics/consumer.py", line 87, in request
    self.request(batch, attempt+1)
  File "/var/task/analytics/consumer.py", line 87, in request
    self.request(batch, attempt+1)
  File "/var/task/analytics/consumer.py", line 87, in request
    self.request(batch, attempt+1)
  File "/var/task/analytics/consumer.py", line 83, in request
    post(self.write_key, batch=batch)
  File "/var/task/analytics/request.py", line 22, in post
    res = _session.post(url, data=data, auth=auth, headers=headers, timeout=15)
  File "/var/task/requests/sessions.py", line 535, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/var/task/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/task/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/var/task/requests/adapters.py", line 497, in send
    raise SSLError(e, request=request)
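
On Lambda, the underlying [Errno 2] typically means requests cannot locate a CA certificate bundle in the stripped-down runtime, so every retry dies with an SSLError. A hedged workaround is to point requests at certifi's bundle before the first upload (assuming certifi is included in the deployment package):

import os
import certifi

# REQUESTS_CA_BUNDLE is honored by requests when locating CA certificates
os.environ['REQUESTS_CA_BUNDLE'] = certifi.where()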

_coerce_unicode() is wrong

Hi there,

I noticed that in your clean() function that sanitizes data before sending it to your API, if you encounter an unknown type, you end up trying to coerce it to unicode using the _coerce_unicode() function. Unfortunately, this doesn't play well with anything except strings (it requires a decode() method). And sadly, you only log a warning here.

We can do two things:

  • really coerce the item to unicode by calling unicode() on it if it's not a string (and keep the call to decode() if and only if isinstance(item, basestring))
  • enhance logging with more context so that when errors come up, we can actually retry sending the full event (and maybe log at error level)

We circumvented the errors in our codebase by checking that everything is JSON serializable before handing them to the SDK, but I really believe these are must-have enhancements :)
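
A minimal sketch of the first suggestion, using six for Python 2/3 compatibility:

import six

def _coerce_unicode(item):
    # decode() only applies to byte strings; everything else (Decimal,
    # lazy localization strings, custom types, ...) goes through text_type()
    if isinstance(item, six.binary_type):
        return item.decode('utf-8')
    return six.text_type(item)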

Add timeout to requests.post() in client

Looks like you guys are not adding a timeout here. A timeout seems like a nice way to degrade gracefully in the event that your service is down.

At the very least, offer it up as a configurable option.

The requests library allows you to specify a timeout as an optional keyword arg.
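
For illustration, the keyword in question (the endpoint appears in the Lambda traceback above, which also shows the library eventually shipped exactly such a timeout):

import requests

url = 'https://api.segment.io/v1/batch'
res = requests.post(url, json={'batch': []}, timeout=15)  # timeout in seconds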

Test "randomly" fail

Hi,
Been running the tests a few times, and they seem to fail randomly from what I can see.
Examples of failures are below.

======================================================================
FAIL: test_async_basic_identify (__main__.AnalyticsBasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 98, in test_async_basic_identify
    self.assertEqual(analytics.stats.successful, last_successful + 1)
AssertionError: 0 != 1

======================================================================
FAIL: test_async_basic_track (__main__.AnalyticsBasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 119, in test_async_basic_track
    self.assertEqual(analytics.stats.successful, last_successful + 1)
AssertionError: 2 != 1

======================================================================
FAIL: test_async_full_track (__main__.AnalyticsBasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 175, in test_async_full_track
    self.assertEqual(analytics.stats.successful, last_successful + 1)
AssertionError: 3 != 4

======================================================================
FAIL: test_blocking_flush (__main__.AnalyticsBasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 194, in test_blocking_flush
    self.assertEqual(analytics.stats.successful, last_successful + 1)
AssertionError: 5 != 4

----------------------------------------------------------------------
Ran 9 tests in 12.105s

FAILED (failures=4)
======================================================================
FAIL: test_async_full_track (__main__.AnalyticsBasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 176, in test_async_full_track
    self.assertEqual(analytics.stats.successful, last_successful + 1)
AssertionError: 3 != 4

----------------------------------------------------------------------
Ran 9 tests in 13.122s

FAILED (failures=1)

Let me know if you need more details to reproduce.

analytics.flush can hang forever if analytics.send == False

Calling analytics.flush() makes the shell hang forever, even keyboard interrupts don't work.

In [1]: import analytics

In [2]: analytics.write_key = 'TEST_WRITE_KEY'

In [3]: analytics.send = False

In [4]: analytics.track('fake_user_id', 'Sent Email')

In [5]: analytics.flush()
^C^C^C^C^C^C

Two alternative options for improving the behavior when analytics.send is set to False:

  1. Client._enqueue should never put items into the queue (it still makes sense to do everything else the method does, so that its behavior can be covered by unit tests).
  2. Let Client.flush return immediately, or raise an exception saying that flush won't work when send is False because the consumer is not running.

Personally I prefer the 1st option.
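
A hedged sketch of option 1; the method shape is approximated from the library, and _prepare is a hypothetical stand-in for whatever validation and defaulting _enqueue already does:

def _enqueue(self, msg):
    msg = self._prepare(msg)  # hypothetical: validation, timestamps, etc. still run
    if not self.send:
        return True, msg      # skip the queue, so flush() has nothing to wait on
    self.queue.put(msg)
    return True, msg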

No changelog

I'm upgrading from 1.1.0 to 1.2.3 and I have no idea what has changed.

The docs link to the GitHub releases Atom feed, which is useless as there is no metadata, just version numbers.

The commit log is not readable when trying to determine the relevance of upgrades.

Please keep a changelog and make Segment.io a more usable product.

Thanks!

Add time-based flush test

  1. Perform an action
  2. Wait longer than the time-based flush interval
  3. Perform another action
  4. Check that both the flush count and the successful count equal 2 (see the sketch below)
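
A hedged sketch of such a test. The sleep duration and the stats counters are assumptions (the counters follow the test.py output quoted in the flaky-tests issue above):

import time
import unittest
import analytics

class TimeFlushTest(unittest.TestCase):
    def test_time_based_flush(self):
        last_flushes = analytics.stats.flushes
        last_successful = analytics.stats.successful
        analytics.track('user_id', 'First Action')
        time.sleep(15)  # assumed to exceed the time-based flush trigger
        analytics.track('user_id', 'Second Action')
        time.sleep(15)
        self.assertEqual(analytics.stats.flushes, last_flushes + 2)
        self.assertEqual(analytics.stats.successful, last_successful + 2)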

"Python's standard datetime object is broken"

This is a documentation issue, but I figure it may be best to talk directly to developers.

https://segment.com/docs/libraries/python/#timezones-in-python

Python datetime objects are not "broken". The Python datetime module is, in fact, one of the best standard-library implementations of all the programming languages I've worked with. The Python core devs knew timezones were difficult to work with. Libraries that deal with timezones are gnarly messes of spaghetti code that are constantly updated. Most people use pytz, which can be versioned independently of your installed Python, instead of being locked into the timezones that existed on the date your version of Python was released.

The core team abstracted the timezone madness into the tzinfo classes, which you can implement to determine the offsets for a particular date.

tzinfo classes are not required to use datetime objects in Python, which is good. It allows you to work with dates in an abstract way. These are called "naive" datetime objects. datetime objects which do have tzinfo objects are called "aware" datetime objects. This is documented in the second paragraph of the datetime module docs.

The following sentence in your docs shows a misunderstanding of how Python datetime objects work.

[screenshot from the docs omitted]

They don't "lose timezone information", whatever that means. You're not giving timezone information to the now() function in the first place. Try this: datetime.datetime.now(tz=pytz.utc).

The following image in your docs depicts the behavior of naive datetime objects as if you expected them to be timezone aware. now() doesn't create timezone-aware datetime objects unless you pass the tz keyword.

[screenshot from the docs omitted]

All you need to do, as a library, is require timezone aware datetime objects. Then you can isoformat them and all will be fine.

Please fix the documentation to not hate on Python datetime objects because of a misunderstanding.
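
To make the naive/aware distinction concrete:

import datetime
import pytz

naive = datetime.datetime.now()             # no tzinfo attached: "naive"
aware = datetime.datetime.now(tz=pytz.utc)  # tzinfo attached: "aware"
print(naive.tzinfo)       # None
print(aware.isoformat())  # ISO 8601 string including the +00:00 offset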

Thread running after request in app engine

In Google App Engine I consistently get the following error (not always, but very often). See the stack trace below.

The effect is that the event doesn't get recorded. When I first encountered this, I tried to isolate the call to analytics.track() in a task (using the handy deferred library) to avoid side effects on a user-facing HTTP request. So I sleep better, but many events still don't get recorded.

I'm only interested in executing one analytics track per task in the task queue. I was looking at the code to try to avoid starting a new thread (threading is very tricky in App Engine), but I don't see how; even if queue-size is 1, I don't think it would work.

Ideas?

2014-11-06 10:39:41.179 consumer is running...
D 2014-11-06 10:39:41.179 enqueued track.
D 2014-11-06 10:39:41.181 making request: {"batch": [{"event": "Payment Failed", "anonymousId": null, "context": {"library": {"version": "1.0.3", "name": "analytics-python"}},
I 2014-11-06 10:39:41.195 Starting new HTTPS connection (1): api.segment.io
D 2014-11-06 10:39:41.836 "POST /v1/batch HTTP/1.1" 200 None
D 2014-11-06 10:39:41.838 data uploaded successfully
D 2014-11-06 10:39:41.959 successfully flushed 1 items.
E 2014-11-06 10:49:39.608 Thread running after request. Creation traceback: File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/runtime.p
I 2014-11-06 10:49:39.609 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This requ
W 2014-11-06 10:49:39.609 Threads started by this request continued executing past the hard deadline
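
One hedged approach given App Engine's threading limits: post the batch synchronously from the task instead of relying on the consumer thread. Sketch only; the endpoint and payload shape follow the log lines above, and basic auth with the write key follows the API v2 issue below:

import json
import requests

def track_sync(write_key, message):
    # send a single-event batch and block until it completes,
    # so no thread outlives the request
    return requests.post(
        'https://api.segment.io/v1/batch',
        auth=(write_key, ''),
        headers={'Content-Type': 'application/json'},
        data=json.dumps({'batch': [message]}),
        timeout=15,
    )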

App freezing with uwsgi

I've just been trying to track down a really weird issue with running segment in flask under uwsgi. I'm explicitly creating the segment client myself.

The curious (non-app loading) bits of uwsgi look like this:

uwsgi --master -p 1 --vacuum --enable-threads \
        --carbon stats.example.com:2003 \
        --carbon-name-resolve

I actually contact segment while the app is starting - which may be what's causing the issue. The code looks a little like this:

app = Flask(__name__)
app.analytics = analytics.Client(write_key='blah')
app.analytics.identify(user account info)

I have a whole lot of totally separate instances of the app and some of them load and some don't with no obvious reason for the differences. It won't even accept the first request.

If I add app.analytics.flush() after this bit of code - so it forces the thread to block and pump the data out to segment then the apps can start fine. I don't understand it at all but it seems to be something to do with running that thread before uwsgi takes control of the app. Or something.

I thought I'd post it here in case someone else runs into the same issue.
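
A hedged explanation: uwsgi forks workers after the master has imported the app, and background threads don't survive the fork, which fits these symptoms. Two common workarounds are starting uwsgi with --lazy-apps, or creating the client only after the fork:

import analytics
from uwsgidecorators import postfork  # available when running under uwsgi

client = None

@postfork
def init_analytics():
    # create the client (and its consumer thread) inside each worker
    global client
    client = analytics.Client(write_key='blah')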

Package name should NOT be "analytics", rename to segment/segmentio

The top-level package name really should be something unique, like segmentio or something. If you have another package or module loaded in your Python load path named analytics, a very common word, there is a namespace collision that is impossible (or at least very difficult) to resolve in Python.

# import analytics
import segmentio

Add empty method signatures in analytics module to silence static analyzers

In `__init__.py`, the code currently reads

for method in methods:
    setattr(this_module, method, uninitialized)

This definitely works at runtime, but some IDEs (e.g. PyCharm) and static analyzers are not able to provide autocompletion; they mark calls such as analytics.track with a warning and complain that the method doesn't exist.
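
A hedged sketch of the request: explicit module-level functions that forward to the existing dispatch, so static analyzers can see them (_proxy is the module's dispatcher mentioned in the passthrough issue below):

def track(*args, **kwargs):
    """Send a track call via the default client."""
    _proxy('track', *args, **kwargs)

def identify(*args, **kwargs):
    """Send an identify call via the default client."""
    _proxy('identify', *args, **kwargs)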

analytics.track() - asynchronous

Hi there,

I have been working with Segment's analytics-python module for a little while now.

Today I discovered that, for example, the track method is asynchronous.
I never realized this before, since I usually use it within services that are always up (listeners, queues, etc.).

Today I had to write a simple script to read from a file and push some data into my Segment bucket. Surprisingly, when I ran the super simple script, the data didn't get to my Segment debugger, although when I ran the same exact code in a Python console it worked like a charm.

I finally saw that the problem was that my script finished before the analytics.track() call was done.
I added an extra 10-second wait and confirmed that was indeed the problem.

So, is there any way to make the track method wait for itself? Any extra args I could pass in?
Callbacks? Any suggestions?

I took some time to go through all the options but didn't figure out a way to do it;
any help is greatly appreciated.

Thanks for your time.
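
For reference, the library's flush() blocks until the queue drains, which covers the short-lived-script case:

import analytics

analytics.write_key = 'YOUR_WRITE_KEY'  # placeholder

with open('events.txt') as f:  # hypothetical input file
    for line in f:
        analytics.track('user_id', 'Imported Line', {'line': line.strip()})

analytics.flush()  # block until every queued event has been sent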

Breaking on unicode arguments

We use the

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

heading on all our files, which defaults our strings to type 'unicode'.

Upon upgrading to v1.0, analytics-python throws an exception when passed any unicode arguments, as it appears to do type-checking that explicitly accepts the 'str' type only. This applies to the analytics.write_key argument as well as the analytics.track 'event' name argument, but may be an issue elsewhere as well.

Not sure if the explicit type-checking was intentional, but a more lenient isinstance type-check seems more appropriate to allow for unicode strings, which worked in the previous 0.4.4 version.
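
The suggested check, sketched with six so it accepts both str and unicode on Python 2:

import six

def require_string(name, value):
    # six.string_types is (str, unicode) on Python 2 and (str,) on Python 3
    if not isinstance(value, six.string_types):
        raise ValueError('{0} must be a string, got {1!r}'.format(name, value))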

Cast non-string objects to string types

I previously hit this problem in this library, fixed it, and my PR was merged. Since then it seems like you rewrote much of the library and reintroduced the problem. The problem is basically with this line of code. Here's the scenario:

>>> from decimal import Decimal
>>> from six import string_types
>>> isinstance("100.00", string_types)
True
>>> isinstance(unicode(Decimal("100.00")), string_types)
True
>>>
>>> isinstance(Decimal("100.00"), string_types)
False

There is no reason to reject decimal types; they serialize to JSON-compatible strings perfectly well. In short: many, many objects developers deal with in Python are not strings but can still be serialized out, and you probably shouldn't be so aggressive about rejecting them. This happens with Decimals, custom types, lazy localization strings, etc. Your users can all pepper their code with unicode() calls to explicitly cast to the types your JSON transport needs, but the proper way to solve this would be for you to just try casting to whatever types your API expects/supports for serialization.

I am not 100% sure but I think this is also the problem outlined in issue #39.

Make it possible to toggle send=False

When we are running our unit tests, we need to be able to set analytics.send to False and not let the client thread waste time enqueueing.

Please add this capability.

Test coverage

Could you calculate test coverage on your build and link to it? This is useful in evaluating whether to use the library.

Add size-based flush test

# assumed names: size_trigger is the queue size that triggers a flush;
# action() is any track/identify call; stats holds the library's counters
target_flushes = 4
actions = 0
for i in range(target_flushes * size_trigger):
    action()
    actions += 1

assert stats.flushes == target_flushes
assert stats.successful == actions

"No module named analytics" error in Python

I have followed the instructions for installing the Python library, specifically for Django.

I ran pip install analytics-python, but when I try to add import analytics anywhere in my project, I get No module named analytics in both the PyCharm IDE and when running the app. I have confirmed that I'm using the correct Python Interpreter and analytics-python is indeed recognized as being installed. It isn't listed in the docs, but should anything be added to INSTALLED_APPS?

Any help to resolve this is much appreciated.

Decimals cause an error

Decimals are not serialized by the built in json package, but the code in analytics-python/analytics/utils.py reads:

def clean(item):
    if isinstance(item, (six.string_types, bool, numbers.Number, datetime,
                         type(None))):
        return item
    elif isinstance(item, (set, list, tuple)):
        return _clean_list(item)
    elif isinstance(item, dict):
        return _clean_dict(item)
    else:
        return _coerce_unicode(item)

Decimal is a numeric type, so it is returned as-is, but it should be cleaned somehow first. Ideally this would be a custom JSONEncoder subclass (so that the Decimal's string value is what gets sent), but a float() conversion would help.
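
A hedged sketch of the float() fallback. Note the Decimal check must come before the general numbers.Number branch in clean(), since Decimal registers as a numbers.Number:

from decimal import Decimal

def clean_decimal(item):
    # str(item) would preserve the exact digits; float() is the simple, lossy option
    if isinstance(item, Decimal):
        return float(item)
    return item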

clean TypeError

analytics.utils.clean will error if line 43, return _coerce_unicode(item), is reached, because _coerce_unicode takes two arguments. Pylint catches this: No value passed for parameter 'cmplx' in function call (no-value-for-parameter).

Docs - urls.py vs App Ready

The docs state that I have to import analytics at the top level of my Django project:

https://segment.com/docs/libraries/python/#django

First of all, this is outdated, as Django now provides an app-ready hook point: https://docs.djangoproject.com/en/dev/ref/applications/#django.apps.AppConfig.ready

Second: can you elaborate on why you insist on having this in my urls.py?

I do not want to hardcode an external service at such a prominent point in my architecture and I do not understand the reasons why.
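
For reference, a minimal sketch of the AppConfig.ready approach mentioned above (the app label is hypothetical):

from django.apps import AppConfig
import analytics

class MyAppConfig(AppConfig):
    name = 'myapp'  # hypothetical app label

    def ready(self):
        # runs once Django has finished loading, replacing the urls.py workaround
        analytics.write_key = 'YOUR_WRITE_KEY'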

Allow analytics to operate in passthrough mode

Would you be open to a PR that allows the analytics to operate without a write_key?

This would let developers directly integrate analytics in their apps but not actually write data to the API. Definitely useful for debugging and teams where there are variants of dev, staging, test and demo environments, where it's not appropriate to send data to Segment.

Otherwise a developer has to wrap the analytics API in a bunch of boilerplate to not enable/send data.

There are a couple of ways to make this easy to enable. One would be to add analytics.passthrough and change analytics._proxy to not actually create the client instance if that flag is set, perhaps logging the calls instead.

Another could be to make this the default behavior if analytics.write_key is None or another known, default value. This has the added advantage of allowing late configuration of the analytics.write_key setting, which may be necessary in some large, complex applications (e.g. it would get rid of the need for the Django work-around that you have in the docs.)
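
A hedged sketch of the first option, following the issue's own naming (a passthrough flag checked by the module's _proxy dispatcher):

import logging

log = logging.getLogger('segment')
passthrough = False  # proposed module-level flag

def _proxy(method, *args, **kwargs):
    if passthrough:
        # log the call instead of creating a client or hitting the API
        log.debug('passthrough %s: args=%r kwargs=%r', method, args, kwargs)
        return
    # ...otherwise fall through to the existing default-client dispatch...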

Improve logging/callback/failfast of the library

Giving the whole logging thing a bit more thought here's what I think would be ideal:

  • no print >>sys.stderr calls; everything should use logging
  • calls to track(), identify(), etc. should check right away whether the library is configured and log at ERROR level if it isn't; however, this shouldn't cause the call to return early. It should still queue the data and behave as it normally would.
  • failed preconditions should use Python's assert for internal checks/safeguards, and raise ValueError if something is called improperly by the calling developer (with a missing event string, for example). This gives the developer the chance to catch ValueError rather than having to catch Exception, which is too general.
  • all calls to track, identify, etc. should trigger an on_success or on_failure callback, no matter what. Ideally the call is queued just like normal, and the sync thread then decides whether it is properly configured and can send to the server. This lets a developer check their code and print/debug even without a segment.io key.
  • in general, reduce the chance that calls to identify/track will raise exceptions on the request-handling thread.

The main issue is that not all developers on a team should be sending data to segment.io, but they should still be able to a) test as much of the identify/track/queue/sync pathway as possible and b) receive the on_failure and on_success callbacks so they can see and test what would have been sent.

I just want to get these ideas down here for discussion.

Ongoing changes for API v2

https://github.com/segmentio/spec

  • Use HTTP Basic Auth with the project's Write Key.
  • Rename to anonymousId from sessionId for clarity.
  • Separate integrations (was providers) object from context, for cleaner logs.
  • Add requestId for easily tracing calls through to the raw logs.
  • Change library to be an object with name and version for consistency.

Segment causing celery workers to hang in django

We use the Django post_save signal to trigger Segment analytics tracking asynchronously using celery. However, when many events are created (about 350 in 20 seconds), all the celery workers consistently hang at the following output.

[2015-06-12 22:41:34,245: INFO/Worker-2] Starting new HTTPS connection (1): api.segment.io
[2015-06-12 22:41:34,574: DEBUG/Worker-2] "POST /v1/batch HTTP/1.1" 200 21
[2015-06-12 22:41:34,578: DEBUG/Worker-2] data uploaded successfully

When the analytics tracking is commented out, the workers function as expected. When the celery rate limit is set to "600/m", the celery workers run without hanging. We have a celery hard time limit of 30 seconds to prevent Segment from hanging the workers. We found that at a higher rate limit the hard time limit was hit frequently and the analytics tracking was not sent through.

Not sure why the segment library is causing this to happen, please advise.

[Errno 8] _ssl.c:504: EOF occurred in violation of protocol

My logs (Sentry) are showing a Segment.io request error in python2.7/site-packages/analytics/client.py

Stacktrace (most recent call last):

  File "analytics/client.py", line 71, in request
    timeout=client.timeout)
  File "requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 389, in send
    raise SSLError(e)

It has happened about 900 times in the past 4 days. Any thoughts? Any more info I can give you?

debug option to client works oddly (and doesn't work in python2.6)

When you create an instance of the Client class and pass debug=True, the __init__ method will execute

self.log.setLevel('DEBUG')

and there are 2 problems with this.

  1. self.log is the logger instance named 'segment'. It's the same instance for every client, so if you create multiple clients you can't use different debug values, or things get confused.

  2. In Python 2.6, you can't say log.setLevel('DEBUG'); you have to say log.setLevel(logging.DEBUG). Otherwise you're effectively setting the log level to an arbitrary high value and no log messages appear.

Is python 2.6 supported? I can't find any documentation that specifies either way.
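
For reference, the 2.6-safe form of that call (the 'segment' logger name comes from point 1 above):

import logging

# pass the logging constant, not the string; this works on Python 2.6 as well
logging.getLogger('segment').setLevel(logging.DEBUG)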

Date objects are removed from event properties

>>> from datetime import date
>>> from analytics import utils
>>> simple = {'birthdate': date(1981, 2, 2)}
>>> utils.clean(simple)
Dictionary values must be serializeable to JSON "birthdate" value 1981-02-02 of type <class 'datetime.date'> is unsupported.
{}

It should not be removed; the library's JSON serializer could support date objects just as it supports datetimes (see the date serialization issue above).
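
A hedged sketch of the whitelist change, adding date alongside datetime in the clean() quoted in the Decimals issue above (the _clean_* and _coerce_unicode helpers are the library's own):

import numbers
from datetime import date, datetime
import six

def clean(item):
    if isinstance(item, (six.string_types, bool, numbers.Number,
                         datetime, date, type(None))):
        return item
    elif isinstance(item, (set, list, tuple)):
        return _clean_list(item)
    elif isinstance(item, dict):
        return _clean_dict(item)
    else:
        return _coerce_unicode(item)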
