bufferapp / buda-events-collector
Buda gRPC event collector
License: MIT License
Heya @michael-erasmus! Forking this question from #20.
Right now we have an RPC for each event type, and we're duplicating a bunch of code (the send call and the responses). I was thinking of moving towards a generic CollectEvent RPC. We could define the events we track in the gRPC definitions, inside an ENUM or similar.
This isn't an easy change, and it probably doesn't make sense to invest time in it right now, but I think it's a great one to discuss at this stage.
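A minimal sketch of what the single generic entry point could look like, assuming a Python server. `EventType`, `collect_event`, the enum members and the one-stream-per-event-type mapping are all hypothetical; in the real service the enum would live in the gRPC (.proto) definitions and the payload would be a generated message rather than raw bytes.

```python
from enum import Enum
from typing import Callable, Dict


class EventType(Enum):
    # Hypothetical members; the real list would mirror the ENUM in the .proto file.
    VISIT = "visits"
    SIGNUP = "signups"
    ACTION_TAKEN = "actions_taken"


def collect_event(event_type: EventType, payload: bytes,
                  send: Callable[[str, bytes], None]) -> Dict[str, str]:
    """One handler for every event type instead of one RPC per type.

    `send` stands in for the shared send call (for example a Kinesis
    put_record wrapper), so the send and response code is written once.
    """
    stream_name = event_type.value  # assumption: one stream per event type
    send(stream_name, payload)
    return {"status": "ok", "event_type": event_type.name}
```

Adding a new event would then mean adding an enum member rather than a new RPC with its own copy of the send and response code.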
We should be able to deploy the server using @bufferbot servicedeploy buda-events-collector
Hey @davidgasquez!
One thought that came to mind today: right now the server does a single put_record per RPC call. This makes sense because, even though gRPC supports streaming RPCs, a simple request/response flow fits how we collect events.
Reading through some of the Kinesis docs, though, I saw this, and it made me pause a bit:
Each call to PutRecord operates on a single record. Prefer the PutRecords operation described in Adding Multiple Records with PutRecords unless your application specifically needs to always send single records per request, or some other reason PutRecords can't be used.
I was curious whether you've given any thought to optimizing things a bit more by batching up events and using put_records instead? A naive implementation that comes to mind is keeping an in-memory list or stack of messages that we periodically flush to Kinesis.
I don't think this is super important right away, and we can probably just use put_record for the time being, but it would be great to keep this issue around as a reminder!
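A rough sketch of the naive in-memory buffer idea, assuming a boto3-style Kinesis client (anything exposing `put_records(Records=..., StreamName=...)`). `KinesisBatcher` and its methods are hypothetical names. It splits flushes into chunks of at most 500 records (the PutRecords per-request record limit), re-queues records whose result carries an `ErrorCode`, and does not check the per-request size limit.

```python
import threading
from typing import Any, Dict, List

# Kinesis PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


class KinesisBatcher:
    """Buffer events in memory and flush them to Kinesis with PutRecords.

    `client` is anything exposing put_records(Records=..., StreamName=...),
    e.g. boto3.client("kinesis"). Records whose result carries an ErrorCode,
    plus any records never sent because the call raised, are put back at the
    front of the buffer so the next flush retries them.
    """

    def __init__(self, client: Any, stream_name: str) -> None:
        self._client = client
        self._stream_name = stream_name
        self._buffer: List[Dict[str, Any]] = []
        self._lock = threading.Lock()

    def add(self, data: bytes, partition_key: str) -> None:
        """Queue one event instead of calling put_record immediately."""
        with self._lock:
            self._buffer.append({"Data": data, "PartitionKey": partition_key})

    def flush(self) -> int:
        """Send everything buffered and return how many records were accepted."""
        with self._lock:
            pending, self._buffer = self._buffer, []
        sent = 0
        failed: List[Dict[str, Any]] = []
        index = 0
        try:
            while index < len(pending):
                batch = pending[index:index + MAX_RECORDS_PER_CALL]
                response = self._client.put_records(
                    Records=batch, StreamName=self._stream_name
                )
                # Results come back in request order; failed ones have an ErrorCode.
                for record, result in zip(batch, response["Records"]):
                    if "ErrorCode" in result:
                        failed.append(record)
                    else:
                        sent += 1
                index += len(batch)
        finally:
            # Re-queue per-record failures plus anything not yet sent.
            leftover = failed + pending[index:]
            if leftover:
                with self._lock:
                    self._buffer[:0] = leftover
        return sent
```

The RPC handlers would call `add()` instead of `put_record`, and something like a background timer (or a size threshold) would call `flush()` to get the periodic flush described above.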
Hey there! While exploring the RAW visits data on BigQuery, I spotted that some of the visits we receive have an empty visitor_id and user_id.
Checking Redshift, it seems this is happening there too. For example, there are 1,893,159 visits between 2019-03-24 and 2019-03-26 that have a null or empty visitor_id and user_id.
The same query against BigQuery data returns 2,053,177 records.
These visits are going to the following URIs:
Most of them seem to be back-end related, but a few, like https://buffer.com/guides/chrome/installed, also got some visits without a visitor_id and user_id.
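To make the "null or empty" condition concrete, here is a small sketch of the predicate behind those counts, grouped by URI. It is not the query that produced the numbers above; it assumes visits are dicts with hypothetical `date`, `visitor_id`, `user_id` and `uri` fields, and treats the date range as inclusive, like SQL `BETWEEN`.

```python
from collections import Counter
from datetime import date
from typing import Any, Iterable, Mapping


def unidentified_visits_by_uri(visits: Iterable[Mapping[str, Any]],
                               start: date, end: date) -> Counter:
    """Count visits in [start, end] with neither a visitor_id nor a user_id."""
    counts: Counter = Counter()
    for visit in visits:
        in_range = start <= visit["date"] <= end  # inclusive, like SQL BETWEEN
        # `not x` is True for both None (null) and "" (empty string).
        if in_range and not visit.get("visitor_id") and not visit.get("user_id"):
            counts[visit["uri"]] += 1
    return counts
```

Summing the counter's values gives the overall total; the per-URI breakdown shows which pages the unidentified visits are going to.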
As @michael-erasmus suggested, having Bugsnag integrated in the collector could be useful in the future to spot errors.
Now that we've cleaned up the folder structure and improved the README, I think we can move forward and open source this repository.
The only thing in the history that could concern us is the Lambda Amazon ARN. Happy to delete the folder from the history if this information is sensitive at any level!
cc @michael-erasmus and @djfarrelly
Not all the events are being sent to BigQuery. For example, Action Taken dfbddd5c-e9b8-4659-8c76-2db858b87b5b is in Redshift but is not currently present in the BigQuery events table.
Overall we're "missing" 5% of the events we send to Redshift.
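The "missing" share is a set difference: events present in Redshift but absent from BigQuery, divided by the number of Redshift events. A small sketch, assuming the event IDs from both warehouses have already been exported; `missing_events` is a hypothetical helper, not part of the collector.

```python
from typing import Iterable, Set, Tuple


def missing_events(redshift_ids: Iterable[str],
                   bigquery_ids: Iterable[str]) -> Tuple[Set[str], float]:
    """Return IDs found in Redshift but not BigQuery, and their share of Redshift events."""
    source = set(redshift_ids)
    missing = source - set(bigquery_ids)
    share = len(missing) / len(source) if source else 0.0
    return missing, share
```

A share of 0.05 here would correspond to the 5% figure above.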
With the Redshift deprecation taking place, we can stop sending some BUDA metrics to AWS.
These are the current producers to stop:
funnel_events
funnels
subscription_created
subscription_period_updated
subscription_cancelled
visits
signups
signins
actions_taken
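One hypothetical way to switch these producers off in a single place is a deny-list checked before anything is sent to AWS. The constant and `should_send_to_aws` below are illustrative names, not part of the collector; only the producer names come from the list above.

```python
# Producers listed above as ones to stop sending to AWS.
DEPRECATED_AWS_PRODUCERS = frozenset({
    "funnel_events",
    "funnels",
    "subscription_created",
    "subscription_period_updated",
    "subscription_cancelled",
    "visits",
    "signups",
    "signins",
    "actions_taken",
})


def should_send_to_aws(producer: str) -> bool:
    """Return False for producers that should no longer write to AWS."""
    return producer not in DEPRECATED_AWS_PRODUCERS
```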