The fmke_client from goncalotomas

Client generated IDs collide occasionally

Looking back at the Travis logs, prescription_already_processed errors show up a significant number of times, which means that the IDs generated by the clients are colliding too often. This is a bit more serious than it looks, since the Travis tests run on a single machine and the problem would only be magnified in distributed deployments of the clients.

While searching for a solution to this problem I read this post, and I believe we should try to mimic some of the techniques described there. First, the Twitter Snowflake approach is not ideal since it is a networked service: it could add latency to client requests, and request latency is exactly the metric we need to measure accurately.

The best solution seems to be one that does not require ID generators to know about each other. One generator per node seems ideal, but we need to minimize the chance of collision between the different generators. The previously mentioned post describes an interesting approach that builds an ID from 3 different parts:

{timestamp, per_generator_counter, per_generator_unique_value}

The per-generator unique value is probably easy to get with a function call like erlang:phash2(node()), but some other source of randomness should be added just in case.
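As a rough illustration of the three-part ID above, here is a minimal Erlang sketch. The module name id_gen_sketch, the function names, and the map-based state are hypothetical and not part of fmke_client. It assumes a reasonably recent OTP release, uses a 32-bit random salt as the "other source of randomness", and returns the ID as a tuple rather than in whatever format the benchmark actually expects.

```erlang
-module(id_gen_sketch).
-export([new_generator/0, next_id/1]).

%% Build the per-generator state once, e.g. when a client worker starts.
%% The unique value combines a hash of the node name with a random salt,
%% so generators that share a node name still differ with high probability.
new_generator() ->
    NodeHash = erlang:phash2(node()),
    Salt = rand:uniform(4294967296),  % random integer in 1..2^32
    #{unique => {NodeHash, Salt}, counter => 0}.

%% Return {Id, NewState}, where Id is
%% {timestamp, per_generator_counter, per_generator_unique_value}.
next_id(#{unique := Unique, counter := Counter} = State) ->
    Timestamp = erlang:system_time(microsecond),
    Id = {Timestamp, Counter, Unique},
    {Id, State#{counter := Counter + 1}}.
```

Each worker would keep the returned state and pass it to the next call. Two IDs from the same generator always differ in the counter, while IDs from different generators only collide if the timestamp, counter, node hash, and salt all match.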

Once this question is properly addressed and the probability of collision minimised, the behaviour on collision should also be specified. For instance, if a collision does happen, maybe the operation should just be counted as successful, or the client should at least log the reason behind the failure (e.g. {error, prescription_id_collision}).
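The two options could be sketched roughly as below. The module name, the function handle_create_result/2, and the policy atoms count_as_success and report_collision are all hypothetical. The sketch also assumes, as this issue does, that a prescription_already_processed error on creation signals an ID collision.

```erlang
-module(collision_sketch).
-export([handle_create_result/2]).

%% Policy is count_as_success | report_collision (hypothetical names).

%% Option 1: treat a collision as a successful operation.
handle_create_result({error, prescription_already_processed}, count_as_success) ->
    ok;
%% Option 2: log the collision and return a dedicated error reason.
handle_create_result({error, prescription_already_processed}, report_collision) ->
    error_logger:warning_msg("prescription ID collision detected~n", []),
    {error, prescription_id_collision};
%% Any other result passes through unchanged.
handle_create_result(Result, _Policy) ->
    Result.
```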

Despite this erroneous behavior seen in the benchmarks, the frequency with which these errors are generated is negligible. This can easily be seen in the benchmark graphs as they also record the number of unsuccessful operations, and there is no evidence of a significant (or even noticeable) percentage of operations resulting in this error.

However, in order to fix this issue we will need to implement something akin to Twitter Snowflake.

erlang:phash2(node()) will not return a unique value for all clients since all of the clients will be bound to [email protected]. It may be used as long as we make sure that the client node IDs are generated with the IP address they are using, but this might have to be achieved with a makefile target when running benchmarks.
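One way to derive a node name from the client's IP address is sketched below. The module and function names are hypothetical, and how the IP is obtained (for example, passed in by a makefile target) is left open.

```erlang
-module(node_name_sketch).
-export([node_name/2]).

%% Build a node name atom such as 'client@10.0.0.1' from a prefix string
%% and an IP address tuple.
node_name(Prefix, Ip) ->
    list_to_atom(Prefix ++ "@" ++ inet:ntoa(Ip)).
```

A client started under such a name (for instance via the -name flag of erl, or net_kernel:start/1 with long names) reports it from node(), so erlang:phash2(node()) would then differ between machines with different IP addresses, barring hash collisions.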
