
xcflushd's People

Contributors

chavdam, davidor, mayorova, pankajv82, sekharvajjhala, unleashed


xcflushd's Issues

Doc improvements to README for verification of docker images

Make the following changes to the README related to the verification of docker images:

  1. Add a section on how to verify on RHEL: gpg2 is already installed, so install skopeo and run "make verify".
  2. Organize the README into separate sections on verification and signing, so customers who only want to verify can read the verification section without bothering with signing.
  3. Recategorize the "easy" and "hard" approaches. The "hard" approach, which includes "make verify", can be simple on an OS where the tools are already installed (e.g. gpg2) or easily installed (like skopeo); on other OSes, installing the tools (gpg2 and skopeo) can be hard. "make verify-docker" is easy to use everywhere, so find a good way to capture and describe this.
  4. The image should be verified first and the docker pull done afterwards; verification does not pull the image.

Make failure for xcflushd to reach Service Management API visible

Any issue causing xcflushd to not be able to report data batches back to the Service Management API may over time result in data loss and/or impair the API gateway's ability to function correctly.

Such issues should be appropriately logged as a minimum, and perhaps additional steps taken to enable monitoring of the error(s) so that an Ops alarm can be generated quickly.

Note that, related to #9, this could involve more than one xcflushd instance on one or more physical machines.
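
As a first step, a minimal sketch of the kind of logging that could be added (the reporter call, its arguments, and the logger are hypothetical placeholders, not xcflushd's actual API):

    require 'logger'

    logger = Logger.new($stdout)

    begin
      # `reporter` and `batch` are hypothetical placeholders for xcflushd's
      # reporting code and the data batch being flushed.
      reporter.report(batch)
    rescue StandardError => e
      # Log at ERROR level so monitoring can pick the failure up and raise an
      # Ops alarm quickly; repeated failures here can eventually mean data loss.
      logger.error("Failed to report batch to the Service Management API: " \
                   "#{e.class}: #{e.message}")
      raise
    end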

Makefile: Add an option to always fetch keys from PGP keyserver

Currently, the verification process in the Makefile checks whether there is a .asc file locally. If present, the keys are imported from the .asc file; they are fetched from a PGP keyserver only if the .asc file is not present. Provide an option to always fetch the keys from the PGP keyserver into the PGP keyring.

Max num of threads in params is ignored

There are two parameters to configure the number of threads of the thread pools that we use: prio-threads (for the priority auth renewer) and threads (for the main thread pool). Both of them accept a min and a max. However, the max is ignored.

The reason is that we are using concurrent-ruby's ThreadPoolExecutor without specifying a max size for the queue, and that type of pool only spawns a new thread when the queue is full. For more details check: http://ruby-concurrency.github.io/concurrent-ruby/file.thread_pools.html

Specifying a max size for the queue creates new problems, like deciding what to do with new jobs when the queue is full and no new threads can be created. I think that for our use case, a FixedThreadPool is fine.

The workaround until we solve this is to specify max:max in the params instead of min:max, that is, to pass the max value for both the min and the max.
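
For illustration, a minimal sketch of the behavior described above using concurrent-ruby (pool sizes are arbitrary):

    require 'concurrent'

    # With the default unbounded queue, ThreadPoolExecutor only spawns threads
    # beyond min_threads once the queue is full -- which never happens, so the
    # pool is effectively capped at min_threads and max_threads is ignored.
    pool = Concurrent::ThreadPoolExecutor.new(min_threads: 2, max_threads: 10)

    # A FixedThreadPool uses the same number of threads for min and max, so
    # the configured size is honored.
    fixed = Concurrent::FixedThreadPool.new(10)

    100.times { fixed.post { sleep 0.1 } }
    fixed.shutdown
    fixed.wait_for_termination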

Close the gap of in-process requests by caching pubsub responses for a bit

When a request misses the cache, it asks the on-demand mechanism for a response. While that is being processed (i.e. contacting the configured backend), another request can arrive, miss the cache, and invoke the on-demand mechanism again. The responses could happen in an unfortunate sequence so that the same work would need to be done twice (or even more, depending on latencies, which could increase the window).

To reduce the impact of this, we can implement a small caching mechanism (memoizer) for pubsub in which combinations that have been requested very recently would be responded to ASAP, while those requested too long ago (realistically, only a few seconds) would still behave as if there were no cache.
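
A minimal sketch of the memoizer idea (class, method, and parameter names are illustrative, not xcflushd's actual code):

    require 'concurrent'

    class PubsubMemoizer
      def initialize(ttl = 5) # seconds; realistically only a few seconds
        @ttl = ttl
        @store = Concurrent::Map.new
      end

      # Returns the recently cached response for a combination, or yields to
      # the on-demand mechanism (i.e. contacting the configured backend).
      def fetch(combination)
        entry = @store[combination]
        return entry[:response] if entry && Time.now - entry[:cached_at] < @ttl

        response = yield
        @store[combination] = { response: response, cached_at: Time.now }
        response
      end
    end

Note that this narrows the window but does not fully close it: two concurrent misses on the same combination can still both reach the backend unless in-flight requests are also coalesced.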

Improve logging when getting process signals

Currently, if the process receives a signal, for example SIGTERM, it prints:

FATAL -- : Unhandled exception SignalException, shutting down:  - SIGTERM

It looks like there is something wrong, even though the behavior is correct. The log messages should not be alarming in this case.
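
One possible approach (a sketch, not xcflushd's actual implementation) is to trap the signals expected during a normal shutdown and log them at a non-alarming level:

    require 'logger'

    logger = Logger.new($stdout)
    shutdown = Queue.new

    # Trap the signals expected during a normal shutdown; the handler body is
    # kept minimal because most locking operations are forbidden in trap context.
    %w[TERM INT].each { |sig| Signal.trap(sig) { shutdown << sig } }

    sig = shutdown.pop # block until a signal arrives
    logger.info("Received SIG#{sig}, shutting down gracefully")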

Force log flushing

The standard Ruby logger buffers messages before printing them to STDOUT. For this reason, the logs are sometimes not visible in the container logs.

A quick fix for that is to include:

STDOUT.sync = true

which forces the logs to be printed immediately.

This is not optimal, though. We need to review the logging and probably do the sync periodically.
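
For reference, a minimal sketch of the quick fix:

    require 'logger'

    # Put STDOUT into sync mode before creating the logger so every message
    # is flushed immediately instead of sitting in the buffer.
    STDOUT.sync = true
    logger = Logger.new(STDOUT)
    logger.info('this line shows up in the container logs right away')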

Allow for multiple instances of xcflushd to run concurrently

Goal:
Avoid having a single point of failure in XC, which results from only allowing one instance of xcflushd to run and flush the data at a time (on a single machine, or on separate machines).

Also, allow for roll-over deployments in orchestration frameworks where a new copy of xcflushd is launched and the older one killed, avoiding downtime.

Proposal:
Change xcflushd to allow more than one instance to run concurrently, accessing the data and flushing it to the Service Management API without data corruption, loss, or double-counting.

Out of scope for this issue is whether we force or ensure that more than one instance is running, what tools are used for that, and how roll-over deployments are handled. All of those may vary depending on how XC is deployed; this issue is limited to the code changes needed to allow more than one instance to run.

Make error handling more robust

The way we retrieve cached reports from Redis can lead to losing or duplicating some reports if the Redis connection fails while running some specific commands. This should only happen rarely.

The problematic command is rename. When we rename keys, we give them a unique suffix so they are not overwritten in the next flushing cycle and we can retrieve their content later.

When we retrieve their content successfully, we delete them.

The problem is that the delete operation can fail. When trying to recover the contents of leftover renamed keys, we will not be able to distinguish these 2 cases:

  1. The key is there because we decided not to delete it, in order to retrieve its content later.
  2. The key is there because the delete operation failed.

We could take a look at the logs to figure out what happened, but of course that is not an ideal solution.
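
To make the sequence concrete, a sketch of the steps using the redis-rb client (key names and data types are illustrative, not xcflushd's actual schema):

    require 'redis'
    require 'securerandom'

    redis = Redis.new

    key     = 'xcflushd_reports'            # hypothetical cached-reports key
    renamed = "#{key}:#{SecureRandom.uuid}" # unique suffix, so the next
                                            # flushing cycle does not overwrite it

    redis.rename(key, renamed)        # 1) set the current batch aside
    contents = redis.hgetall(renamed) # 2) retrieve its content
    redis.del(renamed)                # 3) delete only after successful retrieval

Each of the three steps can fail independently; a leftover renamed key found after a crash or connection failure could mean either a pending retrieval or a failed delete.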
