3scale / apisonator
Red Hat 3scale API Management Apisonator backend
Home Page: https://3scale.net
License: Apache License 2.0
It looks like Apisonator leaks memory when async is enabled. This might be fixed in upstream dependencies of the async reactor, but we are blocked on #303.
@3scale/operations can you provide more data on this? Grafana dashboard screen captures would be nice to have.
Hi,
I am attempting to test the apisonator project on Fedora 28.
The Running tests documentation suggests using the script
$ script/test
But I am finding that this script expects some tools/software in the /opt directory (amongst other locations).
I would have expected the documentation on testing to list the following as dependencies.
Do you agree?
This spec failure:
Failures:
1) ThreeScale::Backend::EventStorage.ping_if_not_empty with events in set with two calls in same moment (race condition) returns false the second time
Failure/Error: values = threads.each(&:wakeup).map { |thread| thread.join.value }
fatal:
No live threads left. Deadlock?
# ./spec/unit/event_storage_spec.rb:224:in `join'
# ./spec/unit/event_storage_spec.rb:224:in `block (6 levels) in <module:Backend>'
# ./spec/unit/event_storage_spec.rb:224:in `map'
# ./spec/unit/event_storage_spec.rb:224:in `block (5 levels) in <module:Backend>'
Finished in 2.77 seconds (files took 2.09 seconds to load)
938 examples, 1 failure
is much more likely to be triggered by the CircleCI infrastructure. In fact, it is the most likely reason tests fail, and it is becoming so annoying that I am seriously considering removing the whole thing.
Hi,
I've been performance testing 3scale AMP v2.3 and have come across a scalability issue. Nearly all of the AMP system is scalable except for one component.
I've scaled up the pods on the system as suggested in the documentation. Increasing gateway, backend-listener (apisonator) and backend-worker pods.
The issue is with the Redis instance. Redis, running as a single-threaded process, becomes the bottleneck in the system.
This should not be shocking news to the apisonator project community.
To take the project beyond this scalability bottleneck something needs to change.
There will be effort involved to find a suitable data-store, migrate APIs and performance test scalability.
Is there any appetite for substituting the Redis instance with something that can scale out ?
Perhaps it is my misunderstanding, but I find it confusing how rejections due to breach of rate limits are presented to the caller.
For example in a standard response where I have exceeded the limits, I would get something like this:
<?xml version="1.0" encoding="UTF-8"?>
<status>
<authorized>false</authorized>
<reason>usage limits are exceeded</reason>
<plan>Basic</plan>
<usage_reports>
<usage_report metric="hits" period="minute">
<period_start>2018-09-01 14:44:00 +0000</period_start>
<period_end>2018-09-01 14:45:00 +0000</period_end>
<max_value>1</max_value>
<current_value>1</current_value>
</usage_report>
</usage_reports>
</status>
So we see that we have a human-readable reason, but no error code tag.
Now if I look at the docs here https://github.com/3scale/apisonator/blob/master/docs/rfcs/error_responses.md#currently-known-error_codes-and-proposed-classification I can see that limits_exceeded is a known error code that can be mapped to a 409 response, so that is slightly conflicting with the actual response.
What then causes further confusion is that if I use the rejection_reason_header extension, I see that limits_exceeded is embedded in the response headers.
Personally, what I would like to see is limits_exceeded as part of the XML in the error_code tag, for consistency. I don't want to have to enable an extension for the single case where I need to know that I've exceeded limits, as per the docs linked above.
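For illustration, the response being asked for might look like this (hypothetical; apisonator does not currently emit an error_code tag in this response):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<status>
  <authorized>false</authorized>
  <reason>usage limits are exceeded</reason>
  <error_code>limits_exceeded</error_code>
  <plan>Basic</plan>
</status>
```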
Hi,
Taking a quick peek inside a generated apisonator container I see the following directories.
ruby/3scale_backend-2.89.0/vendor/bundle/ruby/2.3.0/bundler/gems/puma-9b17499eeb49/.git
ruby/3scale_backend-2.89.0/vendor/bundle/ruby/2.3.0/bundler/gems/puma-9b17499eeb49/examples
ruby/3scale_backend-2.89.0/vendor/bundle/ruby/2.3.0/bundler/gems/resque-88839e71756e/.git
ruby/3scale_backend-2.89.0/vendor/bundle/ruby/2.3.0/bundler/gems/resque-88839e71756e/examples
Git metadata can quickly add up in size. Are these directories good candidates for exclusion?
It might be the case that this suggestion needs to be raised upstream to the respective gem projects.
This extension would be useful for caches, much like hierarchy, to avoid contacting 3scale when they find an application key they don't know anything about.
Without this extension, caches need to contact 3scale even if they know an app is within limits and has been authorized before with a different app key, because the new application key might or might not exist in the database, in which case the request should be rejected. If a user kept calling a cache with different app keys, the cache would be forced to keep contacting 3scale.
With this extension a cache can take the opportunity in which they learn about metric hierarchy to also list the set of accepted application keys. Like the hierarchy, this information would then be periodically retrieved to pick up any updates.
Security-wise there is no privilege boundary crossed, since a cache already has full access to a 3scale account via the Porta APIs.
Up until now, in Porta it was possible to define hierarchies of metrics with only 2 levels. This is going to change to support the "APIs as a product" feature.
For apisonator, we need to ensure that hierarchies of more than 2 levels are supported. This means that we need to check things like:
Some of those things might already work fine, but I think in the past, when implementing some features, we assumed a max of 2 levels.
This is a topic proposed by @unleashed . I'll quote what he said:
The problem arises when we want to report some usage and then perform an authorization. The and then part involves a guarantee: reporting should have been performed before authorization is evaluated.
We currently do not have a repauth endpoint (it is not clear whether that would work well without resorting to OOB jobs), so we could support this flow issuing two different calls as of now, but with a guarantee.
My idea is that apisonator could create a token when reporting, pass it to the caller, pass it to the job, and have the job create a key with a reasonable expiration time once it finishes reporting. This way, we could modify the authorization calls to receive an optional token that would be checked before proceeding with authorization. If the token did exist, the call would keep calm and carry on. Otherwise it would signal the problem and ask the client to try again soon-ish.
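The flow described above could be sketched roughly like this (all names here are made up for illustration; this is not the actual apisonator code):

```ruby
require 'securerandom'

TOKEN_TTL = 60 # seconds; the "reasonable expiration time"

# Listener: create a token when accepting the report and return it to the caller.
def issue_report_token
  SecureRandom.hex(16)
end

# Worker: once the job finishes reporting, persist the token with an expiry.
def mark_report_done(store, token)
  store.setex("report_token:#{token}", TOKEN_TTL, '1')
end

# Authorize: when an optional token is given, only proceed if the report
# job has already flushed it; otherwise ask the client to retry soon-ish.
def authorized_to_proceed?(store, token)
  return true if token.nil?
  store.exists?("report_token:#{token}")
end
```

Here store stands for any Redis-like object responding to setex and exists?.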
In S3, events exported via Kinesis are grouped by hour, and it looks like the RedshiftAdapter crashes when it cannot find a directory (in this case '2017/02/18/11').
Here's an old stacktrace that shows the problem:
irb(main):006:0> ThreeScale::Backend::Stats::RedshiftAdapter.insert_pending_events
Loading events generated in hour: 2017-02-18 11:00:00 UTC
PG::InternalError: ERROR: The specified S3 prefix '2017/02/18/11' does not exist
DETAIL:
-----------------------------------------------
error: The specified S3 prefix '2017/02/18/11' does not exist
code: 8001
context:
query: 634029
location: s3_utility.cpp:568
process: padbmaster [pid=29509]
-----------------------------------------------
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:304:in `exec'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:304:in `execute_command'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:349:in `import_s3_path'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:334:in `save_in_redshift'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:255:in `block in insert_pending_events'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:253:in `each'
from /var/lib/gems/2.2.0/gems/3scale_backend-2.69.0/lib/3scale/backend/stats/redshift_adapter.rb:253:in `insert_pending_events'
from (irb):6
from /usr/bin/irb2.2:11:in `<main>'
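A guard along these lines might avoid the crash: skip hours whose S3 prefix has no objects instead of letting the COPY fail. The bucket, client and method names below are assumptions for illustration, not the actual RedshiftAdapter code (list_objects_v2 is the aws-sdk-s3 call):

```ruby
require 'time'

# S3 prefix ('YYYY/MM/DD/HH') for a given UTC hour, matching the layout above.
def s3_prefix_for(hour)
  hour.utc.strftime('%Y/%m/%d/%H')
end

# Sketch: only attempt the Redshift COPY when the prefix contains objects.
def import_hour(s3_client, bucket, hour)
  prefix = s3_prefix_for(hour)
  listing = s3_client.list_objects_v2(bucket: bucket, prefix: prefix, max_keys: 1)
  if listing.contents.empty?
    warn "Skipping #{prefix}: no events exported for this hour"
    return false
  end
  # ... issue the COPY for this prefix as the adapter does today ...
  true
end
```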
Hi,
I have attempted to start testing the apisonator project, using the instructions.
I am seeing an error when running make test:
$ docker run -ti --rm -h apisonator-test -v /thebounty/work/redhat/3scale/apisonator:$(docker run --rm apisonator-test /bin/bash -c 'cd && pwd')/apisonator:z -u $(docker run --rm apisonator-test /bin/bash -c 'id -u'):$(docker run --rm apisonator-test /bin/bash -c 'id -g') --name apisonator-test apisonator-test
/home/ruby/.bash_rbenv: eval: line 14: syntax error near unexpected token `)'
/home/ruby/.bash_rbenv: eval: line 14: ` )'
ruby 2.2.4p230 (2015-12-16 revision 53155) [x86_64-linux]
/home/ruby/.bash_rbenv: eval: line 14: syntax error near unexpected token `)'
/home/ruby/.bash_rbenv: eval: line 14: ` )'
/home/ruby/.bash_rbenv: eval: line 14: syntax error near unexpected token `)'
/home/ruby/.bash_rbenv: eval: line 14: ` )'
/home/ruby/apisonator/script/lib/functions: line 5: $'\r': command not found
/home/ruby/apisonator/script/lib/functions: line 7: syntax error near unexpected token `$'{\r''
'home/ruby/apisonator/script/lib/functions: line 7: `function daemonize {
/home/ruby/apisonator/script/lib/rbenv/ruby_versions: line 2: $'\r': command not found
/home/ruby/apisonator/script/lib/rbenv/ruby_versions: line 8: syntax error near unexpected token `$'{\r''
'home/ruby/apisonator/script/lib/rbenv/ruby_versions: line 8: `regex_escape() {
script/test: line 46: start_services: command not found
script/test: line 22: bundle_exec: command not found
script/test: line 9: stop_services: command not found
Failed tests in default version
$
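For what it's worth, the $'\r': command not found errors usually mean the scripts were checked out with Windows (CRLF) line endings, e.g. via git's core.autocrlf setting on the host. A quick way to confirm and strip them (sketch, using a throwaway file):

```shell
# Simulate a script with CRLF endings, then detect and strip the CRs.
printf 'echo ok\r\n' > /tmp/crlf-demo.sh
grep -c $'\r' /tmp/crlf-demo.sh          # prints 1: one line carries a CR
sed -i 's/\r$//' /tmp/crlf-demo.sh       # strip trailing CRs in place
grep -c $'\r' /tmp/crlf-demo.sh || true  # prints 0: clean now
# For a real checkout: git config core.autocrlf input, then re-checkout the files.
```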
We needed to temporarily disable stats deletion background jobs because they are inefficient and take too much time to complete.
Here are some numbers that can help us find a more efficient solution:
I also measured the runtime for different operations that we need to perform in this kind of background job. Bear in mind that these are just approximations. There are many factors that can alter these numbers (redis latency, CPU, etc.):
According to the numbers above:
I chose to give the numbers for partitioning by year and month, but we could choose another granularity.
Regarding response codes, I think they can be treated as 8 extra metrics (202, 403, 404, 500, 504, 2xx, 4xx, 5xx).
A possible implementation would be as follows: the job that generates partitions takes into account all the factors (app, metrics, period of time) and generates small jobs that take a reasonable time to generate the subset of keys that they were assigned and delete them.
As an example, we have seen a job that generates around 10M keys. In order to delete all those keys with the approach of splitting the work by 1 service-1 app-1 metric-1 year as mentioned above, we'd need to generate around 10M/9190 = 1088 jobs. Enqueuing those jobs would take 1088/750 = 1.45s. If we partitioned by month instead of by year, we'd need 10M/(9190/12) = 13056 jobs, which would take 17.4s to enqueue.
This approach would be much more efficient than the current one. However, the time to enqueue all the smaller jobs could be an issue.
The enqueue time could be reduced if we enqueued jobs using pipelines, which as far as I know, is not something that the Resque client that we are using supports, but should be doable. Alternatively, we could make those calls in parallel.
We could also try to find different ways to generate keys more efficiently. We could analyze the code that does that with stackprof or similar and see if we can optimize something.
@eguzki and I discussed an alternative approach. It would be a recursive approach. Once a job is executed, it would delete some keys and create a new job that has fewer applications, fewer metrics, a shorter period of time, or some combination of all that. In the end, we'd have a job without apps, metrics, and an interval of 0s. Then, we'd know that we've finished deleting all the keys of the original job. This approach removes the cost of enqueuing a large number of jobs in a single job, because each one just enqueues another one. However, I see two problems with this approach. The first one is that deleting all the keys could take a long time, since it serializes the jobs. The second problem I see is that it might not be so easy to reduce the job into a smaller one. For example, if we wanted to send one application less to the next job, we'd need to delete everything for that app first.
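The partitioning idea can be sketched like this (the names and the month granularity are illustrative, not the actual job code): the generator job emits one small, bounded deletion job per (application, metric, month) tuple.

```ruby
require 'date'

DeleteJob = Struct.new(:app, :metric, :from, :to)

# First day of every month between from and to (inclusive).
def months_between(from, to)
  months = []
  cursor = Date.new(from.year, from.month, 1)
  while cursor <= to
    months << cursor
    cursor = cursor >> 1 # advance one month
  end
  months
end

# The "partition generator" job: emit one bounded deletion job per tuple.
def partition_delete_jobs(apps, metrics, from, to)
  apps.flat_map do |app|
    metrics.flat_map do |metric|
      months_between(from, to).map do |month|
        DeleteJob.new(app, metric, month, (month >> 1) - 1)
      end
    end
  end
end
```

Each emitted job then only has to generate and delete the subset of keys it was assigned.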
Let me know what you think @unleashed , @eguzki .
In some environments, monitoring the queue sizes is done using a Redis exporter. However, that's not available in some setups. For that reason, it'd be nice if the Apisonator workers exposed the queue sizes via Prometheus.
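A minimal sketch of what that could look like, assuming Resque's default Redis key layout (a resque:queues set plus one resque:queue:&lt;name&gt; list per queue) and rendering the Prometheus text exposition format by hand; a real implementation would plug into the workers' existing prometheus-client setup:

```ruby
# Queue name => length, read from Resque's Redis keys.
def queue_sizes(redis)
  redis.smembers('resque:queues').to_h do |q|
    [q, redis.llen("resque:queue:#{q}")]
  end
end

# Render one gauge per queue in the Prometheus text exposition format.
def render_queue_metrics(redis)
  lines = ['# TYPE apisonator_queue_size gauge']
  queue_sizes(redis).each do |queue, size|
    lines << %(apisonator_queue_size{queue="#{queue}"} #{size})
  end
  lines.join("\n") << "\n"
end
```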
On ppc64le, upstream make ci-build
resulted in the following error(s) :
Step 18/63 : RUN sudo runuser -l postgres -c "${POSTGRES_PREFIX}/bin/initdb --pgdata='${POSTGRES_DATA_PREFIX}/data' --auth='trust'" < /dev/null && sudo runuser -l postgres -c "${POSTGRES_PREFIX}/bin/postgres -D '${POSTGRES_DATA_PREFIX}/data'" > /tmp/postgres.log 2>&1 & sleep 5 && sudo runuser -l postgres -c "${POSTGRES_PREFIX}/bin/createdb test" && sudo runuser -l postgres -c "${POSTGRES_PREFIX}/bin/psql test"
---> Running in 9acba8a3c1ee
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
initdb: invalid locale settings; check LANG and LC_* environment variables
createdb: could not connect to database template1: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
Also, twemproxy v0.4.1 cannot be built, as it does not recognize the underlying architecture. This is fixed in v0.5.0:
config/config.guess: unable to guess system type
This script, last modified 2009-04-27, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from
http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
and
http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD
If the version you run (config/config.guess) is already up to date, please
send the following data and any information you think might be
pertinent to <[email protected]> in order to provide the needed
information to handle your system.
config.guess timestamp = 2009-04-27
uname -m = ppc64le
uname -r = 4.18.0-240.10.1.el8_3.ppc64le
uname -s = Linux
uname -v = #1 SMP Mon Jan 18 17:21:08 UTC 2021
/usr/bin/uname -p = ppc64le
/bin/uname -X =
hostinfo =
/bin/universe =
/usr/bin/arch -k =
/bin/arch = ppc64le
/usr/bin/oslevel =
/usr/convex/getsysinfo =
UNAME_MACHINE = ppc64le
UNAME_RELEASE = 4.18.0-240.10.1.el8_3.ppc64le
UNAME_SYSTEM = Linux
UNAME_VERSION = #1 SMP Mon Jan 18 17:21:08 UTC 2021
configure: error: cannot guess build type; you must specify one
configure: error: ./configure failed for contrib/yaml-0.1.4
Hi there! I was using a docker image of Apisonator to avoid reliance on SaaS for integration testing of WASM filters being developed under the GSoC'21 program. I am using internal APIs to initialize service ids, tokens, and applications. Even though all calls are successful and registered by Apisonator and Redis, the authorize endpoint is not able to resolve the service token.
Script to reproduce the error:
echo "Start Redis"
docker run -p 6379:6379 -d --name my-redis redis --databases 2
echo "Start Apisonator"
docker run -e CONFIG_QUEUES_MASTER_NAME=redis://redis:6379/0 \
-e CONFIG_REDIS_PROXY=redis://redis:6379/1 -e CONFIG_INTERNAL_API_USER=root \
-e CONFIG_INTERNAL_API_PASSWORD=root -p 3000:3000 -d --link my-redis:redis \
--name apisonator quay.io/3scale/apisonator 3scale_backend start
echo "Wait for redis and apisonator to launch"
sleep 5
echo "Create a service"
curl -d '{"service":{"id":"my_service_id","state":"active"}}' http://root:[email protected]:3000/internal/services/ | jq '.'
echo "Create a service id and token pair"
curl -d '{"service_tokens":{"my_service_token":{"service_id":"my_service_id"}}}' http://root:[email protected]:3000/internal/service_tokens/ | jq '.'
echo "Add application"
curl -d '{"application":{"service_id":"my_service_id","id":"my_app_id","plan_id":"my_plan_id","state":"active"}}' http://root:[email protected]:3000/internal/services/my_service_id/applications/my_app_id | jq '.'
echo "Check if service exists or not (Should return back service in JSON format)"
curl http://root:[email protected]:3000/internal/services/my_service_id | jq '.'
echo "Check if pair exists or not (should return 200 OK)"
curl --head http://root:[email protected]:3000/internal/service_tokens/my_service_token/my_service_id/
echo "Check pair without head (returns 'not found')"
curl http://root:[email protected]:3000/internal/service_tokens/my_service_token/my_service_id/ | jq '.'
echo "Use Authorize endpoint (returns 'service_token_invalid'):"
curl "http://0.0.0.0:3000/transactions/authorize.xml?service_token=my_service_token&service_id=my_service_id&user_key=my_user_key"
sleep 2
echo "Clean up"
docker rm my-redis -f
docker rm apisonator -f
Apisonator logs:
172.17.0.1 - root [12/Jul/2021 11:50:59 UTC] "POST /internal/services/ HTTP/1.1" 201 169 0.030991 0 0 0 0 2 1 - -
172.17.0.1 - root [12/Jul/2021 11:50:59 UTC] "POST /internal/service_tokens/ HTTP/1.1" 201 20 0.0023496 0 0 0 0 2 1 - -
172.17.0.1 - root [12/Jul/2021 11:50:59 UTC] "POST /internal/services/my_service_id/applications/my_app_id HTTP/1.1" 201 180 0.0138289 0 0 0 1 3 1 - -
172.17.0.1 - root [12/Jul/2021 11:50:59 UTC] "GET /internal/services/my_service_id HTTP/1.1" 200 167 0.0052454 0 0 0 3 5 1 - -
172.17.0.1 - root [12/Jul/2021 11:51:00 UTC] "HEAD /internal/service_tokens/my_service_token/my_service_id/ HTTP/1.1" 200 - 0.0069157 0 0 0 4 6 1 - -
172.17.0.1 - root [12/Jul/2021 11:51:00 UTC] "GET /internal/service_tokens/my_service_token/my_service_id/ HTTP/1.1" 404 42 0.002605 0 0 0 4 6 1 - -
172.17.0.1 - - [12/Jul/2021 11:51:00 UTC] "GET /transactions/authorize.xml?service_token=my_service_token&service_id=my_service_id&user_key=my_user_key HTTP/1.1" 403 - 0.0026236 0 0 0 6 10 3 - -
Redis keys dump (using docker exec -it my-redis redis-cli; select 1; keys *):
1) "application/service_id:my_service_id/id:my_app_id/state"
2) "service/id:my_service_id/state"
3) "service/id:my_service_id/referrer_filters_required"
4) "services_set"
5) "application/service_id:my_service_id/id:my_app_id/plan_id"
6) "service_id:my_service_id/applications"
7) "service/provider_key:/ids"
8) "service_token/token:my_service_token/service_id:my_service_id"
9) "application/service_id:my_service_id/id:my_app_id/user_required"
10) "service/id:my_service_id/provider_key"
11) "provider_keys_set"
Checking for the service token pair with '--head' makes sense because there is a path listed for it and not for GET, but @unleashed asked me to mention it in this issue.
apisonator/app/api/internal/service_tokens.rb
Lines 6 to 8 in ea23574
I am not sure why the authorize endpoint is not able to resolve the service token, which is required for the integration tests for the wasm-filters.
Please let me know if I missed anything, thanks!
Update: I mistakenly added the provider key into the JSON data sent for creating a service, and the error message changed from "invalid service token" to "user key missing/not found" (which makes sense, as I haven't initialized any user key). So I think a more relevant error should be used (pertaining to the provider key); OR, @unleashed mentioned that the provider key is deprecated, so maybe there shouldn't be any reliance on it?
This is just a reminder to delete 83c822c after we upgrade the redis-rb gem to >= 4.1.2
We learnt in #280 that Porta is storing OIDC apps' client_secrets as app_keys, and that has caused confusion as to how to deal with OIDC in the 3scale Istio Adapter, as specifying the client_secret as an app_key while using the auth*.xml endpoints ends up successfully authorizing requests.
This issue should be resolved when we know why this is being done and whether we should remove/not allow these keys to be stored for such apps, and consequently, whether a request for an OIDC service specifying an app_key parameter should be checked against the registered app_keys that we have in our data store.
/cc @davidor
This validator might no longer be necessary. This issue should be closed whenever we find out whether it is needed as of the latest 3scale versions: it might be the case that the admin UI (Porta) no longer allows people to set these.
This is so because the only OAuth apps users should be creating now should be using OpenID Connect, which handles redirects in proxies way before 3scale gets to authorize/report.
Arising from #280 (comment).
/cc @davidor
Got this exception while running v3.2.1:
lib/3scale/backend/stats/aggregator.rb:148:in `block in update_alerts': undefined method `load_metric_names' for nil:NilClass (NoMethodError)
from lib/3scale/backend/stats/aggregator.rb:143:in `each'
from lib/3scale/backend/stats/aggregator.rb:143:in `update_alerts'
from lib/3scale/backend/stats/aggregator.rb:62:in `process'
from lib/3scale/backend/transactor/process_job.rb:16:in `perform'
from lib/3scale/backend/transactor/report_job.rb:15:in `perform_logged'
from lib/3scale/backend/background_job.rb:33:in `perform_wrapper'
from lib/3scale/backend/background_job.rb:12:in `perform'
from bundler/gems/resque-88839e71756e/lib/resque/job.rb:168:in `perform'
from lib/3scale/backend/worker.rb:69:in `perform'
from lib/3scale/backend/worker_sync.rb:23:in `block in work'
from lib/3scale/backend/worker_sync.rb:19:in `loop'
from lib/3scale/backend/worker_sync.rb:19:in `work'
from lib/3scale/backend/worker.rb:47:in `work'
from bin/3scale_backend_worker:25:in `block in <top (required)>'
from gems/daemons-1.2.4/lib/daemons/application.rb:266:in `block in start_proc'
from gems/daemons-1.2.4/lib/daemons/daemonize.rb:84:in `call_as_daemon'
from gems/daemons-1.2.4/lib/daemons/application.rb:270:in `start_proc'
from gems/daemons-1.2.4/lib/daemons/application.rb:296:in `start'
from gems/daemons-1.2.4/lib/daemons/controller.rb:56:in `run'
from gems/daemons-1.2.4/lib/daemons.rb:197:in `block in run_proc'
from gems/daemons-1.2.4/lib/daemons/cmdline.rb:92:in `catch_exceptions'
from gems/daemons-1.2.4/lib/daemons.rb:196:in `run_proc'
from bin/3scale_backend_worker:24:in `<top (required)>'
from /usr/local/bin/3scale_backend_worker:22:in `load'
from /usr/local/bin/3scale_backend_worker:22:in `<main>'
I've only seen this once and the job was successful after rescheduling it, so this is probably a corner case that triggers very infrequently.
The current README file is heavily oriented towards developers / contributors, but it barely has information on how to deploy and use Apisonator. We can split both aspects and have a README that is more oriented towards new users installing and using the software. We also have broken links to documentation in the old 3scale website.
I think it would be good to have a look at it from the point of view of a user wanting to install and start using it (even if we suppose they are not acquainted with the rest of the 3scale platform or just want to test or push data to the Redis database).
The usage of Date._parse when taking the input of the timestamp field of transactions is insufficient to validate a date. In particular, it's been discovered that some strings with a specific number of digits are considered dates by the affected code.
Note that the documentation only talks about a specific format for dates in the timestamp field, so for example we might want to consider changing this so that proper validation happens as specified in the docs.
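A quick illustration of the problem and one possible stricter check. The '%Y-%m-%d %H:%M:%S %z' format below is an assumption based on the timestamps shown elsewhere on this page, not necessarily the exact documented one:

```ruby
require 'date'
require 'time'

# Date._parse guesses aggressively: an 8-digit string is accepted as YYYYMMDD
# even though it matches no documented timestamp format.
Date._parse('20180901') # parsed as 2018-09-01, a valid-looking date

# Stricter sketch: require one explicit format and reject everything else.
def valid_timestamp?(str)
  Time.strptime(str, '%Y-%m-%d %H:%M:%S %z')
  true
rescue ArgumentError
  false
end

valid_timestamp?('2018-09-01 14:44:00 +0000') # => true
valid_timestamp?('20180901')                  # => false
```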
It'd be great to define stats endpoints in the internal API so Porta does not have to access the Redis DB directly.
It would be good to have Prometheus metrics for the Internal API endpoints.
The implementation would be similar to what we already have for the auth and report endpoints.
We should delete the SAAS env. There are no differences any more.
In the process, we should also unify all the dependencies under a single Gemfile and reorganize the Rake tasks to stop depending on the env.
Unfortunately, the list of documented extensions does not include the hierarchy extension.
This extension is useful for caching purposes, and could be extended or used in conjunction with existing or new extensions to make caching more effective.
Caching agents essentially need to replicate some functionality in Apisonator. Because these agents need to take metric hierarchies into account for correctly applying limits, they need to keep track of the current counters for them. When reporting, these agents need to figure out what delta to report, that is, how much a metric was consumed.
However, this is problematic in the case of metrics that have a parent (or in newer releases even grandparents and so on), because they need to walk over the hierarchy carefully to take into account that whatever they report in a child metric will be added to the parent, grandparent, etc.
They need to compute this because Apisonator will add to parents. This issue is a new feature request to help these agents avoid doing work that is only meant to be undone by Apisonator. That is, agents will keep a delta for each and all metrics involved in a given reporting period, but they will need to recompute the deltas with hierarchy data when "flushing" this information: decreasing the delta values for parents by their children's deltas, only for Apisonator to take that and increase the parents' deltas by their children's.
So this feature would be adding an extension for reporting (authrep and report endpoints) in which hierarchy computation would be turned off. That is, the calling agent tells us to trust them to have correctly computed the deltas based on the current metric hierarchy for a given service.
This benefits both the agent and Apisonator: the agent won't need to touch the deltas, it will just report hits for all involved metrics, and Apisonator won't have to apply the reported values to parents (which is effectively undoing the work made by the agent).
This implies work in the aggregator so that if this extension is enabled for the reporting request we take the specified metrics values and just add one by one to our current values without any further processing.
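A toy example of the work being done and undone today (the hierarchy and numbers are made up):

```ruby
# child metric => parent metric
HIERARCHY = { 'get_items' => 'hits' }.freeze

# Today the cache must subtract children from parents before reporting,
# because Apisonator will re-add them when it applies the hierarchy.
def strip_parents(deltas, hierarchy)
  out = deltas.dup
  hierarchy.each { |child, parent| out[parent] -= out[child] if out.key?(parent) }
  out
end

# What a cache tracks locally during a reporting period (parents included):
deltas = { 'hits' => 3, 'get_items' => 3 }

reported = strip_parents(deltas, HIERARCHY)
# reported == { 'hits' => 0, 'get_items' => 3 }; Apisonator then computes
# hits = 0 + 3, undoing the subtraction. With the proposed extension the
# cache would send { 'hits' => 3, 'get_items' => 3 } and Apisonator would
# add each value as-is.
```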
This is just a list of things to consider when we decide to expose new APIs for authorizing and reporting. It applies to other APIs that we might want to expose for example to improve caching done by external systems. In no particular order, some things that we've discussed in the past:
For each processed job, the workers log some information like the process time, the total time (process time + time in the queue), etc.
The second is not correct. The reason is that the enqueue time is stored in Redis by a listener when it enqueues a job:
apisonator/lib/3scale/backend/transactor.rb
Line 204 in 18edf5c
Whereas the total time is calculated by the worker when it finishes processing a job.
That's the reason why sometimes we get inconsistent numbers in the logs. For example, sometimes we see that the process time is greater than the process time plus the time in the queue. In many deployments, those two timestamps always come from different machines (listeners vs. background workers), so comparing them is not reliable.
We can't do this yet, but when possible, we should drop support for Ruby 2.4.
After that, we'll be able to update some gems like the async ones.
When a service is disabled, all the applications under that service should stop being authorized.
A service can be enabled back, and its apps start authorizing again.
Only public API endpoints should be affected by the service state. The Internal API should work regardless of the service state.
The XML reporter has been upstreamed so we no longer need to maintain an extra dependency and can finally also update.
See this PR for details.
These are some ideas that we have discussed in the past to improve the processing of background jobs:
The async-redis client does not support Redis logical databases. That means that it does not work properly in environments that have, for example, the main db in redis://redis-backend:6379/0 and the queues in redis://redis-backend:6379/1: https://github.com/3scale/3scale-operator/blob/master/pkg/3scale/amp/auto-generated-templates/amp/amp.yml
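The logical database is just the URL path, so a wrapper could recover it and issue a SELECT after connecting. The call-style client API shown in the comment is an assumption about async-redis, not a documented method:

```ruby
require 'uri'

# Extract the logical database index from a redis:// URL (0 when absent).
def redis_db_index(url)
  path = URI.parse(url).path
  path.nil? || path.empty? || path == '/' ? 0 : Integer(path.delete_prefix('/'))
end

redis_db_index('redis://redis-backend:6379/1') # => 1
redis_db_index('redis://redis-backend:6379')   # => 0

# After connecting, a client exposing raw commands would then need:
#   client.call('SELECT', redis_db_index(url))
```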
We should group the jobs and use Redis pipelining here:
Currently there is no way to define a severity level for logging apisonator events.
Customers with large numbers of requests can find that they are using a lot of disk storage for daily logs at the current log level.
Workers' logs can be redirected to /dev/null by setting CONFIG_WORKERS_LOG_FILE, but this means that no logs are sent at all, and INFO events are currently logged for every report job.
It would be very useful to be able to set the log level with a variable, e.g. rails_log_level. INFO, WARNING and ERROR severity levels should be enough for most customers, although we may want to think about adding a DEBUG level as well, depending on the component.
Twemproxy is no longer maintained and Envoy can act as a Redis proxy: https://www.envoyproxy.io/docs/envoy/v1.13.0/api-v2/config/filter/network/redis_proxy/v2/redis_proxy.proto
I tried to pass the test container by simply stopping Twemproxy and starting an Envoy with an equivalent config:
static_resources:
listeners:
- name: redis_listener
address:
socket_address:
address: 0.0.0.0
port_value: 22121
filter_chains:
- filters:
- name: envoy.redis_proxy
typed_config:
"@type": type.googleapis.com/envoy.config.filter.network.redis_proxy.v2.RedisProxy
stat_prefix: egress_redis
settings:
op_timeout: 5s
enable_hashtagging: true
prefix_routes:
catch_all_route:
cluster: redis_cluster
clusters:
- name: redis_cluster
connect_timeout: 1s
type: strict_dns # static
lb_policy: RING_HASH
load_assignment:
cluster_name: redis_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 7379
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 7380
admin:
access_log_path: "/dev/null"
address:
socket_address:
address: 0.0.0.0
port_value: 8001
All the tests pass except some in the BucketStorage and BucketReader classes. They fail because they use the SUNION command, which is not supported in Envoy. We could replace those SUNIONs with SMEMBERS and perform the union in Ruby. It would probably be less efficient.
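The replacement would be a client-side union along these lines (a sketch; it trades one SUNION for N SMEMBERS round trips plus the transfer of each full set):

```ruby
# Union of several Redis sets computed client-side, for proxies that do not
# support multi-key commands such as SUNION.
def union_of_sets(redis, keys)
  keys.reduce([]) { |acc, key| acc | redis.smembers(key) }
end
```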
Recently we introduced the ability to remove a default service when it's the only one that a provider has (#51).
I think we should evaluate if it makes sense to remove the concept of a default service from the apisonator codebase. I believe the responsibility of deciding whether a service can be removed or not should be only in https://github.com/3scale/porta
Puma already has Prometheus metrics (added in #174). We need to do the same for Falcon.
Currently, the delete stats endpoint
DELETE /internal/services/#{service_id}/stats
accepts the following request body:
{
"deletejobdef": {
"applications": ["1"],
"metrics": ["5"],
"from": 1483228800,
"to": 1483228800,
"users": []
}
}
The list of end users (users) exists only in the apisonator database. The Porta client does not have the list of end users. Moreover, Apisonator does not have an internal endpoint to get the list of end users.
So, the requested change is: remove users from the request body, read ThreeScale::Backend::User.service_users_set_key(service_id) when the delete stats endpoint is called, and pass the list to the stats key generator.
When there's an exception, Falcon returns its description in the response body. Puma only does that in development mode. I opened an issue in Falcon to fix this: socketry/falcon#98
Issue reported by @unleashed:
backend requires further activity to "flush" alerts to system.
The whole alerts code (event_storage.rb in particular) is rather low quality. We should improve the design (avoiding the problem above) and fix the awful code that handles the webhook. A Webhook class should be created (ideally one that can also handle custom Host headers) and used, hopefully without re-creating the client each time, with configurable timeouts and proper error handling.
The "proper error handling" part means avoiding horrible constructs like rescue => e; notify(); end, which caused NoMethodError, NameError and LoadError, among others, to inflict hours of pain.
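A rough sketch of such a Webhook class, built on stdlib Net::HTTP. The class name, error class, and defaults are hypothetical, not apisonator's actual design; the point is the custom Host header, configurable timeouts, and rescuing only known transient errors instead of everything.

```ruby
require 'net/http'
require 'uri'

class WebhookError < StandardError; end

class Webhook
  # Only rescue errors we expect from the network; anything else
  # (NameError, LoadError, ...) must propagate and be visible.
  TRANSIENT_ERRORS = [Timeout::Error, Errno::ECONNREFUSED,
                      Errno::ECONNRESET, SocketError].freeze

  def initialize(url, host_header: nil, open_timeout: 2, read_timeout: 5)
    @uri = URI(url)
    @host_header = host_header
    @open_timeout = open_timeout
    @read_timeout = read_timeout
  end

  def post(body)
    http = Net::HTTP.new(@uri.host, @uri.port)
    http.use_ssl = @uri.scheme == 'https'
    http.open_timeout = @open_timeout
    http.read_timeout = @read_timeout

    req = Net::HTTP::Post.new(@uri.request_uri,
                              'Content-Type' => 'application/json')
    req['Host'] = @host_header if @host_header  # custom Host support
    req.body = body
    http.request(req)
  rescue *TRANSIENT_ERRORS => e
    # Surface the failure to the caller instead of swallowing it.
    raise WebhookError, "webhook delivery failed: #{e.class}: #{e.message}"
  end
end

hook = Webhook.new('http://example.com/hook', host_header: 'api.example')
```

A single instance could be kept per webhook URL so the configuration is not rebuilt on every alert.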
Hello Team,
While trying to validate the 3scale/apisonator repository, I'm hitting an issue when executing the following command:
Command
make DOCKER_OPTS="-e TEST_ALL_RUBIES=1" test
Issue
[ruby@b650a9e48ae7 bin]$ bundle_install_rubies
Switching to 2.7.4
Latest version already installed. Done.
Bundling Gemfile on ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [powerpc64le-linux] with Bundler 2.2.26
[!] There was an error parsing `Gemfile`:
[!] There was an error while loading `apisonator.gemspec`: cannot load such file -- 3scale/backend/version. Bundler cannot continue.
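The error suggests that lib/ is not on the load path when apisonator.gemspec tries to require the version file. Below is a self-contained reproduction of the usual fix pattern (the version file contents and the temporary directory layout are made up; the real apisonator.gemspec may differ):

```ruby
require 'tmpdir'
require 'fileutils'

# Recreate the gem layout: lib/3scale/backend/version.rb holds VERSION.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'lib/3scale/backend'))
  File.write(
    File.join(root, 'lib/3scale/backend/version.rb'),
    "module ThreeScale; module Backend; VERSION = '0.0.0'; end; end\n"
  )

  # The fix: put lib/ on $LOAD_PATH *before* requiring the version file,
  # which is what a gemspec preamble typically does.
  lib = File.join(root, 'lib')
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require '3scale/backend/version'
end

ThreeScale::Backend::VERSION  # => "0.0.0"
```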
I don't think it's used anywhere. We probably forgot to delete it when we removed the "end-users" feature.
We are using our own fork (https://github.com/3scale/puma) that solves a performance issue. That problem has been solved in upstream, so we should try to update.
Right now there's no way to delete the default service of a provider. It probably makes sense to allow deleting it when it's the only one left.
Apisonator is not compatible with the Redis cluster mode.
I did not try all the Apisonator functionality, I just tried to make a basic test work. The problems can be summarized in two groups:
- Methods that run several commands in a pipeline or transaction over keys that may belong to different shards: Alerts.update_utilization, Application.save, EventStorage.pending_ping?, Metric.save, etc.
- Methods that use commands operating on multiple keys that may belong to different shards: mget, blpop, hmget, etc. Examples: Application.load, Service.get_service, some methods in UsageLimit, etc. Also the blpop used to fetch jobs from the queue.
I don't see a straightforward solution for those problems.
Modifying all the keys to ensure that in every operation they belong to the same shard might be complicated. Migrating a running system might be very challenging.
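One mechanism Redis Cluster offers for the key-modification approach is hash tags: only the substring inside the first {...} pair of a key is hashed, so keys sharing a tag map to the same slot. The key scheme below is hypothetical, not apisonator's real layout; it only illustrates the idea.

```ruby
# Redis Cluster hashes the substring inside the first {...} pair
# ("hash tag"), so keys sharing a tag land on the same shard.
service_id = '42'

keys = [
  "stats/{service:#{service_id}}/metric:5/day:20200101",
  "alerts/{service:#{service_id}}/current"
]

# Both keys carry the same tag, so a cluster-aware client could run a
# MULTI/EXEC or pipeline touching both against a single node.
tags = keys.map { |k| k[/\{.*?\}/] }.uniq
tags  # => ["{service:42}"]
```

This is exactly why the migration is hard: every existing key that participates in a multi-key operation would need to be renamed into such a scheme.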
The alternative of replacing all the mgets with multiple gets, removing the pipelines, etc. is not feasible: the performance hit would be too high.
In the error responses document, it suggests that backend will return a 409 when rate limits have been exceeded, which is the behaviour I am observing. However, it also suggests that the error_code tag in the XML response will be populated with the code limits_exceeded, which I have not been able to observe.
This is the response I am seeing when an imposed limit has been breached:
<?xml version="1.0" encoding="UTF-8"?>
<status>
<authorized>false</authorized>
<reason>usage limits are exceeded</reason>
<plan>app-plan</plan>
<usage_reports>
<usage_report metric="hits" period="minute">
<period_start>2020-01-06 14:52:00 +0000</period_start>
<period_end>2020-01-06 14:53:00 +0000</period_end>
<max_value>2</max_value>
<current_value>2</current_value>
</usage_report>
</usage_reports>
</status>
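The missing tag can be checked mechanically. A minimal sketch with stdlib REXML against a trimmed copy of the response above:

```ruby
require 'rexml/document'

xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <status>
    <authorized>false</authorized>
    <reason>usage limits are exceeded</reason>
  </status>
XML

doc = REXML::Document.new(xml)
doc.elements['status/authorized'].text  # => "false"
doc.elements['status/error_code']       # => nil: the documented tag is absent
```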
The limit headers return the remaining quota and the reset time based on the most constrained limit that applies.
When the request is rate-limited, these headers return incorrect information.
I tried defining a metric with 2 limits:
After making 6 requests in a minute, I would expect to see a reset time close to 60 seconds. However, it returns the values based on the yearly limit.
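A sketch of the behaviour the headers should follow, with made-up limit values matching the scenario above (a minute limit of 5 and a generous yearly limit). The Limit struct and its fields are hypothetical, not apisonator's internals.

```ruby
Limit = Struct.new(:period_seconds, :max, :current) do
  def remaining
    [max - current, 0].max
  end
end

limits = [
  Limit.new(60, 5, 6),                 # per-minute limit: exceeded
  Limit.new(365 * 24 * 3600, 1000, 6)  # per-year limit: plenty left
]

# The headers should be driven by the most constrained limit...
most_constrained = limits.min_by(&:remaining)

# ...so the reported reset time should be bounded by its period (60 s),
# not by the yearly one.
most_constrained.period_seconds  # => 60
```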