Comments (9)
Thanks. We also found another issue which exactly coincides with your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.
from emqx.
Thanks for reporting this. It looks like the timeout happens when our HTTP client times out while doing the request. Can you send your configuration for the HTTP authentication? In particular, it would be interesting to see the value of the request_timeout option. Is it possible that your HTTP server gets slower during certain time periods, and that this causes the timeouts?
from emqx.
I have seen such behaviour once before: if the HTTP server silently drops an HTTP request (e.g. due to a rate limit) without responding with an error code, and does not close the connection either, the HTTP client (at the HTTP layer) will wait indefinitely for an HTTP response or a socket close. This is just guesswork, however; it would be nice if you, @thehellmaker, could look at the server side (logs maybe) to verify my guess.
Nonetheless, we plan to do something at the application layer: reconnect if a timeout happens.
NOTE: The application layer did not do retries or reconnects because in most cases the cause is server overload, and a retry or reconnect is likely to increase the load even more. So the coming fix will have to be very careful when picking the defaults.
from emqx.
@kjellwinblad Here is the authn config in emqx.conf. request_timeout is at its default, 15 I presume, as I do not provide it.
Note that the load on the server is <20% CPU utilization at P99, so I do not think the server is slowing down. However, the server is at times unavailable during our non-blue-green deployments.
We have 2 different listeners: one uses the authn below and the other uses mTLS certificate verification. The listener with username-based HTTP auth, shown below, is what our mobile applications connect to, and the mTLS-based listener is what our devices connect to. While the username/password HTTP-based auth suffers from this issue, the mTLS-based devices are absolutely fine and are able to connect. I can also confirm that both my application cluster and EMQX are running on the same machine, so network partitions/connectivity issues are not a possibility. I think what @zmstone described might also be possible, as this issue builds up slowly: the count of timeout occurrences grows gradually until it happens to all requests. So it seems the connections in the pool slowly get into an inconsistent state, for some reason I don't know yet.
authentication = [
  {
    mechanism = password_based
    backend = http
    enable = true
    method = post
    url = "***",
    body {
      clientid = "${clientid}"
      username = "${username}"
      password = "${password}"
      peerhost = "${peerhost}"
      cert_subject = "${cert_subject}"
      cert_common_name = "${cert_common_name}"
    }
    headers {
      "Content-Type" = "application/json"
      "X-Request-Source" = "EMQX"
    }
  }
]
The default pool size is 8, so if more than 8 requests come in at the same time they should get pipelined. However, that can also time out requests if some of them are starved regularly. Our mobile applications, which connect to this listener, retry indefinitely on this failure. Initially a connection request fails once in a while; after some time the first 2 attempts fail almost regularly before it connects; then it increases to 5 reconnects before it connects; and finally all reconnects start failing. I have now changed the config to the one below, which has an increased pool_size parameter and stricter timeouts, and I am trying it out.
authentication = [
  {
    mechanism = password_based
    backend = http
    enable = true
    method = post
    url = "***",
    pool_size = 24
    enable_pipelining = 100
    connect_timeout = 10
    request_timeout = 5
    body {
      clientid = "${clientid}"
      username = "${username}"
      password = "${password}"
      peerhost = "${peerhost}"
      cert_subject = "${cert_subject}"
      cert_common_name = "${cert_common_name}"
    }
    headers {
      "Content-Type" = "application/json"
      "X-Request-Source" = "EMQX"
    }
  }
]
@zmstone EMQX is only giving these logs, and my entire application is running just fine with the other devices connecting to the mTLS listener.
I have 2 options to confirm your hypothesis:
- Stress test with a lot of simultaneous connect requests so that the server is overloaded, to simulate this scenario.
- Simulate it by having the HTTP auth API not respond to any authn requests.
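The second option could be sketched roughly as below. This is only an illustration using Python's standard library, not the actual auth service from this thread: the handler name, the `BLACK_HOLE` and `RELEASE` switches, and the `{"result": "allow"}` reply body are assumptions made for the example. While `BLACK_HOLE` is set, the server reads the request and then holds the socket open without writing anything, which is the behaviour the hypothesis describes.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical switches for the experiment (not part of any real service).
BLACK_HOLE = threading.Event()  # set -> accept requests but never answer them
RELEASE = threading.Event()     # set -> let parked handlers return (cleanup only)


class BlackHoleAuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        if BLACK_HOLE.is_set():
            RELEASE.wait()  # park: the socket stays open, nothing is written
            return
        # Assumed reply shape for a successful authentication.
        body = json.dumps({"result": "allow"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the experiment output quiet


def start_server(port=0):
    """Start the test server in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), BlackHoleAuthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing the authn `url` at this server and toggling `BLACK_HOLE` during a run of connect attempts should show whether the timeouts accumulate the same way as in production.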
from emqx.
@zmstone it looks like your hunch was right. This happens when the HTTP server is unable to respond. Our deployments are not blue-green right now, so the entire HTTP service is unavailable during a deployment, and during that time the HTTP server is unable to respond. We have been able to verify that the more deployments we do, the progressively worse this issue gets.
from emqx.
Thank you for the confirmation.
I wonder why the server doesn't reply with error codes such as 503, or disconnect.
from emqx.
I am not super sure, but what I can confirm is that there are always MQTT connections and new requests coming in consistently, so there is a possibility that a request came in right before the deployment started and the server was stopped after the handshake.
Can we introduce client-side timeout configurations for the HTTP pool, so that clients can configure them accordingly, and if the server doesn't return a response, the request times out and the connection is returned to the pool?
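To make the request concrete, here is a minimal sketch of the idea in Python. It is not how the EMQX HTTP client pool works; `TimeoutPool`, its parameters, and the `/auth` path used below are hypothetical names for illustration. Each request has a timeout, and a connection that times out or errors is closed and replaced rather than handed back in an unknown state.

```python
import http.client
import queue


class TimeoutPool:
    """Tiny connection pool: a connection that fails or times out is replaced."""

    def __init__(self, host, port, size, request_timeout_s):
        self._host, self._port = host, port
        self._timeout = request_timeout_s
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(self._new_conn())

    def _new_conn(self):
        # http.client connects lazily, on the first request.
        return http.client.HTTPConnection(
            self._host, self._port, timeout=self._timeout
        )

    def post(self, path, body):
        """Return the HTTP status, or None if the server did not answer in time."""
        conn = self._idle.get()
        try:
            conn.request(
                "POST", path, body=body,
                headers={"Content-Type": "application/json"},
            )
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            return resp.status
        except (OSError, http.client.HTTPException):
            # Covers socket timeouts (a subclass of OSError) and broken replies.
            conn.close()
            conn = self._new_conn()  # never put a stuck connection back
            return None
        finally:
            self._idle.put(conn)
```

With something like `TimeoutPool("127.0.0.1", 8080, size=8, request_timeout_s=5)`, a silent server costs each caller at most 5 seconds, and the pool keeps its full size with fresh connections afterwards. Whether to retry after a `None` result is a separate decision, given the overload concern raised earlier in the thread.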
from emqx.
Yeah, sure. I will work on a patch. It will be in 5.7.1 or 5.8.0.
from emqx.
Thanks. We also found another issue which exactly coincides with your hypothesis. Our HTTP API implementation has a bug: if any exception is thrown, our server does not return a response and the client waits forever. We are fixing this issue.
This is maybe the only cause. Or fixing it should at least buy some time before we release the enhancement.
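The server stack behind the auth API is not stated in this thread, so the following is only a generic sketch of the fix pattern in Python: compute the outcome inside a `try`, and always write some response, using a 500 when the handler raises. `check_credentials`, the demo credentials, and the reply bodies are hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler


def check_credentials(payload):
    """Hypothetical business logic; raises KeyError on a malformed payload."""
    return payload["username"] == "demo" and payload["password"] == "secret"


class SafeAuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            result = "allow" if check_credentials(payload) else "deny"
            status, reply = 200, {"result": result}
        except Exception:
            # Never leave the caller waiting: report the failure explicitly.
            status, reply = 500, {"error": "internal error"}
        self._reply(status, reply)

    def _reply(self, status, obj):
        body = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch
```

The point of the pattern is that the client sees an explicit error status and can fail the authentication quickly, instead of holding a pooled connection until its own timeout fires.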
from emqx.