Comments (39)
Still not able to reproduce with emqx on the host, nor with emqx in Docker under a memory limit.
React-less Node.js code to simplify troubleshooting:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";
//process.env.DEBUG = "*";
process.env.NODE_ENV = "dev";
const mqtt = require("mqtt");
const mqttClient = mqtt.connect({
  port: 8084,
  path: '/mqtt',
  clientId: 'test',
  username: 'test',
  password: 'test',
  protocol: 'wss',
  hostname: '127.0.0.1',
  keepalive: 120,
  clean: false,
  connectTimeout: 1000,
  reconnectPeriod: 0,
})
mqttClient.on('packetsend', (packet) => {
  console.log(packet)
})
mqttClient.on('connect', () => {
  // Change the last number to vary the number of megabytes in the payload (roughly accurate).
  const payload = JSON.stringify({ field: 'x'.repeat(1000 * 1000 * 90) })
  mqttClient.publish('t/test', payload)
})
npm install mqtt --save
node ./test.js
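To watch the broker's memory while the script runs, one option for the Docker case is a one-shot stats reading (the container name emqx is a placeholder; adjust to your setup):
docker stats --no-stream emqx   # one-shot CPU/memory reading for the container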
I presume that I should run the command in the resting state?
Here are the files for both pods:
alloc-emqx-cluster-core-5854b66996-0.txt
alloc-emqx-cluster-core-5854b66996-1.txt
Tried with vanilla emqx 5.3.2 installed via emqx-operator on GKE and the test script above; could not reproduce. No OOM and no memory usage spike.
gcloud container clusters create emqx
gcloud container clusters get-credentials emqx
helm repo add jetstack https://charts.jetstack.io
helm repo add emqx https://repos.emqx.io/charts
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true
helm upgrade --install emqx-operator emqx/emqx-operator --namespace emqx-operator-system --create-namespace
kubectl wait --for=condition=Ready pods -l "control-plane=controller-manager" -n emqx-operator-system
kubectl create namespace emqx
kubectl apply -f emqx.yaml
kubectl -n emqx wait --for=condition=Ready emqx emqx --timeout=120s
kubectl -n emqx get svc
emqx.yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
  namespace: emqx
spec:
  image: emqx/emqx:5.3.2
  coreTemplate:
    spec:
      replicas: 3
      volumeClaimTemplates:
        resources:
          requests:
            storage: 10Gi
        accessModes:
          - ReadWriteOnce
  listenersServiceTemplate:
    metadata:
      annotations:
        cloud.google.com/l4-rbs: "enabled"
    spec:
      type: LoadBalancer
  dashboardServiceTemplate:
    spec:
      type: LoadBalancer
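For reference, the address for the test script can then be read off the listeners LoadBalancer service; the service name emqx-listeners is an assumption about the operator's naming, so confirm it against the get svc output above:
HOST=$(kubectl -n emqx get svc emqx-listeners -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# then set hostname in the test script to $HOST before running it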
@bernard-bear glad to know, thanks!
A larger driver buffer puts less stress on the memory allocator: it creates fewer blocks of larger size, which means less fragmentation (I think this is the main reason) and fewer context switches; basically, it reduces the work of the memory allocator.
To be clear, these are just memory spikes, not sustained memory usage; the node reclaims and defragments the memory once it has the resources to do so.
When you want to limit the memory usage, consider setting the limit to about 1.3 × the memory spike measured in testing.
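As a rough illustration of the buffering effect (a simplified model that assumes one allocation per buffer-sized read of the payload):
echo $(( 100*1024*1024 / 8192 ))    # a 100 MB payload through an 8 KB buffer: ~12800 reads/allocations
echo $(( 100*1024*1024 / 8388608 )) # the same payload through an 8 MB buffer: ~13 reads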
Hi @bernard-bear, thanks for the report. I could not reproduce this locally, will try on GKE.
How many subscribers to the topic were there? Do you have any rule engine rules?
If the broker has to fan out a message to N subscribers, it creates N copies of the message (for example, a 90MB payload fanned out to 10 subscribers means roughly 900MB in message copies alone).
The behaviour described above happens even if the max packet size is set to a low value (e.g. 1mb)
@bernard-bear frame_too_large should be logged if the mqtt.max_packet_size limit is hit. Could you check whether you observe such logs after setting this limit to 1MB and sending a 2MB or 10MB message?
If this limit indeed works, then we can significantly cut the investigation scope.
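A minimal sketch of that check from a shell, assuming the mosquitto clients are installed, a plain TCP listener is reachable on 1883, and the test credentials from the script above (the pod name is an example):
head -c 2097152 /dev/zero > payload.bin   # 2 MB payload
mosquitto_pub -h 127.0.0.1 -p 1883 -u test -P test -t t/test -f payload.bin
kubectl -n emqx logs emqx-cluster-core-5854b66996-0 | grep frame_too_large   # expect a hit with a 1MB limit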
@bernard-bear noticed that you have debug level logging.
Could you try to test it at info level?
How many subscribers to the topic were there?
@ieQu1 The client was the only client, so there were no subscribers to the topic.
Do you have any rule engine rules?
@ieQu1 Nope.
frame_too_large
@zmstone After sending a 2MB message with a 1MB limit, I do not observe any frame_too_large via log trace on my client id.
These are the logs (info level):
2024-01-19T03:05:32.784021+00:00 [MQTT] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=bernard@<redacted>, ProtoName=MQTT, ProtoVsn=4, CleanStart=false, KeepAlive=120, Username=bernard@<redacted>, Password=******)
2024-01-19T03:05:32.784264+00:00 [AUTHN] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: jwt_verify_error, jwk: {jose_jwk,undefined,{jose_jwk_kty_rsa,{'RSAPublicKey',<redacted>}},#{}}, jwt: <redacted>, provider: emqx_authn_jwt, reason: {badarg,[<<"<redacted>">>]}
2024-01-19T03:05:32.784399+00:00 [AUTHN] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: invalid_jwt_signature, jwks: [{jose_jwk,undefined,{jose_jwk_kty_rsa,{'RSAPublicKey',<redacted>}},#{}}], jwt: <redacted>, provider: emqx_authn_jwt
2024-01-19T03:05:32.784490+00:00 [AUTHN] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: authenticator_result, authenticator: jwt, result: ignore
2024-01-19T03:05:32.784583+00:00 [AUTHN] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: authenticator_result, authenticator: password_based:built_in_database, result: {ok,#{is_superuser => false}}
2024-01-19T03:05:32.784620+00:00 [AUTHN] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: authentication_result, reason: chain_result, result: {stop,{ok,#{is_superuser => false}}}
2024-01-19T03:05:32.786528+00:00 [WS-MQTT] bernard@<redacted>@XXX.XXX.XXX.XXX:XXXXX msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)
<MQTT client gets disconnected at this point, and subsequently reconnects>
2024-01-19T03:05:49.691282+00:00 [MQTT] bernard@<redacted>@YYY.YYY.YYY.YYY:YYYYY msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=bernard@<redacted>, ProtoName=MQTT, ProtoVsn=4, CleanStart=false, KeepAlive=120, Username=bernard@<redacted>, Password=******)
@bernard-bear noticed that you have debug level logging.
could you try to test it at info level?
@zmstone It was only on debug level for a short duration. I've tested it at info level as well multiple times and the same issue occurs.
mqtt_packet_received
This log is only produced at debug level or as a trace log.
Maybe your config changes did not take effect?
Could you please share emqx.conf and data/cluster.hocon, or the configs in your k8s yaml files.
mqtt_packet_received
this log is only produced at debug level or as a trace log.
maybe your config changes did not take effect?
I tried again and verified in cluster.hocon that the console logging level is info, but mqtt_packet_received is still produced. I was using the "Log Trace" feature on the web UI dashboard. Is that what you mean by a trace log?
Here are the configs as requested:
/opt/emqx/etc/emqx.conf
mqtt {
  peer_cert_as_username = "cn"
  max_packet_size = 1MB
}
retainer {
  msg_expiry_interval = 2160h
  max_payload_size = 1MB
  msg_clear_interval = 1h
  backend {
    storage_type = disc
  }
}
dashboard {
  listeners {
    https {
      bind = "0.0.0.0:18084"
      ssl_options {
        cacertfile = "/opt/emqx/etc/certs/ca.crt"
        certfile = "/opt/emqx/etc/certs/tls.crt"
        keyfile = "/opt/emqx/etc/certs/tls.key"
      }
    }
  }
}
api_key.bootstrap_file = "/opt/emqx/etc/bootstrap_api_key"
authorization.no_match = deny
authentication = [
  {
    algorithm = "public-key"
    enable = true
    from = password
    mechanism = jwt
    public_key = "<redacted>"
    use_jwks = false
  }
]
listeners.ssl.default {
  bind = "0.0.0.0:8883"
  enable_authn = false
  ssl_options {
    cacertfile = "/opt/emqx/etc/certs/ca.crt"
    certfile = "/opt/emqx/etc/certs/tls.crt"
    keyfile = "/opt/emqx/etc/certs/tls.key"
    verify = verify_peer
    fail_if_no_peer_cert = true
  }
}
listeners.ws.default {
  bind = "0.0.0.0:8083"
  enable = false
}
listeners.wss.default {
  bind = "0.0.0.0:8084"
  ssl_options {
    cacertfile = "/opt/emqx/etc/certs/ca.crt"
    certfile = "/opt/emqx/etc/certs/tls.crt"
    keyfile = "/opt/emqx/etc/certs/tls.key"
  }
}
log.console {
  enable = true
  level = info
}
/opt/emqx/data/configs/cluster.hocon
authentication = [
  {
    acl_claim_name = acl
    algorithm = public-key
    enable = true
    from = password
    mechanism = jwt
    public_key = "<redacted>"
    use_jwks = false
    verify_claims = ""
  },
  {
    backend = built_in_database
    mechanism = password_based
    password_hash_algorithm {name = sha256, salt_position = suffix}
    user_id_type = username
  }
]
authorization {
  cache {
    enable = true
    max_size = 32
    ttl = 1m
  }
  deny_action = ignore
  no_match = deny
  sources = [
    {
      body {clientid = "${clientid}", topic = "${topic}"}
      connect_timeout = 15s
      enable = true
      enable_pipelining = 100
      headers {
        accept = "application/json"
        cache-control = no-cache
        connection = keep-alive
        content-type = "application/json"
        keep-alive = "timeout=30, max=1000"
      }
      method = post
      pool_size = 8
      request_timeout = 30s
      ssl {
        ciphers = []
        depth = 10
        enable = false
        hibernate_after = 5s
        log_level = notice
        reuse_sessions = true
        secure_renegotiate = true
        verify = verify_peer
        versions = [tlsv1.3, tlsv1.2]
      }
      type = http
      url = "<redacted>"
    },
    {
      enable = true
      path = "${EMQX_ETC_DIR}/acl.conf"
      type = file
    }
  ]
}
log {
  console {
    enable = true
    formatter = text
    level = info
    time_offset = system
  }
  file {
    default {
      enable = false
      formatter = text
      level = warning
      path = "/opt/emqx/log/emqx.log"
      rotation_count = 10
      rotation_size = 50MB
      time_offset = system
    }
  }
}
mqtt {
  await_rel_timeout = 300s
  exclusive_subscription = false
  idle_timeout = 15s
  ignore_loop_deliver = false
  keepalive_multiplier = 1.5
  max_awaiting_rel = 100
  max_clientid_len = 65535
  max_inflight = 32
  max_mqueue_len = 1000
  max_packet_size = 1MB
  max_qos_allowed = 2
  max_subscriptions = infinity
  max_topic_alias = 65535
  max_topic_levels = 128
  mqueue_default_priority = lowest
  mqueue_priorities = disabled
  mqueue_store_qos0 = true
  peer_cert_as_clientid = disabled
  peer_cert_as_username = cn
  response_information = ""
  retain_available = true
  retry_interval = 30s
  server_keepalive = disabled
  session_expiry_interval = 2h
  shared_subscription = true
  shared_subscription_strategy = round_robin
  strict_mode = false
  upgrade_qos = false
  use_username_as_clientid = false
  wildcard_subscription = true
}
I was using the "Log Trace" feature on the web UI dashboard. Is that what you mean by trace log?
yes
@bernard-bear have you tried to disable the trace? do you still get memory spikes?
@bernard-bear have you tried to disable the trace? do you still get memory spikes?
Yes, the memory spike was reproduced multiple times consistently, both with log trace on and off.
I cannot reproduce it with the default emqx config and mqtt.max_packet_size set to 200M: I published a payload of 150M and the memory heap spike stayed below 600M (a bump from 270M).
However, I could easily reproduce what you said when turning on debug tracing, which bumps the memory usage to >2GB.
On average, it seems that the memory spike is about 20 times the payload size (i.e. if the payload is 50mb, memory usage increases by around 1000mb)
Could you run these commands in your container to see if they are disabled?
emqx eval 'persistent_term:get(emqx_trace_filter, [])'
emqx eval 'emqx_logger:get_primary_log_level()'
These are the results from the commands (same for both nodes):
emqx eval 'persistent_term:get(emqx_trace_filter, [])'
[]
emqx eval 'emqx_logger:get_primary_log_level()'
info
Thanks for the update.
I am comparing the differences between your setup and mine.
To clarify:
You have the configs listed in #12344 (comment)
And only one client sends only one publish message (QoS 0?) (with payload size 90M) to EMQX over wss (secured WebSocket), and then EMQX gets OOM killed?
The behaviour described above happens even if the max packet size is set to a low value (e.g. 1mb), which means that the memory spike occurs even before the publish packet has been accepted by the broker.
Do you mean you could reproduce the issue (OOM kill) by setting "max packet size" to 1MB and sending a 90M payload,
AND you could also reproduce the issue by setting "max packet size" to 100MB and sending a 90M payload?
Could you provide an example message? (You can strip the payload; only the headers are of interest.)
You have the configs listed in #12344 (comment)
Yup
And only one client sends only one publish message (QoS 0?) (with payload size 90M) to EMQX over wss (secured WebSocket), and then EMQX gets OOM killed?
Yes, this is correct. It is QoS 0. I have also verified that bytes.received in the metrics dashboard increases by the correct number of bytes (e.g. 90M) after the message is published.
Do you mean you could reproduce the issue (OOM kill) by setting "max packet size" to 1MB and sending a 90M payload,
AND you could also reproduce the issue by setting "max packet size" to 100MB and sending a 90M payload?
Yes, this is also correct. In the former case, the client is forcefully disconnected and the message doesn't actually get published. In the latter case, the client remains connected and the message does get published. But in both cases the memory spike occurs, and if the spike exceeds the memory limit, the broker gets OOM killed.
Could you provide an example message? (You can strip the payload; only the headers are of interest.)
I am using the MQTT.js library (version 5.1.3) with all the standard defaults, which includes connecting via the MQTT v3.1.1 protocol. Here's a minimal reproducible example (I am using React, but plain JavaScript should behave the same). Previously, I had also seen the same memory spike and OOM kill with a different MQTT client (Python Paho), but I have not done any further testing with that.
import { useEffect } from 'react'
import { connect } from 'mqtt/dist/mqtt.min'

const MqttPage = () => {
  useEffect(() => {
    const mqttClient = connect({
      port: 8084,
      path: '/mqtt',
      clientId: '<redacted>',
      username: '<redacted>',
      password: '<redacted>',
      protocol: 'wss',
      hostname: '<redacted>',
      keepalive: 120,
      clean: false,
    })
    mqttClient.on('packetsend', (packet) => {
      console.log(packet)
    })
    mqttClient.on('connect', () => {
      // Change the last number to vary the number of megabytes in the payload (roughly accurate).
      const payload = JSON.stringify({ field: 'x'.repeat(1000 * 1000 * 90) })
      mqttClient.publish('topic_name', payload)
    })
    return () => {
      mqttClient.end()
    }
  }, [])
  return null
}
Packet (captured via the console.log):
{
  "cmd": "publish",
  "topic": "topic_name",
  "payload": "{\"field\":\"<truncated>\"}",
  "qos": 0,
  "retain": false,
  "messageId": 0,
  "dup": false
}
I suspect something about ACL; could you try disabling ACL? I got a memory bump when ACL via HTTP failed.
I think the issue also correlates with clean: false. Once I managed to reproduce it, I could always reproduce it with clean: false; only using clean: true got rid of the memory bump.
There must be some garbage left in the system.
Please try setting this environment variable and restarting the emqx pod to see if it makes any difference.
This is for testing a 100M payload, assuming a network MTU of 1400:
ERL_FLAGS='+MBmbcgs 50 +MBsmbcs 8192'
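For context, these are erts_alloc settings for the binary allocator: +MBsmbcs raises the smallest multiblock carrier size and +MBmbcgs the number of carrier growth stages, so large binaries land in fewer, larger carriers. A sketch for verifying that the flags took effect (look for the smbcs and mbcgs values in the returned options list):
emqx eval 'erlang:system_info({allocator, binary_alloc}).'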
I suspect something about ACL; could you try disabling ACL? I got a memory bump when ACL via HTTP failed.
I can still reproduce the memory spike after I turned off both authentication methods (JWT and built-in database) and both authorization methods (HTTP Server and File). I've also tried with both clean: false and clean: true.
I think the issue also correlates with clean: false
I can still reproduce the memory spike after changing to clean: true.
Please try setting this environment variable and restarting the emqx pod to see if it makes any difference.
I restarted both pods via these commands:
emqx stop
ERL_FLAGS='+MBmbcgs 50 +MBsmbcs 8192'; emqx start
However, I am still reproducing the same memory spike.
By the way, have you tried with GKE?
However, I am still reproducing the same memory spike.
The spike is unavoidable due to the buffering, but the peak should be lowered, e.g. from 2GB to 900M.
By the way, have you tried with GKE?
I don't think it has anything to do with GKE in terms of memory usage, unless the memory usage report is wrong.
I assume that when you say memory spikes you read it from ps or top for the emqx process, right?
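For example, inside the pod (a sketch assuming procps is available in the image; beam.smp runs as PID 1 in the emqx container, as the later top output shows):
kubectl -n emqx exec emqx-cluster-core-5854b66996-0 -- ps -o pid,rss,vsz,comm -p 1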
The spike is unavoidable due to the buffering, but the peak should be lowered, e.g. from 2GB to 900M.
The peak was not lowered, unfortunately. The broker still crashed due to OOM.
I assume that when you say memory spikes you read it from ps or top for the emqx process, right?
I'm reading it from the EMQX web dashboard, as seen in the video above.
Is there anything else we can try?
I'm reading it from the EMQX web dashboard, as seen in the video #12344 (comment)
That is OS memory, not the memory the emqx process uses. In a container environment it may be the host's memory usage, which includes the other pods.
That is OS memory, not the memory the emqx process uses. In a container environment it may be the host's memory usage, which includes the other pods.
I just checked the memory via top with the following steps (shown for one pod; I did the same for the other pod):
bernard@...........$ kubectl exec -it emqx-cluster-core-5854b66996-0 bash
emqx@emqx-cluster-core-5854b66996-1:/opt/emqx$ top
At resting state, the usage hovers around these values:
top - 09:11:38 up 22 days, 38 min, 0 users, load average: 0.14, 0.31, 0.30
Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.8 us, 1.4 sy, 0.0 ni, 90.4 id, 1.1 wa, 0.0 hi, 0.3 si, 0.1 st
MiB Mem : 16006.2 total, 11354.6 free, 1504.5 used, 3147.1 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14040.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 emqx 20 0 2982.8m 247.8m 97.6m S 2.6 1.5 3:53.33 beam.smp
279 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.00 erl_child_setup
466 emqx 20 0 3.6m 0.9m 0.8m S 0.0 0.0 0:00.02 inet_gethost
467 emqx 20 0 3.8m 1.8m 1.7m S 0.0 0.0 0:00.24 inet_gethost
470 emqx 20 0 2.2m 0.5m 0.4m S 0.0 0.0 0:01.49 memsup
471 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.02 cpu_sup
742 emqx 20 0 5.9m 3.8m 3.3m S 0.0 0.0 0:00.36 bash
751 emqx 20 0 8.7m 3.5m 3.0m R 0.0 0.0 0:00.00 top
After sending a message, the memory usage for beam.smp increases by a significant amount, similar to what I saw in the dashboard. Is that the emqx process's memory usage? Or is there another way to get just the memory for the emqx process?
I read RES: 247.8 MB.
Yes, beam.smp is the emqx process.
Do you have a top snapshot from when you get spikes?
Do you have a top snapshot from when you get spikes?
Here it is:
This was with a 60mb payload. The broker did not crash.
top - 09:20:16 up 22 days, 47 min, 0 users, load average: 0.36, 0.17, 0.21
Tasks: 9 total, 1 running, 8 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.6 us, 7.4 sy, 0.0 ni, 82.4 id, 1.0 wa, 0.0 hi, 0.5 si, 0.1 st
MiB Mem : 16006.2 total, 9833.9 free, 3023.8 used, 3148.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 12521.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 emqx 20 0 4573.9m 1.7g 97.6m S 50.2 10.9 4:13.39 beam.smp
279 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.00 erl_child_setup
466 emqx 20 0 3.6m 0.9m 0.8m S 0.0 0.0 0:00.02 inet_gethost
467 emqx 20 0 3.8m 1.8m 1.7m S 0.0 0.0 0:00.27 inet_gethost
470 emqx 20 0 2.2m 0.5m 0.4m S 0.0 0.0 0:01.57 memsup
471 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.02 cpu_sup
742 emqx 20 0 5.9m 3.8m 3.3m S 0.0 0.0 0:00.36 bash
752 emqx 20 0 5.9m 3.7m 3.2m S 0.0 0.0 0:00.37 bash
758 emqx 20 0 8.7m 3.6m 3.1m R 0.0 0.0 0:00.02 top
Yes, it is indeed an issue.
Could you run this command to fetch the allocator counters:
emqx eval 'recon_alloc:snapshot(), recon_alloc:snapshot_save("/tmp/alloc.txt").'
and send a copy of /tmp/alloc.txt from the container?
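If useful, one way to copy the snapshot out of the pod (pod name taken from the attachments above):
kubectl cp emqx/emqx-cluster-core-5854b66996-0:/tmp/alloc.txt ./alloc-emqx-cluster-core-5854b66996-0.txt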
I checked the allocator counters. The memory spike is caused by many blocks that are not GCed.
It could be caused by slow memory allocation or a busy CPU that doesn't have enough resources to do the GCs within that short period.
The GC does get done when the workload is low, and memory usage goes back to normal.
What resource limits did you set on the EMQX pod in terms of CPU and memory?
@id in your GKE test, did you set resource limits?
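On the un-reclaimed blocks point, recon's allocator helpers can show how much of the allocated memory is actually in active use (a low ratio points at carrier overhead and fragmentation rather than live data); a sketch:
emqx eval 'recon_alloc:memory(usage).'           # fraction of allocated memory actually in use
emqx eval 'recon_alloc:fragmentation(current).'  # per-allocator block vs carrier sizes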
what resource limit did you set on the EMQX pod in terms of CPU and memory?
These are the resource limits:
resources:
  limits:
    cpu: 500m
    ephemeral-storage: 1Gi
    memory: 2Gi
  requests:
    cpu: 500m
    ephemeral-storage: 1Gi
    memory: 2Gi
OK, try removing the CPU limit and see what happens.
Tried again with higher resource limits:
resources:
  limits:
    cpu: "2"
    ephemeral-storage: 1Gi
    memory: 2Gi
  requests:
    cpu: "2"
    ephemeral-storage: 1Gi
    memory: 2Gi
I still observe a similar memory spike. Resting values:
top - 07:33:28 up 1:40, 0 users, load average: 0.45, 0.22, 0.36
Tasks: 11 total, 1 running, 10 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0/1.5 2[| ]
MiB Mem : 16006.2 total, 10876.8 free, 1170.9 used, 3958.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14381.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 emqx 20 0 2958.8m 229.4m 97.7m S 0.0 1.4 0:59.55 beam.smp
271 emqx 20 0 2.3m 0.5m 0.4m S 0.0 0.0 0:00.00 erl_chi+
299 emqx 20 0 3.6m 0.8m 0.7m S 0.0 0.0 0:00.00 inet_ge+
300 emqx 20 0 3.8m 1.7m 1.6m S 0.0 0.0 0:00.00 inet_ge+
301 emqx 20 0 2.2m 0.6m 0.5m S 0.0 0.0 0:00.44 memsup
302 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.05 cpu_sup
307 emqx 20 0 3.8m 1.7m 1.6m S 0.0 0.0 0:00.00 inet_ge+
308 emqx 20 0 5.9m 3.7m 3.2m S 0.0 0.0 0:00.37 bash
315 emqx 20 0 8.7m 3.6m 3.1m S 0.0 0.0 0:00.05 top
316 emqx 20 0 5.9m 3.8m 3.3m S 0.0 0.0 0:00.38 bash
323 emqx 20 0 8.7m 3.5m 3.1m R 0.0 0.0 0:00.05 top
When a 60mb payload is published once (QoS 0):
top - 07:36:25 up 1:43, 0 users, load average: 0.90, 0.58, 0.47
Tasks: 11 total, 1 running, 10 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.2/16.5 36[||||||||||||||||||| ]
MiB Mem : 16006.2 total, 9820.0 free, 2226.9 used, 3959.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 13325.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 emqx 20 0 4109.0m 1.3g 97.7m S 98.7 8.2 1:21.72 beam.smp
271 emqx 20 0 2.3m 0.5m 0.4m S 0.0 0.0 0:00.00 erl_chi+
299 emqx 20 0 3.6m 0.8m 0.7m S 0.0 0.0 0:00.00 inet_ge+
300 emqx 20 0 3.8m 1.7m 1.6m S 0.0 0.0 0:00.00 inet_ge+
301 emqx 20 0 2.2m 0.6m 0.5m S 0.0 0.0 0:00.49 memsup
302 emqx 20 0 2.3m 0.6m 0.5m S 0.0 0.0 0:00.06 cpu_sup
307 emqx 20 0 3.8m 1.7m 1.6m S 0.0 0.0 0:00.00 inet_ge+
308 emqx 20 0 5.9m 3.7m 3.2m S 0.0 0.0 0:00.37 bash
315 emqx 20 0 8.7m 3.6m 3.1m S 0.0 0.0 0:00.13 top
316 emqx 20 0 5.9m 3.8m 3.3m S 0.0 0.0 0:00.38 bash
323 emqx 20 0 8.7m 3.5m 3.1m R 0.0 0.0 0:00.12 top
Is that 1.3g the peak you get? That looks reduced from the earlier 1.7g.
Is that 1.3g the peak you get? That looks reduced from the earlier 1.7g.
Possibly, but I suspect it could also just be because top updates at a low frequency. Are you suggesting increasing the CPU limit even further? Is it normal to consume this amount of CPU?
I could reproduce the issue when I used cgroups to limit the CPU resource:
the peak went up from 600MB to 950MB, and under memory pressure the peak could go up to 1.3 GB.
However, I found that the default wss socket buffer is too small in your case; could you try setting this envvar:
EMQX_LISTENERS__WSS__default__tcp_options__buffer=8388608
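Assuming EMQX's usual environment-variable override scheme (EMQX_ prefix, double underscores for nesting), this maps to the listeners.wss.default.tcp_options.buffer config key, i.e. an 8 MB socket buffer (8388608 = 8 * 1024 * 1024 bytes); for example, before starting the node:
export EMQX_LISTENERS__WSS__default__tcp_options__buffer=8388608   # 8 MB wss socket buffer
emqx start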
Is it normal to consume this amount of CPU?
I cannot tell, because the platforms differ; only testing can tell.
However, I found that the default wss socket buffer is too small in your case; could you try setting this envvar:
EMQX_LISTENERS__WSS__default__tcp_options__buffer=8388608
Hi @qzhuyan, this seems to resolve the memory spike problem. Now, with a 60mb payload, the memory usage increases from ~200mb to ~500-800mb, which is much lower than before (previously it would increase to >1gb). The CPU limit didn't seem to matter; the memory usage was roughly the same for 0.5 CPU vs 2 CPU.
I think we can consider this issue closed. Thanks so much to you and your colleagues for the prompt assistance with this!
Out of curiosity, do you have an idea how the TCP buffer value might have affected the memory usage?