Comments (11)
Here's the full lorem value I was using:
$ for i in {1..100000}; do echo "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet." >> lorem.txt; done
Then I just piped that file into the simple producer.
The truncated values are all 192 bytes and the full messages are 888 bytes.
from ruby-kafka.
It looks directly related to the max_bytes setting. If I set max_bytes to 1000 then I receive one complete message and one that only has 60 bytes. Bumping max_bytes to 1001, I still get one complete message and one that only has 61 bytes.
It's like something along the line (Kafka? The socket connection?) is truncating the response at a raw byte limit instead of at a message boundary.
Maybe there's some "end of message" byte marker that could be watched for.
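The truncation behavior can be simulated in a few lines. This is a minimal sketch assuming only the 888-byte payload size from above and ignoring Kafka's per-message framing overhead (which is why the real leftover was 60 bytes, not 112):

```ruby
# Simulate a broker returning at most max_bytes of a message set,
# truncating mid-message rather than at a message boundary.
MESSAGE = "x" * 888          # full message size from the experiment
message_set = MESSAGE * 3    # a set of three back-to-back messages

max_bytes = 1000
response = message_set[0, max_bytes]

full, leftover = response.size.divmod(MESSAGE.size)
puts full      # => 1 complete message
puts leftover  # => 112 trailing bytes of a partial message
```

Bumping max_bytes by one grows the partial tail by one byte, which matches the 1000/1001 observation above.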
This is interesting: the docs have an example socket server configuration: https://kafka.apache.org/documentation.html#prodconfig
socket.request.max.bytes=104857600
That's 100 times the default of 1048576 (typo?). Setting that limit certainly works… but only because it is larger than the response for 100,000 messages.
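A quick sanity check on the numbers quoted above:

```ruby
# The documented socket limit really is exactly 100x the fetch default.
socket_max    = 104_857_600  # socket.request.max.bytes from the docs
fetch_default = 1_048_576    # 1024 * 1024, the default consumer fetch size

puts socket_max / fetch_default  # => 100
```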
Ah, socket.request.max.bytes is a different setting: the maximum request size the server will allow. The default fetch request size for consumers is definitely 1024*1024.
Well here we go! https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchAPI
"As an optimization the server is allowed to return a partial message at the end of the message set. Clients should handle this case."
Hm! I wonder whether it's even possible for ruby-kafka to recognize this, or if end users simply need to make their consumer loop smart enough to discard the partial message.
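A hedged sketch of what that client-side guard could look like, using a hypothetical length-prefixed message layout for illustration (this is not ruby-kafka's actual wire format or API):

```ruby
require "stringio"

# Parse length-prefixed messages out of a fetch response buffer,
# silently discarding a truncated trailing message.
def decode_messages(buffer)
  io = StringIO.new(buffer)
  messages = []
  until io.eof?
    header = io.read(4)
    break if header.nil? || header.bytesize < 4      # truncated length prefix
    length = header.unpack1("N")
    payload = io.read(length)
    break if payload.nil? || payload.bytesize < length  # partial message: drop it
    messages << payload
  end
  messages
end

full      = [4, "aaaa"].pack("Na*") + [4, "bbbb"].pack("Na*")
truncated = full + [4, "cc"].pack("Na*")  # last message cut short

p decode_messages(truncated)  # => ["aaaa", "bbbb"]
```

The partial tail simply never reaches the iteration loop, which is what the protocol guide's "clients should handle this case" seems to ask for.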
Here we go! Each message declares the number of bytes to read from the socket. With my example and a max_bytes setting of 2000, for the final message in the fetch response the header says to read 888 bytes but the socket only yields 146 bytes. I think that mismatch could be recognized so the partial message isn't returned as a message to iterate over.
My debugging in lib/kafka/protocol/decoder.rb:

def read(number_of_bytes)
  gathered = @io.read(number_of_bytes) or raise EOFError
  puts "number_of_bytes: #{number_of_bytes}"
  puts "read from socket: #{gathered.size}"
  gathered
end
I don't think that's the right level for this check, but it was the first good place I found to hook into to check for the mismatch.
Woo! That EOFError actually seems to be the thing. If it's raised whenever the byte counts mismatch, it all works out.
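The check in isolation looks something like the following. This is a sketch mirroring the idea, not ruby-kafka's exact decoder code:

```ruby
require "stringio"

# Raise EOFError when the socket yields fewer bytes than the message
# header declared, so a partial trailing message is never decoded.
def read_exact(io, number_of_bytes)
  gathered = io.read(number_of_bytes)
  raise EOFError if gathered.nil? || gathered.bytesize < number_of_bytes
  gathered
end

# Simulate a fetch where the header promised 888 bytes but the
# response was truncated well short of that.
io = StringIO.new("only 146 bytes of an 888 byte message")
begin
  read_exact(io, 888)
rescue EOFError
  puts "partial message detected"
end
```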
@sdball can you confirm that this is still a problem on the latest master? I've added a check for partial messages within message sets in c933344
I can confirm this is indeed fixed on the latest master!
I updated to master and reran my simple consumer, using tee to write the messages to a log file. (That makes it easy to know when to interrupt the consumer after it has processed all the messages.)
$ bundle exec ruby simple-consume.rb earliest | tee logs/consume.log
Then counted up unique messages:
$ sort logs/consume.log | uniq -c
100000 # (complete lorem ipsum messages)
100,000 complete messages, which is exactly right.
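For reference, the same uniqueness count can be done in Ruby instead of sort | uniq -c (Enumerable#tally, Ruby 2.7+; the sample lines here are stand-ins for the log contents):

```ruby
# Count occurrences of each distinct line, like `sort | uniq -c`.
lines = ["lorem"] * 3 + ["other"]
counts = lines.tally

p counts  # => {"lorem"=>3, "other"=>1}
```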
Awesome!