
Comments (7)

tomas12 commented on May 31, 2024

I have the same problem. My records are in Avro format and the topic contains over two million records. Filtering also takes extremely long, a couple of hours. My topic has 2 partitions, and by comparison, paging through records in Kafka is very fast.

from kafka-webview.

Crim commented on May 31, 2024

Hey @xstephen95x apologies for the slow reply, I've been out on travel.

I haven't seen such poor performance from the filtering logic even in topics with hundreds of millions of records. The filtering is done via a Kafka interceptor here.

How many partitions does your topic have? Do you have similar performance issues using the websocket streams vs just paging through records in kafka?


xstephen95x commented on May 31, 2024

I have a Python script for reading from Kafka topics (protobufs), and it reads a couple thousand records per second. So there's a considerable slowdown somewhere in the stack.
Perhaps it's the deserializer? Perhaps it's the interceptor you linked here? No way to know without perf profiling. Do you have a recommended way to do perf analysis on this? I've never profiled Java, just C/C++.

The topic only has 1 or 2 partitions, but I don't think that's related.

I just timed it: it filters about 300 records/minute, so with millions of records it will take days.


xstephen95x commented on May 31, 2024

I've tried using https://github.com/jvm-profiling-tools/perf-map-agent to run perf-top, and I've not been able to get anything useful.

In terms of streaming vs paging, I have not been able to stream successfully. I see a null pointer exception each time I try to stream.

java.lang.NullPointerException: null
        at org.sourcelab.kafka.webview.ui.controller.stream.StreamController.getLoggedInUser(StreamController.java:189) ~[classes/:na]

And yes, I have anonymous access set up.


xstephen95x commented on May 31, 2024

I have found that during the filtering of each record:

2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.

gets logged to stdout.


Crim commented on May 31, 2024

Hey @xstephen95x thanks for the detailed responses! I'll try to hit all of them here, but let me know if I missed something.

> In terms of streaming vs paging, I have not been able to stream successfully. I see a null pointer exception each time I try to stream.

Which version of Kafka-WebView are you running? That sounds a lot like a bug fixed in 2.1.3 (Issue-127). Let me know if you're running version 2.1.3 or newer, as I may need to revisit this. If you're running version 2.1.2 or older, upgrading should resolve this issue.

> I have found that during the filtering of each record:
>
> 2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.
>
> gets logged to stdout.

I think that is considered "normal". Basically, if you define any non-standard configuration property that the Kafka library isn't explicitly aware of, it will log that warning. In this case, I set a custom property to configure Kafka WebView's record filter.
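
To illustrate the mechanism, here is a simplified stand-in written with only the Java standard library. It is not Kafka's code: the class name, the method name, and the short list of "known" keys are hypothetical, chosen only to show why a custom key such as the record filter property produces that warning while the standard keys do not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a client library that flags config keys it does not recognize.
// NOT Kafka's implementation; names and the key list are hypothetical.
final class UnknownConfigDemo {
    // Keys the hypothetical client itself understands.
    static final Set<String> KNOWN_KEYS =
        Set.of("bootstrap.servers", "group.id", "interceptor.classes");

    // Returns one warning line per supplied key that the client does not know about.
    static List<String> warningsFor(Map<String, ?> supplied) {
        List<String> warnings = new ArrayList<>();
        for (String key : supplied.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                warnings.add("The configuration '" + key
                    + "' was supplied but isn't a known config.");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("bootstrap.servers", "localhost:9092");
        config.put("group.id", "demo");
        // Custom key meant for the record filter, not for the client library itself.
        config.put("RecordFilterInterceptor.recordFilterDefinitions", List.of());
        warningsFor(config).forEach(System.out::println);
    }
}
```

The custom key still reaches the component that needs it, because the whole map is handed on; the warning only reflects that the client library has no definition for it.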

RE: The performance issue. The fact that you have a small number of partitions, and that paging through the topic without filtering enabled sounds like it is fast, definitely makes me believe something is up with the filtering logic. I must be doing something silly; I just can't seem to spot it with my eyes. I believe you're right that performance profiling is going to be the best way to determine the cause here. Short of doing that, I may be able to put together a custom build for you that adds debug timing log statements to help track down the source. Is this something you would be interested in trying if I put it together?


xstephen95x commented on May 31, 2024

Thank you for all of your responses.

So, I upgraded to 2.1.4 and ran from the compiled jar instead of ./buildAndRun.sh, and I am now getting about 800 records filtered per minute. Good speedup, but it's still going to take days to filter millions of records. I believe that's around 75 ms per record, which isn't that great.
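
A quick sanity check of the rates reported in this thread, as a minimal sketch. The class and method names are made up for illustration, and the 2,000,000-record total is an assumed topic size based on the "over two million records" figure mentioned earlier, not a measurement.

```java
import java.time.Duration;

// Back-of-the-envelope numbers for the filtering rates reported in this thread.
final class FilterThroughput {
    // Average wall-clock milliseconds spent per record at a given rate.
    static double millisPerRecord(double recordsPerMinute) {
        return 60_000.0 / recordsPerMinute;
    }

    // Estimated time to filter totalRecords at a steady rate.
    static Duration timeToFilter(long totalRecords, double recordsPerMinute) {
        long seconds = Math.round(totalRecords / recordsPerMinute * 60.0);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        long total = 2_000_000L; // assumed topic size, for illustration only
        for (double rate : new double[] {300, 800}) {
            Duration d = timeToFilter(total, rate);
            System.out.printf("%.0f records/min: %.0f ms/record, about %d hours for %d records%n",
                rate, millisPerRecord(rate), d.toHours(), total);
        }
    }
}
```

At 300 records/minute this works out to 200 ms per record, and at 800 records/minute to 75 ms per record, so either rate puts a two-million-record topic in the range of days rather than minutes.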

I've been working on getting a perf analysis, but I'm having a hard time getting it to work with the JVM.
I've attached a flamegraph from my last attempt.

If perf isn't going to cooperate, then yes, perhaps the best option is to start logging timestamps.
Although I would also need to add them in my deserializer and filter, so I'm not sure of the best way to go about that.
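
One possible way to time the deserializer and the filter separately without scattering timestamps through each of them is to wrap every stage in a small timing decorator. The following is only a sketch under assumptions: `TimedStage` and the placeholder stages are hypothetical names, and it does not show where such wrappers would hook into Kafka WebView or into a custom deserializer or filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

// Wraps one processing stage and accumulates how long it takes across calls.
final class TimedStage<I, O> implements Function<I, O> {
    private final String name;
    private final Function<I, O> delegate;
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder calls = new LongAdder();

    TimedStage(String name, Function<I, O> delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    @Override
    public O apply(I input) {
        long start = System.nanoTime();
        try {
            return delegate.apply(input);
        } finally {
            // Record the elapsed time even if the stage throws.
            totalNanos.add(System.nanoTime() - start);
            calls.increment();
        }
    }

    long callCount() {
        return calls.sum();
    }

    double averageMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    String summary() {
        return String.format("%s: %d calls, avg %.3f ms", name, callCount(), averageMillis());
    }
}

final class TimedStageDemo {
    public static void main(String[] args) {
        // Placeholder stages; real code would call the actual deserializer and filter here.
        TimedStage<byte[], String> deserialize = new TimedStage<byte[], String>(
            "deserialize", bytes -> new String(bytes, StandardCharsets.UTF_8));
        TimedStage<String, Boolean> filter = new TimedStage<String, Boolean>(
            "filter", value -> value.contains("a"));

        for (String s : new String[] {"alpha", "beta", "xyz"}) {
            filter.apply(deserialize.apply(s.getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println(deserialize.summary());
        System.out.println(filter.summary());
    }
}
```

If the per-stage averages turn out to be small compared with the overall per-record time, the slowdown is probably outside the deserializer and filter themselves, which would narrow down where to look next.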

flamegraph-40079.svg.zip
