Comments (7)

simonmacmullen commented on May 21, 2024

Hard code a smaller buffer size

Seems rather sad.

Dynamically shrink the buffer size if we determine it is not working

This is the approach I've gone for, primarily because it should be able to stop other pathological behaviour.

Read the buffer backwards from our seek point if we detect we are seeking backwards

This might be a nice option to get syncing going still faster, but it's also fiddly and only solves this exact problem. I'll settle for having it no worse than it was in 3.4.4.

from rabbitmq-server.

simonmacmullen commented on May 21, 2024

Just for clarity, this bug:

  • Only affects 3.5.0
  • Requires messages to be larger than the queue index embedding threshold (by default 4kB)
  • Requires messages to be paged out before synchronisation starts

You can see in the I/O stats on the master that if (say) 250 messages are read from disk per second, we also read 250 MB/s, even if each message is much smaller than 1 MB.
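
As a rough illustration of that figure: if every message read fills the whole read buffer once, the bytes read per second are simply reads per second times buffer size. The sketch below assumes a 1 MB buffer (the size reported for the stable branch later in this thread), decimal megabytes, and a 10 kB message as the example payload. The helper name `disk_read_rate` is hypothetical and this is not RabbitMQ code.

```python
MB = 1_000_000  # assumption: decimal megabytes


def disk_read_rate(reads_per_second: int, buffer_bytes: int) -> int:
    """Bytes read from disk per second if each read fills the buffer once."""
    return reads_per_second * buffer_bytes


rate = disk_read_rate(250, 1 * MB)

# Bytes fetched from disk per byte of payload, for an example 10 kB message.
amplification = (1 * MB) / 10_240

print(rate // MB, "MB/s")
print(round(amplification, 1), "x read amplification")
```

Under these assumptions, almost all of the data read from disk is thrown away, which is why the byte rate tracks the read rate rather than the message size.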

dumbbell commented on May 21, 2024

Here is what I did to test the correction:

  1. I started two nodes, A and B, with a very low vm_memory_high_watermark so that they page messages out early, clustered them, and added the following HA policy on node B:

    rabbitmqctl -n B set_policy ha-all "." '{"ha-mode":"all"}'
    
  2. I stopped node B using:

    rabbitmqctl -n B stop_app
    
  3. I used PerfTest to publish 10 kB messages with a rate-limited consumer so messages stay in RabbitMQ:

    PerfTest -s 10240 -R 100
    
  4. The producer could publish around 40,000 messages before being throttled.

  5. I started node B again and forced synchronisation from the management UI.

With the stable branch, the management UI reports I/O read rates of:

  • 150 messages/s
  • 150 MB/s

With the rabbitmq-server-69 branch (this fix), it reports:

  • 1000 messages/s
  • 15 MB/s

I logged the size of the read buffer in file_handle_cache.erl at the same time. With stable, the buffer remains at the expected 1 MB size. With the fix, the size continuously switches between 10468 and 20936 bytes, with an occasional jump to 4 MB.

simonmacmullen commented on May 21, 2024

Note that you don't need the -R 100; you can use -y0 -u test -p to get PerfTest to publish to a queue with no consumers, which might be easier to work with.

The 4MB sizes probably refer to other files (queue index files?)

Not sure whether the flicking between 10468 and 20936 is worth fixing, what do you think?

simonmacmullen commented on May 21, 2024

Oh, also you can set a very low vm_memory_high_watermark_paging_ratio rather than vm_memory_high_watermark; that way you can publish indefinitely but still get paged out rapidly.

dumbbell commented on May 21, 2024

One correction to my previous comment:

150 messages/s
1000 messages/s

Those should read:

  • 150 reads/s
  • 1000 reads/s

The 4MB sizes probably refer to other files (queue index files?)

You're right, the file handle differs for those reads.

Not sure whether the flicking between 10468 and 20936 is worth fixing, what do you think?

After a test:

  • With the flickering buffer:
    • 1000 reads/s
    • 15 MB/s from disc
    • 10-12 MB/s sent to node B
  • With a constant buffer:
    • 750 reads/s
    • 7 MB/s from disc
    • 7 MB/s sent to node B

In the first case, we read 20 kB but use only 10 kB, then we read 10 kB, then the buffer doubles again, and so on. We don't do this in the second case (we always read 10 kB). When comparing the number of reads to the throughput, we see a 33% decrease in throughput in the second case, corresponding to not wasting 10 kB. However, I can't explain why it is slower...
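
The alternation described above can be reproduced with a small model. This is only a sketch under assumptions: the shrink and double policy is inferred from this comment and the logged buffer sizes, not taken from file_handle_cache.erl, and 10468 bytes is assumed to be the on-disk size of one message. The function name `simulate_reads` is hypothetical.

```python
def simulate_reads(message_bytes, start_buffer, n_reads, adaptive=True):
    """Return the buffer size used for each of n_reads single-message reads.

    Assumed policy (inferred from the thread, not from the real code):
    - if a read used less than the whole buffer, shrink to what was used;
    - if a read used the whole buffer, double it for the next read.
    """
    sizes = []
    buf = start_buffer
    for _ in range(n_reads):
        sizes.append(buf)
        if not adaptive:
            continue  # constant buffer: never resize
        used = min(message_bytes, buf)
        buf = used if used < buf else buf * 2
    return sizes


ENTRY = 10_468  # assumption: bytes per message on disk, taken from the logged sizes

flickering = simulate_reads(ENTRY, 2 * ENTRY, 1000)
constant = simulate_reads(ENTRY, ENTRY, 1000, adaptive=False)

avg_flickering = sum(flickering) / len(flickering)
avg_constant = sum(constant) / len(constant)
saving = 1 - avg_constant / avg_flickering

print(flickering[:4])
print(avg_flickering, avg_constant, round(saving, 3))
```

In this model the flickering buffer averages 15702 bytes per read, which at 1000 reads/s is close to the 15 MB/s observed, and the constant buffer reads one third less per message. The model says nothing about why fewer reads per second were achieved with the constant buffer.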

dumbbell commented on May 21, 2024

Here are new, more meaningful numbers comparing stable and 8faf4ee.

The protocol is:

  1. Start nodes A and B, cluster them, add an HA policy.
  2. Create a queue from the management UI.
  3. Stop node B.
  4. Use PerfTest to queue 300,000 messages, which is enough to page them out (the filesystem is tmpfs). No clients are connected after that.
  5. Start B and force synchronization. While this happens, look at the time the full sync takes, as well as I/O and network statistics.

Results with stable:

  • Synchronization finished in 1'55".
  • Reads: 1600/s (1.6 GiB/s)
  • Network (from A to B): 18 MiB/s while messages are paged in, then 58 MiB/s

Results with 8faf4ee:

  • Synchronization finished in 1'10".
  • Reads: 4500/s (57 MiB/s)
  • Network (from A to B): 58 MiB/s
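
To put the two result sets side by side, the sketch below derives the sync-time speedup and the average number of bytes fetched per read from the figures above. It assumes the reported rates are averages over the run; the dictionary layout and helper names are illustrative only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def seconds(minutes, secs):
    """Convert a minutes'seconds" duration to seconds."""
    return minutes * 60 + secs


# Figures as reported above for each branch.
stable = {"sync_s": seconds(1, 55), "reads_per_s": 1600, "bytes_per_s": 1.6 * GIB}
fixed = {"sync_s": seconds(1, 10), "reads_per_s": 4500, "bytes_per_s": 57 * MIB}


def bytes_per_read(result):
    """Average bytes fetched from disk per read operation."""
    return result["bytes_per_s"] / result["reads_per_s"]


speedup = stable["sync_s"] / fixed["sync_s"]

print(round(speedup, 2), "x faster sync")
print(round(bytes_per_read(stable) / MIB, 2), "MiB per read (stable)")
print(round(bytes_per_read(fixed) / 1024, 1), "KiB per read (8faf4ee)")
```

The per-read figures work out to roughly 1 MiB on stable and about 13 KiB with 8faf4ee, which is consistent with the buffer sizes logged earlier in the thread.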
