
Comments (3)

raeburn commented on August 17, 2024

I’m not really familiar with Go and its coroutines, so I’m not sure what you imagine that would look like when used inside the Linux kernel. As I understand it from what I’ve read, goroutines run efficiently because the Go runtime explicitly manages switching between them within one thread (or maybe more than one) as seen by the kernel -- essentially doing its own second level of scheduling independent of the kernel, so that context switches into the kernel can be reduced. Because of that, it also needs its own management of stack space and the like.

We looked long ago at running everything in one thread, and at least when fast storage is available and data is generated quickly, one thread rapidly becomes a bottleneck. So we went with multiple threads and message passing, with most of the worker threads owning certain data structures (address maps, allocation tables, etc.), and put a fair amount of effort into avoiding or reducing synchronization between threads -- even the need for locks, in most cases. With fast storage, VDO can consume enough CPU time to keep multiple cores busy.

Our message passing approach seems to work pretty well under heavy load, which is our main use case; when a worker thread is resumed, it’s likely to have a bunch of work waiting for it or about to be queued up by other running threads, so it can process a bunch of updates to the data structures it owns before going back to sleep. Though, when lightly loaded, it’s probably not as efficient as one might like.
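In miniature, that ownership-plus-batching scheme might look something like the following userspace pthread sketch. All the names here are invented for illustration; this is nothing like the actual kvdo code, just the shape of the idea: the table belongs to exactly one thread, so only the message queue needs a lock, and a woken worker drains everything queued before sleeping again.

```c
#include <pthread.h>
#include <stdlib.h>

struct message {
    int slot;                  /* which table entry to update */
    struct message *next;
};

struct worker {
    pthread_mutex_t lock;      /* protects the queue only, not the table */
    pthread_cond_t wake;
    struct message *head, *tail;
    int done;
    long table[16];            /* owned exclusively by this worker */
};

/* Any thread may post a message asking the owner to update a slot. */
static void worker_post(struct worker *w, int slot)
{
    struct message *m = malloc(sizeof(*m));
    m->slot = slot;
    m->next = NULL;
    pthread_mutex_lock(&w->lock);
    if (w->tail)
        w->tail->next = m;
    else
        w->head = m;
    w->tail = m;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}

/* The owning thread: grab everything queued at once, then drain it all
 * before sleeping again, so under load one wakeup amortizes over many
 * updates to the data structure this thread owns. */
static void *worker_run(void *arg)
{
    struct worker *w = arg;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        while (!w->head && !w->done)
            pthread_cond_wait(&w->wake, &w->lock);
        struct message *batch = w->head;
        int finished = w->done;
        w->head = w->tail = NULL;
        pthread_mutex_unlock(&w->lock);
        while (batch) {
            struct message *m = batch;
            batch = m->next;
            w->table[m->slot]++;   /* no lock: this thread is sole owner */
            free(m);
        }
        if (finished)
            return NULL;
    }
}
```

Under light load this wakes the worker for nearly every message, which matches the "not as efficient as one might like" caveat; under heavy load the drain loop does most of the work per wakeup.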

I did at one point look at an approach that would keep our current message passing but, instead of creating a dedicated thread for each worker, would use a single work_struct per “worker” running under the kernel’s workqueues, letting that mechanism determine how many actual threads get created. (Sort of a three-layer work scheduling system -- kernel threads, workqueue tasks within a worker thread, and VDO tasks within a workqueue task.) It didn’t perform as well as the dedicated threads; I didn’t have time to investigate why, though I’ve got a few guesses. Creating single-threaded kernel workqueues (to enforce serialization around data structures) and giving each I/O request its own work_struct also performed poorly.
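For readers unfamiliar with the kernel API, the shape of that experiment might look roughly like this. This is a kernel-flavored sketch with invented names, not the code actually tried; the ordered workqueue stands in for the single-threaded serialization mentioned above:

```c
/* Sketch only (invented names, not actual kvdo code): each logical
 * worker keeps its message queue but is driven by a work_struct queued
 * on an ordered (effectively single-threaded) workqueue, so the
 * kernel's workqueue machinery decides when worker code actually runs. */
#include <linux/workqueue.h>

struct vdo_worker {
    struct workqueue_struct *wq;   /* ordered: serializes this worker */
    struct work_struct work;       /* one work item per logical worker */
    /* ... message queue and owned data structures ... */
};

static void vdo_worker_fn(struct work_struct *work)
{
    struct vdo_worker *w = container_of(work, struct vdo_worker, work);
    /* drain this worker's message queue, as the dedicated thread did */
    (void)w;
}

static int vdo_worker_init(struct vdo_worker *w, const char *name)
{
    /* WQ_MEM_RECLAIM: storage drivers can sit in the writeback path */
    w->wq = alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name);
    if (!w->wq)
        return -ENOMEM;
    INIT_WORK(&w->work, vdo_worker_fn);
    return 0;
}

static void vdo_worker_poke(struct vdo_worker *w)
{
    /* after posting a message: make sure the work item is (re)queued */
    queue_work(w->wq, &w->work);
}
```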

(Also, we were asked a while back about comparing against using mutexes rather than passing the I/O requests off between threads so much. But having not written the driver to be switched between models, we’ve kind of baked the design into the code in places, including assumptions about processing messages in a certain order. So it’s not the easiest experiment to run.)

I can imagine a few scheduler tweaks that might help VDO. For example, being able to say, “I’m done, but if I haven’t used too much time, go run another thread from this group.” Or being able to express that certain pairs or groups of threads exchange messages a lot, so it might be helpful to keep them on the same core or the same NUMA node, without having to externally dictate exactly which cores each thread is allowed to use.

Having a design in which multiple threads could be waiting on each other is inherently problematic.

The threads don’t use a “send request, block waiting for result” type approach, if that’s what you’re picturing. The I/O requests are generally either moving forward through the state machine, or parked in one of various wait queues until certain events take place (usually completion of I/O to backing storage). Except for certain threads designated to handle actions that can block, none of the worker threads should block until they’ve completed their available work and moved all the pending I/O requests to their next steps.
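That “park, don’t block” pattern can be sketched as follows. The state names and structure here are illustrative inventions, not kvdo’s actual state machine; the point is that a request either moves forward or is queued on a wait list, and the worker thread itself never sleeps on a result:

```c
#include <stddef.h>

/* Invented states for illustration: a request maps its address, waits
 * for backing-store I/O, then does its final bookkeeping. */
enum req_state { REQ_MAP, REQ_WAIT_IO, REQ_UPDATE, REQ_DONE };

struct request {
    enum req_state state;
    struct request *next;      /* link while parked on a wait queue */
};

struct wait_queue {
    struct request *head, *tail;
};

static void park(struct wait_queue *q, struct request *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* Move one request as far forward as it can go without blocking. */
static void advance(struct request *r, struct wait_queue *io_waiters)
{
    switch (r->state) {
    case REQ_MAP:
        /* needs backing-store I/O next: park it, don't block the thread */
        r->state = REQ_WAIT_IO;
        park(io_waiters, r);
        break;
    case REQ_WAIT_IO:
        break;                 /* still parked; nothing to do yet */
    case REQ_UPDATE:
        r->state = REQ_DONE;   /* final bookkeeping step */
        break;
    case REQ_DONE:
        break;
    }
}

/* Called when the backing-store I/O completes: release every parked
 * request to its next step. No thread slept while they waited. */
static void io_complete(struct wait_queue *q)
{
    struct request *r = q->head;
    q->head = q->tail = NULL;
    while (r) {
        struct request *next = r->next;
        r->state = REQ_UPDATE;
        advance(r, q);
        r = next;
    }
}
```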

Have you considered a design which may not require explicit tuning of threads?

I would like to see such a design. Of course, it’d be easy to simply remove the knobs, fix the values to compiled-in constants, and live with whatever the resulting performance is. Getting performance comparable to what we can get now with tuning would be the trick. (Ideally, something dynamically adaptable to a workload changing over time, which our current set of knobs is not.) It needs to strike a good balance between cache contention (between cores and between NUMA nodes), queueing delays, lock contention, excessive calls into the scheduler, other demands in the system for CPU cycles, throughput vs latency, performance vs power, etc.


akseg73 commented on August 17, 2024

@raeburn thanks for your detailed response.


akseg73 commented on August 17, 2024

ZFS also offers compression and dedup, and while it has a lot of tunables, I don’t recall there being a need for explicit tuning of threads (could be wrong).

