Comments (14)

CodeFetch commented on June 27, 2024

@dsommers As far as I know, io_uring is only supported on Linux. It lets you define ring buffers that a kernel thread works on. This allows networking in userspace with almost no context switches. Together with MSG_ZEROCOPY, this allows performance equal to running natively in the kernel, except for the skb_copy operation when sending packets. As most resource-limited devices that would profit from this are in practice CPEs with higher downstream bandwidth usage, I guess this would not hurt much. Tests have shown that raw encryption performance in userspace is close to what a kernel thread can achieve, e.g. with WireGuard.

from ovpn-dco.

CodeFetch commented on June 27, 2024

@andywangevertz Indeed. You will save the syscalls for the reads/writes on the TAP device. Ordinarily, a TAP device can only accept one packet per syscall, which means a lot of context switching. With io_uring you lift that limitation: in my experience you can send/receive about 64 packets per syscall. It is still slower than e.g. WireGuard, because WireGuard has multithreading support and does not need to copy packets between userspace memory and kernel memory, but the skb_copy for that is not the performance killer. I guess that with multithreading support, on single-core devices, or with multiple OpenVPN instances you could reach almost 90% of WireGuard's throughput.
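
A rough way to see the effect of batching is to count kernel entries. The sketch below is only an arithmetic model, not io_uring code: `syscalls_needed` and the packet count are hypothetical, chosen for illustration, and the batch size of 64 is the figure quoted above.

```python
import math


def syscalls_needed(packets: int, packets_per_syscall: int) -> int:
    """Return how many syscalls are needed to move `packets` packets
    when each syscall can carry at most `packets_per_syscall` of them."""
    if packets < 0 or packets_per_syscall < 1:
        raise ValueError("packets must be >= 0 and packets_per_syscall >= 1")
    return math.ceil(packets / packets_per_syscall)


if __name__ == "__main__":
    packets = 64_000  # hypothetical workload size
    one_at_a_time = syscalls_needed(packets, 1)   # classic TAP read/write
    batched = syscalls_needed(packets, 64)        # ~64 packets per syscall
    print(f"one packet per syscall: {one_at_a_time} syscalls")
    print(f"64 packets per syscall: {batched} syscalls")
```

The model ignores everything except the number of user/kernel transitions, so it says nothing about copy or encryption cost.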

ordex commented on June 27, 2024

Hi @CodeFetch and thanks for your message.
To answer your questions: yes, the idea is to take this kernel module upstream once it reaches reasonable maturity.
We are currently working on supporting it in OpenVPN2, so that it can reach a larger user base.

The main reason is definitely performance: with this kernel module we are basically moving the whole data plane (not the control plane!) into kernel space, similarly to other device drivers or tunnel implementations. It also allows us to greatly simplify the Linux part of the userspace implementation, as it no longer needs to handle user data.

io_uring sounds interesting, but at the moment it doesn't fit the direction we are taking (as we are moving the data plane directly into the kernel). Therefore it wouldn't make sense to spend effort on that side right now.

Still, if you want to play with it and work on a PoC (that may result in clean patches) for OpenVPN, please do.

dsommers commented on June 27, 2024

This approach is interesting for platforms which will not or cannot support our DCO kernel module. I would still expect DCO to be faster (as the network packets will still take the fastest path between the physical network interface and the virtual one, without any context switching at all).

But a faster userspace implementation with io_uring might be useful on lower-end routers where getting ovpn-dco running is too difficult, or in setups where the user insists on using non-recommended, non-GCM ciphers, compression, or other protocol features not available in the ovpn-dco module.

Is io_uring supported on platforms other than Linux? I believe I read something about Jens Axboe being involved, who is a Linux kernel developer.

dsommers commented on June 27, 2024

Thanks! So the advantage of ovpn-dco will basically be that it can utilize all the CPU cores on the data plane. OpenVPN 2.x is (still!) single-threaded and will therefore hit some limitations on the server side when more clients are connected, and I expect io_uring in an OpenVPN implementation to be limited in that regard as well. However, for servers with only one client connected, the performance difference might not be that big.

huangya90 commented on June 27, 2024

@CodeFetch Could you share some performance numbers that demonstrate the advantage of io_uring?

CodeFetch commented on June 27, 2024

@huangya90 I don't have any statistics on that anymore, but it was a constant 20-30% increase in throughput on an Intel 4820K, single-threaded. I looked at it with oprofile: the recvmsg and sendmsg syscalls, which previously accounted for 20-30%, are gone. So that matches.
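
As a back-of-the-envelope check of that reasoning, the sketch below applies an Amdahl-style estimate. It assumes (this is not stated in the thread) that the 20-30% refers to the share of CPU time in the profile and that this cost disappears completely; `throughput_gain` is a hypothetical helper name. Under those assumptions the result is an upper bound, since io_uring still has some submission and completion overhead.

```python
def throughput_gain(removed_fraction: float) -> float:
    """Upper-bound relative throughput gain when a fraction of the
    per-packet CPU time is eliminated and the CPU is the bottleneck.

    New time per packet is (1 - removed_fraction) of the old one, so
    throughput scales by 1 / (1 - removed_fraction).
    """
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction) - 1.0


if __name__ == "__main__":
    for fraction in (0.20, 0.30):
        gain = throughput_gain(fraction)
        print(f"{fraction:.0%} of CPU time removed -> up to {gain:.0%} more throughput")
```

With these inputs the bound works out to roughly 25% and 43%, so an observed 20-30% increase is plausible for syscall overhead of that size.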

CodeFetch commented on June 27, 2024

@huangya90 Keep in mind there is more potential for improvement. As far as I know, OpenVPN not only lacks multithreading support, it also has no buffer pool and isn't optimized for cache hotness.

huangya90 commented on June 27, 2024

> I don't have any statistics on that anymore, but it was a constant 20-30% increase in throughput on an Intel 4820K. Single threaded... I have looked at it with oprofile. The recvmsg sendmsg syscalls are gone which accounted for 20-30% previously. So that matches.

If so, doing the speed-up in kernel space is much better. Please see the previously measured performance numbers for ovpn-dco [1].

[1] https://www.mail-archive.com/[email protected]/msg21584.html

CodeFetch commented on June 27, 2024

@huangya90 That's the gain with fastd, which is better optimized than OpenVPN. The numbers don't look comparable to me. I guess he used a multicore processor, and there must have been AES hardware acceleration, as ChaCha should otherwise be faster.

cron2 commented on June 27, 2024

CodeFetch commented on June 27, 2024

@cron2 Yes, but it's like comparing apples with Microsoft. Using threads on all CPU cores is possible in a userspace implementation too; it just isn't implemented in OpenVPN. ovpn-dco will definitely perform better than a userspace version, but not by that much on a single-threaded machine without hardware encryption.

BTW does ovpn-dco also support layer 2 tunnels?

andywangevertz commented on June 27, 2024

@CodeFetch We are using OpenVPN on a layer 2 device (tap0), and the RX path looks like this:
packets -> kernel -> openvpn (decrypt) -> kernel -> tap0 -> application

As you can see, packets go into and come out of the kernel twice. I am not sure what fastd does on a layer 2 device/tunnel. Do you think io_uring could also benefit the OpenVPN userspace application (by something like 20-30%)? I would like to give it a try and would appreciate some advice on io_uring.

Thanks!

ordex commented on June 27, 2024

I am closing this, but for further discussions, please reach out to the openvpn-devel mailing list, where a broader audience will be able to join the conversation.
Cheers!
