Comments (5)
This epoll_wait call actually has a secondary effect.
We are using nested epfds in the following way:
User epfd <- global_ring_epfd <- CQ_channel_fd
Every CQ_channel_fd is added to the global_ring_epfd, and the global_ring_epfd
is added to the user_epfd [epfd_info.cpp +224]. When we do
epoll_wait(user_epfd) [epoll_wait_call.cpp +180], it will return
"global_ring_epfd ready" if global_ring_epfd is ready. The global_ring_epfd
will be ready if one of its cq_channel_fds is ready. Assuming epoll_wait(user_epfd)
returns ready with global_ring_epfd, we call epoll_wait(global_ring_epfd) to get
the ready cq_channel_fd. After this call, the global_ring_epfd might still be
counted as ready for user_epfd until another call to
epoll_wait(global_ring_epfd) returns no events. As a
result, the next call to epoll_wait(user_epfd) will return ready with
global_ring_epfd, although none of its cq_channel_fds is ready. This
interferes with our polling mode and pushes us into the interrupt handling logic
when it is not necessary.
To avoid user_epfd being falsely ready, we call epoll_wait(global_ring_epfd) a
second time, which changes its status to not-ready for the user_epfd.
Instead of removing this call, I now call it only when in polling mode
(VMA_SELECT_POLL != 0).
Original comment by [email protected]
on 6 Feb 2014 at 10:07
from libvma.
Please look at this and test whether it still improves performance for you:
https://code.google.com/p/libvma/source/detail?r=347e1795d852fa935406bb8a9fa6cc23df4e2ca1
Generally, for epoll we can know exactly which CQ channels we
need to listen to without asking the sockets for them, since we keep this
info when epoll_ctl is called.
A better way to improve performance here is to remove the use of
global_ring_epfd and use directly only the relevant cq_channel_fds by adding
them directly to user_epfd.
(Alternatively, we could have a special epfd that multiplexes only the relevant
cq_channel_fds for this user_epfd and replaces the global_ring_epfd, but
this option is worse, as it requires an additional epfd for each user_epfd.)
This change will require much more work, and we are not planning to do it in
the near future.
We will be happy to get a patch if you would like to try going in this direction.
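As a rough illustration of the first option (registering the channel fds directly on the user's epfd), here is a minimal sketch. It is not libvma code: it assumes Linux, uses two eventfds as stand-ins for cq_channel_fds, and the names add_channel and direct_channel_demo are hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Register one stand-in channel fd directly on the user's epfd. */
static int add_channel(int user_epfd, int channel_fd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = channel_fd;
    return epoll_ctl(user_epfd, EPOLL_CTL_ADD, channel_fd, &ev);
}

/* Returns 0 if one epoll_wait() on user_epfd names the signalled channel. */
int direct_channel_demo(void)
{
    struct epoll_event got[2];
    uint64_t one = 1;
    int rc = -1;
    int user_epfd = epoll_create1(0);
    int chan_a = eventfd(0, 0); /* assumption: stands in for a cq_channel_fd */
    int chan_b = eventfd(0, 0); /* assumption: stands in for a cq_channel_fd */

    if (user_epfd < 0 || chan_a < 0 || chan_b < 0)
        goto out;
    if (add_channel(user_epfd, chan_a) < 0 || add_channel(user_epfd, chan_b) < 0)
        goto out;

    /* Signal only chan_b, as if one completion channel had an event. */
    if (write(chan_b, &one, sizeof(one)) != (ssize_t)sizeof(one))
        goto out;

    /* A single call reports which channel is ready; no inner epfd is involved. */
    if (epoll_wait(user_epfd, got, 2, 0) == 1 && got[0].data.fd == chan_b)
        rc = 0;

out:
    if (chan_b >= 0) close(chan_b);
    if (chan_a >= 0) close(chan_a);
    if (user_epfd >= 0) close(user_epfd);
    return rc;
}
```

In a real implementation the library would presumably also have to hide these internal fds from the events returned to the application; the sketch does not attempt that.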
Original comment by [email protected]
on 6 Feb 2014 at 10:31
- Changed state: Fixed
"Every CQ_channel_fd is added to the global_ring_epfd, and the global_ring_epfd
is added to the user_epfd [epfd_info.cpp +224]. When we do
epoll_wait(user_epfd) [epoll_wait_call.cpp +180], it will return
"global_ring_epfd ready" if global_ring_epfd is ready. The global_ring_epfd
will be ready if one of its cq_channel_fds is ready. Assuming epoll_wait(user_epfd)
returns ready with global_ring_epfd, we call epoll_wait(global_ring_epfd) to get
the ready cq_channel_fd."
Yep...
"After this call, the global_ring_epfd might still be counted as ready for
user_epfd until another call to epoll_wait(global_ring_epfd) returns no
events. As a result, the next call to epoll_wait(user_epfd)
will return ready with global_ring_epfd, although none of its cq_channel_fds is
ready. This interferes with our polling mode and pushes us into the
interrupt handling logic when it is not necessary.
To avoid user_epfd being falsely ready, we call epoll_wait(global_ring_epfd) a
second time, which changes its status to not-ready for the user_epfd."
Nope. That's not how epoll_wait() works. The only thing epoll_wait() can do
is tell you whether any of the registered FDs have experienced any of the requested
epoll_events, and in this case you ignore both the return count and the returned
events. epoll_wait() has no side effects. It does not change the
status of any of the registered FDs. The nested epoll FD doesn't change this
fact either.
In this code, the first zero-timeout epoll_wait() on the global_ring_epfd tells
you which completion channel FDs are ready. You then call
wait_for_notification_and_process_element() on each channel, which does an
ibv_get_cq_event(). The ibv_get_cq_event() does a read on the channel FD, which
at that point makes the global_ring_epfd no longer "ready" and thus the
user_epfd no longer "ready". The second zero-timeout epoll_wait() doesn't do
anything.
Take a look at the attached nested_epoll.c example. It uses a timerfd and not
a completion channel but that doesn't matter.
"A better way to improve performance here is to remove the use of
global_ring_epfd and use directly only the relevant cq_channel_fds by adding
them directly to user_epfd."
Yes please. What got me started looking into this is that performance with VMA
for one of my applications that can't spin is actually worse than simply going
through the kernel. If you put the completion channel FDs into my epfd you can
eliminate both epoll_wait() calls on the global_ring_epfd. I thought about
attempting this but first wanted to simply remove the unnecessary one.
Original comment by [email protected]
on 6 Feb 2014 at 11:09
I looked at your sample code. You are right.
This code was added in the past because it solved some issue, but it is no
longer clear what that issue was.
Apparently, the explanation I got is not correct, as your sample code shows.
I removed this second epoll_wait call, and I will follow up if any issues come
up.
https://code.google.com/p/libvma/source/detail?r=c48fbb450243d98d8f8d79e3f3959a529112557c
Original comment by [email protected]
on 7 Feb 2014 at 1:47
"This code was added in the past because it solved some issue, but it is no
longer clear what that issue was."
Thanks Or, I think we've all been in this situation before.
Original comment by [email protected]
on 7 Feb 2014 at 2:57