
net_device_table_mgr: Remove unnecessary epoll_wait (libvma issue, closed)

mellanox commented on August 15, 2024

net_device_table_mgr: Remove unnecessary epoll_wait

from libvma.

Comments (5)

GoogleCodeExporter commented on August 15, 2024

This epoll_wait call actually has a secondary effect.
We are using nested epfds in the following way:
User epfd <- global_ring_epfd <- CQ_channel_fd

Every CQ_channel_fd is added to the global_ring_epfd, and the global_ring_epfd 
is added to the user_epfd [epfd_info.cpp +224]. When we do 
epoll_wait(user_epfd) [epoll_wait_call.cpp +180], it returns 
"global_ring_epfd ready" if global_ring_epfd is ready. The global_ring_epfd 
is ready if one of its cq_channel_fds is ready. Assuming epoll_wait(user_epfd) 
returns ready with global_ring_epfd, we call epoll_wait(global_ring_epfd) to 
get the ready cq_channel_fd. After this call, the global_ring_epfd might still 
be counted as ready for user_epfd until another epoll_wait(global_ring_epfd) 
call returns no events. As a result, the next epoll_wait(user_epfd) call will 
return ready with global_ring_epfd although none of its cq_channel_fds is 
ready. This interferes with our polling mode and sends us into the interrupt 
handling logic when it is not necessary.
To avoid user_epfd being falsely ready, we call epoll_wait(global_ring_epfd) a 
second time, which changes its status to not-ready for the user_epfd.

Instead of removing this call, I now call it only when in polling mode 
(VMA_SELECT_POLL != 0).

Original comment by [email protected] on 6 Feb 2014 at 10:07


GoogleCodeExporter commented on August 15, 2024

Please look at this and test whether it still improves performance for you:
https://code.google.com/p/libvma/source/detail?r=347e1795d852fa935406bb8a9fa6cc23df4e2ca1

Generally, for epoll we know exactly which CQ channels we need to listen to 
without asking the sockets for them, because we keep this information when 
epoll_ctl is called.
A better way to improve performance here is to remove the global_ring_epfd and 
add only the relevant cq_channel_fds directly to user_epfd. 
(Alternatively, we could have a special epfd that multiplexes only the relevant 
cq_channel_fds for this user_epfd and replaces the global_ring_epfd, but this 
option is worse, as it requires an additional epfd for each user_epfd.)
This change will require much more work, and we are not planning to do it in 
the near future.
We will be happy to get a patch if you would like to try going in this 
direction.
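
For illustration only, here is a minimal sketch of the direct-registration idea 
described above. It is not libvma code: eventfds stand in for the CQ channel 
fds, and add_channels_direct() and direct_demo() are hypothetical names. A real 
implementation would also need a way to tell these entries apart from the 
application's own registrations, because epoll_event.data is opaque user data; 
the sketch ignores that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: register every channel fd directly on user_epfd,
 * with no intermediate epfd in between. Returns 0 on success, -1 on error. */
int add_channels_direct(int user_epfd, const int *channel_fds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;            /* level-triggered readability */
        ev.data.fd = channel_fds[i];
        if (epoll_ctl(user_epfd, EPOLL_CTL_ADD, channel_fds[i], &ev) < 0)
            return -1;
    }
    return 0;
}

/* Demo: three eventfds stand in for CQ channel fds (an assumption of this
 * sketch). Signals channel `which` and returns the index of the channel
 * that a single epoll_wait(user_epfd) reports, or -1 on any failure. */
int direct_demo(size_t which)
{
    int fds[3] = { -1, -1, -1 };
    struct epoll_event ev;
    uint64_t one = 1;
    int found = -1;
    int epfd = epoll_create1(0);

    assert(which < 3);
    for (size_t i = 0; i < 3; i++)
        fds[i] = eventfd(0, EFD_NONBLOCK);

    if (epfd >= 0 && fds[0] >= 0 && fds[1] >= 0 && fds[2] >= 0 &&
        add_channels_direct(epfd, fds, 3) == 0 &&
        write(fds[which], &one, sizeof(one)) == (ssize_t)sizeof(one) &&
        epoll_wait(epfd, &ev, 1, 0) == 1 &&   /* one call, no second epfd */
        ev.data.fd == fds[which])
        found = (int)which;

    for (size_t i = 0; i < 3; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    if (epfd >= 0)
        close(epfd);
    return found;
}
```

With this layout the ready channel is identified by the one epoll_wait on the 
user's epfd, so no further epoll_wait on a nested epfd is needed to find it.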

Original comment by [email protected] on 6 Feb 2014 at 10:31

Changed state: Fixed


GoogleCodeExporter commented on August 15, 2024

"Every CQ_channel_fd is added to the global_ring_epfd, and the global_ring_epfd 
is added to the user_epfd [epfd_info.cpp +224]. When we do 
epoll_wait(user_epfd) [epoll_wait_call.cpp +180], it returns 
"global_ring_epfd ready" if global_ring_epfd is ready. The global_ring_epfd 
is ready if one of its cq_channel_fds is ready. Assuming epoll_wait(user_epfd) 
returns ready with global_ring_epfd, we call epoll_wait(global_ring_epfd) to 
get the ready cq_channel_fd."

Yep...

"After this call, the global_ring_epfd might still be counted as ready for 
user_epfd until another epoll_wait(global_ring_epfd) call returns no events. 
As a result, the next epoll_wait(user_epfd) call will return ready with 
global_ring_epfd although none of its cq_channel_fds is ready. This interferes 
with our polling mode and sends us into the interrupt handling logic when it 
is not necessary.
To avoid user_epfd being falsely ready, we call epoll_wait(global_ring_epfd) a 
second time, which changes its status to not-ready for the user_epfd."

Nope.  That's not how epoll_wait() works.  The only thing epoll_wait() can do 
is tell you if any of the registered FDs have experienced any of the requested 
epoll_events and in this case you ignore both the return count and returned 
events.  There are no side-effects of epoll_wait().  It does not change the 
status of any of the registered FDs.  The nested epoll FD doesn't change this 
fact either.

In this code the first zero timeout epoll_wait() on the global_ring_epfd tells 
you which completion channel FDs are ready.  You then call 
wait_for_notification_and_process_element() on each channel which does an 
ibv_get_cq_event().  The ibv_get_cq_event() does a read on the channel FD which 
at that point makes the global_ring_epfd no longer "ready" and thus the 
user_epfd no longer "ready".  The second zero timeout epoll_wait() doesn't do 
anything.

Take a look at the attached nested_epoll.c example.  It uses a timerfd and not 
a completion channel but that doesn't matter.

"A better way to improve performance here is to remove the use of 
global_ring_epfd and use directly only the relevant cq_channel_fds by adding 
them directly to user_epfd."

Yes please.  What got me started looking into this is that performance with VMA 
for one of my applications that can't spin is actually worse than simply going 
through the kernel.  If you put the completion channel FDs into my epfd you can 
eliminate both epoll_wait() calls on the global_ring_epfd.  I thought about 
attempting this but first wanted to simply remove the unnecessary one.

Original comment by [email protected] on 6 Feb 2014 at 11:09

Attachments:

nested_epoll.c


GoogleCodeExporter commented on August 15, 2024

I looked at your sample code. You are right.

This code was added in the past because it solved some issue, but it is no 
longer clear what that issue was.
Apparently, the explanation I got is not correct, as your sample code shows.
I removed this second epoll_wait call, and I will follow up if any issues come 
up.

https://code.google.com/p/libvma/source/detail?r=c48fbb450243d98d8f8d79e3f3959a529112557c

Original comment by [email protected] on 7 Feb 2014 at 1:47


GoogleCodeExporter commented on August 15, 2024

"This code was added in the past because it solved some issue, but it is no 
longer clear what that issue was."

Thanks Or, I think we've all been in this situation before.

Original comment by [email protected] on 7 Feb 2014 at 2:57
