Code Monkey home page Code Monkey logo

Comments (8)

tbarbette avatar tbarbette commented on August 23, 2024

Hi,

For the dpdk-pools performances :
The dpdk-pools allows to completly get rid of the Packet object from Click. In DPDK the descriptor data (length, L3 layer offset, user-metada, ...) is at the beginning of the buffer, so there is only one "memory object" per packet. So we replaced every "packet descriptor accessor" of the Click Packet object with a wrapper around the DPDK equivalent function, and it works (while maybe a little untested for some functions). We thought that removing an unnecessary structure, and an unnecessary copy of packet length, metadata, ... would enhance the speed, but it didn't.

The problem is that to keep all Click functionalities inside the DPDK packet metadata (including annotations), we need two cacheline at the beginning of the mbuf, so we had two cache miss for every packet. But when we use Click Packets linked to the mbuf data itself, but re-using always the same (burst-size) Packet objects, we have only one cache miss, as packets are recycled in Click inside the ToDPDKDevice, and reused at next run in the FromDPDKDevice, therefore there is absolutely no cache miss with the Packet objects.

This results would probably go with low-size cache processors, but ours have 20MB of cache, so the Packet objects stay there.

For the clone, you touch a big problem we have with staying "click fully-compatible" or going for optimized speed.

The problem is that when you arrive in ToDPDKDevice, you don't know if your other shared packet can be modified or not, therefore you cannot give it to DPDK as it, you have to copy it. Imagine that the cloned version of the packet is then modified? We cannot detect this case, and that's a general problem in Click. The rte_pktmbuf_refcnt_update will prevent DPDK from freeing the packet twice, but it does not prevent changing the data of the clone. Clearly, Click (so FastClick...) needs a "read-only copy". And more than that, a "read-only and do not even care if I want to edit the data of the other packet".... In the first case, you could have packet which is cloned by Tee for example, then the read-only copy goes to an element for some statistics about the packet content, however the first original packet goes to an element which changed its content... If the packet is modified before going the statistic element, you'll have the wrong result, even if the second one is read-only.

To (very partially and awfully) solve that problem, I added a boolean to clone() in one of the last commit. If that parameter (fast clone) is set to true, you'll receive a false clone, with no shared() arguments. The cloned packet will just have the same buffer address but is considered as a normal packet by Click. It also means that if this packet is normally killed, it will corrupt the memory as you'll free a DPDK packet as a normal malloc'ed memory... However the is_dpdk_packet() function will see that the buffer comes from the DPDK pool, and in that case ToDPDKDevice will catch the packet and use zero-copy. The original packet will flow through the pipeline and when killed, call DPDKDevice::free_packet (the destructor).

That's awfull hack, but work with my use-case... I could use rte_pktmbuf_refcnt_update and set the destructor of the quick clone to DPDKDevice::free_packet so if it's killed there is no problem. But I wanted to have the same solution for Netmap, and netmap does not provide a buffer reference counter... Also, this does not prevent the problem about the buffer being modified in the other path mentioned just before.

It'd be happy to have your thoughts. Re-reading your comment, I think we agree on the "don't copy even if shared" mechanism, but I'd like to find a nice and clean solution.

I'll answer to your other issue ;)
Tom

from fastclick.

davidek avatar davidek commented on August 23, 2024

Hi Tom,

Thanks a lot for the detailed (and quick) reply. Concerning dpdk-pools I now have a clearer idea.

Regardning modifiable packets, I'm not sure I'm still missing something:

you don't know if your other shared packet can be modified or not

To my (limited) understanding, I thought Packets were all read-only unless you explicitely uniqueify (or push or pull) them (but they will be turned back into read-only Packets at the element boundary).
To be honest, I can't think of a situation when a packet's content could be altered after it has been sent to output, perhaps I'm missing some edge cases? Some trials follow, but I believe they all keep the correct readonlyness for packets.

If this turned out to be true, i.e. if all packets arriving at ToDPDKDevice were to be indeed read-only, I wouldn't see any problem in implementing clone() to point to the same mbuf. Two strategies come to my mind:

  1. Also increase the dpdk refcount. Killng a packet woud then require to always decrease the refcount of the mbuf, even if it's shared.
  2. (seems more convenient to me right now, but I may be wrong) Do not increase the mbuf's refcount. This would mean to always increase that refcout before sending, leaving to the packet's destructor the burden of freeing it, whenever the Packet's refcount gets to zero.

Would you see any flow in this approach?

Thanks a lot for your help, I really hope we can get this to work out without introducing incompatibilities with standard Click.

Davide

 fromN :: FromDPDKDevice(N); toN :: ToDPDKDevice(N);
 tee :: Tee();  drop :: Drop()
 edit :: (element that calls uniquify(), edits, then forwards)
 checktype :: switches data to port 0, queries to port 1
 cache :: (element which "steals" buffers for later retrieval)
   data packets are eaten. queries are dropped if they trigger data output:
   {data}  -> [0]cache // may cause an older Packet to be kill()ed
   {query} -> [1]cache[0] -> {data fetched. cloned() before outputting}
                 cache[1] -> {query forwarded only if no match}

from1 -> edit -> tee => to1, to2
edit operates in-place, then packet is cloned into 2 read-only packets sharing the same dpdk buffer

  from1 -> tee =>
    (edit -> to1),
    to2

edit sees that the incoming packet is shared(), so uniquify() triggers a full copy, which in turn kills the original packet once, putting the ref counter back to 1. to1 gets a non-dpdk buffer (unless the copy is done to a buffer from the dpdk pool? not sure how it's implemented). to2 gets a non-shared packet that can be forwarded directly.

  from1 -> tee =>
    editX -> to1,
    editY -> to2

editX behaves like above. editY, instead, sees that the incoming packet is not shared(), so uniquify() allows in-place modification. to1 and to2 behave like above.

  /*data*/  from1 -> tee => [0]cache, to2;
  /*query*/ from2 -> [1]cache[1] -> to1;
  /*data*/              cache[0] -> to2;

This is an embarassingly simplified version of our use-case.
Data: both to2 and [0]cache get readonly versions sharing the same buffer. to2 will kill() once, decreasing the refcount (but allowing the mbuf to survive).
Failed query: nothing special, the query is forwared.
Successful query: the incoming query is killed normally; cache runs a clone before pushing the data packet, so to2 sees a shared readonly packet, which is sent as-is (with the mbuf surviving)

  /*data*/  from1 -> edit -> tee => [0]cache, to2;
  /*query*/ from2 -> [1]cache[1] -> to1;
  /*data*/              cache[0] -> to2;

Same as above, edit is in-place.

  /*data*/  from1 -> tee => [0]cache, (edit -> to2);
  /*query*/ from2 -> [1]cache[1] -> to1;
  /*data*/              cache[0] -> to2;

The edit element will trigger a full copy, causing the original Packet's refcount to be decreased back. Not zero-copy anymore, but the semantic is kept.

  /*data*/  from1 -> tee => [0]cache, to2;
  /*query*/ from2 -> [1]cache[1] -> to1;
  /*data*/              cache[0] -> edit -> to2;

Just like 6

from fastclick.

tbarbette avatar tbarbette commented on August 23, 2024

I wrote too quickly... it's true that all Packets are RO and will be uniqueified if we need to modify them. If we update the DPDK refcount on clone, we can give the mbuf to DPDK and the buffer will be effectively freed when all cloned Packet are killed.

Your second idea seems the better, but no need to always update the refcnt. We can do it only if the Packet is shared, I think. If it's not shared it can be freed by DPDK directly after sending, if it is shared it will be killed the last call to the cloned packet->kill() or DPDK after the packet is sent, the one which happen the last.

I still have to think it through for Netmap as there is no refcnt...

(PS : answer edited, I was going for the 1 but the 2 with my remark seems best).

from fastclick.

davidek avatar davidek commented on August 23, 2024

Quick update: After some digging I thought that it would be enough to hack into the get_mbuf function and increase the refcnt, but this appears to have issues when sending from a thread that is on another NUMA node wrt the NIC card (packets appear to get dropped). Will keep working on it.

(note this patch applies cleanly over the one I wrote for #6, but it should give the idea anyway)

diff --git a/elements/userlevel/todpdkdevice.cc b/elements/userlevel/todpdkdevice.cc
index 9597398..7e2e02c 100644
--- a/elements/userlevel/todpdkdevice.cc
+++ b/elements/userlevel/todpdkdevice.cc
@@ -125,10 +125,15 @@ inline struct rte_mbuf* get_mbuf(Packet* p, bool create=true) {
     #if CLICK_DPDK_POOLS
     mbuf = p->mb();
     #else
-    if (likely(DPDKDevice::is_dpdk_packet(p) && !p->shared())) {
-        /* If the packet is an unshared DPDK packet, we can send
-         *  the mbuf as it to DPDK*/
+    if (likely(DPDKDevice::is_dpdk_packet(p))) {
+        /* If the packet is a DPDK packet, we can send
+         * the mbuf as it is to DPDK */
         mbuf = (struct rte_mbuf *) p->destructor_argument();
+        if (unlikely(p->shared())) {
+            /* The mbuf is shared, don't let dpdk free it once sent */
+            rte_pktmbuf_refcnt_update(mbuf, 1);
+        }
+        assert(rte_pktmbuf_is_contiguous(mbuf));  // multi-mbuf not supported
         rte_pktmbuf_pkt_len(mbuf) = p->length();
         rte_pktmbuf_data_len(mbuf) = p->length();
         static_cast<WritablePacket*>(p)->set_buffer(0);

from fastclick.

tbarbette avatar tbarbette commented on August 23, 2024

This code has another problem : if the shared packet we receive is the original, not the copy (the copy does not have the destructor to DPDKDevice::free_pkt, but have the _data_packet set to the original packet), it won't work. The buffer of this packet will be reseted (last line). If the copy is killed() after this happens, it will call the original's destructor which will do nothing as the buffer is reseted, and the DPDK mbuf won't be called.

I would more do something like this :
rte_pktmbuf_pkt_len(mbuf) = p->length();
rte_pktmbuf_data_len(mbuf) = p->length();
if (unlikely(p->shared())) {
rte_mbuf_refcnt_update(mbuf, 1);
} else {
static_cast<WritablePacket*>(p)->set_buffer(0);
}

In the three cases :

  • Packet is unshared : buffer is reseted in packet, DPDK will trash the packet.
  • Packet is shared, we have the original :
    • refcnt is updated
    • the original is not killed and freed, as it's use_count will be decremented to 1 upon kill
    • when the copy is killed, the original's destructor is called and effectively free the mbus
  • Packet is shared, we have the copy
    • refcnt is updated
    • the copy is freed, and we set the original use_count to 1
    • when the original is killed, it calls the destructor and effectively free the mbuf

from fastclick.

tbarbette avatar tbarbette commented on August 23, 2024

I think the problem with the thread is probably linked to issue #2, I'll push just after you send the PR for your other issue.

from fastclick.

davidek avatar davidek commented on August 23, 2024

Indeed, I had missed some details of the cascaded kill().

Meanwhile, I digged into the threading issue and I thought it had to do with the timer being scheduled on the wrong thread:

diff --git a/elements/userlevel/todpdkdevice.cc b/elements/userlevel/todpdkdevice.cc
index e7b4f11..50f5040 100644
--- a/elements/userlevel/todpdkdevice.cc
+++ b/elements/userlevel/todpdkdevice.cc
@@ -95,6 +95,7 @@ int ToDPDKDevice::initialize(ErrorHandler *errh)
         if (_timeout >= 0) {
             _iqueues.get_value(i).timeout.assign(this);
             _iqueues.get_value(i).timeout.initialize(this);
+            _iqueues.get_value(i).timeout.move_thread(i);
         }
     }

from fastclick.

tbarbette avatar tbarbette commented on August 23, 2024

Nice catch, just use the thread_for_queue() function as the assignment may not be a one to one mapping.

from fastclick.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.