
snabb's People

Contributors

aequabit, alexandergall, altexy, aperezdc, benagricola, capr, chrgraf, clopez, corsix, cryslith, domenkozar, dpino, eugeneia, greezlee, hanshuebner, hb9cwp, javierguerragiraldez, justincormack, kbara, ladc, lukego, mwiget, petebristow, pkazmier, plajjan, takikawa, teknico, tsyesika, wingo, xray7224


snabb's Issues

Intel 82574L ethernet driver [intel.lua]: add_txbuf_tso(): Use the TCP Segmentation Offload hardware feature

Taken from this comment in Issue 1: #1 (comment)

btw: another interesting but larger thing to hack on in this source file is the add_txbuf_tso() function, which is currently just a stub. The goal is to use the TCP segmentation offload hardware feature so that we would have a test case that transmits really big packets (~64K) and then (by loopback) receives the same data back as a larger number of smaller packets. This would be a major step towards implementing STT in the future (possibly being the first open source implementation...)

Cache Coherency

How does Snabb Switch handle cache coherency?

Here's one of the cases I don't understand. If I'm not mistaken, Intel platforms use a write back cache. Let's say the driver writes a transmit descriptor "to memory". Then the driver instructs the network controller where to find the transmit descriptor(s). At this point, IMO, the descriptor might still be in cache and might not have reached physical memory. In this case, the network controller might fetch a stale descriptor. How's this problem resolved?

Upstream efficient KVM I/O

Take the KVM I/O implementation created for issue #25 and have it merged into the main QEMU codebase so that it will be available by default in future versions.

Intel 82574L ethernet driver selftest shows unexpected hardware counter values

The Intel 82574L device driver's self-test function is showing unexpected counter values. The test attempts to transmit 100,000 packets, and optionally to receive them again with loopback mode. Displaying the hardware counters shows some values that are expected and others that are surprisingly low.

Here are example results when attempting to transmit 100,000 packets
of 1000 bytes each:

Statistics for PCI device 0000:00:04.0:
              54,306 GPTC       Good Packets Transmitted Count
          54,306,000 GOTCL      Good Octets Transmitted Count
         100,000,000 TOTL       Total Octets Transmitted (Low)
              54,306 TPT        Total Packets Transmitted 
             100,000 PTC1023    Packets Transmitted [512–1023 Bytes] Count

Why do some counters show 100,000 and others only 54,306?

Here are similar results with MAC loopback mode engaged:

Statistics for PCI device 0000:00:04.0:
             100,000 PRC1023    Packets Received [512–1023 Bytes] Count
              14,618 GPRC       Good Packets Received Count
              14,618 GPTC       Good Packets Transmitted Count
          14,618,000 GORCL      Good Octets Received Count
          14,618,000 GOTCL      Good Octets Transmitted Count
         100,000,000 TORL       Total Octets Received (Low)
         100,000,000 TOTL       Total Octets Transmitted (Low)
             100,000 TPR        Total Packets Received 
              14,618 TPT        Total Packets Transmitted 
             100,000 PTC1023    Packets Transmitted [512–1023 Bytes] Count

Curious that TPR shows 100,000 while TPT, GPRC, GPTC all show less.

Here's how to reproduce the problem:

  • Get a Linux machine with an 82574L ethernet controller. Hint: hetzner.de servers have a spare 82574L device on the motherboard. (Or send Luke a VM image and he'll boot it with the right hardware and give you remote access.)

  • Check out and build Snabb Switch:

    $ git clone --recursive [email protected]:SnabbCo/snabbswitch.git
    $ cd snabbswitch
    $ make
    ...
    Firmware: 536K snabbswitch

  • Run the self-test:

    $ src/snabbswitch

Here are ideas for how to investigate:

  • First, reproduce the issue on your own environment. It really happens, right?
  • Check the 82574L data sheet documentation to verify that it really is a problem.
  • Look for the bug in intel.lua: in TX, in RX, in statistics reading, ...
  • Get help from Intel experts on the e1000-devel mailing list.
  • Post questions and new information on this issue and get feedback.

port.lua selftest all of chur's ports simultaneously

port.lua's spam() and echo() selftest functions could be extended to work on multiple NICs in parallel instead of only one at a time.

This could be used to see how far the test cases can scale up on chur.snabb.co's 20 x 10GbE network interfaces and as the initial basis for performance investigations.

Inject diagnostic Lua code safely

Create a mechanism for injecting arbitrary Lua code into the switch to make diagnostic dumps. Has to be done in a safe way that doesn't impact operation even with buggy commands.

The first experiment in this direction is taking Lua expressions and fork()ing a child to execute the query and then terminate. This could be nice if the fork() can be made cheap enough. This way the child can introspect as much as it likes without interfering with the operation of the parent. Commit 31c78a0.

Autodetect suitable hardware

The switch should by default automatically take control over all supported Ethernet ports that are not already being used by the operating system. That is: all supported Intel NICs that are not in the "UP" interface state on Linux.

This is a matter of extending pci.lua for groveling /sys/bus/pci/devices/* to find suitable devices.
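
A minimal C sketch of that sysfs scan, for illustration only. It assumes Intel's PCI vendor ID (0x8086) and the sysfs "vendor" attribute, which holds text such as "0x8086". The function names are hypothetical, and the sketch does not check device IDs or whether Linux already has the interface "UP"; the real logic would live in pci.lua.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

#define INTEL_VENDOR_ID 0x8086UL

/* Parse the contents of a sysfs "vendor" file (e.g. "0x8086\n").
   Returns 1 if the ID is Intel's, 0 otherwise. */
int is_intel_vendor(const char *text)
{
    char *end;
    unsigned long id = strtoul(text, &end, 16);
    return end != text && id == INTEL_VENDOR_ID;
}

/* Walk /sys/bus/pci/devices and print the PCI address of each Intel device.
   Returns the number found, or -1 if the directory cannot be read. */
int list_intel_devices(void)
{
    DIR *dir = opendir("/sys/bus/pci/devices");
    struct dirent *entry;
    int found = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        char path[512], text[32];
        FILE *f;

        if (entry->d_name[0] == '.')
            continue;  /* skip "." and ".." */
        snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/vendor",
                 entry->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(text, sizeof text, f) != NULL && is_intel_vendor(text)) {
            printf("%s\n", entry->d_name);
            found++;
        }
        fclose(f);
    }
    closedir(dir);
    return found;
}
```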

Developer manual

Create a developer manual for Snabb Switch.

One idea is to generate a Leanpub book of the complete source code listing. This would put pressure on us to make the source code readable. That could be a good thing!

Parallelism: Separate hardware receive queues

Add support for running multiple forwarding engines each processing a separate hardware queue. The queues would be dispatched by the NICs based on either hash (RSS) or matching specific values in the packet header (Intel Flow Director).
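
To illustrate the hash-based dispatch idea, here is a small C sketch that maps a flow to a queue index. It is only a software stand-in: it uses an FNV-1a hash, not the hash the NIC hardware computes for RSS, and the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical flow key: the fields a hash-based dispatcher might use. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Mix the low nbytes of value into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_step(uint32_t hash, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        hash ^= (value >> (8 * i)) & 0xff;
        hash *= 16777619u;  /* FNV prime */
    }
    return hash;
}

/* Hash a flow key field by field (avoids hashing struct padding). */
uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;  /* FNV offset basis */
    h = fnv1a_step(h, f->src_ip, 4);
    h = fnv1a_step(h, f->dst_ip, 4);
    h = fnv1a_step(h, f->src_port, 2);
    h = fnv1a_step(h, f->dst_port, 2);
    return h;
}

/* All packets of one flow land on the same queue, so each forwarding
   engine sees complete flows. nqueues must be nonzero. */
unsigned select_queue(const struct flow *f, unsigned nqueues)
{
    return flow_hash(f) % nqueues;
}
```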

DMA for VMs

The switch needs to access memory directly in virtual machines. This is how zero-copy transmit will work: the guest tells us what it wants to transmit, we figure out a stable physical address for that memory, and we tell the hardware to transmit it.

So we need to be able to map memory belonging to guest VMs (so the switch can see it) and get a stable physical address for this memory (so the hardware can transmit it).
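
For reference, a rough C sketch of the "find a physical address" half, based on Linux's /proc/self/pagemap. The bit layout (bit 63 = page present, low 55 bits = page frame number) matches what phys_page() in src/core/memory.c tests for; the function names are hypothetical. The address is only stable if the page is pinned (for example locked or HugeTLB-backed), and recent kernels report a zero frame number to unprivileged processes.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Decode one 64-bit pagemap entry.
   Returns the page frame number, or 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    if ((entry & (1ULL << 63)) == 0)
        return 0;
    return entry & ((1ULL << 55) - 1);
}

/* Look up the physical address behind a virtual address in this process.
   Returns 0 on any failure (no access, page not present, ...). */
uint64_t virt_to_phys(const void *ptr)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t virt = (uint64_t)(uintptr_t)ptr;
    uint64_t entry = 0, pfn;
    ssize_t n;
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0)
        return 0;
    /* The file holds one 8-byte entry per virtual page. */
    n = pread(fd, &entry, sizeof entry,
              (off_t)((virt / page_size) * sizeof entry));
    close(fd);
    if (n != (ssize_t)sizeof entry)
        return 0;
    pfn = pagemap_pfn(entry);
    if (pfn == 0)
        return 0;
    return pfn * page_size + virt % page_size;
}
```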

Stand-alone snabbswitch binary with test case

The snabbswitch binary is not supposed to have any dependencies except for libc and Linux. There should be a simple procedure to verify this.

Currently some Lua code is being dynamically loaded from the filesystem (LuaJIT's jit.* modules used in main.lua) which breaks this stand-alone property. (Reproduce by copying snabbswitch to a machine that doesn't have LuaJIT installed.) Those files should be included in the snabbswitch binary and we should find a way to avoid this kind of thing happening in future e.g. build LuaJIT with options that prevent it from going looking for modules on disk.

  • Fix snabbswitch to run on a vanilla Linux machine.
  • Find a way to avoid accidental relapse (if such risk exists after above fix).
  • Create easy test procedure (something that could be automated into a nightly build).

Eliminate dropped packets in intel.lua selftest

intel.lua's selftest currently includes output like this:

             239,166 MPC        Missed Packets Count
             809,801 PRC64      Packets Received [64 Bytes] Count
             810,051 GPRC       Good Packets Received Count
           1,053,699 GPTC       Good Packets Transmitted Count
          52,095,808 GORCL      Good Octets Received Count
          67,437,504 GOTCL      Good Octets Transmitted Count
                  36 RNBC       Receive No Buffers Count
          67,439,296 TORL       Total Octets Received (Low)
          67,439,616 TOTL       Total Octets Transmitted (Low)
           1,053,750 TPR        Total Packets Received
           1,053,753 TPT        Total Packets Transmitted
           1,053,755 PTC64      Packets Transmitted [64 Bytes] Count

The challenge is to reduce MPC (Missed Packet Count) and RNBC (Receive No Buffers Count) to zero.

If the basic problem is overflow (running out of receive buffers) then the solution could involve some flow control. For example: making sure that the total number of packets transmitted in selftest does not exceed the total number of receive buffers that have been allocated, or enabling hardware flow control if that is a valid idea (not sure).
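
The first idea could look roughly like the credit counter below. This is a hypothetical C sketch, not how intel.lua works today: the sender only transmits while fewer packets are outstanding than there are receive buffers.

```c
#include <stdint.h>

/* Hypothetical credit counter for the selftest transmit loop. */
struct tx_credit {
    uint32_t rx_buffers;  /* receive buffers that have been allocated */
    uint32_t in_flight;   /* packets transmitted but not yet received back */
};

/* Returns 1 and takes a credit if another packet may be sent, else 0. */
int credit_try_send(struct tx_credit *c)
{
    if (c->in_flight >= c->rx_buffers)
        return 0;
    c->in_flight++;
    return 1;
}

/* Call once per packet read back from the receive ring. */
void credit_received(struct tx_credit *c)
{
    if (c->in_flight > 0)
        c->in_flight--;
}
```

With this in place the transmit loop would simply skip sending whenever credit_try_send() returns 0, and pick up again after the receive side has drained some packets.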

@rahul-mr could be interesting for you since you have already been hacking in this area?

Zero-copy KVM I/O

Create a mechanism for zero-copy ethernet I/O with KVM guests.

Two tricks are needed:

  1. Memory mapping magic to make guest memory visible to the switch and vice-versa.
  2. Getting stable physical addresses for guest memory to use for hardware DMA.

Compile on 32 bit system

I had to make some small mods to compile on 32 bit.

    diff --git a/src/apps/vhost/vhost_client.c b/src/apps/vhost/vhost_client.c
    index 31f3c77..19690a7 100644
    --- a/src/apps/vhost/vhost_client.c
    +++ b/src/apps/vhost/vhost_client.c
    @@ -53,10 +53,10 @@ static int setup_vring(struct vhost *vhost, int index)
       struct vhost_vring_file kick = { .index = index, .fd = vring->kickfd };
       struct vhost_vring_file call = { .index = index, .fd = vring->callfd };
       struct vhost_vring_addr addr = { .index = index,
    -                                   .desc_user_addr  = (uint64_t)&vring->desc,
    -                                   .avail_user_addr = (uint64_t)&vring->avail,
    -                                   .used_user_addr  = (uint64_t)&vring->used,
    -                                   .log_guest_addr  = (uint64_t)NULL,
    +                                   .desc_user_addr  = (intptr_t)&vring->desc,
    +                                   .avail_user_addr = (intptr_t)&vring->avail,
    +                                   .used_user_addr  = (intptr_t)&vring->used,
    +                                   .log_guest_addr  = (intptr_t)NULL,
                                        .flags = 0 };

       return (ioctl(vhost->vhostfd, VHOST_SET_VRING_NUM, &num) ||
               ioctl(vhost->vhostfd, VHOST_SET_VRING_BASE, &base) ||
    diff --git a/src/core/memory.c b/src/core/memory.c
    index c27de86..417f770 100644
    --- a/src/core/memory.c
    +++ b/src/core/memory.c
    @@ -5,6 +5,7 @@
     #include <stdio.h>
     #include <sys/mman.h>
     #include <sys/types.h>
    +#include <inttypes.h>
     #include <unistd.h>

     /// ### HugeTLB page allocation
    @@ -55,7 +56,7 @@ uint64_t phys_page(uint64_t virt_page)
         return 0;
       }
       if ((data & (1ULL<<63)) == 0) {
    -    fprintf(stderr, "page %lx not present: %lx", virt_page, data);
    +    fprintf(stderr, "page %" PRIX64 " not present: %" PRIX64, virt_page, data);
         return 0;
       }
       return data & ((1ULL << 55) - 1);
    diff --git a/src/lib/hardware/vfio.c b/src/lib/hardware/vfio.c
    index 1d5345a..81050d2 100644
    --- a/src/lib/hardware/vfio.c
    +++ b/src/lib/hardware/vfio.c
    @@ -88,7 +88,7 @@ uint64_t mmap_memory(void *buffer, uint64_t size, uint64_t iova, uint8_t read, u

       struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };

    -  dma_map.vaddr = (uint64_t)buffer;
    +  dma_map.vaddr = (intptr_t)buffer;
       dma_map.size = size;
       dma_map.iova = iova;
       dma_map.flags = 0 |
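
A possible refinement of the patch, sketched in C. Because intptr_t is signed, a 32-bit address with the top bit set would be sign-extended when stored into a uint64_t field; casting through uintptr_t avoids that. The helper names are hypothetical, and PRIx64 (lowercase) is used here only to keep the hex digits in the same case as the original %lx output.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Widen a pointer to 64 bits without sign extension on 32-bit systems. */
uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Format uint64_t values portably: PRIx64 expands to the right printf
   length modifier on both 32-bit and 64-bit targets. */
int format_page_error(char *buf, size_t len, uint64_t virt_page, uint64_t data)
{
    return snprintf(buf, len, "page %" PRIx64 " not present: %" PRIx64,
                    virt_page, data);
}
```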

DMA stops working; needs a reboot

Problem: DMA stops working for the Intel NIC. Cause unknown. Reboot is only known cure.

The basic symptom is a lack of any values in the NIC counters:

selftest: intel device 0000:00:04.0
NIC transmit test
intel selftest: pciaddr=0000:00:04.0 secs=1
Waiting for linkup.............. ok
Generating traffic for 1 second(s)...
Statistics for PCI device 0000:00:04.0:
NIC transmit+receive loopback test
intel selftest: pciaddr=0000:00:04.0 secs=1 receive=true loopback=true
Waiting for linkup............... ok
Generating traffic for 1 second(s)...
Statistics for PCI device 0000:00:04.0:

For full back story see discussion on commit e5d463e.

Prototype "Snabb Switch: The Book"

Format the complete Snabb Switch source code as a book. A motivated person should be able to read this book from start to finish and then understand absolutely every detail of Snabb Switch.

The book should have a table of contents, a natural ordering of sections, and enough editing that it can be read sequentially without a lot of skipping around.

The book will essentially be code but it will be presented in a logical order and with sufficient commentary to be understandable.

Prototype O&M execution mechanism

Create a throw-away prototype of a mechanism for injecting Lua code into Snabb Switch to inspect its state and create a report. The tricky part is to do this in a general and flexible way without impacting the operation of the switch at the same time.

OpenStack Quantum backend

OpenStack Quantum is a framework for creating virtual networks in a cloud computing environment similar to Amazon EC2.

The model of virtual networks is pretty simple. You dynamically create isolated ethernet "Networks", virtual machine ethernet "Ports" attached to specific networks, and each port can be associated with a "Security Group" giving simple firewall rules for incoming traffic. There are a few extra wrinkles for attaching public IP addresses and existing physical networks to these virtual networks too, but we needn't deal with that in the beginning.

Snabb Switch's plugin would probably have these properties:

  • Networks, Ports, and Security Groups would be simple internal Lua objects.
  • Traffic between machines would be tunneled using STT. (No VLANs, etc, on the wire.)
  • The plugin would translate Quantum's internal Python API calls into Lua calls.
  • Let Quantum's Python code take care of allocating IP addresses and so on.

The simplest starting point would probably be to create a simple Lua API that supports the basic operations of the Quantum API: create network, create port, create security group, and so on. The full scale working implementation will require stable implementations of switching, STT, and KVM integration.

This could be a really interesting feature for OpenStack. The current open source Quantum plugins are not spectacularly featureful. If Snabb Switch would find a niche in which to be the best Quantum plugin then we will have really accomplished something significant for the project.

To experiment with OpenStack Quantum you can first create a virtual machine with Vagrant and then install devstack inside.

Packet Filter forwarding engine

The main traffic loop copies packets between each left/right pair of ethernet ports. User-defined LuaJIT functions are applied to filter the traffic. Filters that are unable to keep up with line rate are detected and removed.
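
A rough C sketch of the filter hook, to make the idea concrete. Everything here is an assumption for illustration: the real filters would be LuaJIT functions, the time budget is a made-up way to detect a filter that cannot keep up, and "removed" is taken to mean that traffic then passes unfiltered.

```c
#include <stddef.h>
#include <stdint.h>

/* A filter returns nonzero to forward the packet, zero to drop it. */
typedef int (*filter_fn)(const uint8_t *data, size_t len);

struct filter_slot {
    filter_fn fn;
    uint64_t budget_ns;  /* hypothetical time budget per batch */
    int enabled;
};

/* Forwarding decision for one packet on a left/right port pair.
   A removed (disabled) filter lets everything through. */
int should_forward(const struct filter_slot *slot,
                   const uint8_t *data, size_t len)
{
    if (!slot->enabled || slot->fn == NULL)
        return 1;
    return slot->fn(data, len) != 0;
}

/* Called after each batch with the measured processing time.
   A filter that exceeds its budget is removed. */
void account_batch(struct filter_slot *slot, uint64_t elapsed_ns)
{
    if (elapsed_ns > slot->budget_ns)
        slot->enabled = 0;
}

/* Example filter: drop frames shorter than an Ethernet header (14 bytes). */
int drop_runts(const uint8_t *data, size_t len)
{
    (void)data;
    return len >= 14;
}
```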

The code could be based on hub2.lua.

Layer-1 bypass hardware integration

If the packet filter software or hardware fails then it is desirable that all packets are forwarded during the outage (as opposed to dropping all packets). This can be achieved by using network hardware with a "Layer-1 bypass" function. Such hardware has a mechanical capability to physically connect the Left/Right ports together and this is triggered by either (a) power failure or (b) timeout waiting for software to respond to a healthcheck.

One suitable type of hardware would be an Intel-based Silicom Bypass adapter.

DMA for hardware

The switch needs to have memory suitable for hardware DMA. That is: we need to be able to use stable physical memory addresses when communicating with ethernet controllers.

User manual

Create a user manual describing how to use Snabb Switch.

Intel 82571GB Gigabit Controller Support

Support the Intel 82571GB Gigabit Controller.

My new NIC arrives today I believe. It may take me a little time to figure this out as I'm really starting behind the eight-ball, but I'm excited to hack around!

  • No Lua experience
  • No hardware experience
  • No device driver experience
  • Intimidation by 490 page Intel document
  • Still going through the snabbswitch code (completed a full review of memory, moving on to pci, and then intel).

Here is my development environment (KVM box and home networking setup):

https://www.evernote.com/shard/s167/sh/bf598a18-dd86-46cb-9469-24b893763749/234c1f88e7527253c47f164e00cae259

I guess I'll have to figure out how to expose the new NIC directly to the guest as well.

Heap corruption with virtio/vhost

Receiving a lot of packets over virtio/vhost seems to eventually cause heap corruption and a SIGSEGV. Looks to me like parts of the vring structures are being overwritten somehow (by the kernel? thinking they are bigger than we do?) and bad things happen when random numbers get written into descriptor tables and used as DMA addresses.

Could be that transmit is also faulty but we don't notice because it simply transmits the wrong memory instead of mangling it and causing a crash.

To reproduce, edit selftest() in virtio.lua to use the "echo" program instead of "spam" and to run for (say) 600 seconds.

I cannot reproduce the problem at the time of writing this issue, but it exists.

DMA for hardware non-intrusively

Create a better solution than #6 for DMA. Specifically: one that doesn't require booting the Linux kernel with special parameters.

There is a sketch of one possible solution in commit 46bb8e3 on the virt2phys branch. This solution is to use normal application RAM and create a mechanism to map this onto physical RAM in a stable way. Maybe the solution, maybe not, and not yet tested even once let alone debugged.

Design efficient KVM I/O

Design a method for integrating Snabb Switch with KVM for efficient ethernet I/O.

This is a follow-up to the simple prototype described in Issue #10.

FDB aging

Time out old entries in the forwarding database.
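
A minimal C sketch of what aging could look like, assuming a simple linear table with a last-seen timestamp per entry. The layout, table size, and function names are hypothetical and say nothing about how the real forwarding database is stored.

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 256

struct fdb_entry {
    uint8_t  mac[6];
    uint16_t port;
    uint64_t last_seen;  /* time of the last packet from this MAC, in seconds */
    int      in_use;
};

struct fdb {
    struct fdb_entry entries[FDB_SIZE];
};

/* Learn a MAC on a port, or refresh its timestamp if already known.
   Returns 1 on success, 0 if the table is full. */
int fdb_learn(struct fdb *db, const uint8_t mac[6], uint16_t port, uint64_t now)
{
    struct fdb_entry *free_slot = NULL;

    for (int i = 0; i < FDB_SIZE; i++) {
        struct fdb_entry *e = &db->entries[i];
        if (e->in_use && memcmp(e->mac, mac, 6) == 0) {
            e->port = port;
            e->last_seen = now;
            return 1;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return 0;
    memcpy(free_slot->mac, mac, 6);
    free_slot->port = port;
    free_slot->last_seen = now;
    free_slot->in_use = 1;
    return 1;
}

/* Remove entries not seen for more than max_age seconds.
   Returns the number of entries removed. */
int fdb_age(struct fdb *db, uint64_t now, uint64_t max_age)
{
    int removed = 0;

    for (int i = 0; i < FDB_SIZE; i++) {
        struct fdb_entry *e = &db->entries[i];
        if (e->in_use && now - e->last_seen > max_age) {
            e->in_use = 0;
            removed++;
        }
    }
    return removed;
}

/* Number of live entries, e.g. for a diagnostic dump. */
int fdb_count(const struct fdb *db)
{
    int n = 0;

    for (int i = 0; i < FDB_SIZE; i++)
        n += db->entries[i].in_use ? 1 : 0;
    return n;
}
```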

Link state (up/down) propagation

The link state ("link up" and "link down") should always be the same for Left and Right. The packet filter is responsible for detecting "link down" and "link up" events and propagating them to the matching link.

This requires both driver support to get/set physical link status and application support to keep synchronization.

The function can be tested by physically connecting and disconnecting links and ensuring that link status is always the same on each adjacent node.

Describe FDB

Describe the forwarding database for diagnostic purposes. How many entries, etc.

Compile time dependency on <linux/vfio.h>

Snabb Switch depends on the header file linux/vfio.h but this is not present on common platforms such as Ubuntu 12.10.

Snabb Switch should compile and run even on old versions of Linux. The advanced features like VFIO should be enabled at runtime if present and otherwise disabled.

So to fix this problem we need to:

  • Break the compile-time dependency on linux/vfio.h. For example, by creating our own vfio.h that is a compatible subset.
  • Ensure that Snabb Switch still runs on systems that don't have VFIO kernel support.
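
For the second point, a runtime check could be as simple as the C sketch below. It assumes that the presence of the /dev/vfio/vfio device node is a good enough signal that the kernel has VFIO support; the function names are hypothetical.

```c
#include <unistd.h>

/* Returns 1 if the path exists, 0 otherwise. */
int path_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* /dev/vfio/vfio is the VFIO container device node. If it is missing,
   the VFIO code paths should be disabled instead of failing at startup. */
int vfio_available(void)
{
    return path_exists("/dev/vfio/vfio");
}
```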

First reported by Jianyong Chen on snabb-devel: https://groups.google.com/forum/#!topic/snabb-devel/Pt_TAREbQvI

Wiki Update on Hacking Page

Hi Luke,

I compiled and ran snabbswitch on my debian box, which just passes the memory self test for now, but I'm getting ready for the arrival of my Intel card next week. I wanted to share two points for your Hacking page that might help another ultra-noob:

  1. When running snabbswitch, ensure you have root privileges (sudo).
  2. When passing the kernel boot parameter, one might have to quote the value, as I found the $0 was expanded to a grub pathname and my kernel failed to boot.

Pete

Create high-level ethernet port selftest

Create a high-level ethernet port selftest function in port.lua that tests basic Ethernet functionality. For example, it generates a substantial set of unique packets, writes them all to the NIC in loopback mode, and verifies that exactly the same packets are read back.

Ethernet selftest is very weak today and only looks at some counters rather than actually verifying functionality. This means the driver code can be totally and utterly broken without ringing alarm bells.
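
A sketch in C of the generate-and-verify part, leaving out the NIC itself. The packet pattern and function names are made up for illustration: each packet carries its sequence number in the first four bytes, followed by a payload derived from the sequence number, so a lost, duplicated, or corrupted packet fails the comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill buf with a pattern unique to seq (for len >= 4): a 4-byte
   big-endian sequence number, then bytes derived from seq and position. */
void make_test_packet(uint8_t *buf, size_t len, uint32_t seq)
{
    for (size_t i = 0; i < len; i++) {
        if (i < 4)
            buf[i] = (uint8_t)(seq >> (8 * (3 - i)));
        else
            buf[i] = (uint8_t)(seq + i);
    }
}

/* Returns 1 if received is byte-for-byte the packet generated for seq. */
int verify_test_packet(const uint8_t *received, size_t len, uint32_t seq)
{
    uint8_t expected[2048];

    if (len > sizeof expected)
        return 0;
    make_test_packet(expected, len, seq);
    return memcmp(received, expected, len) == 0;
}
```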

FUSE to emulate Linux kernel networking features

This is a feature and design idea: to write a compatible clone of certain Linux kernel interfaces based on a FUSE filesystem implemented by Snabb Switch.

Linux exports a lot of networking functionality via the file system. For example, to create a "tap" device you open /dev/net/tun with open(), configure it with ioctl(), and access it with read() and write(). To accelerate this traffic you would open /dev/net/vhost-net and use ioctl() to setup a shared-memory interface for I/O on your existing tap device. And under /sys you have a rich set of files telling you all about the hardware and operating system and letting you manipulate things e.g. bind/unbind device drivers.
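
For readers who have not used the interface, this is roughly what the tap sequence looks like from C, using the standard TUNSETIFF ioctl. The function name is hypothetical, and the call normally needs root (or CAP_NET_ADMIN) to succeed. A FUSE-based clone would have to accept this same sequence.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Open a tap device with the given interface name.
   Returns a file descriptor for read()/write() of raw Ethernet frames,
   or -1 on failure. */
int tap_open(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;  /* Ethernet frames, no extra header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```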

There are interesting applications that use these interfaces. For example, KVM uses /dev/net/tun and /dev/net/vhost-net for Ethernet I/O.

Could we implement a fully compatible clone of /dev/net/tun and /dev/net/vhost-net based on a userspace filesystem? That way KVM could do zero-copy I/O towards Snabb Switch with little or no modification. And normal applications could open interfaces towards Snabb Switch too.

Question is whether this would be more or less work than directly modifying the most important applications (KVM, libpcap) to talk directly with Snabb Switch via some other mechanism of our invention. Could be that this approach can't work at all....

Dimensioning guide

Describe the performance of the switch with a small set of workloads and hardware platforms. Help people to understand how it will perform for their own uses and make their planning easier.

Implement "Snabb Switch: The Book"

Cleanly implement the machinery needed for formatting the source code into book format as described in issue #28.

The result will read very badly. That's fine. Once the machinery is working it will be much easier to edit.

Linux tuntap host I/O support

Talk to the Linux host using a tuntap interface. This makes it possible for the host machine to communicate with the switch.

Simple prototype

Create a throw-away prototype mechanism for direct communication between the switch and KVM guests. To see how KVM works.

intel.lua: function protected(): line 42: incorrect calculation of bound variable for multi-byte types

A small bug: bound is calculated like so:

local bound = (size + 0ULL) / ffi.sizeof(type)

so when it is used for a multi-byte type (e.g. uint16_t) it causes a run-time assertion error even for correct indices.

For example:

local total_len = protected("uint16_t", context, 14+2, 1) 
total_len[0] --RUNTIME ERROR:  assertion failure

Calculating bound as follows seems to fix the problem:

local bound = ((size * ffi.sizeof(type)) + 0ULL) / ffi.sizeof(type)
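
The arithmetic can be checked in isolation. The C sketch below mirrors the two formulas with unsigned 64-bit integer division, as with 0ULL in LuaJIT; it assumes that size is a count of elements and that the assertion in protected() is of the form index < bound, neither of which is shown in this issue.

```c
#include <stdint.h>

/* Bound as currently computed: size divided by the element size. */
uint64_t bound_current(uint64_t size, uint64_t elem_size)
{
    return size / elem_size;
}

/* Bound as proposed: scale to bytes first, then divide back. */
uint64_t bound_proposed(uint64_t size, uint64_t elem_size)
{
    return (size * elem_size) / elem_size;
}

/* Assumed form of the bounds assertion. */
int index_ok(uint64_t index, uint64_t bound)
{
    return index < bound;
}
```

For one uint16_t element, bound_current(1, 2) is 0, so even index 0 is rejected, while bound_proposed(1, 2) is 1. Single-byte types are unaffected, which would explain why the bug only shows up for multi-byte types.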
