ShRing: Networking with Shared Receive Rings

Boris Pismenny, Adam Morrison, Dan Tsafrir

To appear in OSDI '23: USENIX Symposium on Operating Systems Design and Implementation, 2023

[paper] [definitive] [slides]

Multicore systems parallelize the processing of incoming Ethernet traffic, allocating one receive (Rx) ring with ≥1Ki entries per core by default. This ring size is sufficient to absorb packet bursts of single-core workloads. But the combined size of all Rx buffers (pointed to by all Rx rings) can exceed the size of the last-level cache (LLC). We observe that, in this case, NIC and CPU memory accesses are increasingly served by main memory instead of the LLC, which may incur nonnegligible overheads when scaling to hundreds of incoming gigabits per second.

To alleviate this problem, we propose “shRing,” which shares each Rx ring among several cores when networking memory bandwidth consumption is high. ShRing thus adds software synchronization costs, but this overhead is offset by the smaller memory footprint. We show that, consequently, shRing increases the throughput of NFV workloads by up to 1.27x, and that it reduces their latency by up to 38x. The substantial latency reduction occurs when shRing shortens the per-packet processing time to a value smaller than the packet interarrival time, thereby preventing overload conditions.