NVMe-TCP offload – implementation and performance gains

NVMe-TCP is a high-performance, pipelined storage protocol over TCP that abstracts remote access to a storage controller, giving hosts the illusion of local storage. In NVMe-TCP, each storage queue is mapped to a TCP socket. Read and write I/O operations carry a unique identifier, the command identifier (CID), and servers can handle CIDs out of order, allowing small I/Os to bypass large ones and improving performance. Additionally, each PDU is protected by an application-layer CRC, generated by the sender and verified by the receiver.
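
To make the framing concrete, the sketch below shows a simplified controller-to-host data PDU layout in C. The field names are illustrative, following the spirit of the NVMe-TCP specification rather than the kernel's exact definitions in include/linux/nvme-tcp.h (on-wire packing and endianness are omitted for brevity).

    #include <stdint.h>

    /* Simplified sketch of NVMe-TCP PDU framing; field names are illustrative. */
    struct nvme_tcp_common_hdr {
            uint8_t  type;     /* PDU type, e.g. controller-to-host data (C2HData) */
            uint8_t  flags;    /* e.g. whether a trailing data digest (CRC) is present */
            uint8_t  hlen;     /* PDU header length */
            uint8_t  pdo;      /* PDU data offset */
            uint32_t plen;     /* total PDU length, including digests */
    };

    struct nvme_tcp_c2h_data_hdr {
            struct nvme_tcp_common_hdr hdr;
            uint16_t command_id;   /* CID: matches the data to an outstanding read */
            uint16_t reserved;
            uint32_t data_offset;  /* offset of this chunk within the command's buffer */
            uint32_t data_length;  /* bytes of data carried by this PDU */
            uint8_t  rsvd[4];
    };
    /* The data payload follows the header and, when enabled, is trailed by a
     * CRC32C data digest that the receiver must verify. */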

As presented last year in the session “Autonomous NVMe TCP offload”, the Linux kernel upper layer protocol (ULP) direct data placement (DDP) offload infrastructure was introduced to offload NVMe-TCP without the disadvantages of a full-offload solution. It gives request-response ULP protocols, such as NVMe-TCP, the ability to place response data directly into pre-registered buffers according to header tags. DDP is particularly useful for data-intensive pipelined protocols whose responses may be reordered.
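
A minimal sketch of what such an infrastructure can look like from the device driver's side is shown below. The operation names and signatures here are assumptions for the purpose of explanation, not the exact upstream interface: the ULP registers a per-command destination buffer keyed by a header tag (the NVMe-TCP CID), so the NIC can place matching response data directly.

    #include <linux/netdevice.h>
    #include <linux/scatterlist.h>
    #include <net/sock.h>

    /* Illustrative ULP DDP offload hooks; names and signatures are assumptions. */
    struct ddp_io {
            u32                 tag;    /* header tag, e.g. the NVMe-TCP CID */
            struct scatterlist *sgl;    /* pre-registered destination buffers */
            unsigned int        nents;
    };

    struct ddp_dev_ops {
            /* Enable/disable the offload on a given TCP socket (queue). */
            int  (*sk_add)(struct net_device *dev, struct sock *sk);
            void (*sk_del)(struct net_device *dev, struct sock *sk);

            /* Map a command's destination buffers before the request is sent,
             * so response PDUs carrying this tag are placed directly by the NIC. */
            int  (*setup)(struct net_device *dev, struct sock *sk,
                          struct ddp_io *io);

            /* Release the mapping once the command completes. */
            void (*teardown)(struct net_device *dev, struct sock *sk,
                             struct ddp_io *io);
    };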

In this design, the copy from TCP socket buffers into the destination buffers is avoided by providing the NIC with the destination buffers directly. While handling the receive path, the NIC can also compute and validate CRCs, and on the transmit path it calculates them.
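
As a sketch of how the ULP could drive this on the receive path, reusing the illustrative ddp_dev_ops above (send_read_command() is a hypothetical helper standing in for queuing the command capsule):

    /* Hypothetical helper: queue the read command capsule for this CID. */
    void send_read_command(u32 cid);

    /* Issue a read with DDP offload: hand the NIC the destination buffers
     * before the command capsule goes out, so C2HData PDUs carrying this CID
     * are placed directly and their data digest is verified in hardware. */
    static int queue_read_with_ddp(struct net_device *dev, struct sock *sk,
                                   const struct ddp_dev_ops *ops,
                                   struct ddp_io *io)
    {
            int ret;

            ret = ops->setup(dev, sk, io);
            if (ret)
                    return ret; /* caller falls back to the copy + software-CRC path */

            send_read_command(io->tag);
            return 0;
    }

The intent of such a flow is that, for offloaded commands, both the data copy and the CRC work move to the NIC, while traffic that cannot be offloaded still goes through the regular software path.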

In this talk, we will present the NVMe-TCP direct data placement and CRC (data digest) offload design and the driver-hardware interaction needed to support it. We will present the performance benefits of the offload across a variety of comparisons and conditions. We will also cover the challenges we encountered and how we resolved them.