NVMe-TCP is a high-performance, pipelined storage protocol over TCP that abstracts remote access to a storage controller, giving hosts the illusion of local storage. In NVMe-TCP, each storage queue is mapped to a TCP socket, and read and write IO operations are translated into RPC operations. Each operation has a unique identifier, called the command identifier (CID), and servers can complete CIDs out of order so that small IOs can bypass large IOs, which improves performance. Additionally, each RPC is protected by an application-layer CRC that is generated by the sender and verified by the receiver.
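The two mechanisms described above, out-of-order completion keyed by CID and a CRC checked by the receiver, can be illustrated with a small sketch. The frame layout here (`make_response`, `parse_response`, a 2-byte CID followed by a 4-byte length) is a hypothetical simplification for illustration and is not the actual NVMe-TCP PDU format; the digest is CRC32C, which is the algorithm NVMe-TCP uses for its header and data digests.

```python
import struct
from typing import List, Tuple

def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def make_response(cid: int, payload: bytes) -> bytes:
    # Hypothetical framing (NOT the real PDU layout):
    # 2-byte CID, 4-byte payload length, payload, 4-byte CRC32C of the payload.
    header = struct.pack("<HI", cid, len(payload))
    return header + payload + struct.pack("<I", crc32c(payload))

def parse_response(frame: bytes) -> Tuple[int, bytes]:
    # Receiver side: recompute the digest and reject corrupted payloads.
    cid, length = struct.unpack_from("<HI", frame, 0)
    payload = frame[6:6 + length]
    (digest,) = struct.unpack_from("<I", frame, 6 + length)
    if crc32c(payload) != digest:
        raise ValueError(f"data digest mismatch for CID {cid}")
    return cid, payload

def demo() -> List[int]:
    # The host issues a large read (CID 1) and then a small read (CID 2).
    pending = {1: "large read", 2: "small read"}
    # The server is free to answer the small IO first.
    wire = [make_response(2, b"x" * 16), make_response(1, b"y" * 4096)]
    completed = []
    for frame in wire:
        cid, _payload = parse_response(frame)
        pending.pop(cid)          # match the response to its request by CID
        completed.append(cid)
    return completed
```

Because responses are matched to requests by CID rather than by arrival order, the host in `demo()` completes the small read before the large one. The per-byte CRC loop also hints at why digest computation is CPU-intensive when done in software over every payload.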
Traditional approaches to offloading NVMe-TCP require offloading all layer-4 functionality and everything beneath it: TCP, IP, routing, QoS, NAT, firewall, tunneling, etc. This is undesirable because each of these is complex, and their overhead on bulk storage operations is easily mitigated by batching, as demonstrated last year in “NVMe-over-TCP ≈ NVMe-over-RDMA”.
In this talk, we will present how we offload the CPU-intensive operations that cannot be optimized away by batching or clever software engineering: copy and CRC. Our offload is independent of layer-4 functionality, i.e., we offload NVMe-TCP autonomously. The main challenges our approach addresses are (1) handling retransmission and reordering, and (2) offloading transparently to the software TCP/IP stack.