What is Vector Packet Processing (VPP)?
The VPP platform is an extensible framework that provides out-of-the-box production-quality switch/router functionality. It is the open source version of Cisco's Vector Packet Processing (VPP) technology: a high-performance packet-processing stack that can run on commodity CPUs. The framework allows anyone to "plug in" new graph nodes without needing to change core/kernel code.
The VPP platform is built on a ‘packet processing graph’. This modular approach means that anyone can ‘plug in’ new graph nodes. Extensibility then becomes rather simple, and it means that plugins can be customized for specific purposes.
Why is it called vector processing? VPP uses vector processing as opposed to scalar processing. Scalar packet processing refers to processing one packet at a time. That older, traditional approach entails processing an interrupt and traversing the call stack (a calls b calls c... return, return, return from the nested calls... then return from the interrupt). The process then does one of three things with the packet: punt, drop, or rewrite/forward it.
The problem with that traditional scalar packet processing is:
- thrashing occurs in the I-cache (instruction cache)
- each packet incurs an identical set of I-cache misses
- no workaround to the above except to provide larger caches
By contrast, vector processing processes more than one packet at a time.
One of the benefits of the vector approach is that it fixes the I-cache thrashing problem. It also mitigates the dependent read latency problem (pre-fetching eliminates the latency).
This approach fixes the issues related to stack depth / D-cache (data cache) misses on stack addresses. It improves "circuit time". The "circuit" is the cycle of grabbing all available packets from the device RX ring, forming a "frame" (vector) that consists of packet indices in RX order, running the packets through a directed graph of nodes, and returning to the RX ring. As processing of packets continues, the circuit time reaches a stable equilibrium based on the offered load.
As the vector size increases, processing cost per packet decreases because you are amortizing the I-cache misses over a larger N.