Archive for August, 2010
By Eric Carmes – 6WIND Founder and CEO
Recently, we have seen many companies promoting raw benchmark numbers as a way to claim superiority in multicore packet processing performance. Although these arbitrary benchmarks can provide interesting data points, they deserve in-depth analysis to clarify exactly what is being measured.
Modern IP networking stacks are very complex. Over time, many new protocols have been added to the initial IP/UDP/TCP stack: Layer 2, tunneling, IPsec, NAT, QoS, firewall, IPv6, multicast, etc. Each protocol requires a significant amount of processing for every packet. It is therefore tempting to boost packet processing performance measurements by implementing shortcuts that deliver the highest possible numbers. However, the system-level implications of these shortcuts have to be analyzed.
Let me illustrate this with a simple IP forwarding example. Starting from a Fast Path implementation (refer to this post), one possible solution is to implement a flow-based cache. Once the first packet of a flow has been analyzed and the flow identified as cacheable, the cache accelerates IP forwarding for the subsequent packets of that flow. Other protocols are still processed by the Fast Path.
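To make the mechanism concrete, here is a minimal Python sketch of a flow-based cache placed in front of a Fast Path. It assumes a 5-tuple flow key and a `fast_path` callback that returns a forwarding decision plus a "cacheable" flag. `FlowKey`, `CacheEntry`, `FlowCache` and `toy_fast_path` are hypothetical names chosen for illustration, not 6WINDGate APIs. Only the first packet of a cacheable flow takes the full path. Later packets are served from the cache, while non-cacheable traffic (IPsec ESP in this toy policy) always goes through the Fast Path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, NamedTuple, Optional, Tuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple flow key."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


@dataclass
class CacheEntry:
    """Everything needed to forward a packet without a route lookup."""
    eth_header: bytes  # precomputed outgoing Ethernet header
    out_port: int
    out_queue: int


FastPath = Callable[[FlowKey], Tuple[Optional[CacheEntry], bool]]


class FlowCache:
    def __init__(self, fast_path: FastPath) -> None:
        # fast_path returns (forwarding decision or None to drop, cacheable flag)
        self._fast_path = fast_path
        self._entries: Dict[FlowKey, CacheEntry] = {}
        self.hits = 0
        self.misses = 0

    def forward(self, key: FlowKey) -> Optional[CacheEntry]:
        entry = self._entries.get(key)
        if entry is not None:
            self.hits += 1  # shortcut: no route lookup, no protocol processing
            return entry
        self.misses += 1
        entry, cacheable = self._fast_path(key)  # full Fast Path processing
        if entry is not None and cacheable:
            self._entries[key] = entry  # later packets of this flow hit the cache
        return entry


def toy_fast_path(key: FlowKey) -> Tuple[Optional[CacheEntry], bool]:
    # Hypothetical policy: plain TCP (6) and UDP (17) flows are cacheable,
    # anything else (e.g. IPsec ESP, protocol 50) must stay in the Fast Path.
    entry = CacheEntry(eth_header=bytes(14), out_port=1, out_queue=0)
    return entry, key.proto in (6, 17)


cache = FlowCache(toy_fast_path)
udp = FlowKey("10.0.0.1", "10.0.1.1", 17, 1234, 53)
esp = FlowKey("10.0.0.1", "10.0.1.1", 50, 0, 0)
for _ in range(3):
    cache.forward(udp)
    cache.forward(esp)
print(cache.hits, cache.misses)  # prints: 2 4
```

The UDP flow misses once and then hits, while every ESP packet misses: the speed-up only exists for traffic that the cache is allowed to short-circuit.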
In the most favorable scenario, with stable traffic made up of cacheable flows (a good example of why you should carefully examine the test case each time you analyze performance figures), the measurements we have performed at 6WIND show that a cache implementation more than doubles IP forwarding performance on 64-byte packets.
Why does IP forwarding run faster in this case? There are several reasons:
- The shortcut selects valid IPv4 packets using hardware flags (IPv4 packets, TTL, checksum OK…),
- The cache entry directly provides all the information needed to process the packet (MAC header, outgoing port/queue), so no route lookup is required,
- There is no cross-layer processing, so only a minimal change is made to each packet (Ethernet header + TTL update).
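The per-packet work on a cache hit can be sketched as follows. This Python sketch assumes an untagged Ethernet frame carrying IPv4 at offset 14 and a 14-byte precomputed Ethernet header stored in the cache entry; `apply_cache_entry` and `decrement_ttl` are hypothetical helpers. The IPv4 header checksum is patched incrementally with the RFC 1624 formula HC' = ~(~HC + ~m + m') rather than recomputed over the whole header. This is one standard way to do it; the post does not say how 6WIND's implementation handles the checksum.

```python
import struct

ETH_LEN = 14  # untagged Ethernet header (assumption: no VLAN tag)


def ones_complement_add(a: int, b: int) -> int:
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ipv4_checksum(header: bytes) -> int:
    """Full IPv4 header checksum (checksum field must be zero when computing)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total = ones_complement_add(total, word)
    return ~total & 0xFFFF


def decrement_ttl(buf: bytearray, ip_off: int) -> None:
    """Decrement TTL and patch the checksum incrementally (RFC 1624, eqn. 3)."""
    ttl, proto = buf[ip_off + 8], buf[ip_off + 9]
    old_word = (ttl << 8) | proto  # TTL and protocol share one 16-bit word
    new_word = ((ttl - 1) << 8) | proto
    (old_csum,) = struct.unpack_from("!H", buf, ip_off + 10)
    # HC' = ~(~HC + ~m + m')
    s = ones_complement_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    struct.pack_into("!H", buf, ip_off + 10, ~s & 0xFFFF)
    buf[ip_off + 8] = ttl - 1


def apply_cache_entry(frame: bytearray, eth_header: bytes) -> bool:
    """Cache-hit rewrite. Returns False when the packet must go to the Fast Path."""
    if len(eth_header) != ETH_LEN:
        raise ValueError("cached Ethernet header must be 14 bytes")
    if frame[ETH_LEN + 8] <= 1:
        return False  # TTL would expire: let the Fast Path handle it (e.g. ICMP)
    frame[:ETH_LEN] = eth_header  # new MAC addresses + EtherType
    decrement_ttl(frame, ETH_LEN)
    return True
```

Only two 16-bit words of the IP header and the Ethernet header are touched, which illustrates why a cache hit is so much cheaper than full protocol processing.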
That’s great. However, we should also review the limitations:
- First of all, only basic IP forwarding is performed; fragmentation, for example, is not handled.
- The use case is VERY limited, as the cache does not work if another protocol is involved as well: encapsulated, IPsec, NAT, firewall… traffic has to be forwarded to the Fast Path, and, unfortunately, almost all applications involve multiple protocols.
- This approach does not help under unstable traffic conditions. The cache is invalidated as soon as the routing table changes, and the performance advantage is lost.
- The approach is not suited to short-lived traffic flows, because cache entries have to be replaced frequently.
- The solution is not scalable. A single forwarding rule covering a set of IP addresses has to be expanded into one rule per flow. Different flows can have the same hash code, which creates a risk of collisions, and side effects such as packet de-sequencing (reordering) are possible.
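Two of these limitations can be illustrated with a small, hypothetical direct-mapped cache in Python. Invalidation on a routing change is modeled here with a generation counter, one common technique assumed for illustration: bumping it makes every existing entry stale at once, so all flows miss until they are re-installed. Because each hash bucket holds a single flow, the full key must be compared on lookup, and two flows that collide keep evicting each other. `DirectMappedFlowCache` and its methods are illustrative names only.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical 5-tuple: (src_ip, dst_ip, protocol, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]


class Slot(NamedTuple):
    key: FlowKey
    out_port: int
    generation: int  # routing-table generation at install time


class DirectMappedFlowCache:
    """Hypothetical fixed-size flow cache with one entry per hash bucket."""

    def __init__(self, n_slots: int) -> None:
        self.slots: List[Optional[Slot]] = [None] * n_slots
        self.generation = 0

    def _index(self, key: FlowKey) -> int:
        return hash(key) % len(self.slots)

    def on_route_change(self) -> None:
        # Invalidate every entry at once: stored generations no longer match.
        self.generation += 1

    def lookup(self, key: FlowKey) -> Optional[int]:
        slot = self.slots[self._index(key)]
        # The full key must be compared: different flows can hash to the same bucket.
        if slot is not None and slot.key == key and slot.generation == self.generation:
            return slot.out_port
        return None  # miss: the packet is sent to the Fast Path

    def install(self, key: FlowKey, out_port: int) -> None:
        # A colliding flow overwrites the current occupant of the bucket.
        self.slots[self._index(key)] = Slot(key, out_port, self.generation)


# With a single bucket every pair of flows collides, which makes eviction easy to see.
cache = DirectMappedFlowCache(n_slots=1)
a = ("10.0.0.1", "10.0.1.1", 17, 1000, 53)
b = ("10.0.0.2", "10.0.1.1", 17, 2000, 53)
cache.install(a, out_port=1)
cache.install(b, out_port=2)  # evicts flow a
print(cache.lookup(a), cache.lookup(b))  # prints: None 2
cache.on_route_change()
print(cache.lookup(b))  # prints: None
```

Every miss sends the packet back to the Fast Path, so a routing change or a pair of colliding flows is enough to erase the benchmark advantage for the affected traffic.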
So relying on raw performance benchmarks when you design your application is very risky. I recommend the following guidelines:
- Avoid using benchmark performance results for system design.
- Avoid implementing cache mechanisms except for very well-defined, stable and limited use cases.
- Use a Fast Path that already implements multiple protocols, providing a scalable and maintainable solution. In practice, if you are presented with performance benchmarks for a simple function such as IPv4 forwarding, your first question should be about system performance when IP forwarding is combined with other protocols (VLAN, IPsec, firewall, NAT, IPv6…).
