By Eric Carmes – 6WIND Founder and CEO
When I give presentations about multicore packet processing software, I am often asked the following two questions: What are the possible Fast Path implementations for multicore architectures? What are the main criteria for selecting the best option?
I’m going to answer the first question in today’s post. The main selection criteria will be discussed in a later post.
First of all, we have discussed several times here in the Forum why a Fast Path is required to extract the highest level of performance from a multicore platform. In typical networked applications, more than 90% of the workload is sophisticated packet processing. Operating system overhead limits the performance of standard networking stacks for these functions. A dedicated “Fast Path” within the data plane offloads packet header inspection to accelerate networking, telecom and security applications.
Now, the main question is how to implement this Fast Path.
The Fast Path can run in two different environments: the OS environment or, if available for the multicore processor you selected, the executive environment (often named “bare-metal”).
Another important question concerns the I/O drivers (Ethernet, crypto engines…). Standard OS drivers can be used, but the Fast Path can also use specific executive drivers that are more efficient. Note also that the Fast Path could use virtualized I/O, but this implementation will not be considered in the rest of this post, as its only major difference from OS I/O is likely to be lower performance.
The following figure describes the different possible options, taking Linux as an example for the OS.
Besides the standard Linux solution without a Fast Path (Solution 0), we have three different Fast Path implementations:
- Linux kernel (Solution 1): This solution does not require any executive environment, as it runs in the Linux kernel environment and uses Linux I/O drivers. The cores are shared with the application, and Linux mechanisms such as affinities are used to dedicate processes to cores. This solution is straightforward in terms of integration and development; a single environment is required and standard OS development tools can be reused. The performance of the kernel implementation is around 40% of that of the executive Fast Path.
- Hybrid (Solution 2): The Fast Path runs as a Linux application and uses executive I/O drivers to provide performance close to that of an executive implementation (70 to 90%). By using standard Linux options such as affinities and scheduler tuning, the Fast Path process can be allocated to specific cores to provide more processing capacity. However, the Fast Path performance cannot be strictly guaranteed. As described in this post, a variant of this solution uses a constrained Linux for the Fast Path; a constrained Linux environment is one without thread and timer scheduling and interrupt handling. This variant could provide slightly better performance, but the standard Linux debugging and development tools can no longer be used.
- Executive (Solution 3): This solution has been described in different posts in the Forum (including this one). The Fast Path runs as an application in the executive environment. It uses dedicated cores and executive I/O drivers. This solution provides the highest level of performance (typically 10x compared to a standard Linux without any Fast Path). As the Fast Path runs on dedicated cores, the performance can be fully guaranteed.
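The affinity mechanism mentioned for Solutions 1 and 2 can be illustrated with a minimal Python sketch. This is not 6WINDGate code: the helper names `pick_fast_path_core` and `pin_to_core` are hypothetical, and the sketch relies only on the Linux-only `os.sched_getaffinity` and `os.sched_setaffinity` calls from the Python standard library. It assumes that a Fast Path process would be pinned in a similar way.

```python
import os


def pick_fast_path_core():
    """Return the highest-numbered core this process may run on.

    Returns None on platforms without the Linux affinity calls.
    (Hypothetical helper, for illustration only.)
    """
    if not hasattr(os, "sched_getaffinity"):
        return None
    return max(os.sched_getaffinity(0))  # pid 0 means "the calling process"


def pin_to_core(core_id):
    """Restrict the calling process to a single core.

    Returns the resulting set of allowed cores, or None when the
    platform does not expose the affinity calls.
    (Hypothetical helper, for illustration only.)
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    core = pick_fast_path_core()
    if core is None:
        print("CPU affinity calls are not available on this platform")
    else:
        print("Process now restricted to cores:", pin_to_core(core))
```

Affinity only restricts where this process may run. On its own it does not prevent the scheduler from placing other tasks on the same core, which is consistent with the point above that performance under Linux cannot be strictly guaranteed.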
More information about the 6WINDGate architecture can be found here.
You can download more detailed documents here.
You can check the 6WINDGate FAQ here.
