Archive for November, 2009

By Vincent Jardin – 6WIND CTO

I read here and there that it is not possible to have an efficient portable solution for multicore packet processing. In other words, only the chipset provider is supposedly able to deliver packet processing software, for the following reasons:

  1. My CPU is too complex, so complex that only I, the CPU vendor, can master it!
  2. No other company has the skills to develop properly on my CPU in my environment,
  3. Networking protocols are themselves so complex and so deeply integrated with the processor that I am the only one able to develop them,
  4. As the CPU provider, I deliver excellent performance benchmarks on basic networking protocols, so I am the most qualified to develop complex protocols.

The history of software is full of examples that simply demonstrate these assumptions are not true:

  • How is it that we have efficient compilers, given that they were not developed from scratch by the CPU vendor?
  • How is it that we have efficient operating systems, given that they were not developed by the CPU’s architects?
  • How is it that we have efficient 3D games, given that they were not developed by the GPU’s architects?

In fact, this is just the standard cycle for new technologies. A new technology needs to be integrated into an ecosystem. At the very beginning, this effort is led by the provider of the technology. Then, when the technology becomes popular, the ecosystem takes the lead. All the software mentioned above has been developed thanks to a strong ecosystem sponsored by the CPU vendors, simply because a CPU vendor cannot bear all the costs of software development alone.

So, why would it be different for multicore and especially for packet processing engines?

At 6WIND, we of course think that developing an efficient portable solution for multicore packet processing is possible… How can it be done, given that the Fast Path is developed outside the OS?

First of all, the software architecture has to be designed to be portable. An abstraction API layer (we call it FPN – Fast Path Networking SDK) has to be defined and used by all the Fast Path modules.
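To make the idea of an abstraction layer concrete, here is a minimal sketch in Python. It is not the FPN SDK API, which this article does not show: every name in it (`FastPathPlatform`, `encrypt`, `send`, `secure_forward`) is hypothetical, the "accelerated" backend only simulates a crypto engine, and a real fast path would be written in a low-level language. The point is only that the module is written once, against the abstraction, and each port supplies its own backend.

```python
from abc import ABC, abstractmethod


class FastPathPlatform(ABC):
    """Hypothetical portability layer: the only interface modules may call."""

    @abstractmethod
    def encrypt(self, payload: bytes, key: int) -> bytes:
        """Transform a payload, using a crypto engine if the CPU has one."""

    @abstractmethod
    def send(self, port: int, packet: bytes) -> None:
        """Hand a packet to the transmit queue of the given port."""


class GenericSoftwarePlatform(FastPathPlatform):
    """Pure-software backend, e.g. for running on a PC."""

    def __init__(self) -> None:
        self.tx_log = []  # (port, packet) pairs, kept so results can be inspected

    def encrypt(self, payload: bytes, key: int) -> bytes:
        # Toy XOR transform standing in for a real cipher; NOT secure.
        return bytes(b ^ key for b in payload)

    def send(self, port: int, packet: bytes) -> None:
        self.tx_log.append((port, packet))


class AcceleratedPlatform(GenericSoftwarePlatform):
    """Stand-in for a port to a CPU with a crypto engine (simulated here)."""

    def __init__(self) -> None:
        super().__init__()
        self.engine_calls = 0

    def encrypt(self, payload: bytes, key: int) -> bytes:
        # A real port would submit the buffer to the hardware engine here.
        self.engine_calls += 1
        return super().encrypt(payload, key)


def secure_forward(platform: FastPathPlatform, packet: bytes,
                   out_port: int, key: int) -> None:
    """A fast path module: written once, against the abstraction only."""
    platform.send(out_port, platform.encrypt(packet, key))


sw = GenericSoftwarePlatform()
hw = AcceleratedPlatform()
for platform in (sw, hw):
    secure_forward(platform, b"hello", out_port=1, key=0x2A)
```

Both backends produce the same output packets, so the behaviour of `secure_forward` can be checked on the software backend before it ever runs on the target CPU.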

Then a dedicated process has to be defined to effectively port the software:

  1. Study the CPU in detail
  2. Break down the development work for this CPU
  3. Carry out the port using the FPN SDK; all the hardware accelerators of this CPU have to be integrated into the FPN SDK (crypto-engines, HW queues for QoS, inter-core communication)
  4. Validate the port (see below).
  5. Then profile your code (check all the profiling and debugging counters, count the instructions, inspect the assembly code generated by the compilers) and read the CPU specs again with the help of the CPU vendor in order to understand what you observe.
  6. Validate again…

Validation is of course important. A real packet processing engine integrates a very large number of complex protocols. Both protocol behaviour and performance have to be tested as a whole. The best solution is an automated test robot that periodically tests all the protocols of your packet processing engine.

It is also very important to develop specific tools to speed up the development and validation process. At 6WIND, we developed the Virtual Fast Path concept, which runs the Fast Path in user space on a PC or in a QEMU environment.

Once a Fast Path module is available under the Virtual Fast Path, the validation robot automatically checks on every platform that the module cross-compiles and that it runs without any regressions (protocol or performance).
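The kind of check such a robot performs can be sketched as follows. This is only an illustration, not 6WIND's tool: the `Result` record, the platform names, the throughput figures and the 5% tolerance are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    compiled: bool       # did the module cross-compile for this platform?
    tests_passed: bool   # did the protocol tests pass?
    throughput: float    # measured throughput, in any consistent unit


def find_regressions(baseline, current, tolerance=0.05):
    """Return (platform, reason) for every platform that regressed.

    `baseline` and `current` map a platform name to a Result. A throughput
    drop larger than `tolerance` (a fraction) counts as a regression.
    """
    regressions = []
    for platform, ref in sorted(baseline.items()):
        new = current.get(platform)
        if new is None or not new.compiled:
            regressions.append((platform, "build"))
        elif not new.tests_passed:
            regressions.append((platform, "protocol"))
        elif new.throughput < ref.throughput * (1 - tolerance):
            regressions.append((platform, "performance"))
    return regressions


# Made-up numbers for three hypothetical targets.
baseline = {
    "platform_a": Result(True, True, 10.0),
    "platform_b": Result(True, True, 8.0),
    "virtual": Result(True, True, 1.0),
}
current = {
    "platform_a": Result(True, True, 9.8),  # within tolerance: fine
    "platform_b": Result(True, True, 7.0),  # 12.5% slower: regression
    "virtual": Result(True, False, 1.0),    # protocol test failed
}
report = find_regressions(baseline, current)
```

With these inputs, `report` flags `platform_b` for performance and `virtual` for protocol behaviour, while `platform_a` passes because its drop stays within the tolerance.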

Using this development process, we have successfully ported our software to market-leading multicore platforms. A classic question concerns the performance penalty of a portable solution compared to a per-CPU optimized one. Our answer is:

  • There could be a performance penalty on very simple protocols,
  • This performance penalty becomes negligible as soon as packet processing integrates more complex protocols because the software architecture itself is more important than low-level optimizations,
  • Portability provides the end user with a much more flexible solution,
  • Compared to low-level optimisations, a generic solution scales better when protocols are stacked because it avoids redesign,
  • It is better to have generic packet processing software from a provider that guarantees its continued evolution, because that is the provider’s core business.
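One simple way to picture the first two points is a back-of-the-envelope model in which the portability layer adds a fixed number of cycles per packet. That fixed cost is our assumption for the sake of illustration, and the cycle counts below are invented, not measurements.

```python
def relative_penalty(protocol_cycles: float, abstraction_cycles: float) -> float:
    """Fraction of per-packet cycles spent in the portability layer."""
    return abstraction_cycles / (protocol_cycles + abstraction_cycles)


# Illustrative figures only: an assumed fixed per-packet cost for the layer.
ABSTRACTION_CYCLES = 50

simple_stack = relative_penalty(200, ABSTRACTION_CYCLES)    # basic forwarding
complex_stack = relative_penalty(4950, ABSTRACTION_CYCLES)  # stacked protocols
```

Under these assumptions the layer accounts for about 20% of the per-packet cycles on the simple protocol but only about 1% once the protocol work dominates.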

More information about 6WINDGate architecture can be found here.

You can download more detailed documents here.

You can check 6WINDGate FAQ here.
