By Eric Carmes – 6WIND Founder and CEO
A few years ago, it seemed that the race for a larger number of cores was a major challenge. However, except for Tilera’s announcement of the first 100-core processor last September, increasing the number of cores does not seem to be a short-term priority anymore. The new generation of processors to be shipped in 2010 will likely use a maximum of 8 cores. What does it mean?
Are there microelectronics limitations that prevent the integration of more cores at an affordable cost? Is a smaller number of cores running at a higher frequency a better way to reach an optimal ratio of processing per watt? Or do applications simply not require that much processing capability? It seems that multicore processor providers also prefer to integrate more and larger hardware engines to offload the processor itself. Others think that heterogeneous cores, with some dedicated to the data plane and others to the control plane, may be a better fit.
I’m not a specialist so I would be happy to have comments from hardware technology experts as well as end users.
Several upcoming events will give us a chance to analyze this trend. Mobile World Congress in mid-February in Barcelona will showcase some interesting multicore applications for wireless access and infrastructure equipment, and will be the right place to explore future requirements for next-generation networks. The RSA Conference in early March in San Francisco will certainly be of great interest for security applications. The Multi-Core Expo in late April in San Jose will bring the experts together to discuss the future of multicore technology.
I’m looking forward to meeting with you at these events.
More information about 6WINDGate architecture can be found here.
You can check 6WINDGate FAQ here.
It is an interesting observation that the already-announced new-generation multicore processors shipping in 2010 happen to have 8 or fewer cores. As vendors like Intel, AMD, and Freescale only offered dual or quad cores in their previous generation, it is quite a step up to get to around 8 cores. If we look at the roadmap trends of Intel, AMD, Cavium Networks, and Freescale, it is obvious that new multicore processors continue to feature more cores, higher core frequency, and new and/or enhanced hardware acceleration in the form of co-processors and new instructions. In addition, the tremendous industry momentum behind multicore has led to advancements in software tools for multicore development and performance optimization. We can and should expect processors with many more than 8 standard-ISA-based cores to come.
Cavium Networks has been shipping processors with up to 16 enhanced MIPS64 cores for more than 4 years. With its demonstrated core competency in designing very efficient cores and architecting hardware features which enable multicore performance scaling, Cavium Networks will offer many more cores running at significantly higher frequency in its next generation offerings.
I fully agree that Cavium Networks has been a pioneer in providing 16-core Octeon processors.
However, and correct me if I’m wrong, the first Octeon II to be shipped will have 6 cores and the 32-core processor will come later. So, it seems that having a very large number of cores is not your very first priority.
Yes, the first implementation of the OCTEON II family is the CN63XX, with up to 6 next-generation cores. The 32-core version has not been officially announced yet, but will be shortly. As Alex has pointed out, this is only a temporary state.
Thanks,
Kin-Yip
I think it is a temporary state. Multicore processors came into the world because power and thermal limits kept the technology from moving forward. They have brought power down significantly, probably below the available limit. The next step is to raise the clock until we hit the ceiling again, and then there will be another jump to the next level. This is possible.
The second reason could be the move to heterogeneous architectures, with more and more devices including some kind of specialized offload engine(s). Obviously, these acceleration technologies reduce the need for general-purpose computing power and effectively bring the number of cores down.
The third factor is probably related to economic conditions, because a large number of cores correlates with high-end performance and scalability, meaning lower volumes. For example, we have seen a number of vendors going either for the mobile device market or for the base station/access point market for wireless networks, where volumes are high.
To add to Alex’s first point regarding the motivation behind multicore: in addition to optimizing for performance per watt, multicore makes it possible to take advantage of the additional parallelism that is available at the application level and the thread level. Such parallelism can be translated into additional speed-up by going from single core to multicore. Moreover, multicore offers higher performance and more deterministic processing bandwidth for virtualization than time-sharing a virtualized processor/core. These advantages will drive the usage of multicore and higher core counts.
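As a rough illustration of how much speed-up application-level parallelism can yield as cores are added, here is a minimal sketch based on Amdahl's law. Amdahl's law is not named in this thread and is used here as an assumption; the function name and the parallel fractions in the loop are hypothetical, illustrative values, not measurements of any processor or workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speed-up over one core, per Amdahl's law (illustrative model).

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    cores: number of identical cores; overheads such as locking,
           cache contention, and memory bandwidth are ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the trend.
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{cores} cores: {amdahl_speedup(fraction, cores):.1f}x"
            for cores in (2, 8, 32, 100)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Under this simplified model, the serial part of the application caps the achievable speed-up no matter how many cores are available, which is one way to see why higher core counts pay off mainly for well-parallelized software.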
There is, however, one requirement for good multicore scalability: the efficiency of the multicore application. First of all, we need a well-parallelized application, and we are painfully aware of the complexity of that task and of the fact that software trails hardware here. There is also still relatively high core idle time in both single- and multi-threaded multicore implementations. It would be really good to see hardware put this idle time to use. For example, other cores/threads could use the resources of the idle core/thread (register file, TLB, etc.), or the running core could be overclocked within the given power budget while another core is idle. I know that this is not an easy task, but the fact that some processors already support temporary overclocking today means that it is possible. I would really challenge all processor developers out there to provide such a capability.
I assume that you mean, for instance,
http://www.intel.com/technology/turboboost/
I would be even more aggressive: it would be good to see CPU instructions that make it possible to power some units of a core, or entire cores, off and on, or to change the frequency, all in a very fast manner. From a dataplane software point of view, at least with networking, we know the states of the tables, so we can decide when to call those CPU instructions.
Those CPU instructions could be linked with some programmable registers for I/Os in order to let the I/Os turn on some cores too.
But is it doable with a latency of one CPU clock cycle?
Turning parts on and off is related to aggressive power management. You are right, power is also a very important parameter. Some of it can be done automatically, but manual control could indeed help. I was referring more to the performance side: improving single-thread application performance or making better use of core idle time. Sometimes I don’t want to save power, but I want to use the available power for more processing. Sometimes I want to save power by turning blocks on/off, dynamically bringing bus and core frequency down, etc. Both working together would be even better.
BTW, Intel also had an idea of on-chip processing domains, where you can subdivide the chip into “sub-chips”, each of which is controllable separately.