<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Disadvantages of Multi-Threading in Next-Generation Multicore Processors</title>
	<atom:link href="http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/</link>
	<description>A forum about multicore networking software</description>
	<lastBuildDate>Fri, 14 Oct 2011 12:13:45 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Tatiana Griffin</title>
		<link>http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-423</link>
		<dc:creator>Tatiana Griffin</dc:creator>
		<pubDate>Tue, 05 Jan 2010 19:17:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=533#comment-423</guid>
		<description>I fail to understand why there is a presupposition that, in the data plane, you can always complete your processing with 100% (or near-100%) pipeline efficiency. A data-plane engine can poll, process, queue and repeat, but nothing guarantees that those steps have zero waits or no dependencies.

Take, for example, a simple L3 forwarding case: there are two essential lookups, one for the route to yield the next hop, and another for the link-layer address of that next hop. With thousands of routes in hundreds of VRF contexts, there is no assurance that the data you need will be available when you need it (even if you implement some intelligent prefetching).

There is a misconception (which befuddles me) that multi-threading only benefits multi-tasking at the OS level; this is far from the truth. Multi-threading benefits packet-processing environments on a much larger scale, because you cannot avoid latency, particularly in run-to-completion models.

Conversely, if you presume that there is never an idle cycle, then a 3 GHz core could conceptually match or outperform three individual 1 GHz cores, because it would finish each job in a third of the time.

Clearly, that is never the case: packet processing today involves upwards of 10 lookups per packet against tables with hundreds of thousands, if not millions, of entries, so controlling (or predicting) pipeline behavior on a cycle-by-cycle basis is impossible.

The one postulate I can agree with is that the demands on all cores or threads are not symmetrical. That is precisely where hardware threads within a single core gain an advantage: even if all threads execute the same instruction stream, they are not executing the exact same instruction (or operating on the same object) in a given cycle.

You can certainly run a lightweight run-to-completion model on a multi-threaded core, and that is the better way to achieve ideal performance. Linux is horrible as a packet-processing framework, so there is little point in bringing it into any performance- or throughput-related discussion.

I would be curious to see a multi-core vendor back up the hypothesis that the run-to-completion model somehow allows them to reach 100% pipeline efficiency in packet-processing environments.

-TG</description>
		<content:encoded><![CDATA[<p>I fail to understand why there is a presupposition that, in the data plane, you can always complete your processing with 100% (or near-100%) pipeline efficiency. A data-plane engine can poll, process, queue and repeat, but nothing guarantees that those steps have zero waits or no dependencies.</p>
<p>Take, for example, a simple L3 forwarding case: there are two essential lookups, one for the route to yield the next hop, and another for the link-layer address of that next hop. With thousands of routes in hundreds of VRF contexts, there is no assurance that the data you need will be available when you need it (even if you implement some intelligent prefetching).</p>
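<p>A minimal Python sketch of the two dependent lookups described above follows. It assumes hypothetical per-VRF tables (<code>ROUTES</code>, <code>NEIGHBORS</code>) filled with documentation addresses. The route lookup is a linear longest-prefix match for brevity; real forwarding planes use tries or TCAMs, where each level visited can be another cache miss. The point is the dependency: the link-layer lookup cannot start until the route lookup has returned the next hop.</p>
<pre>
```python
import ipaddress

# Hypothetical per-VRF tables for illustration only.
# ROUTES: prefix to next-hop address; NEIGHBORS: next hop to MAC address.
ROUTES = {
    "vrf-red": {
        ipaddress.ip_network("10.0.0.0/8"): ipaddress.ip_address("192.0.2.1"),
        ipaddress.ip_network("10.1.0.0/16"): ipaddress.ip_address("192.0.2.2"),
    },
}
NEIGHBORS = {
    "vrf-red": {
        ipaddress.ip_address("192.0.2.1"): "00:00:5e:00:53:01",
        ipaddress.ip_address("192.0.2.2"): "00:00:5e:00:53:02",
    },
}


def route_lookup(vrf, dst):
    """Lookup 1: longest-prefix match in the VRF's route table."""
    matches = [net for net in ROUTES[vrf] if dst in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[vrf][best]


def forward(vrf, dst_str):
    """Return the next hop's MAC address, or None if the packet is dropped."""
    dst = ipaddress.ip_address(dst_str)
    next_hop = route_lookup(vrf, dst)
    if next_hop is None:
        return None  # no route: drop
    # Lookup 2 needs the result of lookup 1, so it cannot be issued
    # (or easily prefetched) until the route lookup completes.
    return NEIGHBORS[vrf].get(next_hop)
```
</pre>
<p>For example, a destination of 10.1.2.3 matches both prefixes, so the /16 wins and the second lookup returns the MAC address recorded for 192.0.2.2.</p>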
<p>There is a misconception (which befuddles me) that multi-threading only benefits multi-tasking at the OS level; this is far from the truth. Multi-threading benefits packet-processing environments on a much larger scale, because you cannot avoid latency, particularly in run-to-completion models.</p>
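<p>The latency-hiding argument can be illustrated with an idealized utilization model, sketched here in Python. It assumes that each thread alternates <code>compute</code> busy cycles with <code>stall</code> cycles spent waiting on memory. It also assumes that a stalled thread hands the pipeline to a ready one at no cost, and that threads never contend for caches or execution units. The result is therefore an upper bound, and the cycle counts are hypothetical.</p>
<pre>
```python
def pipeline_utilization(threads, compute, stall):
    """Fraction of cycles the core spends on useful work (idealized).

    Each thread alternates `compute` busy cycles with `stall` cycles
    waiting on memory; while one thread stalls, another ready thread
    can issue. Utilization cannot exceed 1.0.
    """
    single = compute / (compute + stall)
    return min(1.0, threads * single)


# Hypothetical packet: 200 cycles of work, 600 cycles of cache-miss stalls.
for n in (1, 2, 4):
    print(n, "thread(s):", pipeline_utilization(n, 200, 600))
# prints 0.25, 0.5 and 1.0 for 1, 2 and 4 threads
```
</pre>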
<p>Conversely, if you presume that there is never an idle cycle, then a 3 GHz core could conceptually match or outperform three individual 1 GHz cores, because it would finish each job in a third of the time.</p>
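<p>The clock-rate argument can be made concrete with a small Python model. It assumes a hypothetical packet that needs 300 cycles of computation plus 10 table lookups. Each lookup is assumed to miss cache at a fixed latency, with no overlap between misses, and all numbers are illustrative rather than measured. Computation time scales with the clock, but a cache miss does not get shorter on a faster core.</p>
<pre>
```python
def packet_time_ns(compute_cycles, freq_ghz, lookups, miss_ns):
    """Per-packet time: compute scales with clock, memory stalls do not."""
    return compute_cycles / freq_ghz + lookups * miss_ns


def throughput_mpps(cores, freq_ghz, compute_cycles=300, lookups=10, miss_ns=0.0):
    """Aggregate million packets per second for `cores` identical cores."""
    t = packet_time_ns(compute_cycles, freq_ghz, lookups, miss_ns)
    return cores * 1000.0 / t  # 1000 ns per microsecond


for miss in (0.0, 100.0):
    fast = throughput_mpps(1, 3.0, miss_ns=miss)
    slow = throughput_mpps(3, 1.0, miss_ns=miss)
    print(f"miss={miss:.0f} ns: 1x3GHz {fast:.2f} Mpps, 3x1GHz {slow:.2f} Mpps")
```
</pre>
<p>With these assumed numbers, both configurations reach 10.00 Mpps when there are no misses. Once each lookup costs 100 ns, the single 3 GHz core drops to roughly 0.91 Mpps, while the three 1 GHz cores together sustain roughly 2.31 Mpps.</p>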
<p>Clearly, that is never the case: packet processing today involves upwards of 10 lookups per packet against tables with hundreds of thousands, if not millions, of entries, so controlling (or predicting) pipeline behavior on a cycle-by-cycle basis is impossible.</p>
<p>The one postulate I can agree with is that the demands on all cores or threads are not symmetrical. That is precisely where hardware threads within a single core gain an advantage: even if all threads execute the same instruction stream, they are not executing the exact same instruction (or operating on the same object) in a given cycle.</p>
<p>You can certainly run a lightweight run-to-completion model on a multi-threaded core, and that is the better way to achieve ideal performance. Linux is horrible as a packet-processing framework, so there is little point in bringing it into any performance- or throughput-related discussion.</p>
<p>I would be curious to see a multi-core vendor back up the hypothesis that the run-to-completion model somehow allows them to reach 100% pipeline efficiency in packet-processing environments.</p>
<p>-TG</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Guinther</title>
		<link>http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-390</link>
		<dc:creator>Mark Guinther</dc:creator>
		<pubDate>Wed, 23 Dec 2009 19:50:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=533#comment-390</guid>
		<description>Alexander makes a good point about the applicability of multiple threads. To generalize, multi-threading can deliver valuable performance increases in multi-tasking environments: the more context switches you have, the more benefit you will see from multi-threading. Conversely, the benefit of hyperthreading shrinks as a core makes relatively fewer context switches. In multicore packet processing, cores can be split between control-plane and data-plane functionality. The control plane typically runs a multitasking OS like Linux, where hyperthreading benefits are easy to demonstrate.

On the data plane, a much simpler executive can be used. In a run-to-completion model, an individual core (which I&#039;ll call a Network Acceleration Engine, or NAE) can poll for packets (incoming or outgoing), perform the necessary processing, queue them for dispatch, and return to the polling state. In this case, multiple threads will not necessarily make the NAE run more efficiently. If the NAE is I/O- or bus-bound, system performance will be poor regardless of processing power. Likewise, cache stalls will kill throughput, even if a second thread is available to keep the core from going idle.

I agree that multithreading is a brilliant and useful concept for most OS environments. But in the particular case of network packet processing, the demands on all cores are not symmetrical.</description>
		<content:encoded><![CDATA[<p>Alexander makes a good point about the applicability of multiple threads. To generalize, multi-threading can deliver valuable performance increases in multi-tasking environments: the more context switches you have, the more benefit you will see from multi-threading. Conversely, the benefit of hyperthreading shrinks as a core makes relatively fewer context switches. In multicore packet processing, cores can be split between control-plane and data-plane functionality. The control plane typically runs a multitasking OS like Linux, where hyperthreading benefits are easy to demonstrate.</p>
<p>On the data plane, a much simpler executive can be used. In a run-to-completion model, an individual core (which I&#8217;ll call a Network Acceleration Engine, or NAE) can poll for packets (incoming or outgoing), perform the necessary processing, queue them for dispatch, and return to the polling state. In this case, multiple threads will not necessarily make the NAE run more efficiently. If the NAE is I/O- or bus-bound, system performance will be poor regardless of processing power. Likewise, cache stalls will kill throughput, even if a second thread is available to keep the core from going idle.</p>
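<p>The run-to-completion loop described above can be sketched minimally in Python. Python deques stand in, hypothetically, for hardware RX/TX rings, and <code>process</code> is a placeholder for the forwarding work. The same engine polls, processes and queues each packet before polling again, so any stall inside <code>process</code> leaves the engine idle while later packets wait. The exit after a few empty polls exists only so the sketch terminates.</p>
<pre>
```python
from collections import deque


def process(pkt):
    """Placeholder for forwarding work (hypothetical: just tag the packet)."""
    return {**pkt, "processed": True}


def nae_run_to_completion(rx, tx, max_idle_polls=3):
    """Poll, process, queue for dispatch, and return to polling.

    `rx` and `tx` are deques standing in for hardware rings. The loop
    exits after `max_idle_polls` consecutive empty polls so the sketch
    terminates; a real engine would poll forever. Returns len(tx).
    """
    idle = 0
    while idle != max_idle_polls:
        if not rx:
            idle += 1  # nothing received: keep polling
            continue
        idle = 0
        pkt = rx.popleft()  # poll
        tx.append(process(pkt))  # process to completion, queue for dispatch
    return len(tx)
```
</pre>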
<p>I agree that multithreading is a brilliant and useful concept for most OS environments. But in the particular case of network packet processing, the demands on all cores are not symmetrical.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tatiana Griffin</title>
		<link>http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-342</link>
		<dc:creator>Tatiana Griffin</dc:creator>
		<pubDate>Thu, 17 Dec 2009 01:17:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=533#comment-342</guid>
		<description>The two respondents above offer a sparring analysis of why &quot;multi-threading&quot; compares unfavorably with &quot;multi-core&quot; or, conversely, why multi-core is &quot;better&quot; than multi-threading. Mr. Litvack&#039;s original article appears to be about the concurrent use of BOTH multi-core and multi-threading. It does not preclude multi-core; in fact, the title states it clearly: &quot;multi-threading in [next-generation] multi-core processors.&quot;

Next-generation multi-threaded processors in fact support threading on a multi-core architecture: the duality in choice allows native selection and concurrent use of both ILP and TLP. The original article appears to address performance within a single core, not across multiple cores. By the same token, one could make a blanket argument about &quot;Disadvantages of multi-core in next-generation packet processing&quot;, and it would be equally one-sided.

I encourage readers to visit http://www.cs.washington.edu/research/smt/ for an unbiased treatment of multi-threading, specifically of where it can be a favorable design choice alongside multi-core.

Multi-threading is not a panacea, and neither is multi-core; multi-threading can be a distinct benefit in a variety of use cases. Many modern core architectures (MIPS-MT, POWER5, UltraSPARC, Nehalem, etc.) and vendors (MIPS, NetLogic, IBM, Sun Microsystems, Intel) support multi-threading together with multi-core.

Multi-core is here to stay.  And so is multi-threading.

You need not pick between the two: you can have both. Have your cake and eat it too.

-TG</description>
		<content:encoded><![CDATA[<p>The two respondents above offer a sparring analysis of why &#8220;multi-threading&#8221; compares unfavorably with &#8220;multi-core&#8221; or, conversely, why multi-core is &#8220;better&#8221; than multi-threading. Mr. Litvack&#8217;s original article appears to be about the concurrent use of BOTH multi-core and multi-threading. It does not preclude multi-core; in fact, the title states it clearly: &#8220;multi-threading in [next-generation] multi-core processors.&#8221;</p>
<p>Next-generation multi-threaded processors in fact support threading on a multi-core architecture: the duality in choice allows native selection and concurrent use of both ILP and TLP. The original article appears to address performance within a single core, not across multiple cores. By the same token, one could make a blanket argument about &#8220;Disadvantages of multi-core in next-generation packet processing&#8221;, and it would be equally one-sided.</p>
<p>I encourage readers to visit <a href="http://www.cs.washington.edu/research/smt/" rel="nofollow">http://www.cs.washington.edu/research/smt/</a> for an unbiased treatment of multi-threading, specifically of where it can be a favorable design choice alongside multi-core.</p>
<p>Multi-threading is not a panacea, and neither is multi-core; multi-threading can be a distinct benefit in a variety of use cases. Many modern core architectures (MIPS-MT, POWER5, UltraSPARC, Nehalem, etc.) and vendors (MIPS, NetLogic, IBM, Sun Microsystems, Intel) support multi-threading together with multi-core.</p>
<p>Multi-core is here to stay.  And so is multi-threading.</p>
<p>You need not pick between the two: you can have both. Have your cake and eat it too.</p>
<p>-TG</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kin-Yip Liu</title>
		<link>http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-20</link>
		<dc:creator>Kin-Yip Liu</dc:creator>
		<pubDate>Thu, 05 Nov 2009 18:11:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=533#comment-20</guid>
		<description>One additional aspect to consider when developing applications on multi-threaded cores is performance and latency determinism. Threads competing for the shared per-core caches, and the cache pollution this post mentions, make the performance and latency of a thread&#039;s tasks much less deterministic than when only one thread executes on an entire core. In the latter case, the thread owns all the resources the core has to offer.

Other factors also reduce performance determinism with multi-threading. First, if the processor hardware decides when to switch threads, the software developer does not control when a thread executes, or for how long, before the hardware switches execution to another thread. Second, even if the hardware tries to run all the threads at the same time, they still compete for the same execution units, and it is not always clear to the software developer how the hardware allocates those units among the threads. As a result, performance determinism suffers.

Performance determinism is an important performance attribute for packet processing.  Throughput is not the only important factor.</description>
		<content:encoded><![CDATA[<p>One additional aspect to consider when developing applications on multi-threaded cores is performance and latency determinism. Threads competing for the shared per-core caches, and the cache pollution this post mentions, make the performance and latency of a thread&#8217;s tasks much less deterministic than when only one thread executes on an entire core. In the latter case, the thread owns all the resources the core has to offer.</p>
<p>Other factors also reduce performance determinism with multi-threading. First, if the processor hardware decides when to switch threads, the software developer does not control when a thread executes, or for how long, before the hardware switches execution to another thread. Second, even if the hardware tries to run all the threads at the same time, they still compete for the same execution units, and it is not always clear to the software developer how the hardware allocates those units among the threads. As a result, performance determinism suffers.</p>
<p>Performance determinism is an important performance attribute for packet processing.  Throughput is not the only important factor.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

