<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Advantages of Multi-Threading in Next-Generation Multicore Processors</title>
	<atom:link href="http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/</link>
	<description>A forum about multicore networking software</description>
	<lastBuildDate>Fri, 14 Oct 2011 12:13:45 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Ryan Petersen</title>
		<link>http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-2017</link>
		<dc:creator>Ryan Petersen</dc:creator>
		<pubDate>Thu, 22 Jul 2010 18:56:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=467#comment-2017</guid>
		<description>TG, I believe you misread the above post.  I realize this is old and the original posters won&#039;t read this, but TG appears to be very knowledgeable, so people will tend to agree with his statement.  But he incorrectly interprets Kin&#039;s statement as breaking single threads up across four cores.
If that were the case, he would be correct. Actually, all Kin is saying is that each core handles a single thread in its entirety.  When he says &quot;colored bar&quot;, he doesn&#039;t mean each segment of color; he means the whole bar that represents the timeline of a single core.  In Figure 2 there are two bars in total, the top and the bottom.  With four cores, there would be four parallel bars for a given time period: a yellow and white, a blue and white, and so on.  And Kin is correct that they would all be shorter than the multithreaded bar, because each core runs only a single thread.</description>
		<content:encoded><![CDATA[<p>TG, I believe you misread the above post.  I realize this is old and the original posters won&#8217;t read this, but TG appears to be very knowledgeable, so people will tend to agree with his statement.  But he incorrectly interprets Kin&#8217;s statement as breaking single threads up across four cores.<br />
If that were the case, he would be correct. Actually, all Kin is saying is that each core handles a single thread in its entirety.  When he says &#8220;colored bar&#8221;, he doesn&#8217;t mean each segment of color; he means the whole bar that represents the timeline of a single core.  In Figure 2 there are two bars in total, the top and the bottom.  With four cores, there would be four parallel bars for a given time period: a yellow and white, a blue and white, and so on.  And Kin is correct that they would all be shorter than the multithreaded bar, because each core runs only a single thread.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tatiana Griffin</title>
		<link>http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-344</link>
		<dc:creator>Tatiana Griffin</dc:creator>
		<pubDate>Thu, 17 Dec 2009 02:00:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=467#comment-344</guid>
		<description>With all due respect, Mr. Liu, your analysis optimistically assumes that you can speed up the processing just by throwing more cores at it, or that latency is avoided just because you parallelized a solution.

The figures, while not drawn to scale, appear to show linearly progressing time in which latency is non-zero.  If a packet requires even a single memory reference, it still takes X cycles to complete from DRAM, whether you throw 4 cores, 8 cores, or even 100 cores at it.

The second presumption you&#039;re making is that the 4 yellow bars are inherently parallelizable across 4 cores: while you can sometimes refactor or decompose a problem domain into parallel contexts, that is not always possible.  This is the Mythical Man-Month problem: if Y1, Y2, Y3 and Y4 represent the yellow bars above, they still incur the latency of their corresponding memory accesses, which is NEVER zero.  If they are stalled, perhaps some other context (say Violet V1, V2, V3 ...) can consume the pipeline efficiently.

To me, multi-threading is just one of the evolving architectural choices that can provide benefits alongside multi-core.  In the past ten years, given that no single micro-architecture advance has scaled in step with Moore&#039;s law, the natural tendency seems to be to exploit multiple architectural choices simultaneously (e.g. smaller geometry, larger and multi-level caches, deeper pipelines, multi-core, and of course, multi-threading), and this is a developing trend.

The original article appears to be framed in the context of a single core within a multi-core processing complex.  Numerous processor vendors have realized the potential benefits of offering multi-threading along with multi-core, although I read your comment as defensive, possibly because your micro-architecture lacks such a feature.

-TG</description>
		<content:encoded><![CDATA[<p>With all due respect, Mr. Liu, your analysis optimistically assumes that you can speed up the processing just by throwing more cores at it, or that latency is avoided just because you parallelized a solution.</p>
<p>The figures, while not drawn to scale, appear to show linearly progressing time in which latency is non-zero.  If a packet requires even a single memory reference, it still takes X cycles to complete from DRAM, whether you throw 4 cores, 8 cores, or even 100 cores at it.</p>
<p>The second presumption you&#8217;re making is that the 4 yellow bars are inherently parallelizable across 4 cores: while you can sometimes refactor or decompose a problem domain into parallel contexts, that is not always possible.  This is the Mythical Man-Month problem: if Y1, Y2, Y3 and Y4 represent the yellow bars above, they still incur the latency of their corresponding memory accesses, which is NEVER zero.  If they are stalled, perhaps some other context (say Violet V1, V2, V3 &#8230;) can consume the pipeline efficiently.</p>
<p>To me, multi-threading is just one of the evolving architectural choices that can provide benefits alongside multi-core.  In the past ten years, given that no single micro-architecture advance has scaled in step with Moore&#8217;s law, the natural tendency seems to be to exploit multiple architectural choices simultaneously (e.g. smaller geometry, larger and multi-level caches, deeper pipelines, multi-core, and of course, multi-threading), and this is a developing trend.</p>
<p>The original article appears to be framed in the context of a single core within a multi-core processing complex.  Numerous processor vendors have realized the potential benefits of offering multi-threading along with multi-core, although I read your comment as defensive, possibly because your micro-architecture lacks such a feature.</p>
<p>-TG</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-19</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Thu, 05 Nov 2009 15:38:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=467#comment-19</guid>
		<description>You will find another point of view regarding multi-threading &lt;a href=&quot;http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>You will find another point of view regarding multi-threading <a href="http://www.multicorepacketprocessing.com/disadvantages-of-multi-threading-in-next-generation-multicore-processors/" rel="nofollow">here</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kin-Yip Liu</title>
		<link>http://www.multicorepacketprocessing.com/advantages-of-multi-threading-in-next-generation-multicore-processors/comment-page-1/#comment-17</link>
		<dc:creator>Kin-Yip Liu</dc:creator>
		<pubDate>Thu, 05 Nov 2009 01:51:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=467#comment-17</guid>
		<description>Figure 2 clearly demonstrates the multicore advantage.  If one core runs the 4 tasks indicated by the 4 colors, the overall throughput is shown by the lower bar in Figure 2.  If one core uses multi-threading, the best-case throughput is shown by the upper bar in Figure 2.

If four cores run these 4 tasks, the throughput is more than 2x that of the multi-threading case.  In Figure 2, you can visualize it by just looking at the yellow boxes of the lower bar.  There would be four bars, one per core, each running one of the tasks (colors).  In this four-core scenario, each bar is less than half the length of the multi-threading (upper) bar.

This shows that an actual physical core always provides more hardware resources and performance than having multiple threads share one physical core and compete for resources.</description>
		<content:encoded><![CDATA[<p>Figure 2 clearly demonstrates the multicore advantage.  If one core runs the 4 tasks indicated by the 4 colors, the overall throughput is shown by the lower bar in Figure 2.  If one core uses multi-threading, the best-case throughput is shown by the upper bar in Figure 2.</p>
<p>If four cores run these 4 tasks, the throughput is more than 2x that of the multi-threading case.  In Figure 2, you can visualize it by just looking at the yellow boxes of the lower bar.  There would be four bars, one per core, each running one of the tasks (colors).  In this four-core scenario, each bar is less than half the length of the multi-threading (upper) bar.</p>
<p>This shows that an actual physical core always provides more hardware resources and performance than having multiple threads share one physical core and compete for resources.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

