<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Multicore Packet Processing Forum</title>
	<atom:link href="http://www.multicorepacketprocessing.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.multicorepacketprocessing.com</link>
	<description>A forum about multicore networking software</description>
	<lastBuildDate>Tue, 24 Jan 2012 17:03:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Accelerating Time-to-market for OEMs Using Intel® DPDK</title>
		<link>http://www.multicorepacketprocessing.com/accelerating-time-to-market-for-oems-using-intel%c2%ae-dpdk/</link>
		<comments>http://www.multicorepacketprocessing.com/accelerating-time-to-market-for-oems-using-intel%c2%ae-dpdk/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 17:03:33 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[Software Architecture]]></category>
		<category><![CDATA[6WINDGate]]></category>
		<category><![CDATA[DPDK]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=2063</guid>
		<description><![CDATA[By Charlie Ashton, VP of Marketing &#8211; 6WIND At the Intel Developer Forum in September 2011, 6WIND announced support for the Intel® Data Plane Development Kit (see the press release here). Since that time, we have provided expert technical support to a number of OEMs using the Intel® DPDK library to develop high-end networking, telecom [...]]]></description>
			<content:encoded><![CDATA[<p>By <a href="http://www.6wind.com/Charlie-Ashton.html" target="_blank">Charlie Ashton</a>, VP of Marketing &#8211; 6WIND</p>
<p>At the Intel Developer Forum in September 2011, 6WIND announced support for the Intel® Data Plane Development Kit (<a href="http://www.6wind.com/wp-content/uploads/PDF/press/2011/6WIND-announces-availability-of-Intel-DPDK-support.pdf" target="_blank">see the press release here</a>). Since that time, we have provided expert technical support to a number of OEMs using the Intel® DPDK library to develop high-end networking, telecom and security products.</p>
<p>As many readers of this blog will know, the Intel® DPDK is a set of data plane libraries and optimized NIC drivers, licensed by Intel for incorporation either into high-performance networking stacks (like the 6WINDGate™ packet processing software) or directly into customers’ applications.</p>
<p>6WIND gained extensive experience with the Intel® DPDK library while integrating it into 6WINDGate to achieve industry-leading packet processing performance. As a result of that experience, 6WIND decided to offer Intel® DPDK support to OEMs under two models:</p>
<p>First, we offer an enhanced version of the standard Intel® DPDK library for packet processing applications. This includes a range of value-added enhancements, developed internally as part of our own integration work. These enhancements include: crypto support using the AES-NI instructions; device monitoring and statistics features; support for additional devices such as NICs; bug fixes. We also provide a set of optional add-on modules that provide increased system functionality and performance, in areas such as virtualization and crypto acceleration.</p>
<p>This enhanced version of the standard Intel® DPDK library is maintained by 6WIND and synchronized with Intel’s on-going releases of the baseline library. We provide this library to OEMs worldwide, backed by a variety of technical support models such as standard support and maintenance, professional services for custom software development and comprehensive support for specific Software License Agreements (SLAs).</p>
<p>A number of OEMs that have chosen to integrate the Intel® DPDK library directly into their applications have significantly shortened their development time while also improving the system-level performance of their products, thanks to 6WIND’s support and assistance.</p>
<p>Our second support/distribution model is for those OEMs who need a full-featured, commercial networking stack that delivers the maximum packet processing performance on Intel® Architecture platforms. To address these needs, 6WIND provides its enhanced version of the Intel® DPDK library pre-integrated into the 6WINDGate packet processing software. This enables OEMs to license a single software solution that has been optimized to fully exploit the features of the Intel® DPDK library while delivering a comprehensive set of high-performance networking protocols (routing, firewall, security, connectivity, QoS, VLAN, protocol termination etc.).</p>
<p>As an example of the performance delivered by 6WINDGate, on a dual-Intel® Xeon® processor E5645 platform with a clock speed of 3.33GHz, running the Intel® DPDK, 6WINDGate delivers over 16 million packets per second, per core of IP forwarding performance, thereby forwarding 10Gbps of network traffic in each core (64-byte packets). This performance scales linearly with the number of cores configured to run 6WINDGate until the maximum bandwidth of the hardware platform is reached. Processor cores not used to run 6WINDGate are available to run value-added application software or Virtual Machines (VMs), resulting in a highly efficient and flexible system for advanced networking equipment.</p>
<p>As in the case of the stand-alone Intel® DPDK library, the 6WINDGate software is backed by a variety of technical support models such as standard support and maintenance, professional services for custom software development and comprehensive support for specific SLAs.</p>
<p>Since announcing our support for the Intel® DPDK library four months ago, we have seen strong interest from OEMs in incorporating it into high-performance networking equipment.</p>
<p>Are you working with the Intel® DPDK? What benefits are you seeing in terms of performance and time-to-market?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/accelerating-time-to-market-for-oems-using-intel%c2%ae-dpdk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recent News About Accelerated Deployment Of Multicore Platforms</title>
		<link>http://www.multicorepacketprocessing.com/recent-news-about-accelerated-deployment-of-multicore-platforms/</link>
		<comments>http://www.multicorepacketprocessing.com/recent-news-about-accelerated-deployment-of-multicore-platforms/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 09:35:46 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[cavium]]></category>
		<category><![CDATA[CES]]></category>
		<category><![CDATA[LTE]]></category>
		<category><![CDATA[netlogic]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=2052</guid>
		<description><![CDATA[By Charlie Ashton, VP of Marketing &#8211; 6WIND Some news items from last week should be of great interest to readers of this blog: In one of many stories from CES, FierceWireless reported on the wide range of LTE devices introduced at the show as well as aggressive pricing plans. As the article states, “With [...]]]></description>
			<content:encoded><![CDATA[<p>By <a href="http://www.6wind.com/Charlie-Ashton.html" target="_blank">Charlie Ashton</a>, VP of Marketing &#8211; 6WIND</p>
<p>Some news items from last week should be of great interest to readers of this blog:</p>
<p>In one of many stories from CES, <a href="http://www.fiercewireless.com/ceslive/story/lte-devices-dominate-ces-price-points-starting-drop/2012-01-10" target="_blank">FierceWireless</a> reported on the wide range of LTE devices introduced at the show as well as aggressive pricing plans. As the article states, “With AT&amp;T&#8217;s increased number of LTE devices coupled with Verizon&#8217;s data pricing promotions, it appears that the race to attract subscribers to LTE networks just got a little bit more interesting.”</p>
<p>Another CES report, from <a href="http://computerworld.co.nz/news.nsf/telecommunications/lte-explodes-at-ces" target="_blank">ComputerWorld</a>, discusses the emergence of non-traditional applications for LTE. As an example, “Alcatel-Lucent…. manned a booth showing several high-tech examples of how LTE wireless technology can support consumers, industry and government, such as providing police departments with the ability to transmit high definition, real-time video and data on crime suspects to officers in patrol cars.”</p>
<p>Microsoft’s announcement of Windows LTE smartphones received wide coverage, for example in this article from <a href="http://www.informationweek.com/news/windows/microsoft_news/232400091" target="_blank">InformationWeek</a> that describes the Nokia Lumia 900 which will be available from AT&amp;T.</p>
<p>Lastly, a couple of non-CES items…..</p>
<p>Congratulations to NetLogic, who announced <a href="http://www.netlogicmicro.com/News/pr/2011/11-12-12gsa.asp" target="_blank">here</a> that they received the distinguished 2011 Most Respected Emerging Public Semiconductor Company Award for the third consecutive year from the Global Semiconductor Alliance (GSA).</p>
<p>Finally, Cavium and Kontron announced <a href="http://www.cavium.com/newsevents_Cavium_Kontron_10-core.html" target="_blank">here</a> the availability of the second generation of the Kontron AMC Packet Processor module AM4211 for MicroTCA platforms, based on the 10-core OCTEON II CN6645 series.</p>
<p>What are the trends that you’re seeing in multicore platforms? What recent interesting news have you seen in this area?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/recent-news-about-accelerated-deployment-of-multicore-platforms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Market Study Projects Strong Growth in LTE Adoption During 2012</title>
		<link>http://www.multicorepacketprocessing.com/market-study-projects-strong-growth-in-lte-adoption-during-2012/</link>
		<comments>http://www.multicorepacketprocessing.com/market-study-projects-strong-growth-in-lte-adoption-during-2012/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 06:07:12 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[LTE]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=2042</guid>
		<description><![CDATA[By Charlie Ashton, VP of Marketing &#8211; 6WIND Happy New Year to all readers of the Multicore Packet Processing Forum blog! We look forward to sharing some exciting discussions with you during 2012 as new multicore solutions are announced and OEMs bring innovative products to market. Last week, FierceBroadbandWireless published an interesting study from Maravedis, [...]]]></description>
			<content:encoded><![CDATA[<p>By <a href="http://www.6wind.com/Charlie-Ashton.html">Charlie Ashton</a>, VP of Marketing &#8211; 6WIND</p>
<p>Happy New Year to all readers of the Multicore Packet Processing Forum blog! We look forward to sharing some exciting discussions with you during 2012 as new multicore solutions are announced and OEMs bring innovative products to market.</p>
<p>Last week, FierceBroadbandWireless published an interesting study from <a href="http://www.fiercebroadbandwireless.com/story/maravedis-2012-wireless-and-mobile-cloud-predictions/2012-01-03">Maravedis</a>, outlining their projections for the growth in LTE adoption over the next twelve months.</p>
<p>Amongst other predictions, the study forecasts that the worldwide LTE subscriber base will reach 54 million by the end of 2012, 46% of whom will be in North America and 36% in the Asia-Pacific region. LTE networks are expected to be up and running in more than ten countries, including China, India, Malaysia, South Korea, Taiwan and the United States. China Mobile expects to launch the first truly large-scale deployment this year which will drive major economies of scale in handsets.</p>
<p>Maravedis notes a significant slowdown in mobile WiMAX subscribers. With 28 million users at the end of 2012, the mobile WiMAX subscriber base is expected to decline as major mobile WiMAX operators migrate their networks to LTE. Operators forecast to make WiMAX-to-LTE transitions include Clearwire (US), Yota (Russia) and P1 (Malaysia).</p>
<p>The study indicates that there will be 160 commercial LTE deployments in service by the end of 2012, an increase of more than 100% from the 61 in service at the end of 2011.</p>
<p>Maravedis anticipates that Ericsson, Huawei and Nokia-Siemens Networks will continue to be the leading infrastructure vendors chosen by LTE operators during 2012, with Nokia-Siemens Networks receiving the largest share of contracts awarded.</p>
<p>Cloud RAN is identified as one of the key technologies for 2012 (see an earlier Forum post on this topic <a href="../../2011/06/">“Cloud RAN Outlook: Fair or Cloudy?”</a>). Vendors such as Ericsson have begun to publicly announce their success in deploying cloud RAN equipment, while processor suppliers such as TI have announced specialized SoCs. At the same time, Alcatel-Lucent has demonstrated their small remote radio head solutions at several trade shows.</p>
<p>The contents of this report mirror the activity and growth that we at 6WIND are seeing within our customer base. Our high-performance packet processing software is already deployed in a large number of LTE networks, some of which are in trials while others are in full commercial operation. We see strong interest from our OEM customers in the challenges of increasing network capacity, designing for maximum scalability, reducing cost and accelerating time-to-market.</p>
<p>What are the key trends that you’re seeing in the LTE market? What do you see as the major trends and challenges for 2012?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/market-study-projects-strong-growth-in-lte-adoption-during-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interesting Links and Documents to Share With You</title>
		<link>http://www.multicorepacketprocessing.com/interesting-links-and-documents-to-share-with-you/</link>
		<comments>http://www.multicorepacketprocessing.com/interesting-links-and-documents-to-share-with-you/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 13:23:00 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[LTE]]></category>
		<category><![CDATA[ultra-low latency]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=2027</guid>
		<description><![CDATA[By Eric Carmes &#8211; 6WIND Founder and CEO Today, I would like to share some interesting links and documents with you. First, you can find here an interesting post from HP “What’s the right path to a better network solution across the entire enterprise?” explaining how and why data center networks will have to evolve [...]]]></description>
			<content:encoded><![CDATA[<p>By <a href="http://www.6wind.com/Eric-Carmes.html">Eric Carmes</a> &#8211; 6WIND Founder and CEO</p>
<p>Today, I would like to share some interesting links and documents with you.</p>
<p>First, you can find <a href="http://h30507.www3.hp.com/t5/HP-Networking/What-s-the-right-path-to-a-better-network-solution-across-the/ba-p/100821" target="_blank">here</a> an interesting post from HP “What’s the right path to a better network solution across the entire enterprise?” explaining how and why data center networks will have to evolve in the near future to apply server and storage virtualization principles to networking.</p>
<p>6WIND recently had an exciting discussion with Adam Wood from High Frequency Traders about the benefits of high performance software packet processing in providing low-latency network solutions. You can find it <a href="http://www.highfrequencytraders.com/featured/1002/packet-processing-high-frequency-trading" target="_blank">here</a>.</p>
<p>You can also check <a href="http://bradhedlund.com/" target="_blank">here</a> the interesting “BRAD HEDLUND .com” blog about data center networking, virtualization, and computing.</p>
<p>Saying that the cloud is becoming mobile is stating the obvious. An interesting study from Visiongain, “Mobile Cloud Computing Industry Outlook Report: 2011-2016”, examines mobile cloud service revenues. A summary of this study can be found <a href="http://www.electronics-eetimes.com/en/mobile-cloud-computing-will-generate-45-billion-dollars-revenue-by-2016.html?cmp_id=7&amp;news_id=222910566&amp;vID=887" target="_blank">here</a>. The numbers are impressive, although mobile cloud security, privacy, feasibility and accessibility remain major concerns. It can be compared to another market study from Juniper Research focusing on LTE revenues, “LTE Revenues Projected to Exceed $265 Billion Globally in 2016”. You can find a summary <a href="http://juniperresearch.com/viewpressrelease.php?pr=279" target="_blank">here</a>.</p>
<p>Good reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/interesting-links-and-documents-to-share-with-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multicore in 2011 (Texas Instruments)</title>
		<link>http://www.multicorepacketprocessing.com/multicore-2011-texas-instruments/</link>
		<comments>http://www.multicorepacketprocessing.com/multicore-2011-texas-instruments/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 16:51:54 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[multicore processors review]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=2008</guid>
		<description><![CDATA[By Eric Carmes &#8211; 6WIND Founder and CEO Please find hereafter an addition to my Multicore in 2011 post about Texas Instruments architecture: Texas Instruments: In April, Texas Instruments announced their new Multicore Software Development Kit (press release), a free, integrated software platform enabling rapid development on TI’s multicore DSPs. In 2011, TI released a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.6wind.com/Eric-Carmes.html"><em>By Eric Carmes &#8211; 6WIND Founder and CEO</em></a></p>
<p>Please find below an addition to my <a href="http://www.multicorepacketprocessing.com/multicore-in-2011/" target="_blank">Multicore in 2011 post</a>, covering the Texas Instruments architecture:</p>
<ul>
<li><strong>Texas Instruments:</strong> In April, Texas Instruments announced their new Multicore Software Development Kit (<a href="http://newscenter.ti.com/Blogs/newsroom/archive/2011/04/25/unleashing-multicore-new-software-from-texas-instruments-gets-developers-one-step-closer-to-tapping-the-full-potential-of-ti-multicore-dsps.aspx" target="_blank">press release</a>), a free, integrated software platform enabling rapid development on TI’s multicore DSPs. In 2011, TI released a total of six new multicore devices based on its innovative KeyStone architecture which combines hardware-based acceleration and world-class programmability to provide full processing capability to <em>every core</em> in a multicore device. More information about this new KeyStone multicore architecture  scaled to support unprecedented levels of capacity for cloud RAN and networked server applications can be found here (<a href="http://newscenter.ti.com/Blogs/newsroom/archive/2011/12/05/ti-s-keystone-multicore-architecture-scaled-to-support-unprecedented-levels-of-capacity-for-cloud-ran-applications-and-networked-server-developers-880179.aspx" target="_blank">press release</a>).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/multicore-2011-texas-instruments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multicore in 2011</title>
		<link>http://www.multicorepacketprocessing.com/multicore-in-2011/</link>
		<comments>http://www.multicorepacketprocessing.com/multicore-in-2011/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 07:31:26 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[multicore processors review]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=1995</guid>
		<description><![CDATA[By Eric Carmes &#8211; 6WIND Founder and CEO As we are now close to the end of the year, it’s an appropriate time to summarize major 2011 news stories about multicore technology. Processor suppliers are listed in alphabetical order. Calxeda: In November, Calxeda launched the ARM-based EnergyCore™ processor targeting low-power servers (Press release). Calxeda was [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.6wind.com/Eric-Carmes.html"><em>By Eric Carmes &#8211; 6WIND Founder and CEO</em></a></p>
<p>As we are now close to the end of the year, it’s an appropriate time to summarize major 2011 news stories about multicore technology. Processor suppliers are listed in alphabetical order.</p>
<ul>
<li><strong>Calxeda:</strong> In November, Calxeda launched the ARM-based EnergyCore™ processor targeting low-power servers (<a href="http://www.calxeda.com/company/newsroom/pressreleases/63-calxeda-launches-energycore-processor">Press release</a>). Calxeda was the first company to announce an ARM-based processor to compete with Intel’s server solutions.</li>
<li><strong>Cavium Networks:</strong> In 2011, Cavium extended the range of OCTEON II processors with the 10 and 32-core versions. In October, Cavium also announced the OCTEON Fusion™ series (CNF71xx and CN72xx) integrating multicore CPUs and DSPs (respectively 4 and 6 in the CNF7130) to address the base station market (<a href="http://www.cavium.com/newsevents_Cavium_OCTEON_Fusion.html">Press release</a>).</li>
<li><strong>Freescale:</strong> In June, Freescale announced the new QorIQ Advanced Multiprocessing Series (<a href="http://media.freescale.com/phoenix.zhtml?c=196520&amp;p=irol-newsArticle&amp;ID=1576370">Press release</a>). This incorporates a new, multithreaded 64-bit Power Architecture® core, 28-nm process technology and up to 24 virtual cores.</li>
<li><strong>Intel:</strong> Intel continues to release enhanced versions of the embedded multicore Intel® Architecture platforms. Intel also released the Data Plane Development Kit, an optimized software library for high performance packet processing (<a href="http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#overview">DPDK page</a>).</li>
<li><strong>LSI Logic:</strong> In February, LSI announced the expansion of the Axxia™ Communication Processor family with the ACP3423, targeted at equipment such as multi-radio base stations and wireless backhaul (<a href="http://www.lsi.com/about/newsroom/Pages/20110214apr.aspx">Press release</a>). The ACP3423 features two PowerPC® 476FP processor cores and a wide array of intelligent micro-coded offload engines.</li>
<li><strong>NetLogic:</strong> Of course the main news was the planned acquisition of the company by Broadcom. On the technical side, NetLogic is now shipping the XLP multithreaded MIPS-based processor. In September, NetLogic also announced the XLP II architecture with the most powerful processor of the series that will have 20 cores and 80 threads (<a href="http://www.netlogicmicro.com/News/pr/2011/11-09-07xlpII.asp">Press release</a>).</li>
<li><strong>Tilera:</strong> Tilera is now shipping the first processor in the 64-bit TILEGx series. The TILEGx8036 implements 36 processor cores. The 100-core processor version (TILEGx8100) is expected to be available in Q1 2012.</li>
</ul>
<p>Please feel free to comment on this list if you think information is missing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/multicore-in-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Assisted Snoop Containment for Packet Processing Applications (Part 3 of 3)</title>
		<link>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-3-of-3/</link>
		<comments>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-3-of-3/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 12:06:43 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Software Architecture]]></category>
		<category><![CDATA[Software Implementation]]></category>
		<category><![CDATA[coherency]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[snoop]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=1926</guid>
		<description><![CDATA[By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor. This is the last post of a series of three. Please find the first post here and the second one here. 7 &#8211; Fixing the problem Using ‘perf’ tool, first we measured the number of snoops per second on control plane core [...]]]></description>
			<content:encoded><![CDATA[<p>By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor.</p>
<p>This is the last post of a series of three. Please find the first post <a href="http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-1-of-3/" target="_blank">here</a> and the second one <a href="http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-2-of-3/" target="_blank">here</a>.</p>
<p><strong>7 &#8211; Fixing the problem</strong></p>
<p>Using the ‘perf’ tool, we first measured the number of snoops per second on a control plane core with no control plane application running and the data plane paused. It was almost ‘0’. Next we measured the number of snoops per second with the data plane running. This number represents the snoop transactions arriving at the control plane due to data plane activity. In an ideal partitioning case, it should be close to ‘0’.</p>
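<p>The measurement above boils down to sampling a snoop counter twice and dividing the difference by the elapsed time. A minimal Python sketch, assuming the counter values have already been read by some means (the post used ‘perf’; the function name and the sample numbers below are hypothetical and only illustrate the arithmetic):</p>

```python
def snoops_per_second(count_start, count_end, interval_s):
    """Return the average snoop rate between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if count_end < count_start:
        raise ValueError("counter went backwards (wrapped or reset)")
    return (count_end - count_start) / interval_s


# Hypothetical samples taken 10 seconds apart on a control plane core.
baseline = snoops_per_second(1_000, 1_020, 10.0)      # data plane paused
loaded = snoops_per_second(1_020, 5_001_020, 10.0)    # data plane running

# The difference is the snoop traffic attributable to data plane activity.
print(f"baseline: {baseline:.1f}/s, loaded: {loaded:.1f}/s, "
      f"due to data plane: {loaded - baseline:.1f}/s")
```

<p>Comparing the loaded rate against the idle baseline, rather than looking at the loaded rate alone, is what isolates the contribution of the data plane.</p>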
<p>The system under test was already running control and data plane applications in their own respective partitions, representing separate coherency domains. Originally, all the Ethernet ports in the system shared a common set of buffer pools from which they picked buffers to receive frames. A direct implication of sharing pools across all the ports was that the memory used to seed the buffer pools had to be declared ‘coherent’ across both control and data plane partitions.</p>
<p>Since each partition processed frames from its own Ethernet ports exclusively, there was no real need to use shared memory for buffer pools. We assigned a different set of buffer pools to the ports owned by each partition. These pools were seeded with partition-private memory buffers. Thus, for any frame received via the 10G ports (which were owned by the data plane), snoops did not reach the control plane cores. After this change, we measured the snoops per second at the control plane core again. The rate came down drastically from what was observed originally, but it was still not close to ‘0’.</p>
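<p>The effect of splitting the buffer pools can be illustrated with a toy model. The Python sketch below is not the authors’ code: the port names, frame counts and the one-snoop-per-frame rule are assumptions chosen only to show why pools seeded with shared memory expose the control plane to data plane traffic.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferPool:
    name: str
    # Partitions across which the pool's memory is declared coherent.
    coherent_across: frozenset


def snoops_reaching(partition, frames_per_port, port_pool):
    """Count frames whose buffer writes are snooped by `partition`.

    frames_per_port maps a port name to the number of frames received;
    port_pool maps a port name to the BufferPool it draws buffers from.
    Toy assumption: one snoop per received frame per coherent partition.
    """
    return sum(
        frames
        for port, frames in frames_per_port.items()
        if partition in port_pool[port].coherent_across
    )


# Hypothetical traffic: two busy 10G data plane ports, one quiet control port.
frames = {"10g0": 1_000_000, "10g1": 1_000_000, "1g0": 500}

# Original setup: every port shares one pool, coherent across both partitions.
shared = BufferPool("shared", frozenset({"control", "data"}))
before = snoops_reaching("control", frames, {p: shared for p in frames})

# After the change: each partition's ports use pools seeded with private memory.
data_pool = BufferPool("data-private", frozenset({"data"}))
ctrl_pool = BufferPool("control-private", frozenset({"control"}))
after = snoops_reaching(
    "control", frames, {"10g0": data_pool, "10g1": data_pool, "1g0": ctrl_pool}
)

print(before, after)
```

<p>In this model the control plane only sees snoops for frames arriving on its own port once the pools are separated, which matches the drastic drop described above.</p>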
<p>The task was now to find the source of the remaining snoop transactions on the control plane. By reviewing the system configuration, we found that the scratchpad memory assigned to the hardware accelerators QMAN and BMAN was declared coherent. As described previously, this is not required. We changed the attribute of the scratchpad memory to non-coherent and measured the snoop rate again. This time it was close to ‘0’.</p>
<p>Finally, we measured the performance of the memory copy bandwidth application and the SIP stack again on the control plane while the data plane was running at its full rate. This time, the performance of the control plane remained unaffected irrespective of whether the data plane was running or paused.</p>
<p><strong>8 &#8211; Software design recommendations</strong></p>
<p>Since the data plane and control plane have extremely low data sharing requirements, it should be possible to run them in different coherency domains so as to restrict the snoops generated to their respective domains.</p>
<p><strong><em>Device private memory should be marked non-coherent</em></strong></p>
<p>The software running on the cores is supposed to never access the address range reserved as scratchpad. Hence, we are certain that none of the addresses in this range would ever be present in any core’s local cache. This obviates the need to declare the scratchpad address range as coherent.</p>
<p><em><strong>Separate the buffers for I/O ports private to control plane and data plane</strong> </em></p>
<p>For each I/O port private to the control plane or the data plane, a different set of buffer pools must be used. Care should be taken not to seed these buffer pools with memory that is shared across the control and data planes.</p>
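<p>Both recommendations can be checked mechanically against a description of the system configuration. The Python sketch below is one hypothetical way to do that: the region names and the dictionary layout are illustrative assumptions and do not correspond to any real Freescale configuration format.</p>

```python
def audit_regions(regions):
    """Return warnings for regions whose coherency setting looks wasteful.

    Each region is a dict with:
      name      - label for the region
      users     - set of partitions (or devices) that access it
      coherent  - set of partitions across which it is declared coherent
    """
    warnings = []
    for region in regions:
        extra = region["coherent"] - region["users"]
        if extra:
            warnings.append(
                f"{region['name']}: coherent across {sorted(extra)} "
                "but never accessed there"
            )
    return warnings


config = [
    # Accelerator scratchpad: touched only by the device itself.
    {"name": "qman-scratchpad", "users": {"qman"},
     "coherent": {"control", "data"}},
    # Data plane buffer pool seeded with partition-private memory.
    {"name": "data-pool", "users": {"data"}, "coherent": {"data"}},
    # Control plane pool wrongly seeded with shared memory.
    {"name": "control-pool", "users": {"control"},
     "coherent": {"control", "data"}},
]

for warning in audit_regions(config):
    print(warning)
```

<p>A check of this kind flags the scratchpad and the wrongly seeded pool while leaving the correctly partitioned data plane pool alone.</p>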
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-3-of-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Assisted Snoop Containment for Packet Processing Applications (Part 2 of 3)</title>
		<link>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-2-of-3/</link>
		<comments>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-2-of-3/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 06:54:16 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Software Architecture]]></category>
		<category><![CDATA[Software Implementation]]></category>
		<category><![CDATA[coherency]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[snoop]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=1920</guid>
		<description><![CDATA[By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor. This is the second post of a series of three. You can find the first post here. 4 &#8211; Generation of snoops due to packet processing When the ingress I/O controller (e.g. Ethernet) copies the frame from its DMA internal FIFO to [...]]]></description>
			<content:encoded><![CDATA[<p>By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor.</p>
<p>This is the second post of a series of three. You can find the first post <a href="http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-1-of-3/">here</a>.</p>
<p><strong>4 &#8211; Generation of snoops due to packet processing</strong></p>
<p>When the ingress I/O controller (e.g. Ethernet) copies a frame from its internal DMA FIFO to memory, snoops are generated to invalidate copies in any of the cores’ local caches. Snoop transactions are also generated when the frame headers are brought into the local cache of a core running the data plane (either on a demand miss or by stashing). This causes all the other cores in the same coherency domain to check whether they have a copy of the address being accessed in their local caches. If any of them has a copy, it is invalidated (since it must be a stale copy).</p>
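<p>The mechanism can be pictured with a very small model. The Python sketch below is a simplification under stated assumptions: caches are modelled as plain sets of addresses, the class and function names are hypothetical, and real hardware tracks cache lines and states rather than single addresses.</p>

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.cache = set()      # addresses currently held in the local cache
        self.snoops_seen = 0    # snoop lookups this core had to service

    def snoop_invalidate(self, address):
        """Check the local cache for `address` and drop any stale copy."""
        self.snoops_seen += 1
        self.cache.discard(address)


def dma_write(address, coherency_domain):
    """Model an I/O controller writing a frame buffer to memory.

    Every core in the coherency domain is snooped, whether or not it
    actually holds a copy of the address.
    """
    for core in coherency_domain:
        core.snoop_invalidate(address)


data_core = Core("data0")
ctrl_core = Core("ctrl0")
data_core.cache.add(0x1000)   # stale copy of a recycled frame buffer

# Both cores share one coherency domain, so both pay for the lookup.
dma_write(0x1000, [data_core, ctrl_core])

print(data_core.cache, data_core.snoops_seen, ctrl_core.snoops_seen)
```

<p>The control plane core never held the address, yet it still had to service the snoop. That wasted lookup is the cost the rest of this series sets out to remove.</p>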
<p>The snoop requests to control plane cores generated due to data plane activity reduce number of productive cycles in which a control plane cores can complete instructions. This results in lower IPC (Instructions per cycle) count.</p>
<p><strong>5 &#8211; Source of Snoops</strong></p>
<p><strong><em>Device Generated</em></strong></p>
<p>Embedded multicore networking processors often pack many I/O devices (e.g. Ethernet, RapidIO) and hardware offload accelerators along with the cores. These accelerators execute common functions required for efficient packet processing. For example, the Freescale QorIQ P4080 platform has QMAN (for queue management), SEC (for cryptographic processing) and BMAN (for buffer pool management), among others. The accelerators often require system memory as a scratchpad to store their own private housekeeping data structures. If the scratchpad memory is declared coherent, any access to it by the accelerator causes snoop transactions on the system bus.</p>
<p>When an I/O device or an accelerator reads frame contents (e.g. frame transmission by the Ethernet controller or encryption by the crypto block), snoop transactions are generated, since any core’s local cache might hold the most recently modified copy of the frame. Similarly, when a frame is written by an accelerator (e.g. IPsec encapsulation by crypto hardware), snoops are generated for the addresses being written so that they are invalidated if present in any core’s local cache.</p>
<p><strong><em>Software (core) generated</em></strong></p>
<p>An access to an address by software running on a core generates snoop transactions if the address falls in a page marked coherent. On a multicore SMP system, usually the whole of memory is declared coherent. In many cases, this becomes the source of many unnecessary snoops into the cores. For example, if a certain piece of data is, by design, always accessed from one fixed core, then coherency maintenance is not required for its address. In a multicore packet processing system, accessing frame headers may generate snoops depending on whether the address being accessed is in the exclusive state in the core’s local cache.</p>
<p><strong>6 &#8211; Case Study</strong></p>
<p>To get an idea of the impact of snoops (due to data plane activity) on control plane performance, we set up an experiment on Freescale’s multicore QorIQ P4080 platform, which has 8 CPUs. We used Freescale’s embedded hypervisor software to set up two static partitions in the processor. The first partition (1 CPU) ran a Linux-based control plane and the second partition (7 CPUs) ran a data plane based on Freescale’s LWE (Light Weight Executive).</p>
<p>The data plane was assigned two 10G Ethernet ports and the control plane was assigned a single 1G Ethernet port.</p>
<p>The data plane ran a bare-metal, run-to-completion packet-reflecting application. It received IPv4 packets from the two 10G ports on the processor. The Ethernet frame size used was 64 bytes, and both 10G ports were driven at line rate. The data plane reflected every incoming frame back through the same Ethernet port on which it was received, after swapping the source and destination IP addresses and MAC addresses.</p>
<p>We tried two different applications on the control plane, described below. We observed the performance of each of them both with the data plane application paused and with it running. The number of snoops reaching the control plane core was counted using the open source tool ‘perf’, which uses the core’s performance monitor hardware to count snoop request events.</p>
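<p>The reflection step described above can be sketched as follows. This is only an illustration in Python, not the LWE application used in the experiment: the function name reflect and the assumption of an untagged Ethernet II frame carrying IPv4 (no VLAN tag) are ours. Because source and destination addresses are only exchanged, the IPv4 header checksum does not need to be recomputed.</p>

```python
ETH_HEADER_LEN = 14                 # destination MAC, source MAC, EtherType
IPV4_SRC = ETH_HEADER_LEN + 12      # offset of the IPv4 source address
IPV4_DST = ETH_HEADER_LEN + 16      # offset of the IPv4 destination address


def reflect(frame):
    """Return a copy of an untagged Ethernet/IPv4 frame with addresses swapped.

    Hypothetical helper for illustration; assumes EtherType 0x0800 directly
    after the two MAC addresses (no VLAN tag).
    """
    if IPV4_DST + 4 > len(frame) or frame[12:14] != b"\x08\x00":
        raise ValueError("not an untagged Ethernet/IPv4 frame")
    out = bytearray(frame)
    # Swap destination MAC (bytes 0-5) with source MAC (bytes 6-11).
    out[0:6], out[6:12] = frame[6:12], frame[0:6]
    # Swap IPv4 source and destination addresses (4 bytes each).
    out[IPV4_SRC:IPV4_SRC + 4] = frame[IPV4_DST:IPV4_DST + 4]
    out[IPV4_DST:IPV4_DST + 4] = frame[IPV4_SRC:IPV4_SRC + 4]
    return bytes(out)
```

<p>Reflecting a frame twice returns the original frame, which is a convenient sanity check for such a routine.</p>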
<p><strong><em>Memory copy bandwidth test</em></strong></p>
<p>Our first control plane application was the memory copy bandwidth test (bw_mem) from the Lmbench benchmark suite. We used it to copy very large buffers (1GB). The performance metric collected was the amount of data that could be copied per second.</p>
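<p>To make the metric concrete, the sketch below times a buffer copy and reports bytes copied per second. It is not bw_mem and will not reproduce its numbers; the function name copy_bandwidth, the buffer size and the repeat count are assumptions chosen for illustration.</p>

```python
import time


def copy_bandwidth(size_bytes, repeats=3):
    """Return the best observed copy rate in bytes per second.

    Illustrative only: a real benchmark such as bw_mem controls for
    cache effects and timer overhead far more carefully.
    """
    source = bytearray(size_bytes)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        destination = bytes(source)          # one full copy of the buffer
        elapsed = time.perf_counter() - start
        assert len(destination) == size_bytes
        if elapsed > 0:
            best = max(best, size_bytes / elapsed)
    return best


if __name__ == "__main__":
    size = 64 * 1024 * 1024                  # 64 MiB, smaller than the 1GB used above
    print("copy rate: %.1f MB/s" % (copy_bandwidth(size) / 1e6))
```

<p>Running such a measurement once with the data plane idle and once with it active gives the two figures whose ratio is reported as the slowdown.</p>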
<p><strong><em>SIP stack</em></strong></p>
<p>The second application we tried was a real-world application. We used an open source SIP (Session Initiation Protocol) implementation (the PJSIP software) on the control plane. The performance of the SIP stack was measured by running PJSIP in both client and server mode on the same control plane partition. The server and client were connected through the loopback interface. We measured the time taken by the SIP client to start and terminate 20000 calls.</p>
<p>We found that both of the above control plane applications experienced a slowdown when the data plane and control plane ran simultaneously, compared to when the data plane was paused.</p>
<p><span style="text-decoration: underline;">The memory copy test experienced a slowdown of about 20%. For the SIP stack, it was about 10%.</span></p>
<p><em>Note that we used a minimalist data plane application here. If the data plane application used a hardware offload accelerator, such as a security block for frame encryption and decryption, in a pipelined fashion, then each frame would be received twice at the data plane cores. That would double the number of snoops and hence cause an even larger performance degradation at the control plane cores.</em></p>
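<p>As a rough sense of scale, the sketch below estimates how many frame receive events per second the data plane cores see in this setup. It assumes standard Ethernet framing overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap per frame); the function names and the passes parameter are ours. It counts frame arrivals only, not snoop transactions, which depend on the cache line size and on the hardware.</p>

```python
PREAMBLE_AND_GAP_BYTES = 20   # 8-byte preamble/SFD + 12-byte inter-frame gap


def frames_per_second(link_bps, frame_bytes):
    """Line-rate frame count for one Ethernet port."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def receive_events_per_second(ports, link_bps, frame_bytes, passes=1):
    """Frame arrivals at the data plane cores.

    passes=2 models the pipelined case from the note above, where each
    frame returns to the cores once more after the accelerator.
    """
    return ports * frames_per_second(link_bps, frame_bytes) * passes


if __name__ == "__main__":
    one_port = frames_per_second(10e9, 64)
    print("one 10G port, 64-byte frames: %.2f Mfps" % (one_port / 1e6))
    print("two ports, one pass:   %.2f Mfps"
          % (receive_events_per_second(2, 10e9, 64) / 1e6))
    print("two ports, two passes: %.2f Mfps"
          % (receive_events_per_second(2, 10e9, 64, passes=2) / 1e6))
```

<p>With 64-byte frames, one 10G port carries about 14.88 million frames per second, so two ports deliver about 29.76 million, and a second pass through the cores doubles that figure.</p>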
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-2-of-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Challenges in Flexibility and Scalability for Data Center Networking Equipment</title>
		<link>http://www.multicorepacketprocessing.com/challenges-in-flexibility-and-scalability-for-data-center-networking-equipment/</link>
		<comments>http://www.multicorepacketprocessing.com/challenges-in-flexibility-and-scalability-for-data-center-networking-equipment/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 06:48:21 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Marketing Forum]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[cloud expo]]></category>
		<category><![CDATA[data center]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=1938</guid>
		<description><![CDATA[By Charlie Ashton, VP of Marketing &#8211; 6WIND During a short visit to the Cloud Expo conference in Santa Clara last week, I had conversations with both hardware and software companies about challenges associated with data center networking equipment. A couple of Network Equipment Providers (NEPs) talked about the need to quickly bring-to-market low-cost appliances [...]]]></description>
			<content:encoded><![CDATA[<p>By <a href="http://www.6wind.com/Charlie-Ashton.html">Charlie Ashton</a>, VP of Marketing &#8211; 6WIND</p>
<p>During a short visit to the <a href="http://cloudcomputingexpo.com/">Cloud Expo</a> conference in Santa Clara last week, I had conversations with both hardware and software companies about challenges associated with data center networking equipment.</p>
<p>A couple of Network Equipment Providers (NEPs) talked about the need to quickly bring to market low-cost appliances optimized for use in virtualized environments. More than just top-of-rack switches, these appliances need to perform a range of networking functions (switching, routing, load balancing, firewall, etc.). At the same time, they need to be managed remotely (via any one of a large number of data center configuration software packages) and incorporate the flexibility to dynamically reallocate processor resources between networking functions as well as between the control and data planes. Under ever-increasing cost pressure from data center operators, these NEPs need a flexible networking stack that maximizes the performance of their hardware platform while giving them the flexibility to select whichever processor architecture best matches their target price points.</p>
<p>While that represents a traditional application for 6WIND’s packet processing software, discussions with suppliers of data center hardware and software highlighted other related challenges. There’s a clear trend to deliver advanced packet processing functions (such as virtual routing, security, QoS etc) using dedicated packet processing blades based on commodity hardware platforms, rather than using high-priced networking equipment. These functions need to be provided in a fully-virtualized environment and must offer a high degree of scalability.</p>
<p>There’s clearly a wide range of networking challenges within the data center. What do you see as the most critical issues that need to be solved to ensure that end users enjoy the most cost-effective experience as they migrate their applications either to public or private clouds?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/challenges-in-flexibility-and-scalability-for-data-center-networking-equipment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Assisted Snoop Containment for Packet Processing Applications (Part 1 of 3)</title>
		<link>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-1-of-3/</link>
		<comments>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-1-of-3/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 06:53:09 +0000</pubDate>
		<dc:creator>Eric Carmes</dc:creator>
				<category><![CDATA[Software Architecture]]></category>
		<category><![CDATA[Software Implementation]]></category>
		<category><![CDATA[coherency]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[snoop]]></category>

		<guid isPermaLink="false">http://www.multicorepacketprocessing.com/?p=1912</guid>
		<description><![CDATA[By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor. 1 &#8211; Introduction In case of multicore systems the cost of hardware enforced coherency increases with the increase in number of cores. This can be attributed to the requirement of a snooping based coherent system, where each core must inspect the [...]]]></description>
			<content:encoded><![CDATA[<p>By Vakul Garg (vakul@freescale.com) and Varun Sethi (varun.sethi@freescale.com), Senior Software Engineer, Freescale Semiconductor.</p>
<p><strong>1 &#8211; Introduction</strong></p>
<p>In multicore systems, the cost of hardware-enforced coherency increases with the number of cores. This can be attributed to the requirement of a snooping-based coherent system that each core inspect the memory traffic of every other core. Indeed, as each of the ‘n’ nodes in a multicore system must process the snoop requests of all other (n-1) nodes, the number of coherence actions scales as O(n²). The number of coherence actions affects the overall performance of the processor because these coherence requests interfere with a core’s access to its own cache.</p>
<p>The cost associated with hardware-enforced coherency is not well understood and is thus neglected by most programmers. While configuring system coherency, programmers unknowingly make memory coherent among all hardware blocks and cores across the system. As discussed above, ignoring the cost of coherency on multicore systems can result in low CPU throughput due to unnecessary snoop traffic.</p>
<p>In this paper we present guidelines to mitigate the performance challenges arising from a mismatch between the hardware coherency configuration and actual application requirements.</p>
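<p>The scaling argument can be checked with a few lines of arithmetic. The sketch below assumes a simple broadcast-snooping model in which every coherent request from one core is looked up by each of the other cores; the function name and the figure of 1000 requests per core are hypothetical.</p>

```python
def snoop_lookups(cores, requests_per_core):
    """Total cache lookups triggered by snoops in one interval.

    Simplified model for illustration: each core issues the same number
    of coherent requests, and each request is checked by all other cores.
    """
    return cores * (cores - 1) * requests_per_core


if __name__ == "__main__":
    # Hypothetical load: each core issues 1000 coherent requests.
    for cores in (2, 4, 8):
        print(cores, "cores:", snoop_lookups(cores, 1000), "snoop lookups")
```

<p>Going from 2 to 8 cores multiplies the core count by 4 but the snoop lookups by 28 (2000 versus 56000 in this model), which is the quadratic growth described above.</p>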
<p><strong>2 &#8211; Concept of Coherency</strong></p>
<p>The effectiveness of multicore systems relies on parallel software achieving continuous exponential performance gains. Most parallel software in the commercial market relies on the shared-memory programming model, in which all processors access the same physical address space. Although processors logically access the same memory, on-chip cache hierarchies are crucial to achieving fast performance for the majority of memory references made by processors. Thus a key problem of shared-memory multiprocessors is providing a consistent view of memory across the various cache hierarchies. This cache coherence problem is a critical correctness and performance-sensitive design point for supporting the shared-memory model. The cache coherence mechanisms not only govern communication in a shared-memory multiprocessor, but also typically determine how the memory system transfers data between processors, caches, and memory. Assuming the shared-memory programming model remains prominent, future workloads will depend upon the performance of the cache-coherent memory system.</p>
<p>A widely-adopted approach to cache coherence is snooping on a bus. A bus connects all components to an electrical, or logical, set of wires. A bus provides key ordering and atomicity properties that enable straightforward coherence operations. First, all endpoints on a bus observe transmitted messages in the same total order. Second, buses provide atomicity such that only one message can appear on the bus at a time and that all endpoints observe the message. Third, buses implement shared lines that allow any endpoint to manipulate a signal or condition that is globally visible to all other endpoints during a bus transaction. Shared lines facilitate both bus arbitration and cache coherence operations.</p>
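<p>A toy model may help make the bus behaviour concrete. The sketch below is a deliberately simplified write-invalidate scheme with only two line states, shared (S) and modified (M); the class names Cache and Bus are ours, and data transfer and write-back are not modelled. It is not the protocol of any particular processor. The bus handles one transaction at a time, so every cache observes the same total order.</p>

```python
class Cache:
    """One core's local cache: address mapped to state 'S' or 'M'."""

    def __init__(self):
        self.lines = {}
        self.snoops_seen = 0

    def snoop(self, address, is_write):
        """React to a bus transaction issued by another core."""
        self.snoops_seen += 1
        if address not in self.lines:
            return
        if is_write:
            del self.lines[address]       # our copy is now stale: invalidate
        else:
            self.lines[address] = "S"     # another reader: downgrade to shared


class Bus:
    """Serializes transactions and broadcasts each one to all other caches."""

    def __init__(self, caches):
        self.caches = caches

    def access(self, core, address, is_write):
        requester = self.caches[core]
        state = requester.lines.get(address)
        if state == "M" or (state == "S" and not is_write):
            return                        # hit with sufficient permission
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(address, is_write)
        requester.lines[address] = "M" if is_write else "S"


if __name__ == "__main__":
    caches = [Cache() for _ in range(3)]
    bus = Bus(caches)
    bus.access(0, 0x100, is_write=False)  # core 0 reads: miss, broadcast
    bus.access(1, 0x100, is_write=True)   # core 1 writes: core 0 invalidated
    bus.access(1, 0x100, is_write=True)   # hit in M: no bus traffic
    print([c.snoops_seen for c in caches])
```

<p>Note that the third core never touches the address, yet it still has to process both broadcast transactions. This is the snoop overhead that the rest of this series tries to contain.</p>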
<p><strong>3 &#8211; Nature of Packet Processing Applications</strong></p>
<p>Packet processing applications such as IP routers and Layer 2 switches are typically split into a control plane and a data plane. The control plane implements the algorithmically intensive part of the application (e.g. route calculation). It typically does stateful processing and executes long state machines per input event. The number of frames processed by the control plane is a very small fraction of the number of frames processed by the data plane.</p>
<p>The data plane processes the bulk of the incoming frames. It typically operates on the frame headers, and the processing involves header parsing, table lookups, header modification, encapsulation, decapsulation, etc. Accessing frame headers requires the frame to be brought into the core’s local cache. This happens either on a cache miss or through a stashing operation by the I/O devices, which can pre-position frame headers inside the core’s local cache.</p>
<p>The frames processed by the data plane typically do not need to be accessed by the control plane. The two planes work largely independently of each other and share very little data. The control plane occasionally communicates with the data plane through special proprietary control events to manage the tables and connections used by the data plane. These events are passed using well-known inter-process communication methods such as message queues.</p>
<p>In some cases, the data plane needs to forward a frame to the control plane and vice versa, but the overall count of such frames is extremely small.</p>
<p>On a multicore processor, it is common to reserve two non-overlapping sets of cores, one for the control plane and one for the data plane. The number of cores reserved for each plane depends on its compute horsepower requirement. A larger number of cores for the data plane means greater frame processing capability.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.multicorepacketprocessing.com/software-assisted-snoop-containment-for-packet-processing-applications-part-1-of-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

