<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>LeafLabs Garden &#187; Topic: DMA memcpy  mem2mem</title>
		<link>http://forums.leaflabs.com/topic.php?id=1849</link>
		<description>A place to share, learn, and grow...</description>
		<language>en-US</language>
		<pubDate>Fri, 22 Jan 2016 00:08:45 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://forums.leaflabs.com/search.php</link>
		</textInput>
		<atom:link href="http://forums.leaflabs.com/rss.php?topic=1849" rel="self" type="application/rss+xml" />

		<item>
			<title>Vasudev on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849&amp;page=2#post-105411</link>
			<pubDate>Tue, 27 May 2014 10:52:05 +0000</pubDate>
			<dc:creator>Vasudev</dc:creator>
			<guid isPermaLink="false">105411@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Thanks Manitou,&#60;br /&#62;
Yes, but those functions are for embedded systems. I was looking for a simple dma_memcpy implementation for x86 systems that can be called from userspace. I searched a lot, but because of my lack of device driver experience I was unable to get a DMA memcpy working. If you have done a DMA memcpy on x86 systems, that would be very helpful.&#60;br /&#62;
Thanks
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849&amp;page=2#post-105401</link>
			<pubDate>Fri, 23 May 2014 16:25:55 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">105401@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;The first entry in this thread has the functions I used for DMA memcpy()...&#60;/p&#62;
&#60;p&#62;Performance comparisons with teensy and Due are here&#60;br /&#62;
  &#60;a href=&#34;https://github.com/manitou48/DUEZoo/blob/master/mem2mem.txt&#34; rel=&#34;nofollow&#34;&#62;https://github.com/manitou48/DUEZoo/blob/master/mem2mem.txt&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Vasudev on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849&amp;page=2#post-105398</link>
			<pubDate>Thu, 22 May 2014 05:04:58 +0000</pubDate>
			<dc:creator>Vasudev</dc:creator>
			<guid isPermaLink="false">105398@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Hi,&#60;br /&#62;
Can you tell me how to implement dma_memcpy?&#60;br /&#62;
Many thanks in advance.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849&amp;page=2#post-11319</link>
			<pubDate>Tue, 19 Jun 2012 07:57:36 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">11319@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;manitou - this is a fascinating set of experiments you've come up with. Thank you for sharing.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;
baseline: 1024 32-bit word DMA memcpy: 96us (microseconds) lib memcpy: 58 us (total 154us)&#60;/p&#62;
&#60;p&#62;dueling memcpy:&#60;br /&#62;
gather micros() timestamp in DMA isr. Start DMA memcpy followed by lib memcpy&#60;br /&#62;
... DMA memcpy: 119us lib memcpy: 64us
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;So lib memcpy is about 6us (about 10%) slower with DMA running, and DMA memcpy is about 23us (about 24%) slower with lib memcpy running for 64us of its run time.&#60;br /&#62;
Have you tried running two lib memcpy's, one straight after the other, so that DMA memcpy overlaps with lib memcpy for all/most of its run time?&#60;/p&#62;
&#60;p&#62;Are DMA memcpy and lib memcpy reading or writing the same memory?&#60;/p&#62;
&#60;p&#62;Is it feasible to run two DMA memcpy's on different DMA controllers concurrently?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849&amp;page=2#post-11202</link>
			<pubDate>Mon, 11 Jun 2012 07:29:16 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11202@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;&#38;gt; When do you start counting for the DMA memcpy? &#60;/p&#62;
&#60;p&#62;Starting the clock right before dma_enable knocks off a few microseconds.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>blackswords on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11200</link>
			<pubDate>Mon, 11 Jun 2012 02:27:40 +0000</pubDate>
			<dc:creator>blackswords</dc:creator>
			<guid isPermaLink="false">11200@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;manitou &#38;gt; When do you start counting for the DMA memcpy? If it's before memcpy32() you lose some time to initialize the DMA so you don't only measure the copy time but also the initialization time. It would be better to use a DMA init function then a start function with just this inside of it&#60;/p&#62;
&#60;p&#62;dma_enable(DMAn, DMA_CHn);                    //enable it..&#60;br /&#62;
while(!DMADONE);&#60;br /&#62;
dma_disable(DMAn, DMA_CHn);
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11197</link>
			<pubDate>Sun, 10 Jun 2012 16:44:22 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11197@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;update&#60;/p&#62;
&#60;p&#62;&#38;gt;DMA memcpy: 119us lib memcpy: 64us (total 183us)&#60;/p&#62;
&#60;p&#62;Actually, the elapsed time for the concurrent copies was 119us (551 megabits/sec), which is faster than the serial baseline time of 154us, though two serial lib memcpy's would take 116us.
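&#60;/p&#62;
&#60;p&#62;A minimal sketch of the arithmetic behind the megabits/sec figures in this thread, assuming throughput is simply bits moved divided by elapsed microseconds (bits per microsecond equals megabits per second). The helper names are hypothetical and not part of libmaple.&#60;/p&#62;
<![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libmaple function.
 * Bits per microsecond is numerically equal to megabits per second. */
double megabits_per_sec(uint32_t words, uint32_t bytes_per_word,
                        uint32_t elapsed_us)
{
    assert(elapsed_us > 0);  /* avoid dividing by zero */
    return (double)words * (double)bytes_per_word * 8.0 / (double)elapsed_us;
}

/* Same figure expressed in megabytes per second. */
double megabytes_per_sec(uint32_t words, uint32_t bytes_per_word,
                         uint32_t elapsed_us)
{
    return megabits_per_sec(words, bytes_per_word, elapsed_us) / 8.0;
}
```
]]>
&#60;p&#62;For the concurrent run, megabits_per_sec(2 * 1024, 4, 119) comes to roughly 551, which matches the figure quoted for the two overlapping copies.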
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11195</link>
			<pubDate>Sun, 10 Jun 2012 10:29:24 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11195@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;&#38;gt; running the DMA memcpy at the same time as ordinary memcpy&#60;/p&#62;
&#60;p&#62;baseline: 1024 32-bit word DMA memcpy:  96us (microseconds)   lib memcpy:  58 us  (total 154us)&#60;/p&#62;
&#60;p&#62;dueling memcpy:&#60;br /&#62;
gather micros() timestamp in DMA isr.  Start DMA memcpy followed by lib memcpy&#60;br /&#62;
...  DMA memcpy: 119us    lib memcpy: 64us  (total 183us)&#60;/p&#62;
&#60;p&#62;Though DMA memcpy started first, it completed after the lib memcpy.&#60;br /&#62;
I used 4 1024 32-bit word vectors.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11186</link>
			<pubDate>Sat, 09 Jun 2012 16:27:24 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">11186@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;manitou - interesting. Have you tried running the DMA memcpy at the same time as ordinary memcpy? Do they slow each other down?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11184</link>
			<pubDate>Sat, 09 Jun 2012 14:12:57 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11184@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;I took a look at the disassembled code for memcpy() in IDE 0.12, which uses newlib 1.17, I think.  If it can, it uses an unrolled loop of 16 ldr.w/str.w word load/store pairs to move 64 bytes per loop iteration.  That memcpy() takes about 59 microseconds to move 1024 32-bit words.  I then built a version using the ARM memcpy.S from newlib 1.20, which uses an unrolled loop of ldrd/strd to move 64 bytes per loop iteration, and that took 53us.
So newlib in IDE 0.12 is doing just fine.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>mbolivar on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11101</link>
			<pubDate>Wed, 06 Jun 2012 03:40:47 +0000</pubDate>
			<dc:creator>mbolivar</dc:creator>
			<guid isPermaLink="false">11101@http://forums.leaflabs.com/</guid>
			<description>&#60;blockquote&#62;&#60;p&#62;
dma.c doesn't have a function for querying DMA_CNDTRx, so i hacked&#60;br /&#62;
volatile uint32_t *dmacnt = (uint32_t *) 0x4002000C;&#60;br /&#62;
... while(*dmacnt);&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;that's what register maps are for. use &#60;code&#62;DMAx_BASE-&#38;gt;CNDTRy&#60;/code&#62;, e.g. &#60;code&#62;DMA1_BASE-&#38;gt;CNDTR1&#60;/code&#62;.
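&#60;/p&#62;
&#60;p&#62;A hedged sketch of that polling loop, written against a plain pointer so it can be tried off-target. On the board the pointer would be the address of &#60;code&#62;DMA1_BASE-&#38;gt;CNDTR1&#60;/code&#62; as above. The function name and the poll limit are assumptions, not libmaple API.&#60;/p&#62;
<![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of libmaple's dma.c.
 * Polls a DMA remaining-count register (CNDTR) until it reads zero,
 * giving up after max_polls reads. Returns 1 if the count reached
 * zero, 0 if the poll limit was hit first. */
int dma_wait_count_zero(const volatile uint32_t *cndtr, uint32_t max_polls)
{
    assert(cndtr != 0);
    while (max_polls-- > 0) {
        if (*cndtr == 0) {
            return 1;  /* transfer finished */
        }
    }
    return 0;  /* still busy after max_polls reads */
}
```
]]>
&#60;p&#62;On target the call might look something like &#60;code&#62;dma_wait_count_zero(&#38;amp;DMA1_BASE-&#38;gt;CNDTR1, 100000)&#60;/code&#62;, with the limit chosen to suit the transfer size.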
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11084</link>
			<pubDate>Tue, 05 Jun 2012 13:29:47 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">11084@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;'my little &#34;b&#34; in mbs is megabits/second' - Mb is mega-bits, MB is mega-bytes. 'mbs'? I guessed wrong :-)&#60;/p&#62;
&#60;p&#62;'No change' - that is surprising.&#60;/p&#62;
&#60;p&#62;348 megabits/sec = 43.5MBytes/sec seems a bit of a weird number, and I'm surprised it isn't faster; assuming 4 byte transfers, that is under 11M reads and 11M writes per second.&#60;/p&#62;
&#60;p&#62;I wonder if the processor and DMA are contending for the bus matrix?&#60;/p&#62;
&#60;p&#62;The bus-architecture diagrams show the two DMA controllers as having separate connections to the bus matrix.&#60;/p&#62;
&#60;p&#62;Have you tried setting up both DMA controllers, and using them simultaneously? &#60;/p&#62;
&#60;p&#62;The other thought is to 'wait for event' to get the processor off the bus, but I've never tried that.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11083</link>
			<pubDate>Tue, 05 Jun 2012 13:13:07 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11083@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Re: newlib&#60;/p&#62;
&#60;p&#62;&#38;gt; There doesn't seem to be a specialised version of memcpy or memset for arm ...&#60;/p&#62;
&#60;p&#62;newlib-1.20 at Red Hat does have an ARM assembler version (memcpy.S), which uses LDRD/STRD given suitable sizes/alignment.&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;1:
        .irp    offset, #0, #8, #16, #24, #32, #40, #48, #56
        ldrd    r4, r5, [r1, \offset]
        strd    r4, r5, [r0, \offset]
        .endr

        add     r0, r0, #64
        add     r1, r1, #64
        subs    r2, r2, #64
        bge     1b&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;that loop runs close to 940 megabits/sec at 72MHz, if I counted cycles correctly...
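&#60;/p&#62;
&#60;p&#62;A rough sketch of how a cycle count turns into a throughput figure like the one above. The function name is hypothetical, and the 39 cycles per iteration used in the note below is back-derived from the quoted 940 megabits/sec, not a measured or documented count.&#60;/p&#62;
<![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: throughput of a copy loop in megabits/second,
 * given bytes moved per iteration, CPU cycles per iteration, and the
 * core clock in MHz (cycles per microsecond). */
double loop_megabits_per_sec(uint32_t bytes_per_iter,
                             uint32_t cycles_per_iter,
                             uint32_t clock_mhz)
{
    assert(cycles_per_iter > 0);  /* avoid dividing by zero */
    return (double)bytes_per_iter * 8.0 * (double)clock_mhz
           / (double)cycles_per_iter;
}
```
]]>
&#60;p&#62;At 72MHz, 64 bytes in an assumed 39 cycles gives about 945 megabits/sec, while the ideal 4 bytes per cycle gives 2304 megabits/sec.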
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11079</link>
			<pubDate>Tue, 05 Jun 2012 08:09:48 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11079@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;&#38;gt;while(!DMADONE); might have consumed significant memory bandwidth, and caused DMA to be slower&#60;/p&#62;
&#60;p&#62;dma.c doesn't have a function for querying DMA_CNDTRx, so i hacked&#60;br /&#62;
  volatile uint32_t *dmacnt = (uint32_t *) 0x4002000C;&#60;br /&#62;
...  while(*dmacnt);&#60;/p&#62;
&#60;p&#62;No change: DMAs of 1000 4-byte words still take 93us  (348 megabits/sec)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>manitou on "DMA memcpy  mem2mem"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1849#post-11078</link>
			<pubDate>Tue, 05 Jun 2012 05:59:56 +0000</pubDate>
			<dc:creator>manitou</dc:creator>
			<guid isPermaLink="false">11078@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;&#38;gt;Assuming 72MHz, and 4 bytes per clock cycle, would give a maximum of 288MBytes/second of bandwidth. &#60;/p&#62;
&#60;p&#62; my little &#34;b&#34; in mbs is megabits/second, so 288MBs = 2304mbs&#60;/p&#62;
&#60;p&#62;&#38;gt;while(!DMADONE); might have consumed significant memory bandwidth, and caused DMA to be slower than it should be. &#60;/p&#62;
&#60;p&#62;  good thought. i'll check it out&#60;/p&#62;
&#60;p&#62;&#38;gt; Maple 'standard C library' is newlib&#60;/p&#62;
&#60;p&#62;  Linux kernels often have good hard-coded memcpy/memset.  I think the fastest use LDM/STM. Here is a discussion of memcpy on Cortex A8:&#60;/p&#62;
&#60;p&#62; &#60;a href=&#34;http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html&#34; rel=&#34;nofollow&#34;&#62;http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;thanks for insights
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
