<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>LeafLabs Garden &#187; Topic: Maple speed</title>
		<link>http://forums.leaflabs.com/topic.php?id=895</link>
		<description>A place to share, learn, and grow...</description>
		<language>en-US</language>
		<pubDate>Fri, 22 Jan 2016 00:25:28 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://forums.leaflabs.com/search.php</link>
		</textInput>
		<atom:link href="http://forums.leaflabs.com/rss.php?topic=895" rel="self" type="application/rss+xml" />

		<item>
			<title>gbulmer on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5508</link>
			<pubDate>Sat, 02 Jul 2011 19:12:10 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">5508@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;joe_c -&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;My quick thumbnail test was to flip an I/O bit and measure the resultant clock frequency, testing the processor and environment. It is a flawed test to be sure.&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;If that's what is important to you folks, it's a good test!&#60;/p&#62;
&#60;p&#62;The main flaw is using the wirish library.&#60;br /&#62;
Wirish mirrors the Wiring/Arduino API, where simplicity and ease of use are the primary requirements, and performance suffers.&#60;/p&#62;
&#60;p&#62;Maple also has the lower-level libmaple library. It often uses &#60;code&#62;inline&#60;/code&#62; functions (e.g. gpio_write_bit) to remove the overhead of a function call. That can be significantly faster when the function is small, because the call instructions may take much more time than the actual work.&#60;/p&#62;
&#60;p&#62;The Atmel XMega is cleverly designed. Many load or store instructions take a single cycle, where the ARM Cortex-M3 may take two. So the performance may be quite close on programs where 8-bit data dominates the workload, even though the STM32F runs at 72MHz. (Sadly, Atmel seems to have hired the same moron that ST hired to revamp their web site, so I can't find the XMega datasheets and manuals, and I don't have them on this laptop.)&#60;/p&#62;
&#60;p&#62;I think the STM32F has some better peripherals, but that might not be a big enough difference to matter for your applications, and if folks are comfortable with the Atmel peripherals the Xmega may be easier to use.&#60;/p&#62;
&#60;p&#62;The Cortex-M3 handles 8, 16 or 32-bit data at a time vs 8 bit data for the Xmega, so the STM32F may win quite strongly in some areas. For example, there is a fixed point arithmetic library which likely runs on the STM32F between 2-8 times faster than the Xmega (I haven't done the measurements, that is a SWAG).&#60;/p&#62;
&#60;p&#62;One area which I prefer on the STM32F, which was a pain on the ATmega is read-only data (e.g. strings). The STM32F has a single address space, so read-only data stored in flash is handled by the same instructions as data stored in RAM. On the ATmega, that is not the case. Data stored in Flash is not the same as data stored in RAM, so you end up writing two copies of some functions (as I wrote above, I can't find the Xmega manuals which might confirm or deny that problem still exists, so I may be wrong, and the Xmega may have fixed that problem, but I don't think so)&#60;/p&#62;
&#60;p&#62;Anyway, try the code:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set GPIO port A pin 5 (bit 5), LED pin 13, via BSRR register */
GPIOA_BASE-&#38;gt;BSRR = 0x00200000; /* clear GPIO port A pin 5 (bit 21 resets bit 5), LED pin 13, via BSRR register */&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;(in a large block) and see how fast it toggles a pin. Then do the same on the Xmega. It should be very close.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joe_c on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5507</link>
			<pubDate>Sat, 02 Jul 2011 18:18:59 +0000</pubDate>
			<dc:creator>joe_c</dc:creator>
			<guid isPermaLink="false">5507@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;gbulmer;  I have no specific project in mind.  At work, I am a technician only in a department that largely uses Atmel.  I am not a programmer except at the hobbyist level.  I leave that work to the engineers and programmers in my department.  I do have some influence at the hardware level.&#60;/p&#62;
&#60;p&#62;Currently we are using AtMega 2560/61's for the top of the line processor in our products.  Before we move into AtXmega, I wanted to see if there was a better processor and tool-chain available.  I hoped that ARM and the 32 bit wide architecture would be a winner.  &#60;/p&#62;
&#60;p&#62;My quick thumbnail test was to flip an I/O bit and measure the resultant clock frequency, testing the processor and environment.  It is a flawed test to be sure.  &#60;/p&#62;
&#60;p&#62;I am interested in your (and robodude's) comments.  I plan to read them and educate myself.  &#60;/p&#62;
&#60;p&#62;Thanks;&#60;br /&#62;
Joe C
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5504</link>
			<pubDate>Sat, 02 Jul 2011 15:06:30 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">5504@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;joe_c - what is it you are trying to do?&#60;/p&#62;
&#60;p&#62;Unless you are familiar with Thumb2 assembler, you will be spending a significant amount of time trying to produce better code than the compiler, and may discover that it is very hard, or you can't. ARM was part of the wave of RISC processors which were designed to be targeted by high-level language compilers (typified by MIPS).&#60;/p&#62;
&#60;p&#62;I am a big fan of Jon L Bentley's Writing Efficient Programs:&#60;br /&#62;
&#60;a href=&#34;http://www.amazon.com/Writing-Efficient-Programs-Prentice-Hall-Software/dp/013970244X&#34; rel=&#34;nofollow&#34;&#62;http://www.amazon.com/Writing-Efficient-Programs-Prentice-Hall-Software/dp/013970244X&#60;/a&#62;&#60;br /&#62;
(I used to teach undergrad and post grad CS &#38;amp; Software Engineering)&#60;/p&#62;
&#60;p&#62;Bentley suggests a six-layer model for considering efficiency: three layers are software and three are hardware.&#60;br /&#62;
Each layer typically yields a 10x efficiency gain.&#60;br /&#62;
The effectiveness of translation of program code to binary is only one of the three software layers, and the compiler is likely to be pretty good (the C compiler has been developed and improved for many, many years). So it is often easier to go look at a different layer to get a significant efficiency improvement.&#60;/p&#62;
&#60;p&#62;Directly accessing the hardware is straightforward if you are accustomed to C.&#60;br /&#62;
There are a bunch of helpful pre-defined macros in the libmaple gpio.h header which correspond to the General Purpose I/O (GPIO) ports.&#60;/p&#62;
&#60;p&#62;So you could write:&#60;br /&#62;
*(volatile unsigned int *)(0x40010800 + 0x10) = 0x00000020; /* set pin A5 via GPIO port A (base 0x40010800), BSRR register (offset 0x10) */&#60;br /&#62;
or&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
Both C statements compile to the same instructions. It is a short sequence because the compiler is dealing with constants which it can calculate for you at compile time.&#60;/p&#62;
&#60;p&#62;Once the address of &#60;code&#62;GPIOA_BASE-&#38;gt;BSRR&#60;/code&#62; and the value &#60;code&#62;0x00000020&#60;/code&#62; are loaded into registers,&#60;br /&#62;
the &#60;code&#62;GPIOA_BASE-&#38;gt;BSRR = 0x00000020&#60;/code&#62; assignment is one store instruction.&#60;/p&#62;
&#60;p&#62;So toggling pins with:&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00200000; /* clear pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00200000; /* clear pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00200000; /* clear pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00000020; /* set pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;br /&#62;
GPIOA_BASE-&#38;gt;BSRR = 0x00200000; /* clear pin A5, which is the LED pin, via GPIO port A, BSRR register */&#60;/p&#62;
&#60;p&#62;should generate a sequence which is as quick as the hardware can go. (This technique lets you set or clear some or all of the 16 pins in a port with a single write.)&#60;/p&#62;
&#60;p&#62;I'd recommend reading the libmaple source code, which is Open Source and available, and use some of its techniques to make your life easier.&#60;br /&#62;
You can get the source at &#60;a href=&#34;https://github.com/leaflabs/libmaple&#34; rel=&#34;nofollow&#34;&#62;https://github.com/leaflabs/libmaple&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;These threads give some concrete advice on how to do I/O quickly&#60;br /&#62;
&#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=517&#34; rel=&#34;nofollow&#34;&#62;http://forums.leaflabs.com/topic.php?id=517&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=718&#34; rel=&#34;nofollow&#34;&#62;http://forums.leaflabs.com/topic.php?id=718&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=737&#34; rel=&#34;nofollow&#34;&#62;http://forums.leaflabs.com/topic.php?id=737&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=774&#34; rel=&#34;nofollow&#34;&#62;http://forums.leaflabs.com/topic.php?id=774&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;I have got 18MHz on my oscilloscope, using that somewhat ugly repetitive code, and 12MHz with more normal looking stuff.&#60;/p&#62;
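&#60;p&#62;To put those scope readings in terms of instruction time, it can help to convert a measured frequency into core clock cycles per output period. The sketch below is a hedged illustration, not libmaple code: the helper name cycles_per_period is hypothetical, and the 72MHz clock is the STM32F figure quoted elsewhere in this thread.&#60;/p&#62;

```c
/* Core clock cycles spent per full output period (one high phase plus one
 * low phase), rounded to the nearest whole cycle.
 * Hypothetical helper for illustration only; not part of libmaple.
 * Returns 0 when no frequency was measured. */
unsigned long cycles_per_period(unsigned long clock_hz, unsigned long measured_hz)
{
    if (measured_hz == 0UL) {
        return 0UL; /* avoid dividing by zero on a missing measurement */
    }
    /* Adding half the divisor before dividing rounds to nearest. */
    return (clock_hz + measured_hz / 2UL) / measured_hz;
}
```

&#60;p&#62;With a 72MHz clock, 18MHz works out to 4 cycles per period and 12MHz to 6, while the 810kHz togglePin() loop from the first post spends roughly 89.&#60;/p&#62;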
&#60;p&#62;Get the latest copy of the STM32F103 manual &#34;RM0008 Reference manual&#34;, currently:&#60;br /&#62;
&#60;a href=&#34;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/REFERENCE_MANUAL/CD00171190.pdf&#34; rel=&#34;nofollow&#34;&#62;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/REFERENCE_MANUAL/CD00171190.pdf&#60;/a&#62;&#60;br /&#62;
It describes every peripheral in loving detail, and also gives an overall memory map.&#60;br /&#62;
If you want to change or toggle pins quickly, have a look at Section 9 &#34;General-purpose and alternate-function I/Os (GPIOs and AFIOs)&#34;. Each pin of each GPIO port has a memory address, so you can read/write in a single 2-cycle instruction. You'll need to look up 'bit-banding', and may need an ARM technical reference manual.&#60;/p&#62;
&#60;p&#62;Also get a copy of the STM32F103x8/STM32F103xB Datasheet if you have a standard Maple:&#60;br /&#62;
&#60;a href=&#34;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00161566.pdf&#34; rel=&#34;nofollow&#34;&#62;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00161566.pdf&#60;/a&#62;&#60;br /&#62;
or the STM32F103xC/xD/xE Datasheet if you have a RET6:&#60;br /&#62;
&#60;a href=&#34;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00191185.pdf&#34; rel=&#34;nofollow&#34;&#62;http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00191185.pdf&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;These give the memory map of the STM32F103 that is on your board.&#60;/p&#62;
&#60;p&#62;Please post questions if you'd like some more discussion.&#60;/p&#62;
&#60;p&#62;(full disclosure: I am not a member of LeafLabs staff)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>robodude666 on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5503</link>
			<pubDate>Sat, 02 Jul 2011 13:13:14 +0000</pubDate>
			<dc:creator>robodude666</dc:creator>
			<guid isPermaLink="false">5503@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;It's okay. bbPress doesn't have the best search. Sometimes it's hard to find something without knowing exactly what the thread's name is.&#60;/p&#62;
&#60;p&#62;Although writing your application in highly optimized assembly would yield the best possible performance, it would also yield the most hair ripped out of your head. Using the low-level libmaple C library will yield much better performance than the wirish equivalents, and is generally good enough, especially if you arrange your program to hide the extra clock cycles required for back-to-back GPIO access.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joe_c on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5502</link>
			<pubDate>Sat, 02 Jul 2011 12:21:34 +0000</pubDate>
			<dc:creator>joe_c</dc:creator>
			<guid isPermaLink="false">5502@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;I thought I would get better performance from assembly.  I wanted to see what the high level language would give me.  You've given me a lot to think about.  Thank you for the insightful post.&#60;/p&#62;
&#60;p&#62;Joe C.&#60;/p&#62;
&#60;p&#62;I just looked at your link.  Sorry for the duplicate post.  Next time I will dig a little further.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>robodude666 on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5501</link>
			<pubDate>Sat, 02 Jul 2011 11:38:28 +0000</pubDate>
			<dc:creator>robodude666</dc:creator>
			<guid isPermaLink="false">5501@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Performance has been discussed previously in the forums. I recommend taking a look at the &#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=860&#34;&#62;Low Level pin manipulation&#60;/a&#62; and &#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=774#post-4611&#34;&#62;High-Speed GPIO Access?&#60;/a&#62; threads.&#60;/p&#62;
&#60;p&#62;Note that in the Low Level pin manipulation thread I used &#60;code&#62;uint16_t bit9 = BIT(9);&#60;/code&#62; in an attempt to optimize the toggle. This is WRONG. It turns &#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=860#post-5206&#34;&#62;out&#60;/a&#62; the compiler will evaluate any constants, so using &#60;code&#62;1&#38;lt;&#38;lt;9&#60;/code&#62; or &#60;code&#62;BIT(9)&#60;/code&#62; directly, as &#60;a href=&#34;http://forums.leaflabs.com/topic.php?id=860#post-5201&#34;&#62;snigelen&#60;/a&#62; showed, is preferred; it makes for cleaner code and saves a memory read.&#60;/p&#62;
&#60;p&#62;&#60;code&#62;togglePin&#60;/code&#62;, &#60;code&#62;digitalWrite&#60;/code&#62; and the remaining wirish functions lack performance because of the overhead incurred to make them simpler to use (the same applies to Arduino's and Wiring's functions, not just Maple's). When you call &#60;code&#62;togglePin&#60;/code&#62; you're really calling:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;void togglePin(uint8 pin) {
    if (pin &#38;gt;= BOARD_NR_GPIO_PINS) {
        return;
    }

    gpio_toggle_bit(PIN_MAP[pin].gpio_device, PIN_MAP[pin].gpio_bit);
}&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;You must go through a pin validation, which in itself is a few assembly instructions. You must then access the corresponding gpio device and bit for that particular digital pin which is a memory read (and more &#34;expensive&#34;). You then pass these arguments to the &#60;code&#62;gpio_toggle_bit&#60;/code&#62; function which is a jump assembly command (though the inline attempts to optimize it). Within &#60;code&#62;gpio_toggle_bit&#60;/code&#62;:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;static inline void gpio_toggle_bit(gpio_dev *dev, uint8 pin) {
    dev-&#38;gt;regs-&#38;gt;ODR = dev-&#38;gt;regs-&#38;gt;ODR ^ BIT(pin);
}&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;there is a GPIO read, a binary shift, an XOR, and a GPIO write. All of this takes some time. What could be an 18MHz square wave turns into a ~700kHz square wave.&#60;/p&#62;
&#60;p&#62;Also, the ATXMega, as discussed in the High-Speed GPIO Access thread, requires only 1 cycle per GPIO read/write. The STM32F103 series requires 2 cycles per write. There are ways to optimize performance, as discussed in the linked-to threads. Ultimately, however, the STM32F103's 72MHz clock will make it a faster MCU. It is capable of doing computations much faster than a 32MHz ATXMega and will ultimately result in faster execution even if the GPIO writes take 2 cycles.&#60;/p&#62;
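&#60;p&#62;The cycle counts above give a rough ceiling for a software-toggled square wave, since one period needs two writes (set, then clear). What follows is a minimal sketch of that arithmetic, assuming loop and branch overhead is ignored, so real code will be slower. The name max_toggle_hz is a hypothetical helper, not part of libmaple or wirish.&#60;/p&#62;

```c
/* Upper bound on a software-toggled square wave.
 * One output period needs two GPIO writes (set, then clear), and each
 * write costs cycles_per_write core clock cycles.
 * Hypothetical helper for illustration only; loop and branch overhead
 * is deliberately ignored, so this is a ceiling, not a prediction. */
unsigned long max_toggle_hz(unsigned long clock_hz, unsigned long cycles_per_write)
{
    if (cycles_per_write == 0UL) {
        return 0UL; /* no meaningful bound without a cycle cost */
    }
    return clock_hz / (2UL * cycles_per_write);
}
```

&#60;p&#62;Using the figures quoted in this thread, 72MHz at 2 cycles per write gives 18MHz, and 32MHz at 1 cycle per write gives 16MHz, which is why the two parts can end up close on a pure pin-toggling test.&#60;/p&#62;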
&#60;p&#62;Ultimately, you don't want to toggle bits like this manually. Use hardware features whenever possible. If you need a square wave, use the timer-based PWM generation functionality of the STM32. You'll get much better performance. If you need to toggle a heartbeat LED, &#60;code&#62;toggleLED&#60;/code&#62; or &#60;code&#62;digitalWrite&#60;/code&#62; will suffice. If you need high-speed GPIO for reading/writing to external devices, consider using the DMA controller to make sure you're getting the full 18MHz the GPIO can output.&#60;/p&#62;
&#60;p&#62;-robodude666
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joe_c on "Maple speed"</title>
			<link>http://forums.leaflabs.com/topic.php?id=895#post-5500</link>
			<pubDate>Sat, 02 Jul 2011 11:04:47 +0000</pubDate>
			<dc:creator>joe_c</dc:creator>
			<guid isPermaLink="false">5500@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Hi all;  I'm new to the club.  I just received my Maple r5 this week.  I wanted to get a feel for how fast this board is, so I created a quick program to toggle an I/O pin.  The main loop of the program is empty except for a togglePin() command.&#60;br /&#62;
I measured the square wave out with my oscilloscope. I was surprised to find that the frequency was not the fastest I'd seen.  I repeated the experiment with any processor I could find.  Here are the results: &#60;/p&#62;
&#60;p&#62; Fastest, an AtXmega128, output = 3 MHz, code was created with a commercial compiler, Codevision. Clocked at 32 MHz.&#60;br /&#62;
 Maple was next at 810 kHz, code was created with the Leaf compiler.&#60;br /&#62;
 Arduino UNO, output = 120 kHz, code was created with Arduino V22.&#60;br /&#62;
 Arduino MEGA 2560 = 75 kHz.&#60;/p&#62;
&#60;p&#62;Could this be a result of compiler efficiencies?  register width?   I am only a hobby level programmer and I suspect my tests are not valid.  Any thoughts?  &#60;/p&#62;
&#60;p&#62;Joe C.
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
