<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>LeafLabs Garden &#187; Topic: Sure Electronics 3216 bicolor display</title>
		<link>http://forums.leaflabs.com/topic.php?id=1441</link>
		<description>A place to share, learn, and grow...</description>
		<language>en-US</language>
		<pubDate>Fri, 22 Jan 2016 00:25:26 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://forums.leaflabs.com/search.php</link>
		</textInput>
		<atom:link href="http://forums.leaflabs.com/rss.php?topic=1441" rel="self" type="application/rss+xml" />

		<item>
			<title>gbulmer on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8575</link>
			<pubDate>Sat, 10 Mar 2012 08:06:03 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">8575@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;lonewolf - my further analysis ...&#60;/p&#62;
&#60;p&#62;Clearly, a frame rate above about 60 frames/second is not necessary because of the limits of human persistence of vision; we just won't see any benefit above that sort of rate. So I assume frame rate is an indirect measure of how much time is available to do other things, like render text or images.&#60;/p&#62;
&#60;p&#62;Assuming the UART interface is fast enough to reach 60 frames/second, then the question becomes does this approach leave more free time for the processor to do other things? Agreed?&#60;/p&#62;
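&#60;p&#62;A rough sketch of the UART speed that 60 frames/second would need. The numbers are assumptions, not taken from the display manual: one 32x16 bicolor panel is treated as 128 bytes of frame buffer (32 x 16 pixels x 2 colours / 8), command overhead is ignored, and each byte is assumed to cost 10 bit times on the wire (8N1 framing). The function name is made up for illustration.&#60;/p&#62;

```c
/* Minimum baud rate needed to push a whole frame buffer over a UART
 * at a given frame rate. Assumes 8N1 framing (start + 8 data + stop
 * = 10 bit times per byte) and no protocol overhead. */
unsigned long uart_baud_for_fps(unsigned long bytes_per_frame,
                                unsigned long frames_per_second)
{
    const unsigned long bit_times_per_byte = 10UL; /* assumption: 8N1 */
    return bytes_per_frame * bit_times_per_byte * frames_per_second;
}
```

&#60;p&#62;Under those assumptions uart_baud_for_fps(128, 60) gives 76800, and two cascaded panels (256 bytes) would need 153600, so the baud rate the display accepts decides whether 60 frames/second is reachable at all.&#60;/p&#62;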
&#60;p&#62;There are three ways to use the UART:&#60;br /&#62;
1. Direct, polled UART&#60;br /&#62;
2. Interrupt driven&#60;br /&#62;
3. DMA driven&#60;/p&#62;
&#60;p&#62;3. DMA driven&#60;br /&#62;
This technique isn't possible on Arduino's AVR, but both the Maple STM32 and the ChipKit PIC32 could do this.&#60;br /&#62;
Assuming the 'frame buffer' can be arranged conveniently, then DMA should use the least processor time, and leave the most 'free CPU time' for other processing. If you are lucky, it may be possible to set up DMA to continuously refresh the LED controller without any processor intervention, ever.&#60;br /&#62;
So this has the potential of 'orders of magnitude' less CPU load than any other technique.&#60;/p&#62;
&#60;p&#62;1. Direct, polled UART.&#60;br /&#62;
I assume this uses the most CPU time to refresh the display controller of the three approaches, but, depending on the baud rate, it might still be comparable with bit-banging. A 'screen refresh' could be triggered by a timer interrupt.&#60;/p&#62;
&#60;p&#62;2. Interrupt driven&#60;br /&#62;
This will take a bit of estimation. I'll assume the frame buffer data dominates, and otherwise the UART uses a similar number of bytes.&#60;/p&#62;
&#60;p&#62;Sending a single bit, using bit-band addressing, is, roughly:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;// send one bit
    *_data = (bits &#38;amp; msb) ? 1 : 0;
    *_wr = 0;
    *_wr = 1;
    msb &#38;gt;&#38;gt;= 1;&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;- three pin writes are 3 x 2 cycle instructions&#60;br /&#62;
- &#60;code&#62;msb &#38;gt;&#38;gt;= 1&#60;/code&#62; is 1 cycle&#60;br /&#62;
- &#60;code&#62;(bits &#38;amp; msb) ? 1 : 0&#60;/code&#62; is at least 1 cycle&#60;br /&#62;
So let's call that 8 cycles/bit.&#60;br /&#62;
Of course, it is a bit slower in reality because of loop overhead, and this may be faster than the HT3216 can receive data, but this'll do for a quick and dirty comparison (if the UART has lower overhead, then these other bit-banging overheads tilt the balance further towards the UART).&#60;/p&#62;
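&#60;p&#62;To put the 8 cycles/bit figure in context, here is a small sketch of the CPU cycles per second that bit-banging would cost at a fixed refresh rate. The frame size is an assumption (1024 bits for one 32x16 bicolor panel, no command overhead) and the function name is hypothetical.&#60;/p&#62;

```c
/* CPU cycles per second spent bit-banging whole frames.
 * bits_per_frame and cycles_per_bit are estimates, not measurements. */
unsigned long bitbang_cycles_per_second(unsigned long bits_per_frame,
                                        unsigned long cycles_per_bit,
                                        unsigned long frames_per_second)
{
    return bits_per_frame * cycles_per_bit * frames_per_second;
}
```

&#60;p&#62;With 1024 bits, 8 cycles/bit and 60 frames/second this comes to 491520 cycles per second, which is under 1% of a 72MHz clock, so loop overhead and the surrounding rendering code probably matter more than the raw pin writes.&#60;/p&#62;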
&#60;p&#62;Taking and returning from an interrupt is 12+12 cycles.&#60;br /&#62;
Writing a byte to the UART might be:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;// write one byte to UART
    UART_DATA_BUFFER = *current_buffer++;  // there must be a byte left to send, because the interrupt was still enabled
    if (current_buffer &#38;gt;= BUFFER_END) disable_UART_interrupt();  // placeholder: stop once the whole buffer has been sent&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;So, assuming the buffer contains all the data, this will get called repeatedly, until the whole 'frame buffer' has been sent.&#60;br /&#62;
I estimate:&#60;br /&#62;
- load &#60;code&#62;current_buffer&#60;/code&#62; - worst case 4 cycles&#60;br /&#62;
- load &#60;code&#62;UART_DATA_BUFFER&#60;/code&#62; - worst case 4 cycles&#60;br /&#62;
- load &#60;code&#62;*current_buffer&#60;/code&#62; - 2 cycles&#60;br /&#62;
- store &#60;code&#62;*current_buffer&#60;/code&#62; to &#60;code&#62;UART_DATA_BUFFER&#60;/code&#62; - 2 cycles&#60;br /&#62;
- &#60;code&#62;current_buffer++&#60;/code&#62; - 2 cycles to store&#60;br /&#62;
- load &#60;code&#62;BUFFER_END&#60;/code&#62; - worst case 4 cycles&#60;br /&#62;
- compare &#60;code&#62;current_buffer&#60;/code&#62; with &#60;code&#62;BUFFER_END&#60;/code&#62; - 1 cycle&#60;br /&#62;
- return without disabling interrupts (the common case) 1 cycle&#60;br /&#62;
That is about 20 cycles.&#60;br /&#62;
Total per interrupt (most common case): 24 + 20 = 44 cycles, which is less than the 48 cycles needed to bit-bang 6 bits.&#60;/p&#62;
&#60;p&#62;So even this may be faster than bit banging, but it is so close, it'd take careful analysis.&#60;/p&#62;
&#60;p&#62;But it may be possible to do better!-) &#60;/p&#62;
&#60;p&#62;The UART has two buffers, a data buffer, and the shift register which shifts out the data. According to RM0008, section 27.3.2 'Transmitter', the value written into the data register gets immediately copied to the shift register, if there is no active transmission. So if the interrupt can be arranged to happen at the end of a transmission, two bytes can be copied into the UART for each interrupt.&#60;/p&#62;
&#60;p&#62;The interrupt service routine will be slightly longer, but, I guess something like:&#60;br /&#62;
   test UART data register empty, or that there is no transmission - about 6 cycles&#60;br /&#62;
   load data byte if there is one, and store to UART data register - about 6 cycles&#60;br /&#62;
So for about 12 more cycles, a total of about 56 cycles, 16 bits are sent via the UART, in the time that bit-banging sends 7 bits.&#60;/p&#62;
&#60;p&#62;So this might be significantly faster, i.e. less CPU load, than bit-banging.&#60;/p&#62;
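&#60;p&#62;The three estimates above can be put side by side as cycles per byte sent. This is only a restatement of the guesses in this post as code; the constant and function names are made up, and none of the figures are measured.&#60;/p&#62;

```c
/* Cycle estimates taken from the post above; all are rough guesses. */
enum {
    BITBANG_CYCLES_PER_BIT = 8,   /* three pin writes + shift + test */
    IRQ_ENTRY_EXIT_CYCLES  = 24,  /* 12 to enter + 12 to return */
    ISR_BODY_CYCLES        = 20,  /* single-byte handler body */
    ISR_SECOND_BYTE_CYCLES = 12   /* extra test + load/store for byte 2 */
};

/* CPU cycles spent per byte when bit-banging each of the 8 bits. */
unsigned int bitbang_cycles_per_byte(void)
{
    return 8u * BITBANG_CYCLES_PER_BIT;
}

/* CPU cycles per byte when each interrupt writes one byte. */
unsigned int isr_cycles_per_byte_single(void)
{
    return IRQ_ENTRY_EXIT_CYCLES + ISR_BODY_CYCLES;
}

/* CPU cycles per byte when each interrupt writes two bytes. */
unsigned int isr_cycles_per_byte_double(void)
{
    return (IRQ_ENTRY_EXIT_CYCLES + ISR_BODY_CYCLES
            + ISR_SECOND_BYTE_CYCLES) / 2u;
}
```

&#60;p&#62;That gives 64 cycles per byte for bit-banging, 44 for the one-byte interrupt and 28 for the two-byte interrupt, so on these estimates the two-byte handler costs less than half of bit-banging.&#60;/p&#62;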
&#60;p&#62;The STM32 has, in effect, a two byte buffer, so this could be quite efficient.&#60;br /&#62;
The PIC32 has a 4 or 8 byte deep buffer, and so should work even better.&#60;br /&#62;
Using interrupts might work on AVRs with more than one USART, but the cost of AVR instructions will be very different, and so may be much worse.&#60;/p&#62;
&#60;p&#62;Observation: the fastest technique on an 8bit AVR may be far from optimal on a 32bit processor, with more peripherals and DMA support.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8568</link>
			<pubDate>Fri, 09 Mar 2012 22:24:29 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">8568@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Is this the device:&#60;br /&#62;
&#60;a href=&#34;http://www.sureelectronics.net/goods.php?id=972&#34; rel=&#34;nofollow&#34;&#62;http://www.sureelectronics.net/goods.php?id=972&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;The manual (at that page) says it is a UART device.&#60;/p&#62;
&#60;p&#62;If that is true, you could use a Maple UART or USART to talk to it, and remove the bit banging. This isn't an option with an Arduino as the USART is connected to the USB interface, but the extra peripherals are one reason to use Maple.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8564</link>
			<pubDate>Fri, 09 Mar 2012 20:46:19 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">8564@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;lonewolf - do you have any measurements about which functions, or lines dominate the run time?&#60;/p&#62;
&#60;p&#62;I looked at:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;void _writebits (register byte bits, register byte msb)
{
  while (msb) {
    !!(bits &#38;amp; msb) ? _data_set() : _data_clr();
    _wr_clr();
    _wr_set();
    msb &#38;gt;&#38;gt;= 1;
  }
}&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;If this were something like:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;static inline void _writebits (register byte bits, register byte msb)
{
  static volatile uint32* const _data = bitband_peripheral_address(...);
  static volatile uint32* const _wr = bitband_peripheral_address(...);

  while (msb) {
    *_data = (bits &#38;amp; msb) ? 1 : 0;
    *_wr = 0;
    *_wr = 1;
    msb &#38;gt;&#38;gt;= 1;
  }
}&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;it would likely be significantly quicker, because there is no read, then masking to build the bit pattern, then write for a whole 16bit port. That read-modify-write sequence will be about 3x slower than a simple, single bit-band address write (as I explained above).&#60;/p&#62;
&#60;p&#62;The compiler might even unroll some of the loops if msb is a constant, making it even faster.&#60;/p&#62;
&#60;p&#62;Maybe you should reevaluate the idea of writing one common source implementation, with #defines to encapsulate the differences?&#60;/p&#62;
&#60;p&#62;Encapsulating differences is what classes are for.&#60;/p&#62;
&#60;p&#62;Instead write different classes, one for each processor, in simple, straightforward code, with no #if. Then look at the common parts, abstract them into a common base class, and maybe use specialised techniques in each processor specific derived class.&#60;/p&#62;
&#60;p&#62;If the specialised classes were template classes, much of the variability would be visible to the compiler at compile time, and it could do a good job of generating high-quality code. For example, the class might be templated on the I/O pins. In that case, likely all of the I/O can be done using port addresses known at compile time, so bit-band addressing, and the equivalent for AVR, could be calculated by the compiler, and optimal code generated.&#60;/p&#62;
&#60;p&#62;With a template class, _writebits could have all addresses known at compile time.&#60;br /&#62;
With static inline, the compiler could unroll loops too.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ala42 on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8563</link>
			<pubDate>Fri, 09 Mar 2012 20:45:15 +0000</pubDate>
			<dc:creator>ala42</dc:creator>
			<guid isPermaLink="false">8563@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Looks like you did not read gbulmer's comment&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;ARM digital I/O can be speeded-up enormously by using &#38;#39;bit band addressing&#38;#39;.
On STM32F every pin has two unique addresses,
one gives single instruction write to a pin, and
the other gives single instruction read of a pin.&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;Currently you use a read/modify/write access to set a bit, but a single write would be sufficient.&#60;/p&#62;
&#60;p&#62;If you want to use just one port base address, you should at least do what gpio_write_bit does:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;/**
 * Set or reset a GPIO pin.
 *
 * Pin must have previously been configured to output mode.
 *
 * @param dev GPIO device whose pin to set.
 * @param pin Pin on to set or reset
 * @param val If true, set the pin.  If false, reset the pin.
 */
static inline void gpio_write_bit(gpio_dev *dev, uint8 pin, uint8 val) {
    if (val) {
        dev-&#38;gt;regs-&#38;gt;BSRR = BIT(pin);
    } else {
        dev-&#38;gt;regs-&#38;gt;BRR = BIT(pin);
    }
}&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;and use dev-&#38;gt;regs as port address.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>lonewolf on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8562</link>
			<pubDate>Fri, 09 Mar 2012 20:20:51 +0000</pubDate>
			<dc:creator>lonewolf</dc:creator>
			<guid isPermaLink="false">8562@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;PS: Another optimization peculiarity I just noticed.&#60;/p&#62;
&#60;p&#62;Look at the function _update_fb():&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://code.google.com/p/ht1632c/source/browse/ht1632c.cpp#329&#34; rel=&#34;nofollow&#34;&#62;http://code.google.com/p/ht1632c/source/browse/ht1632c.cpp#329&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;On AVR and PIC32 the fastest way to set or reset the bit &#34;pixel&#34; pointed to by &#34;ptr&#34;, depending on the &#34;target&#34; value, is to set the bit and then toggle it if &#34;target&#34; is zero.&#60;br /&#62;
On ARM the fastest way is a simple set or reset. Confirmed by benchmarks and by dumping the assembler of the compiled code. Useful hint for optimization :)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>lonewolf on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8561</link>
			<pubDate>Fri, 09 Mar 2012 20:05:02 +0000</pubDate>
			<dc:creator>lonewolf</dc:creator>
			<guid isPermaLink="false">8561@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;Hi gbulmer, I suspect who you're looking at an old version of my code.&#60;/p&#62;
&#60;p&#62;In the last version low level function were moved in ht1632c.cpp:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://code.google.com/p/ht1632c/source/browse/ht1632c.cpp#121&#34; rel=&#34;nofollow&#34;&#62;http://code.google.com/p/ht1632c/source/browse/ht1632c.cpp#121&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;...and bit-banding and cpp macros are not used anymore.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8559</link>
			<pubDate>Fri, 09 Mar 2012 19:39:09 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">8559@http://forums.leaflabs.com/</guid>
			<description>&#60;blockquote&#62;&#60;p&#62;Take a look to inlined functions in ht1632c.cpp.&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;Okay, I did. There aren't any.&#60;br /&#62;
Did you mean ht1632c.h?&#60;br /&#62;
There are some &#60;code&#62;inline&#60;/code&#62; functions there.&#60;br /&#62;
Normally I'd declare them &#60;code&#62;static inline&#60;/code&#62;, otherwise the compiler might need to generate a non-inline function too, just in case it is called from outside the file.&#60;/p&#62;
&#60;p&#62;&#60;code&#62;inline&#60;/code&#62; is *not* the normal default for C/C++, so if it is happening all the time, that is due to the options passed by the IDE. I have not checked recently, but it was certainly &#60;strong&#62;not&#60;/strong&#62; the Arduino IDE default a while ago.&#60;/p&#62;
&#60;p&#62;I prefer &#60;code&#62;inline&#60;/code&#62; to follow the rules of C/C++ by default, then I know what is happening.&#60;br /&#62;
I don't think I'd like inline to be the default on something as small as an UNO where default &#60;code&#62;inline&#60;/code&#62; could cause significant code expansion.&#60;/p&#62;
&#60;p&#62;I haven't checked this for years, but it was the case that, if the compiler was compiling for debugging, it might choose to *not* inline a function, and that decision might be different for &#60;code&#62;static inline&#60;/code&#62; vs non-static &#60;code&#62;inline&#60;/code&#62;.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;I cannot use bit-banding in my code due to dynamic port/bit usage.&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;Why?&#60;br /&#62;
1. All of the bit-band addresses can be calculated at run time; it doesn't need to be done at compile time. If the compiler can see that the address is used repeatedly, it'll try to do that efficiently.&#60;br /&#62;
2. The SBIT, _set, _clr, and _out macros in ht1632c.h use compile time string concatenation to build functions which are more specific. So at least the &#60;code&#62;bit&#60;/code&#62; and &#60;code&#62;name&#60;/code&#62; are known at compile time.&#60;br /&#62;
3. The calculation of the bit band address is an ordinary arithmetic expression, so if part of the expression is known at compile time, the compiler can simplify the calculation.&#60;br /&#62;
4. Are you saying that the wiring for the display can be changed at run-time, or are you trying to use a single compiled binary to support different wiring? If not, the wiring to port and pin addresses is known at compile time, and there is no dynamic port/bit information.&#60;/p&#62;
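&#60;p&#62;As a sketch of point 3, this is the standard Cortex-M3 mapping from a peripheral register address and bit number to its bit-band alias word. The function name is hypothetical (it is not a libmaple call), and the example register address in the note below is my reading of the STM32F103 memory map, so please check it against the reference manual.&#60;/p&#62;

```c
/* Cortex-M3 peripheral bit-band mapping: the 1MB region starting at
 * 0x40000000 is aliased at 0x42000000, one 32-bit word per bit. */
unsigned long bitband_alias_of(unsigned long reg_addr, unsigned int bit)
{
    const unsigned long region_base = 0x40000000UL;
    const unsigned long alias_base  = 0x42000000UL;
    unsigned long byte_offset = reg_addr - region_base;

    /* 32 alias bytes per register byte, 4 alias bytes per bit */
    return alias_base + byte_offset * 32UL + (unsigned long)bit * 4UL;
}
```

&#60;p&#62;For example, bit 5 of the GPIOA output data register (0x4001080C) would map to 0x42210194. If the register address and bit number are compile-time constants, the compiler can fold the whole expression into a single constant address.&#60;/p&#62;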
&#60;p&#62;It might be worth looking at the generated code for that SBIT technique. You might find that using a word address, and the bit offset (which must be known at compile time, or the macro string concatenation would not work) might be as fast and with less complexity than the bit-packed struct. If that is the case, then a bit band address might be quicker.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;pgm_read_byte and pgm_read_word are not so important&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;Agreed. &#34;pgm_read_*&#34; is just an artifact of a specific 8bit processor with a split address space, which should not be imposed on 'cleaner' single address space processors. I &#60;strong&#62;much&#60;/strong&#62; prefer to ditch that baggage and have simple, direct use of '*', '[]', '-&#38;gt;', variables, structs, class instances, class members, array and struct initialisation, and no constraints on parameter passing, function returns, etc.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;I defined them as inline ((uint8_t)*ptr) and ((uint16_t)*ptr) and that is enough for me...&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;Good.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62; but on other Arduino-like platforms there is 100% compatibility, so these little details should be defined...&#60;/p&#62;
&#60;/blockquote&#62;
&#60;p&#62;LeafLabs have always been clear that they are &#60;strong&#62;not&#60;/strong&#62; aiming for 100% Arduino compatibility. In the past, they have certainly said they would prefer to use all of the features of an STM32 rather than aim for Arduino compatibility. They have recently announced working towards Wiring compatibility, but again, have not claimed 100% identical is the goal.&#60;/p&#62;
&#60;p&#62;You are more than welcome to add a page to the wiki, and contribute your macros or inline functions to help people port their Arduino code to Maple, and other ARM or single address space processors.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>lonewolf on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8554</link>
			<pubDate>Fri, 09 Mar 2012 15:35:42 +0000</pubDate>
			<dc:creator>lonewolf</dc:creator>
			<guid isPermaLink="false">8554@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;gbulmer, thanks for the long reply :) Many answers to your questions are in the code of the project I linked. Take a look to inlined functions in ht1632c.cpp.&#60;br /&#62;
With Arduino and ChipKIT I've not to specify inline. I think gcc is called by the IDE with different parameters (I've to investigate about it).&#60;br /&#62;
About IO optimization, take a look to _set(), _reset() and _toggle() functions. As you can see, I already use bit manipulation on port registers instead of wiring pinMode and digitalWrite function. In the Arduino version I found a way to further optimized using some lines of assembler. I cannot use bit-banding in my code due to dynamic port/bit usage.&#60;br /&#62;
pgm_read_byte and pgm_read_word are not so important, I defined as inline ((uint8_t)*ptr) and ((uint16_t)*ptr) and that is enough for me... but in other Arduino-like platform there is 100% compatibility, so this little details should be defined...
&#60;/p&#62;</description>
		</item>
		<item>
			<title>gbulmer on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8550</link>
			<pubDate>Fri, 09 Mar 2012 14:41:12 +0000</pubDate>
			<dc:creator>gbulmer</dc:creator>
			<guid isPermaLink="false">8550@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;lonewolf -&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;I noticed that, compared to Arduino and PIC32, the compiled code for Maple is not optimized.&#60;br /&#62;
Is it a &#34;problem&#34; of the IDE (about the way of calling the compiler) or of the compiler itself?
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;I strongly believe it is &#60;strong&#62;not&#60;/strong&#62; the compiler.&#60;br /&#62;
Arduino uses gcc, so does Maple, and I think ChipKit does too.&#60;/p&#62;
&#60;p&#62;If you dig into your Maple IDE install, you will find arm-none-eabi-gcc, arm-none-eabi-g++, etc. There is some documentation on using the compiler directly from the command line at &#60;a href=&#34;http://leaflabs.com/docs/unix-toolchain.html&#34; rel=&#34;nofollow&#34;&#62;http://leaflabs.com/docs/unix-toolchain.html&#60;/a&#62; which would allow you to do everything and anything.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;I had to explicitly specify &#34;inline&#34; functions, even &#34;__attribute__((always_inline))&#34; to force inline.
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;Not inlining unless asked is the default for C/C++. What functions did you expect to be inlined automatically?&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;Moreover, some datatypes present on Arduino clones, e.g. &#34;uint8_t&#34; and &#34;uint16_t&#34;, are missing
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;Maple uses &#60;code&#62;uint8&#60;/code&#62;, &#60;code&#62;uint16&#60;/code&#62;, which, IIRC were used for a while on Arduino.&#60;br /&#62;
You should be able to &#60;code&#62;#include &#38;lt;stdint.h&#38;gt;&#60;/code&#62; to get the standard C99 type names, e.g. &#34;uint8_t&#34;.&#60;/p&#62;
&#60;blockquote&#62;&#60;p&#62;Some AVR-compatibility functions like &#34;pgm_read_*&#34; are also missing.
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;Good news here.&#60;br /&#62;
Those &#34;pgm_read_*&#34; are the Atmel AVR mechanism to deal with the details of the AVR processor; RAM and Flash are different address spaces, and need different instructions to access them.&#60;br /&#62;
The ARM is a 'proper' 32bit processor, with a single address space. Pointers can access Flash and RAM without the extra mechanism that the AVR processors need.&#60;br /&#62;
So there is no need for &#34;pgm_read_*&#34;, pointers just work. So it is straightforward to put anything into Flash, and access it using normal C/C++ operations, like dereferencing '*', and indexing '[]'.&#60;/p&#62;
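&#60;p&#62;A minimal sketch of what that looks like in practice. The table contents and function names are invented for illustration, and the shim macro is only an assumption about how AVR-style source could be kept compiling on a platform that does not already define pgm_read_byte.&#60;/p&#62;

```c
/* Hypothetical glyph data; on ARM, const data like this is normally
 * placed in Flash by the linker and read through ordinary pointers. */
static const unsigned char glyph_rows[4] = { 0x18, 0x3C, 0x7E, 0xFF };

/* Plain indexing works on a single address space: no pgm_read_byte. */
unsigned char glyph_row(unsigned int index)
{
    return glyph_rows[index];
}

/* Optional shim so AVR-style source still compiles unchanged.
 * Assumption: the platform does not already define pgm_read_byte. */
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))

/* Same read, written the way AVR code would write it. */
unsigned char glyph_row_avr_style(unsigned int index)
{
    return pgm_read_byte(glyph_rows + index);
}
```

&#60;p&#62;Both functions return the same byte; the shim exists only so that code shared with an AVR build does not need an #if around every table access.&#60;/p&#62;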
&#60;blockquote&#62;&#60;p&#62;Finally, I have to find a way to better optimize IO
&#60;/p&#62;&#60;/blockquote&#62;
&#60;p&#62;Precisely what is it you are trying to optimise? &#60;/p&#62;
&#60;p&#62;ARM digital I/O can be speeded-up enormously by using 'bit band addressing'.&#60;br /&#62;
On STM32F every pin has two unique addresses, one gives single instruction write to a pin, and the other gives single instruction read of a pin.&#60;br /&#62;
The pinMode of the pin must be set, and only read or write will work at any one time (depending on pinMode INPUT or OUTPUT) but after that the appropriate bitband address gives direct access. &#60;/p&#62;
&#60;p&#62;The STM32F103 in the Maple takes two cycles to execute the store instruction, so at its 72MHz clock digital I/O has a theoretical speed of 36MHz. That is &#60;strong&#62;much&#60;/strong&#62; faster than digitalWrite, and the compiler generates the correct instructions without any assembler.&#60;/p&#62;
&#60;p&#62;Please have a search for the threads which discuss this, but if you'd like a bit more help, please post, and we'll try to help.&#60;/p&#62;
&#60;p&#62;(Full disclosure: I am not a member of LeafLabs staff)
&#60;/p&#62;</description>
		</item>
		<item>
			<title>lonewolf on "Sure Electronics 3216 bicolor display"</title>
			<link>http://forums.leaflabs.com/topic.php?id=1441#post-8548</link>
			<pubDate>Fri, 09 Mar 2012 13:02:20 +0000</pubDate>
			<dc:creator>lonewolf</dc:creator>
			<guid isPermaLink="false">8548@http://forums.leaflabs.com/</guid>
			<description>&#60;p&#62;I wrote an Arduino library for the Sure Electronics 3216 bicolor LED display. I released the library under GPLv3 in a repository on Google Code:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://code.google.com/p/ht1632c/&#34; rel=&#34;nofollow&#34;&#62;http://code.google.com/p/ht1632c/&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;I ported the library to LeafLabs Maple and ChipKIT Uno32.&#60;/p&#62;
&#60;p&#62;Some benchmarks (with two cascaded 3216 displays):&#60;/p&#62;
&#60;p&#62;* Arduino Uno and clones (AVR ATMega328 16MHz) 135 fps&#60;br /&#62;
* LeafLabs Maple (ARM STM32 F103RB 72MHz) 500 fps&#60;br /&#62;
* ChipKIT Uno32 (Microchip PIC32MX320F128 80MHz) 1050 fps&#60;/p&#62;
&#60;p&#62;NB: This library is a work in progress! Not extensively tested with all platforms, not fully working with &#38;gt; 2 cascaded displays.&#60;/p&#62;
&#60;p&#62;Some notes specific to the Maple platform. I noticed that, compared to Arduino and PIC32, the compiled code for Maple is not optimized.&#60;br /&#62;
Is it a &#34;problem&#34; of the IDE (about the way of calling the compiler) or of the compiler itself? I had to explicitly specify &#34;inline&#34; functions, even &#34;__attribute__((always_inline))&#34; to force inline.&#60;br /&#62;
Moreover, some datatypes present on Arduino clones, e.g. &#34;uint8_t&#34; and &#34;uint16_t&#34;, are missing. Some AVR-compatibility functions like &#34;pgm_read_*&#34; are also missing.&#60;br /&#62;
Finally, I have to find a way to better optimize IO (using assembler? any suggestions are welcome!): 500 fps seems to me a poor result compared to 1050 fps of a PIC32.
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
