Silntknight
I'm also curious about direct register access for pins (like Arduino's DDR and PORT registers). I know Maple doesn't support direct register access (http://forums.leaflabs.com/topic.php?id=268), but if I created methods for this type of access, would that increase performance?
poslathian talks about register access in the thread http://forums.leaflabs.com/topic.php?id=268
The Maple libraries hide direct register access, but as long as you are careful you can still use it. For example, use pinMode to get the pin into the right mode, then use direct register access to manipulate the pin.
Some of the registers are already named in Maple headers.
Direct access can be dramatically faster than going through some of the library functions.
digitalWrite and digitalRead each involve a function call, a check that the pin number is valid, and a conversion from the Maple pin number to the actual port and bit. None of that is very slow, but it is slower than going straight to the register.
For example:
void digitalWrite(uint8 pin, uint8 val) {
    if (pin >= NR_GPIO_PINS) {
        return;
    }
    gpio_write_bit(PIN_MAP[pin].port, PIN_MAP[pin].pin, val);
}
Then
static inline void gpio_write_bit(GPIO_Port *port, uint8 gpio_pin, uint8 val) {
    if (val) {
        port->BSRR = BIT(gpio_pin);
    } else {
        port->BRR = BIT(gpio_pin);
    }
}
Then
#define BIT(shift) (1UL << (shift))
Alternatively, if you know the address of the register, e.g.:
#define GPIOA       (0x40010800)
#define GPIOA_BSRR  (*(volatile uint32 *)(GPIOA + 0x10))
#define SPARK_PIN   (0b0000000001000000)   // bit 6 of the port

GPIOA_BSRR = SPARK_PIN;           // set pin on
GPIOA_BSRR = (SPARK_PIN << 16);   // set pin off
Each of these compiles down to little more than a single store instruction, which runs in a cycle or two.
The difference is even more significant if several pins on the same port are being changed.
(If I remember, I'll run this into an oscilloscope.)
I haven't had a look at the code produced by the compiler for digitalWrite, but this will be significantly faster.
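To make that concrete, here is a rough sketch of the pinMode-then-direct-register approach, driving two pins on the same port with single stores. The Maple pin numbers (11 and 12) and their GPIOA bit positions (PA7 and PA6) are my assumptions from the pin map, so check them against your board revision before trusting the sketch.

#define PORTA_BSRR  (*(volatile uint32 *)(0x40010800 + 0x10))   // GPIOA's BSRR register

#define PIN_X       (1UL << 6)   // PA6, assumed to be Maple pin 12
#define PIN_Y       (1UL << 7)   // PA7, assumed to be Maple pin 11

void setup() {
    // Let the library do the one-off configuration work...
    pinMode(12, OUTPUT);
    pinMode(11, OUTPUT);
}

void loop() {
    // ...then drive both pins with single stores: the low half of BSRR
    // sets bits, the high half resets them, so one write does both at once.
    PORTA_BSRR = PIN_X | (PIN_Y << 16);   // PA6 high, PA7 low
    PORTA_BSRR = PIN_Y | (PIN_X << 16);   // PA7 high, PA6 low
}

This is the sort of loop where the difference should show up clearly on a scope.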
The difference between direct access and analogRead is even bigger, because analogRead blocks while the value is being sampled and converted. I'd guess 90%+ of the time spent inside analogRead is just waiting, doing nothing.
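If you want that waiting time back, one option is a "start now, collect later" pattern using ADC1's registers directly. This is only a sketch: the register addresses and bit positions are taken from ST's STM32F103 reference manual, the pin and channel numbers are placeholders to look up in the pin map, and I'm assuming the Maple core has already clocked, powered on and calibrated ADC1 at startup.

#define ADC1_SR     (*(volatile uint32 *)(0x40012400 + 0x00))   // status register
#define ADC1_CR2    (*(volatile uint32 *)(0x40012400 + 0x08))   // control register 2
#define ADC1_SQR3   (*(volatile uint32 *)(0x40012400 + 0x34))   // regular sequence, first channel
#define ADC1_DR     (*(volatile uint32 *)(0x40012400 + 0x4C))   // data register

#define ADC_EOC     (1UL << 1)   // end-of-conversion flag in SR
// SWSTART + EXTTRIG + EXTSEL = software start, so writing this kicks off a conversion
#define ADC_START   ((1UL << 22) | (1UL << 20) | (0x7UL << 17))

#define MY_ADC_PIN      3   // placeholder Maple pin
#define MY_ADC_CHANNEL  8   // placeholder ADC channel for that pin -- look it up

void setup() {
    pinMode(MY_ADC_PIN, INPUT_ANALOG);
}

void loop() {
    ADC1_SQR3 = MY_ADC_CHANNEL;   // convert this channel next
    ADC1_CR2 |= ADC_START;        // start the conversion and carry on immediately

    // ...do something useful here while the ADC samples and converts...

    while (!(ADC1_SR & ADC_EOC))  // only wait if it hasn't already finished
        ;
    uint16 sample = ADC1_DR;      // reading DR also clears EOC
    SerialUSB.println(sample);
}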
As for writing assembler: I do know folks who are very good at it, but unless you *really* know what you are doing, don't use assembler.
It is quite possible that you will do something that prevents the compiler from optimising the code, so clumsy assembler may actually make things slower.
Machine code is just numbers, and is exactly the same stuff as the assembler generates. Writing machine code is just a very hard way to write the numeric values that assembler produces for you.
As josheeg says, if you really want to turn something into assembler, write C code, then look at the assembler produced by the compiler.
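For instance, with the usual ARM GCC toolchain (the file names, include paths and flags below are just placeholders for whatever your project uses):

arm-none-eabi-gcc -O2 -S -fverbose-asm mycode.c -o mycode.s    # stop after generating assembler
arm-none-eabi-objdump -d mycode.o > mycode.lst                 # or disassemble a compiled object file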
If the application you've been describing in the other threads is representative of what you want to do, then IMHO you'd get much better value from investigating how to use the on-board peripherals, e.g. timers.
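For example, once a timer is generating PWM, the pin keeps toggling with no CPU time spent at all. A minimal sketch using the Maple core's pwmWrite() (I'm assuming pin 9 is one of the PWM-capable pins; check the pin map):

void setup() {
    pinMode(9, PWM);      // hand the pin over to its timer
    pwmWrite(9, 32768);   // roughly 50% duty cycle; Maple PWM is 16-bit, 0-65535
}

void loop() {
    // Nothing to do here -- the timer keeps the waveform going by itself.
}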
I learned that writing lower-level code is better for performance, but to what extent is this true? How much of a gain can be expected if everything, as opposed to nothing, were written in low-level code?
I assume that by "better for performance" you mean faster? It is a good idea to be clear whether you mean speed, size, or some other performance quality.
A good software engineering rule of thumb is that more than 80% of the run time is spent in less than 20% of the code.
So rewriting the 80% of the code that only takes 20% of the run time is highly unlikely to show a useful improvement: even if that code runs 5x faster, its 20% shrinks to 4%, so the overall run time only drops to 84%, which was probably not worth the effort.
Further, there is a LOT of evidence that developers are not very successful at choosing the right part of the program to optimise before writing it, and we are not much better afterwards without careful measurement; lots of developers are quite bad at making those careful measurements.
Anyway, the only parts worth considering are the 20% that dominate the run time. It is often true that using a better algorithm makes more difference than hand-crafting assembler code, for example choosing a faster sorting algorithm over a slower one. Even trying different compiler optimisation flags may do the trick.