I spoke to one of my professors yesterday about this, and he had an interesting idea. Instead of using the DMA controller with 8-bit data, use it with 16-bit data. The lower 8 bits would contain the data, while the next few bits would carry the control signals. This would double the amount of data the DMA has to move, but it may still be faster than using CPU time.
I had the same thought, but I decided against it: a Maple might not have enough RAM to store the whole image (and the requirement just doubled, because the control signal needs to be stored too), so the processor would probably need to build up a "scan line" for every displayed line. If that is the case, the instructions to interleave the data with the control bits would probably run slower than just picking up the data and writing it to the GPIO.
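The interleaving step being debated here might look something like the sketch below (plain C, runnable on a host for illustration; the control-bit position and the `CTRL_WR_STROBE` name are assumptions for the example, not taken from any real LCD driver):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical control bit riding in the upper byte of each 16-bit
 * DMA word; the real bit assignment would depend on how the LCD's
 * control lines are wired to the GPIO port. */
#define CTRL_WR_STROBE (1u << 8)

/* Build one scan line of 16-bit DMA words from 8-bit pixel data.
 * The lower 8 bits carry the pixel byte, the upper bits carry the
 * control signal. */
static void build_scan_line(const uint8_t *pixels, uint16_t *dma_line,
                            size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* This per-pixel OR-and-store is the overhead in question:
         * it may well cost more cycles than simply writing the byte
         * straight to the GPIO output register. */
        dma_line[i] = (uint16_t)pixels[i] | CTRL_WR_STROBE;
    }
}
```

Each 8-bit pixel byte now occupies a 16-bit word in the scan-line buffer, which is exactly where the 2x RAM cost comes from.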
But using DMA with a timer generating the control signals would reduce the amount of RAM needed (good) and avoid using the processor to interleave data and control (good).
One other thought I had was using an external chunk of SRAM (~4Mbit) that could act as an external buffer for the LCD ...
Yes, you could do that.
This is exactly like an old-fashioned PC display adapter, with a 'random access port' for updating the screen RAM and a 'serial port' for generating the display.
This is the sort of job Oak should be able to do, with the FPGA handling all of the external logic. But unless LeafLabs have one ready, I guess that isn't an option.
I still think software should be able to push a few million pixels/second to that display in a 'benchmark', so I don't understand what is so slow. My outline code is 12 clock cycles for a 16-bit pixel plus some housekeeping; triple that, and it's still only 36 cycles/pixel, which at the Maple's 72 MHz should yield 2 million pixels/second.
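As a sanity check on that arithmetic (assuming the Maple's standard 72 MHz core clock, and treating 36 cycles/pixel as the pessimistic 3x estimate from above):

```c
#include <stdint.h>

/* Throughput estimate: core clock divided by cycles spent per pixel.
 * This is only a back-of-the-envelope bound; it ignores flash wait
 * states, bus contention, and any per-line housekeeping. */
static uint32_t pixels_per_second(uint32_t clock_hz, uint32_t cycles_per_pixel)
{
    return clock_hz / cycles_per_pixel;
}
```

At 36 cycles/pixel this gives 2 million pixels/second; the original 12-cycle inner loop would bound out at 6 million.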
I assume the display is storing the unchanged parts, so what are you trying to do that needs so much performance? For example, if it is video data, can't you find a way to stream it directly to the LCD and miss out the processor entirely?
What is it you are trying to do?