FSMC + DMA? « LeafLabs Garden

(8 posts) (3 voices)

Started 2 years ago by robodude666
Latest reply from robodude666

robodude666
Member

Howdy,

Finally decided to try out the Maple Native's FSMC test example, and am pretty impressed by the performance... Write access is only at 6.49 MHz, but that's for writing 16-bit words which comes out to 12.37 MBps (512K x 16bit values in 80,758 microseconds); not too bad. Reading the entire 512K locations, one value at a time takes 255ms.... which is pretty slow.
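The figures above can be rechecked with a few lines of C. This is only a sketch of the arithmetic; the function names are made up for illustration, and "MBps" is assumed to mean MiB per second (1024 x 1024 bytes), which is what reproduces the quoted number.

```c
#include <assert.h>
#include <stdint.h>

/* Transfers per microsecond is the same number as MHz. */
double transfer_rate_mhz(uint32_t count, uint32_t micros)
{
    assert(micros > 0);
    return (double)count / (double)micros;
}

/* Payload throughput in MiB/s for `count` transfers of
 * `bytes_per_transfer` bytes completed in `micros` microseconds. */
double throughput_mib_per_s(uint32_t count, uint32_t micros,
                            uint32_t bytes_per_transfer)
{
    assert(micros > 0);
    double bytes_per_s =
        (double)count * (double)bytes_per_transfer * 1e6 / (double)micros;
    return bytes_per_s / (1024.0 * 1024.0);
}
```

With 512K = 524,288 transfers in 80,758 microseconds this gives about 6.49 MHz and about 12.38 MiB/s for the write test. The 255 ms read of the same 524,288 locations works out to roughly 2.06 MHz.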

So, I was wondering can you take advantage of the DMA channels to read/write to/from the FSMC controller? I read through the RM0008 reference manual, and didn't see FSMC listed as a possible DMA option.

Unrelated - kind of - is it possible to use a DMA channel to transfer data between two different areas of a single FSMC bank without first buffering the data in the STM32's SRAM? And is it also possible to transfer data directly between FSMC and a parallel bus, or between FSMC and an SPI bus?

Thanks!,
-robodude666

Posted 2 years ago #
gbulmer
Moderator

robodude666 - I don't have a Maple Native, so I have never tried accessing the FSMC with DMA.

However, the memory on the STM32's FSMC is memory from the processor's perspective. The FSMC's job is to make the external chips behave like memory from the processor's perspective. That memory has an address, and the FSMC has a 32-bit wide buffer to make read and write transactions look like normal memory accesses to the processor. So I would expect memory on the FSMC to look like memory to the DMA controller too.

Looking at RM0008 ...

section 13.2 "DMA main features" describes the Direct memory access (DMA) features:
● Memory-to-memory transfer
● Peripheral-to-memory and memory-to-peripheral, and peripheral-to-peripheral transfers
● Access to Flash, SRAM, APB1, APB2 and AHB peripherals as source and destination

The memory on the FSMC is accessed across the AHB.

Further, in section 21.3 "AHB interface" it describes the errors that the FSMC can raise, and it says:
The effect of this AHB error depends on the AHB master which has attempted the R/W access:
● If it is the Cortex™-M3 CPU, a hard fault interrupt is generated
● If is a DMA, a DMA transfer error is generated and the corresponding DMA channel is automatically disabled.

I interpret that as saying the FSMC makes external memory behave the same for both the processor and DMA controllers.

Hence, I think FSMC memory can be used in the same way as STM32's internal RAM. I think you could test this. Set up a DMA transfer between memory blocks or to/from a peripheral using internal RAM, then run the same transfer but to memory in the FSMC address range.

FSMC-attached memory is only 8 or 16 bits wide. The FSMC peripheral manual does seem to suggest that access to the AHB bus might be blocked while a 32 bit transfer is assembled from multiple FSMC-external-memory transfers. You might want to avoid that by doing one or two byte transfers.

Posted 2 years ago #

robodude666
Member

gbulmer,

Excellent point! I forgot completely that FSMC-accessed RAM is designed to appear as a regular memory address. In which case, you should be able to do a regular memory-to-memory transfer and the STM32 will figure out everything needed to get that done.

I'll try it out, and report back!

EDIT: Oh brother! I forgot libmaple transitioned over to "tubes" while I was dealing with life. Time to read some source code comments! This might take longer than I thought.

EDIT2:

While trying to figure out how to use the new DMA tubes I ran across this in the libmaple source code:

enum dma_atype _dma_addr_type(__io void *addr) {
    switch (stm32_block_purpose((void*)addr)) {
    /* Notice we're treating the code block as memory here.  That's
     * correct for addresses in Flash and in [0x0, 0x7FFFFFF]
     * (provided that those addresses are aliased to Flash, SRAM, or
     * FSMC, depending on BOOT[01] and possibly SYSCFG_MEMRMP). It's
     * not correct for other addresses in the code block, but those
     * will (hopefully) just fail-fast with transfer or bus errors. If
     * lots of people get confused, it might be worth being more
     * careful here. */
    case STM32_BLOCK_CODE:      /* Fall through */
    case STM32_BLOCK_SRAM:      /* ... */
    case STM32_BLOCK_FSMC_1_2:  /* ... */
    case STM32_BLOCK_FSMC_3_4:
        return DMA_ATYPE_MEM;
    case STM32_BLOCK_PERIPH:
        return DMA_ATYPE_PER;
    case STM32_BLOCK_FSMC_REG:        /* Fall through */
        /* Is this right? I can't think of a reason to DMA into or out
         * of the FSMC registers. [mbolivar]  */
    case STM32_BLOCK_UNUSED:          /* ... */
    case STM32_BLOCK_CORTEX_INTERNAL: /* ... */
        return DMA_ATYPE_OTHER;
    default:
        ASSERT(0);              /* Can't happen */
        return DMA_ATYPE_OTHER;
    }
}

SRAM and FSMC banks 1 - 4 are all treated as memory addresses. So far it looks like a simple memory-to-memory transfer should work.
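To make the classification above concrete, here is a host-side sketch that sorts an address into the eight 512 MB blocks of the Cortex-M3 address map by its top three bits. It is an illustration, not libmaple's implementation: the names `mem_block`, `block_of` and `is_dma_memory` are hypothetical, and the block boundaries are the usual STM32F1 ones (FSMC banks 1 and 2 from 0x60000000, banks 3 and 4 from 0x80000000, FSMC registers from 0xA0000000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; one enumerator per 512 MB block, in address order. */
typedef enum {
    BLOCK_CODE = 0,        /* 0x00000000 - 0x1FFFFFFF */
    BLOCK_SRAM,            /* 0x20000000 - 0x3FFFFFFF */
    BLOCK_PERIPH,          /* 0x40000000 - 0x5FFFFFFF */
    BLOCK_FSMC_1_2,        /* 0x60000000 - 0x7FFFFFFF */
    BLOCK_FSMC_3_4,        /* 0x80000000 - 0x9FFFFFFF */
    BLOCK_FSMC_REG,        /* 0xA0000000 - 0xBFFFFFFF */
    BLOCK_UNUSED,          /* 0xC0000000 - 0xDFFFFFFF */
    BLOCK_CORTEX_INTERNAL  /* 0xE0000000 - 0xFFFFFFFF */
} mem_block;

static_assert(BLOCK_CORTEX_INTERNAL == 7, "expected eight 512 MB blocks");

/* The top three address bits select the block. */
mem_block block_of(uint32_t addr)
{
    return (mem_block)(addr >> 29);
}

/* True for the blocks the quoted code treats as DMA "memory" addresses. */
bool is_dma_memory(uint32_t addr)
{
    switch (block_of(addr)) {
    case BLOCK_CODE:
    case BLOCK_SRAM:
    case BLOCK_FSMC_1_2:
    case BLOCK_FSMC_3_4:
        return true;
    default:
        return false;
    }
}
```

Under these assumptions an address such as 0x60000000 (start of FSMC bank 1) lands in the same category as internal SRAM at 0x20000000, while a peripheral address such as 0x40013000 does not.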

Thanks,
-robodude666

Posted 2 years ago #

robodude666
Member

Yup.

I can confirm that you're able to use the FSMC address as your source/destination in a DMA transfer.

I was able to transfer 1,920 16-bit wide values from FSMC into a buffer in 485 microseconds. That's a read rate of 3.95 MHz (per 16bit value).
Without the DMA transfer, reading the address manually into your buffer (loop not unrolled; just one at a time as worst case) takes 1070 microseconds, which is a read rate of 1.79 MHz. So DMA is about 2.2x as fast (a gain of roughly 120%) :).

For speed reference.... Doing the same 1,920 16-bit value transfer between two SRAM (on the STM32) buffers takes 670 uS without DMA and 165 uS with DMA.

FSMC memory access via the DMA controller certainly gains you some performance, but it's not nearly as fast as the onboard memory. The larger your transfer size is, the faster it'll get transferred, but only up to a point. A FSMC to SRAM DMA transfer of 240 16-bit values took 65 uS. 1920 is 8x the size and only took 485 uS instead of the expected 520. Doing 3072 transfers took 770 uS. With a buffer that's 1.6x larger we'd expect a transfer time of (485*1.6)=776 uS... Not much of a savings. So it's clearly not worth it to allocate a giant buffer on the STM32, as there's very little performance gain after some point. I'll stick to 1920 for my application, after I finish with these benchmarks :).
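One way to read these three timings is as a fixed setup cost plus a constant cost per transfer. That model is an assumption for illustration, not something measured in this thread, and the names `dma_cost_model`, `fit_two_points` and `predict_us` are made up. The sketch fits the model through two of the measurements and uses it to predict a third.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: time = setup_us + per_item_us * n. */
typedef struct {
    double setup_us;     /* fixed cost per DMA transfer, microseconds */
    double per_item_us;  /* cost per 16-bit value, microseconds */
} dma_cost_model;

/* Fit the line through two (count, time) measurements. */
dma_cost_model fit_two_points(uint32_t n1, double t1_us,
                              uint32_t n2, double t2_us)
{
    assert(n1 != n2);
    dma_cost_model m;
    m.per_item_us = (t2_us - t1_us) / ((double)n2 - (double)n1);
    m.setup_us = t1_us - m.per_item_us * (double)n1;
    return m;
}

/* Predicted time in microseconds for n values under the fitted model. */
double predict_us(dma_cost_model m, uint32_t n)
{
    return m.setup_us + m.per_item_us * (double)n;
}
```

Fitting through (240, 65 uS) and (1920, 485 uS) gives about 0.25 uS per value plus about 5 uS of fixed cost, and predicts about 773 uS for 3072 values, close to the measured 770 uS. If the model holds, the fixed cost is already a small share at 1920 values, which would explain why bigger buffers barely help.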

I'm going to look into write speeds next. Followed by FSMC to FSMC transfers. But first, coffee.

EDIT:

The results are in, and they're quite surprising. Manually writing 1920 16-bit values (not unrolled) to FSMC took 830 uS, or a write rate of 2.3 MHz. That's quite a bit slower than the 16x unrolled loop the FSMC test demo uses. However, doing it with DMA took only 325 uS, or a write rate of 5.9 MHz. That's just a little slower than the unrolled loop, though the unrolled loop demo just wrote garbage values that weren't very useful. If you also unroll your manual write loop, writing 16 values per iteration, it also goes down to 325 uS.

EDIT2:

Interesting! I went back and edited the FSMC to STM32 SRAM read benchmark program and unrolled the manual read loop to 16x. Performance is actually FASTER than DMA for my 1920 values. Comes out to 445 uS (unrolled) vs 485 uS (DMA).

But there's something to keep in mind. While manually doing the transfers you're taking CPU time away from whatever your application is doing. DMA transfers happen behind the scenes, and supposedly have only a 1% CPU usage. So, in the roughly 480 microseconds you're waiting for the DMA transfer to read/write to FSMC, you can be processing user input, filtering analog data, etc. That still makes DMA worthwhile to implement if performance is a high priority.
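For reference, a 16x unrolled copy loop of the kind compared against DMA here might look like the sketch below. It is not the code from the FSMC test demo or from these benchmarks; the function name `copy16_unrolled` is made up, and on the target one of the pointers would point into the FSMC address range instead of an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16-bit values, 16 per loop iteration, then handle the
 * remainder one at a time. The pointers are volatile so the compiler
 * cannot drop or merge the individual bus accesses. */
void copy16_unrolled(volatile uint16_t *dst, const volatile uint16_t *src,
                     size_t count)
{
    assert(dst != NULL && src != NULL);

    size_t blocks = count / 16;
    size_t tail = count % 16;

    while (blocks--) {
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
    }
    while (tail--) {
        *dst++ = *src++;
    }
}
```

Unrolling spends fewer cycles on loop bookkeeping per value copied, which is presumably where the gain over the one-at-a-time loop comes from. The CPU is still busy for the whole copy, unlike with DMA.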

For my particular application, I will be using the external SRAM to buffer frames for my LCD and then use DMA transfers to get data from the external SRAM, to an internal buffer, and dump it to GPIO. I'm considering possibly going with FSMC -> SPI directly instead of getting a parallel LCD.

Another cool thing you could do is hook up a FLASH/EEPROM chip and use a DMA transfer to load data into SRAM on bootup :)! Or, if we ever get SDIO working, buffer stuff off of an SD card.

Next I'm going to look into transferring data between parts of external FSMC-controlled memory. And then publish everything on gist/github.

-robodude666

Posted 2 years ago #

robodude666
Member

More data!

FSMC to FSMC DMA transfers are also possible (what a surprise?).

Same 1920 x 16bit transfer took 640 uS with a manual 16x unrolled loop. That's a transfer rate of 3.0 MHz.
With a DMA controller it comes to about 620 uS, or a transfer rate of 3.09 MHz.

This is a setup of reading from one FSMC address and writing to another. This effectively exercises both reading and writing to FSMC-controlled external memory. Based on the analysis above, reading is clearly slower than writing which explains why this type of a transfer is just slightly slower than FSMC to SRAM buffer read (which came in at 3.95 MHz).

I currently don't have any SPI devices I could use to try FSMC to SPI transfers..

So, to summarize so far, with a 16-bit wide payload:
```
SRAM to SRAM = 11.63 MHz
SRAM to FSMC = 5.9 MHz
FSMC to SRAM = 3.95 MHz
FSMC to FSMC = 3.0 MHz

Keywords:
SRAM = onboard SRAM
FSMC = external SRAM
```
To convert to Mbps, multiply the transfer rate by the payload width in bits (16 here).
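As a quick worked example of that conversion (the function name `rate_to_mbps` is made up for illustration):

```c
#include <assert.h>

/* Convert a transfer rate in MHz (million transfers per second) into
 * megabits per second for a payload of `bits` bits per transfer. */
double rate_to_mbps(double rate_mhz, unsigned bits)
{
    assert(bits > 0);
    return rate_mhz * (double)bits;
}
```

With the 16-bit payload used in these benchmarks, 3.95 MHz comes to about 63.2 Mbps and 11.63 MHz to about 186 Mbps.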

-robodude666

Posted 2 years ago #

mlundinse
Member

Where is FLASH to SRAM in that comparison?

Posted 2 years ago #
mlundinse
Member

Also, timings for reading one 32-bit word into a core register.

Posted 2 years ago #
robodude666
Member

Last I remember, accessing onboard FLASH isn't that slow. If you mean external SPI flash, I don't have any working chips anymore. Burnt the last one I had.

What do you mean by "reading one 32 bit word into a core register" ?

FSMC supports up to 16-bit wide SRAM modules. So that would just be the time it takes to read two addresses (manually, since DMA won't give you any benefit here), combine them with a shift and an OR, and write a register.
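Building one 32-bit value out of two 16-bit reads takes a shift and an OR. A minimal sketch, assuming little-endian layout (low half at the lower address, as on the STM32) and a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Read two consecutive 16-bit locations and combine them into one
 * 32-bit word. Assumes little-endian order: p[0] holds bits 0-15,
 * p[1] holds bits 16-31. */
uint32_t read32_from_halves(const volatile uint16_t *p)
{
    assert(p != 0);
    uint32_t lo = p[0];
    uint32_t hi = p[1];
    return lo | (hi << 16);
}
```

As gbulmer's earlier post suggests, the FSMC may also split a plain 32-bit access into two 16-bit external accesses by itself, in which case a single 32-bit load would do the same job. Which of the two is faster would have to be measured.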

-robodude666

Posted 2 years ago #
