ventosus - Thank you for the links.
My changes are based on ST's application note AN4296:
http://www.st.com/st-web-ui/static/active/en/resource/technical/document/application_note/DM00083249.pdf
I had missed that, so thank you very much.
"Have you done any experiments to measure the effect?"
No real benchmarking, but I've put interrupts and critical routines into CCM in one of my projects for testing and got an immediate speed-up in the range of 10% to 20%.
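For anyone wanting to try the same thing, here is a minimal sketch of how a routine might be placed into CCM with GCC. It rests on assumptions that are not from this thread: the section name ".ccmram" must match an output section in your own linker script that is mapped to the CCM region, your startup code must copy that section from flash into CCM before use, and ON_TARGET is a hypothetical macro set by the target build so that the same file still compiles on a host.

```c
#include <stdint.h>

/* Assumption: the linker script provides an output section ".ccmram"
 * located in CCM RAM, and startup code copies it from flash.
 * ON_TARGET is a hypothetical build flag; without it the attribute
 * is dropped so the file also builds on a host machine. */
#if defined(ON_TARGET) && defined(__GNUC__)
#define CCM_FUNC __attribute__((section(".ccmram")))
#else
#define CCM_FUNC
#endif

volatile uint32_t tick_count;

/* Example of a small, time-critical routine (e.g. called from an ISR). */
CCM_FUNC void critical_tick(void)
{
    tick_count++;
}
```

Checking the map file afterwards should show whether critical_tick really ended up at a CCM address rather than in flash.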
10%-20% is pretty impressive for a chip which is already pretty quick.
"Is there a slow-down if both code and data reside in core coupled memory (CCM)?"
This I don't know. Would you expect this?
Well, I have looked through the STM32F303 datasheet, and all it says about the CCM is
"2.3 Embedded SRAM
The STM32F30xx features up to 48 Kbytes of static SRAM. It can be accessed as bytes, halfwords (16 bits) or full words (32 bits):
● ...
● 8 Kbytes of CCM RAM. ... This memory can be addressed at maximum system clock frequency without wait state."
So it looks fine. However, there isn't anything more concrete.
Also, I vaguely remember there being a buffer between the CPU and SRAM, which 'pipelined' the 2nd and subsequent writes, but was complex enough that it might slow the first write, or interact with reads. I haven't found this mentioned anywhere for the Cortex-M4, so I may be thinking of the Cortex-M3.
Summary: I am happy to believe it does everything perfectly, no matter what mix of instructions and data are used, but I am also willing to believe that there are interactions between reading instructions and writing data.
However, 10-20% is enough to convince me it's worth it anyway :-)
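If someone wants to settle the code-plus-data question by experiment, one way might be to time the same routine with the core's DWT cycle counter under different placements. This is only a sketch under assumptions: ON_TARGET is a hypothetical macro set by the target build, the host branch is a stand-in that does not count real cycles, and workload and buf are made-up names for whatever routine and data you actually care about.

```c
#include <stdint.h>

#if defined(ON_TARGET)
/* Cortex-M debug registers at their architecturally defined addresses. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

void cycle_counter_init(void)
{
    DEMCR |= (1u << 24);  /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;       /* CYCCNTENA: start counting CPU cycles */
}

uint32_t cycle_counter_read(void) { return DWT_CYCCNT; }
#else
/* Host stand-in so the sketch compiles off-target.
 * It advances by 10 per read; it does NOT measure real cycles. */
static uint32_t fake_cycles;
void cycle_counter_init(void) { fake_cycles = 0u; }
uint32_t cycle_counter_read(void) { fake_cycles += 10u; return fake_cycles; }
#endif

/* Hypothetical test data and routine: a burst of back-to-back writes. */
volatile uint32_t buf[64];

void workload(void)
{
    for (uint32_t i = 0; i < 64u; i++) {
        buf[i] = i;
    }
}

/* Returns the counter ticks elapsed during one call of fn.
 * Unsigned subtraction gives the right answer across a counter wrap. */
uint32_t measure(void (*fn)(void))
{
    uint32_t start = cycle_counter_read();
    fn();
    return cycle_counter_read() - start;
}
```

On the target, calling cycle_counter_init() once and then measure(workload) for each build variant (code in flash or CCM, buf in SRAM or CCM) would show whether the combination of both in CCM costs anything.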