There's libmaple support for this via HardwareSPI::beginSlave().
As there are two SPI peripherals on Maple (and three on RET6 and Maple Native), it should be practical to generate or receive two (or three) pins' worth of input/output using two (or three) SPI interfaces run as slaves, synchronised to the same timer output, at up to 18MHz.
I'd start with the table technique mentioned here
After all, there are only 16224 bits, so you could generate an initialised data table/array (in flash) long enough (2*16224*sizeof(uint32)) to be written to the port, and wiggle the pins for the entire 16224-bit data exchange.
You could even generate a very long function, made of 2*16224 (GPIOB_BASE)->BSRR = ...; statements
(and 'noops' to get the timing right)
I believe the data table based technique should be capable of going faster than 1.6MHz, so it might need 'noops' to slow down to the correct speed.
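To make the table idea concrete, here is a minimal sketch in plain C. It assumes one data pin and one clock pin on the same port, two BSRR writes per data bit (which is where the factor of 2 comes from), and MSB-first bit order. The pin numbers and the names build_bsrr_table and play_bsrr_table are made up for illustration; they are not libmaple functions. At 4 bytes per entry the full table would be 2*16224*4 = 129,792 bytes. On the STM32F1, the low 16 bits of a BSRR write set pins and the high 16 bits reset them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin assignment on one GPIO port (assumption, not from the post). */
#define DATA_PIN 0u
#define CLK_PIN  1u

/* BSRR layout: bits 0..15 set a pin, bits 16..31 reset it. */
#define BSRR_SET(pin)   (1u << (pin))
#define BSRR_RESET(pin) (1u << ((pin) + 16u))

/* Build two BSRR words per data bit:
 *   1) drive the data pin to the bit value and pull the clock low,
 *   2) raise the clock so the receiver can sample.
 * 'table' must have room for 2 * nbits entries. Returns the entry count. */
size_t build_bsrr_table(const uint8_t *bits, size_t nbits, uint32_t *table)
{
    size_t n = 0;
    assert(bits != NULL && table != NULL);
    for (size_t i = 0; i < nbits; i++) {
        int bit = (bits[i / 8] >> (7 - (i % 8))) & 1;   /* MSB first */
        uint32_t data = bit ? BSRR_SET(DATA_PIN) : BSRR_RESET(DATA_PIN);
        table[n++] = data | BSRR_RESET(CLK_PIN);
        table[n++] = BSRR_SET(CLK_PIN);
    }
    return n;
}

/* Write every table entry to the register in order. Taking the register
 * as a pointer lets the same loop be tried against an ordinary variable
 * on a PC before pointing it at the real port register. */
void play_bsrr_table(volatile uint32_t *bsrr, const uint32_t *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *bsrr = table[i];
        /* On real hardware, timing padding ('noops') would go here. */
    }
}
```

On the board, the table would presumably be generated offline and stored as a const array so it lives in flash, and the register pointer would be the address of the (GPIOB_BASE)->BSRR register mentioned above. The loop overhead and any padding would need measuring with a scope to hit the target bit rate.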
(full disclosure: I am not a member of LeafLabs staff.)