Are you using a newer Arduino UNO, or one of the slightly older Arduinos that use the FTDI USB-to-USART chip?
1. The latency to send to an Arduino should be about a couple of milliseconds, unless:
a. the host is opening the connection every time (sloooooow, so I don't think that's what is happening), or
b. only a few bytes are written and the host is not flushing the buffer.
AFAIK, the FTDI USB chip's buffer holds 16 bytes. When it is only partly full, the host flushes it periodically, but not every millisecond. I think that if you write 16 or more bytes at a time, the data should go out with less latency (try 64, 32 and 16 bytes). Also, if you use Windows, FTDI provides a library (D2XX) with extra API calls to control buffer flushing (for example FT_SetLatencyTimer), but you would have to change your host application to use it.
2. IMHO, if your host-side program *must* open the connection each time it sends data, it won't matter much what is on the other end, because the host operating system will spend most of the time opening the port.
3. Once you get down to the 'bare metal', USB latency is supposed to be a couple of milliseconds (the bus is polled every millisecond), even for the slowest, 1.5 Mbit/s USB. But USB is a protocol polled by the host, so the host determines much of the latency, not the device end.
4. On the Maple, USB is built into the STM32F. There is no intervening USB-to-serial converter; it's all on-chip. The STM32F USB is full-speed (12 Mbit/s), which is theoretically the same as the FTDI chip on older Arduinos and the USB interface on the new UNO. Like most USB interfaces, the STM32F USB has its own memory buffer, so the effective latency your program sees depends (somewhat) on how that USB receiver and buffer are configured.
The actual USB throughput of the STM32F is significantly higher than that of the old Arduino FTDI chip, but I don't know whether that comes at the cost of latency; I have no latency measurements.
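For item 1b, here is a minimal host-side sketch in Python. It assumes `port` is any file-like object with `write()` and `flush()`, for example a pyserial `Serial` object; here `io.BytesIO` stands in for the port. The function name `send_message` and the padding parameter `min_size` are hypothetical. Padding is just one way to run the 64/32/16-byte experiment from item 1, and the Arduino side would have to ignore the filler bytes.

```python
import io


def send_message(port, payload: bytes, min_size: int = 0, pad: bytes = b"\x00") -> int:
    """Write one complete message in a single call, then flush it.

    `min_size` is a hypothetical knob for trying larger writes (64, 32, 16):
    short payloads are padded with the single byte `pad` up to that length,
    so the receiving sketch must know to discard the filler.
    """
    if len(payload) < min_size:
        payload += pad * (min_size - len(payload))
    port.write(payload)  # one write per message, not one write per byte
    port.flush()         # push it out of the host-side buffer right away
    return len(payload)


if __name__ == "__main__":
    fake_port = io.BytesIO()  # stands in for a real serial port
    sent = send_message(fake_port, b"LED1\n", min_size=16)
    print(sent, fake_port.getvalue())
```

Whether padding actually lowers latency depends on the converter chip and its driver, so measure each size rather than assuming it helps.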
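To see why the host's polling, rather than the wire speed, dominates the latency in item 3, here is a rough estimate in Python. It assumes a 1 ms polling interval and ignores USB protocol overhead, bit stuffing, driver buffering and OS scheduling, so it is an optimistic figure rather than a prediction. `worst_case_latency_ms` is a hypothetical helper.

```python
def worst_case_latency_ms(payload_bytes: int, bit_rate_bps: float,
                          poll_interval_ms: float = 1.0) -> float:
    """Optimistic worst case: wait a full polling interval, then clock the payload out.

    Ignores packet headers, CRC, bit stuffing, driver buffering and OS
    scheduling, so measured latency will be higher.
    """
    transfer_ms = payload_bytes * 8 / bit_rate_bps * 1000.0
    return poll_interval_ms + transfer_ms


if __name__ == "__main__":
    # 64 bytes at full speed (12 Mbit/s) take about 0.043 ms on the wire,
    # compared with up to 1 ms spent waiting for the host to poll.
    print(worst_case_latency_ms(64, 12_000_000))
```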
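One likely reason for the throughput gap in item 4 is that on an FTDI-based Arduino the data still has to cross the ATmega's UART, so the serial baud rate caps throughput, whereas the Maple's on-chip USB has no UART in the path. Here is a rough Python calculation, assuming the common `Serial.begin(115200)` setting with 8N1 framing. The helper name `uart_bytes_per_second` is hypothetical.

```python
def uart_bytes_per_second(baud: int, data_bits: int = 8,
                          parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Payload bytes per second over an asynchronous serial link.

    Each byte is framed by one start bit plus optional parity and stop bits.
    """
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame


if __name__ == "__main__":
    uart = uart_bytes_per_second(115_200)  # 8N1 -> 10 bits on the wire per byte
    usb_raw = 12_000_000 / 8               # full-speed signalling rate, before protocol overhead
    print(f"UART at 115200 baud: {uart:.0f} B/s")
    print(f"Full-speed USB ceiling: {usb_raw:.0f} B/s")
```

Real USB throughput is well below the raw 12 Mbit/s once protocol overhead is counted, but at typical baud rates the UART, not the USB link, is still the bottleneck. This calculation says nothing about latency, which remains unmeasured.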
(I am not an employee of Leaf Labs)