LeafLabs Garden » Topic: Computation with floats

http://forums.leaflabs.com/topic.php?id=852

gbulmer on "Computation with floats", Tue, 21 Jun 2011 17:49:13 +0000
http://forums.leaflabs.com/topic.php?id=852&page=2#post-5370

cromda - good test. I can't explain 10x. I am willing to believe a 2x difference, maybe even 3x, for similar code on PIC32 vs STM32F, but 10x has me flummoxed. For a difference that big, I assume the algorithms are different. I did read that the PIC32 UNO tool chain contains a proprietary part 'for performance reasons', but I have no idea what that may mean. Maybe I need to get one of those PIC32 UNOs.

cromda on "Computation with floats", Tue, 21 Jun 2011 16:25:48 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5368

Hi crenn, I have a SparkFun Razor 9DOF (old version), which works fine with the Python viewer. But the SF9DOF_AHRS_1_0 software doesn't compile on the UNO32 (the bottom window is full of red garbage). The code doesn't compile on the Maple32 either (white garbage). What should I do with the oscilloscope?

crenn on "Computation with floats", Mon, 20 Jun 2011 16:10:57 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5361

cromda, do you have access to an oscilloscope and a SparkFun Razor 9DOF? It could be using a lookup table to implement those functions, but I'm now curious as to what happens when you run the Razor's IMU code on the UNO32.

cromda on "Computation with floats", Mon, 20 Jun 2011 15:28:34 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5359

gbulmer, in order to check your theory, I modified the code running on the UNO32 so that the computed value is used and should not be "optimised out":

    /* Floating-point computation speed */
    float y = 0;
    float tv = 0;
    float tb = 0;
    float a = 0;
    float p = 0;
    float pi = 3.14159265359;
    int N = 10000;
    unsigned long t1 = 0;
    unsigned long t2 = 0;

    void setup() {
      Serial.begin(9600);
    }

    void loop() {
      p = 100 * 2 * pi / N;

      // time for base loop
      a = -p;
      t1 = micros();
      for (int i = 0; i <= N; i++) {
        a = a + p;
        y = a;
      }
      t1 = micros() - t1;
      tv = float(t1) / float(N + 1);

      // time for loop with function to check
      a = -p;
      t2 = micros();
      for (int i = 0; i <= N; i++) {
        a = a + p;
        y = tan(a);  // function to check
      }
      t2 = micros() - t2;
      tb = float(t2) / float(N + 1);

      // print the results to the serial monitor:
      Serial.print("empty loop = ");
      Serial.print(tv);
      Serial.print(" us");
      Serial.print("\t computation = ");
      Serial.print(tb - tv);
      Serial.print(" us");
      Serial.print("\t last x = ");
      Serial.print(a);
      Serial.print("\t last function value = ");
      Serial.print(y);
      Serial.println();
    }

The average time to compute a tangent is 5.58 µs (a little smaller than before: 6.59 µs), so the quick and dirty code I used at first was not "optimised out" on the UNO32. In addition, if some code were "optimised out", as crenn already mentioned, the computation time should not increase with complexity (+ - / sqrt sin tan).

So I think that the UNO32 really is up to 10 times faster than the Maple32 for float computations. Maybe the UNO32 code uses pre-computed tables and interpolation instead of iterative routines? It would be interesting to compare UNO32 and Maple32 accuracy when computing with floats.
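
The table-plus-interpolation idea, and the accuracy comparison suggested with it, can be sketched in plain desktop C++. Nothing in this thread confirms that the UNO32 library works this way. The table size, the names `make_sin_table`, `table_sin` and `max_table_sin_error`, and the choice of sin rather than tan are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical table size; a real library may use something quite different.
constexpr int kTableSize = 256;
constexpr float kTwoPi = 6.28318530718f;

using SinTable = std::array<float, kTableSize + 1>;

// Sample sin() at kTableSize + 1 evenly spaced points over one period.
SinTable make_sin_table() {
    SinTable table{};
    for (int i = 0; i <= kTableSize; ++i) {
        table[i] = std::sin(kTwoPi * static_cast<float>(i) / kTableSize);
    }
    return table;
}

// Approximate sin(x) for finite x by linear interpolation between
// the two neighbouring table entries.
float table_sin(const SinTable& table, float x) {
    float turns = x / kTwoPi;
    turns -= std::floor(turns);                   // wrap into one period
    float pos = turns * kTableSize;               // fractional table index
    int idx = static_cast<int>(pos);
    if (idx >= kTableSize) idx = kTableSize - 1;  // rounding can give exactly 1.0
    assert(idx >= 0 && idx < kTableSize);
    float frac = pos - static_cast<float>(idx);
    return table[idx] + frac * (table[idx + 1] - table[idx]);
}

// Largest absolute difference from std::sin over a sweep of [-10, 10].
float max_table_sin_error(const SinTable& table) {
    float worst = 0.0f;
    for (int i = -10000; i <= 10000; ++i) {
        float x = static_cast<float>(i) * 0.001f;
        float err = std::fabs(table_sin(table, x) - std::sin(x));
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_table_sin_error(make_sin_table())` gives one number for the accuracy cost of the table. Running the same kind of sweep on each board against a trusted reference would be one way to compare the two float libraries.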

gbulmer on "Computation with floats", Fri, 17 Jun 2011 18:18:49 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5331

cromda - The code doesn't use any results from float(), sqrt(), or tan(), so if the compiler were capable of spotting that those functions have no side effects, it could eliminate them.

I had a quick look at the gcc 4.5 release notes and, as zoofdxp wrote, they talk about some improvements. Several may matter, for example:

"A new link-time optimizer has been added (-flto). When this option is used, GCC generates a bytecode representation of each input file and writes it to special ELF sections in each object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit. This enables interprocedural optimizations to work across different files (and even different languages), potentially improving the performance of the generated code."

If PIC32 uses ELF object files, this might be enough for gcc to spot that some of those functions have no side effects and can be eliminated, which might account for the significant (~10x) improvement shown on the UNO32. Maybe worth investigating.

poslathian on "Computation with floats", Thu, 16 Jun 2011 16:48:59 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5274

An eventual flavor of Maple in the pipeline - not anytime soon, but in the pipeline - will be armed with a Cortex M4 with a floating-point unit.

cromda on "Computation with floats", Thu, 16 Jun 2011 16:08:28 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5271

crenn: yes, the loop runs 10001 times instead of 10000. The computation time increases with the complexity of the function tested (+ - * / sqrt, sin, tan); is that compatible with the code being optimised out?
Snigelen: I tested with sqrtf, sinf and tanf instead of sqrt, sin and tan, and the times are smaller: 8.14 µs, 32.89 µs and 57.93 µs instead of 25.43 µs, 55.08 µs and 92.48 µs. It's better, but the UNO32 remains faster.

robodude666 on "Computation with floats", Mon, 13 Jun 2011 21:00:06 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5180

I suppose if you need high precision, it's worth the extra time. It might be worth looking at the TI TMS320 (http://en.wikipedia.org/wiki/Texas_Instruments_TMS320) then. For me, I'd be happy to lose 2.3% accuracy on a sin calculation if it means 4x faster performance.

crenn on "Computation with floats", Mon, 13 Jun 2011 20:24:48 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5176

cromda, just a note: your code is running 10001 passes, not 10000 as you intended. I'm also wondering if the PIC compiler is optimising out the tan calculations, as the data is never used.

snigelen, good point. I'm not sure about the PIC, but I know that 'doubles' on the Arduino have been recast as floats. This has probably been done on the PIC as well to make it 'directly' compatible.

robodude666, with fixed-point maths it's possible to lose some of the precision in calculations. Whether it's better to use fixed-point or floating-point maths depends on your application.

robodude666 on "Computation with floats", Mon, 13 Jun 2011 16:42:19 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5166

Keep in mind that the STM32 is running at 72 MHz @ 1.25 DMIPS/MHz. The "UNO32" uses a PIC32MX320F128H, which runs at 80 MHz @ 1.56 DMIPS/MHz. While these are only theoretical numbers, the difference between them is quite large. Regardless, neither of these microcontrollers has a fully dedicated floating-point unit, so why bother using floating-point calculations?
I've experienced for myself what kind of performance fixed-point mathematics can yield and don't see any reason to use floating-point math.

snigelen on "Computation with floats", Mon, 13 Jun 2011 14:15:41 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5159

sin, tan, sqrt etc. do the computation with doubles. sinf, tanf, sqrtf etc. are for floats. They will probably be a little faster.

cromda on "Computation with floats", Mon, 13 Jun 2011 13:27:05 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5158

Here is the (quick and dirty) code I wrote to measure computation speed:

    /* computation time
       r. cormier 2011-06-11 */
    float x = 0;
    float y = 0;
    float z = 0;
    float tv = 0;
    float tb = 0;
    int N = 10000;
    uint32 t1 = 0;
    uint32 t2 = 0;

    void setup() {
    }

    void loop() {
      // time for base loop
      x = 0;
      t1 = micros();
      for (int i = 0; i <= N; i++) {
        x = float(i);
        y = sqrt(x);
        z = x;
      }
      t1 = micros() - t1;
      tv = float(t1) / float(N);

      // time for tan
      x = 0;
      t2 = micros();
      for (int i = 0; i <= N; i++) {
        x = float(i);
        y = sqrt(x);
        z = tan(y);
      }
      t2 = micros() - t2;
      tb = float(t2) / float(N) - tv;

      // print the results to the serial monitor:
      SerialUSB.print("t base loop : ");
      SerialUSB.print(tv);
      SerialUSB.print(" us");
      SerialUSB.print("\t tan computation : ");
      SerialUSB.print(tb);
      SerialUSB.print(" us");
      SerialUSB.println();
    }

I used that code to check the Maple's computation with double instead of float: the computation times with doubles are a little smaller. So I think that the Maple only computes in double. When performing a float computation, the floats are converted to double and the result is converted from double back to float. This is the reason why float computation is a little slower than double computation, and why computation speed is much lower on the Maple32 than on the UNO32. Once the Maple has a dedicated float library, float computation should compare well with the UNO32.
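
The double-versus-float question can be checked by timing both variants side by side. The following is a minimal sketch in standard desktop C++, not Maple or UNO32 code: it assumes `std::chrono` in place of `micros()`, and the names `micros_per_call`, `tan_via_double` and `tan_via_float` are hypothetical. Each result is written to a `volatile` variable so that the compiler cannot drop the calls as unused.

```cpp
#include <cassert>
#include <chrono>
#include <math.h>

// Float in, float out, but the work is done in double precision:
// the conversion path described in the post above.
float tan_via_double(float x) {
    return static_cast<float>(tan(static_cast<double>(x)));
}

// Float in, float out, using the single-precision routine directly.
float tan_via_float(float x) {
    return tanf(x);
}

// Average time per call of fn, in microseconds, over `passes` calls.
template <typename Fn>
double micros_per_call(Fn fn, int passes) {
    assert(passes > 0);
    volatile float sink = 0.0f;  // volatile store keeps every call alive
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i) {
        sink = fn(static_cast<float>(i) * 0.001f);
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / passes;
}
```

Comparing `micros_per_call(tan_via_double, 10000)` with `micros_per_call(tan_via_float, 10000)` gives the two averages. Unlike the loop above, this one runs exactly `passes` times, so the divisor matches the pass count. The numbers from a desktop CPU with a hardware FPU say nothing about either board; only the measuring pattern carries over.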

crenn on "Computation with floats", Mon, 13 Jun 2011 11:26:32 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5151

Can we see the code used for testing the 3 platforms?

zoofdxp on "Computation with floats", Sun, 12 Jun 2011 19:54:30 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5147

The Maple IDE 0.0.11 package uses CodeSourcery's 2010q1 compiler. There have been two releases/updates since; the latest is 2011.03-42. In the interest of code stability I can understand not mucking with the compiler and introducing possible errors, but I was wondering if anyone has tried the latest compiler to see if it works well and possibly produces more optimized code. Their site states "Significant optimization improvements in GCC". These optimizations could just be compile performance, but maybe there are processor performance improvements as well.

Microchip's FP library must be incredibly hand-tuned: the disparity in performance is far beyond the 72 MHz to 80 MHz clock speed difference. Either that, or the MIPS architecture as they have implemented it is embarrassingly superior to ST's ARM.

robodude666 on "Computation with floats", Sun, 12 Jun 2011 13:25:19 +0000
http://forums.leaflabs.com/topic.php?id=852#post-5146

Unlikely =]. The STM32 used in the Maple, and just about every other MCU beyond DSP-specific chips like TI's TMS320, do not have floating-point units on them. Doing floating-point calculations is like grinding teeth. If you're using the built-in functions via the math.h header you'll get rather poor results, as you've seen for yourself.

However, the STM32F103xxx does have a 72 MHz clock, as you mentioned, and regular fixed-point mathematics is blazing fast! If you will be doing a lot of computation, try out the libfixmath library (http://code.google.com/p/libfixmath/). It provides fixed-point variants of trig, sqrt, exp, etc.
as well as basic operations such as addition, subtraction, multiplication and division. I've seen an average 2-4x increase in performance from switching to this library vs. the "built in math.h". It takes up very little flash/RAM and is rather simple to use once you get it set up.

-robodude666
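
To show what fixed-point arithmetic of this kind looks like, here is a minimal Q16.16 sketch in standard C++. It is not libfixmath's API: the names `q16_t`, `q16_from_float`, `q16_to_float`, `q16_add`, `q16_mul` and `q16_div` are hypothetical, and overflow handling is left out. It assumes a value is stored as a 32-bit integer holding the real number scaled by 65536.

```cpp
#include <cassert>
#include <cstdint>

// Q16.16: 16 integer bits and 16 fractional bits in a 32-bit integer.
using q16_t = std::int32_t;
constexpr q16_t kQ16One = 65536;  // 1.0 in Q16.16

// Convert from float, rounding to the nearest representable value.
constexpr q16_t q16_from_float(float v) {
    return static_cast<q16_t>(v * kQ16One + (v >= 0 ? 0.5f : -0.5f));
}

constexpr float q16_to_float(q16_t v) {
    return static_cast<float>(v) / kQ16One;
}

// Addition and subtraction are plain integer operations.
inline q16_t q16_add(q16_t a, q16_t b) { return a + b; }

// The raw product carries a factor of 65536 * 65536, so widen to
// 64 bits and divide one factor back out.
inline q16_t q16_mul(q16_t a, q16_t b) {
    return static_cast<q16_t>((static_cast<std::int64_t>(a) * b) / kQ16One);
}

// Pre-scale the dividend so the quotient keeps its 16 fractional bits.
inline q16_t q16_div(q16_t a, q16_t b) {
    assert(b != 0);
    return static_cast<q16_t>((static_cast<std::int64_t>(a) * kQ16One) / b);
}
```

For example, `q16_mul(q16_from_float(1.5f), q16_from_float(2.0f))` yields the Q16.16 encoding of 3.0. Every operation reduces to integer instructions, which is why this style suits a core without an FPU, at the cost of a fixed range and a resolution of 1/65536.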