LeafLabs Garden » Topic: Speed of If statements

ala42 on "Speed of If statements" (Thu, 18 Jul 2013 18:10:05 +0000)
http://forums.leaflabs.com/topic.php?id=12887&page=2#post-27921

Just use a mask that fits the width of the variables you use.

pyrohaz on "Speed of If statements" (Thu, 18 Jul 2013 17:53:58 +0000)
http://forums.leaflabs.com/topic.php?id=12887&page=2#post-27920

Cheers for the reply Magnus, this certainly looks promising! I'll be sure to implement it. Since all of my samples are 16-bit, I just replace 0xFFFFFFFF with 0xFFFF, right? Cheers,

mlundinse on "Speed of If statements" (Wed, 17 Jul 2013 02:00:56 +0000)
http://forums.leaflabs.com/topic.php?id=12887&page=2#post-27862

The bitcrush function seems to set the lowest BitDepth bits of Sample to 0. The same effect can be achieved with a bitwise AND, using a bitmask that is precomputed in the main loop. In the loop:

<pre><code>if (BitcrushOn == 1)
  BitCrushMask = (0xFFFFFFFF << BitDepth);
else
  BitCrushMask = 0xFFFFFFFF;</code></pre>

So if the bitcrush function is off, the mask will be all ones. In the interrupt:

<pre><code>Sample = Sample & BitCrushMask;</code></pre>

A single bitwise AND operation with no branch.
I haven't tested this, but it should work :) Regards, Magnus

pyrohaz on "Speed of If statements" (Tue, 16 Jul 2013 17:48:24 +0000)
http://forums.leaflabs.com/topic.php?id=12887&page=2#post-27853

Right, I've completely rewritten the code today. It is much cleaner now, and the non-interrupt section takes about 60us, which includes reading multiple inputs (one of them through the 4051 mux chip) along with writing to a few external shift registers. This has relaxed the requirements a bit, as I can now spend a bit more processor time in the interrupt than originally.

I've taken your advice on using correct data types, and I completely agree: the code is much easier to read now that it is clear which variables are boolean etc.

One thing to ask about functions within the Maple IDE is the use of tabs. I've been putting my functions in a separate tab so I can keep a handle on every part of the code with ease. One thing I've found, though, is that declaring a variable in one of these tabs is not the same as declaring it in the main tab (the one named after the program). This is no problem, as I just declare the variables used in a tab inside that tab, and the global variables in the main tab. Does this mean that the compiler will treat variables declared globally differently from variables declared in these tabs?

I really think that I'm going to have to make my code more efficient. I can now see that an if statement isn't a MASSIVE consumer of interrupt time (I do, after all, only have 14 or so, and two or three of them are for output clipping; I could probably do this another way?). Unfortunately, the if statements are decided at runtime, so I don't think I'm helping the compiler there. Most are simple statements though. The way I'm processing a sample (let's use bitcrushing as an example):

<pre><code>... Sample
if (BitcrushOn == 1) {
  Sample = (Sample >> BitDepth);
  Sample = (Sample << BitDepth);
}
...</code></pre>
As the code above says, if BitcrushOn is 0, the sample is not bitcrushed and passes straight through; most of the if statements in the sample interrupt are like this. And that is a completely fair assumption, I don't claim to be a good coder at all! It's all part of the learning process for me. Is there any simple tutorial showing how to output my code as ASM so I can take a look?

ala42 on "Speed of If statements" (Mon, 15 Jul 2013 17:11:24 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27838

>You say about putting each if statement into a separate function, will this really save time?

No, this is nonsense; no one said this AFAIK.

gbulmer on "Speed of If statements" (Mon, 15 Jul 2013 16:58:38 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27837

pyrohaz - "I reckon I can shave off a few more us by correcting the data types."

You're unlikely to get much from this. But it is better to be accurate.

"I never would've imagined changing the variable declaration would have such a large impact on the time consumption!"

To me, there are two issues. One is how to optimise the piece of code you have. The other is how to gain portable, long-term skills which will help in this case, and also be useful for years to come. IMHO learning how to use the tools properly will significantly help you in this case, and will likely be repeatedly useful. Learning to use the tools will let you collect evidence and do proper analysis, and analysis is superior to opinion. For example, looking at the mingled source+assembler listing would let you see how a variable definition has an effect, and might help you understand why it has that effect. You could see how many assembly language instructions are in a section, and hence which version is likely an improvement. You might develop an improved mental model of what volatile, static, extern, automatic, const, *p, a[], if, while, switch, etc.
mean to a compiler.

"You say about putting each if statement into a separate function"

This is quite a complex idea. Fundamentally, to speed up code, it needs to execute fewer instructions, or quicker instructions. In general, the best approach is to use a better algorithm. Next, try to improve the code. Branch instructions are usually more expensive than load or store instructions, which are in turn more expensive than 'arithmetic' (etc.) on registers. Hence reducing the number of branch instructions executed can help. The question is how to do that.

That post was my attempt at explaining a process which is 'almost automatic'. I would use command-line tools like the C preprocessor, or generate text with a scripting language, to keep the code maintainable. If all values used in an <code>if</code> test are constant, and known at compile time, the compiler can dramatically transform and improve the generated code. So if there are 16 paths through the code, produce 16 functions containing the code, each with a specific set of constants that forces a unique path. Each function should be much quicker than the original, 16-path-capable code.

EDIT: "I see that you've mentioned the command line compiler, now I'm a bit of a programming 'newb' if I'm honest, is this suitable for Windows?"

Is it fair to say you don't understand the components that make up the IDE? The IDE that you see (with menus and windows) is pretty much a fancy text editor which can also run the command-line compiler, linker and upload program. The compiler, linker and upload program do the majority of the work. The command-line compiler was installed on your system when you installed the IDE. It *is* the compiler which the IDE runs to generate code; there is no other compiler. AFAIK, most C/C++ IDEs use a command-line compiler, and many, including some commercial products, use GNU's gcc. So yes, the command-line compiler is suitable for Windows, and it is the only way the IDE generates code for the Maple.
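gbulmer's 'one function per path' idea, combined with the function-pointer selection he describes elsewhere in this thread, might look roughly like this. It is a minimal host-side C sketch, not code from the thread: the two effects (`crush`, `halve`), the constant `CRUSH_BITS` and the names `process`, `select_path` and `isr_process` are all hypothetical. Each `path_*` wrapper calls the same inline body with constant flags, so with optimisation enabled gcc can usually drop the if tests inside each copy (the generated listing is the place to confirm that). The main loop picks the path once; the interrupt then makes a single indirect call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compile-time setting for this sketch. */
enum { CRUSH_BITS = 4 };

/* Generic body: called with variable flags, both tests are runtime branches. */
static inline uint32_t process(uint32_t s, bool crush, bool halve)
{
    if (crush)
        s &= UINT32_MAX << CRUSH_BITS;  /* clear the low CRUSH_BITS bits */
    if (halve)
        s >>= 1;                        /* halve the amplitude */
    return s;
}

/* Specialised copies: each passes constants, so after inlining the
   compiler can remove the if tests and any dead code in that copy. */
static uint32_t path_none(uint32_t s)  { return process(s, false, false); }
static uint32_t path_crush(uint32_t s) { return process(s, true,  false); }
static uint32_t path_halve(uint32_t s) { return process(s, false, true);  }
static uint32_t path_both(uint32_t s)  { return process(s, true,  true);  }

typedef uint32_t (*path_fn)(uint32_t);

/* Written by the main loop, read by the interrupt, hence volatile. */
static path_fn volatile current_path = path_none;

/* Main loop: call whenever a setting changes. */
static void select_path(bool crush, bool halve)
{
    static const path_fn table[2][2] = {
        { path_none,  path_halve },
        { path_crush, path_both  },
    };
    current_path = table[crush][halve];
}

/* What the sample interrupt would do: one indirect call, no if tests. */
static uint32_t isr_process(uint32_t sample)
{
    return current_path(sample);
}
```

With n independent on/off settings this grows to 2^n wrappers, which is why the post suggests generating them with the preprocessor or a script rather than by hand.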
pyrohaz on "Speed of If statements" (Mon, 15 Jul 2013 10:47:38 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27834

Right, I will change the variables to the appropriate types now that it is cleared up that bytes etc. are processed as fast as ints. Changing from volatile to static made a dramatic difference and shaved off 3.5us! Definitely keeping that change, thank you for that. I reckon I can shave off a few more us by correcting the data types. This is a great saving, as it has reduced processor consumption to 54%!

I will take your advice and stay away from learning assembly for now; I don't think I really understood just how efficient this compiler was, so my apologies there. Since I'm new to assembly, I don't think I'd be able to code better than a compiler anyway.

I see that you've mentioned the command line compiler, now I'm a bit of a programming 'newb' if I'm honest, is this suitable for Windows?

There are quite a few variables that I can declare as constants, so I'll make sure to change these.

You say about putting each if statement into a separate function, will this really save time? As I can choose what form of processing is done on the data, how will using a function save time here, as I would've thought I'd need an if statement within the function to choose?

I'd love to have a go at changing the compiler but I'll be honest, I haven't really got a clue how to do it! Thank you very much for all the help so far, I never would've imagined changing the variable declaration would have such a large impact on the time consumption!

@ala42: I'd love to post the code but it's a bit sensitive at the moment; I'd hate to have to make you all sign an NDA, sorry about that! Harris

ala42 on "Speed of If statements" (Sun, 14 Jul 2013 17:58:46 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27823

Post your code so we can have a look...
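mlundinse's mask-based bitcrush near the top of this thread, together with ala42's point about sizing the mask to the data, can be checked against pyrohaz's shift-based version on a PC. The sketch below assumes unsigned 16-bit samples (`uint16_t`), so a 16-bit mask is enough (for such samples a 32-bit mask gives the same result, since the upper bits are discarded). The function names are hypothetical. Because the mask is computed in the main loop, the interrupt does one AND with no branch, in line with ala42's advice not to re-evaluate rarely changing conditions inside the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared with the sample interrupt; written only by the main loop. */
static volatile uint16_t bitcrush_mask = 0xFFFF;

/* Main loop: recompute the mask only when the settings change. */
static void update_bitcrush_mask(bool bitcrush_on, unsigned bit_depth)
{
    assert(bit_depth < 16);  /* keep the shift within the 16-bit sample */
    bitcrush_mask = bitcrush_on ? (uint16_t)(0xFFFFu << bit_depth) : 0xFFFF;
}

/* Interrupt: a single AND, no branch. */
static uint16_t bitcrush_isr(uint16_t sample)
{
    return (uint16_t)(sample & bitcrush_mask);
}

/* The original if-based form, kept for comparison. */
static uint16_t bitcrush_if(uint16_t sample, bool bitcrush_on, unsigned bit_depth)
{
    if (bitcrush_on) {
        sample = (uint16_t)(sample >> bit_depth);
        sample = (uint16_t)(sample << bit_depth);
    }
    return sample;
}
```

Checking every 16-bit input at every depth, as the tests do, is cheap on a PC and gives more confidence than a few spot values.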
gbulmer on "Speed of If statements" (Sun, 14 Jul 2013 09:00:35 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27818

pyrohaz - "As mlundinse mentioned, within the interrupt, what data types should I be using?"

As I wrote: "- Tell the compiler 'the truth, the whole truth, and nothing but the truth', don't claim volatile or extern, or int when it isn't". I don't know how to make it any clearer and still be succinct. If something is byte-sized, use byte or char/unsigned char, etc. If it is only used within a function, declare it within the function. If it must be stored between function calls but is not shared, make it <code>static</code> to the function or file. As mlundinse explained, there are ways to avoid using volatile, and doing this may have a much bigger impact than tiny details. Have you changed <code>volatile</code> into <code>static</code> and measured the effect?

I do not know of a document which explains this sort of stuff. Some of it is clear if you think about it, e.g. static vs extern or volatile. Some of it can be worked out by looking at the mingled source+assembler listing and reading the appropriate ARM documentation. Some of it needs very sensitive and accurate measurement. If you had access to professional ($1000+) tools, you would want the cycle-accurate simulator, which might answer some of these questions for you.

You could get much of the information likely to answer questions about the code by running the command-line compiler with the two-character option that generates the intermingled source+assembler listing. There is no need to use different libraries. There is no need to use a different IDE. You could use another IDE if it gave access to the compiler's command-line options, and use libmaple, but that may be more work.
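The declaration rules in the paragraph above might look like this in practice. This is a minimal sketch with hypothetical names (`effect_select`, `smooth_pot`, `set_effect`): only the variable genuinely shared with the interrupt is `volatile`, state that must survive between calls is a `static` local, per-call scratch is an ordinary local the compiler can keep in a register, and byte-sized values use a byte type.

```c
#include <assert.h>
#include <stdint.h>

/* Written by the main loop and read by the (hypothetical) sample interrupt:
   the one variable here that genuinely needs volatile. */
static volatile uint8_t effect_select;

/* Main loop helper: average of the last 4 pot readings. */
static uint16_t smooth_pot(uint16_t raw)
{
    /* Kept between calls but not shared with the ISR: static, not volatile. */
    static uint16_t history[4];
    static uint8_t next;                  /* byte-sized, so a byte type */

    history[next] = raw;
    next = (uint8_t)((next + 1u) & 3u);   /* wrap 0..3 */

    /* Only needed during this call: a plain local. */
    uint32_t sum = 0;
    for (uint8_t i = 0; i < 4; ++i)
        sum += history[i];
    return (uint16_t)(sum >> 2);          /* divide by 4 */
}

/* Main loop: publish a new effect choice to the interrupt. */
static void set_effect(uint8_t effect)
{
    effect_select = effect;
}
```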
There is a command-line technique for programming the Maple described at <a href="http://leaflabs.com/docs/unix-toolchain.html" rel="nofollow">http://leaflabs.com/docs/unix-toolchain.html</a> You can skip the start, because you already have the toolchain installed and working; it comes with the IDE. You will need to create a small piece of a makefile which names the source code files you have written. The build process is automated by the LeafLabs makefile; you just type <code>make</code>. There are threads on this forum which answer most questions about using this mechanism. It is probably worth a couple of hours of your time to try to get it working. It will give you more accurate answers than you are likely to get by asking questions on a forum (unless the person answering runs the tools themselves).

"When setting a variable as conditions, how am I meant to do it? Do I do it all with bitwise comparisons or boolean comparisons (As in & or &&)?"

1. Your measurement mechanism should be sensitive, accurate, and robust enough to tell you the answer. If it isn't, it isn't good enough for the resolution this question requires you to work at. IMHO, if the measurement system isn't good enough, you should stop working at such a tiny level of detail, and focus on changes with a bigger impact.

2. If you looked at the mingled source code + assembler listing, you might be able to work this out, armed with the appropriate ARM documents, e.g. DDI0337E_cortex_m3_r1p1_trm.pdf, which you can download from the arm.com web site. It has a chapter on instruction timings. It is relatively straightforward, and can be read initially in maybe an hour. As you read the intermingled source+assembler listing you'll want to refer back to that manual. It'll likely take a few days to get familiar. This will likely be so much quicker than trying to learn ARM assembly language any other way that it isn't even funny.

3.
The question is likely more complex than can be reliably answered without looking at the code generated by the compiler. The compiler might analyse the code and find a way to use a single variable with a few bit flags more efficiently than separate variables. The code sequence might be sensitive to tiny changes, hence I would not guess. So I believe gathering real evidence using the compiler will be far more accurate and productive than asking people to guess!-)

IMHO the list of techniques at <a href="http://www.crowl.org/lawrence/programming/Bentley82.html" rel="nofollow">http://www.crowl.org/lawrence/programming/Bentley82.html</a> is worth pursuing if you want to get good at this sort of thing.

At the level of detail you seem to be focusing on, a good technique is to use constants instead of variables. This also means variables are defined as late as practical (preferably as <code>const</code>), which is usually a 'good thing'. Using constants gives the compiler good opportunities to optimise for you. You might, for example, generate specific code for the major alternative paths by copying the source of the interrupt routine into several different functions (as many as there are independent if tests), and filling in the constants in the if statements. You might see which stuff becomes irrelevant, or be able to fill in more constants. By putting each path into a separate function, the compiler can analyse more specific code. Then you might get rid of more if tests by 'calculating' which of the alternative functions applies, and using a function pointer to choose the single path. I am not suggesting this is a 'must do' technique. However, the simpler, more specific and transparent your code, the more likely the compiler will analyse it and generate the best code for you.

EDIT: My goal is to avoid writing assembler to 'beat the compiler', or having mixed-language programs. They are hard to maintain, and hard to get other folks to take up and use.
Instead I try to simplify the code, so the compiler can optimise the s#!t out of it! In my experience, this is quicker and easier than learning to write better assembler than the compiler. More importantly, it is a more transferable, portable, useful, long-term skill. It can even result in more maintainable code!

The compiler has a lot of optimisations which it does not use at the default optimisation level. IFF you could use the command-line build process, you could experiment with those options and get at the mingled assembler+source listings. You might gain deeper insight into your code and how it is translated, and as a result gain more dramatic improvements, along with some transferable, portable skills.

From your description, IMHO it would be straightforward to reduce the time spent in the interrupt routine. It might make the entire system run a tiny bit slower. Or, you may reduce the number of <code>volatile</code> memory accesses so much that the system runs much faster. You could get evidence from the mingled source+assembler listing, or from your measurement system using smaller, simpler test cases, and not rely on guesses.

One change which might have a big impact is using a much newer version of gcc. The beauty is that learning how to use a newer gcc might yield a handy, portable, long-term skill. Newer versions usually do a better job of optimisation, and recent versions (from Launchpad) support fixed-point arithmetic. You might find that coding the 'DSP-like' code using a fixed-point data type, properly understood by the compiler, will generate better code than your hand-coded fixed-point approach. Of course, I might be wrong. However, knowing that for sure may be a much more profound insight than anything we can ever guess !-)

pyrohaz on "Speed of If statements" (Sun, 14 Jul 2013 07:35:45 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27817

Ok, that's an easy change I can do then.
When setting a variable as conditions, how am I meant to do it? Do I do it all with bitwise comparisons or boolean comparisons (As in & or &&)? I've fortunately already taken the step to avoid float and double calculations.

ala42 on "Speed of If statements" (Sat, 13 Jul 2013 21:24:47 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27812

Quite obviously the interrupt code should not re-evaluate, every time, complex conditions that are only changed once in a while by the main code. You should also avoid any float or double operations.

pyrohaz on "Speed of If statements" (Sat, 13 Jul 2013 19:55:43 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27809

In addition to the above, some of the if statements have a few complex evaluations (I say complex; they're more than just if A==1). Would it be faster if I said C = (A==1)&(B==1), if(C==1)..., precomputing the condition outside of the interrupt routine?

pyrohaz on "Speed of If statements" (Sat, 13 Jul 2013 19:52:05 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27808

mlundinse - I have a fair few variables that are used only inside the interrupt; will it make a notable difference to the speed if I don't declare these as volatile? Luckily I don't have to call any functions, so I'm ok there.

gbulmer - Wow! I'm utterly impressed with the time that you put into that reply; it answered every question I could have. I think the Wikipedia quotes are brilliant, and I'll definitely keep them in mind from now on when it comes to program optimization. Some optimizations I've done are replacing all possible divisions with bitshifts (for dividing by powers of two). This has so far given me a drastic speed increase.
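Replacing a division by a power of two with a shift is exact for unsigned values, but not for negative signed ones: C division truncates toward zero, whereas gcc implements `>>` on a negative `int` as an arithmetic (sign-extending) shift, which rounds toward minus infinity. The sketch below, with hypothetical function names, shows the difference. The result of `>>` on a negative value is implementation-defined in C, so this assumes gcc's documented behaviour. With optimisation enabled, gcc normally turns an unsigned division by a constant power of two into a shift by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: x / 4 and x >> 2 always agree. */
static uint32_t div4_u(uint32_t x) { return x / 4u; }
static uint32_t shr2_u(uint32_t x) { return x >> 2; }

/* Signed: they differ for negative x.
   x / 4 truncates toward zero; with gcc, x >> 2 sign-extends
   and so rounds toward minus infinity. */
static int32_t div4_s(int32_t x) { return x / 4; }
static int32_t shr2_s(int32_t x) { return x >> 2; }
```

For audio samples the one-step difference is often harmless, but it is worth knowing about when a result must match a division exactly.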
As with the standard model of a DSP system, where you input data with an ADC, process it, then output it with a DAC, in my system I'm reading the input data in the interrupt and processing it there (what is done with the data is selected by lots of if statements, whose conditions are decided outside of the interrupt). External pots are also read outside of the interrupt, as their sample rate isn't particularly important. Data inside the interrupt is 16-bit but processed using 32-bit values, so I can do most calculations by multiplying and shifting. Most of the processing is normal things like filtering (standard MAC-style filtering, mainly first-order IIR stages in series) and bitcrushing (simply shifting down then up again to eradicate the lower bits).

As mlundinse mentioned, within the interrupt, what data types should I be using? Do I need to declare variables that are used in the interrupt but not modified as volatile? Above, you mention using different data types. Will I get any advantages by declaring true/false values as bool vs int?

I have actually invested in the STM32F0 (worse than the Maple specification-wise, but cheap!), though I couldn't find many easily used tutorials. In order of ease, I'd currently say it goes Arduino > Maple > standalone processors with no main dedicated IDE. Since I'd still call myself "new" to programming (<1 year since I started), I've not really learnt how to master an IDE, programming fuses etc. I can manage the low-level Maple stuff, but all the main internals are pre-programmed. Do you know any really useful tutorials for the Discovery set of chips? I've used CooCox to make mine flash LEDs, but it's pretty low-level code and I can imagine it would take me a few months to get to grips with the feel of the chip.
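As an illustration of the 'multiply and shift' style described above for first-order IIR stages, here is a hypothetical one-pole low-pass filter using a Q15 coefficient, 16-bit samples and 32-bit arithmetic. It is a sketch, not pyrohaz's filter: the `lp1_t` type and `lp1_step` name are invented, and it assumes gcc's arithmetic right shift for negative products. The product `diff * a_q15` stays within 32 bits because |diff| is at most 65535 and `a_q15` at most 32767.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t y;      /* previous output, same scale as the input */
    int32_t a_q15;  /* smoothing factor a in Q15: 0 .. 32767 (about 0 .. 1) */
} lp1_t;

/* One sample of y[n] = y[n-1] + a * (x[n] - y[n-1]). */
static int16_t lp1_step(lp1_t *f, int16_t x)
{
    int32_t diff = (int32_t)x - f->y;     /* fits in 17 bits */
    f->y += (diff * f->a_q15) >> 15;      /* multiply, then shift back */
    return (int16_t)f->y;
}
```

Because the shift truncates, the output can stall slightly short of a constant input (a small dead band); keeping extra fractional bits in the state is a common way around that.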
But yes, I would like the interrupt to be a bit shorter so I can do a bit more processing outside interrupt time. I'm currently sitting at about 16us of 22us, which means roughly 70% of processor time is consumed inside the interrupt.

gbulmer on "Speed of If statements" (Sat, 13 Jul 2013 08:30:14 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27804

pyrohaz - I think ala42 identified the most important questions.

There are some <a href="http://en.wikipedia.org/wiki/Program_optimization#Quotes">great quotes about program optimisation at Wikipedia</a>. My favourite is: "The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet."

If you are interested in understanding how to improve code performance, I would still recommend Jon Louis Bentley's "Writing Efficient Programs" (if you can find a copy), or his ACM column, "Programming Pearls", or the books collecting his column. He identifies 6 'levels' at which a program can be optimised, 3 in software and 3 in hardware. Trying to code in assembler instead of a high-level language corresponds to one of those levels, 'translation'. Based on a quick glance, this seems an okay summary: <a href="http://www.crowl.org/lawrence/programming/Bentley82.html" rel="nofollow">http://www.crowl.org/lawrence/programming/Bentley82.html</a>, though it misses that important abstraction.

I think it's important to understand that writing some in-line assembler may worsen program performance. Some people seem to think it is at worst 'net neutral', but it is not; performance can get much worse. Mixing two different languages, C/C++ and assembly language, also makes code harder to understand, change and maintain. This cost may be catastrophic. The compiler does not analyse your assembler code.
So it can't move that code to a more efficient place (maybe outside a loop, or into a limb of an <code>if</code> which isn't executed every time round a loop); it must leave the registers your assembler has 'reserved' free for your assembler code (when it might have made better use of them); and it must make memory look 'right enough' for the assembler to work. The compiler likely understands the processor pipeline, and can sequence instructions to exploit it (IMHO if you have no idea what this means, please don't attempt to write any assembler until you do). Writing in-line assembler generally pokes a hole in the compiler's analysis of the C/C++ code, which is a 'bad thing'. Even a single line of badly judged assembler could force the compiler to generate code which runs a couple of times slower.

Before attempting to write any assembler, make sure you know how to get a listing (from the toolchain) of the assembly language produced by the compiler mingled with the source code, and learn to read it. IMHO, until you've understood this, making assembler changes is unlikely to be rational or to improve program performance. Further, you need a reliable way to make accurate measurements of the code. At the level of improving an <code>if</code> test, that measurement system needs to be sensitive to a couple of machine cycles (one cycle at 72MHz is 1/72 us, about 14 ns).

Assume the compiler has been polished for 10 years by people who intimately understand the architecture of the processor, until proven otherwise. Further, the compiler can keep track of a lot of stuff which a human brain will struggle with. As a simple example, have a look at the assembler code generated by the compiler, then change a couple of variables from <code>volatile</code> to <code>static</code>, and compare the generated assembler code. The set of assumptions the compiler makes changes, and the code reflects that. It is quite hard to make all of the correct changes to the code by hand, as the 'book keeping' can be quite subtle yet tedious.
People tend to make much simpler assumptions and hence write inferior code. A piece of in-line assembler may force the compiler to use much 'safer', less 'aggressive', simpler analysis and optimisations, and the loss may swamp the improvement from the in-line assembly language.

I think one way to do better than a good optimising compiler (e.g. gcc) is to exploit something you know about the code that the compiler can't deduce. That usually means writing better algorithms, or re-organising code to exploit some property of the algorithm. For example, using a sort which exploits the order of the data, or merging for loops to avoid reloading lots of coefficients.

A simple exercise is to try the different compiler optimisation flags, and measure the differences. Another is to change all <code>volatile</code> keywords to <code>static</code>, and measure those differences. This might indicate that it is faster to 'double buffer' data which is shared with the main loop (letting the compiler do the most aggressive optimisation): use a single volatile to tell the main loop which buffer-full of data is ready, but otherwise avoid <code>volatile</code>. Declare data at the top level (outside a function) as <code>static</code> so the compiler knows functions outside the file can't access it.

mlundinse wrote "Cortex M3/4 are 32 bit processors so declaring variables as byte or halfword will in general only make code slower". That was true for earlier generations of ARM processor. However, Cortex-M3/M4 are ARMv7-M architecture processors, where this is not necessarily true. Cortex-M3/M4 have byte and half-word (2-byte) load and store instructions which are the same size and take the same time as word (4-byte int) load and store instructions. So loads and stores of byte (char), 2-byte (short) and 4-byte (int) data are the same size and run at the same speed.
Further, for some data structures like arrays of structs, making the struct bigger by using int for each struct member might run slower, because the compiler may be forced to use less compact addressing modes. I don't think this is the case for C/C++ on Cortex-M3, but in general, using a word (int) where a byte or half-word is correct may generate longer code sequences. For example, in a multiply the compiler might need an extra register for a double-word (8-byte) result. That register might otherwise be available for a variable, saving a memory access.

There is a problem with using non-word data. If a half-word or word crosses the natural 4-byte word boundary, the processor needs to access memory twice, which is slower than reading memory once. This is invisible to the code; it is 'automatic', and is done by the hardware. The compiler may be instructed by a command-line flag to 'pack' data, which would create this problem. The compiler can be told to align on boundaries using the __attribute__ syntax:

<code>short s __attribute__ ((aligned (4))) = 0;</code>

This ensures the variable is on a word (4-byte) boundary. There is a command-line option which makes the compiler align variables so they don't straddle a word boundary, avoiding the extra memory access. I think it is the default, but I don't know what flags the IDE uses. A 'trick' is to define all int variables first, then all shorts, then all chars, to avoid having to worry about this, though that might make code harder to understand.

Summary:
- "The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet."
- Ensure there are extremely good reasons for changing working, tested, debuggable code into harder-to-test, harder-to-debug code
- Ensure there are good measurement mechanisms, sensitive enough to measure differences reliably, before changing code
- The compiler is probably better than a person at generating assembler for simple C statements, and better at using registers
- Using assembler may significantly worsen speed by restricting the compiler's ability to re-organise and optimise code
- Be prepared to analyse the compiler-generated assembly code before attempting to optimise by writing in-line assembly code
- Exploit things the compiler can't deduce from the code to optimise it, for example better algorithms
- Tell the compiler 'the truth, the whole truth, and nothing but the truth', don't claim <code>volatile</code> or <code>extern</code>, or <code>int</code> when it isn't

EDIT { Budget time to get competent with the tools, processor and measurement techniques, otherwise you'll underestimate the cost. IMHO it might be as useful a use of time to use e.g. an STM32F3 or STM32F4 MCU if you can afford the $15, which might give a lot more throughput for heavily DSP-like problems. }

Another question is: do you want the interrupt routine to have a shorter duration, so interrupts are blocked for less time, even if the whole system runs a bit slower? The main reason for using an interrupt is to service I/O. What I/O is happening that takes so long? It sounds like a lot of processing is happening which is not I/O. Should that be moved to main? Such an architecture might be easier to test, measure and debug. Alternatively, the I/O might be better serviced by DMA. Properly using DMA may make a much bigger difference to run-time than re-coding parts of the program in assembly language.
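The 'double buffer with a single volatile' arrangement suggested in this post, which moves processing out of the interrupt and into the main loop, might be sketched as below. The buffer size, the names and the 'sum' stand-in for real processing are hypothetical. `COMPILER_BARRIER` uses gcc's empty-asm-with-memory-clobber idiom to stop the compiler moving the non-volatile buffer accesses across the flag access; that barrier is an addition beyond what the thread discusses. The sketch also assumes the main loop finishes with one block before the interrupt fills the next.

```c
#include <assert.h>
#include <stdint.h>

/* gcc-specific compiler barrier: no memory access is moved across it. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

#define BLOCK_LEN 32u                  /* hypothetical block size */

/* The ISR fills one buffer while the main loop reads the other.
   The sample data itself is not volatile. */
static int16_t buf[2][BLOCK_LEN];

/* The single shared flag: index of the full buffer, or -1 for none. */
static volatile int8_t ready_buf = -1;

/* State used only by the ISR: plain static, not volatile. */
static uint8_t fill_buf;
static uint8_t fill_pos;

/* Would be called from the sample interrupt with each new reading. */
static void isr_store_sample(int16_t s)
{
    buf[fill_buf][fill_pos++] = s;
    if (fill_pos == BLOCK_LEN) {
        COMPILER_BARRIER();            /* buffer writes before the flag */
        ready_buf = (int8_t)fill_buf;
        fill_buf ^= 1u;                /* start filling the other buffer */
        fill_pos = 0;
    }
}

/* Main loop: if a block is ready, sum it (stand-in for real work). */
static int main_loop_poll(int32_t *sum_out)
{
    int8_t b = ready_buf;
    if (b < 0)
        return 0;
    COMPILER_BARRIER();                /* read the data after the flag */
    ready_buf = -1;

    int32_t sum = 0;
    for (uint32_t i = 0; i < BLOCK_LEN; ++i)
        sum += buf[b][i];
    *sum_out = sum;
    return 1;
}
```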
mlundinse on "Speed of If statements" (Thu, 11 Jul 2013 10:42:56 +0000)
http://forums.leaflabs.com/topic.php?id=12887#post-27734

Only I/O registers, and data used both inside the interrupt handler and in the main loop context, must be declared volatile. Volatile means that the value can change unexpectedly, so it must be read from memory every time it is used. That is not necessary for data used only inside the interrupt handler, even if that data is stored in RAM; it will be reloaded at the start of the next interrupt. You can safely call other functions from within the interrupt, but if you want to shave off a few clock cycles, inlining is often better.
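mlundinse's points can be combined in one hypothetical interrupt body. A setting shared with the main loop is `volatile` and read once. State used only inside the handler is a plain `static`. The output clipping pyrohaz mentions is done in a `static inline` helper that gcc can paste into the handler instead of calling it. All names (`gain_q8`, `clip16`, `sample_isr`) and the two-sample average are invented for illustration. The gain is assumed small enough that the product stays within 32 bits, and the negative right shift assumes gcc's arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Shared with the main loop, so volatile. Q8 gain: 256 means 1.0. */
static volatile int32_t gain_q8 = 256;

/* Small helper; static inline lets gcc expand it in place
   rather than paying for a call and return. */
static inline int16_t clip16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Body of a hypothetical sample interrupt: average of the current and
   previous input, scaled by the gain, clipped to 16 bits. */
static int16_t sample_isr(int16_t in)
{
    /* Used only inside the handler: lives in RAM between interrupts,
       but needs no volatile. */
    static int16_t prev;

    int32_t g = gain_q8;                         /* read shared value once */
    int32_t out = (((int32_t)in + prev) * g) >> 9;   /* /2 average, /256 gain */
    prev = in;
    return clip16(out);
}
```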