October 7, 2009

Module : Console Development Part 3

Posted in Development Report - PSP Demo at 13:35 by markcampbellprogrammer

Optimisation Techniques

After learning a little bit about the compiler and the way executables are made for the PSP, the main aim I have for writing my project is to keep my data structures and programming as simple as possible. The more complex calculation, the longer amount of time the complier will take to change my code into machine code and thus slowing down operations (usually). Below is a short list of methods I will be using when writing my program –

  • Use inline functions for small functions called a lot.
  • Avoid floating point arithmetic, especially for divide and multiplication operations.
  • Use integers with integers and floats with floats as conversion can slow processing.
  • Use bit shifting by powers of 2 for integer multiplication and division.
  • When using matrix operations make use of sparseness i.e. zero entries.
  • Avoid square roots and trigonometry functions.
  • When performing mathematical calculations, see if I can reduce the expressions manually before coding them. For example, n*(f+1)/n is equivalent to (f+1) because the multiplication and division of n cancel out.
  • Cache any complex mathematical results that may be used later on down the pipeline.
  • Make effective use of loop unrolling.
  • Make use of look up tables for fixed values, especially with cos and sin functions.
  • Use ASM language to gain that little bit more speed.
  • Use pro tools like VTune for effective profiling.
  • Make use of effective memory allocation by using pools, keeping data structures together and reuse immediately.

The main focus is to get something working on screen as soon as possible then to make use of these techniques to make the demo run faster and more efficiently.  There is no point in trying to write the most optimized code from the start as it is likely to not work or perform as well as refactored code.  What I aim to do is to apply these techniques to code that is already working and behaving as well as it should to improve performance and speed.

“Pre-mature optimization is the root of all evil”

Implementation

After deciding to do try and implement the plasma fractal, I decided to first investigate exactly what happens to create this effect, and more importantly how quickly I can implement it for the PSP. I started working through one of the examples and reading up on the online documentation.  

How the plasma effect is made

The plasma effects are generated by looping through colours and warping them in a way to create a fluid liquid type effect.

There are quite a few ways to implement the plasma effect so I decided to first look at the best methods to implement one relevant to the PSP’s hardware and then look at ways to optimise the performance once I had a demo up and running. 

The first step for the demo is to set up the screen using a cosine wave.  As cosine always gives a value between -1 and +1, I can use this to set the colour value of the pixel to the sine of its x co ordinate. This then can be set to 2 different colours, each representing the peaks and valleys of the wave.

Implementing a colour palette can also be done in many ways however the way I decided to do it was to have a data structure holding the hex values of the colours I wanted to use and then use the pre – calculated cos values stored in a buffer to re – draw the pixels.     

Key Optimisation

Look Up Tables – Using functions like sin and cos can be expensive to compute and so instead I decided to use a look up table which instead only requires a simple array indexing operation. A value is returned from memory rather than working out a significantly more expensive operation. I used 2 look up tables to hold the data for the cosine computations as well as the hex values for the colour tables.

Loop unrolling – In the update method for my program I originally had the row colour data being worked out in 1 big for loop which meant that a lot of data was being processed all at once causing a slight bottleneck in my program. I decided to compute each row in its own loop rather than let the data all be done at once.

Using a 1D instead of a 2D array – The screenData.h file which holds the co – ordinate and colour data was originally in a 2D array which I was using to work out the 2 columns for the plasma. However after realising that the y co – ordinates was not really needed when working out this step in the program, I instead set just the x value as there is no change in the y value which meant that I had less data to process. I could have used a 2D array if I wanted to pass a cosine value through the y value as well, however for the purpose of this program it wasn’t needed.

Further Optimisations – Looking at my program there are still things I would like to implement and optimise. The screenData.h file holds a lot of data that I would prefer to be dynamically allocated and eventually I would like to do some more profiling on my code itself. I would also like to implement a sin function in the y co ordinate for each pixel to create a more traditional plasma effect.

Leave a comment