Saturday 22 March 2014

Programming language (micro) benchmark

Out of curiosity, arising from a discussion on the Raspberry Pi forum, I did some micro benchmarking of the C, Java and Python programming languages on a Raspberry Pi. The application loops through the numbers from 3 to 10000 and checks whether each one is prime. I tried my best to keep the code as close to the same as possible in all three languages (even avoiding boolean types) and to eliminate any interference such as console output. I also repeated the tests multiple times, seeing only a few milliseconds of deviation between runs.

Python code:
result = 0
result |= 2
for i in range(3,10001):
    prime = 1
    for j in range(2, i):
        k = i/j
        l = k*j
        if l == i:
            prime = 0
    if prime == 1:
        result |= i
print result

Saved as prime2.py and ran it, timing the execution:
$ time python prime2.py
16383

real    8m25.081s
user    8m7.960s
sys     0m1.230s

$ python --version
Python 2.7.3
...yes, that is 8 and a half minutes.

C code (repeats the test function 10 times to match Java...):
#include <stdio.h>
void test(void) { /* returns nothing, so declared void */
  int result = 0;
  result |= 2;
  int i;
  for (i=3;i<=10000;i++) {
    int prime = 1;
    int j;
    for (j=2;j<i;j++) {
      int k = i/j;
      int l = k*j;
      if (l==i) prime = 0;
    }
    if (prime) result |= i;
  }
  printf("%i\n", result);
}

int main() {
  int i;
  for (i = 0; i < 10; i++) {
    test();
  }
}

Saved as prime2.c, compiled with gcc -O2 -o prime2 prime2.c (with typical optimisation level 2) and ran:
$ time ./prime2
16383
16383
16383
16383
16383
16383
16383
16383
16383
16383

real    0m35.957s
user    0m35.780s
sys     0m0.070s

$ gcc --version
gcc (Debian 4.6.3-14+rpi1) 4.6.3
...which is 36 seconds for 10 rounds, so less than 4 seconds for the same single round that took Python 8 and a half minutes...

Java code (repeats the test function 10 times to eliminate the effect of the virtual machine start-up time and possibly give the HotSpot compiler a chance to kick in):
public class Prime2 {

  public static void main(String [] args) {
    for (int i = 0; i < 10; i++) {
      test();
    }
  }

  public static void test() {
    int result = 0;
    result |= 2;
    for (int i=3;i<=10000;i++) {
      boolean prime = true;
      for (int j=2;j<i;j++) {
        int k = i/j;
        int l = k*j;
        if (l==i) prime = false;
      }
      if (prime) result |= i;
    }
    System.out.println(result);
  }
}

Saved as Prime2.java, compiled with javac Prime2.java and ran:
$ time java Prime2
16383
16383
16383
16383
16383
16383
16383
16383
16383
16383

real    0m33.490s
user    0m33.130s
sys     0m0.240s

$ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
...which is pretty impressive - even slightly faster than C - the Oracle guys have done a good job with optimising the Java platform for RPi.
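To compare like with like, the totals above have to be reduced to a single round: the Python run was one round, while the C and Java runs were ten. Here is a small sketch of that arithmetic, using the 'real' times reported above (the helper names per_round_seconds and print_comparison are made up for this illustration):

```c
#include <stdio.h>

/* Hypothetical helper: wall-clock seconds for one round of the test. */
double per_round_seconds(double minutes, double seconds, int rounds) {
    return (minutes * 60.0 + seconds) / rounds;
}

/* Prints the per-round times and how many times slower Python was. */
void print_comparison(void) {
    double py   = per_round_seconds(8, 25.081, 1);   /* time python prime2.py */
    double c    = per_round_seconds(0, 35.957, 10);  /* time ./prime2 */
    double java = per_round_seconds(0, 33.490, 10);  /* time java Prime2 */

    printf("Python %.1f s, C %.2f s, Java %.2f s per round\n", py, c, java);
    printf("Python/C = %.0fx, Python/Java = %.0fx\n", py / c, py / java);
}
```

With these figures Python comes out roughly 140 times slower than C and about 150 times slower than Java per round.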

Of course this is just a micro benchmark and one cannot draw too definite conclusions from the results. Also there are other aspects (accessibility, productivity, availability of domain specific libraries etc.) to consider when choosing a programming language.

Friday 21 March 2014

Low-level Graphics on Raspberry Pi (part X+2)

In the previous part we did some animation with page flipping ...requiring a somewhat ugly 'incantation' outside of the application code.

It is worth noting that not all animation or screen effects require page flipping (or v-syncing). There is no need for either if, for example, the change between 'frames' is small enough (like updating textual or numerical indicators) or stochastic enough (like the fire effect, legendary at least back in the early-to-mid 1990s).

The fire effect might in fact work as a good example ...also, admittedly, I love it myself and cannot wait to show it off ;) I wrote my first version of the code for a 386sx PC with a plain vanilla VGA card, tweaked into the 320x240 pixel, 256 color 'Mode X', somewhere around 1993 after seeing the effect on a university course mate's Amiga. Part of the demoscene ethos, and a big part of the fun, was of course not to copy somebody else's ready-made code but to figure out the idea yourself. However, I am now going to be a spoiler and reveal the code for this effect (well, there are versions already floating around the internet)...

First we have to come up with a nice fiery color palette. This should give us a nice one, running from golden yellow through orange and red, and finally fading the red to black:
    // Create palette
    unsigned short r[256]; // red
    unsigned short g[256]; // green
    unsigned short b[256]; // blue
    int i;
    for (i = 0; i < 256; i++) {
        if (i < 32) {
            r[i] = i * 7 << 8;
            g[i] = b[i] = 0;
        }
        else if (i < 64) {
            r[i] = 224 << 8;
            g[i] = (i - 32) * 4 << 8;
            b[i] = 0;
        }
        else if (i < 96) {
            r[i] = 224 + (i - 64) << 8;
            g[i] = 128 + (i - 64) * 3 << 8;
            b[i] = 0;
        }
        else {
            r[i] = g[i] = 255 << 8;
            b[i] = 128 << 8;
        }
    }
    struct fb_cmap palette;
    palette.start = 0;
    palette.len = 256;
    palette.red = r;
    palette.green = g;
    palette.blue = b;
    palette.transp = 0; // null == no transparency settings
    // Set palette
    if (ioctl(fbfd, FBIOPUTCMAP, &palette)) {
        printf("Error setting palette.\n");
    }

...and remember to restore the palette at the end of the execution! A good exercise would be to work out other nice color transitions, for example a gas flame going white-yellow-blue.
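Restoring presupposes that the original palette was saved first. A minimal sketch of one way to do that, assuming the driver answers the standard FBIOGETCMAP ioctl (the struct saved_palette and both function names are made up for this example and are not from the full source):

```c
#include <linux/fb.h>
#include <sys/ioctl.h>

/* Hypothetical container for a saved 256-entry palette. */
struct saved_palette {
    unsigned short r[256], g[256], b[256];
    struct fb_cmap cmap;
};

/* Read the current palette from the framebuffer; returns 0 on success. */
int save_palette(int fbfd, struct saved_palette *sp) {
    sp->cmap.start = 0;
    sp->cmap.len = 256;
    sp->cmap.red = sp->r;
    sp->cmap.green = sp->g;
    sp->cmap.blue = sp->b;
    sp->cmap.transp = 0; /* null == no transparency settings */
    return ioctl(fbfd, FBIOGETCMAP, &sp->cmap) ? -1 : 0;
}

/* Write a previously saved palette back; returns 0 on success. */
int restore_palette(int fbfd, struct saved_palette *sp) {
    return ioctl(fbfd, FBIOPUTCMAP, &sp->cmap) ? -1 : 0;
}
```

The idea would be to call save_palette(fbfd, &old) before setting the fire palette and restore_palette(fbfd, &old) just before closing the framebuffer. Note that the cmap pointers refer to the arrays inside the same struct, so the struct should be passed around by pointer and not copied.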

For plotting the pixels, the algorithm has three phases: seed, smooth and convect. In the seed phase, we 'sow the seeds for the fire' by plotting semi-randomly bright yellow pixels along the bottom line of the screen. In the smooth phase, we scan through all pixels on the screen and average them with the surrounding pixels to smooth out clear color edges. And in the convect phase, we make the flames rise up the screen, slowly cooling down and fading into the background. As is evident from the code, it is worth playing around with the values for seed colors, frequency of seed pixels, smoothing factors etc:

        // Draw
        int maxx = var_info.xres - 1;
        int maxy = var_info.yres - 1;
        int n = 0, r, c, x, y;
        int c0, c1, c2;
        while (n++ < 200) {
            // seed
            for (x = 1; x < maxx; x++) {

                r = rand();
                c = (r % 4 == 0) ? 192 : 32;
                put_pixel(x, maxy, fbp, &var_info, c);
                if ((r % 4 == 0)) { // && (r % 3 == 0)) {
                    c = 2 * c / 3;
                    put_pixel(x - 1, maxy, fbp, &var_info, c);
                    put_pixel(x + 1, maxy, fbp, &var_info, c);
                }
            }

            // smooth
            for (y = 1; y < maxy - 1; y++) {
                for (x = 1; x < maxx; x++) {
                    c0 = get_pixel(x - 1, y, fbp, &var_info);
                    c1 = get_pixel(x, y + 1, fbp, &var_info);
                    c2 = get_pixel(x + 1, y, fbp, &var_info);
                    c = (c0 + c1 + c1 + c2) / 4;
                    put_pixel(x, y - 1, fbp, &var_info, c);
                }
            }

            // convect
            for (y = 0; y < maxy; y++) {
                for (x = 1; x < maxx; x++) {
                    c = get_pixel(x, y + 1, fbp, &var_info);
                    if (c > 0) c--;
                    put_pixel(x, y, fbp, &var_info, c);
                }
            }
        }
It appears that for this we need a utility function for reading the current value from a pixel on the screen:
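// utility function to get a pixel
// (returns unsigned char so that palette indexes above 127 do not turn
// negative on platforms where plain char is signed)
unsigned char get_pixel(int x, int y, void *fbp, struct fb_var_screeninfo *vinfo) {
    unsigned long offset = x + y * vinfo->xres;
    return *((unsigned char *)fbp + offset);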
}
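To play with the seed and smoothing values without touching the screen, the same three phases can be run on a plain memory buffer. This is only a sketch: fire_step is a made-up name, the buffer holds one byte per pixel with w bytes per row, and the seed phase is simplified (it skips the dimmer neighbour pixels plotted next to the bright seeds above).

```c
#include <stdlib.h>

/* Hypothetical helper: one seed/smooth/convect pass over a w*h byte buffer. */
void fire_step(unsigned char *buf, int w, int h) {
    int maxx = w - 1, maxy = h - 1;
    int x, y, c;

    /* seed: roughly every fourth pixel on the bottom row is bright */
    for (x = 1; x < maxx; x++) {
        buf[maxy * w + x] = (rand() % 4 == 0) ? 192 : 32;
    }

    /* smooth: average left, right and the pixel below (counted twice),
       writing the result one row up */
    for (y = 1; y < maxy - 1; y++) {
        for (x = 1; x < maxx; x++) {
            c = (buf[y * w + x - 1] + 2 * buf[(y + 1) * w + x]
                 + buf[y * w + x + 1]) / 4;
            buf[(y - 1) * w + x] = (unsigned char)c;
        }
    }

    /* convect: move every row up by one and cool it down by one step */
    for (y = 0; y < maxy; y++) {
        for (x = 1; x < maxx; x++) {
            c = buf[(y + 1) * w + x];
            if (c > 0) c--;
            buf[y * w + x] = (unsigned char)c;
        }
    }
}
```

Calling fire_step repeatedly on a zeroed buffer and dumping the values lets us see how far the flames climb before they fade to 0 for a given set of constants.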
And then there's the annoying blinking console cursor we need to hide...
#include <linux/kd.h>

...
    // hide cursor
    char *kbfds = "/dev/tty";
    int kbfd = open(kbfds, O_WRONLY);
    if (kbfd >= 0) {
        ioctl(kbfd, KDSETMODE, KD_GRAPHICS);
    }
    else {
        printf("Could not open %s.\n", kbfds);
    }

...and remember to restore at end!
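For the restore, the counterpart of KD_GRAPHICS is KD_TEXT. A short sketch, assuming kbfd is the descriptor opened above (restore_console is a name invented for this example):

```c
#include <linux/kd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Switch the console back to text mode and close it; returns 0 on success. */
int restore_console(int kbfd) {
    int ret;
    if (kbfd < 0) return -1; /* console was never opened */
    ret = ioctl(kbfd, KDSETMODE, KD_TEXT);
    close(kbfd);
    return ret ? -1 : 0;
}
```

If the program exits without doing this, the console may stay in graphics mode and show no text, so it is worth calling on every exit path.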

[Full source available in GitHub]

This should produce the awesome effect as seen in this screenshot (already sneak-peeked in part 2):
Starfield, shadebobs and plasma effects would not require page flipping / v-syncing and might be worth attempting... some more examples for inspiration or dwelling in the good oldskool days for example here ;)

[Continued in next part Shapes]

Sunday 16 March 2014

Low-level Graphics on Raspberry Pi (part X+1)

NOTE 2015-01-27: On later Raspbian builds this approach may lead to a lock-up! It is advised to use the 'pure fb' approach - code available in this post.

In the previous part we tried a simple animation - with not so perfect results...

Obviously, with so many different avenues for the Raspberry Pi driver developers to tackle, we users cannot expect to get everything 'for free' and instantly. I sure wish I had the time and/or the drive to attempt extending the fb driver myself ...or maybe the legendary 'Somebody Else' could do it ;) The thing is, it does not seem that big of a job: the Raspberry Pi firmware Mailbox interface already supports 'page flipping' using a double-size buffer and panning the display.

In the meantime, we could take a stab at trying this out. Raspberry Pi forum users hacking on the 'bare metal' (thanks guys) earlier pointed me to arch/arm/mach-bcm2708/include/mach/vcio.h in the RPi firmware GitHub sources and to the way to talk to the mailbox:
...
#include "vcio.h"
...
int mboxfd = 0;
...

// helper function to talk to the mailbox interface
static int mbox_property(void *buf)
{
   if (mboxfd < 0) return -1; // device file was not opened
   
   int ret_val = ioctl(mboxfd, IOCTL_MBOX_PROPERTY, buf);

   if (ret_val < 0) {
      printf("ioctl_set_msg failed:%d\n", ret_val);
   }

   return ret_val;
}

// helper function to set the framebuffer virtual offset == pan
static unsigned set_fb_voffs(unsigned *x, unsigned *y)
{
   int i=0;
   unsigned p[32];
   p[i++] = 0; // size
   p[i++] = 0x00000000; // process request

   p[i++] = 0x00048009; // tag: set virtual offset
   p[i++] = 0x00000008; // buffer size
   p[i++] = 0x00000000; // request size
   p[i++] = *x; // value buffer
   p[i++] = *y; // value buffer 2

   p[i++] = 0x00000000; // end tag
   p[0] = i*sizeof *p; // actual size

   mbox_property(p);
   *x = p[5];
   *y = p[6];
   return p[1];
}
...
void draw() {
...

        // switch page
        /*
        vinfo.yoffset = cur_page * vinfo.yres;
        vinfo.activate = FB_ACTIVATE_VBL;
        if (ioctl(fbfd, FBIOPAN_DISPLAY, &vinfo)) {
            printf("Error panning display.\n");
        }
        */
        vx = 0;
        vy = cur_page * vinfo.yres;
        set_fb_voffs(&vx, &vy);
        
        //usleep(1000000 / fps);

}

...
// main

    // open a char device file used for communicating with kernel mbox driver
    mboxfd = open(DEVICE_FILE_NAME, 0);
    if (mboxfd < 0) {
        printf("Can't open device file: %s\n", DEVICE_FILE_NAME);
        printf("Try creating a device file with: mknod %s c %d 0\n", DEVICE_FILE_NAME, MAJOR_NUM);
        
    }
...

From "vcio.h" we are using the two defines DEVICE_FILE_NAME "char_dev" and MAJOR_NUM 100. This char_dev is a special file for communicating with the mailbox. The file must be created using the command mknod (see man):
$ sudo mknod char_dev c 100 0
Save the code as, say, fbtestXII.c (full source in GitHub), download the vcio.h file to the same directory, and build with:
gcc -o fbtestXII fbtestXII.c -lrt
(as the code uses the clock functions from librt) and run with ./fbtestXII. This should display the same gliding and bouncing rectangle, but this time with no tearing and with minimal flicker.

The program outputs the timing info. The (most likely) 16 seconds and some 700 ms follow from fps = 100 and secs = 10: since the screen refresh is now tied to the vertical sync, we 'only' get 60 fps, so the 100 * 10 = 1000 loops take 1000 / 60 = 16.7 s.
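The elapsed time is the difference of two clock_gettime readings. The helper that does the subtraction is only in the full source; a minimal sketch of what such a helper might look like (an assumption on my part, not necessarily identical to the code in GitHub):

```c
#include <time.h>

/* Difference end - start, normalised so that 0 <= tv_nsec < 1e9. */
struct timespec timediff(struct timespec start, struct timespec end) {
    struct timespec d;
    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0) {
        /* borrow one second when the nanosecond part went negative */
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}
```

The milliseconds are then simply tv_nsec / 1000000, so a run of 16.7 s would be reported as 16 s and about 700 ms.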

Now if we change the code a bit:
...
#define NUM_ELEMS 200
int xs[NUM_ELEMS];
int ys[NUM_ELEMS];
int dxs[NUM_ELEMS];
int dys[NUM_ELEMS];
...
void draw() {

    int i, x, y, w, h, dx, dy;
    struct timespec pt;
    struct timespec ct;
    struct timespec df;

    // rectangle dimensions
    w = vinfo.yres / 10;
    h = w;

    // start position (upper left)
    x = 0;
    y = 0;
    int n;
    for (n = 0; n < NUM_ELEMS; n++) {
        int ex = rand() % (vinfo.xres - w); 
        int ey = rand() % (vinfo.yres - h);
        //printf("%d: %d,%d\n", n, ex, ey);
        xs[n] = ex;
        ys[n] = ey;
        int edx = (rand() % 10) + 1; 
        int edy = (rand() % 10) + 1;
        dxs[n] = edx;
        dys[n] = edy;
    }

    // move step 'size'
    dx = 1;
    dy = 1;

    int fps = 60;
    int secs = 10;
    
    unsigned vx, vy; // set_fb_voffs() takes pointers to unsigned

    clock_gettime(CLOCK_REALTIME, &pt);
    
    // loop for a while
    for (i = 0; i < (fps * secs); i++) {

        // change page to draw to (between 0 and 1)
        cur_page = (cur_page + 1) % 2;
    
        // clear the previous image (= fill entire screen)
        clear_screen(0);
        
        for (n = 0; n < NUM_ELEMS; n++) {
            x = xs[n];
            y = ys[n];
            dx = dxs[n];
            dy = dys[n];
            
            // draw the bouncing rectangle
            fill_rect(x, y, w, h, (n % 15) + 1);

            // move the rectangle
            x = x + dx;
            y = y + dy;

            // check for display sides
            if ((x < 0) || (x > (vinfo.xres - w))) {
                dx = -dx; // reverse direction
                x = x + 2 * dx; // counteract the move already done above
            }
            // same for vertical dir
            if ((y < 0) || (y > (vinfo.yres - h))) {
                dy = -dy;
                y = y + 2 * dy;
            }

            xs[n] = x;
            ys[n] = y;
            dxs[n] = dx;
            dys[n] = dy;
        }
        
        // switch page
        vx = 0;
        vy = cur_page * vinfo.yres;
        set_fb_voffs(&vx, &vy);
        
    }

    clock_gettime(CLOCK_REALTIME, &ct);
    df = timediff(pt, ct);
    printf("done in %ld s %5ld ms\n", df.tv_sec, df.tv_nsec / 1000000);
}
...
(full source)...and build with (optimisation on):
gcc -O2 -o fbtestXIII fbtestXIII.c -lrt
...we should get 200 colorful, bouncing 'sprites' going all over the screen:
Using the "char_dev" (especially as it has to be created as root) is not the most elegant way, but it is so far the only solution I know of (if we want to stick to the fb), and at least for some uses it may be quite enough.

[Continued in part X+2]

Friday 14 March 2014

Low-level Graphics on Raspberry Pi (part X)

Now that we have been gradually building the example program to allow us to do something interesting - how about trying a bit of animation?
...
// helper function to draw a rectangle in given color
void fill_rect(int x, int y, int w, int h, int c) {
    int cx, cy;
    for (cy = 0; cy < h; cy++) {
        for (cx = 0; cx < w; cx++) {
            put_pixel(x + cx, y + cy, c);
        }
    }
}

// helper function to clear the screen - fill whole 
// screen with given color
void clear_screen(int c) {
    memset(fbp, c, vinfo.xres * vinfo.yres);
}

void draw() {

    int i, x, y, w, h, dx, dy;

    // start position (upper left)
    x = 0;
    y = 0;
    // rectangle dimensions
    w = vinfo.yres / 10;
    h = w;
    // move step 'size'
    dx = 1;
    dy = 1;

    int fps = 100;
    int secs = 10;
    
    // loop for a while
    for (i = 0; i < (fps * secs); i++) {

        // clear the previous image (= fill entire screen)
        clear_screen(8);
        
        // draw the bouncing rectangle
        fill_rect(x, y, w, h, 15);

        // move the rectangle
        x = x + dx;
        y = y + dy;

        // check for display sides
        if ((x < 0) || (x > (vinfo.xres - w))) {
            dx = -dx; // reverse direction
            x = x + 2 * dx; // counteract the move already done above
        }
        // same for vertical dir
        if ((y < 0) || (y > (vinfo.yres - h))) {
            dy = -dy;
            y = y + 2 * dy;
        }
        
        usleep(1000000 / fps);
        // to be exact, would need to time the above and subtract...
    }

}
...

Save as fbtestX.c (complete code in GitHub) and build with make fbtestX. This should give us a moving white rectangle that bounces off the screen sides... Unfortunately the updates are not smooth (at least on most displays): there is quite a prominent tearing effect.
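As the comment at the end of the loop hints, sleeping for a full 1 / fps on top of the drawing time makes every frame a little too long. A sketch of how the drawing time could be subtracted (both helper names are invented for this example; the two timestamps would come from clock_gettime calls around the drawing code):

```c
#include <time.h>

/* Hypothetical helper: microseconds between two clock_gettime readings. */
long elapsed_us(struct timespec start, struct timespec end) {
    return (long)(end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_nsec - start.tv_nsec) / 1000L;
}

/* Hypothetical helper: microseconds left to sleep in this frame,
   or 0 if the drawing already used up the whole frame time. */
long frame_sleep_us(int fps, long used_us) {
    long budget_us = 1000000L / fps;
    return (used_us < budget_us) ? budget_us - used_us : 0;
}
```

In the loop this would replace the fixed delay with something like usleep(frame_sleep_us(fps, elapsed_us(t0, t1))). It does nothing about the tearing, though, as the updates are still not synchronised to the display refresh.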

The Linux framebuffer interface does define some methods to overcome this: we could make the framebuffer virtual size double the height of the (smaller) physical one using the FBIOPUT_VSCREENINFO call:
  // Set variable info
  vinfo.xres = 640; // try a smaller resolution
  vinfo.yres = 480;
  vinfo.xres_virtual = 640;
  vinfo.yres_virtual = 960; // double the physical
  vinfo.bits_per_pixel = 8;
  if (ioctl(fbfd, FBIOPUT_VSCREENINFO, &vinfo)) {
    printf("Error setting variable information.\n");
  }

  //long int screensize = vinfo.xres * vinfo.yres;
  // have to use the virtual size for the mmap...
  long int screensize = vinfo.xres_virtual * vinfo.yres_virtual;

And change our drawing loop to use the two halves of the virtual buffer using a call to FBIOPAN_DISPLAY for page-flipping tied to a vertical sync using FB_ACTIVATE_VBL:
    // initially show upper half (0) - draw to lower half (1)
    int cur_page = 1;
    for (i = 0; i < 1000; i++) {

        fill_rect(x, y, 40, 40, 4);

        x = x + dx;
        y = y + dy;

        if ((x < 0) || (x > (vinfo.xres - 40))) {
            dx = -dx;
            x = x + 2 * dx;
        }
        if ((y < 0) || (y > (vinfo.yres - 40))) {
            dy = -dy;
            y = y + 2 * dy;
        }

        // switch page: show the half we just drew to...
        vinfo.yoffset = cur_page * vinfo.yres;
        vinfo.activate = FB_ACTIVATE_VBL;
        if (ioctl(fbfd, FBIOPAN_DISPLAY, &vinfo)) {
            printf("Error panning display.\n");
        }
        // ...and draw the next frame to the other half
        cur_page = (cur_page + 1) % 2;
    }
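For the drawing to land on the hidden half, the pixel helper has to add the page offset to the y coordinate. A sketch of that calculation, assuming 8 bits per pixel and a line length equal to xres_virtual (page_pixel_offset is a made-up name; the real code may well use the line length reported by the driver instead):

```c
/* Hypothetical helper: byte offset of pixel (x, y) on the given page,
   where page 0 is the upper and page 1 the lower half of the buffer. */
unsigned long page_pixel_offset(int x, int y, int page,
                                unsigned xres_virtual, unsigned yres) {
    unsigned long row = (unsigned long)y + (unsigned long)page * yres;
    return (unsigned long)x + row * xres_virtual;
}
```

With the 640x480 mode set above, pixel (0, 0) of page 1 sits at offset 480 * 640 = 307200, right after the last pixel of page 0.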

Unfortunately these calls have not been implemented in the RPi framebuffer driver (does not seem to be in later versions either yet, see also http://www.raspberrypi.org/phpBB3/viewtopic.php?f=67&t=19073). So running the above code (full code) results in repeated output of Error panning display. and the rectangle flashing even worse (as we miss every second screen update by drawing outside of the visible area).

[Continued in next part]

[UPDATE on changes in the fb driver here]

Restart

Well, nearly a full year in hibernation. Life happens: family, day job, second job, other hobbies... and by the last post the tutorial series looked sort of 'good enough' which caused a kind of a writer's block.

Apologies if someone was feeling left hanging - also apologies for not answering many of the comments posted. I have now disabled comments as I cannot commit to answering - also I might not be enough of a guru for many of them ;) I have added a link on the right panel to the discussion on the Raspberry Pi forum - where I will pop in from time to time, but more importantly there should be other like-minded, helpful people on the board too.

Ok, enough of that - let's take a look at some graphics coding in the next part of 'Low-level Graphics on Raspberry Pi'.