It’s alive!

We haven’t posted for a while, but that’s because we’ve been too busy working on the car! A lot has changed, and much has improved. We have a car, an app that controls the car, a car that stops when an object is detected (mostly...), and some flashing lights!

We’ll post more updates on how things are working under the hood in the future, but we’re almost there! There is still some work to be done on the logic that controls the braking of the car, but look, flashing lights!

Creating a tachometer with an Arduino and some correction fluid


One of the requirements for our project is being able to measure the speed the car is travelling at. As well as providing useful feedback for the human controller, this information can be used in the decision-making process.

We decided to implement this in the simplest possible way, using a digital line sensor board based on the QRE1113 IR reflectance sensor.

From the product description page:

The board’s QRE1113 IR reflectance sensor is comprised of two parts – an IR emitting LED and an IR sensitive phototransistor. When you apply power to the VCC and GND pins the IR LED inside the sensor will illuminate. A 100Ω resistor is on-board and placed in series with the LED to limit current. The output of the phototransistor is tied to a 10nF capacitor. The faster that capacitor discharges, the more reflective the surface is.

In short, the IC outputs a lower value if more light is reflected (i.e. the object is bright), and a higher value if less light is reflected (i.e. the object is dark).


Sample output (synthesised).

On the Arduino microcontroller, we poll the output of the line sensor in a loop. When we see lower numbers, we know that the sensor is over the brighter part of the wheel. Every time a drop is seen, a revolution counter is incremented (because the wheel will have spun once). Every second, an interrupt is raised and the speed is calculated using the following formula:

speed (m/s) = rpm × 2πr / 60   where r is the radius of the wheel in metres, and rpm is 60 times the number of revolutions counted in that second
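To make the counting and the formula concrete, here is a minimal sketch in plain C++ rather than an actual Arduino sketch. It assumes one bright mark per wheel revolution, that a lower reading means a brighter surface (as described above), and a hypothetical threshold; `RevolutionCounter` and `speedMetresPerSecond` are illustrative names, not our code. Calling `sample()` for each reading and `takeCount()` once per second mirrors the one-second interrupt.

```cpp
#include <cassert>

// Counts revolutions from line-sensor readings. A reading below `threshold`
// (a hypothetical, hand-tuned value) means the bright part of the wheel is
// under the sensor; each dark-to-bright transition ("drop") is one revolution.
class RevolutionCounter {
public:
    explicit RevolutionCounter(int threshold) : threshold_(threshold) {}

    void sample(int reading) {
        const bool bright = reading < threshold_;
        // The very first reading only sets the initial state.
        if (bright && havePrevious_ && !previousBright_) ++count_;
        previousBright_ = bright;
        havePrevious_ = true;
    }

    // Returns the revolutions counted since the last call and resets the
    // count, as the once-per-second interrupt would.
    unsigned takeCount() {
        const unsigned c = count_;
        count_ = 0;
        return c;
    }

private:
    int threshold_;
    bool havePrevious_ = false;
    bool previousBright_ = false;
    unsigned count_ = 0;
};

// speed (m/s) = rpm * 2 * pi * r / 60, with r the wheel radius in metres.
double speedMetresPerSecond(double rpm, double wheelRadiusM) {
    assert(wheelRadiusM > 0.0);
    const double kPi = 3.14159265358979323846;
    return rpm * 2.0 * kPi * wheelRadiusM / 60.0;
}
```

For example, two drops in one second on a wheel of (hypothetical) radius 0.05 m gives 120 rpm, or about 0.63 m/s.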

This method of measuring speed is extremely simple and requires minimal circuitry. It is less reliable than something like a Hall effect sensor, as ambient light may interfere with the measurements, or the IR sensor may get dirty or have its line of sight to the wheel blocked. For our controlled environment, however, the system works well and performs to specification.


Performance and System Optimisation of complete Stereo Vision program

Previously, I have written about the performance of different algorithms, and explained why we chose the OpenCV implementation of Block Matching (StereoBM). I have also written about some results we obtained after trying an as-yet unpublished technique. At that stage, we were able to achieve 7.5 frames per second with a maximum number of disparities of 80 and an image size of 320 x 240.

After building the rest of the system around it (capturing images from the cameras, blob detection and display) and taking all of that into account, we were getting a paltry 3 – 4 fps.

Initial Optimisations

  1. Decreased the maximum number of disparities from 80 to 48. This increases the distance to the closest object that we can detect to about 1m, but that is acceptable, as anything closer would probably be within the car’s stopping distance anyway, and so there would be a crash regardless of whether we detect it or not.
  2. Cropped the image to reduce the peripheral information in the image. The decisions should be made by what’s directly in front of the car, so there is no point in wasting time doing computations on the sky.
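As a rough guide to why fewer disparities means a larger minimum range: for a rectified stereo pair, depth is Z = f·B/d, with focal length f in pixels, baseline B in metres and disparity d in pixels. A matcher that searches disparities from 0 up to numDisparities therefore cannot report anything much closer than about f·B/numDisparities. The focal length and baseline used below are hypothetical values, chosen only so that 48 disparities comes out at about 1 m; real values come from calibration, and `nearestDepthM` is an illustrative name.

```cpp
#include <cassert>

// Approximate nearest depth (in metres) a block matcher can report, using the
// rectified-stereo relation Z = f * B / d with d at the top of the search range.
double nearestDepthM(double focalPx, double baselineM, int numDisparities) {
    // OpenCV's StereoBM requires a positive multiple of 16.
    assert(numDisparities > 0 && numDisparities % 16 == 0);
    return focalPx * baselineM / numDisparities;
}
```

With these assumed values, going from 80 to 48 disparities moves the nearest detectable depth from about 0.6 m to about 1 m: the range scales with 80/48.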

With these changes, we were able to achieve around 5 fps. A small improvement, but an improvement nonetheless, so we have made these changes permanent.

In-depth Performance Analysis

To understand what was eating up the CPU time, I measured the time taken for all of the individual steps in the process per iteration (i.e. one stereo image set). Bearing in mind that all of the steps are happening sequentially, the numbers (in fps) are as follows:

Stage                                       Rate (fps)
Acquiring images                            25
Pre-processing images                       24
Calculating disparity + blob detection      8.5
Overall                                     5

Just an aside – we have recently discovered that image acquisition also depends on the light level, because our camera is a bit rubbish. Assuming an adequate quantity of light, however, we can acquire images from the camera at 25 fps and pre-process them in real time (if the two happen in parallel). I combined calculating disparity and blob detection because blob detection is very fast. Since all of this is happening sequentially, and there are other smaller overheads, the overall speed turns out to be 5 fps.
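The 5 fps figure follows from the measured stage rates: when stages run one after another, their per-frame times add up, so the overall rate is the reciprocal of the summed stage times. A small sketch of that arithmetic (`sequentialFps` is a hypothetical helper):

```cpp
#include <cassert>
#include <vector>

// Overall frame rate when stages run strictly one after another:
// seconds per frame = sum of 1/fps for each stage.
double sequentialFps(const std::vector<double>& stageFps) {
    assert(!stageFps.empty());
    double secondsPerFrame = 0.0;
    for (double fps : stageFps) {
        assert(fps > 0.0);
        secondsPerFrame += 1.0 / fps;
    }
    return 1.0 / secondsPerFrame;
}
```

With the measured rates of 25, 24 and 8.5 fps this gives about 5.0 fps, so the three stages alone account for almost all of the frame time, and the disparity stage is roughly 60% of it.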

System Optimisations

The Pandaboard ES has a dual-core ARM Cortex-A9, with each core running at 1.2 GHz. At this stage, we are using one core for all of the processing, and one core for hosting a server. The server core is mostly idle. We need to ignore the server for the moment and modify our system to introduce some serious concurrency!

Implementation option 1

The simplest way to do this would be to acquire images and pre-process them on one core, and calculate the disparity and do blob detection on the second core. This can be achieved by creating two storage buffers, buf0 and buf1. The cameras grab the images and, after pre-processing, we store them in buf0. The contents of buf0 can then be copied to buf1, and we can start doing disparity calculations on buf1. While this is happening, the first thread can refresh the images from the camera in buf0, and so on, so that acquiring + pre-processing images and calculating disparity + blob detection happen concurrently. Here is the idea in diagram form:

There are some drawbacks to this approach, however, the main one being that we have to copy the data. Also, as we established before, image acquisition + pre-processing happens faster than calculating disparity, so we would frequently have to wait for the disparity calculation to finish before being able to copy data into buf1. This would affect performance significantly.

Implementation option 2

The second way to do this is a little more complicated, so naturally, that is what we have implemented. Instead of having a separate buffer for storing images and then copying them for calculating disparity, we now have a two-way buffer system: two buffers, buf0 and buf1, shared between the two cores, which we write to alternately. We first grab the images from the camera and store them in buf0. We then grab the next pair of images and store them in buf1. While we are getting that next pair, we start calculating on buf0. Once that is complete, we start calculating on buf1 while storing new images in buf0. This way, we deal with two sets of frames in one loop iteration and avoid any overhead associated with copying the data.
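A minimal sketch of this two-way (ping-pong) buffering, under stated assumptions: each stereo frame is stood in for by an `int`, and instead of OpenMP critical sections it uses `std::thread`, `std::mutex` and `std::condition_variable` (a swapped-in technique) so that the "wait until the other core has finished with this buffer" condition is explicit. `Slot` and `runPingPong` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// One half of the two-way buffer (buf0 or buf1).
struct Slot {
    std::mutex m;
    std::condition_variable cv;
    bool full = false;  // true while a frame is waiting to be processed
    int frame = 0;      // stand-in for a pre-processed stereo image pair
};

// Producer (acquire + pre-process) and consumer (disparity) alternate between
// slot 0 and slot 1, so frames are never copied from one buffer to another.
std::vector<int> runPingPong(int numFrames) {
    std::array<Slot, 2> slots;
    std::vector<int> processed;  // only touched by the consumer thread

    std::thread producer([&] {
        for (int i = 0; i < numFrames; ++i) {
            Slot& s = slots[i % 2];
            std::unique_lock<std::mutex> lock(s.m);
            s.cv.wait(lock, [&] { return !s.full; });  // wait for consumer to free it
            s.frame = i;                                // "grab and pre-process"
            s.full = true;
            s.cv.notify_one();
        }
    });

    std::thread consumer([&] {
        for (int i = 0; i < numFrames; ++i) {
            Slot& s = slots[i % 2];
            std::unique_lock<std::mutex> lock(s.m);
            s.cv.wait(lock, [&] { return s.full; });   // wait for a fresh frame
            processed.push_back(s.frame);               // "calculate disparity"
            s.full = false;
            s.cv.notify_one();
        }
    });

    producer.join();
    consumer.join();
    return processed;
}
```

Because each slot has its own mutex, the producer can fill one slot while the consumer holds the other, which is where the concurrency comes from; the producer can never get more than two frames ahead. Older toolchains may need `-pthread` to build this.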

I should mention that we are using OpenMP for parallelisation. We need to make sure that only one thread is accessing a given buffer at any one time. We don’t want to be calculating on buf0 while grabbing and storing images in it at the same time! This would result in all sorts of horribleness. To avoid this issue, we use mutual exclusion, via OpenMP’s critical directive. While a section of code is surrounded by #pragma omp critical(name), only one thread at a time is allowed to execute inside any region with that name. So all of the areas of code which access buf0 will be surrounded by #pragma omp critical(buffer0). If a thread reaches that statement while the other thread is inside the critical region, it will be forced to wait until the other one has exited.
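The directive itself looks like this. The snippet is a toy example rather than our buffer code: increments of two counters stand in for accesses to buf0 and buf1, and `Counts` and `touchBuffers` are hypothetical names. Regions with the same name exclude each other across all threads, while regions with different names do not, so a thread inside critical(buffer0) never blocks one inside critical(buffer1).

```cpp
#include <cassert>

struct Counts {
    long buf0 = 0;  // stands in for work done on buffer 0
    long buf1 = 0;  // stands in for work done on buffer 1
};

// Compile with -fopenmp. Without it the pragmas are ignored and the loop
// simply runs on one thread, which gives the same result.
Counts touchBuffers(int iterations) {
    Counts c;
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i) {
        if (i % 2 == 0) {
            #pragma omp critical(buffer0)
            { ++c.buf0; }  // only one thread at a time in any critical(buffer0)
        } else {
            #pragma omp critical(buffer1)
            { ++c.buf1; }  // independent of critical(buffer0)
        }
    }
    return c;
}
```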

Figure 3 Diagrammatically: the system only progresses when both conditions are met for a transition to take place.

Initially, before we get inside the loop for an indefinite amount of time, we need to fill the buffers.

The program has two distinct critical sections, one for each buffer. They occur on both cores, and the execution of the program is as in the animated diagram:

Now that we have the basic system in place, we need to take care of blob detection. As we know, image acquisition is faster than disparity calculation, so to keep Core 0 from waiting too long, we should give it as much work as possible. As such, we’ve decided to do the blob detection and decision making on Core 0. To do this, Core 1 stores the output of the disparity calculation in two buffers, dispBuf0 and dispBuf1. Before getting the next image from the camera, we perform blob detection on the previous disparity map. This is best explained with another diagram:

The different colours represent the two critical sections. The two rows inside the box are the processes happening on the two cores. The blue and orange boxes that are in the same columns execute concurrently. So while core1 is calculating disparity from imgBuf1, core0 is doing blob detection on the disparity calculated from imgBuf0, then getting the next images from the camera, and so on. Notice that, when we do blob detection, and decide that the car needs to slow down or speed up, we change the maximum number of disparities. However, by the time we are ready to communicate this decision, the system has already calculated disparity for the next frame using the existing max number of disparities, so we need to invalidate the data in that dispBuf, as it no longer represents the current situation.
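One way to do the invalidation described above, sketched under the assumption that each disparity buffer records the number of disparities it was computed with. `DisparityBuffer` and `invalidateStale` are hypothetical names, and in the real system this check would sit inside the corresponding critical sections.

```cpp
#include <array>
#include <cassert>

// A disparity buffer tagged with the search range used to compute it.
struct DisparityBuffer {
    int numDisparities;  // range the map in this buffer was computed with
    bool valid;          // false once the map no longer reflects the current range
};

// Called on Core 0 after blob detection changes the disparity range:
// any buffer computed with a different range is discarded.
void invalidateStale(std::array<DisparityBuffer, 2>& dispBufs, int currentNumDisparities) {
    assert(currentNumDisparities > 0);
    for (DisparityBuffer& b : dispBufs) {
        if (b.numDisparities != currentNumDisparities) b.valid = false;
    }
}
```

Tagging each buffer, rather than blindly dropping "the next one", keeps the check correct even if the timing between the two cores shifts.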

In every iteration of our main program loop, we are able to process two new frames. With this system in place, we can get 15 – 16 frames per second, almost double the rate we measured earlier for calculating disparity + blob detection on their own!


With our new concurrent system, we are able to acquire and pre-process images while calculating disparities at the same time. We avoid unnecessary copying overhead by using a two-way buffering system. Since we know that the bottleneck of the system is calculating disparity, we can offload all of the other work to Core 0 so it doesn’t have to wait as long for the buffers to be depleted. We have managed to get about 15 – 16 fps, which is good enough for real-time stereo vision, and approximately 3 times faster than our previous sequential, single-core system.

Changing disparity range dynamically

Hello! It’s been a while since the last post! That doesn’t mean we’ve been idle, though. There has been lots of progress on the stereo vision setup in terms of performance and robustness as we gear up for a demonstration in a week’s time. More on that in later posts though, right now I’d just like to talk about a small but important feature.

In a real car, when you see an object in the distance, you slow down as you approach the object, eventually coming to a halt close to it. We wanted to accomplish something similar to this for our autonomous car.

To achieve this, we need to be able to perceive the distance to said object. From the stereo pair of images and the disparity map, it is possible to calculate the absolute distance to an object, but that is computationally expensive; we wanted to avoid it and make the most of the computations we already have to perform.

As such, the solution we have implemented is as follows:

  1. Let’s say the maximum number of disparities is set to 48 (16 * 3). With our camera setup, the closest object we can detect (i.e. brightest/whitest on the disparity map) is roughly 1 meter away.
  2. Let’s say we are 2 meters away from this object. As we get closer to it, the level of white corresponding to this object increases. Once this passes a certain threshold, we know we are close to the object, and can tell the car to slow down a notch.
  3. As the car slows down, we increase the maximum number of disparities to 16 * 4. Now the distance to the closest object we can detect decreases, so we don’t lose sight of it. We can then repeat the same procedure, where passing over a certain threshold level of white in our disparity map causes us to slow down, and increase the number of disparities.
  4. We repeat this process until we reach an upper limit on the number of disparities. Let’s say this is 128 (16 * 8), which gives a range of about 15 – 20 cm. Now, when we cross the whiteness threshold, we are travelling very slowly and we know that we are about 15 – 20 cm away from an object, so we can tell the car to stop.
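The four steps can be summarised as a small state machine. This is a hypothetical sketch: the step of 16 disparities follows from the 48 to 64 example above, while `speedLevel`, the whiteness measure and the threshold are placeholders that would have to be tuned on the actual car.

```cpp
#include <cassert>

// Hypothetical controller for "widen the search range as we get closer".
struct DisparityController {
    int numDisparities = 48;   // starting range (16 * 3)
    int maxDisparities = 128;  // upper limit (16 * 8)
    int speedLevel = 5;        // placeholder discrete speed setting
    bool stopped = false;

    // Called once per frame with how "white" the closest object is
    // (e.g. the mean intensity of the brightest blob, scaled to [0, 1]).
    void update(double closestWhiteness, double threshold) {
        if (stopped || closestWhiteness < threshold) return;  // nothing close yet
        if (numDisparities >= maxDisparities) {
            stopped = true;  // already looking as close as we can: stop the car
            speedLevel = 0;
            return;
        }
        numDisparities += 16;               // look closer so we don't lose the object
        if (speedLevel > 1) --speedLevel;   // and slow down a notch
    }
};
```

With these placeholder values, five threshold crossings take the range from 48 to 128 disparities, and the sixth stops the car.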

The actual numbers can be set depending on the required range, how close you want to stop to the object, and basically just experimenting around with the actual car.

There are some obvious limitations to this approach though:

  1. As we get closer to the object and increase the max number of disparities, the frame rate drops as calculating the disparity map takes longer. However, this coincides with the car slowing down, so we have more time per frame.
  2. Unless you have been ‘tracking’ an object as it gets closer, any object closer than the initial number of disparities allows will not be detected. For example, if I stick my leg out about 20 cm in front of the car, it will not be detected and the car will crash into me; the disparity map will mostly be black (i.e. error pixels). To overcome this problem, we could modify the algorithm so that it increases the maximum number of disparities (thus allowing us to ‘look’ closer) if the number of error pixels is over a threshold, or simply stop the car if that is the case. We could also use a hardware proximity sensor as an input to apply an emergency brake.
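The error-pixel fix suggested in point 2 could look roughly like this. The sketch assumes an 8-bit disparity map in which 0 (black) marks an error pixel, as described above; `checkErrorPixels`, `Action` and the allowed error fraction are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Action { Continue, WidenRange, Stop };

// If too much of the map is black, something may be closer than the current
// range can see: widen the range if we still can, otherwise stop the car.
Action checkErrorPixels(const std::vector<std::uint8_t>& disparity8u,
                        double maxErrorFraction, int numDisparities, int maxDisparities) {
    assert(!disparity8u.empty());
    std::size_t errors = 0;
    for (std::uint8_t v : disparity8u) {
        if (v == 0) ++errors;  // treat black as "no match found"
    }
    const double fraction = static_cast<double>(errors) / disparity8u.size();
    if (fraction <= maxErrorFraction) return Action::Continue;
    return numDisparities < maxDisparities ? Action::WidenRange : Action::Stop;
}
```

In practice the check would probably be restricted to the cropped region in front of the car, where a mostly black map is most suspicious.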

We have yet to see how this approach works with a relatively fast moving car, but testing it with moving objects in front of a stationary camera indicates promising results.


Progress and Plans

We’ve lost a bit of momentum recently, what with having to focus on projects for other modules and waiting for some of the equipment to arrive, so I thought I’d take a moment to collate everything we’ve done so far, and things we plan to do in the future.

What we’ve done so far:
  1. We have a stereo vision module that is flexible enough to change the number of disparities while running (to allow for different speeds the car might be travelling at) and that shouts if there are objects which are too close.
  2. We have a server and a client that can currently send and receive an image integrated with the stereo vision code.

What we still need to do:
  1. We need to extend the server and client so that we can transfer more than just an image.
  2. We need to devise a mechanism for controlling the car. We only just got the microcontroller that we’ll be using for this today, so we’ll get started on this soon.
  3. We need to implement a method for measuring the speed of the car, so we can adapt the stereo vision algorithm accordingly. We are still waiting for some equipment to do this, but it should be pretty interesting to implement.
  4. I’ve recently discovered that image acquisition and rectification are quite expensive. This is one part of the stereo vision process I was ignoring in terms of performance, but we’ll have to do something about it.
  5. We need to design a mount (and potentially a power supply) for the car, so we can hook everything up in a neat package!
  6. We need to create a basic iPhone (or Android, or desktop) application for demonstration purposes.

It seems like we need to do a lot more than we’ve done already, but I think the two things we’ve done (plus all the setup) were quite a big chunk of it all. Also, bearing in mind that we (still!) haven’t received all of the equipment that we need (including the car, or our own Pandaboard), there isn’t much more that we can do at the moment.


Quick way to make sure camera input is correct (overlaying images using OpenCV)

As I mentioned in one of the earlier posts, we need to limit the camera resolution to 320 x 240 to allow streaming to the Pandaboard. This meant that the cameras needed to be recalibrated. I did that, but when using the images obtained at the lower resolution and rectified with the new calibration parameters, I ran into a strange problem: closer objects seemed darker than the ones further away in the disparity map! This is the opposite of what you’d expect, and since running the same algorithm on a set of test images produced the correct output, I knew it was something to do with the camera input. So, after a suggestion from our supervisor, I wrote a small function that overlays one image on top of the other. In theory, an object closer to the camera should be shifted further between the two pictures than an object further away.

The right image is superimposed on the left image.

Sorry about the picture above being quite dark (it was the first image captured by the camera as it turned on, converted to grayscale, and blurred), but you can see that there is a huge difference between the position of the microwave oven in the two pictures, more so than the head (in the bottom right). The microwave is far enough away that it should be roughly in the same place in both the images. This explains why the disparity was inverted on the disparity map, so this problem should be fixed once I correctly (re)calibrate the cameras.

I could not find a lot of resources for writing a function that overlays two images, so here it is:

#include <cstdint>
#include <opencv2/core/core.hpp>

using namespace cv;

//Takes in a custom data structure that holds the left and right image.
//Can be easily modified to take in individual images instead.
//Scale determines the weighting of each image (0.5 gives the average of the two).
//Both images must be 8-bit (CV_8U) with the same size and number of channels.
Mat OverlayImages(const StereoPair& camImages, double scale)
{
    const Mat& left = camImages.leftImage;
    const Mat& right = camImages.rightImage;
    CV_Assert(left.depth() == CV_8U && left.type() == right.type() && left.size() == right.size());

    //Create new matrix for storing output
    Mat overlay(left.size(), left.type());

    //Values per row: works for grayscale (1 channel) and BGR (3 channel) images
    const int rowLength = left.cols * left.channels();

    for (int i = 0; i < left.rows; i++)
    {
        //Row pointers stay correct even if rows are padded (e.g. for ROIs)
        const uint8_t* l = left.ptr<uint8_t>(i);
        const uint8_t* r = right.ptr<uint8_t>(i);
        uint8_t* n = overlay.ptr<uint8_t>(i);

        for (int j = 0; j < rowLength; j++)
        {
            //Weighted sum of the two pixels, clamped to [0, 255]
            n[j] = saturate_cast<uint8_t>(l[j] * scale + r[j] * scale);
        }
    }

    //OpenCV's addWeighted(left, scale, right, scale, 0.0, overlay) gives the same result
    return overlay;
}


Detecting objects in our disparity map

Now that we are fairly happy with our disparity map, we need a way to find objects in it, so that we can calculate the distance to them and decide whether the car needs to stop or not.

There are several methods that we came across for doing this, but the one we’ve decided on is segmenting the image via blob detection.

I started implementing this from scratch using contours, where you detect the contours in an image, and that essentially gives you a closed region of the image. You can then do further analysis on the contours, and discard those which are below a certain area, calculate the mean intensity of the pixels within a contour and so on.
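For reference, here is a dependency-free sketch of the from-scratch idea. It swaps contours for connected-component labelling (a 4-connected flood fill), which finds the same kind of connected regions, and computes each region's area and mean intensity, discarding regions smaller than `minArea`. `findBlobs`, `Blob` and the foreground threshold are hypothetical; cvBlobsLib and OpenCV's own functions are far more complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stack>
#include <vector>

struct Blob {
    int area;              // number of pixels in the region
    double meanIntensity;  // average disparity value inside the region
};

// Finds 4-connected regions of pixels brighter than `threshold` in a
// row-major 8-bit image, keeping only regions with at least minArea pixels.
std::vector<Blob> findBlobs(const std::vector<std::uint8_t>& img, int rows, int cols,
                            std::uint8_t threshold, int minArea) {
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<bool> seen(img.size(), false);
    std::vector<Blob> blobs;
    for (int start = 0; start < rows * cols; ++start) {
        if (seen[start] || img[start] <= threshold) continue;
        long sum = 0;
        int area = 0;
        std::stack<int> todo;
        todo.push(start);
        seen[start] = true;
        while (!todo.empty()) {  // flood fill from the seed pixel
            const int p = todo.top();
            todo.pop();
            ++area;
            sum += img[p];
            const int r = p / cols, c = p % cols;
            const int nbr[4][2] = {{r - 1, c}, {r + 1, c}, {r, c - 1}, {r, c + 1}};
            for (const auto& n : nbr) {
                if (n[0] < 0 || n[0] >= rows || n[1] < 0 || n[1] >= cols) continue;
                const int q = n[0] * cols + n[1];
                if (!seen[q] && img[q] > threshold) {
                    seen[q] = true;
                    todo.push(q);
                }
            }
        }
        if (area >= minArea) {
            Blob b;
            b.area = area;
            b.meanIntensity = static_cast<double>(sum) / area;
            blobs.push_back(b);
        }
    }
    return blobs;
}
```

Raising `minArea` drops single-pixel speckles, which is the same noise filtering done with cvBlobsLib further down.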

Figure 1 Result of extracting contours, and filling them in with a random colour.

That’s how far I had got with my implementation when I discovered the cvBlobsLib library. It’s a complete blob detection library that integrates with OpenCV (note that OpenCV has a SimpleBlobDetector class, but that is quite limited at the moment). cvBlobsLib implements all of the features that we might require, and probably does them faster than I could implement them, so why reinvent the wheel, right?

Installing cvBlobsLib on Linux

First, download the appropriate archive for Linux, extract its contents into a directory on your desktop, and follow the instructions in the readme file.

Then, in your Eclipse project, under C/C++ Build -> Settings -> GCC C++ Linker -> Libraries, add blob under Libraries (-l), and under GCC C++ Compiler -> Includes, add /usr/local/include/cvblobs. Finally, in the working .cpp file, add the include directive #include <cvblobs/blob.h> (or #include "blob.h" if you stored the header files locally).

Using cvBlobsLib

cvBlobsLib only works with the C-style IplImage object instead of OpenCV’s Mat object, but converting between the two is not that big an issue. You only need to create a new IplImage header that points at the Mat’s data, rather than copying all of the data from one format to the other, so there is no real performance impact.

//bm is our disparity map in a Mat (8-bit, single channel)
IplImage dispIpl = bm;	//IplImage header that shares bm's data (no copy, nothing to delete)

//Declare variables
CBlobResult blobs;
CBlob *currentBlob;
int minArea = 1;

blobs = CBlobResult(&dispIpl, NULL, 0);  //get all blobs in the disparity map
blobs.Filter( blobs, B_EXCLUDE, CBlobGetArea(), B_LESS, minArea ); //filter blobs by area and remove all less than minArea

//Display blobs
IplImage *displayedImage = cvCreateImage(cvGetSize(&dispIpl), IPL_DEPTH_8U, 3); //same size as the disparity map, 3 channels
cvZero(displayedImage); //start from a black background
for (int i = 0; i < blobs.GetNumBlobs(); i++ )
{
	currentBlob = blobs.GetBlob(i);
	Scalar color( rand()&255, rand()&255, rand()&255 );
	currentBlob->FillBlob( displayedImage, color);
}
Mat displayImage = cvarrToMat(displayedImage); //Wrap as a Mat (no copy) for use in imshow()
imshow("Blobs", displayImage);
//Free displayedImage with cvReleaseImage(&displayedImage) once it is no longer needed

Figure 2 Disparity map

Figure 3 Result of blob detection with minArea set to 1

We can also do some noise filtering by excluding blobs that are below a certain size.

Figure 4 Result of blob detection with minArea set to 50. Note that there is a lot less noise.

Another big problem we can see immediately is that the person in the foreground and the car in the background are detected as one region. This is because the edge of the person is not closed, and so if you were to draw a contour, it would go around the edges of the person and the car, like so:

Figure 5 Contours in disparity map. Note that the person and the object to the right are surrounded by the same contour.

We’ve dealt with this problem using morphological filters.

Morphological filtering

“Morphological filtering is a theory developed in the 1960s for the analysis and processing of discrete images. It defines a series of operators which transform an image by probing it with a predefined shape element. The way this shape element intersects the neighbourhood of a pixel determines the result of the operation” [1].

OpenCV implements erosion and dilation filters as simple functions. For this problem we need the erode filter.

erode(src, dst, Mat());  //default kernel is of size 3 x 3

//Optionally, select kernel size
cv::Mat element(7,7,CV_8U,cv::Scalar(1));
erode(src, dst, element);

Figure 6 Filter with a 2×2 kernel. It does the job! But we need to experiment with more images to set a final value for the kernel size.

Figure 7 Erosion with kernel size 3×3 (left) and 7×7 (right) for illustration purposes. 7×7 is clearly very destructive.
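To see why erosion separates regions joined by a thin strip, here is a dependency-free sketch of binary erosion with an odd k × k square kernel. `erodeBinary` is a hypothetical helper, not OpenCV's implementation (OpenCV's erode() also works on greyscale images, taking the local minimum, and accepts even kernel sizes via an anchor point); like OpenCV's default border handling, pixels outside the image are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binary erosion of a row-major image (non-zero = foreground) with a square
// k x k kernel, k odd. A pixel survives only if every in-image pixel under
// the kernel is foreground; pixels outside the image are ignored.
std::vector<std::uint8_t> erodeBinary(const std::vector<std::uint8_t>& img,
                                      int rows, int cols, int k) {
    assert(k > 0 && k % 2 == 1);
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    const int h = k / 2;
    std::vector<std::uint8_t> out(img.size(), 0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            bool keep = true;
            for (int dr = -h; dr <= h && keep; ++dr) {
                for (int dc = -h; dc <= h && keep; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    if (img[rr * cols + cc] == 0) keep = false;  // background under kernel
                }
            }
            out[r * cols + c] = keep ? 1 : 0;
        }
    }
    return out;
}
```

A one-pixel-wide bridge between two 3 × 3 blocks disappears under a 3 × 3 kernel while the centres of the blocks survive, so a subsequent blob search sees two separate regions. Larger kernels cut thicker bridges but eat away more of each object, which matches the 7 × 7 result above.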


We’ve isolated our objects of interest! Now all that remains to be done is go over the blobs, find their average intensity, and calculate the distance!

Sources: [1], cvBlobsLib