It’s alive!

We haven’t posted for a while, but that’s because we’ve been too busy working on the car! A lot has changed, and much has improved. We have a car, an app that controls the car, a car that stops when an object is detected (mostly..), and some flashing lights!

We’ll post some more updates in the future on how things are working under the hood, but we’re almost there! There is still some work to be done on the logic that controls the braking of the car, but look, flashing lights!

Creating a tachometer with an Arduino and some correction fluid


One of the requirements for our project is being able to get the speed the car is travelling at. As well as providing useful feedback for the human controller, this information can be used in the decision-making process.

We decided to implement this in the simplest possible way, using a digital line sensor (https://www.sparkfun.com/products/9454).

From the product description page:

The board’s QRE1113 IR reflectance sensor is comprised of two parts – an IR emitting LED and an IR sensitive phototransistor. When you apply power to the VCC and GND pins the IR LED inside the sensor will illuminate. A 100Ω resistor is on-board and placed in series with the LED to limit current. The output of the phototransistor is tied to a 10nF capacitor. The faster that capacitor discharges, the more reflective the surface is.

In short, the IC outputs a lower value if more light is reflected (i.e. the object is bright), and a higher value if less light is reflected (i.e. the object is dark).


Sample output. (synthesised)

On the Arduino microcontroller, we can test the output of the line sensor in a loop. When we detect lower numbers, we know that the sensor is over the brighter part of the wheel. Every time a drop is seen, a revolution counter is incremented (because the wheel will have spun once). Every second, an interrupt is raised, and the speed is calculated using the following formula:

speed (m/s) = (rpm × 2πr) / 60   where r is the radius of the wheel in metres
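
To make the idea concrete, here is a minimal sketch of that loop in Arduino C++. It is not our exact firmware: the pin, threshold and wheel radius are placeholder values, it reads the sensor as a simple analogue level, and it uses millis() for the once-per-second report instead of a timer interrupt.

const int SENSOR_PIN = A0;           // analogue output of the line sensor (assumed pin)
const int THRESHOLD = 300;           // readings below this = bright (correction fluid) patch
const float WHEEL_RADIUS = 0.03;     // wheel radius in metres (placeholder value)

unsigned int revs = 0;               // revolutions counted since the last report
bool overBrightPatch = false;
unsigned long lastReport = 0;

void setup() {
  Serial.begin(9600);
}

void loop() {
  int reading = analogRead(SENSOR_PIN);

  // Count a revolution on each dark-to-bright transition
  if (reading < THRESHOLD && !overBrightPatch) {
    overBrightPatch = true;
    revs++;
  } else if (reading >= THRESHOLD) {
    overBrightPatch = false;
  }

  // Report once per second (the real firmware uses a timer interrupt)
  if (millis() - lastReport >= 1000) {
    float speed = revs * 2.0 * PI * WHEEL_RADIUS;   // rev/s * circumference = m/s
    Serial.println(speed);
    revs = 0;
    lastReport = millis();
  }
}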

This method of getting speed is extremely simple and requires a minimal amount of circuitry. It is less reliable than using something like a Hall effect sensor, as the surrounding light may interfere with the measurements, or the IR sensor on the IC may get dirty or somehow have its line of sight to the wheel blocked. For our controlled environment, however, the system works well and performs to specification.

Hassan

Performance and System Optimisation of complete Stereo Vision program

Previously, I have written about the performance of different algorithms, and explained why we chose the OpenCV implementation of Block Matching (StereoBM). I have also written about some results we obtained after trying an as-yet-unpublished technique. At that stage, we were able to achieve 7.5 frames per second with a maximum number of disparities of 80 and an image size of 320 x 240.

After implementing the entire system around it (capturing images from the camera, blob detection, and display) and taking all of that into account, we were getting a paltry 3 – 4 fps.

Initial Optimisations

  1. Decreased the maximum number of disparities from 80 to 48. This increases the distance to the closest object that we can detect to about 1m, but that is acceptable, as anything closer would probably be within the car’s stopping distance anyway, and so there would be a crash regardless of whether we detect it or not.
  2. Cropped the image to reduce the peripheral information in the image. The decisions should be made by what’s directly in front of the car, so there is no point in wasting time doing computations on the sky.

With these changes, we were able to achieve around 5 fps. A small improvement, but an improvement nonetheless, so we have made these changes permanent.

In-depth Performance Analysis

To understand what was eating up the CPU time, I measured the time taken for all of the individual steps in the process per iteration (i.e. one stereo image set). Bearing in mind that all of the steps are happening sequentially, the numbers (in fps) are as follows:

Acquiring images: 25 fps
Pre-processing images: 24 fps
Calculating disparity + blob detection: 8.5 fps
Overall: 5 fps

Just an aside – we have recently discovered that acquiring images is dependent on the light level as well, because our camera is a bit rubbish. Assuming an adequate quantity of light, however, we see that we can acquire images at 25 fps from the camera and pre-process them in real time (if that were happening in parallel). I combined calculating disparity and blob detection because blob detection is very fast. Since all of this is happening sequentially, and there are other smaller overheads, the overall speed turns out to be 5 fps.

System Optimisations

The Pandaboard ES has a dual-core ARM Cortex-A9, with both cores running at 1.2 GHz. At this stage, we are using one core for the whole vision process, and one core for hosting a server. The server core is mostly idle. Ignoring the server for the moment, we need to modify our system to introduce some serious concurrency!

Implementation option 1

The simplest way to do this would be to acquire images and pre-process them on one core, and calculate the disparity and blob detection on the second core. This can be achieved by creating two storage buffers, buf0 and buf1. The cameras grab the images, and after the pre-processing, we store them in buf0. Buf0 can then be copied to buf1, and we can start doing disparity calculations on buf1. While this is happening, the first thread can refresh the images from the camera in buf0, and so on, so that acquiring + pre-processing images and calculating disparity + blob detection happen concurrently. Here is the idea in diagram form:

There are some drawbacks to this approach, however, primarily the fact that we have to copy data between the buffers. Also, as we established before, image acquisition + pre-processing happens faster than calculating disparity, so we would frequently have to wait for the disparity calculation to finish before being able to copy data into buf1. This would affect performance significantly.

Implementation option 2

The second way to do this is a little more complicated, so naturally, that is what we have implemented. Instead of having a separate buffer for storing images and then copying them for calculating disparity, we now have a two-way buffer system: two buffers, buf0 and buf1, shared between the two cores, which we write to alternately. We first grab the images from the camera and store them in buf0. We then grab the next pair of images and store them in buf1. While we are working on getting the next pair of images, we start calculating on buf0. Once that is complete, we start calculating on buf1 while storing images in buf0. This way, we deal with two sets of frames in one loop iteration, and avoid any overhead associated with copying the data.

I should mention that we are using OpenMP for parallelisation. We need to make sure that only one thread accesses a given buffer at a time. We don’t want to be calculating on buf0 while grabbing and storing images in it at the same time! This would result in all sorts of horribleness. To avoid this issue, we use the concept of mutual exclusion, and make use of the OpenMP critical directive. While a section of code is surrounded by a #pragma omp critical(name), only one thread is allowed to execute inside that region. So all of the areas of code which access buf0 will be surrounded by #pragma omp critical(buffer0). If a thread reaches that statement while the other thread is inside the critical region, that thread will be forced to wait until the other one has exited.

Figure 3. Diagrammatically: the system only progresses when both conditions are met for a transition to take place.

Initially, before we enter the indefinitely running main loop, we need to fill the buffers.

The program has two distinct critical sections, one for each buffer. They occur on both cores, and the execution of the program is as in the animated diagram:

Now that we have the basic system in place, we need to take care of blob detection. As we know, image acquisition is faster than disparity calculation, so to avoid core 0 waiting for long, we should give it as much work to do as possible. As such, we’ve decided to do the blob detection and decision making on core 0. To do this, the disparity map output is stored by core 1 in buffers dispBuf0 and dispBuf1. Before getting the next image from the camera, we perform blob detection on the previous disparity map. This is best explained with another diagram:


The different colours represent the two critical sections. The two rows inside the box are the processes happening on the two cores. The blue and orange boxes that are in the same columns execute concurrently. So while core1 is calculating disparity from imgBuf1, core0 is doing blob detection on the disparity calculated from imgBuf0, then getting the next images from the camera, and so on. Notice that, when we do blob detection, and decide that the car needs to slow down or speed up, we change the maximum number of disparities. However, by the time we are ready to communicate this decision, the system has already calculated disparity for the next frame using the existing max number of disparities, so we need to invalidate the data in that dispBuf, as it no longer represents the current situation.
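
For the curious, here is a stripped-down sketch of the scheme in C++ with OpenMP. It is not our actual code: grabAndPreprocess(), computeDisparity() and detectBlobs() are placeholder stubs standing in for the real capture, StereoBM and blob-detection steps, and the decision making and dispBuf invalidation are left out for brevity.

#include <opencv2/opencv.hpp>
#include <omp.h>

// Placeholder stubs for the real pipeline stages
cv::Mat grabAndPreprocess()                  { return cv::Mat::zeros(240, 320, CV_8UC1); }
cv::Mat computeDisparity(const cv::Mat& img) { return img.clone(); }
void    detectBlobs(const cv::Mat& disp)     { /* thresholding + decision making */ }

cv::Mat imgBuf0, imgBuf1;    // pre-processed stereo frames shared between the cores
cv::Mat dispBuf0, dispBuf1;  // disparity maps produced by core 1

int main()
{
    // Fill both buffers once before entering the endless loop
    imgBuf0 = grabAndPreprocess();
    imgBuf1 = grabAndPreprocess();

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section   // "core 0": blob detection + image acquisition
        for (;;)
        {
            #pragma omp critical(buffer0)
            {
                detectBlobs(dispBuf0);          // use the previous result for buf0...
                imgBuf0 = grabAndPreprocess();  // ...then refill buf0 with new images
            }
            #pragma omp critical(buffer1)
            {
                detectBlobs(dispBuf1);
                imgBuf1 = grabAndPreprocess();
            }
        }

        #pragma omp section   // "core 1": disparity calculation only
        for (;;)
        {
            #pragma omp critical(buffer0)
            dispBuf0 = computeDisparity(imgBuf0);

            #pragma omp critical(buffer1)
            dispBuf1 = computeDisparity(imgBuf1);
        }
    }
    return 0;
}

The named critical sections are what stop one thread from refilling a buffer while the other is still calculating from it.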

In every iteration of our main program loop, we are able to process two new frames. And so with this system in place, we can get 15 – 16 frames per second, almost double what we measured as the stand-alone rate for calculating disparity + blob detection earlier!

Conclusion

With our new concurrent system, we are able to acquire and pre-process images while calculating disparities at the same time. We avoid unnecessary overhead by making use of a two-way buffering system. Since we know that the bottleneck of the system is calculating disparity, we can offload all of the other work to core 0 so it doesn’t have to wait as long for the buffers to be freed. We have managed to get about 15 – 16 frames per second, which is good enough for real-time stereo vision, and is approximately 3 times faster than our previous sequential, single-core system.

Changing disparity range dynamically

Hello! It’s been a while since the last post! That doesn’t mean we’ve been idle, though. There has been lots of progress on the stereo vision setup in terms of performance and robustness as we gear up for a demonstration in a week’s time. More on that in later posts though, right now I’d just like to talk about a small but important feature.

In a real car, when you see an object in the distance, you slow down as you approach the object, eventually coming to a halt close to it. We wanted to accomplish something similar to this for our autonomous car.

To achieve this, we need to be able to perceive the distance to said object. From the stereo pair of images and the disparity map, it is possible to calculate the absolute distance to an object, but that is computationally expensive; we wanted to avoid it and instead make the most of the computations we have already performed.
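
(For reference, with a calibrated and rectified pair the standard relation is distance = (focal length × baseline) / disparity, so distance is inversely proportional to disparity: the closer an object is, the larger its disparity and the brighter it appears on the disparity map.)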

As such, the solution we have implemented is thus:

  1. Let’s say the maximum number of disparities is set to 48 (16 * 3). With our camera setup, the closest object we can detect (i.e. brightest/whitest on the disparity map) is roughly 1 meter away.
  2. Let’s say we are 2 meters away from this object. As we get closer to it, the level of white corresponding to this object increases. Once this passes a certain threshold, we know we are close to the object, and can tell the car to slow down a notch.
  3. As the car slows down, we increase the maximum number of disparities to 16 * 4. Now the distance to the closest object we can detect decreases, so we don’t lose sight of it. We can then repeat the same procedure, where passing over a certain threshold level of white in our disparity map causes us to slow down, and increase the number of disparities.
  4. We repeat this process until we reach an upper limit on the maximum number of disparities. Let’s say this is 128 (16 * 8), which gives a range of about 15 – 20 cm. Now when we cross over the whiteness threshold, we are travelling very slowly, and we know that we are about 15 – 20 cm away from an object, so we can tell the car to stop.

The actual numbers can be set depending on the required range, how close you want to stop to the object, and basically just experimenting around with the actual car.
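
As a rough illustration, the control logic boils down to something like the following. The whiteness measure, threshold and speed handling here are simplified placeholders, not our actual tuning.

#include <opencv2/opencv.hpp>

// Placeholder: fraction of near-white pixels in the disparity map
double measureWhiteness(const cv::Mat& disparityMap)
{
    return cv::countNonZero(disparityMap > 200) / (double)disparityMap.total();
}

// Called once per processed frame
void updateDisparityRange(const cv::Mat& disparityMap,
                          int& numDisparities,   // current maximum, a multiple of 16
                          int& speedLevel)       // 0 = stopped
{
    const double WHITE_THRESHOLD = 0.10;  // placeholder tuning value
    const int    MAX_DISPARITIES = 128;   // 16 * 8, roughly 15 - 20 cm range

    if (measureWhiteness(disparityMap) < WHITE_THRESHOLD)
        return;                           // nothing close enough yet, carry on

    if (numDisparities < MAX_DISPARITIES)
    {
        speedLevel--;                     // slow down a notch...
        numDisparities += 16;             // ...and look closer from the next frame on
    }
    else
    {
        speedLevel = 0;                   // at 128 we are ~15 - 20 cm away: stop
    }
}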

There are some obvious limitations to this approach though:

  1. As we get closer to the object and increase the max number of disparities, the frame rate drops as calculating the disparity map takes longer. However, this coincides with the car slowing down, so we have more time per frame.
  2. Unless you have been ‘tracking’ an object as it gets closer, any object closer than what the initial number of disparities allows will not be detected. For example, if I stick my leg out about 20 cm in front of the car, it will not be detected and the car will crash into me. The disparity map will mostly be black (i.e. error pixels). To overcome this problem, we could perhaps modify the algorithm so that it increases the max number of disparities (thus allowing us to ‘look’ closer) if the number of error pixels is over a threshold, or simply stop the car in that case. We could also use a hardware proximity sensor as an input to apply an emergency brake.

We have yet to see how this approach works with a relatively fast moving car, but testing it with moving objects in front of a stationary camera indicates promising results.

Hassan

Google Protobuf

Back from holidays!! I did some work on the project over the start of the holidays, so I’m going to share what I’ve done over the past two weeks. First of all, we had the problem of transferring several pieces of data from the server to the client. This is crucial, because for our real-time system we want to inform the client about the speed and maybe other things as well. The technique we’re using to solve this problem is serialization. According to Wikipedia, serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.

Serialization

http://en.wikipedia.org/wiki/Serialization

In our case, we need serialization because we want to recreate the data on our client, which might be a different platform. For example, in our project we need to transfer data from the Pandaboard (ARM) server to a PC client (x86). I have yet to test whether the image survives the transfer intact without serialization. However, based on the result I got from trying a PC (x86) to an iPod Touch 3G (ARMv7), there seems to be some error. My guess would be that it’s because different architectures represent the data differently, or it’s just poor coding.

Having looked through multiple serialization frameworks, we have decided to use Google Protocol Buffers. There are others, such as Apache Thrift (used by Facebook) and Boost Serialization (part of the popular Boost library). We chose Protocol Buffers because it is the fastest and cleanest of them all. Apache Thrift would be good for multi-platform support, but there isn’t enough documentation on how to use it.

Google Protocol Buffer (Protobuf)

You can find the main page here ( http://code.google.com/p/protobuf/ ). It fills a similar role to XML and JSON. After installing, you will need to create a .proto file and run it through the protoc compiler. Just use any text editor and create the file by adding a .proto extension when you save it. The .proto file looks a bit like Java: you create a message (a bit like a class) and define its fields. For our project we used something like this.

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Packet {
  repeated int32 data = 1 [packed=true];
  optional int32 speed = 2;
  optional int32 size = 3;
}

I’m not entirely sure of the exact definition of package, but to my understanding it’s essentially a namespace for the generated classes. A message is like a new type used in protobuf. In it you can declare fields depending on what you need. You set the fields to either optional or required. Optional is good especially when you want to reuse the buffer but don’t want to send certain things. Repeated means you can store data in that field multiple times, and it can be accessed through an index; it is similar to an array. The downside is that multi-dimensional arrays are not supported natively, so you will need to google around to find out how to do it.

How to add it to your code:

//Declare the protocol buffers: packetData carries the image data (filled in
//elsewhere), packetSize just tells the client how big packetData is
tutorial::Packet packetData;
tutorial::Packet packetSize;

//Get the serialized size of the data packet
int arraysize = packetData.ByteSize();
//Set the size field in the size packet
packetSize.set_size(arraysize);
//Create an array big enough to hold the serialized size packet
char id[packetSize.ByteSize()];
//Serialize it into the array we just created
packetSize.SerializeToArray(id, packetSize.ByteSize());
//Send it over the network so the client knows how many bytes to expect next
int bytes = send(clientsock, id, sizeof(id), 0);
packetSize.Clear();
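
On the receiving end, the flow would look roughly like this. It is a sketch rather than our exact client code: it assumes the data packet is sent straight after the size packet, and serversock stands in for the client’s connected socket.

//Parse the small size packet first
tutorial::Packet packetSize;
char sizeBuf[64];
int received = recv(serversock, sizeBuf, sizeof(sizeBuf), 0);
packetSize.ParseFromArray(sizeBuf, received);

//Now we know how many bytes the data packet occupies
//(in practice you would loop on recv until all the bytes have arrived)
tutorial::Packet packetData;
std::vector<char> dataBuf(packetSize.size());
received = recv(serversock, dataBuf.data(), dataBuf.size(), 0);
packetData.ParseFromArray(dataBuf.data(), received);

//The repeated field is accessed by index, much like an array
for (int i = 0; i < packetData.data_size(); i++)
{
    int value = packetData.data(i);   //e.g. one pixel of the image
    //... use value ...
}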

Installing on Xcode for iOS:

http://www.kotancode.com/2012/10/14/using-google-protocol-buffers-in-objective-c-on-ios-and-the-mac/

This is much more complicated than installing on a PC or Mac. I had to fiddle around a little bit.

Results:

I’ve managed to get it working between computers, and I have coded it for iOS as well. It seems to be slightly slow, but that’s because I’ve only managed to borrow an old iPod Touch 3G from a friend. I’m hoping to get something faster. It might not improve performance significantly, because I’ve yet to implement concurrency on it, but I’m sure it will work faster than now because of the better network card.

Between PC and iOS:


Will update more when I’ve implemented more stuff. Happy New Year Everyone !!

Progress and Plans

We’ve lost a bit of momentum recently, what with having to focus on projects for other modules and waiting for some of the equipment to arrive, so I thought I’d take a moment to collate everything we’ve done so far, and things we plan to do in the future.

Progress:

  1. We have a stereo vision module that is flexible enough to change number of disparities while running (to allow for different speeds the car might be travelling at) and that shouts if there are objects which are too close.
  2. We have a server and a client, integrated with the stereo vision code, that can currently send and receive an image.

Plans:

  1. We need to extend the server and client so that we can transfer more than just an image.
  2. We need to devise a mechanism for controlling the car. We only just got the microcontroller that we’ll be using for this today, so we’ll get started on this soon.
  3. We need to implement a method for measuring the speed of the car, so we can adapt the stereo vision algorithm accordingly. We are still waiting for some equipment to do this, but it should be pretty interesting to implement.
  4. I’ve recently discovered that image acquisition and rectification is quite expensive. This is one part of the stereo vision process I was ignoring in terms of performance consideration, but we’ll have to do something about this.
  5. We need to design a mount (and potentially a power supply) for the car, so we can hook everything up in a neat package!
  6. We need to create a basic iPhone (or Android, or desktop) application for demonstration purposes.

It seems like we need to do a lot more than we’ve done already, but I think the two things we’ve done (plus all the setup) were quite a big chunk of it all. Also, bearing in mind that we (still!) haven’t received all of the equipment that we need (including the car, or our own Pandaboard..), there isn’t much more that we can do at the moment.

Hassan

Quick way to make sure camera input is correct (overlaying images using OpenCV)

As I mentioned in one of the earlier posts, we need to limit the camera resolution to 320 x 240 to allow streaming to the Pandaboard. This means that the cameras needed to be recalibrated. I did that, but when using the images obtained at the lower resolution and rectified with the new calibration parameters, I ran into a strange problem: closer objects seemed darker than the ones further away in the disparity map! This is the opposite of what you’d expect, and since running the same algorithm on a bunch of test images was producing the correct output, I knew it was something to do with the camera input. So after a suggestion from our supervisor, I wrote a small function that overlays one image on top of the other. Theoretically, objects closer to the camera should have a larger distance between them in the two pictures than objects further away.

The right image is superimposed on the left image.

Sorry about the picture above being quite dark (it was the first image captured by the camera as it turned on, converted to grayscale, and blurred), but you can see that there is a huge difference between the position of the microwave oven in the two pictures, much more so than for the head (in the bottom right). The microwave is far enough away that it should be roughly in the same place in both images. This explains why the disparity map was inverted, so this problem should be fixed once I correctly (re)calibrate the cameras.

I could not find a lot of resources for writing a function that overlays two images, so here it is:

//Takes in a custom data structure that holds the left and right image.
//Can be easily modified to take in individual images instead.
//Scale determines the weighting of each image
Mat OverlayImages(StereoPair camImages, double scale)
{
    //Create new matrix for storing the output
    Mat overlay = Mat(camImages.leftImage.size(), camImages.leftImage.type());

    //Initialise pointers to the pixel data in each of the matrices
    //(assumes the Mats are continuous, which holds for freshly captured images)
    uint8_t* l = (uint8_t*)camImages.leftImage.data;
    uint8_t* r = (uint8_t*)camImages.rightImage.data;
    uint8_t* n = (uint8_t*)overlay.data;

    int cn = camImages.leftImage.channels();
    int cols = camImages.leftImage.cols;

    for(int i = 0; i < camImages.leftImage.rows; i++)
    {
        for(int j = 0; j < cols; j++)
        {
            //Blend each channel of the pixel (B, G, R for a colour image)
            for(int c = 0; c < cn; c++)
            {
                int idx = (i * cols + j) * cn + c;
                n[idx] = saturate_cast<uint8_t>(l[idx] * scale + r[idx] * scale);
            }
        }
    }
    return overlay;
}
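
As an aside, if you don’t need the per-pixel access, OpenCV’s built-in blend should produce the same result in a single call (using the same weighting as the function above):

Mat overlay;
addWeighted(camImages.leftImage, scale, camImages.rightImage, scale, 0.0, overlay);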

Hassan