Quick way to make sure camera input is correct (overlaying images using OpenCV)

As I mentioned in one of the earlier posts, we need to limit the camera resolution to 320 x 240 to allow streaming to the Pandaboard. This meant that the cameras needed to be recalibrated. I did that, but when using images captured at the lower resolution and rectified with the new calibration parameters, I ran into a strange problem: in the disparity map, closer objects appeared darker than the ones further away! This is the opposite of what you’d expect. Running the same algorithm on a set of test images produced the correct output, so I knew the problem had something to do with the camera input. Following a suggestion from our supervisor, I wrote a small function that overlays one image on top of the other. In theory, objects closer to the camera should be further apart between the two pictures than objects further away.

The right image is superimposed on the left image.

Sorry about the picture above being quite dark (it was the first image captured by the camera as it turned on, converted to grayscale and blurred). You can still see a huge difference between the positions of the microwave oven in the two pictures, much larger than for the head (in the bottom right). The microwave is far enough away that it should be in roughly the same place in both images. This explains why the disparity was inverted in the disparity map, so the problem should be fixed once I (re)calibrate the cameras correctly.
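
The intuition behind this check is the standard relation for a rectified stereo pair. A point at depth Z appears shifted by a disparity d = f·B/Z pixels, where f is the focal length in pixels and B is the baseline between the two cameras. Disparity is inversely proportional to depth, so a nearby head should shift far more than a distant microwave. The small sketch below only evaluates that formula. The function name is hypothetical, and the numbers in the usage note are made-up round values, not our calibration results.

```cpp
// Disparity (in pixels) of a point at depthM metres, for a rectified stereo
// pair with focal length focalPx (pixels) and baseline baselineM (metres):
//   d = f * B / Z
// Halving the depth doubles the disparity, which is why nearby objects
// should be much further apart in the overlay than distant ones.
double disparityFromDepth(double focalPx, double baselineM, double depthM)
{
    return focalPx * baselineM / depthM;
}
```

Suppose f = 350 px and B = 0.06 m (both hypothetical). An object 0.5 m away then gives disparityFromDepth(350, 0.06, 0.5) = 42 px, while one 3 m away gives only 7 px.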

I could not find many resources on writing a function that overlays two images, so here it is:

//Takes in a custom data structure that holds the left and right image.
//Can be easily modified to take in individual images instead.
//Scale determines the weighting of each image (0.5 gives an equal blend).
//Both images must have the same size and type, with 8-bit channels (e.g. grayscale or BGR).
Mat OverlayImages(StereoPair camImages, double scale)
{
    const Mat& leftImg = camImages.leftImage;
    const Mat& rightImg = camImages.rightImage;
    CV_Assert(leftImg.size() == rightImg.size() && leftImg.type() == rightImg.type()
              && leftImg.depth() == CV_8U);

    //Create new matrix for storing output
    Mat overlay(leftImg.size(), leftImg.type());

    int cn = leftImg.channels();
    int cols = leftImg.cols;

    for(int i = 0; i < leftImg.rows; i++)
    {
        //Pointers to the start of row i in each of the matrices
        //(row pointers also work for matrices that are not continuous in memory)
        const uint8_t* l = leftImg.ptr<uint8_t>(i);
        const uint8_t* r = rightImg.ptr<uint8_t>(i);
        uint8_t* n = overlay.ptr<uint8_t>(i);

        //Each row holds cols * cn values (e.g. B, G, R for every pixel)
        for(int j = 0; j < cols * cn; j++)
        {
            //Weighted sum of the two images; saturate_cast rounds and clamps
            //the result to [0, 255] so that scale > 0.5 cannot overflow
            n[j] = saturate_cast<uint8_t>(l[j] * scale + r[j] * scale);
        }
    }
    return overlay;
}


Detecting objects in our disparity map

Now that we are fairly happy with our disparity map, we need a way to find objects in it, so that we can calculate the distance to them and decide whether the car needs to stop or not.

There are several methods that we came across for doing this, but the one we’ve decided on is segmenting the image via blob detection.

I started implementing this from scratch using contours. You detect the contours in an image, and each contour essentially gives you a closed region of the image. You can then analyse the contours further: discard those below a certain area, calculate the mean intensity of the pixels within each contour, and so on.

Figure 1 Result of extracting contours, and filling them in with a random colour.

That’s how far I had got with my implementation when I discovered the cvBlobsLib library. It’s a complete blob detection library that integrates with OpenCV (note that OpenCV has a SimpleBlobDetector class, but it is quite limited at the moment). cvBlobsLib implements all of the features we are likely to need, and probably does them faster than I could implement them, so why reinvent the wheel, right?

Installing cvBlobsLib on Linux

First, download the appropriate archive for Linux from here. Extract its contents into a directory on your desktop, then follow the instructions in the readme file.

Then, in your Eclipse project, go to C/C++ Build -> Settings -> GCC C++ Linker -> Libraries and add blob under Libraries (-l). Under GCC C++ Compiler -> Includes, add /usr/local/include/cvblobs. Finally, in your working .cpp file, add the include directive #include <cvblobs/blob.h> (or #include "blob.h" if you stored the header files locally).

Using cvBlobsLib

cvBlobsLib only works with the C-style IplImage object instead of OpenCV’s Mat object, but converting between the two is not that big an issue. The conversion only creates a new header pointing at the same pixel data, without copying the data from one format to the other, so there is no real performance impact.

//bm is our disparity map in a Mat (assumed 8-bit, single channel)
IplImage dispIpl = bm;	//create an IplImage header for the Mat's data (no copy, nothing to delete)

//Declare variables
CBlobResult blobs;
CBlob *currentBlob;
int minArea = 1;

blobs = CBlobResult(&dispIpl, NULL, 0);  //get all blobs in the disparity map
blobs.Filter( blobs, B_EXCLUDE, CBlobGetArea(), B_LESS, minArea ); //remove all blobs with an area less than minArea

//Display blobs
IplImage *displayedImage = cvCreateImage(cvGetSize(&dispIpl), IPL_DEPTH_8U, 3); //same size as the disparity map
cvZero(displayedImage); //cvCreateImage does not initialise the pixels
for (int i = 0; i < blobs.GetNumBlobs(); i++ )
{
	currentBlob = blobs.GetBlob(i);
	Scalar color( rand()&255, rand()&255, rand()&255 ); //random colour for each blob
	currentBlob->FillBlob( displayedImage, color);
}
Mat displayImage(displayedImage); //Wrap in a Mat header for use in imshow()
imshow("Blobs", displayImage);
//Free displayedImage with cvReleaseImage(&displayedImage) once it is no longer needed

Figure 2 Disparity map

Figure 3 Result of blob detection with minArea set to 1

We can also do some noise filtering by excluding blobs that are below a certain size.

Figure 4 Result of blob detection with minArea set to 50. Note that there is a lot less noise.
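
cvBlobsLib does the labelling and filtering for us, but the idea behind "find all blobs, then exclude those below minArea" can be shown without it. The sketch below is not cvBlobsLib’s implementation. It is a plain connected-component labelling (a breadth-first flood fill over 4-connected neighbours). As a simplifying assumption, it treats every non-zero pixel as foreground. It then keeps only regions of at least minArea pixels, mirroring the B_EXCLUDE / B_LESS filter in the code above. The function name labelBlobs is hypothetical.

```cpp
#include <queue>
#include <vector>

// Labels 4-connected regions of non-zero pixels in a row-major image and
// drops regions whose area (pixel count) is below minArea.
// Returns a label image: 0 = background or discarded, 1..numBlobs = kept blobs.
std::vector<int> labelBlobs(const std::vector<unsigned char>& img,
                            int rows, int cols, int minArea, int& numBlobs)
{
    std::vector<int> labels(img.size(), 0);
    std::vector<bool> visited(img.size(), false);
    numBlobs = 0;
    for (int start = 0; start < rows * cols; ++start)
    {
        if (img[start] == 0 || visited[start]) continue;
        // Breadth-first flood fill collecting every pixel of this blob
        std::vector<int> pixels;
        std::queue<int> q;
        q.push(start);
        visited[start] = true;
        while (!q.empty())
        {
            int p = q.front(); q.pop();
            pixels.push_back(p);
            int y = p / cols, x = p % cols;
            const int ny[4] = {y - 1, y + 1, y, y};
            const int nx[4] = {x, x, x - 1, x + 1};
            for (int i = 0; i < 4; ++i)
            {
                if (ny[i] < 0 || ny[i] >= rows || nx[i] < 0 || nx[i] >= cols) continue;
                int n = ny[i] * cols + nx[i];
                if (img[n] != 0 && !visited[n]) { visited[n] = true; q.push(n); }
            }
        }
        // Noise filtering by area: keep the blob only if it is large enough
        if (static_cast<int>(pixels.size()) >= minArea)
        {
            ++numBlobs;
            for (int p : pixels) labels[p] = numBlobs;
        }
    }
    return labels;
}
```

In this scheme, raising minArea from 1 to 50 removes every region smaller than 50 pixels. This is the kind of clean-up visible in Figure 4.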

Another big problem we can see immediately is that the person in the foreground and the car in the background are detected as one region. This is because the edge of the person is not closed, so a contour drawn around the region goes around the edges of both the person and the car, like so:

Figure 5 Contours in disparity map. Note that the person and the object to the right are surrounded by the same contour.

We’ve dealt with this problem using morphological filters.

Morphological filtering

“Morphological filtering is a theory developed in the 1960s for the analysis and processing of discrete images. It defines a series of operators which transform an image by probing it with a predefined shape element. The way this shape element intersects the neighbourhood of a pixel determines the result of the operation” [1].

OpenCV implements erosion and dilation filters as simple functions, erode() and dilate(). For this problem we need erosion, which shrinks bright regions and breaks the thin connections between them.

erode(src, dst, Mat());  //default kernel is 3 x 3

//Optionally, select kernel size
cv::Mat element(7,7,CV_8U,cv::Scalar(1));
erode(src, dst, element);
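
To show what erode() computes, here is a dependency-free sketch of grayscale erosion with a square k x k kernel of ones (like the 7 x 7 element above). Each output pixel becomes the minimum of its neighbourhood, so bright regions shrink. Thin bright links between regions, such as the one that made the person and the car a single region, tend to disappear. This is a naive illustration, not OpenCV’s implementation, and the function name erodeSquare is hypothetical. Pixels outside the image are simply ignored, which is intended to mimic OpenCV’s default border handling for erosion.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Grayscale erosion of a row-major 8-bit image with a k x k square kernel
// (k odd). Each output pixel is the minimum over its k x k neighbourhood;
// neighbours that fall outside the image are skipped.
std::vector<uint8_t> erodeSquare(const std::vector<uint8_t>& src,
                                 int rows, int cols, int k)
{
    const int r = k / 2;
    std::vector<uint8_t> dst(src.size());
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
        {
            uint8_t m = 255;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                {
                    const int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < rows && xx >= 0 && xx < cols)
                        m = std::min(m, src[yy * cols + xx]);
                }
            dst[y * cols + x] = m;
        }
    return dst;
}
```

A larger kernel erodes more aggressively, which is consistent with the 7 x 7 result in Figure 7 being so destructive.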

Figure 6 Filter with a 2×2 kernel. It does the job! But we need to experiment with more images to set a final value for the kernel size.

Figure 7 Erosion with kernel size 3×3 (left) and 7×7 (right) for illustration purposes. 7×7 is clearly very destructive.


We’ve isolated our objects of interest! Now all that remains to be done is to go over the blobs, find their average intensity, and calculate the distance!
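
Here is a rough sketch of that last step. It assumes we already have a blob label for every pixel. It also assumes a disparity map holding real disparities in pixels (for example StereoBM’s fixed-point CV_16S output divided by 16), rather than a map rescaled to 0-255 for display. The average disparity of a blob can then be converted to a distance with Z = f·B/d. Both function names are hypothetical, and f and B would have to come from our calibration.

```cpp
#include <cstddef>
#include <vector>

// Average disparity (in pixels) over all pixels carrying the given blob label.
// Non-positive disparities are treated as "no match" and skipped.
// Returns 0 if the blob has no valid pixels.
double meanBlobDisparity(const std::vector<float>& disp,
                         const std::vector<int>& labels, int label)
{
    double sum = 0.0;
    int count = 0;
    for (std::size_t i = 0; i < disp.size() && i < labels.size(); ++i)
    {
        if (labels[i] == label && disp[i] > 0.0f)
        {
            sum += disp[i];
            ++count;
        }
    }
    return count > 0 ? sum / count : 0.0;
}

// Distance (same unit as the baseline) for a rectified pair: Z = f * B / d.
// focalPx and baselineM are placeholders for values from calibration.
// Returns a negative value when the disparity is not positive (no estimate).
double distanceFromDisparity(double focalPx, double baselineM, double disparityPx)
{
    return disparityPx > 0.0 ? focalPx * baselineM / disparityPx : -1.0;
}
```

Suppose f = 350 px and B = 0.06 m (both made up). A blob whose average disparity is 15 px would then be about 1.4 m away.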

Sources: [1], cvBlobsLib


PandaBoard Streaming

To see what the car is doing, we needed a way to stream data between the car and a server. After some googling, we have found two methods that fit the bill so far.

First Method – C++ Server → Websocket → HTML5 Client

This seems to be the best solution for the project (if it works). Clients can use any HTML5-supported browser (while using Facebook at the same time) without any extra installation. I decided to follow the method implemented here:


Streaming a pre-recorded video file works well: there were no dropped frames or lag, and it works for both MP4 and Ogg formats. To stream captured video, I used OpenCV to capture frames and wrote every frame to a video container using OpenCV’s built-in VideoWriter (write(), or cvWriteFrame() in the C API). I used Ogg in this case, as OpenCV had no codec for writing MP4 files on our setup. The file is streamed through the server mentioned above while it is being written.

At the receiving end, the results don’t look good at all: there is a massive lag in the received video. This is expected, as there is overhead from file I/O, encoding and network delays. After some googling, there doesn’t seem to be a better solution to this problem at the moment, as OpenCV doesn’t allow writing frames to memory instead of files. On the bright side, the recording from the webcam is playable and appears to be real time. We might incorporate this method into the project as a way of recording the car’s journey.

Pros –

  • Video stream is in colour
  • Can be accessed through a browser
  • Supports many devices
  • Streaming is recorded

Cons –

  • Streaming is sluggish, with a massive delay of 30 seconds

Second Method – C++ Server → Websocket → C++ Client

Moving on: instead of streaming encoded video, I decided to try streaming raw OpenCV image data. The downside of this method is that the client has to have OpenCV installed, rather than using a widely available browser. I managed to stream data using the method below:



Pros –

  • Faster speed
  • Much less delay (only about 5 seconds)
  • Doesn’t require a server to be set up

Cons –

  • At the moment, the streamed image is grayscale (working on colour streaming)
  • The remaining delay still defeats the objective of real-time streaming

Verdict –

At the moment, the second method is better than the first at displaying video obtained from OpenCV. There are more options to explore, and hopefully one of them will be better than these two. Results will be uploaded soon.
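
Streaming raw OpenCV data means the receiver has to know how to turn a stream of bytes back into an image. Our actual wire format isn’t described here, so the following is a hypothetical one: a small header with the frame’s rows, columns and channel count, followed by the raw 8-bit pixels. Including the channel count is one way the same code could carry colour frames as well as grayscale ones. On the OpenCV side, the pixel bytes would come from a Mat’s data pointer, after checking isContinuous(). The names Frame, packFrame and unpackFrame are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical wire format for one raw frame: three uint32 header values
// (rows, cols, channels) followed by rows * cols * channels bytes of pixels.
// The header is written in host byte order, so both ends are assumed to
// share endianness (both are little-endian on x86 and typical ARM setups).
struct Frame
{
    uint32_t rows = 0, cols = 0, channels = 0;
    std::vector<uint8_t> pixels; // row-major, channels interleaved (e.g. BGR)
};

std::vector<uint8_t> packFrame(const Frame& f)
{
    const uint32_t header[3] = {f.rows, f.cols, f.channels};
    std::vector<uint8_t> buf(sizeof(header) + f.pixels.size());
    std::memcpy(buf.data(), header, sizeof(header));
    if (!f.pixels.empty())
        std::memcpy(buf.data() + sizeof(header), f.pixels.data(), f.pixels.size());
    return buf;
}

Frame unpackFrame(const std::vector<uint8_t>& buf)
{
    uint32_t header[3];
    if (buf.size() < sizeof(header)) throw std::runtime_error("truncated header");
    std::memcpy(header, buf.data(), sizeof(header));
    Frame f;
    f.rows = header[0]; f.cols = header[1]; f.channels = header[2];
    const std::size_t n = std::size_t(f.rows) * f.cols * f.channels;
    if (buf.size() != sizeof(header) + n) throw std::runtime_error("size mismatch");
    f.pixels.assign(buf.begin() + sizeof(header), buf.end());
    return f;
}
```

At 320 x 240, a grayscale frame is 76,800 bytes of pixels and a BGR frame is 230,400 bytes. Sending colour uncompressed therefore triples the bandwidth per frame.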

Rectifying images from stereo cameras

In the earlier post about calibrating stereo cameras, I mentioned that the output of that process is a bunch of matrices. This post describes what the matrices are and how to use them to correct the images.

Description of files
The calibration process produces these files:

D1.xml D2.xml
M1.xml M2.xml
mx1.xml mx2.xml
my1.xml my2.xml
P1.xml P2.xml
R1.xml R2.xml

The files with a 1 are for camera 1, whereas the files with a 2 are for camera 2. The mx*.xml and my*.xml files describe the distortion of the individual cameras as x and y pixel maps. These would be used if you wanted to rectify an individual stream independently.
The D* are the distortion matrices, M* are the camera matrices, P* are the projection matrices and R* are the rotation matrices. The book has a lot more information about these, including the maths behind them and how they are computed behind the scenes.
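
To give a feel for what a camera matrix does, consider the pinhole model. M1.xml and M2.xml hold matrices of the form [fx 0 cx; 0 fy cy; 0 0 1], with the focal lengths fx and fy in pixels and the principal point (cx, cy). Such a matrix maps a 3D point in the camera’s coordinate frame to pixel coordinates. The sketch below ignores lens distortion (the D* coefficients). The function name projectPoint and the matrix values in the test are invented for illustration.

```cpp
#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Pinhole projection with a 3x3 camera matrix
//   M = [ fx   0  cx ]
//       [  0  fy  cy ]
//       [  0   0   1 ]
// Multiplies M by the point (X, Y, Z), given in the camera's own coordinate
// frame, then divides by the third component to get pixel coordinates (u, v).
// Lens distortion (the D* coefficients) is not applied.
std::array<double, 2> projectPoint(const Mat3& M, const std::array<double, 3>& P)
{
    const double x = M[0][0] * P[0] + M[0][1] * P[1] + M[0][2] * P[2];
    const double y = M[1][0] * P[0] + M[1][1] * P[1] + M[1][2] * P[2];
    const double w = M[2][0] * P[0] + M[2][1] * P[1] + M[2][2] * P[2];
    return {x / w, y / w};
}
```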

While using the files turned out to be really simple, it took me a long time rooting around in the documentation to work out what to do.

Mat left, right; //Create matrices for storing input images

//M1, D1, R1, P1 and M2, D2, R2, P2 are Mats holding the calibration results from the XML files
//The size must match the resolution of the captured frames
Size imageSize(640, 480);

//Create transformation and rectification maps (only needs to be done once)
Mat cam1map1, cam1map2;
Mat cam2map1, cam2map2;

initUndistortRectifyMap(M1, D1, R1, P1, imageSize, CV_16SC2, cam1map1, cam1map2);
initUndistortRectifyMap(M2, D2, R2, P2, imageSize, CV_16SC2, cam2map1, cam2map2);

Mat leftStereoUndistorted, rightStereoUndistorted; //Create matrices for storing rectified images

/*Acquire images*/

//Rectify and undistort images
remap(left, leftStereoUndistorted, cam1map1, cam1map2, INTER_LINEAR);
remap(right, rightStereoUndistorted, cam2map1, cam2map2, INTER_LINEAR);

//Show rectified and undistorted images
imshow("LeftUndistorted", leftStereoUndistorted); imshow("RightUndistorted", rightStereoUndistorted);

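That’s it! First, call initUndistortRectifyMap() for each camera with the appropriate parameters obtained from calibration. This generates the joint undistortion and rectification transformation in the form of maps for remap(). The maps only need to be computed once; after that, each new pair of frames is corrected with remap().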
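
To make the "maps for remap()" idea concrete, here is a simplified, dependency-free sketch of what remap() does, using nearest-neighbour lookup and float maps. For every destination pixel (x, y), it copies the source pixel at (mapX(x, y), mapY(x, y)). This is an illustration, not OpenCV’s code. The CV_16SC2 maps used above store the same lookup in a more compact fixed-point form, and INTER_LINEAR blends neighbouring source pixels instead of picking the nearest one. The function name remapNearest is hypothetical, and out-of-range lookups simply produce black (0).

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Nearest-neighbour remap of a row-major 8-bit, single-channel image:
//   dst(x, y) = src(round(mapX(x, y)), round(mapY(x, y)))
// The maps are precomputed once (this is initUndistortRectifyMap()'s job),
// so correcting each frame costs one table lookup per pixel.
std::vector<uint8_t> remapNearest(const std::vector<uint8_t>& src, int rows, int cols,
                                  const std::vector<float>& mapX,
                                  const std::vector<float>& mapY)
{
    std::vector<uint8_t> dst(src.size(), 0);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
        {
            const int i = y * cols + x;
            const int sx = static_cast<int>(std::lround(mapX[i]));
            const int sy = static_cast<int>(std::lround(mapY[i]));
            if (sx >= 0 && sx < cols && sy >= 0 && sy < rows)
                dst[i] = src[sy * cols + sx];
        }
    return dst;
}
```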

One interesting point (from the documentation) is that the camera resulting from this process is oriented differently in the coordinate space, according to R. This aligns the images so that the epipolar lines in both become horizontal and have the same y-coordinate. As a result, a point in the left image only has to be searched for along the same row of the right image.

For more information, have a look at the OpenCV documentation.


Installing OpenCV 2.4.3 on Ubuntu

As mentioned previously, this applies to the pandaboard (with its ARM processor), the virtual machine, and any installation on an x86 system.

This was quite hard to figure out, as there are a lot of dependencies that need to be installed and the documentation is quite poor and out of date. All credit to Chee, who spent a lot of time researching and coming up with a way.

Basically, someone has written a script, which you can find here, that you execute from a terminal and that sorts everything out. The problem is that it installs version 2.4.2, and we wanted 2.4.3, so Chee modified the script accordingly.

It is given below. You can paste it into a text editor, save it as opencv2_4_3.sh and execute it from a terminal with the following command:

$ sh opencv2_4_3.sh

And everything will magically sort itself out. A word of caution, though: I had an issue when executing the script because the line endings were of the wrong type. If that happens, simply change the type of line endings the text editor uses when saving, and you should be good to go (thanks, Chee!).

arch=$(uname -m)
#flag=1 on 32-bit x86 (static x264/ffmpeg builds); other architectures build shared libraries
#Use "=" rather than "==": sh on Ubuntu is dash, whose test builtin rejects "=="
if [ "$arch" = "i686" -o "$arch" = "i386" -o "$arch" = "i486" -o "$arch" = "i586" ]; then
flag=1
else
flag=0
fi
echo "Installing OpenCV 2.4.3"
mkdir OpenCV
cd OpenCV
echo "Removing any pre-installed ffmpeg and x264"
sudo apt-get remove ffmpeg x264 libx264-dev
echo "Installing Dependencies"
sudo apt-get install libopencv-dev
sudo apt-get install build-essential checkinstall cmake pkg-config yasm
sudo apt-get install libtiff4-dev libjpeg-dev libjasper-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev libv4l-dev
sudo apt-get install python-dev python-numpy
sudo apt-get install libtbb-dev
sudo apt-get install libqt4-dev libgtk2.0-dev
#Headers for the codecs and X11 capture enabled in ffmpeg's configure line below
sudo apt-get install libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev libx11-dev libxext-dev libxfixes-dev
echo "Downloading x264"
wget ftp://ftp.videolan.org/pub/videolan/x264/snapshots/x264-snapshot-20121107-2245-stable.tar.bz2
tar -xvf x264-snapshot-20121107-2245-stable.tar.bz2
cd x264-snapshot-20121107-2245-stable/
echo "Installing x264"
if [ $flag -eq 1 ]; then
./configure --enable-static
else
./configure --enable-shared --enable-pic
fi
make
sudo make install
cd ..
echo "Downloading ffmpeg"
wget http://ffmpeg.org/releases/ffmpeg-0.11.2.tar.bz2
echo "Installing ffmpeg"
tar -xvf ffmpeg-0.11.2.tar.bz2
cd ffmpeg-0.11.2/
if [ $flag -eq 1 ]; then
./configure --enable-gpl --enable-libfaac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid --enable-nonfree --enable-postproc --enable-version3 --enable-x11grab
else
./configure --enable-gpl --enable-libfaac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid --enable-nonfree --enable-postproc --enable-version3 --enable-x11grab --enable-shared
fi
make
sudo make install
cd ..
echo "Downloading v4l"
wget http://www.linuxtv.org/downloads/v4l-utils/v4l-utils-0.8.9.tar.bz2
echo "Installing v4l"
tar -xvf v4l-utils-0.8.9.tar.bz2
cd v4l-utils-0.8.9/
make
sudo make install
cd ..
echo "Downloading OpenCV 2.4.3"
wget -O OpenCV-2.4.3.tar.bz2 http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.3/OpenCV-2.4.3.tar.bz2/download
echo "Installing OpenCV 2.4.3"
tar -xvf OpenCV-2.4.3.tar.bz2
cd OpenCV-2.4.3
mkdir build
cd build
#Configure the build (basic options only; add further -D options as needed)
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
sudo make install
#"sudo echo ... >> file" fails because the redirection runs without root; tee -a runs as root
echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf
sudo ldconfig
echo "OpenCV 2.4.3 ready to be used"

Chee and Hassan

Receiving images from the stereo camera

This post is slightly out of sequence for a first post, but I thought I’d start documenting our efforts towards the autonomous car as soon as possible. That way we avoid a massive backlog of things to report later on.

So, we are focusing on stereo vision at the moment. We have done some research on what exactly stereo vision is and on the theory behind calculating disparity for a given pair of images. We have also managed to install Ubuntu on the Pandaboard, which will most likely be our final development environment. However, we are investigating putting Android on it to see whether we gain any performance advantage (posts on these topics will be published soon).

For the past few hours, I have been focusing my attention on getting two streams of images from our stereo vision camera. The camera we are currently using is the Minoru 3D Webcam.

The camera is rather cute.

The camera has a single USB connection, but presents itself as two separate cameras to the operating system (much like if you had two individual cameras connected via a USB hub).

We are going to be using OpenCV for all (or most) of our image processing needs, so that is what I have turned to for reading the input from the camera(s).

A surprisingly small amount of code was required to get both the left and right streams to show side by side on the screen. I am not sure about the internals of the camera, or whether both images are transferred synchronously over the USB port. I experimented with accessing the camera streams and outputting images both sequentially and in parallel, using the Concurrency Runtime that ships with Microsoft Visual C++ (it is not part of the C++11 standard).

(I’ll update this post (and add comments to the code) as soon as I work out how to post code in a nice way)

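Mat left, right;
///Reading from cameras sequentially
CvCapture* capture1 = cvCaptureFromCAM(1);
CvCapture* capture2 = cvCaptureFromCAM(2);
if(capture1 && capture2)
{
    while(true)
    {
        IplImage* l = cvQueryFrame(capture1);  //frames are owned by the capture; do not release
        IplImage* r = cvQueryFrame(capture2);
        if(!l || !r) break;                    //a camera stopped delivering frames
        left = l; right = r;                   //wrap in Mat headers, no copy
        imshow("left", left);
        imshow("right", right);

        int c = waitKey(5);
        if(c == 'c') break;                    //press 'c' to quit
    }
} else
    printf("Failure in capture\n");
cvReleaseCapture(&capture1); cvReleaseCapture(&capture2);

///Reading from cameras in parallel (assumed: parallel_invoke from Microsoft's PPL, <ppl.h>)
//Each task owns its camera and its own while(true) loop, so each thread is created only once
concurrency::parallel_invoke(
    [&] {
        CvCapture* capture1 = cvCaptureFromCAM(1);
        if(!capture1) return;
        while(true)
        {
            IplImage* frame = cvQueryFrame(capture1);
            if(frame) { left = frame; imshow("left", left); }
            int c = waitKey(5);
            if(c == 'c') break;
        }
        cvReleaseCapture(&capture1);
    },
    [&] {
        CvCapture* capture2 = cvCaptureFromCAM(2);
        if(!capture2) return;
        while(true)
        {
            IplImage* frame = cvQueryFrame(capture2);
            if(frame) { right = frame; imshow("right", right); }
            int c = waitKey(5);
            if(c == 'c') break;
        }
        cvReleaseCapture(&capture2);
    });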

Getting the parallel implementation to work took some tinkering because one of the images on the camera would stall occasionally. I thought this was because of the OS putting one of the threads to sleep randomly for a short time, but putting the while(true) statement inside each of the parallel statements, instead of having a single while(true) encompassing both the parallel statements fixed it. I guess having a never ending loop enveloping the parallel statement was causing the system to continually create and destroy threads (and incurring a lot of overhead), whereas nesting the loop inside the parallel statement ensures that two new threads are created only once, and all subsequent work is carried out by those threads.
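
The design choice described above is to create the worker threads once and let each run its own loop, rather than wrapping both parallel tasks in one outer loop. It can be sketched with standard C++11 std::thread instead of the Concurrency Runtime. The helper name runCameraLoops is hypothetical. Its two callables stand in for "grab a frame and show it" and return false when it is time to stop. This illustrates the thread structure only; it is not our camera code.

```cpp
#include <functional>
#include <thread>

// Starts one long-lived thread per camera; each thread runs its own loop
// until its callable returns false. The threads are created exactly once,
// instead of being created and destroyed on every pass of an outer loop.
void runCameraLoops(const std::function<bool()>& grabLeft,
                    const std::function<bool()>& grabRight)
{
    std::thread leftThread([&] { while (grabLeft()) {} });
    std::thread rightThread([&] { while (grabRight()) {} });
    // Wait for both loops to finish (e.g. after the user presses 'c')
    leftThread.join();
    rightThread.join();
}
```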

In the parallel implementation, the camera streams are running on two separate cores. I didn’t do any extensive testing, but the sequential implementation showed some lag between the left and right images, where one would update slightly before the other. The parallel implementation seemed synchronous and generally a bit less jittery.

The two output video streams.

As you can see, the two images are displaced from each other both horizontally and vertically. The horizontal displacement is expected: it comes from the cameras being physically separated from each other horizontally. However, I’m not sure where the vertical displacement is coming from, as the two cameras seem to be on a level plane to me. This is something I will be discussing tomorrow with people cleverer than I am. Looking more closely, though, I notice that the original images exhibit a similar problem, which is then taken care of by calibrating the cameras. I need to understand this better before taking the next step of calibrating the camera and finally calculating disparities in real time.


Update: It has just been pointed out to me that the right camera’s picture is warmer than the left camera’s, which looks more washed out. This is something that will need to be taken care of.