Performance and Optimisation of the Stereo Vision algorithm on the Pandaboard

We’ve had a bit of a hiatus over the last few days, what with working on projects for other modules and all, but we’re back, and today, I’m going to talk about performance on the Pandaboard.

So far, all of the development and testing has been done on my laptop. We found in previous tests that on my laptop, using OpenCV’s StereoBM function with 640 x 480 images, we can get roughly 10 frames per second. That’s probably good enough for controlling the car in real time if we are not going too fast. But of course, on the Pandaboard, the frame rate is considerably slower.
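For context, a minimal sketch of the kind of call and timing involved is shown below. The post itself contains no code, so this assumes the current OpenCV C++ API (StereoBM::create); the file names are placeholders, and only the window size and disparity count match the parameters quoted in the tables that follow.

// A minimal sketch of timing a single StereoBM call. Assumes OpenCV 3+;
// the file names are placeholders, and only the parameters (SAD window
// size 21, 16*5 = 80 disparities) match those quoted in this post.
#include <opencv2/opencv.hpp>
#include <cstdint>
#include <iostream>

int main() {
    // Rectified stereo pair, loaded as 8-bit greyscale (required by StereoBM).
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

    // numDisparities = 16*5 = 80, SAD window (block) size = 21.
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(16 * 5, 21);

    cv::Mat disparity;
    int64_t t0 = cv::getTickCount();
    bm->compute(left, right, disparity);   // 16-bit fixed-point disparities
    double seconds = (cv::getTickCount() - t0) / cv::getTickFrequency();

    std::cout << "Time per frame: " << seconds << " s\n";
    return 0;
}

In the benchmarks below, a call like this is repeated 200 times on the same image pair, and the minimum, maximum, and average times are recorded.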

One alleviating factor is that the Pandaboard’s USB bandwidth is limited, so the two cameras can only stream 320 x 240 images simultaneously. This gives us an excuse to use a lower resolution, which improves performance. Below are the results of running the algorithm, without any optimisations, on the same image 200 times.

Table 1: Results of the StereoBM algorithm over 200 runs (SAD window size = 21, number of disparities = 16*5)

Min (s)     Max (s)     Average (s)     FPS
0.1890      0.3884      0.1963          5.0942

Despite the image being half the size in each dimension, the frame rate is still only half of what we get running the algorithm on the laptop.

Optimisations

Five frames per second is not good enough. Even if the frame rate were higher, we’d still be trying to optimise things to squeeze as much performance out of the board as we can. I cannot, unfortunately, disclose what optimisations I have implemented so far, as they have not been published yet.

As a starting point, I applied the techniques to a vanilla SAD algorithm, which, from a previous post, took roughly 60 seconds per frame. After said optimisations, the time dropped by a massive 30 seconds!
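To make the baseline concrete, a completely naive SAD block matcher looks roughly like the sketch below (a sketch only, not the actual code used here, and the unpublished optimisations are deliberately absent). The four nested loops, over rows, columns, candidate disparities, and window pixels, are what make it so slow.

// A naive SAD block-matching sketch, for illustration only; the actual
// baseline and the unpublished optimisations are not shown here.
#include <opencv2/opencv.hpp>
#include <climits>
#include <cstdlib>

cv::Mat naiveSadDisparity(const cv::Mat& left, const cv::Mat& right,
                          int numDisparities, int windowSize) {
    const int half = windowSize / 2;
    cv::Mat disparity = cv::Mat::zeros(left.size(), CV_8U);

    for (int y = half; y < left.rows - half; ++y) {
        for (int x = half; x < left.cols - half; ++x) {
            int bestDisparity = 0;
            long bestCost = LONG_MAX;

            // Search every candidate disparity along the same scanline.
            for (int d = 0; d < numDisparities && x - d >= half; ++d) {
                long cost = 0;
                // Sum of absolute differences over the window.
                for (int dy = -half; dy <= half; ++dy)
                    for (int dx = -half; dx <= half; ++dx)
                        cost += std::abs(left.at<uchar>(y + dy, x + dx) -
                                         right.at<uchar>(y + dy, x + dx - d));
                if (cost < bestCost) {
                    bestCost = cost;
                    bestDisparity = d;
                }
            }
            disparity.at<uchar>(y, x) = static_cast<uchar>(bestDisparity);
        }
    }
    return disparity;
}

For a 320 x 240 image with 80 disparities and a 21 x 21 window, that works out to roughly 2.7 billion absolute differences per frame, which gives a sense of why the naive version takes on the order of a minute.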

This was very encouraging, so I implemented the same sort of technique for use with OpenCV’s StereoBM function. To my surprise, the time taken per frame almost doubled! I looked at the source code for the implementation and found that it was so well optimised for (I assume) the x86 architecture that it was practically unreadable. I can only conclude that the pre-processing required by the techniques I applied introduces too much overhead, and was degrading performance.

Assuming OpenCV won’t be as well optimised for the ARM architecture (and for systems with fewer cores), I tested the optimisations on the Pandaboard, and sure enough there was some improvement. The following table shows the results from two functions: one capable of dealing with an arbitrary number of disparities (as long as it is a multiple of 16), and the other tuned specifically for 16*5 disparities (using things like loop unrolling to minimise as much overhead as possible).

Table 2: Results of the optimised implementations over 200 runs (SAD window size = 21, number of disparities = 16*5)

            Min (s)     Max (s)     Average (s)     FPS
General     0.1281      0.3069      0.1349          7.4134
Specific    0.1289      0.1911      0.1324          7.5549

There is an increase of almost 2.5 frames per second, which is a worthwhile improvement. There is not a huge difference between the specific implementation and the general implementation, however.

We can afford to be specific on the car, though, as once the parameters are set they will not be changed (unless we allow for variable disparities depending on the scenario the car is driving in). The specific algorithm also seems more consistent, with a smaller gap between the maximum time per frame and the average. A rough sketch of the kind of specialisation involved follows.
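The sketch below is an illustration only, since the optimised code itself is unpublished. The post describes manual loop unrolling; one way to get a similar effect is to bake the disparity count in as a compile-time constant so the compiler can unroll the search loop itself.

// Illustration only; the actual optimised implementations are not public.
// "General" takes the disparity count at runtime; "specific" fixes it at
// compile time (16*5 = 80 here) so the compiler can fully unroll the loop,
// which has much the same effect as unrolling it by hand.

// General version: loops over a runtime value.
inline int bestDisparity(const int* costs, int numDisparities) {
    int best = 0;
    for (int d = 1; d < numDisparities; ++d)
        if (costs[d] < costs[best]) best = d;
    return best;
}

// Specific version: the disparity count is a template constant.
template <int NumDisparities>
inline int bestDisparityFixed(const int* costs) {
    int best = 0;
    for (int d = 1; d < NumDisparities; ++d)
        if (costs[d] < costs[best]) best = d;
    return best;
}

// Usage: int d = bestDisparityFixed<16 * 5>(costs);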

Bearing in mind that all of these implementations are still sequential (I will experiment with parallelising some parts of them soon), there is hope for yet more improvement. We will also probably calculate the disparity map for only part of the image, since we only really need to know what’s straight ahead of the car.
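Restricting the disparity calculation to a band of the image could look something like the sketch below; the band chosen here is an arbitrary placeholder, and the StereoBM matcher is assumed to be set up as in the earlier sketch.

// Sketch of computing disparities only for a horizontal band of the frame.
// The band here is a placeholder; the real region would depend on how the
// cameras are mounted on the car.
#include <opencv2/opencv.hpp>

cv::Mat disparityForCentreBand(const cv::Mat& left, const cv::Mat& right,
                               cv::Ptr<cv::StereoBM>& bm) {
    // Keep the middle half of the rows (roughly "straight ahead"),
    // skipping the sky at the top and the ground just ahead at the bottom.
    cv::Rect band(0, left.rows / 4, left.cols, left.rows / 2);

    cv::Mat disparity;
    bm->compute(left(band), right(band), disparity);
    return disparity;
}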

I am also mindful that, once we have the disparity map, we will need to do further calculations per frame to determine whether the map contains an object close enough to warrant action by the car. That is why it is so important to make the actual computation of the disparity map as fast as possible.
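That per-frame check could be as simple as thresholding the disparity map and counting how many pixels are closer than some cut-off. The sketch below assumes StereoBM’s 16-bit fixed-point output (values are disparity x 16), and both thresholds are made-up placeholders that would need tuning on the car.

// Sketch of a simple per-frame check on the disparity map. Assumes the
// CV_16S fixed-point output of StereoBM (values are disparity * 16).
// Both thresholds are placeholders, not tuned values.
#include <opencv2/opencv.hpp>

bool obstacleAhead(const cv::Mat& disparity16,
                   int closeDisparity = 60 * 16,  // "too close" cut-off (placeholder)
                   int minPixels = 500) {         // close pixels needed to react (placeholder)
    // Pixels with a large disparity are near the camera.
    cv::Mat close = disparity16 > closeDisparity;
    return cv::countNonZero(close) > minPixels;
}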

Hassan
