Thursday, February 11, 2010

SIFT rocks / Vincent's BA is fast

An original image, taken with my iPhone.
SR from 5 images.
SR from 10 images.
SR from 19 images. I had to assume a sharper-than-optimal blur kernel for all the images, especially the last one, because my machine didn't have enough memory for a wider kernel with the current code.

I switched from Harris corners to SIFT descriptors, using the VLFeat toolbox. They fire on lots of things, not just corners, and have let me throw the test card in the wastebin. SIFT also appears to reduce raw homography estimation error by about 75% over Harris corners. Above we have the current system running on photos from the 4th floor CSE rec room.
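For reference, the matching step with VLFeat looks roughly like this; the filenames and the ratio-test threshold are placeholders, not my actual settings, and vl_sift wants a single-precision grayscale image.

I1 = im2single(rgb2gray(imread('recroom1.jpg')));
I2 = im2single(rgb2gray(imread('recroom2.jpg')));

[f1, d1] = vl_sift(I1);   % f: 4xN frames (x, y, scale, orientation); d: 128xN descriptors
[f2, d2] = vl_sift(I2);

% Match descriptors; the third argument is the ratio-test threshold.
[matches, scores] = vl_ubcmatch(d1, d2, 1.5);

x1 = f1(1:2, matches(1,:));   % matched keypoint locations in image 1
x2 = f2(1:2, matches(2,:));   % corresponding locations in image 2

The matched locations then go into the same RANSAC homography fit I was using with the Harris corners.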

The current system also uses the bundle adjustment from Vincent's toolbox. Vincent's code reduces my BA step from an hour-long process to something that runs in less than a second. (What an inefficient approach I had!) Unfortunately, it appears to return slightly inferior registrations as-is; I'm guessing the code uses L2 norm error, whereas (as I mentioned in a previous post) L1 norm error has worked better for me.

Monday, February 8, 2010

Strong for synthetic, weak for real

Synthetically generated low-res image of Geisel Library.
Reconstruction from 16 synthetic low-res images, using estimated parameters.
Reconstruction from 16 synthetic low-res images, using exact parameters. Note this image is still noticeably worse than the original image (below); I'm not sure how many images are needed for this difference to disappear (I haven't found a theoretical analysis, and I nearly run out of memory with 16 images).
Ground truth image used to create synthetic Geisel images.
Original (low-res) photo of the test card, one of 16 similar photos.

Real images: test card SR from 16 images.


I implemented bundle adjustment (BA) in the style of Brown and Lowe, ICCV 2003. I chose to roll my own code because 1) Vincent Rabaud's code does more than I want; in particular, the results it returns won't be constrained to homographies, so I'd have to somehow project the 3D transforms it returns onto the nearest homography; and 2) BA for homographies isn't very complicated.
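To give the flavor of it, here is a sketch of the homography-only BA (not my exact code): every image gets a homography to a common reference frame, and lsqnonlin (Optimization Toolbox) jointly refines all of them by minimizing the disagreement between matched points mapped into that frame. The data layout (matches{i,j} holding 4xM blocks of corresponding points between images i and j) is a placeholder, and as written this minimizes a squared (L2) cost.

function H = refine_homographies(H_init, matches, K)
    % H_init: 3x3xK initial pairwise-estimated homographies (image k -> reference),
    % with H_init(:,:,1) = eye(3) kept fixed as the reference frame.
    p0 = [];
    for k = 2:K
        Hk = H_init(:,:,k) / H_init(3,3,k);   % fix the scale so H(3,3) = 1
        p0 = [p0; reshape(Hk, 9, 1)];
        p0(end) = [];                         % drop the fixed element: 8 parameters per image
    end
    p = lsqnonlin(@(p) residuals(p, matches, K), p0);
    H = unpack(p, K);
end

function H = unpack(p, K)
    H = repmat(eye(3), [1 1 K]);
    for k = 2:K
        H(:,:,k) = reshape([p(8*(k-2)+1 : 8*(k-1)); 1], 3, 3);
    end
end

function r = residuals(p, matches, K)
    H = unpack(p, K);
    r = [];
    for i = 1:K
        for j = i+1:K
            m = matches{i,j};
            if isempty(m), continue; end
            xi = [m(1:2,:); ones(1, size(m,2))];
            xj = [m(3:4,:); ones(1, size(m,2))];
            % map both measurements into the reference frame and compare
            pi_ = H(:,:,i) * xi;  pi_ = pi_(1:2,:) ./ repmat(pi_(3,:), 2, 1);
            pj_ = H(:,:,j) * xj;  pj_ = pj_(1:2,:) ./ repmat(pj_(3,:), 2, 1);
            r = [r; reshape(pi_ - pj_, [], 1)];
        end
    end
end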

I also switched to punishing geometric misregistration error linearly (L1 norm) instead of quadratically (L2 norm). This is similar to the robust error used by Brown and Lowe, and improved my registrations by perhaps 20% (as measured by the Frobenius distance between true and estimated homographies). L1 errors are more robust to outliers than L2 errors; I also plan to investigate Brown and Lowe's robust L1 error.
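One cheap way to get an L1 penalty out of a least-squares solver like lsqnonlin is to hand it the signed square roots of the residuals, so that the sum of squares it minimizes equals the sum of absolute errors. This is my own workaround, not code from any of the toolboxes, and the kink at zero can upset the solver.

cost_l2 = @(r) r;                        % solver minimizes sum(r.^2): quadratic penalty
cost_l1 = @(r) sign(r) .* sqrt(abs(r));  % solver minimizes sum(abs(r)): linear penalty

% e.g. p = lsqnonlin(@(p) cost_l1(residuals(p, matches, K)), p0);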

I found that BA plus L1 error approximately halves geometric misregistration compared to pairwise homography estimation plus L2 error.

On synthetic images of Geisel Library, I've found my code works convincingly, with detailed building structure emerging in the SR image. Unfortunately, the results are not so strong with real images taken with my iPhone. This suggests that the generative model does not precisely reflect what happens when the iPhone takes a picture. I've emailed Lyndsey Pickup, and hopefully she'll have some words of wisdom.
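For context, the generative model I'm assuming (in the style of Pickup's work) is: each low-res frame is the high-res scene warped by a homography, blurred by the camera PSF, downsampled, photometrically scaled and shifted, and corrupted by noise. Roughly, in MATLAB (warp_by_homography is a placeholder name):

function y = render_lowres(x_hr, H, psf, zoom, lambda, beta, sigma)
    w = warp_by_homography(x_hr, H);        % geometric registration
    b = imfilter(w, psf, 'replicate');      % camera blur
    d = imresize(b, 1/zoom, 'bilinear');    % downsample to the low-res grid
    y = lambda * d + beta;                  % photometric gain and offset
    y = y + sigma * randn(size(y));         % sensor noise
end

The synthetic Geisel images are generated by exactly this kind of process, which is presumably why the reconstruction works so convincingly there.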

In order to get at least one example of SR working on real images, I created a "test card", which I designed to be very easy to geometrically register. I took photos of the test card with my iPhone, and (I think) succeeded in producing a higher-quality SR image. I made it late at night, and foolishly decided to write a narrative instead of random characters in the style of a vision test card (this means a viewer can use language knowledge to predict characters instead of relying entirely on image quality, which introduces an irritating variable). As I understand Zisserman's code, the spirals should help the geometric alignment.

Monday, February 1, 2010

The other part of the solution

The bottom photo was created using 5 synthetic images with uniform and trivial photometric parameters, and the top photo is the SR image that my code currently produces.

Using the code from http://www.robots.ox.ac.uk/~vgg/hzbook/code/, I can now automatically compute a homography between two images, given only the images as input. The pipeline detects Harris corners, then uses RANSAC to fit a homography.
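In sketch form, the idea is: repeatedly fit a homography to four random correspondences with the DLT, count how many matches it explains within a pixel threshold, and keep the best. The illustration below is my own, not the hzbook code, and it omits Hartley's coordinate normalization.

function H_best = ransac_homography(x1, x2, n_iters, thresh)
    % x1, x2: 2xN matched corner locations; thresh: inlier distance in pixels
    N = size(x1, 2);
    best_inliers = 0;  H_best = eye(3);
    for it = 1:n_iters
        s = randperm(N);  s = s(1:4);               % minimal sample: 4 correspondences
        H = dlt_homography(x1(:,s), x2(:,s));
        p = H * [x1; ones(1,N)];
        p = p(1:2,:) ./ repmat(p(3,:), 2, 1);
        d = sqrt(sum((p - x2).^2, 1));              % transfer error in image 2
        inliers = d < thresh;
        if sum(inliers) > best_inliers
            best_inliers = sum(inliers);
            H_best = dlt_homography(x1(:,inliers), x2(:,inliers));  % refit on all inliers
        end
    end
end

function H = dlt_homography(x1, x2)
    % Direct Linear Transform: two linear constraints per correspondence,
    % null space found by SVD.
    n = size(x1, 2);
    A = zeros(2*n, 9);
    for i = 1:n
        X = [x1(:,i); 1]';
        A(2*i-1, :) = [zeros(1,3), -X,           x2(2,i)*X];
        A(2*i,   :) = [X,           zeros(1,3), -x2(1,i)*X];
    end
    [~, ~, V] = svd(A);
    H = reshape(V(:,9), 3, 3)';
end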

I created a synthetic dataset with uniform lighting parameters (so the photometric parameters are trivial and fixed), and ran Pickup's code using the estimated homographies. The figures attached to this post are of a low-res image and a corresponding super-resolved image. The SR image was created using 5 such low-res images. I believe the gray border at the bottom and right of the SR image is an artifact of the final viewing angle I chose, and should be trivially fixable.

Note that the high-res image is of significantly higher quality than the low-res image. It's not clear to me whether this is mainly due to the additional information from the other low-res images, or mainly due to Pickup's code knowing the exact blur kernel used in the synthetic data generation. One way to know for sure is to try the code on real images, but to do that, I'll have to implement photometric registration code.

Note the extreme black-and-white aliasing on some of the columns in the image. I can reduce the aliasing by increasing the weight of the Huber prior, at the risk of over-smoothing the image.
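For reference, the Huber penalty is quadratic for small image gradients and linear for large ones, so edges are punished less than a plain quadratic prior would punish them; the weight controls how strongly it smooths. The parameter names here are mine, and the exact form in Pickup's code may differ.

huber = @(g, alpha) (abs(g) <= alpha) .* g.^2 + ...
                    (abs(g) >  alpha) .* (2*alpha*abs(g) - alpha^2);

% prior contribution for a candidate high-res image x, using finite-difference gradients
prior_cost = @(x, nu, alpha) nu * ( sum(sum(huber(diff(x,1,1), alpha))) + ...
                                    sum(sum(huber(diff(x,1,2), alpha))) );

Increasing nu pushes the reconstruction toward smoothness everywhere, which is exactly the over-smoothing risk mentioned above.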

Because of the extreme sensitivity of the SR process to accurate image registration, improving the registration will probably be more effective than fine-tuning the prior. To do this, I plan to use bundle adjustment, which simultaneously registers several image planes. I will likely use the code at http://vision.ucsd.edu/~vrabaud/toolbox/doc/, which appears to be well-commented and likely high-quality. But first, I need to understand how bundle adjustment works, which will require some background reading.