Thursday, February 11, 2010

SIFT rocks / Vincent's BA is fast

An original image, taken with my iPhone.
SR from 5 images.
SR from 10 images.
SR from 19 images. I had to assume a sharper-than-optimal blur kernel for all the images, especially this last one, because my machine didn't have enough memory for a wider kernel with the current code.

I switched from Harris corners to SIFT descriptors, using the VLFeat toolbox. SIFT features fire on lots of things, not just corners, and have let me throw the test card into the wastebin. SIFT also appears to reduce raw homography estimation error by about 75% relative to Harris corners. Above we have the current system running on photos from the 4th-floor CSE rec room.
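
For the record, the VLFeat-based matching step looks roughly like the sketch below; the file names and the ratio-test threshold are placeholders rather than my exact settings.

    % Rough sketch of the SIFT matching step, assuming VLFeat is on the Matlab path.
    % File names and the ratio-test threshold are placeholders.
    im1 = im2single(rgb2gray(imread('recroom_1.jpg')));
    im2 = im2single(rgb2gray(imread('recroom_2.jpg')));

    [f1, d1] = vl_sift(im1);   % f: 4xN frames (x, y, scale, orientation); d: 128xN descriptors
    [f2, d2] = vl_sift(im2);

    % Match descriptors; the second output holds the descriptor distances.
    [matches, scores] = vl_ubcmatch(d1, d2, 1.5);

    % Homogeneous point correspondences for the RANSAC homography step.
    x1 = [f1(1:2, matches(1, :)); ones(1, size(matches, 2))];
    x2 = [f2(1:2, matches(2, :)); ones(1, size(matches, 2))];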

The current system also uses the bundle adjustment from Vincent's toolbox. Vincent's code reduces my BA step from an hour-long process to something that runs in less than a second. (What an inefficient approach I had!) Unfortunately, it appears to return slightly inferior registrations as-is; I'm guessing the code uses L2 norm error, whereas (as I mentioned in a previous post) L1 norm error has worked better for me.

Monday, February 8, 2010

Strong for synthetic, weak for real

Synthetically generated low-res image of Geisel Library.
Reconstruction from 16 synthetic low-res images, using estimated parameters.
Reconstruction from 16 synthetic low-res images, using exact parameters. Note this image is still noticeably worse than the original image (below); I'm not sure how many images are needed for this difference to disappear (I haven't found a theoretical analysis, and I nearly run out of memory with 16 images).
Ground truth image used to create synthetic Geisel images.
Original (low-res) photo of the test card, one of 16 similar photos.

Real images: test card SR from 16 images.


I implemented bundle adjustment (BA) in the style of Brown and Lowe, ICCV 2003. I chose to roll my own code because 1) Vincent Rabaud's code does more than I want; in particular, the transforms it returns aren't constrained to be homographies, meaning I'd have to somehow project the 3D transforms it returns onto the nearest homographies; and 2) BA for homographies isn't very complicated.
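
For concreteness, the joint objective looks roughly like the sketch below. This is the plain L2 version (lsqnonlin squares the residual vector), and the data structures, parameterization, and optimizer are illustrative rather than a transcription of my actual code; image 1 serves as the reference frame.

    % Homography-only bundle adjustment sketch: image 1 is the reference (H{1} = I),
    % and the remaining homographies are refined jointly by minimizing reprojection
    % error over all pairwise matches. matches{i,j} is assumed to be a 4xM block
    % [x_i; x_j] of matched points between images i and j. Needs the Optimization Toolbox.
    function H = baHomographies(H0, matches)
    n = numel(H0);
    p0 = [];
    for k = 2:n
        Hk = H0{k} / H0{k}(3,3);           % fix scale so H(3,3) = 1
        p0 = [p0; Hk(1:8)'];               % 8 free parameters per image
    end
    p = lsqnonlin(@(p) residuals(p, matches, n), p0);
    H = unpack(p, n);
    end

    function H = unpack(p, n)
    H = cell(1, n); H{1} = eye(3);
    for k = 2:n
        H{k} = reshape([p(8*(k-2)+1 : 8*(k-1)); 1], 3, 3);
    end
    end

    function r = residuals(p, matches, n)
    H = unpack(p, n);
    r = [];
    for i = 1:n
        for j = 1:n
            if isempty(matches{i, j}), continue; end
            m  = matches{i, j};
            xi = [m(1:2, :); ones(1, size(m, 2))];
            xj = [m(3:4, :); ones(1, size(m, 2))];
            ui = H{i} * xi;  ui = ui(1:2, :) ./ repmat(ui(3, :), 2, 1);  % into the reference frame
            uj = H{j} * xj;  uj = uj(1:2, :) ./ repmat(uj(3, :), 2, 1);
            r  = [r; ui(:) - uj(:)];       % lsqnonlin squares these (the L2 version)
        end
    end
    end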

I also switched to punishing geometric misregistration error linearly (L1 norm) instead of quadratically (L2 norm). This is similar to the robust error used by Brown and Lowe, and improved my registrations by perhaps 20% (as measured by the Frobenius distance between true and estimated homographies). L1 errors are more robust to outliers than L2 errors; I also plan to investigate Brown and Lowe's robust L1 error.
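
To make the distinction concrete, here are the three penalties applied to a toy residual vector containing one outlier; sigma is an illustrative inlier scale, and Brown and Lowe's actual function may be parameterized a bit differently.

    % L2, L1, and a robust penalty (L2 near zero, L1 in the tails) on residuals r (pixels).
    l2  = @(r) sum(r.^2);
    l1  = @(r) sum(abs(r));
    rob = @(r, sigma) sum( (abs(r) <  sigma) .* r.^2 ...
                         + (abs(r) >= sigma) .* (2*sigma*abs(r) - sigma^2) );

    r = [0.3 -0.5 0.1 4.0];            % one large outlier residual
    [l2(r), l1(r), rob(r, 1)]          % 16.35 vs 4.9 vs 7.35: L2 is dominated by the outlier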

Overall, I found that BA plus L1 error approximately halves geometric misregistration relative to pairwise homography estimation plus L2 error.

On synthetic images of Geisel Library, I've found my code works convincingly, with detailed building structure emerging in the SR image. Unfortunately, the results are not so strong with real images taken with my iPhone. This suggests that the generative model does not precisely reflect what happens when the iPhone takes a picture. I've emailed Lyndsey Pickup, and hopefully she'll have some words of wisdom.

In order to get at least one example of SR working on real images, I created a "test card", which I designed to be very easy to register geometrically. I took photos of the test card with my iPhone, and (I think) succeeded in producing a higher-quality SR image. I made the card late at night and foolishly decided to write a narrative instead of random characters in the style of a vision test card (a viewer can use language knowledge to predict characters instead of relying entirely on image quality, which introduces an irritating variable). As I understand Zisserman's code, the spirals should help the geometric alignment.

Monday, February 1, 2010

The other part of the solution

The bottom photo was created using 5 synthetic images with uniform and trivial photometric parameters, and the top photo is the SR image that my code currently produces.

Using the code from http://www.robots.ox.ac.uk/~vgg/hzbook/code/, I now have the ability to automatically compute a homography between two images, given only the images as input. The pipeline detects Harris corners, matches them, and then uses RANSAC to fit a homography.
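
The hzbook code handles all of this for me, but for reference the core of that pipeline is a four-point DLT wrapped in a RANSAC loop, roughly as sketched below (point normalization is omitted for brevity, and the iteration count and inlier threshold are illustrative).

    % RANSAC + DLT homography sketch (not the hzbook code itself).
    % x1, x2 : 3xN homogeneous putative correspondences with third row equal to 1.
    function [H, inliers] = ransacHomography(x1, x2, nIter, thresh)
    best = -1;  N = size(x1, 2);
    for it = 1:nIter
        s  = randperm(N, 4);                      % minimal 4-point sample
        Hs = dlt(x1(:, s), x2(:, s));
        d  = symTransferError(Hs, x1, x2);
        in = d < thresh;
        if nnz(in) > best, best = nnz(in); inliers = in; end
    end
    H = dlt(x1(:, inliers), x2(:, inliers));      % refit on all inliers
    end

    function H = dlt(x1, x2)
    % Direct linear transform: two equations per correspondence; the homography
    % is the null vector of the stacked system, found via SVD.
    n = size(x1, 2);  A = zeros(2*n, 9);
    for i = 1:n
        X = x1(:, i)';  u = x2(1, i);  v = x2(2, i);  w = x2(3, i);
        A(2*i-1, :) = [zeros(1, 3), -w*X,  v*X];
        A(2*i,   :) = [ w*X, zeros(1, 3), -u*X];
    end
    [~, ~, V] = svd(A);
    H = reshape(V(:, end), 3, 3)';
    end

    function d = symTransferError(H, x1, x2)
    p12 = H * x1;   p12 = p12(1:2, :) ./ repmat(p12(3, :), 2, 1);
    p21 = H \ x2;   p21 = p21(1:2, :) ./ repmat(p21(3, :), 2, 1);
    d = sum((x2(1:2, :) - p12).^2, 1) + sum((x1(1:2, :) - p21).^2, 1);
    end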

I created a synthetic dataset with uniform lighting parameters (so the photometric parameters are trivial and fixed), and ran Pickup's code using the estimated homographies. The figures attached to this post are of a low-res image and a corresponding super-resolved image. The SR image was created using 5 such low-res images. I believe the gray border at the bottom and right of the SR image is an artifact of the final viewing angle I chose, and should be trivial to fix.

Note that the high-res image is of significantly higher quality than the low-res image. It's not clear to me whether this is mainly due to the additional information from the other low-res images, or mainly due to Pickup's code knowing the exact blur kernel used in the synthetic data generation. One way to know for sure is to try the code on real images, but to do that, I'll have to implement photometric registration code.
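
For reference, my understanding of how each synthetic low-res image is generated is sketched below: warp the ground-truth image by a homography, blur with the assumed kernel, downsample, and add noise. The kernel width, zoom factor, noise level, and example homography are placeholders, not the values Pickup's code actually uses.

    % Sketch of the generative model behind the synthetic data. All numbers are placeholders.
    x    = im2double(imread('geisel_groundtruth.png'));   % hypothetical ground-truth image
    zoom = 4;                                             % high-res pixels per low-res pixel
    psf  = fspecial('gaussian', 9, 1.5);                  % assumed blur kernel

    Hk = [1 0.01 5; 0 1 3; 1e-5 2e-5 1];                  % example ground-truth homography
    tform   = maketform('projective', Hk');               % Matlab's convention is the transpose
    warped  = imtransform(x, tform, 'XData', [1 size(x,2)], 'YData', [1 size(x,1)]);
    blurred = imfilter(warped, psf, 'replicate');
    lowres  = blurred(1:zoom:end, 1:zoom:end);            % decimate
    lowres  = lowres + 0.01 * randn(size(lowres));        % additive Gaussian noise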

Note the extreme black and white aliasing on some of the columns in the image. I can reduce the aliasing by increasing the weight of the Huber prior, at the risk of over-smoothing the image.
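
For reference, the prior term looks roughly like the sketch below: it penalizes gradients of the super-resolved image quadratically when they are small and linearly when they are large, and its weight nu is what trades smoothing against detail. The values of nu and alpha here are illustrative, not the ones in Pickup's code.

    % Huber prior on the gradients of the current SR estimate x. nu and alpha are illustrative.
    huber = @(z, alpha) (abs(z) <= alpha) .* z.^2 ...
                      + (abs(z) >  alpha) .* (2*alpha*abs(z) - alpha^2);

    gx = diff(x, 1, 2);                 % horizontal gradients
    gy = diff(x, 1, 1);                 % vertical gradients
    nu = 0.05;  alpha = 0.01;
    priorCost = nu * (sum(huber(gx(:), alpha)) + sum(huber(gy(:), alpha)));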

Because of the extreme sensitivity of the SR process to accurate image registration, improving the registration will probably be more effective than fine-tuning the prior. To do this, I plan to use bundle adjustment, which simultaneously registers several image planes. I will likely use the code at http://vision.ucsd.edu/~vrabaud/toolbox/doc/, which appears to be well-commented and likely high-quality. But first, I need to understand how bundle adjustment works, which will require some background reading.

Tuesday, January 19, 2010

A partial solution



Results of Pickup's SR code applied to synthetic data. The images in the top figure are the low-quality images, and the images in the bottom figure show the outputs of various SR techniques, along with the original high quality image. "Huber" refers to a prior over high-resolution images, making the bottom-left image a result of a MAP technique, in contrast to the ML technique demonstrated in the upper-right.

I've read most of Pickup's thesis and played with the code on her site. The above figures were generated by applying her super-resolution code to synthetic data. The code she supplies, which is written in Matlab with some mex C files, is able to perform super-resolution given a priori knowledge of all the image generation parameters; these are the parameters that give the geometric and photometric registrations, as well as the blur kernel. Unfortunately, this means that before her code can be used, all the parameters have to be independently inferred and fixed. This two-step process appears to produce results inferior to those obtained by simultaneous approaches, though the results may be good enough for this project.
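
As I currently understand the model, once the parameters are fixed her code is effectively minimizing an objective like the sketch below over the high-res image x. This is my reading of the thesis rather than her actual interface, and all the variable names are mine.

    % Schematic SR objective. W{k} is the sparse matrix combining the k-th geometric warp,
    % blur kernel, and downsampling; lamG(k)/lamO(k) are the photometric gain and offset;
    % y{k} is the k-th vectorized low-res image; D is a sparse finite-difference operator.
    % The ML image minimizes dataCost alone; the MAP (Huber) image minimizes the full sum.
    function c = srObjective(x, W, y, lamG, lamO, D, nu, alpha)
    dataCost = 0;
    for k = 1:numel(y)
        r = lamG(k) * (W{k} * x) + lamO(k) - y{k};    % predicted minus observed low-res pixels
        dataCost = dataCost + sum(r.^2);
    end
    g = D * x;
    priorCost = nu * sum( (abs(g) <= alpha) .* g.^2 ...
                        + (abs(g) >  alpha) .* (2*alpha*abs(g) - alpha^2) );   % Huber
    c = dataCost + priorCost;
    end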

At this point, there are at least two possible options for the future of this project: 1) I could write or find code that learns the model parameters and feed the parameters to Pickup's SR code, or 2) I could write the code for a simultaneous algorithm from scratch. The second approach would be more fun, but has a lower chance of working, so I'll likely go with the first option.

Oscar Beijbom, a student of David Kriegman, is also interested in SR, so we may end up developing the code together.

Monday, January 11, 2010

Lyndsey Pickup's thesis

The reading I've done suggests that SR is quite a bit more sophisticated than it was when Capel wrote his thesis. In particular, Bishop and others have adopted fully probabilistic techniques in which image registration is accounted for in the generative models. This allows registration to be learned concurrently with the super-resolution image, and even allows the registration parameters to be marginalized over, so that they never have to be explicitly computed. Lyndsey Pickup's thesis seems to be a great survey of these recent techniques, and I am in the process of reading it.

Thursday, January 7, 2010

Introduction


This is a research blog for CSE 190A. The goal of this quarter-long project is to arrive at an understanding and implementation of multi-view super-resolution that could be of use to Serge Belongie's research group. This project is a follow-up to previous 190A work.

Super-resolution (SR) describes techniques for taking one or more low-quality images of a scene and producing a high-quality image of the same scene. There are at least two general approaches. In the hallucination approach, prior information about what the high-quality image is likely to look like is used to construct a high-quality approximation from a single low-quality image. In the multi-view approach, minimal prior information is assumed, and instead complementary information from multiple photos of the same scene is used to construct a single high-quality photo. Multi-view SR appears to be a well-developed field, as I was able to find a number of commercial implementations of multi-view SR, including an iPhone app. From here on, SR will always refer to multi-view SR. My proposal has more details.

The image was taken from S. Farsiu, M. Elad, and P. Milanfar, “A Practical Approach to Super-Resolution”, Invited paper, Proc. of the SPIE Conf. on Visual Communications and Image Processing, San Jose, January 2006.