Synthetically generated low-res image of Geisel Library.
Reconstruction from 16 synthetic low-res images, using estimated parameters.
Reconstruction from 16 synthetic low-res images, using exact parameters. Note that this image is still noticeably worse than the ground-truth image (below); I'm not sure how many input images are needed for this difference to disappear (I haven't found a theoretical analysis, and at 16 images I already nearly run out of memory).
Ground truth image used to create synthetic Geisel images.
Original (low-res) photo of the test card; one of 16 similar photos.
Real images: test card SR from 16 images.
I implemented bundle adjustment (BA) in the style of Brown and Lowe, ICCV 2003. I chose to roll my own code because 1) Vincent Rabaud's code does more than I want; in particular, the results it returns aren't constrained to homographies, so I'd have to somehow project the 3D transforms it returns onto the nearest homography; and 2) BA for homographies isn't very complicated.
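For concreteness, here is a minimal sketch of BA over homographies; this isn't my actual code, it assumes the point-matching step (e.g. SIFT plus RANSAC) has already run, and every name in it is illustrative. Each image gets a homography mapping it into a reference frame, and all homographies are refined jointly by minimizing point-transfer error across every matched pair:

```python
# Illustrative sketch of bundle adjustment over homographies, assuming
# pairwise point matches are already available. All names are made up.
import numpy as np
from scipy.optimize import least_squares

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def residuals(params, matches):
    """Stack point-transfer errors for all matched image pairs.

    params: flattened 8-vector per non-reference image (H[2,2] fixed at 1).
    matches: dict {(i, j): (pts_i, pts_j)} of corresponding points.
    """
    Hs = [np.eye(3)]  # image 0 is the reference frame
    for k in range(len(params) // 8):
        Hs.append(np.append(params[8*k:8*k+8], 1.0).reshape(3, 3))
    errs = []
    for (i, j), (pi, pj) in matches.items():
        # Transfer points from image j into image i's frame and compare.
        Hij = np.linalg.inv(Hs[i]) @ Hs[j]
        errs.append((apply_homography(Hij, pj) - pi).ravel())
    return np.concatenate(errs)

def bundle_adjust(H_init, matches):
    """Jointly refine all homographies from pairwise initial estimates."""
    x0 = np.concatenate([H.ravel()[:8] / H[2, 2] for H in H_init[1:]])
    sol = least_squares(residuals, x0, args=(matches,))
    p = sol.x
    return [np.eye(3)] + [np.append(p[8*k:8*k+8], 1.0).reshape(3, 3)
                          for k in range(len(p) // 8)]
```

The key difference from pairwise estimation is that all pairs' residuals enter one objective, so errors can't accumulate along a chain of images.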
I also switched to penalizing geometric misregistration linearly (L1 norm) instead of quadratically (L2 norm), which improved my registrations by perhaps 20% (as measured by the Frobenius distance between true and estimated homographies). L1 errors are more robust to outliers than L2 errors; this is similar in spirit to the robust error used by Brown and Lowe, whose exact error function I also plan to investigate.
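In the same hypothetical setup as the sketch above, both pieces are easy to illustrate: scipy's least_squares takes a loss argument (soft_l1 is a smooth approximation to L1), and homographies must be brought to a common scale before comparing them:

```python
# Same illustrative setup as the BA sketch above, with a smooth
# L1-style loss instead of the default quadratic one.
sol = least_squares(residuals, x0, args=(matches,), loss='soft_l1')

def frobenius_distance(H_true, H_est):
    """Frobenius distance between two homographies. Homographies are
    only defined up to scale, so fix H[2, 2] = 1 before comparing."""
    return np.linalg.norm(H_true / H_true[2, 2] - H_est / H_est[2, 2],
                          ord='fro')
```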
I found that BA plus L1 errors approximately halves geometric misregistration relative to pairwise homography estimation plus L2 errors.
On synthetic images of Geisel Library, my code works convincingly, with detailed building structure emerging in the SR image. Unfortunately, the results are not as strong on real images taken with my iPhone, which suggests the generative model does not precisely reflect what happens when the iPhone takes a picture. I've emailed Lyndsey Pickup, and hopefully she'll have some words of wisdom.
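For reference, the synthetic images follow the usual observation model from the SR literature: warp the ground-truth image, blur it, decimate it, and add noise. A rough sketch, with illustrative parameter values (and an affine warp standing in for a full homography):

```python
# Rough sketch of the standard SR observation model: warp, blur,
# decimate, add noise. Parameter values are illustrative, not the
# ones used for the Geisel images; affine_transform stands in for a
# full homographic warp (which would need e.g. skimage.transform.warp).
import numpy as np
from scipy import ndimage

def synthesize_low_res(hi, A, offset, zoom=0.25, blur_sigma=1.0,
                       noise_std=2.0, seed=0):
    """Render one synthetic low-res frame from a high-res image `hi`."""
    rng = np.random.default_rng(seed)
    warped = ndimage.affine_transform(hi, A, offset=offset)  # geometric warp
    blurred = ndimage.gaussian_filter(warped, blur_sigma)    # optical blur (PSF)
    low = ndimage.zoom(blurred, zoom)                        # sensor decimation
    return low + rng.normal(0.0, noise_std, low.shape)       # sensor noise
```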
To get at least one example of SR working on real images, I created a "test card" designed to be very easy to register geometrically. I took photos of it with my iPhone and (I think) succeeded in producing a higher-quality SR image. Because I made the card late at night, I foolishly wrote a narrative instead of random characters in the style of a vision test card; this lets a viewer use language knowledge to predict characters instead of relying entirely on image quality, which introduces an irritating variable. As I understand Zisserman's code, the spirals should help the geometric alignment.