About:

I'm Dmitry Popov,
lead developer and director of Infognition.

Known in the interwebs as Dee Mon since 1997. You can find me as thedeemon on reddit or LiveJournal.

How many frames does super resolution need?
December 27, 2010

As you already know, super resolution is a method of upsampling video that, for each frame, uses information from neighbouring frames. I've been asked many times: how many frames does it use? Well, our SR implementation is "streaming": frame in, frame out. Internally, for each new frame it uses its own result for the previous frame, i.e. to upsample frame N it uses the upsampled frame N-1, which was made using frame N-2, which was made using frame N-3, and so on. In this sense all previous frames are used to make the current one. However, video changes from frame to frame, and as new information accumulates, old information gets forgotten. When processing frame 100 there is hardly anything left from frame 1. So how many frames are really used?
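The frame-in, frame-out structure described above can be sketched as a Python generator. This is a minimal illustration, not our actual SR engine: `super_resolve_stream`, `upscale_single` and `upscale_with_history` are hypothetical names, and the two upscalers are passed in as plain functions standing in for the image interpolation step and the SR step.

```python
from typing import Any, Callable, Iterable, Iterator, Optional

Frame = Any  # hypothetical: any frame representation (array, image object, ...)

def super_resolve_stream(
    frames: Iterable[Frame],
    upscale_single: Callable[[Frame], Frame],
    upscale_with_history: Callable[[Frame, Frame], Frame],
) -> Iterator[Frame]:
    """Hypothetical streaming upscaler: one frame in, one frame out."""
    prev: Optional[Frame] = None  # upscaled result for the previous frame
    for frame in frames:
        if prev is None:
            # No history yet: fall back to single-image interpolation.
            out = upscale_single(frame)
        else:
            # Combine the new low-res frame with the previous upscaled result.
            out = upscale_with_history(frame, prev)
        prev = out  # this result becomes the history for the next frame
        yield out
```

Only one previous result is kept, yet frame N still depends on every earlier frame through the chain of previous results. For instance, with numbers standing in for frames, `list(super_resolve_stream([1, 2, 3], lambda f: f * 10, lambda f, prev: f * 10 + prev))` gives `[10, 30, 60]`: the last value still carries a contribution from the first frame.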

To answer this question we took some files from our video resize shootout and upsized them 2x: first the whole files, then starting from frame 20, then starting from frame 40, and so on. Then we measured PSNR for each frame and looked at the charts.
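For reference, per-frame PSNR compares an upscaled frame with a reference frame through the mean squared error (MSE): PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit video). The sketch below assumes frames are given as flat sequences of 8-bit pixel values; it shows the generic PSNR formula, not necessarily the exact measurement tool we used.

```python
import math
from typing import List, Sequence

def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized frames given as flat pixel sequences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

def psnr_per_frame(
    references: Sequence[Sequence[float]],
    tests: Sequence[Sequence[float]],
    max_value: float = 255.0,
) -> List[float]:
    """One PSNR value per frame of a sequence."""
    return [psnr(r, t, max_value) for r, t in zip(references, tests)]
```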

When SR is just starting, there is no history for the first frame, so it's upscaled using an image interpolation method. The second frame is processed with SR, but there is only one frame of accumulated information. For frame 3 there are two frames of history. When we start processing from the middle, we can compare the results with those from processing from the beginning. This shows the difference between processing with a short history and with a longer one, and how quickly the former catches up with the latter (obviously there shouldn't be any difference between 10000 and 10020 frames of history, so the difference must decrease over time).
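The comparison can be expressed as a small helper: given per-frame PSNR for the full run and for a run started at frame `start`, it lines the two up frame by frame, so that position k of the result corresponds to k frames of history in the shorter run. The function name `psnr_gap` and the list-based layout are assumptions made for illustration.

```python
from typing import List, Sequence

def psnr_gap(full_run: Sequence[float], partial_run: Sequence[float], start: int) -> List[float]:
    """PSNR(full run) - PSNR(run started at `start`) for the frames both runs cover.

    full_run[i] is the PSNR of frame i when processing from frame 0;
    partial_run[k] is the PSNR of frame start + k when processing from `start`.
    Index k of the result therefore corresponds to k frames of history
    (k = 0 is the single-frame interpolation at the start of the shorter run).
    """
    n = min(len(partial_run), len(full_run) - start)
    return [full_run[start + k] - partial_run[k] for k in range(n)]
```

A positive value means the run with the longer history scored higher on that frame; as the shorter run accumulates history, the values should shrink toward zero.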

Here are some of the charts we got for the karate.avi sequence:

And here are some for the avatar1.avi file:

In karate.avi the video content changes more quickly, so the effective history (the number of frames that can be used for SR) is shorter, and hence the difference between processing from the beginning and from the middle decreases faster.
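To illustrate why faster-changing content shortens the effective history, here is a toy model. It is an assumption for illustration, not how our SR works internally: suppose each output keeps a fraction `alpha` of the previous result and takes `1 - alpha` from the new frame, so out[n] = alpha·out[n-1] + (1 - alpha)·frame[n]. Then all frames d or more steps back together contribute alpha^d of the current output, and fast-changing video corresponds to a smaller `alpha`, since less of the old result stays valid. `history_weights`, `effective_history` and the cutoff `eps` are hypothetical.

```python
from typing import List

def history_weights(alpha: float, n: int) -> List[float]:
    """Toy model: weight of frame n - d in output n, for d = 0..n.

    Assumes out[0] = frame[0] and out[k] = alpha * out[k-1] + (1 - alpha) * frame[k].
    The weights sum to 1.
    """
    weights = [(1 - alpha) * alpha ** d for d in range(n)]
    weights.append(alpha ** n)  # frame 0 keeps whatever weight is left
    return weights

def effective_history(alpha: float, eps: float = 0.01) -> int:
    """Toy model: smallest d such that frames d or more steps back contribute < eps in total."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    d, older_share = 0, 1.0  # invariant: older_share == alpha ** d
    while older_share >= eps:
        older_share *= alpha
        d += 1
    return d
```

With `eps = 0.01`, `alpha = 0.5` gives an effective history of 7 frames and `alpha = 0.9` gives 44, matching the qualitative picture: the slower the content changes, the longer the useful history. These numbers come from the toy model only, not from our measurements.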

Here's the average PSNR difference:

The peak at frame 1 shows the PSNR difference between video processed with super resolution with some history and single-frame upsampling. As the history of accumulated frames gets longer, the average PSNR difference decreases. After 5 frames have been processed, this difference drops below 0.1 dB. And although for slowly changing videos like avatar1.avi it doesn't fall to zero for a long time, in practice a 0.1 dB difference is indistinguishable to the human eye.
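The averaging and the 0.1 dB criterion can be sketched as follows, assuming each start point yields a curve of PSNR differences against the full run, where index k means k frames of history. `average_gap` and `history_needed` are hypothetical helper names; the 0.1 dB default is the threshold used in this post.

```python
from typing import List, Optional, Sequence

def average_gap(gap_curves: Sequence[Sequence[float]]) -> List[float]:
    """Average several PSNR-difference curves, truncated to the shortest one."""
    if not gap_curves:
        return []
    length = min(len(c) for c in gap_curves)
    return [sum(c[k] for c in gap_curves) / len(gap_curves) for k in range(length)]

def history_needed(avg_gap: Sequence[float], threshold_db: float = 0.1) -> Optional[int]:
    """First k (frames of history) at which the average gap drops below threshold_db."""
    for k, gap in enumerate(avg_gap):
        if gap < threshold_db:
            return k
    return None  # never dropped below the threshold within the measured range
```

Taking the first crossing rather than waiting for the gap to reach zero reflects the point above: for slow videos the gap lingers just above zero for a long time, but below 0.1 dB it is not visible.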

So our conclusion is: the number of frames that super resolution needs to perform best depends strongly on the video content (the slower the video changes, the more frames can be used), but 5-6 frames are enough for most cases.