How many frames does super resolution need?
December 27, 2010
As you already know, super resolution is a method of upsampling video that, for each frame,
uses information from neighbouring frames. I have been asked many times: how many frames does it use?
Well, our SR implementation is "streaming": frame in, frame out. Internally, for each new frame
it uses its own result for the previous frame, i.e. to upsample frame N it uses upsampled frame N-1,
for which frame N-2 was used, for which frame N-3 was used, and so on. In this sense all previous
frames are used to make the current one. However, video changes from frame to frame, and as
new information is accumulated, old information is forgotten. By the time frame 100 is processed
there is hardly anything left from frame 1. So how many frames are really used?
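To get a feeling for how quickly old frames fade in a scheme like this, here is a toy model in Python. It is only a sketch under a strong assumption: that each output is a fixed blend of the new frame and the previous output, with a hypothetical blending factor `alpha`. The real SR is not described at this level of detail here, and the function names are invented for illustration.

```python
def contribution_weights(num_frames, alpha):
    """Weight of every input frame in the output for the last frame.

    Assumed (hypothetical) recursion, not the actual SR algorithm:
        out[0] = in[0]
        out[n] = alpha * in[n] + (1 - alpha) * out[n - 1]
    """
    weights = []
    for k in range(num_frames):
        age = num_frames - 1 - k  # how many frames ago frame k was seen
        if k == 0:
            # The first frame has no history, so it enters with full weight.
            weights.append((1 - alpha) ** age)
        else:
            weights.append(alpha * (1 - alpha) ** age)
    return weights


def effective_history(weights, share=0.99):
    """Smallest number of most recent frames that hold `share` of the total weight."""
    total = 0.0
    for count, w in enumerate(reversed(weights), start=1):
        total += w
        if total >= share:
            return count
    return len(weights)


weights = contribution_weights(100, alpha=0.5)
print("weight of frame 1 in frame 100:", weights[0])
print("frames holding 99% of the weight:", effective_history(weights))
```

In this toy model the weight of an old frame shrinks geometrically, so frame 1 is formally still present in frame 100 but contributes practically nothing. How fast the weights decay in a real video depends on its content, which a fixed `alpha` cannot capture.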
To answer this question we took some files from our video resize shootout and upsized them 2x:
first the whole files, then starting from frame 20, then starting from frame 40, and so on.
Then we measured PSNR for each frame and looked at the charts.
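For readers who want to repeat the measurement, per-frame PSNR can be computed roughly like this. It is a minimal sketch that assumes 8-bit frames (peak value 255) passed as flat sequences of pixel values, with the original full-size frame as the reference; the function names are made up for this example and are not part of any tool mentioned here.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two frames given as flat sequences of pixel values."""
    if len(reference) != len(test):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def psnr_per_frame(reference_frames, test_frames, peak=255.0):
    """One PSNR value per frame pair, in frame order."""
    return [psnr(r, t, peak) for r, t in zip(reference_frames, test_frames)]


# Tiny made-up frames, just to show the call.
original = [[10, 20, 30, 40], [50, 60, 70, 80]]
upscaled = [[11, 21, 31, 41], [50, 60, 70, 80]]
print(psnr_per_frame(original, upscaled))
```

Running this once for the video processed from the beginning and once for each late start gives the per-frame curves that can then be compared.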
When SR is just starting, there is no history for the first frame, so it's upscaled using an image interpolation
method. The second frame is processed with SR but there is only one frame of accumulated information.
For frame 3 there are two frames of history. When we start processing from the middle, we can compare
results with processing from the beginning. This is where we can see the difference between processing with
a short history and with a longer history, and see how quickly one catches up with the other (it's obvious that
there shouldn't be any difference between 10000 and 10020 frames of history, so the difference must
decrease over time).
Here are some of the charts we got for the karate.avi sequence:
And here are some for the avatar1.avi file:
In karate.avi the video content changes more quickly, so the effective history (the number of frames that can be used for SR)
is shorter, hence the difference between processing from the beginning and from the middle decreases faster.
Here's the average PSNR difference:
The peak at frame 1 shows the PSNR difference between video processed with super resolution with some
history and single-frame upsampling. As the history of accumulated frames gets longer, the average PSNR difference
decreases. After 5 frames are processed this difference drops below 0.1 dB. And although for slowly
changing videos like avatar1.avi it doesn't fall to zero for a long time, in practice 0.1 dB is indistinguishable to
the human eye.
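The "catch-up" point can also be read off programmatically from two PSNR curves. The sketch below assumes both lists are aligned so that index 0 is the first frame of the late-start run; the helper name and the numbers in the example are invented for illustration and are not our measurements.

```python
def frames_to_catch_up(psnr_full, psnr_late, threshold_db=0.1):
    """Frames the late-start run has processed before its PSNR gap to the
    full run first drops below `threshold_db`.

    Both lists must be aligned: index 0 is the first frame of the late run.
    Returns None if the gap never drops below the threshold.
    """
    for processed, (full, late) in enumerate(zip(psnr_full, psnr_late)):
        if abs(full - late) < threshold_db:
            return processed
    return None


# Made-up PSNR values in dB, only to demonstrate the call.
full_run = [30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
late_run = [29.0, 29.5, 29.7, 29.8, 29.85, 29.95, 29.97, 30.0]
print(frames_to_catch_up(full_run, late_run))
```

Using the first crossing keeps the sketch short; on noisy curves it may be safer to require the gap to stay below the threshold for several frames in a row.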
So our conclusion is: the number of frames that super resolution needs to perform at its best depends strongly on
the video content (the more slowly the video changes, the more frames can be used), but 5-6 frames are enough in most
cases.
tags: super_resolution