Number Crunching
and Multi-threading and Speed (oh my!)
PD Pro's motion prediction uses a 16x16 grid to
compare blocks between video frames.
If every pixel were compared, our default
settings on a high-definition frame might look
something like this:
16*16*44*44*1280*762*3 =
1,450,212,065,280
or roughly 1.45 trillion calculations per frame.
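
For readers who want to check the arithmetic or try other
settings, here is a small sketch in Python. The factor names
are our reading of the numbers (16x16 block, 44x44 search
area, 1280x762 frame, 3 color channels); the text does not
label each factor, so treat the names as assumptions.

```python
# Figures taken from the text; the names are assumed labels.
BLOCK = 16                  # block edge in pixels
SEARCH = 44                 # search area edge in pixels
WIDTH, HEIGHT = 1280, 762   # frame size
CHANNELS = 3                # color channels

def brute_force_ops(block, search, width, height, channels):
    """Comparisons needed if every pixel's block is tested
    at every offset in the search area."""
    per_pixel = block * block * search * search
    return per_pixel * width * height * channels

total = brute_force_ops(BLOCK, SEARCH, WIDTH, HEIGHT, CHANNELS)
print(f"{total:,}")  # 1,450,212,065,280
```

Halving the search edge to 22 cuts the total by a factor of
four, which is why shrinking the search area pays off so
quickly.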
Since that would be prohibitive, we made some
optimizations.
A first estimation pass is made using a smaller
data set. This includes working on a smaller grid
inside the image. The result is then interpolated
to a full size morph map.
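
As an illustration of the interpolation step, the sketch
below stretches a coarse grid of motion vectors to one vector
per pixel. Bilinear interpolation, the function names, and the
list-of-lists layout are all assumptions made for the example;
the text does not say which interpolation PD Pro uses.

```python
def lerp(a, b, t):
    """Linear blend between a and b for t in [0, 1]."""
    return a + (b - a) * t

def upsample_vectors(coarse, step, width, height):
    """Bilinearly interpolate a coarse grid of (dx, dy) vectors.

    coarse[j][i] is the vector estimated at pixel
    (i * step, j * step). Pixels past the last grid point
    reuse the nearest edge value.
    """
    rows, cols = len(coarse), len(coarse[0])
    full = []
    for y in range(height):
        gy = min(y / step, rows - 1)   # clamp to last grid row
        j0 = int(gy)
        j1 = min(j0 + 1, rows - 1)
        ty = gy - j0
        row = []
        for x in range(width):
            gx = min(x / step, cols - 1)
            i0 = int(gx)
            i1 = min(i0 + 1, cols - 1)
            tx = gx - i0
            vec = []
            for c in (0, 1):           # dx, then dy
                top = lerp(coarse[j0][i0][c], coarse[j0][i1][c], tx)
                bot = lerp(coarse[j1][i0][c], coarse[j1][i1][c], tx)
                vec.append(lerp(top, bot, ty))
            row.append(tuple(vec))
        full.append(row)
    return full

# A 2x2 coarse grid sampled every 4 pixels, stretched to 5x5.
coarse = [[(0, 0), (4, 0)],
          [(0, 4), (4, 4)]]
morph_map = upsample_vectors(coarse, 4, 5, 5)
```

The coarse pass only has to find a vector at each grid point;
every pixel in between gets a blended value.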
This is getting more realistic, but still a bit of
a crunch.
To smooth things out, we employed SSE2 in our
inner loops, which can process up to 16 values
(one 128-bit register of 8-bit data) with a single
instruction. We also added threading into the mix,
allowing the calculations to be performed on as
many processors as are available.
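
The threading idea can be sketched as splitting the frame
into bands of rows and handing one band to each worker. This
is only an illustration: the sum of absolute differences used
as the comparison, and the names row_bands, band_sad and
threaded_sad, are assumptions, not PD Pro's code. Standard
Python threads also share one interpreter lock, so the sketch
shows how the work is divided rather than a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def row_bands(height, workers):
    """Split rows 0..height-1 into contiguous (start, stop) bands."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        if stop > start:               # skip empty bands
            bands.append((start, stop))
        start = stop
    return bands

def band_sad(frame_a, frame_b, band):
    """Sum of absolute differences over one band of rows."""
    start, stop = band
    return sum(abs(a - b)
               for y in range(start, stop)
               for a, b in zip(frame_a[y], frame_b[y]))

def threaded_sad(frame_a, frame_b, workers=None):
    """Compare two frames, one band of rows per worker."""
    workers = workers or os.cpu_count() or 1
    bands = row_bands(len(frame_a), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: band_sad(frame_a, frame_b, b), bands)
        return sum(parts)
```

Because each band touches its own rows only, the workers need
no locking; the partial sums are simply added at the end.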
In a number of cases, our first pass can be
perfectly acceptable. In cases where more
precision is needed, a refinement pass was added
that calculates every pixel. However, instead of
calculating every possible pixel combination, this
pass is based on the first pass, so we already
have a good idea where to look for matching
blocks.
This allows for a much smaller search area, which
makes for a fairly balanced algorithm. Changing
the size of the grid and search area doesn't have
a huge impact on the speed or quality of the
refinement pass.
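
To make the refinement idea concrete, here is a minimal
sketch that searches only a small radius around the first-pass
vector instead of the whole search area. The cost function
(sum of absolute differences), the names block_sad and
refine_vector, and the default block size and radius are
assumptions chosen for the example.

```python
def block_sad(prev, curr, x, y, dx, dy, block):
    """Cost of matching the block at (x, y) in curr against
    the block at (x + dx, y + dy) in prev."""
    total = 0
    for j in range(block):
        for i in range(block):
            total += abs(curr[y + j][x + i]
                         - prev[y + dy + j][x + dx + i])
    return total

def refine_vector(prev, curr, x, y, guess, block=4, radius=1):
    """Return the best (dx, dy) within `radius` of the guess."""
    h, w = len(prev), len(prev[0])
    gx, gy = guess
    best, best_cost = guess, None
    for dy in range(gy - radius, gy + radius + 1):
        for dx in range(gx - radius, gx + radius + 1):
            # Skip candidates that fall outside the previous frame.
            if x + dx < 0 or y + dy < 0:
                continue
            if x + dx + block > w or y + dy + block > h:
                continue
            cost = block_sad(prev, curr, x, y, dx, dy, block)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

With a radius of 1 the loop tests at most 9 candidates per
block, compared with 44*44 = 1,936 for a search over the full
area from the example at the top.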