
Unraveling Motion: A Deep Dive into Optical Flow in Computer Vision


    By CamelEdge

    Updated on Mon Dec 02 2024


    Motion is an integral part of our lives, whether it’s noticing the movement of vehicles on the road or tracking a soccer ball in a game. Our ability to perceive, understand, and predict motion enables us to interact effectively with the dynamic world around us. In the digital world, mimicking our natural ability to perceive motion requires the analysis of movement within video frames, a technique referred to as optical flow.

    In this blog, we’ll explore optical flow, its principles, challenges, and methods.

    What is Optical Flow?

    At its core, optical flow refers to the apparent motion of objects, surfaces, or edges in a visual scene, as observed through a sequence of images over time. A video can be thought of as a collection of frames captured at successive moments. In this context, the image data can be represented as a function of space (x, y) and time t:

    A video is made up of a sequence of individual frames
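    In code, this view of a video as a function of space and time is simply a stack of frames. A minimal sketch (the video path is a placeholder):

    import numpy as np
    import cv2 as cv

    # Read all frames into one array so that I[t, y, x] is the
    # intensity at position (x, y) and time t
    cap = cv.VideoCapture('<video file>')
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv.cvtColor(frame, cv.COLOR_BGR2GRAY))
    cap.release()
    I = np.stack(frames)  # shape (T, H, W)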

    Optical flow involves estimating how pixel intensities in these images move from one frame to the next. This estimation helps identify the motion patterns in a scene, which is crucial for tasks such as object tracking, scene reconstruction, and robotic navigation.

    Optical flow is the estimation of the apparent motion of objects in a visual scene, computed from a time-varying image sequence.

    The Aperture Problem

    Before diving into the methods of calculating optical flow, let’s address a fundamental challenge: the aperture problem.

    The aperture problem refers to a limitation in accurately estimating the motion of objects in an image when only a small local region of the image is considered.

    Imagine watching a straight edge move from left to right through a small circular window. You know the edge is moving to the right, but viewed through the window, its motion appears to point in a different direction. Why does this happen?

    Illustration of the aperture problem: A line moves from left to right, but when viewed through a small window, the perceived motion direction differs.

    It’s because movement along the edge doesn’t change how bright the pixels appear—it looks the same as if the edge wasn’t moving at all. Only motion perpendicular to the edge creates visible changes in brightness that we can detect.

    This limitation, known as the aperture problem, makes estimating motion in optical flow tricky. It’s a key reason why understanding motion isn’t always straightforward in computer vision.

    Motion along the edge direction is not detectable

    This phenomenon is also observed in the barberpole illusion.


    How Do We Estimate Motion?

    To compute the motion of pixels from one frame to the next, such as from I(x, y, t) to I(x, y, t+1), we rely on certain key assumptions:

    Brightness Constancy

    This principle assumes that the intensity of a point in the image remains the same across frames. For example, a pixel in I(x, y, t) retains its brightness in I(x, y, t+1).

    For grayscale images, this assumption is written as:

    I(x, y, t) \approx I(x + u, y + v, t + 1)

    where u and v represent the motion (optical flow) in the x and y directions.

    Small Motion

    Motion is assumed to be small, meaning that objects or points don’t move far between consecutive frames. This allows us to use mathematical approximations, such as the Taylor series expansion, to linearize the changes in the image.

    \begin{align}
    I(x+u, y+v) &= I(x, y) + \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \text{[higher-order terms]} \\
    &= I(x, y) + I_x u + I_y v + \text{[higher-order terms]}
    \end{align}

    Under these two assumptions, we derive the brightness constancy constraint equation as follows:
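    Substituting the Taylor expansion into the brightness constancy assumption, and writing I_t = I(x, y, t+1) - I(x, y, t) for the temporal derivative, the I(x, y) terms cancel:

    0 \approx I(x + u, y + v, t + 1) - I(x, y, t) \approx I_x u + I_y v + I_t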

    Brightness Constancy Constraint:

    I_x u + I_y v + I_t = 0

    This is the optical flow equation, but it presents a problem: we have one equation and two unknowns (u, v) for every pixel. To solve this, additional constraints are needed.
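    Geometrically, the constraint defines a line in (u, v) space: every flow vector on that line is consistent with the observed intensities. Only the component of motion along the image gradient, the so-called normal flow, is determined:

    u_n = \frac{-I_t}{\sqrt{I_x^2 + I_y^2}}

    The component parallel to the edge is left unconstrained, which is the aperture problem restated algebraically.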


    Optical Flow Estimation

    Several methods have been proposed to resolve the ambiguity in the optical flow equation. Two prominent approaches are the Horn-Schunck method and the Lucas-Kanade method.

    Horn-Schunck Method

    The Horn-Schunck method assumes that the optical flow is smooth across the image. This global method computes a dense optical flow field by minimizing an energy function that combines two constraints:

    • Brightness Constancy: Ensures that the intensity of points doesn’t change over time.
    • Smooth Flow: Ensures that the flow varies smoothly from pixel to pixel.

    This method is suitable for estimating dense motion fields but can be computationally expensive, especially for large images or videos.

    Lucas-Kanade Method

    In contrast, the Lucas-Kanade method assumes that the optical flow is constant within small patches of the image. It uses a local window around each pixel to gather more equations (25 equations for a 5 × 5 window) and solve for u and v.

    This method is computationally efficient and works well for tracking sparse features in an image, such as corners or points of interest.

    Horn-Schunck vs. Lucas-Kanade

    Feature               Horn-Schunck    Lucas-Kanade
    Flow Type             Dense           Sparse
    Assumption            Smooth Flow     Constant Flow
    Scope                 Global          Local
    Computational Cost    High            Moderate

    Horn-Schunck Method

    The Horn-Schunck method is based on the principle of minimizing an energy functional that combines the brightness constancy constraint with a smoothness term. The energy functional is defined as:

    E(u, v) = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] \, dx \, dy

    The solution to this problem involves the use of variational calculus, a mathematical technique that allows us to find functions that minimize or maximize functionals. In the context of the Horn-Schunck method, variational calculus is employed to derive the Euler-Lagrange equations, which provide the necessary conditions for the optical flow fields u and v to minimize the energy functional. This process involves taking the functional derivative of the energy expression with respect to the flow fields and setting it to zero, leading to a set of partial differential equations that describe the optimal flow.
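    Setting the functional derivative of the energy with respect to u and v to zero yields the Euler-Lagrange equations, where \Delta denotes the Laplacian:

    I_x (I_x u + I_y v + I_t) - \alpha \Delta u = 0
    I_y (I_x u + I_y v + I_t) - \alpha \Delta v = 0

    Approximating the Laplacian as \Delta u \approx \bar{u} - u, where \bar{u} is the local average of the flow, turns these PDEs into the classic iterative update. Below is a minimal Python/OpenCV sketch of that iteration, assuming grayscale uint8 frames; the gradient filters, alpha, and iteration count are illustrative choices rather than part of the method's definition:

    import numpy as np
    import cv2 as cv

    def horn_schunck(frame1, frame2, alpha=1.0, num_iters=100):
        # Work in floating point, normalized to [0, 1]
        I1 = frame1.astype(np.float32) / 255.0
        I2 = frame2.astype(np.float32) / 255.0
        # Spatial gradients (3x3 Sobel, normalized) and temporal gradient
        Ix = cv.Sobel(I1, cv.CV_32F, 1, 0, ksize=3) / 8.0
        Iy = cv.Sobel(I1, cv.CV_32F, 0, 1, ksize=3) / 8.0
        It = I2 - I1
        u = np.zeros_like(I1)
        v = np.zeros_like(I1)
        # Kernel that computes the 4-neighbor average (u_bar, v_bar)
        avg = np.array([[0, 0.25, 0],
                        [0.25, 0, 0.25],
                        [0, 0.25, 0]], np.float32)
        for _ in range(num_iters):
            u_bar = cv.filter2D(u, -1, avg)
            v_bar = cv.filter2D(v, -1, avg)
            # Update derived from the Euler-Lagrange equations with
            # the Laplacian approximated as (u_bar - u)
            t = (Ix * u_bar + Iy * v_bar + It) / (alpha + Ix**2 + Iy**2)
            u = u_bar - Ix * t
            v = v_bar - Iy * t
        return u, v

    Each iteration pulls the flow toward its neighborhood average while correcting it to satisfy brightness constancy; alpha controls the trade-off between data fidelity and smoothness.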

    Lucas-Kanade

    The Lucas-Kanade method is particularly effective for tracking small, localized motions in an image sequence. It operates under the assumption that the motion of the pixels within a small neighborhood is approximately constant. This allows the method to solve for the optical flow by considering multiple equations derived from the image gradients within the neighborhood.

    The method can be summarized in the following steps:

    1. Select a Window: Choose a small window around each pixel where the flow is assumed to be constant.
    2. Compute Image Gradients: Calculate the spatial gradients I_x and I_y and the temporal gradient I_t within the window.
    3. Formulate the System of Equations: For each pixel in the window, the optical flow equation I_x u + I_y v + I_t = 0 is formed. A 5 × 5 window gives 25 equations for the two unknowns:

    \begin{bmatrix}
    I_x(P_1) & I_y(P_1) \\
    I_x(P_2) & I_y(P_2) \\
    \vdots & \vdots \\
    I_x(P_{25}) & I_y(P_{25})
    \end{bmatrix}
    \begin{bmatrix} u \\ v \end{bmatrix}
    = -
    \begin{bmatrix}
    I_t(P_1) \\
    I_t(P_2) \\
    \vdots \\
    I_t(P_{25})
    \end{bmatrix}

    written compactly as A d = b.

    4. Solve for Flow: Use least squares to solve the over-determined system for the flow vector (u, v), as sketched in the code after this list:

    \|e\|^2 = \|Ad - b\|^2 = (Ad - b)^T (Ad - b)

    Taking the derivative with respect to d and setting it to zero:

    2A^T(Ad - b) = 0

    which gives the normal equations:

    A^T A d = A^T b
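    Here is a minimal NumPy sketch of this least-squares solve for a single pixel. It assumes precomputed float gradient images Ix, Iy, It (for example from cv.Sobel and a frame difference); the function name, window size, and eigenvalue threshold are illustrative:

    import numpy as np

    def lucas_kanade_at(Ix, Iy, It, x, y, half=2):
        # Gather gradients in a (2*half+1) x (2*half+1) window (5 x 5 here)
        wx = Ix[y-half:y+half+1, x-half:x+half+1].ravel()
        wy = Iy[y-half:y+half+1, x-half:x+half+1].ravel()
        wt = It[y-half:y+half+1, x-half:x+half+1].ravel()
        A = np.stack([wx, wy], axis=1)   # 25 x 2 matrix of spatial gradients
        b = -wt                          # 25-vector of temporal gradients
        ATA = A.T @ A
        # Guard against the aperture problem: both eigenvalues of A^T A
        # must be large enough (threshold is illustrative)
        if np.linalg.eigvalsh(ATA).min() < 1e-2:
            return None  # flat region or single edge: not trackable
        # Solve the normal equations A^T A d = A^T b
        d = np.linalg.solve(ATA, A.T @ b)
        return d  # the flow vector (u, v)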

    Solvability of Lucas-Kanade

    When is the Lucas-Kanade method solvable? In other words, what are the characteristics of good points to track?

    • The matrix A^T A should be invertible.
    • A^T A should not be too small, to avoid being dominated by noise.
      • The eigenvalues λ₁ and λ₂ of A^T A should not be too small.
    • A^T A should be well-conditioned.
      • The ratio λ₁/λ₂ should not be too large (where λ₁ is the larger eigenvalue).

    This solvability condition, ensuring that the matrix A^T A is invertible and well-conditioned, is akin to the criterion used in the Harris corner detector. The Harris detector identifies keypoints where the intensity changes significantly in all directions by analyzing the eigenvalues of essentially the same matrix. Both methods rely on these eigenvalues to select points that are stable and reliable for tracking or corner detection.

    In well-textured regions, motion is estimated reliably because the gradient direction varies across the window, making A^T A well-conditioned.
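    This check maps directly onto OpenCV: cv.cornerMinEigenVal computes the smaller eigenvalue of this gradient matrix at every pixel, and thresholding it keeps only trackable points. This is the same criterion cv.goodFeaturesToTrack applies by default; the block size and threshold below are illustrative:

    import cv2 as cv

    gray = cv.imread('<image path>', cv.IMREAD_GRAYSCALE)
    # Smaller eigenvalue of the gradient matrix over a 5x5 neighborhood
    min_eig = cv.cornerMinEigenVal(gray, blockSize=5)
    # Keep pixels where Lucas-Kanade is well-posed (illustrative threshold)
    trackable = min_eig > 0.01 * min_eig.max()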

    Dealing with Large Motions: Coarse-to-Fine Estimation

    When there’s significant motion between frames, small motion assumptions may no longer hold. To handle such cases, a coarse-to-fine strategy is often employed. It involves the following steps:

    1. Downsampling: The image is downsampled to create a low-resolution version, where large motions appear smaller.
    2. Estimating Motion: Optical flow is computed at the coarse level.
    3. Refinement: The flow is refined iteratively at finer resolutions, propagating information from the coarse level to the fine level.

    This hierarchical approach ensures accurate motion estimation for large displacements while maintaining computational efficiency.
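    A sketch of this recursion, assuming float32 grayscale frames and a hypothetical single-level estimator estimate_flow (for instance, the Horn-Schunck sketch above); the pyramid depth and interpolation are illustrative choices. Note that cv.calcOpticalFlowPyrLK, used in the example below, applies the same pyramid idea to Lucas-Kanade via its maxLevel parameter:

    import numpy as np
    import cv2 as cv

    def coarse_to_fine_flow(frame1, frame2, levels=3):
        # 1. Build Gaussian pyramids (index 0 = full resolution)
        pyr1, pyr2 = [frame1], [frame2]
        for _ in range(levels - 1):
            pyr1.append(cv.pyrDown(pyr1[-1]))
            pyr2.append(cv.pyrDown(pyr2[-1]))

        flow = None
        # 2.-3. Estimate at the coarsest level, then refine level by level
        for I1, I2 in zip(reversed(pyr1), reversed(pyr2)):
            h, w = I1.shape
            if flow is None:
                flow = np.zeros((h, w, 2), np.float32)
            else:
                # Upsample the coarser flow and double its magnitude
                flow = 2.0 * cv.resize(flow, (w, h))
            # Warp frame 2 toward frame 1 by the current estimate so the
            # remaining (residual) motion is small again
            gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
            warped = cv.remap(I2, gx + flow[..., 0], gy + flow[..., 1], cv.INTER_LINEAR)
            du, dv = estimate_flow(I1, warped)  # hypothetical single-level estimator
            flow[..., 0] += du
            flow[..., 1] += dv
        return flow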


    OpenCV (Python) Implementation

    Sparse Optical Flow: This code demonstrates how to implement sparse optical flow tracking using the Lucas-Kanade method in OpenCV. A video file is loaded, and key feature points are identified in the first frame using the cv.goodFeaturesToTrack function, which detects corners. For each subsequent frame, the cv.calcOpticalFlowPyrLK function calculates the motion of these points between the current and previous frames. Only points successfully tracked are retained, and lines are drawn to visualize their movement, along with rectangles to highlight their new positions. Random colors are assigned to points for better visualization. The process repeats until the video ends or the user presses 'q', creating an interactive display of point tracking throughout the video.

    import numpy as np
    import cv2 as cv
    
    cap = cv.VideoCapture('<video file>')
    
    n = 100 # maximum number of points to track
    winSize = (15, 15) # window size for Lucas-Kanade
    
    ret, frame = cap.read()
    gray0 = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    
    # Find good features to track (corners)
    points = cv.goodFeaturesToTrack(gray0, maxCorners=n, qualityLevel=0.1, minDistance=7)
    
    # One random color per detected point (may be fewer than n)
    color = np.random.randint(0, 255, (len(points), 3))
    
    points0 = points
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            print("Can't receive frame (stream end?). Exiting ...")
            break
        gray1 = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
        points1, status, err = cv.calcOpticalFlowPyrLK(gray0, gray1, points0, None, winSize=winSize, maxLevel=3)
    
        # Keep only the successfully tracked points
        points0 = points0[status == 1]
        points1 = points1[status == 1]
        points = points[status == 1]
        color = color.reshape(-1, 1, 3)[status == 1]
        # Draw a line from each point's original position to its new
        # position, and a window-sized rectangle around the new position
        half = winSize[0] // 2
        for i, (p0, p1) in enumerate(zip(points, points1)):
            a = tuple(map(int, p0))
            b = tuple(map(int, p1))
            frame = cv.line(frame, a, b, color[i].tolist(), 2)
            frame = cv.rectangle(frame, (b[0] - half, b[1] - half),
                                 (b[0] + half, b[1] + half), color[i].tolist(), 2)
    
        cv.imshow('frame', frame)
    
        if cv.waitKey(1) == ord('q'):
            break
        # The current frame and points become the reference for the next step
        gray0 = gray1
        points0 = points1.reshape(-1, 1, 2)
        points = points.reshape(-1, 1, 2)
    
    cap.release()
    cv.destroyAllWindows()
    
    

    Dense Optical Flow: This code demonstrates dense optical flow estimation between two images using Farneback's method in OpenCV. Two grayscale frames are loaded, and the cv.calcOpticalFlowFarneback function computes the optical flow, representing pixel-wise motion between the two frames. The flow field contains the horizontal and vertical displacement for each pixel.

    The results are visualized in three subplots: the first displays the initial frame, the second shows the subsequent frame, and the third uses a quiver plot to overlay flow vectors on a blank background. The flow vectors, sampled at every 10th pixel, highlight the direction and magnitude of motion between the frames, with cyan arrows indicating motion.

    import numpy as np
    import cv2 as cv
    from matplotlib import pyplot as plt
    
    frame1 = cv.imread('<First image path>', cv.IMREAD_GRAYSCALE)
    frame2 = cv.imread('<Second image path>', cv.IMREAD_GRAYSCALE)
    
    # Arguments: pyr_scale=0.5, levels=3, winsize=15, iterations=3,
    # poly_n=5, poly_sigma=1.2, flags=0
    flow = cv.calcOpticalFlowFarneback(frame1, frame2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    
    # Create the plot
    plt.figure(figsize=(20, 10))
    
    # Display the first frame
    plt.subplot(1, 3, 1)
    plt.imshow(frame1, cmap='gray')
    plt.title("Frame 1")
    plt.axis('off')
    
    # Display the second frame
    plt.subplot(1, 3, 2)
    plt.imshow(frame2, cmap='gray')
    plt.title("Frame 2")
    plt.axis('off')
    
    # Display the optical flow
    plt.subplot(1, 3, 3)
    plt.imshow(frame1 * 0, cmap='gray')  # Blank (all-zero) background for the flow vectors
    x, y = np.meshgrid(
        np.arange(0, flow.shape[1], 10),  # x-coordinates
        np.arange(0, flow.shape[0], 10)   # y-coordinates
    )
    plt.quiver(
        x, y, 
        flow[::10, ::10, 0],  # Horizontal flow
        flow[::10, ::10, 1],  # Vertical flow
        scale=400, color='c'
    )
    plt.title("Optical Flow")
    plt.axis('off')
    
    # Show the plot
    plt.tight_layout()
    plt.show()
    

    Conclusion

    Optical flow is a powerful tool for understanding motion in visual scenes. By leveraging techniques like the Horn-Schunck and Lucas-Kanade methods, we can estimate pixel motion and unlock a variety of applications in computer vision.

    While challenges like the aperture problem and large motion estimation exist, advancements in coarse-to-fine strategies and feature detection continue to improve the robustness of optical flow methods.

    FAQs

    What is optical flow?

    Optical flow refers to the apparent motion of objects, surfaces, or edges in a visual scene between two consecutive video frames. It is estimated by analyzing pixel intensity changes over time and space.

    Why is optical flow important?

    Optical flow is crucial for applications such as:

    • Object tracking
    • Motion detection
    • Scene reconstruction
    • Robotic navigation

    What are the key assumptions in optical flow estimation?

    Two primary assumptions are:

    • Brightness constancy: The intensity of a pixel remains constant across frames.
    • Small motion: Motion between consecutive frames is minimal, enabling mathematical linearization.

    What is the aperture problem in optical flow?

    The aperture problem arises when motion is observed through a small local region of an image. It leads to ambiguities in detecting motion direction, as only movement perpendicular to an edge is apparent.

    What are the common methods for computing optical flow?

    The two most widely used methods are:

    • Horn-Schunck Method: Assumes smooth flow across the image and computes dense optical flow using global optimization.
    • Lucas-Kanade Method: Assumes constant flow within small patches and computes sparse optical flow efficiently.

    What is the brightness constancy constraint equation?

    The equation is:

    I_x u + I_y v + I_t = 0

    Here, I_x, I_y, and I_t are the spatial and temporal image gradients, while u and v are the optical flow components.

    What is the Horn-Schunck method?

    The Horn-Schunck method calculates dense optical flow by minimizing an energy function that combines brightness constancy and smoothness constraints. This global approach ensures smooth motion estimation but is computationally intensive.

    What is the Lucas-Kanade method?

    The Lucas-Kanade method computes sparse optical flow by solving the optical flow equation within a local window. It is computationally efficient and suitable for tracking keypoints or features.

    What are the conditions for the Lucas-Kanade method to work?

    • The matrix A^T A (from the system of equations) should be invertible.
    • The eigenvalues of A^T A should be large enough to avoid noise sensitivity.
    • The matrix should be well-conditioned to ensure stability.

    What is coarse-to-fine estimation in optical flow?

    Coarse-to-fine estimation handles large motions by:

    1. Downsampling the image to reduce motion magnitude.
    2. Estimating optical flow at the coarse level.
    3. Refining the flow iteratively at finer resolutions.

    How does optical flow relate to the Harris corner detector?

    Both methods use eigenvalues of a gradient-based matrix to identify stable features in an image. The Lucas-Kanade method relies on well-conditioned eigenvalues for reliable flow computation, similar to corner detection criteria in the Harris detector.

    What are the challenges in estimating optical flow?

    • The aperture problem.
    • Ambiguities in regions with uniform intensity.
    • High computational costs for dense motion estimation.
    • Handling large displacements or rapid motion between frames.

    What is the barberpole illusion?

    The barberpole illusion is a real-world instance of the aperture problem: the diagonal stripes on a rotating pole appear to move vertically along the pole rather than horizontally, because only the motion component perpendicular to the stripes is visible within the pole's narrow silhouette.
