
Unraveling Motion: A Deep Dive into Optical Flow in Computer Vision


    By CamelEdge

    Updated on Mon Dec 02 2024


    Motion is an integral part of our lives, whether it’s noticing the movement of vehicles on the road or tracking a soccer ball in a game. Our ability to perceive, understand, and predict motion enables us to interact effectively with the dynamic world around us. In the digital world, mimicking our natural ability to perceive motion requires the analysis of movement within video frames, a technique referred to as optical flow.

    In this blog, we’ll explore optical flow, its principles, challenges, and methods.

    What is Optical Flow?

    At its core, optical flow refers to the apparent motion of objects, surfaces, or edges in a visual scene, as observed through a sequence of images over time. A video can be thought of as a collection of frames captured at successive moments. In this context, the image data can be represented as a function of space (x, y) and time t:

    A video is made up of a sequence of individual frames
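    In code, this view of a video as a function of space and time is simply a stack of frames. A minimal sketch (the video path is a placeholder):

    import numpy as np
    import cv2 as cv

    # Read all frames into one array so that I[t, y, x] is the
    # intensity at position (x, y) and time t
    cap = cv.VideoCapture('<video file>')
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv.cvtColor(frame, cv.COLOR_BGR2GRAY))
    cap.release()
    I = np.stack(frames)  # shape (T, H, W)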

    Optical flow involves estimating how pixel intensities in these images move from one frame to the next. This estimation helps identify the motion patterns in a scene, which is crucial for tasks such as object tracking, scene reconstruction, and robotic navigation.

    Optical flow is the estimation of the apparent motion of objects in a visual scene, computed from a time-varying image sequence.

    The Aperture Problem

    Before diving into the methods of calculating optical flow, let’s address a fundamental challenge: the aperture problem.

    The aperture problem refers to a limitation in accurately estimating the motion of objects in an image when only a small local region of the image is considered.

    Imagine watching a straight edge move from left to right through a small circular window. You know the edge is moving to the right, but viewed through the window, its motion appears to point in a different direction. Why does this happen?

    Illustration of the aperture problem: A line moves from left to right, but when viewed through a small window, the perceived motion direction differs.

    It’s because movement along the edge doesn’t change how bright the pixels appear—it looks the same as if the edge wasn’t moving at all. Only motion perpendicular to the edge creates visible changes in brightness that we can detect.

    This limitation, known as the aperture problem, makes estimating motion in optical flow tricky. It’s a key reason why understanding motion isn’t always straightforward in computer vision.

    Motion along the edge direction is not detectable

    This phenomenon is also observed in the barberpole illusion.


    How Do We Estimate Motion?

    To compute the motion of pixels from one frame to the next, such as from I(x, y, t) to I(x, y, t+1), we rely on certain key assumptions:

    Brightness Constancy

    This principle assumes that the intensity of a point in the image remains the same across frames. For example, a pixel in I(x, y, t) retains its brightness in I(x, y, t+1).

    For grayscale images, this assumption is written as:

    I(x, y, t) \approx I(x + u, y + v, t + 1)

    where u and v represent the motion (optical flow) in the x and y directions.

    Small Motion

    Motion is assumed to be small, meaning that objects or points don’t move far between consecutive frames. This allows us to use mathematical approximations, such as the Taylor series expansion, to linearize the changes in the image.

    \begin{align}
    I(x+u, y+v) &= I(x, y) + \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \text{[higher-order terms]} \\
    &= I(x, y) + I_x u + I_y v + \text{[higher-order terms]}
    \end{align}

    Under these two assumptions, we derive the brightness constancy constraint equation as follows:
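    Substituting the Taylor expansion into the brightness constancy assumption, and writing I_t = I(x, y, t+1) - I(x, y, t) for the temporal derivative, the I(x, y) terms cancel:

    0 \approx I(x + u, y + v, t + 1) - I(x, y, t) \approx I_x u + I_y v + I_t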

    Brightness Constancy Constraint:

    I_x u + I_y v + I_t = 0

    This is the optical flow equation, but it presents a problem: we have one equation and two unknowns (u, v) for every pixel. To solve this, additional constraints are needed.
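    Geometrically, the constraint defines a line in (u, v) space: every flow vector on that line is consistent with the observed intensities. Only the component of motion along the image gradient, the so-called normal flow, is determined:

    u_n = \frac{-I_t}{\sqrt{I_x^2 + I_y^2}}

    The component parallel to the edge is left unconstrained, which is the aperture problem restated algebraically.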


    Optical Flow Estimation

    Several methods have been proposed to resolve the ambiguity in the optical flow equation. Two prominent approaches are the Horn-Schunck method and the Lucas-Kanade method.

    Horn-Schunck Method

    The Horn-Schunck method assumes that the optical flow is smooth across the image. This global method computes a dense optical flow field by minimizing an energy function that combines two constraints:

    • Brightness Constancy: Ensures that the intensity of points doesn’t change over time.
    • Smooth Flow: Ensures that the flow varies smoothly from pixel to pixel.

    This method is suitable for estimating dense motion fields but can be computationally expensive, especially for large images or videos.

    Lucas-Kanade Method

    In contrast, the Lucas-Kanade method assumes that the optical flow is constant within small patches of the image. It uses a local window around each pixel to gather more equations (25 equations for a 5 × 5 window) and solve for u and v.

    This method is computationally efficient and works well for tracking sparse features in an image, such as corners or points of interest.

    Horn-Schunck vs. Lucas-Kanade

    Feature               Horn-Schunck    Lucas-Kanade
    Flow Type             Dense           Sparse
    Assumption            Smooth Flow     Constant Flow
    Scope                 Global          Local
    Computational Cost    High            Moderate

    Horn-Schunck Method

    The Horn-Schunck method is based on the principle of minimizing an energy functional that combines the brightness constancy constraint with a smoothness term. The energy functional is defined as:

    E(u, v) = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] \, dx \, dy

    The solution to this problem involves the use of variational calculus, a mathematical technique that allows us to find functions that minimize or maximize functionals. In the context of the Horn-Schunck method, variational calculus is employed to derive the Euler-Lagrange equations, which provide the necessary conditions for the optical flow fields u and v to minimize the energy functional. This process involves taking the functional derivative of the energy expression with respect to the flow fields and setting it to zero, leading to a set of partial differential equations that describe the optimal flow.
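    Setting the functional derivative of the energy with respect to u and v to zero yields the Euler-Lagrange equations, where \Delta denotes the Laplacian:

    I_x (I_x u + I_y v + I_t) - \alpha \Delta u = 0
    I_y (I_x u + I_y v + I_t) - \alpha \Delta v = 0

    Approximating the Laplacian as \Delta u \approx \bar{u} - u, where \bar{u} is the local average of the flow, turns these PDEs into the classic iterative update. Below is a minimal Python/OpenCV sketch of that iteration, assuming grayscale uint8 frames; the gradient filters, alpha, and iteration count are illustrative choices rather than part of the method's definition:

    import numpy as np
    import cv2 as cv

    def horn_schunck(frame1, frame2, alpha=1.0, num_iters=100):
        # Work in floating point, normalized to [0, 1]
        I1 = frame1.astype(np.float32) / 255.0
        I2 = frame2.astype(np.float32) / 255.0
        # Spatial gradients (3x3 Sobel, normalized) and temporal gradient
        Ix = cv.Sobel(I1, cv.CV_32F, 1, 0, ksize=3) / 8.0
        Iy = cv.Sobel(I1, cv.CV_32F, 0, 1, ksize=3) / 8.0
        It = I2 - I1
        u = np.zeros_like(I1)
        v = np.zeros_like(I1)
        # Kernel that computes the 4-neighbor average (u_bar, v_bar)
        avg = np.array([[0, 0.25, 0],
                        [0.25, 0, 0.25],
                        [0, 0.25, 0]], np.float32)
        for _ in range(num_iters):
            u_bar = cv.filter2D(u, -1, avg)
            v_bar = cv.filter2D(v, -1, avg)
            # Update derived from the Euler-Lagrange equations with
            # the Laplacian approximated as (u_bar - u)
            t = (Ix * u_bar + Iy * v_bar + It) / (alpha + Ix**2 + Iy**2)
            u = u_bar - Ix * t
            v = v_bar - Iy * t
        return u, v

    Each iteration pulls the flow toward its neighborhood average while correcting it to satisfy brightness constancy; alpha controls the trade-off between data fidelity and smoothness.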

    Lucas-Kanade

    The Lucas-Kanade method is particularly effective for tracking small, localized motions in an image sequence. It operates under the assumption that the motion of the pixels within a small neighborhood is approximately constant. This allows the method to solve for the optical flow by considering multiple equations derived from the image gradients within the neighborhood.

    The method can be summarized in the following steps:

    1. Select a Window: Choose a small window around each pixel where the flow is assumed to be constant.
    2. Compute Image Gradients: Calculate the spatial gradients I_x and I_y and the temporal gradient I_t within the window.
    3. Formulate the System of Equations: For each pixel in the window, the optical flow equation I_x u + I_y v + I_t = 0 is formed. A 5 × 5 window gives 25 equations for the two unknowns:

    \begin{bmatrix}
    I_x(P_1) & I_y(P_1) \\
    I_x(P_2) & I_y(P_2) \\
    \vdots & \vdots \\
    I_x(P_{25}) & I_y(P_{25})
    \end{bmatrix}
    \begin{bmatrix} u \\ v \end{bmatrix}
    = -
    \begin{bmatrix}
    I_t(P_1) \\
    I_t(P_2) \\
    \vdots \\
    I_t(P_{25})
    \end{bmatrix}

    written compactly as A d = b.

    4. Solve for Flow: Use least squares to solve the over-determined system for the flow vector (u, v), as sketched in the code after this list:

    \|e\|^2 = \|Ad - b\|^2 = (Ad - b)^T (Ad - b)

    Taking the derivative with respect to d and setting it to zero:

    2A^T(Ad - b) = 0

    which gives the normal equations:

    A^T A d = A^T b
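    Here is a minimal NumPy sketch of this least-squares solve for a single pixel. It assumes precomputed float gradient images Ix, Iy, It (for example from cv.Sobel and a frame difference); the function name, window size, and eigenvalue threshold are illustrative:

    import numpy as np

    def lucas_kanade_at(Ix, Iy, It, x, y, half=2):
        # Gather gradients in a (2*half+1) x (2*half+1) window (5 x 5 here)
        wx = Ix[y-half:y+half+1, x-half:x+half+1].ravel()
        wy = Iy[y-half:y+half+1, x-half:x+half+1].ravel()
        wt = It[y-half:y+half+1, x-half:x+half+1].ravel()
        A = np.stack([wx, wy], axis=1)   # 25 x 2 matrix of spatial gradients
        b = -wt                          # 25-vector of temporal gradients
        ATA = A.T @ A
        # Guard against the aperture problem: both eigenvalues of A^T A
        # must be large enough (threshold is illustrative)
        if np.linalg.eigvalsh(ATA).min() < 1e-2:
            return None  # flat region or single edge: not trackable
        # Solve the normal equations A^T A d = A^T b
        d = np.linalg.solve(ATA, A.T @ b)
        return d  # the flow vector (u, v)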

    Solvability of Lucas-Kanade

    When is the Lucas-Kanade method solvable? In other words, what are the characteristics of good points to track?

    • The matrix A^T A should be invertible.
    • A^T A should not be too small, to avoid being dominated by noise.
      • The eigenvalues λ₁ and λ₂ of A^T A should not be too small.
    • A^T A should be well-conditioned.
      • The ratio λ₁/λ₂ should not be too large (where λ₁ is the larger eigenvalue).

    This solvability condition, ensuring that the matrix A^T A is invertible and well-conditioned, is akin to the criterion used in the Harris corner detector. The Harris detector identifies keypoints where the intensity changes significantly in all directions by analyzing the eigenvalues of essentially the same matrix. Both methods rely on these eigenvalues to select points that are stable and reliable for tracking or corner detection.

    In well-textured regions, motion is estimated reliably because the gradient direction varies across the window, making A^T A well-conditioned.
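    This check maps directly onto OpenCV: cv.cornerMinEigenVal computes the smaller eigenvalue of this gradient matrix at every pixel, and thresholding it keeps only trackable points. This is the same criterion cv.goodFeaturesToTrack applies by default; the block size and threshold below are illustrative:

    import cv2 as cv

    gray = cv.imread('<image path>', cv.IMREAD_GRAYSCALE)
    # Smaller eigenvalue of the gradient matrix over a 5x5 neighborhood
    min_eig = cv.cornerMinEigenVal(gray, blockSize=5)
    # Keep pixels where Lucas-Kanade is well-posed (illustrative threshold)
    trackable = min_eig > 0.01 * min_eig.max()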

    Dealing with Large Motions: Coarse-to-Fine Estimation

    When there’s significant motion between frames, small motion assumptions may no longer hold. To handle such cases, a coarse-to-fine strategy is often employed. It involves the following steps:

    1. Downsampling: The image is downsampled to create a low-resolution version, where large motions appear smaller.
    2. Estimating Motion: Optical flow is computed at the coarse level.
    3. Refinement: The flow is refined iteratively at finer resolutions, propagating information from the coarse level to the fine level.

    This hierarchical approach ensures accurate motion estimation for large displacements while maintaining computational efficiency.
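    A sketch of this recursion, assuming float32 grayscale frames and a hypothetical single-level estimator estimate_flow (for instance, the Horn-Schunck sketch above); the pyramid depth and interpolation are illustrative choices. Note that cv.calcOpticalFlowPyrLK, used in the example below, applies the same pyramid idea to Lucas-Kanade via its maxLevel parameter:

    import numpy as np
    import cv2 as cv

    def coarse_to_fine_flow(frame1, frame2, levels=3):
        # 1. Build Gaussian pyramids (index 0 = full resolution)
        pyr1, pyr2 = [frame1], [frame2]
        for _ in range(levels - 1):
            pyr1.append(cv.pyrDown(pyr1[-1]))
            pyr2.append(cv.pyrDown(pyr2[-1]))

        flow = None
        # 2.-3. Estimate at the coarsest level, then refine level by level
        for I1, I2 in zip(reversed(pyr1), reversed(pyr2)):
            h, w = I1.shape
            if flow is None:
                flow = np.zeros((h, w, 2), np.float32)
            else:
                # Upsample the coarser flow and double its magnitude
                flow = 2.0 * cv.resize(flow, (w, h))
            # Warp frame 2 toward frame 1 by the current estimate so the
            # remaining (residual) motion is small again
            gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
            warped = cv.remap(I2, gx + flow[..., 0], gy + flow[..., 1], cv.INTER_LINEAR)
            du, dv = estimate_flow(I1, warped)  # hypothetical single-level estimator
            flow[..., 0] += du
            flow[..., 1] += dv
        return flow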


    OpenCV (Python) Implementation

    Sparse Optical Flow: This code demonstrates how to implement sparse optical flow tracking using the Lucas-Kanade method in OpenCV. A video file is loaded, and key feature points are identified in the first frame using the cv.goodFeaturesToTrack function, which detects corners. For each subsequent frame, the cv.calcOpticalFlowPyrLK function calculates the motion of these points between the current and previous frames. Only points successfully tracked are retained, and lines are drawn to visualize their movement, along with rectangles to highlight their new positions. Random colors are assigned to points for better visualization. The process repeats until the video ends or the user presses 'q', creating an interactive display of point tracking throughout the video.

    import numpy as np
    import cv2 as cv
    
    cap = cv.VideoCapture('<video file>')
    
    n = 100 # maximum number of points to track
    winSize = (15, 15) # window size for Lucas-Kanade
    
    ret, frame = cap.read()
    gray0 = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    
    # Find good features to track (corners)
    points = cv.goodFeaturesToTrack(gray0, maxCorners=n, qualityLevel=0.1, minDistance=7)
    
    # One random color per detected point (may be fewer than n)
    color = np.random.randint(0, 255, (len(points), 3))
    
    points0 = points
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            print("Can't receive frame (stream end?). Exiting ...")
            break
        gray1 = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
        points1, status, err = cv.calcOpticalFlowPyrLK(gray0, gray1, points0, None, winSize=winSize, maxLevel=3)
    
        # Keep only the successfully tracked points
        points0 = points0[status == 1]
        points1 = points1[status == 1]
        points = points[status == 1]
        color = color.reshape(-1, 1, 3)[status == 1]
        # Draw a line from each point's original position to its new
        # position, and a window-sized rectangle around the new position
        half = winSize[0] // 2
        for i, (p0, p1) in enumerate(zip(points, points1)):
            a = tuple(map(int, p0))
            b = tuple(map(int, p1))
            frame = cv.line(frame, a, b, color[i].tolist(), 2)
            frame = cv.rectangle(frame, (b[0] - half, b[1] - half),
                                 (b[0] + half, b[1] + half), color[i].tolist(), 2)
    
        cv.imshow('frame', frame)
    
        if cv.waitKey(1) == ord('q'):
            break
        # The current frame and points become the reference for the next step
        gray0 = gray1
        points0 = points1.reshape(-1, 1, 2)
        points = points.reshape(-1, 1, 2)
    
    cap.release()
    cv.destroyAllWindows()
    
    

    Dense Optical Flow: This code demonstrates dense optical flow estimation between two images using Farneback's method in OpenCV. Two grayscale frames are loaded, and the cv.calcOpticalFlowFarneback function computes the optical flow, representing pixel-wise motion between the two frames. The flow field contains the horizontal and vertical displacement for each pixel.

    The results are visualized in three subplots: the first displays the initial frame, the second shows the subsequent frame, and the third uses a quiver plot to overlay flow vectors on a blank background. The flow vectors, sampled at every 10th pixel, highlight the direction and magnitude of motion between the frames, with cyan arrows indicating motion.

    import numpy as np
    import cv2 as cv
    from matplotlib import pyplot as plt
    
    frame1 = cv.imread('<First image path>', cv.IMREAD_GRAYSCALE)
    frame2 = cv.imread('<Second image path>', cv.IMREAD_GRAYSCALE)
    
    # Arguments: pyr_scale=0.5, levels=3, winsize=15, iterations=3,
    # poly_n=5, poly_sigma=1.2, flags=0
    flow = cv.calcOpticalFlowFarneback(frame1, frame2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    
    # Create the plot
    plt.figure(figsize=(20, 10))
    
    # Display the first frame
    plt.subplot(1, 3, 1)
    plt.imshow(frame1, cmap='gray')
    plt.title("Frame 1")
    plt.axis('off')
    
    # Display the second frame
    plt.subplot(1, 3, 2)
    plt.imshow(frame2, cmap='gray')
    plt.title("Frame 2")
    plt.axis('off')
    
    # Display the optical flow
    plt.subplot(1, 3, 3)
    plt.imshow(frame1 * 0, cmap='gray')  # Blank (all-zero) background for the flow vectors
    x, y = np.meshgrid(
        np.arange(0, flow.shape[1], 10),  # x-coordinates
        np.arange(0, flow.shape[0], 10)   # y-coordinates
    )
    plt.quiver(
        x, y, 
        flow[::10, ::10, 0],  # Horizontal flow
        flow[::10, ::10, 1],  # Vertical flow
        scale=400, color='c'
    )
    plt.title("Optical Flow")
    plt.axis('off')
    
    # Show the plot
    plt.tight_layout()
    plt.show()
    

    Conclusion

    Optical flow is a powerful tool for understanding motion in visual scenes. By leveraging techniques like the Horn-Schunck and Lucas-Kanade methods, we can estimate pixel motion and unlock a variety of applications in computer vision.

    While challenges like the aperture problem and large motion estimation exist, advancements in coarse-to-fine strategies and feature detection continue to improve the robustness of optical flow methods.

    FAQs

    What is optical flow?

    Optical flow refers to the apparent motion of objects, surfaces, or edges in a visual scene between two consecutive video frames. It is estimated by analyzing pixel intensity changes over time and space.

    Why is optical flow important?

    Optical flow is crucial for applications such as:

    • Object tracking
    • Motion detection
    • Scene reconstruction
    • Robotic navigation

    What are the key assumptions in optical flow estimation?

    Two primary assumptions are:

    • Brightness constancy: The intensity of a pixel remains constant across frames.
    • Small motion: Motion between consecutive frames is minimal, enabling mathematical linearization.

    What is the aperture problem in optical flow?

    The aperture problem arises when motion is observed through a small local region of an image. It leads to ambiguities in detecting motion direction, as only movement perpendicular to an edge is apparent.

    What are the common methods for computing optical flow?

    The two most widely used methods are:

    • Horn-Schunck Method: Assumes smooth flow across the image and computes dense optical flow using global optimization.
    • Lucas-Kanade Method: Assumes constant flow within small patches and computes sparse optical flow efficiently.

    What is the brightness constancy constraint equation?

    The equation is:

    I_x u + I_y v + I_t = 0

    Here, I_x, I_y, and I_t are the spatial and temporal image gradients, while u and v are the optical flow components.

    What is the Horn-Schunck method?

    The Horn-Schunck method calculates dense optical flow by minimizing an energy function that combines brightness constancy and smoothness constraints. This global approach ensures smooth motion estimation but is computationally intensive.

    What is the Lucas-Kanade method?

    The Lucas-Kanade method computes sparse optical flow by solving the optical flow equation within a local window. It is computationally efficient and suitable for tracking keypoints or features.

    What are the conditions for the Lucas-Kanade method to work?

    • The matrix A^T A (from the system of equations) should be invertible.
    • The eigenvalues of A^T A should be large enough to avoid noise sensitivity.
    • The matrix should be well-conditioned to ensure stability.

    What is coarse-to-fine estimation in optical flow?

    Coarse-to-fine estimation handles large motions by:

    1. Downsampling the image to reduce motion magnitude.
    2. Estimating optical flow at the coarse level.
    3. Refining the flow iteratively at finer resolutions.

    How does optical flow relate to the Harris corner detector?

    Both methods use eigenvalues of a gradient-based matrix to identify stable features in an image. The Lucas-Kanade method relies on well-conditioned eigenvalues for reliable flow computation, similar to corner detection criteria in the Harris detector.

    What are the challenges in estimating optical flow?

    • The aperture problem.
    • Ambiguities in regions with uniform intensity.
    • High computational costs for dense motion estimation.
    • Handling large displacements or rapid motion between frames.

    What is the barberpole illusion?

    The barberpole illusion is a real-world instance of the aperture problem: the diagonal stripes on a rotating pole appear to move vertically along the pole rather than horizontally, because only the motion component perpendicular to the stripes is visible within the pole's narrow silhouette.
