Google’s “AutoFlip” Intelligently Crops Videos

Traditionally, people watched videos on TVs with a 16:9 or 4:3 aspect ratio. Nowadays, however, videos are viewed and created on devices with various aspect ratios. Cropping videos to suit these screens can be laborious for video curators. Fortunately, Google has stepped in to streamline the process.

In a recent blog post, Google unveiled AutoFlip, an open-source tool for reframing and cropping videos to fit any screen. AutoFlip uses machine learning (ML) based object detection and tracking to reframe videos automatically.

AutoFlip – Intelligent Video Cropping

Google developed this tool as an alternative to traditional static cropping, in which a fixed camera viewport is defined and everything outside it is cropped away. Because a fixed viewport cannot follow changes in composition or camera motion, this often yields unsatisfactory results.

AutoFlip works in three stages: shot detection, video content analysis, and reframing. Each stage is described briefly below.

Shot (Scene) Detection

A scene, or shot, is a continuous sequence of frames without cuts. AutoFlip detects shot changes by computing the color histogram of each frame and comparing it with those of the preceding frames. A shot change is flagged when the color distribution of a frame changes at a different rate than it did over a sliding historical window. To optimize reframing for a whole scene, the tool buffers the video until the scene is complete before making reframing decisions.
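The histogram comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not AutoFlip's implementation: the function names, the 8-bin quantization, the window size, and the 0.5 threshold are all assumptions chosen for readability, and frames are represented as plain lists of (r, g, b) tuples.

```python
from collections import deque


def color_histogram(frame, bins=8):
    """Normalized histogram over quantized RGB values.

    `frame` is a list of (r, g, b) tuples with channels in 0..255.
    """
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    total = len(frame) or 1
    return [count / total for count in hist]


def histogram_distance(a, b):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_changes(frames, window=5, threshold=0.5):
    """Return indices of frames that start a new shot (assumed heuristic)."""
    history = deque(maxlen=window)  # histograms of the most recent frames
    cuts = []
    for i, frame in enumerate(frames):
        hist = color_histogram(frame)
        if history:
            # Average the recent histograms to form the historical reference.
            mean = [sum(col) / len(history) for col in zip(*history)]
            if histogram_distance(hist, mean) > threshold:
                cuts.append(i)
                history.clear()  # start a fresh window for the new shot
        history.append(hist)
    return cuts
```

Feeding it three mostly red frames followed by three mostly blue frames reports a single cut at the first blue frame. A production detector would likely work on decoded image arrays and tune the threshold per source.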

Video Content Analysis

In this stage, the tool identifies the important objects and people in each frame. Using deep learning object detection models, it finds salient content such as people, and, depending on the application, text overlays, brand logos, and moving elements like the ball in sports footage. The face and object detection models are integrated through MediaPipe, a framework for multimodal data processing, which runs them with Google’s TensorFlow Lite on the CPU.
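One way to picture the output of this stage is as a list of scored bounding boxes that later stages must keep in view. The sketch below is purely illustrative: the `Detection` type, the `salient_region` helper, and the 0.5 confidence cutoff are hypothetical and are not part of MediaPipe or AutoFlip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output for one object in one frame."""
    label: str    # e.g. "face", "logo", "ball"
    score: float  # detector confidence in [0, 1]
    box: tuple    # (left, top, right, bottom) in pixels


def salient_region(detections, min_score=0.5):
    """Union box of all detections at or above min_score, or None."""
    kept = [d for d in detections if d.score >= min_score]
    if not kept:
        return None
    left = min(d.box[0] for d in kept)
    top = min(d.box[1] for d in kept)
    right = max(d.box[2] for d in kept)
    bottom = max(d.box[3] for d in kept)
    return (left, top, right, bottom)
```

Low-confidence detections are dropped before the union is taken, so a spurious box in a corner does not force the crop to stay wide.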

Reframing

After identifying the people and objects of interest in each frame, the tool decides how to reframe the scene. AutoFlip selects one of three strategies (stationary, panning, or tracking) based on how those objects behave. In stationary mode, the camera viewport is fixed at a position where the important content stays visible for most of the scene. In panning mode, the viewport moves at a constant velocity. Tracking mode follows interesting subjects continuously and steadily as they move around the frame.
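The source does not spell out how the strategy is chosen, so the following is only a guess at what such a rule could look like. It assumes the subject's horizontal center is known for every frame of a scene: a small spread suggests stationary, motion that fits a straight line suggests panning, and anything else falls back to tracking. The function name and both thresholds are hypothetical.

```python
def choose_strategy(centers_x, frame_width, still_fraction=0.05, max_residual=0.02):
    """Pick 'stationary', 'panning', or 'tracking' for one scene.

    centers_x: horizontal subject center per frame, in pixels.
    Thresholds are fractions of the frame width (assumed values).
    """
    n = len(centers_x)
    if n < 2:
        return "stationary"
    spread = max(centers_x) - min(centers_x)
    if spread <= still_fraction * frame_width:
        return "stationary"
    # Least-squares line through the centers: constant-velocity motion fits well.
    mean_t = (n - 1) / 2
    mean_x = sum(centers_x) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    slope = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(centers_x)) / var_t
    intercept = mean_x - slope * mean_t
    worst = max(abs(x - (slope * t + intercept)) for t, x in enumerate(centers_x))
    if worst <= max_residual * frame_width:
        return "panning"
    return "tracking"
```

For a 1920-pixel-wide frame, a subject that drifts by a few pixels is classified as stationary, one that moves steadily from left to right as panning, and one that jumps back and forth as tracking.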

Based on the chosen strategy, AutoFlip then determines a cropping window for each frame that keeps as much of the content of interest in view as possible.
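The geometry of a single cropping window is simple arithmetic. Converting a 1920×1080 landscape frame to a 9:16 portrait, for example, keeps the full 1080-pixel height and a width of 1080 × 9 / 16 = 607.5, rounded down to 607 pixels. The sketch below computes such a window around a subject position and clamps it to the frame; the function name and signature are assumptions for illustration, and AutoFlip's actual optimization also smooths the window over time.

```python
def crop_window(src_w, src_h, target_w, target_h, center_x, center_y):
    """Largest target-aspect window inside the frame, centered near the subject.

    Returns (left, top, width, height) in pixels.
    """
    # Compare aspect ratios by cross-multiplying to stay in integers.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        height = src_h
        width = src_h * target_w // target_h
    else:
        # Source is narrower or equal: keep full width, trim top and bottom.
        width = src_w
        height = src_w * target_h // target_w
    # Center on the subject, then clamp so the window stays inside the frame.
    left = min(max(center_x - width // 2, 0), src_w - width)
    top = min(max(center_y - height // 2, 0), src_h - height)
    return (left, top, width, height)
```

Calling `crop_window(1920, 1080, 9, 16, 960, 540)` yields a 607×1080 window starting at x = 657, and a subject near the right edge pushes the window only as far as the frame boundary allows.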

Google released this tool to developers and filmmakers to “reduce barriers to design creativity and reach through automated video editing.” From landscape to portrait or vice versa, AutoFlip aims for optimal results.