OptiSub: Optimizing Video Subtitle Presentation for Varied Display and Font Sizes via Speech Pause-Driven Chunking

KAIST, KAI Inc.

CHI 2025

Video Preview (30s)

Given any subtitle font size specified by a user, which may vary with display size (due to PPI) and individual preference, our method automatically generates an optimized subtitle presentation. The method reconstructs which words are shown on the screen at once according to the given font size. To achieve an optimal reconstruction that remains synchronized with the video content, it uses the duration of the speech pause between adjacent words. Image source: https://www.youtube.com/watch?v=ZhJ-LAQ6e_Y

Abstract

Viewers want to watch video content with subtitles in various font sizes, depending on their viewing environment and personal preferences. Unfortunately, because a subtitle chunk (a segment of the text corpus displayed on the screen at once) is typically constructed for one specific font size, text truncation or awkward line breaks can occur when a different font size is used. While existing methods address this problem by reconstructing subtitle chunks based on maximum character counts, they overlook synchronization of the subtitles with the content, often producing text that is out of sync with the speech. We introduce OptiSub, a fully automated system that optimizes subtitle segmentation to fit any font size while ensuring synchronization with the content. Our system leverages the timing of speech pauses within the video for synchronization. Experimental results, including a user study comparing OptiSub with previous methods, demonstrate its effectiveness and practicality across diverse font sizes and input videos.
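
To make the idea concrete, below is a minimal sketch of pause-driven chunking, not OptiSub's actual optimization: it assumes word-level timestamps are available (e.g., from ASR or forced alignment) and that the chosen font size and display width have already been converted into a per-chunk character budget; all function and variable names are illustrative.

```python
# Illustrative sketch, not the paper's method: segment words into the fewest
# chunks that fit a character budget and, among such segmentations, place
# chunk boundaries where the speech pauses between adjacent words are longest.
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def chunk_by_pauses(words: List[Word], max_chars: int) -> List[List[Word]]:
    """Split `words` into chunks of at most `max_chars` characters."""
    n = len(words)
    # dp[i] = best (chunk_count, -total_pause_at_boundaries) for words[:i],
    # compared lexicographically; back[i] = start index of the last chunk.
    dp = [(float("inf"), 0.0)] * (n + 1)
    back = [0] * (n + 1)
    dp[0] = (0, 0.0)

    for end in range(1, n + 1):
        length = 0
        for start in range(end - 1, -1, -1):
            # Prepend words[start] (plus a space unless it is the only word).
            length += len(words[start].text) + (1 if start < end - 1 else 0)
            if length > max_chars and start < end - 1:
                break  # multi-word chunk no longer fits; a lone word is always allowed
            # Pause at the boundary right before this chunk (0 at video start).
            pause = words[start].start - words[start - 1].end if start > 0 else 0.0
            cand = (dp[start][0] + 1, dp[start][1] - pause)
            if cand < dp[end]:
                dp[end], back[end] = cand, start

    # Recover the chunks from the back-pointers.
    chunks, i = [], n
    while i > 0:
        chunks.append(words[back[i]:i])
        i = back[i]
    return list(reversed(chunks))


if __name__ == "__main__":
    demo = [Word("This", 0.00, 0.20), Word("is", 0.25, 0.35),
            Word("a", 0.85, 0.90), Word("short", 0.95, 1.25),
            Word("demo", 1.30, 1.60), Word("line", 2.00, 2.25)]
    for chunk in chunk_by_pauses(demo, max_chars=12):
        print(" ".join(w.text for w in chunk))
    # Prints "This is" / "a short demo" / "line":
    # the breaks land on the two longest pauses in the demo timestamps.
```

In this toy run the character budget forces three chunks, and the dynamic program places both boundaries at the longest pauses, which is the behavior the pause-driven chunking aims for; the paper's system optimizes the full presentation rather than this simplified objective.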


Supplementary Video


Presentation