The problem

Sometimes I have a long video that I only want a short section of. Maybe it's a TV show that I want to clip a funny scene out of. Or a video of a concert that I want to clip the individual songs out of so I can put them into my music library. But figuring out exactly where to start and end the clip takes some care. The goal is to avoid stray sounds or flashes from accidentally including the surrounding video.

The solution

In order to determine the right clip location, I wanted an easy way to repeatedly watch and listen to the start or end of a proposed clip and get immediate feedback on adjusting its position frame-by-frame.

find_split is a script that implements this workflow. You give it four things:

a video,
a proposed frame index to split the video on,
a preview length in frames,
and whether the preview should show the video before (to the left of) or after (to the right of) the split.

It plays the preview window that ends or starts at the proposed split. Then it waits for you to press a key saying how to move the split. After each keypress, it plays the preview for the new position and waits again. Once you're satisfied with the frame index, press ` (or just Ctrl+c) to exit. The full keyboard controls are described at the link.

$ ./find_split video.mkv 1000 60 left
Playing video.mkv on left of split at 1000 (with 60 extra frames).
Playing video.mkv on left of split at 1060 (with 60 extra frames).

The second line appears after typing Shift+h, once the first clip finishes, to increase the proposed frame index by 60. The second preview therefore covers frames 1000 to 1060. At 30 frames per second, that is about two seconds (60 frames) long and starts approximately 33 seconds into the video. You can keep adjusting the position until you've found the correct frame.
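The window arithmetic in this example is simple enough to sketch. The function below is a hypothetical helper: preview_window is not a name from find_split, and the script may not work exactly this way. It turns a split, a context length, and a side into the first and last frames of the preview, clamping at the start of the video.

```shell
# preview_window SPLIT CONTEXT SIDE: print "START END" frame indices of the preview.
# SIDE is "left" (the CONTEXT frames ending at SPLIT) or "right" (starting at SPLIT).
preview_window() {
  local split=$1 context=$2 side=$3 start end
  case $side in
    left)  start=$(( split - context )); end=$split ;;
    right) start=$split; end=$(( split + context )) ;;
    *) echo "side must be left or right" >&2; return 1 ;;
  esac
  # Don't run past the beginning of the video.
  (( start < 0 )) && start=0
  echo "$start $end"
}
```

For example, `preview_window 1060 60 left` prints `1000 1060`, which matches the second preview above.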

Once you have found the start and end frames (here 1060 to 2034), you can save the clip to a file:

$ ./encode_frames video.mkv 1060 2034 clip.mkv av

(Change the av to audio if you only want the audio and not the video.)

The details

Workflow

The basic idea is to see the proposed boundary of the clip in as quick a loop as possible while adjusting the position. To help with that, it's best to keep the preview length short, sometimes as short as one second (30 frames). That has to be balanced against making the preview long enough to give enough context to be understandable. Additionally, each adjustment to the position is a single keypress. That way, as much time as possible is spent observing the current state rather than providing input to the program.

Interface implementation

This is the script I developed read_keypress.sh for, although it only needs the original one-byte version because all of the controls are printable characters. The code simply calls read_keypress.sh in a loop and uses a case statement on the result to change a parameter, as in the example in my post on interpreting keypresses.

To support adjustments at many scales, I took the leftmost six keys in each row of the keyboard and split them in half. Three keys (e.g. q/w/e) decrease values, and the next three (e.g. y/t/r) increase them. They're listed in that order because y mirrors q within the row of six keys, so it increases the value by the same amount that q decreases it. The smallest step is d, which moves back one frame, while f next to it moves forward one frame. Additionally, Shift modifies all of those keys to make adjustments in the same direction but of greater magnitude. The top two rows of letters on the keyboard move the current frame position, while the bottom row of letters adjusts the preview window size.
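Here is a minimal sketch of this layout as a Bash case statement. Several parts are assumptions rather than the actual find_split code:

- Only the one-frame d/f step, and Shift+h moving forward 60 frames as in the example above, come from this post. The other step sizes and the ×4 Shift multiplier are made-up placeholders.
- key_to_delta and split_loop are hypothetical names.
- Plain `read -rsn1` stands in for read_keypress.sh.
- Only the top two rows (frame position) are covered, not the bottom-row preview-size keys.

```shell
#!/usr/bin/env bash
# Sketch of a mirrored key layout; step sizes are placeholders except d/f = 1
# frame and Shift+h = +60 (chosen to match the example in the post).

# key_to_delta KEY: print how many frames KEY moves the split (negative = back).
# Returns 1 for keys outside the six-key blocks of the top two rows.
key_to_delta() {
  local key=$1 lower=${1,,} mag sign scale=1
  # Shift (an uppercase letter) keeps the direction but multiplies the step.
  [[ $key == [[:upper:]] ]] && scale=4
  # Mirrored pairs share a magnitude: d/f, s/g, a/h (home row), e/r, w/t, q/y (top row).
  case $lower in
    d|f) mag=1 ;;
    s|g) mag=5 ;;
    a|h) mag=15 ;;
    e|r) mag=30 ;;
    w|t) mag=90 ;;
    q|y) mag=300 ;;
    *) return 1 ;;
  esac
  # The left three keys of each block move back; the right three move forward.
  case $lower in
    q|w|e|a|s|d) sign=-1 ;;
    *) sign=1 ;;
  esac
  echo $(( sign * mag * scale ))
}

# split_loop START: adjust a split one keypress at a time until ` is pressed.
split_loop() {
  local split=$1 key delta
  while true; do
    echo "Proposed split: $split"   # find_split would play the preview here
    IFS= read -rsn1 key || break     # one keypress, not echoed
    [[ $key == '`' ]] && break
    if delta=$(key_to_delta "$key"); then
      split=$(( split + delta ))
      (( split < 0 )) && split=0
    fi
  done
  echo "Final split: $split"
}
```

Run `split_loop 1000` in a terminal to try it. Each key's magnitude is shared with its mirror and only the sign differs, so any step can be undone exactly with one press of the mirrored key.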

Overall, this supports adjusting the frame position quickly by many different amounts without much memorization. You only need to know roughly where on the keyboard to type: the leftmost three keys in each row decrease values, and the next three increase them by the same amounts. Feedback is immediate, and any adjustment is easy to undo.

Clipping implementation

The rest of the script is straightforward arithmetic, some of it done with bc since Bash arithmetic does not support floating point. It also calls ffmpeg to create the clipped videos and mplayer to play them.

To avoid reencoding, ffmpeg is called with the -map 0 option, which selects all streams from the input. By default, ffmpeg keeps only one stream of each type. The streams are then copied as-is rather than reencoded; in ffmpeg, that part is the stream-copy codec setting, -c copy. Since we just want to clip the video, there's no need to reencode it, and it's much faster not to. You can see in encode_frames that there are a few other attempts at using mencoder and ffmpeg for this task that didn't work as well.
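The sketch below shows what a stream-copy clip command could look like. It is a hypothetical stand-in for encode_frames, not its actual code, and it prints the command rather than running it. It assumes an integer frame rate and uses integer millisecond arithmetic to sidestep bc. It also treats the clip length as LAST minus FIRST frames, which is an assumption about how the end frame is counted.

One caveat: ffmpeg's documentation says that when -ss is given as an input option together with stream copy, ffmpeg seeks to the closest seek point before the requested time and keeps the extra segment. So the clip can start slightly early unless the start frame is a keyframe.

```shell
# clip_command INPUT FIRST_FRAME LAST_FRAME OUTPUT [av|audio] [FPS]
# Prints (does not run) an ffmpeg stream-copy command for frames FIRST..LAST.
# Assumes an integer frame rate (default 30); times are truncated to whole ms.
clip_command() {
  local input=$1 first=$2 last=$3 output=$4 mode=${5:-av} fps=${6:-30}
  local start_ms=$(( first * 1000 / fps ))
  local dur_ms=$(( (last - first) * 1000 / fps ))
  # -map 0 keeps every input stream; -map 0:a keeps only the audio streams.
  local map=0
  [[ $mode == audio ]] && map=0:a
  # %q quotes file names so the printed command is safe to paste or eval.
  printf 'ffmpeg -ss %d.%03d -i %q -t %d.%03d -map %s -c copy %q\n' \
    $(( start_ms / 1000 )) $(( start_ms % 1000 )) "$input" \
    $(( dur_ms / 1000 )) $(( dur_ms % 1000 )) "$map" "$output"
}
```

For the example above, `clip_command video.mkv 1060 2034 clip.mkv av 30` prints `ffmpeg -ss 35.333 -i video.mkv -t 32.466 -map 0 -c copy clip.mkv`. You can run the printed command with `eval "$(clip_command ...)"`.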

Using keyframes… or not

One idea for finding good split points is to look at keyframes. Since the picture changes drastically at a scene change, it would make sense for every scene change to line up with a keyframe. Keyframes would then be a good guess at where to start or end a clip for many uses. ffprobe or ffmpeg can be used to identify keyframes. Unfortunately, for the video I was using, they just reported keyframes spaced exactly 250 frames apart for the entire video. Additionally, it took ffprobe several minutes to generate that list for a half-hour video, though ffmpeg at least printed partial results as it went. But maybe it will be useful on other files.
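For anyone who wants to try this on their own files, one approach is to ask ffprobe for packet flags instead of decoding frames. This is not necessarily what the post used. It is usually much faster because nothing is decoded. Packets that are keyframes have a flags field starting with K. The helpers list_keyframes and keyframe_frames are hypothetical names. Converting timestamps to frame indices assumes a constant frame rate.

```shell
# list_keyframes VIDEO: print "pts_time,flags" for every packet of the first
# video stream. Only packet headers are read, so no decoding is needed.
list_keyframes() {
  ffprobe -v error -select_streams v:0 \
    -show_entries packet=pts_time,flags -of csv=p=0 "$1"
}

# keyframe_frames FPS: read "pts_time,flags" lines on stdin and print the
# rounded frame index of each keyframe, assuming a constant frame rate.
keyframe_frames() {
  awk -F, -v fps="$1" '$2 ~ /^K/ && $1 != "N/A" { printf "%d\n", $1 * fps + 0.5 }'
}
```

Usage: `list_keyframes video.mkv | keyframe_frames 30 | sort -n`. The sort matters because packets come out in decode order, which may not match presentation order.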

A Weird Imagination

Clipping videos from the shell