Generating a video mosaic/grid
I got to use my random technical knowledge to construct a 7-second video clip.
Warning: this post is a monologue rather than a tutorial, but if you are just interested in the FFmpeg solution, skip here.
I had a few hours of video footage and I wanted to generate another video, only a few seconds long, that displayed a grid of smaller videos, each playing the original from a different random time offset so that every cell showed a different moment. This was fairly easy to visualise in my head, and describing it in English was not too difficult either, but it turns out that explaining how to perform it efficiently to a computer is much harder. I was editing a YouTube video and had this idea for the intro sequence, and I was determined to find a solution that would let me do this multiple times without having to wait a ridiculous amount of time.
# The After Effects Attempt
I usually do all of my video editing in Premiere Pro, but sometimes there are things it's not capable of, so I have to resort to After Effects. I find this a little frustrating because I often feel Premiere Pro is a couple of sliders away from solving my task, but instead I have to launch a behemoth of a program to solve it. You can manually create a grid of videos in Premiere, but I wanted an automatic process so I could choose the number of rows and columns at will.
After Effects expressions are powerful, but I often feel like they aren't powerful enough. They definitely have their strengths, but I'm always using them as a stopgap. One operation I feel is missing from After Effects is the ability to "instantiate" objects through expressions. There is a Repeater effect, but it only functions on shape layers, and each instance is forced to have the same position offset properties. I don't think this would be a particularly easy feature for After Effects to implement given how everything currently works, but I also feel it would be a ridiculously useful motion graphics tool. I don't doubt that there are plugins to solve this, but coming from the open source programming community, After Effects plugins with their absurd prices are not very appealing. That's just the way it is, I suppose.
I couldn't work out how to do this at first, so I did what I always do when I'm stuck on a strangely specific problem: find the relevant Discord server and ask the more dedicated members of the community. I arrived at three possible solutions: lay out the grid of videos manually (what I'm trying to avoid), use the Time Displacement effect, or use a plugin (an Adobe subscription is expensive enough as it is). After playing with the Time Displacement effect, I found it's actually pretty neat. It works similarly to a displacement shader in graphics, except instead of using a texture to offset geometry positions, it offsets the playback time of a video per pixel.
Using the Motion Tile effect, you can repeat the same video multiple times in a grid, and then each repetition can be given a different time offset using Time Displacement. The displacement texture I needed looks like random noise scaled up so that each pixel equates to one video in the grid. For some reason, using the Transform effect in combination with the Noise effect just caused After Effects to run out of memory, so I looked for a different route: using Fractal Noise and the Mosaic (pixelate) effect, I was able to achieve what I wanted.
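As a rough sketch of the idea (my mental model of the effect, not After Effects' actual implementation), per-pixel time displacement looks something like this: the map value at each pixel shifts which moment of the source video that pixel samples from.

```javascript
// Conceptual sketch of time displacement (not AE's real implementation):
// each pixel samples the source video at a time shifted by the map value.
function displacedTime(t, mapValue, maxDisplacement) {
  // mapValue in [0, 1], e.g. the luminance of the displacement map at
  // this pixel; maxDisplacement is the largest time shift, in seconds
  return t + mapValue * maxDisplacement;
}

// A constant map value across a grid cell gives that cell a fixed offset,
// which is why a pixelated noise map produces the grid-of-moments effect:
console.log(displacedTime(2, 0.5, 60)); // this pixel plays 30s ahead
```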
Playback was… a little slow, to say the least, and I should have taken this as a warning, but I wasn't finished: I wanted the grid size to be controllable. This was accomplished using a Slider Control, which is effectively a variable you can adjust with a slider and reference in expressions. After hooking all of the relevant effect properties up to this variable, I had what I wanted.
However, when it came to rendering out the video, I realised it would just take too long. The original video is 1080p60; with a 7x7 grid, a 1080p output and a length of 7 seconds, I tried rendering, and after waiting a while, After Effects estimated about 2 hours to finish. While it was loading, I started to think of different ways of solving this, because that wasn't ideal, and then I remembered that FFmpeg exists.
Before I talk about FFmpeg: now that I think about it, the reasons for this absurd render time are not so absurd. The original video is compressed using the H.264 format. A substantial amount of computation and file reading is required to decode an arbitrary frame, because of the way the compression works: most frames are stored as differences from previous frames, so decoding one frame can mean decoding many frames before it. After Effects is also not really suited to working with H.264 video, but I didn't have much choice without exploding the file size of the original. In addition, the file was stored on my hard disk drive. Hard disk drives work best when data is read sequentially, but After Effects renders frame by frame, so 49 effectively random frames need to be read from the disk for every frame rendered. The data locality is horrifying. With these reasons compounded, it does seem like I set myself up for disappointment, but I wasn't prepared to go down without a fight.
# The FFmpeg Attempt
FFmpeg is a marvellous tool: it can do pretty much anything with video and it's superbly reliable, even if I always have to google 10 things to make sure I have the right flags. After some serious google-rewording madness, I found several people who wanted to stitch multiple videos together into a grid, and although my source was just a single video, it was pretty much all I needed. Luckily, this exact operation is possible using the xstack filter: all you have to do is scale down all of the input videos and specify the layout as a parameter. Changing several parameters by hand every time I wanted to tweak something like the grid size was a no-go, and FFmpeg's filter format is not particularly expressive, so I did what any other person would do and wrote a script to write a script. I used Node.js (if you know me, this is no surprise) and the logic for it is fairly trivial. I ran into a few issues, specifically realising that the setpts filter is necessary to reset each input's timestamps (timecodes in FFmpeg always confuse me, but what better way is there? I'm controlling moving pictures with words). However, after all of this, the render time for a 7x7 grid was still a little absurd, although I didn't stick around long enough to find out how absurd.
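To give an idea of why a script is worth it: xstack wants every cell's position spelled out, where each position is a sum of the widths and heights of earlier inputs (e.g. `w0_0` means "x = width of input 0, y = 0"). A minimal sketch of the layout-string generation (a simplified piece of the full script below):

```javascript
// Build an xstack layout string for an n x n grid.
// Each cell's position is written as sums of earlier inputs'
// widths (w0, w1, ...) and heights (h0, h1, ...).
function xstackLayout(n) {
  const pos = (x, y) => {
    const xs = x === 0 ? '0' : [...Array(x).keys()].map(i => `w${i}`).join('+');
    const ys = y === 0 ? '0' : [...Array(y).keys()].map(i => `h${i}`).join('+');
    return `${xs}_${ys}`;
  };
  return [...Array(n * n).keys()]
    .map(i => pos(i % n, Math.floor(i / n)))
    .join('|');
}

console.log(xstackLayout(2)); // → 0_0|w0_0|0_h0|w0_h0
```

Hand-writing that string for a 7x7 grid (49 cells, positions like `w0+w1+w2_h0+h1`) is exactly the kind of tedium the script automates.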
I thought perhaps enabling hardware acceleration would help; it didn't seem to make much difference, but I left it on anyway. At this point I was getting a little weary of all of this: is creating video grids just an extraordinarily time-consuming task? Do modern video formats doom the merging of any significantly large number of videos? Then I had a smidgen of an idea that might make the process slightly faster.
I edited the script so that instead of reading a single file and combining different parts of that video together, it would first split the large file into several short videos. Since I only need the output to be 7 seconds long, for a 7x7 grid the script chooses 49 random points in the video and cuts a 7-second snippet out at each one. I did not need the cut points to be frame-accurate, so I could use the -c copy option, which skips re-encoding and makes this operation pretty much instant. Then, another FFmpeg command is run to stitch all 49 smaller videos into one bigger video. Selecting a grid of size 7x7 with an output size of [1920*3.5,1080*3.5] (so the resolution is divisible by 7), generating the output only took 5 minutes. And thus, I had succeeded.
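The snippet-cutting step boils down to command generation like this (a hand-rolled 2x2 sketch with illustrative file names, not the full script; I also clamp the random offset here so a snippet can't run past the end of the source, which the full script doesn't bother with):

```javascript
// Sketch of step 1: generate the snippet-cutting commands for a 2x2 grid.
const cells = 4;
const outputLength = 7;          // seconds per snippet
const vidLengthSecs = 140 * 60;  // length of the source video (illustrative)
const cutCommands = [...Array(cells).keys()].map(i => {
  // Clamp so the 7-second snippet fits before the end of the video
  const offset = Math.random() * (vidLengthSecs - outputLength);
  // -ss before -i seeks quickly; -c copy avoids re-encoding entirely
  return `ffmpeg -ss ${offset} -i input.mp4 -c copy -t ${outputLength} in${i}.mp4`;
});
console.log(cutCommands.join(' && '));
```

Because -c copy just remuxes existing compressed frames, cuts land near keyframes rather than at exact timestamps, which is fine here since the offsets are random anyway.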
It took me several hours to figure all this out, at which point it would have been faster to just continue with After Effects, but my output resolution was a lot higher, so I was able to zoom in without any significant loss of quality, which is what I had planned to do with the clip. Perhaps I'll even get to use it more than twice like I had planned!
A minor issue with this approach: when I imported the clip into After Effects, animated it and rendered out the video, the 7-second sequence containing the clip took 20 minutes to render. Interestingly, After Effects took much longer to render the parts that were zoomed in on the grid, but was multitudes faster where the grid was fully zoomed out. Thinking back now, I should have just used a less compressed format for the grid clip, since it was only 7 seconds long.
# The Solution
Here is the final script I used; it's not the prettiest, but it gets the job done.
```js
// Inputs
const inputFile = process.argv[2];
const rows = 7; // Number of rows and columns
const vidLengthMins = 140; // Length of the input video in minutes
const outputLength = 7; // Length of the output video in seconds
const outSize = [1920*3.5, 1080*3.5]; // Resolution of the output video

// Derived
const cells = rows*rows;
const range = i => [...Array(i).keys()];
const arr = range(cells);
const trimPoints = arr.map(() => Math.random()*vidLengthMins*60);
const cellSize = outSize.map(x => x/rows);
const cellSizeStr = cellSize.join('x');
// Position of cell (x,y) as sums of earlier inputs' widths/heights
const getPos = (x,y) => `${x===0 ? '0' : range(x).map(i => `w${i}`).join('+')}_${y===0 ? '0' : range(y).map(i => `h${i}`).join('+')}`;

const commands = [];
// Step 1: cut one short snippet per cell, without re-encoding
arr.forEach(i => {
  commands.push(`ffmpeg -ss ${trimPoints[i]} -i "${inputFile}" -c copy -t ${outputLength} in${i}.mp4`);
});
// Step 2: scale each snippet and stitch them into the grid with xstack
const filter = `${arr.map(i => `[${i}:v] setpts=PTS-STARTPTS, scale=${cellSizeStr} [a${i}]`).join(';')}; ${arr.map(i => `[a${i}]`).join('')}xstack=inputs=${cells}:layout=${arr.map(i => getPos(i%rows,Math.floor(i/rows))).join('|')}[out]`;
// I enabled `-hwaccel cuda` here but do what you want
commands.push(`ffmpeg ${arr.map(i => `-i in${i}.mp4`).join(' ')} -filter_complex "${filter}" -map "[out]" out.mp4`);
console.log(commands.join(' && '));
```
Run it using (you need Node.js):

```
node ./script.js ./path/to/video.mp4
```
The script prints a single command string for you to run. I've only tested it on Windows, but hopefully it works anywhere. For convenience, pipe the output straight into your clipboard (or automate it further; it's up to you)!