[widget] Video Source Stabilization (part 1)

The widget below illustrates how images generated using animation_mode: Video Source are affected by certain “stabilization” options.

Press the “▷” icon to begin the animation.

The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I’m an ML engineer, not a web developer.

What is “Video Source” animation mode?

PyTTI generates images by iterative updates. This process can be initialized in a variety of ways, and depending on how certain settings are configured, the initial state can have a very significant impact on the final result. For example, if we set the number of steps or the learning rate very low, the final result might be barely modified from the initial state. PyTTI’s default behavior is to initialize this process with random noise (i.e. an image of fuzzy static). If we instead provide an image to use as the starting state, the “image generation” becomes more of an “image manipulation”. A video is just a sequence of images, so we can use PyTTI to manipulate an input video sequence in much the same way it can be used to manipulate a single input image.
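The idea can be sketched with a toy loop: each output frame is initialized from the corresponding input frame and then iteratively refined. Here `refine` is a stand-in placeholder for PyTTI’s actual optimization step, not its real API:

```python
import numpy as np

# Toy sketch of "Video Source" mode: each output frame starts from the
# corresponding input video frame, then gets iteratively updated.
# `refine` is a hypothetical placeholder for the real optimization step.
def refine(img, steps=3):
    # placeholder update: nudge pixel values toward their mean
    for _ in range(steps):
        img = 0.9 * img + 0.1 * img.mean()
    return img

video = np.random.rand(5, 32, 32, 3)              # 5 input frames
output = np.stack([refine(frame) for frame in video])
print(output.shape)  # (5, 32, 32, 3)
```

Note that each frame is refined independently of its neighbors, which is exactly what causes the flickering problem discussed next.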

Generating a sequence of images for an animation often comes with some additional considerations. In particular: we often want to be able to control frame-to-frame coherence. Using adjacent video frames as init images to generate adjacent frames of an animation is a good way to at least guarantee some structural coherence in terms of the image layout, but otherwise the images will be generated independently of each other. A single frame of an animation generated this way will probably look fine in isolation, but as part of an animation sequence it might create a kind of undesirable flickering as manifestations of objects in the image change without regard to what they looked like in the previous frame.

To resolve this, PyTTI provides a variety of mechanisms for encouraging an image generation to conform to attributes of either the input video, previously generated animation frames, or both.

The following widget uses the VQGAN image model. You can absolutely use other image models for video source animations, but VQGAN is generally what people are looking for here. There will be some artifacts in the animations generated here as a consequence of the low output resolution used, so keep in mind that VQGAN outputs don’t need to be as “blocky” as those illustrated: the resolution in this experiment was kept low to generate the demonstration images faster.

Description of Settings in Widget

  • reencode_each_frame: Use each video frame as an init_image instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.

  • direct_stabilization_weight: Use the current frame of the video as a direct image prompt.

  • semantic_stabilization_weight: Use the current frame of the video as a semantic image prompt.
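Put together, the three options above might appear in a settings dict along the following lines. This is an illustrative sketch of the values the widget sweeps over, not a complete PyTTI configuration:

```python
# Illustrative settings sketch covering the three options described above.
# The keys match the widget's settings; the values shown are just examples.
stability_settings = {
    "reencode_each_frame": False,          # warp each output into the next init instead
    "direct_stabilization_weight": 1.0,    # current video frame as a direct image prompt
    "semantic_stabilization_weight": 1.0,  # current video frame as a semantic image prompt
}
```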


import re
from pathlib import Path

from IPython.display import display, clear_output, Image, Video
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import panel as pn



outputs_root = Path('images_out')
folder_prefix = 'exp_video_basic_stability_modes'
folders = list(outputs_root.glob(f'{folder_prefix}_*'))

def format_val(v):
    # folder names encode booleans as well as numbers
    if v in ('True', 'False'):
        return v == 'True'
    v = float(v)
    if int(v) == v:
        v = int(v)
    return v

def parse_folder_name(folder):
    metadata_string = folder.name[1+len(folder_prefix):]
    pattern = r"_?([a-zA-Z_]+)-(True|False|[0-9.]+)"
    matches = re.findall(pattern, metadata_string)
    d_ = {k:format_val(v) for k,v in matches}
    d_['fpath'] = folder
    d_['n_images'] = len(list(folder.glob('*.png')))
    return d_
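As a standalone sanity check of the parsing regex, here is what it extracts from a hypothetical metadata string (i.e. a folder name with the experiment prefix already stripped):

```python
import re

# Same pattern as parse_folder_name, applied to a made-up metadata string.
pattern = r"_?([a-zA-Z_]+)-(True|False|[0-9.]+)"
example_metadata = "ref-False_dsw-0.5_ssw-1"
example_matches = re.findall(pattern, example_metadata)
print(example_matches)  # [('ref', 'False'), ('dsw', '0.5'), ('ssw', '1')]
```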

df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])

variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]
variant_ranges = {v:df_meta[v].unique() for v in variant_names}
for v in variant_ranges.values():
    v.sort()  # sort each array of unique values in place


n_imgs_per_group = 20

def setting_name_shorthand(setting_name):
    return ''.join([tok[0] for tok in setting_name.split('_')])
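The shorthand convention just takes the first letter of each underscore-separated token. Re-defining the helper so the snippet runs on its own:

```python
# Standalone check of the shorthand convention used in the folder names.
def setting_name_shorthand(setting_name):
    return ''.join([tok[0] for tok in setting_name.split('_')])

print(setting_name_shorthand('direct_stabilization_weight'))  # dsw
print(setting_name_shorthand('reencode_each_frame'))          # ref
```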

decoded_setting_name = {
    'ref': 'reencode_each_frame',
    'dsw': 'direct_stabilization_weight',
    'ssw': 'semantic_stabilization_weight',
}

kargs = {k:pn.widgets.DiscreteSlider(name=decoded_setting_name[k], options=list(v), value=v[0]) for k,v in variant_ranges.items() if k != 'n_images'}
kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')

url_prefix = "https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/"
image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]
d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}


@pn.depends(**kargs)
def display_images(ref, dsw, ssw, i):
    # select the output folder whose parsed settings match the widget state
    folder = df_meta[
        (ref == df_meta['ref']) &
        (dsw == df_meta['dsw']) &
        (ssw == df_meta['ssw'])
    ]['fpath'].values[0]
    im_path = str(folder / f"{folder.name}_{i}.png")
    #im_url = im_path # uncomment to serve images locally
    im_url = d_image_urls[im_path]
    return pn.pane.HTML(f'<img src="{im_url}" width="700">', width=700, height=350, sizing_mode='fixed')

pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)