[widget] Understanding Limited Palette’s Color Control Settings

The widget below illustrates how images generated in “Limited Palette” mode are affected by changes to color control settings.

Press the “▷” icon to begin the animation.

The first run with any particular combination of settings will probably show an empty image: the widget is a bit janky and downloads only what it needs on the fly. What can I say: I’m an ML engineer, not a web developer.

What is “Limited Palette” mode?

In “Unlimited Palette” mode, pytti directly optimizes pixel values to maximize the similarity between the generated image and the input prompts. “Limited Palette” mode uses the same process, but adds constraints on how the colors in the image (i.e. the pixel values) are selected.

We start by specifying a number of “palettes”. In this context, you can think of a palette as a container with a fixed number of slots, where each slot holds a single color. During optimization steps, colors that are members of the same “palette” container are optimized together. This has the effect that the “palette” objects become sort of “attached” to semantic objects in the image. Say, for example, you have an init image of an ocean horizon, so half of the picture is water and half of it is sky. If we set the number of palettes to 2, chances are one palette will primarily carry colors for painting the ocean and the other will carry colors for painting the sky. This is not a hard-and-fast rule, but you should anticipate that palette settings will interact with the diversity of semantic content in the generated images.
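To make that concrete, here’s a toy sketch (in PyTorch, purely illustrative: this is not pytti’s internal representation) of palettes as grouped, jointly-optimized color slots:

import torch

# Illustrative only: 2 "palette" containers, each with 6 color slots,
# where each slot is a single learnable RGB color. Colors in the same
# container are optimized together during generation.
n_palettes, palette_size = 2, 6
palette_colors = torch.rand(n_palettes, palette_size, 3, requires_grad=True)

# In the ocean-horizon example, pixels in the water region would tend to
# draw their colors from one container, and sky pixels from the other.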

For advice and additional insights about palette and color behaviors in pytti, we recommend the community document Way of the TTI Artist by oxysoft#6139 and collaborators.

Description of Settings in Widget

All settings except smoothing_weight are specific to Limited Palette mode.

  • palette_size: Number of colors in each palette.

  • palettes: Total number of palettes. The image will have palette_size*palettes colors total.

  • gamma: Relative gamma value. Higher values make the image darker and higher contrast; lower values make it lighter and lower contrast.

  • hdr_weight: How strongly the optimizer will maintain the gamma. Set to 0 to disable.

  • palette_normalization_weight: How strongly the optimizer will maintain the palettes’ presence in the image. Prevents the image from losing palettes.

  • smoothing_weight: Makes the image smoother using “total variation loss” (old-school image denoising; see the sketch after this list). Can also be negative for that deep-fried look.
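“Total variation loss” is simple enough to sketch in a few lines. This is the generic formulation, not necessarily pytti’s exact implementation:

import torch

def total_variation_loss(img: torch.Tensor) -> torch.Tensor:
    """Generic TV loss for an image tensor of shape (channels, height, width)."""
    # Penalize differences between neighboring pixels; minimizing this
    # smooths the image. A negative smoothing_weight flips the sign and
    # amplifies the differences instead (the "deep fried" look).
    dh = (img[:, 1:, :] - img[:, :-1, :]).abs().mean()
    dw = (img[:, :, 1:] - img[:, :, :-1]).abs().mean()
    return dh + dw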

Widget

import re
from pathlib import Path

import pandas as pd
import panel as pn

pn.extension()

# Each permutation run writes its frames to a folder whose name encodes the settings.
outputs_root = Path('images_out')
folder_prefix = 'permutations_limited_palette_2D'
folders = list(outputs_root.glob(f'{folder_prefix}_*'))

def format_val(v):
    """Coerce a numeric string to an int or float where possible."""
    try:
        v = float(v)
        if int(v) == v:
            v = int(v)
    except ValueError:
        pass
    return v

def parse_folder_name(folder):
    """Recover a run's settings from its folder name (key-value pairs like 'gamma-1')."""
    metadata_string = folder.name[1+len(folder_prefix):]
    pattern = r"_?([a-zA-Z_]+)-([0-9.]+)"
    matches = re.findall(pattern, metadata_string)
    d_ = {k:format_val(v) for k,v in matches}
    d_['fpath'] = folder
    d_['n_images'] = len(list(folder.glob('*.png')))
    return d_

df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])

# Every column except the folder path and image count is a settings axis we can vary.
variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath', 'n_images']]
variant_ranges = {v:df_meta[v].unique() for v in variant_names}
for v in variant_ranges.values():
    v.sort()

###########################

# Map local image paths to their hosted copies so the embedded widget can load them.
url_prefix = "https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/"

image_paths = [str(p) for p in outputs_root.glob('**/*.png')]
d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}

###########################

n_imgs_per_group = 40

# One discrete slider per settings axis, plus a Player widget to step through frames.
kargs = {k:pn.widgets.DiscreteSlider(name=k, options=list(v), value=v[0]) for k,v in variant_ranges.items()}
kargs['i'] = pn.widgets.Player(interval=100, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')

@pn.interact(
    **kargs
)
def display_images(
    palettes,
    palette_size,
    gamma,
    hdr_weight,
    smoothing_weight,
    palette_normalization_weight,
    i,
):
    # Find the run whose settings match the current slider values.
    folder = df_meta[
        (palettes == df_meta['palettes']) &
        (palette_size == df_meta['palette_size']) &
        (gamma == df_meta['gamma']) &
        (hdr_weight == df_meta['hdr_weight']) &
        (smoothing_weight == df_meta['smoothing_weight']) &
        (palette_normalization_weight == df_meta['palette_normalization_weight'])
    ]['fpath'].values[0]
    im_path = str(folder / f"{folder.name}_{i}.png")
    im_url = d_image_urls[im_path]
    return pn.pane.HTML(f'<img src="{im_url}" width="700">', width=700, height=350, sizing_mode='fixed')

pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)

Settings shared across animations

scenes: "fractal crystals | colorful recursions || swirling curves | ethereal neon glow "

scene_suffix: " | text:-1:-.9 | watermark:-1:-.9"

steps_per_frame: 50
save_every: 50
steps_per_scene: 1000
interpolation_steps: 500

image_model: "Limited Palette"
lock_palette: false

animation_mode: "2D"
translate_y: -1
zoom_x_2d: 3
zoom_y_2d: 3

ViT-B/32: true
cutouts: 60
cut_pow: 1

seed: 12345

pixel_size: 1
height: 128
width: 256

Detailed explanation of shared settings

scenes: "fractal crystals | colorful recursions || swirling curves | ethereal neon glow "

We have two scenes (separated by ||), each with two prompts (separated by |).
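In other words, the string decomposes like this:

scenes = "fractal crystals | colorful recursions || swirling curves | ethereal neon glow "
parsed = [[p.strip() for p in scene.split("|")] for scene in scenes.split("||")]
# [['fractal crystals', 'colorful recursions'],
#  ['swirling curves', 'ethereal neon glow']]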

scene_suffix: " | text:-1:-.9 | watermark:-1:-.9"

We add prompts with negative weights (and ‘stop’ weights: prompt:weight:stop) to try to discourage generation of specific artifacts. Putting these prompts in the scene_suffix field is shorthand for concatenating them onto every scene. I find it also helps keep the settings a little more neatly organized by reducing clutter in the scenes field.
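Conceptually, the suffix behaves as if it had been appended to each scene. A sketch of the equivalence (not pytti’s actual parsing code):

scenes = "fractal crystals | colorful recursions || swirling curves | ethereal neon glow "
scene_suffix = " | text:-1:-.9 | watermark:-1:-.9"

# The effective scenes string, with the suffix folded into each scene:
effective_scenes = " || ".join(
    scene.strip() + scene_suffix for scene in scenes.split("||")
)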

steps_per_frame: 50
save_every: 50
steps_per_scene: 1000

Pytti will take 50 optimization steps for each frame (i.e. image) of the animation; since save_every matches steps_per_frame, exactly one image is saved per frame.

We have two scenes: 1000 steps_per_scene / 50 steps_per_frame = 20 frames per scene, so 40 frames total will be generated.
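Spelled out:

steps_per_scene = 1000
steps_per_frame = 50
n_scenes = 2

frames_per_scene = steps_per_scene // steps_per_frame  # 20
total_frames = n_scenes * frames_per_scene             # 40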

interpolation_steps: 500

A range of 500 steps will be treated as a kind of “overlap” between the two scenes to ease the transition from one scene to the next. This means that for each scene we’ll have 1000 - 500/2 = 750 steps = 15 frames guided only by the prompts we specified for that scene, and 5 frames where the guiding prompts are constructed by interpolating (mixing) between the prompts of the two scenes. Concretely:

  • first 15 frames: only the prompts for the first scene are used

  • next 5 frames: we use the prompts from both scenes, weighting the first scene more heavily

  • next 5 frames: we use the prompts from both scenes, weighting the second scene more heavily

  • last 15 frames: only the prompts for the second scene are used
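Putting numbers on that schedule:

steps_per_frame = 50
frames_per_scene = 20
interpolation_steps = 500

# 500 interpolation steps = 10 frames of blended prompts, split evenly
# across the scene boundary: 5 frames from each scene's budget.
interp_frames = interpolation_steps // steps_per_frame         # 10
pure_frames_per_scene = frames_per_scene - interp_frames // 2  # 15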

image_model: "Limited Palette"
lock_palette: false

We’re using the Limited Palette mode described above, letting the palette change throughout the learning process rather than fitting and freezing it upon initialization.

animation_mode: "2D"
translate_y: -1
zoom_x_2d: 3
zoom_y_2d: 3

After each frame is generated, we initialize the next frame by scaling up (zooming into) the image a small amount, then shifting (translating) it down a tiny bit (negative direction along the y axis). The zoom creates a forward-motion illusion; adding the y translation creates the effect of the scene rotating away as the viewer passes over it. NB: more dramatic depth illusions are generally achieved using animation_mode: 3D, but that mode generates images more slowly, and this project already required several days to generate.
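As a rough sketch of the per-frame motion (illustrative only: it treats the zoom as a small scale factor and ignores pytti’s exact units and resampling details):

from PIL import Image

def init_next_frame(prev: Image.Image, zoom: float = 1.03, translate_y: int = -1) -> Image.Image:
    """Zoom into the previous frame, then shift the content down slightly."""
    w, h = prev.size
    scaled = prev.resize((int(w * zoom), int(h * zoom)), Image.LANCZOS)
    # Crop back to the original size; moving the crop window up by one
    # pixel makes the content appear shifted down (translate_y = -1).
    left = (scaled.width - w) // 2
    top = (scaled.height - h) // 2 + translate_y
    return scaled.crop((left, top, left + w, top + h))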

ViT-B/32: true

We’re using the smallest of OpenAI’s pre-trained vision transformer (ViT) CLIP models to guide the animation. This is the AI component that computes the similarity between the image and the text prompt, hereafter referred to as “the perceptor”.
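To see what the perceptor does in isolation, here’s a minimal similarity computation using OpenAI’s clip package (illustrative: pytti wraps this machinery with cutouts, weights, and scheduling; "frame.png" is a hypothetical file):

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("frame.png")).unsqueeze(0).to(device)  # hypothetical image
text = clip.tokenize(["fractal crystals"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity between image and prompt: the score the optimizer pushes upward.
similarity = torch.cosine_similarity(image_features.float(), text_features.float())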

cutouts: 60
cut_pow: 1

For each optimization step, we take 60 random crops from the image to show the perceptor. cut_pow controls the size of these cutouts: 1 is generally a good default, and smaller values create bigger cutouts. Generally, more cutouts = nicer images. Setting the number of cutouts too low can cause the image to segment itself into regions: you can observe this phenomenon towards the end of many of the animations generated in this experiment. In addition to turning up the number of cutouts, this could also potentially be fixed by setting cut_pow lower to ask the perceptor to score larger regions at a time.
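The cutout mechanics follow the pattern popularized by the VQGAN+CLIP notebooks. A condensed sketch (pytti’s actual implementation differs in its details):

import torch
import torch.nn.functional as F

def make_cutouts(img: torch.Tensor, cutn: int = 60, cut_size: int = 224, cut_pow: float = 1.0) -> torch.Tensor:
    """img: (1, channels, height, width). Returns cutn random crops resized to cut_size."""
    sideY, sideX = img.shape[2:4]
    max_size = min(sideX, sideY)
    min_size = min(sideX, sideY, cut_size)
    cutouts = []
    for _ in range(cutn):
        # rand() ** cut_pow: a lower cut_pow skews sizes toward max_size,
        # i.e. smaller cut_pow -> bigger cutouts, as noted above.
        size = int(torch.rand([]) ** cut_pow * (max_size - min_size) + min_size)
        offsetx = int(torch.randint(0, sideX - size + 1, ()))
        offsety = int(torch.randint(0, sideY - size + 1, ()))
        cutout = img[:, :, offsety:offsety + size, offsetx:offsetx + size]
        cutouts.append(F.adaptive_avg_pool2d(cutout, cut_size))
    return torch.cat(cutouts)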

seed: 12345

If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior, which is especially useful for comparison studies like this one.
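A typical seeding pattern looks like the following; pytti does the equivalent internally when you set seed:

import random

import numpy as np
import torch

def set_seed(seed: int = 12345) -> None:
    """Seed the RNGs so repeated runs produce identical outputs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)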