[widget] Aesthetic Biases of VQGAN and CLIP Checkpoints¶
The widget below illustrates how images generated in “VQGAN” mode are affected by the choice of VQGAN model and CLIP perceptor.
Press the “▷” icon to begin the animation.
The first run with any particular combination of settings will probably show an empty image, because the widget is janky and only downloads what it needs on the fly. What can I say: I’m an ML engineer, not a web developer.
What is “VQGAN” mode?¶
VQGAN is a method for representing images implicitly, using a latent representation. The dataset a VQGAN was trained on constrains the kinds of images it can generate, so different pre-trained VQGANs each have their own characteristic look, on top of a general “VQGAN” look their outputs tend to share.
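To make “latent representation” a little more concrete, here is a minimal sketch of the encode/decode round trip, assuming a pre-trained taming-transformers `VQModel` has already been loaded into a variable named `vqgan` (the loading boilerplate is omitted, and the tensor shapes are illustrative rather than taken from pytti):

```python
import torch

# `vqgan` is assumed to be a pre-trained taming-transformers VQModel,
# already instantiated from its config and checkpoint (loading omitted).
x = torch.randn(1, 3, 256, 256)        # stand-in for a normalized RGB image batch (NCHW)

# encode() compresses the image into a small grid of quantized codebook vectors;
# this grid is the latent representation that gets optimized during generation.
z_q, emb_loss, info = vqgan.encode(x)  # e.g. shape (1, 256, 16, 16) for an f=16 model

# decode() maps any latent grid back to pixel space; text-guided generation
# repeatedly nudges the latents so the decoded image scores well under a CLIP perceptor.
x_hat = vqgan.decode(z_q)              # shape (1, 3, 256, 256)
```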
The models used to score image-text similarity (usually a CLIP model) are likewise shaped by the dataset they were trained on. Additionally, CLIP checkpoints come in a few different structural configurations (ResNet vs. transformer image encoders, fewer vs. more parameters, etc.), and these choices affect the kinds of images a given perceptor will guide the VQGAN towards.
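For reference, “scoring image-text similarity” looks roughly like the snippet below. This is a generic sketch using OpenAI’s `clip` package rather than the exact code path pytti and mmc take, and the image filename and prompt are placeholders; swapping the checkpoint name passed to `clip.load` (“RN50”, “RN50x4”, “ViT-B/32”, …) is essentially the knob the widget’s mmc_models setting exposes.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Swap "RN50" for "RN50x4", "ViT-B/32", etc. to change the perceptor (and its biases).
model, preprocess = clip.load("RN50", device=device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # placeholder image
text = clip.tokenize(["a watercolor landscape"]).to(device)            # placeholder prompt

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# cosine similarity between the embeddings is the signal that guides the VQGAN latents
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).item()
```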
Finally, all of these components can interact. And really, the only way to understand the “look” of these models is to play with them and see for yourself. That’s what this page is for :)
Description of Settings in Widget¶
vqgan_model
: The “name” pytti uses for a particular pre-trained VQGAN. The name is derived from the dataset used to train the model.

mmc_models
: The identifier of the (CLIP) perceptor used by the mmc library, which pytti uses to load these models.
Widget¶
from pathlib import Path

import numpy as np
import pandas as pd
import panel as pn

pn.extension()
outputs_root = Path('images_out')
folder_prefix = 'exp_vqgan_base_perceptors'
folders = list(outputs_root.glob(f'{folder_prefix}_*'))
def format_val(v):
    # coerce numeric strings to int/float where possible; leave anything else untouched
    try:
        v = float(v)
        if int(v) == v:
            v = int(v)
    except (TypeError, ValueError):
        pass
    return v
def parse_folder_name(folder):
    # experiment settings are encoded in the folder name after a double underscore;
    # each settings token becomes an indicator column set to 1
    _, metadata_string = folder.name.split('__')
    d_ = {k: 1 for k in metadata_string.split('_')}
    d_['fpath'] = folder
    d_['n_images'] = len(list(folder.glob('*.png')))
    return d_
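# Hypothetical illustration (folder name invented for clarity, not pulled from the repo):
#   parse_folder_name(Path('images_out/exp_vqgan_base_perceptors__sflckr_RN50'))
#   -> {'sflckr': 1, 'RN50': 1, 'fpath': Path('images_out/exp_vqgan_base_perceptors__sflckr_RN50'), 'n_images': 40}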
# let's just make each model a column (indicator columns; missing combinations fill with 0)
df_meta = pd.DataFrame([parse_folder_name(f) for f in folders]).fillna(0)
variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]
variant_ranges = {v: df_meta[v].unique() for v in variant_names}
for v in variant_ranges.values():
    v.sort()
###########################
# map local image paths to the raw.githubusercontent.com URLs the published page fetches on the fly
url_prefix = "https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/"
image_paths = [str(p) for p in outputs_root.glob('**/*.png')]
d_image_urls = {im_path: im_path.replace('images_out/', url_prefix) for im_path in image_paths}
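# Hypothetical example of the mapping (run name invented for illustration):
#   'images_out/<run>/<run>_1.png'
#   -> 'https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/<run>/<run>_1.png'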
###########################
vqgan_selector = pn.widgets.Select(
    name='vqgan_model',
    options=[
        'imagenet',
        'coco',
        'wikiart',
        'openimages',
        'sflckr',
    ],
    value='sflckr',
)

perceptor_selector = pn.widgets.Select(
    name='mmc_models',
    options=[
        'RN101',
        'RN50',
        'RN50x4',
        'ViT-B16',
        'ViT-B32',
    ],
)
n_imgs_per_group = 40
# animation scrubber: steps through the frames saved for the selected settings combination
step_selector = pn.widgets.Player(interval=100, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')
@pn.interact(
    vqgan_model=vqgan_selector,
    mmc_models=perceptor_selector,
    i=step_selector,
)
def display_images(
    vqgan_model,
    mmc_models,
    i,
):
    # find the experiment folder whose indicator columns match the selected models
    idx = np.ones(len(df_meta), dtype=bool)
    idx &= df_meta[mmc_models] > 0
    idx &= df_meta[vqgan_model] > 0
    folder = df_meta[idx]['fpath'].values[0]
    im_path = str(folder / f"{folder.name}_{i}.png")
    im_url = d_image_urls[im_path]
    return pn.pane.HTML(f'<img src="{im_url}" width="700">', width=700, height=350, sizing_mode='fixed')

pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)
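The `.embed()` call bakes every reachable combination of widget states into static HTML, which is why each (vqgan_model, mmc_models, step) combination has to be pre-rendered, but it also means the published page works without a live Python server. If you would rather explore the panel interactively, a minimal alternative (assuming the `images_out/` experiment folders are available locally so the metadata table can be built) is:

```python
# Launch a live Bokeh server session instead of embedding pre-rendered states.
pn.panel(display_images).show()
```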