Settings

Prompt Controls

scenes

Descriptions of scenes you want generated, separated by ||. Each scene can contain multiple prompts, separated by |. See Scene Syntax for details on scene specification syntax and usage examples.

scene_prefix

Prompts prepended to the beginning of each scene.

scene_suffix

Prompts appended to the end of each scene.
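As a rough illustration of the separators described above, the following Python sketch splits a scenes string on || and |, then attaches a prefix and suffix to every scene. The helper name build_scene_prompts and the way the prefix and suffix are joined to each scene are assumptions made for this example; this is not PyTTI's own parser.

```python
def build_scene_prompts(scenes, scene_prefix="", scene_suffix=""):
    """Split a scenes string into a list of prompt lists, one per scene.

    Hypothetical helper: PyTTI's real parser may treat whitespace,
    weights, and empty prompts differently.
    """
    result = []
    for scene in scenes.split("||"):
        # Assumption: prefix and suffix act as extra "|"-separated prompts.
        combined = "|".join([scene_prefix, scene, scene_suffix])
        prompts = [p.strip() for p in combined.split("|")]
        result.append([p for p in prompts if p])  # drop empty entries
    return result


example = build_scene_prompts(
    "a forest:2 | fog || a desert | dunes",
    scene_prefix="oil painting",
    scene_suffix="text:-1",
)
```

Here example holds two scenes, each starting with the prefix prompt and ending with the suffix prompt.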

interpolation_steps

Number of steps spent smoothly transitioning from the previous scene at the start of each scene. \(200\) is a good default. Set to \(0\) to disable. Transitions are performed by linearly interpolating between the prompts of the two scenes in semantic (CLIP) space.
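The linear interpolation mentioned above can be sketched in a few lines of Python. The embeddings are assumed to be plain lists of floats (real CLIP embeddings are tensors), and the function name interpolate_embeddings is hypothetical; PyTTI's internal implementation may differ.

```python
def interpolate_embeddings(previous, current, step, interpolation_steps):
    """Linearly blend two embedding vectors.

    Returns `previous` at step 0 and `current` once `step` reaches
    `interpolation_steps`. With interpolation_steps <= 0 the transition
    is treated as disabled and `current` is returned immediately.
    """
    if len(previous) != len(current):
        raise ValueError("embeddings must have the same length")
    if interpolation_steps <= 0:
        return list(current)
    # Fraction of the transition completed, clamped to [0, 1].
    t = min(max(step / interpolation_steps, 0.0), 1.0)
    return [(1.0 - t) * a + t * b for a, b in zip(previous, current)]


halfway = interpolate_embeddings([1.0, 0.0], [0.0, 1.0], 100, 200)
```

With interpolation_steps set to 200, step 100 yields an equal blend of the two scenes.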

steps_per_scene

Total number of steps to spend rendering each scene. Should be at least interpolation_steps. Along with save_every, this will control the total length of an animation.
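As a back-of-the-envelope estimate of animation length, the sketch below assumes that every saved image becomes exactly one frame and that all scenes use the same step count. Both are assumptions for illustration, and estimate_animation_length is a hypothetical helper, not part of PyTTI.

```python
def estimate_animation_length(steps_per_scene, scene_count, save_every,
                              frames_per_second):
    """Estimate (frame_count, duration_in_seconds) for a run.

    Assumes one frame per saved image, which may not match every
    animation mode.
    """
    total_steps = steps_per_scene * scene_count
    frame_count = total_steps // save_every
    return frame_count, frame_count / frames_per_second


frames, seconds = estimate_animation_length(
    steps_per_scene=1000, scene_count=3, save_every=50, frames_per_second=30
)
```

Under these assumptions, three scenes of 1000 steps saved every 50 steps give 60 frames, or two seconds at 30 frames per second.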

direct_image_prompts

Paths or URLs of images that you want your image to look like in a literal sense, along with weight_mask and stop values, separated by |.

Apply masks to direct image prompts using the form path or URL of image:weight_path or URL of mask. For video masks, it must be a path to an mp4 file.

init_image

Path or URL to an image that will be used to seed the initialization of the image generation process. Useful for creating a central focus or imposing a particular layout on the generated images. If not provided, random noise will be used instead.

direct_init_weight

Defaults to \(0\). Use the initial image as a direct image prompt. Equivalent to adding init_image:direct_init_weight as a direct_image_prompt. Supports weights, masks, and stops.

semantic_init_weight

Defaults to \(0\). Use the initial image as a semantic image prompt. Equivalent to adding [init_image]:semantic_init_weight as a prompt to each scene in scenes. Supports weights, masks, and stops.

Important

Since this is a semantic prompt, you still need to put the mask in [ ] to denote it as a path or url, otherwise it will be read as text instead of a file.

Image Representation Controls

width, height

Image size. Set one of these to \(-1\) to derive it from the aspect ratio of the init image.
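The aspect-ratio rule can be sketched as follows. The helper resolve_size and its rounding behavior are assumptions for illustration; PyTTI may round or validate differently.

```python
def resolve_size(width, height, init_width, init_height):
    """Fill in whichever of width/height is -1 from the init image's
    aspect ratio. Rounding to the nearest pixel is an assumption."""
    if width == -1 and height == -1:
        raise ValueError("only one of width and height may be -1")
    if width == -1:
        width = round(height * init_width / init_height)
    elif height == -1:
        height = round(width * init_height / init_width)
    return width, height


size = resolve_size(-1, 360, init_width=1920, init_height=1080)
```

For a 1920x1080 init image and a requested height of 360, the derived width is 640.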

pixel_size

Integer image scale factor. Makes the image bigger. Set to \(1\) for VQGAN, or you will run into VRAM issues.

smoothing_weight

Makes the image smoother. Defaults to \(0\) (no smoothing). Can also be negative for that deep fried look.

image_model

Select how your image will be represented. Supported image models are:

Limited Palette - Use CLIP to optimize image pixels directly, constrained to a fixed number of colors. Generally used for pixel art.
Unlimited Palette - Use CLIP to optimize image pixels directly.
VQGAN - Use CLIP to optimize a VQGAN’s latent representation of an image.

vqgan_model

Select which VQGAN model to use (only considered for image_model: VQGAN)

random_initial_palette

If checked, palettes will start out with random colors. Otherwise they will start out as grayscale. (only for image_model: Limited Palette)

palette_size

Number of colors in each palette. (only for image_model: Limited Palette)

palettes

Total number of palettes. The image will have palette_size*palettes colors total. (only for image_model: Limited Palette)

gamma

Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast. (only for image_model: Limited Palette). \(1\) is a good default.

hdr_weight

How strongly the optimizer will maintain the gamma. Set to \(0\) to disable. (only for image_model: Limited Palette)

palette_normalization_weight

How strongly the optimizer will maintain the palettes’ presence in the image. Prevents the image from losing palettes. (only for image_model: Limited Palette)

show_palette

Display a palette sample each time the image is displayed. (only for image_model: Limited Palette)

target_palette

Path or URL of an image from which the model will derive its palette.

lock_palette

Force the model to use the initial palette (most useful from restore, but will force a grayscale image or a wonky palette otherwise).

Animation Controls

animation_mode

Select animation mode or disable animation. Supported animation modes are:

off
2D
3D
Video Source

sampling_mode

How pixels are sampled during animation. nearest will keep the image sharp, but may look bad. bilinear will smooth the image out, and bicubic is untested :)

infill_mode

Select how new pixels should be filled if they come in from the edge.

mirror: reflect image over boundary
wrap: pull pixels from opposite side
black: fill with black
smear: sample closest pixel in image
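To make the four infill modes concrete, the sketch below maps an out-of-range pixel index along one axis back to a source index. It is a one-dimensional illustration only; the function name source_index and the exact mirror convention (the edge pixel is repeated) are assumptions, not a description of PyTTI's sampler.

```python
def source_index(i, size, mode):
    """Map index i to a source index in [0, size) for a given infill mode.

    Returns None when the pixel should simply be filled with black.
    """
    if 0 <= i < size:
        return i  # in range: no infill needed
    if mode == "wrap":
        return i % size  # pull from the opposite side
    if mode == "smear":
        return min(max(i, 0), size - 1)  # clamp to the closest pixel
    if mode == "black":
        return None
    if mode == "mirror":
        period = 2 * size
        j = i % period
        # Reflect over the boundary; the edge pixel is repeated (assumption).
        return j if j < size else period - 1 - j
    raise ValueError(f"unknown infill mode: {mode}")


row = [source_index(i, 4, "mirror") for i in range(-2, 6)]
```

For a row of four pixels, indices from -2 to 5 under mirror map to 1, 0, 0, 1, 2, 3, 3, 2.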

pre_animation_steps

Number of steps to run before animation starts, to begin with a stable image. \(250\) is a good default.

steps_per_frame

Number of steps between each image move. \(50\) is a good default.

frames_per_second

Number of frames to render each second. Controls how \(t\) is scaled.

direct_stabilization_weight

Keeps the current frame as a direct image prompt. For Video Source this will use the current frame of the video as a direct image prompt. For 2D and 3D this will use the shifted version of the previous frame. Also supports masks: weight_mask.mp4.

semantic_stabilization_weight

Keeps the current frame as a semantic image prompt. For Video Source this will use the current frame of the video as a semantic image prompt. For 2D and 3D this will use the shifted version of the previous frame. Also supports masks: weight_[mask.mp4] or weight_mask phrase.

depth_stabilization_weight

Keeps the depth model output somewhat consistent at a VERY steep performance cost. For Video Source this will use the current frame of the video as a semantic image prompt. For 2D and 3D this will use the shifted version of the previous frame. Also supports masks: weight_mask.mp4.

edge_stabilization_weight

Keeps the image's contours somewhat consistent at very little performance cost. For Video Source this will use the current frame of the video as a direct image prompt with a Sobel filter. For 2D and 3D this will use the shifted version of the previous frame. Also supports masks: weight_mask.mp4.

flow_stabilization_weight

Used for animation_mode: 3D and Video Source to prevent flickering. Comes with a slight performance cost for Video Source, and a great one for 3D, due to implementation differences. Also supports masks: weight_mask.mp4. For Video Source, the mask should select the part of the frame you want to move, and the rest will be treated as a still background.

video_path

Path to the mp4 file for Video Source.

frame_stride

Advance this many frames in the video for each output frame. This is surprisingly useful. Set to \(1\) to render each frame. Video masks will also step at this rate.

reencode_each_frame

Use each video frame as an init_image instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.

flow_long_term_samples

Sample multiple frames into the past for consistent interpolation even with disocclusion, as described by Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox (2016). Each sample is twice as far back in the past as the last, so the earliest sampled frame is \(2^{\text{flow_long_term_samples}}\) frames in the past. Set to \(0\) to disable.

Motion Controls

translate_x: Horizontal image motion as a function of time \(t\) in seconds.
translate_y: Vertical image motion as a function of time \(t\) in seconds.
translate_z_3d: Forward image motion as a function of time \(t\) in seconds. (only for animation_mode:3D)
rotate_3d: Image rotation as a quaternion \(\left[r,x,y,z\right]\) as a function of time \(t\) in seconds. (only for animation_mode:3D)
rotate_2d: Image rotation in degrees as a function of time \(t\) in seconds. (only for animation_mode:2D)
zoom_x_2d: Horizontal image zoom as a function of time \(t\) in seconds. (only for animation_mode:2D)
zoom_y_2d: Vertical image zoom as a function of time \(t\) in seconds. (only for animation_mode:2D)
lock_camera: Prevents scrolling or drifting. Makes for more stable 3D rotations. (only for animation_mode:3D)
field_of_view: Vertical field of view in degrees. (only for animation_mode:3D)
near_plane: Closest depth distance in pixels. (only for animation_mode:3D)
far_plane: Farthest depth distance in pixels. (only for animation_mode:3D)
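Writing a quaternion by hand is error-prone, so here is a small sketch that converts an axis and an angle into the \(\left[r,x,y,z\right]\) layout used by rotate_3d, using the standard axis-angle formula. It assumes r is the scalar (real) part; the helper name axis_angle_quaternion is hypothetical, and the axis orientation and handedness PyTTI expects are not covered here.

```python
import math


def axis_angle_quaternion(axis, degrees):
    """Return a unit quaternion [r, x, y, z] for a rotation of `degrees`
    about `axis`, where r is assumed to be the scalar part."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("rotation axis must be non-zero")
    half = math.radians(degrees) / 2
    s = math.sin(half) / norm  # normalizes the axis while scaling it
    return [math.cos(half), x * s, y * s, z * s]


quarter_turn = axis_angle_quaternion((0, 1, 0), 90)
```

A rotation of zero degrees gives the identity quaternion [1, 0, 0, 0], which leaves the image unrotated.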

Audio Reactivity Controls

Experimental Feature

As of 2022-04-24, this section describes features that are available on the ‘test’ branch but have not yet been merged into the main release.

input_audio: path to audio file.
input_audio_offset: timestamp (in seconds) where pytti should start reading audio. Defaults to 0.
input_audio_filters: list of specifications for individual Butterworth bandpass filters.

Bandpass filter specification

For technical details on how these filters work, see: Butterworth Bandpass Filters

variable_name: the variable name through which the value of the filter will be referenced in the weight expression of the prompt. Subject to rules of python variable naming.
f_center: The target frequency of the bandpass filter.
f_width: the range of frequencies about the central frequency which the filter will be responsive to.
order: the slope of the frequency response. Default is 5. The higher the “order” of the filter, the more closely the frequency response will resemble a square/step function. Decreasing order will make the filter more permissive of frequencies outside of the range strictly specified by the center and width above. See https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function for details.
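To see how order shapes the response, the sketch below evaluates the magnitude response of an ideal analog Butterworth bandpass filter. It assumes the passband edges sit at f_center ± f_width/2, which is a guess about how PyTTI interprets these parameters, and it ignores the details of a digital implementation. The function name butterworth_bandpass_gain is hypothetical.

```python
import math


def butterworth_bandpass_gain(f, f_center, f_width, order=5):
    """Gain (0..1) of an ideal analog Butterworth bandpass at frequency f.

    Assumes band edges at f_center - f_width/2 and f_center + f_width/2.
    """
    low = f_center - f_width / 2
    high = f_center + f_width / 2
    if low <= 0 or f <= 0:
        raise ValueError("frequencies must be positive")
    # Low-pass to band-pass frequency transformation.
    x = (f * f - low * high) / (f_width * f)
    return 1.0 / math.sqrt(1.0 + abs(x) ** (2 * order))


steep = butterworth_bandpass_gain(300, f_center=105, f_width=65, order=5)
gentle = butterworth_bandpass_gain(300, f_center=105, f_width=65, order=1)
```

At the band edges the gain is \(1/\sqrt{2}\) regardless of order. Outside the band, a higher order suppresses the signal much more strongly, so steep is far smaller than gentle.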

Example: Audio reactivity specification

scenes:"
  a photograph of a beautiful spring day:2 | 
  flowers blooming: 10*fHi |

  colorful sparks: (fHi+fLo) | 
  sun rays: fHi | 
  forest: fLo | 

  ominous: fLo/(fLo + fHi) | 
  hopeful: fHi/(fLo + fHi) | 
  "

input_audio: '/path/to/audio/source.mp3'
input_audio_offset: 0
input_audio_filters:
- variable_name: fLo
  f_center: 105
  f_width: 65
  order: 5
- variable_name: fHi
  f_center: 900
  f_width: 600
  order: 5

frames_per_second: 30

This configuration would create two filters named fLo and fHi, which can then be referenced in the scene specification DSL to tie prompt weights to properties of the input audio at the appropriate timestamp, per the specified FPS.
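To make the weight expressions concrete, the sketch below evaluates the ominous and hopeful expressions from the example for made-up filter values. The numbers are illustrative, the helper mood_weights is hypothetical, and filter values are assumed to be non-negative; how PyTTI evaluates expressions internally is not shown.

```python
def mood_weights(fLo, fHi):
    """Evaluate fLo/(fLo + fHi) and fHi/(fLo + fHi) for one time stamp."""
    total = fLo + fHi
    if total == 0:
        # Both filters read zero (e.g. silence): the ratio is undefined.
        raise ValueError("fLo + fHi is zero; weights are undefined")
    return {"ominous": fLo / total, "hopeful": fHi / total}


weights = mood_weights(fLo=0.6, fHi=0.2)
```

Because both expressions share the same denominator, the two weights always sum to one, so the prompts trade off against each other as the balance of low and high frequencies shifts.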

Output Controls

file_namespace: Output directory name.
allow_overwrite: Check to overwrite existing files in file_namespace.
display_every: How many steps between each time the image is displayed in the notebook.
clear_every: How many steps between each time notebook console is cleared.
display_scale: Image display scale in notebook. \(1\) will show the image at full size. Does not affect saved images.
save_every: How many steps between each time the image is saved. Set to steps_per_frame for consistent animation.
backups: Number of backups to keep (only the oldest backups are deleted). Large images make very large backups, so be warned. Set to all to save all backups. These are used for the flow_long_term_samples so be sure that this is at least \(2^{\text{flow_long_term_samples}}+1\) for Video Source mode.
show_graphs: Display graphs of the loss values each time the image is displayed. Disable this for local runtimes.
approximate_vram_usage: Currently broken. Don’t believe its lies.
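The minimum backups value quoted above for Video Source mode follows directly from the formula \(2^{\text{flow_long_term_samples}}+1\). A small sketch, with a hypothetical helper name:

```python
def minimum_backups(flow_long_term_samples):
    """Smallest backups value that satisfies the rule stated for
    Video Source mode: 2 ** flow_long_term_samples + 1."""
    if flow_long_term_samples < 0:
        raise ValueError("flow_long_term_samples must be non-negative")
    return 2 ** flow_long_term_samples + 1


needed = minimum_backups(3)
```

With flow_long_term_samples set to 3, at least 9 backups are needed.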

Perceptor Settings

ViTB32, ViTB16, RN50, RN50x4…: Select which CLIP models to use for semantic perception. Multiple models may be selected. Each model requires significant VRAM.
learning_rate: How quickly the image changes.
reset_lr_each_frame: The optimizer will adaptively change the learning rate, so this will thwart it.
seed: Pseudorandom seed. Using a fixed seed will make your process more deterministic, which can be useful for comparing how changing specific settings impacts the generated images.
cutouts: The number of cutouts from the image that will be scored by the perceptor. Think of each cutout as a “glimpse” at the image. The more glimpses you give the perceptor, the better it will understand what it is looking at. Reduce this to use less VRAM at the cost of quality and speed.
cut_pow: Should be positive. Large values shrink cutouts, making the image more detailed, small values expand the cutouts, making it more coherent. \(1\) is a good default. \(3\) or higher can cause crashes.
cutout_border: Should be between \(0\) and \(1\). Allows cutouts to poke out over the edges of the image by this fraction of the image size, allowing better detail around the edges of the image. Set to \(0\) to disable. \(0.25\) is a good default.
border_mode: How to fill cutouts that stick out over the edge of the image. Match with infill_mode for consistent infill.

clamp: move cutouts back onto image
mirror: reflect image over boundary
wrap: pull pixels from opposite side
black: fill with black
smear: sample closest pixel in image

gradient_accumulation_steps: How many batches to use to process cutouts. Must divide cutouts evenly, defaults to \(1\). If you are using high cutouts and receiving VRAM errors, increasing gradient_accumulation_steps may permit you to generate images without reducing the cutouts setting. Setting this higher than \(1\) will slow down the process proportionally.
models_parent_dir: Parent directory beneath which models will be downloaded. Defaults to ~/.cache/, a hidden folder in your user namespace. E.g. the default storage location for the AdaBins model is ~/.cache/adabins/AdaBins_nyu.pt
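The divisibility rule for gradient_accumulation_steps can be checked before starting a run. The sketch below is a hypothetical validation helper, not PyTTI's own code; it returns how many cutouts would be processed in each batch.

```python
def cutouts_per_batch(cutouts, gradient_accumulation_steps=1):
    """Return the number of cutouts in each batch, or raise if
    gradient_accumulation_steps does not divide cutouts evenly."""
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be at least 1")
    if cutouts % gradient_accumulation_steps != 0:
        raise ValueError(
            "gradient_accumulation_steps must divide cutouts evenly"
        )
    return cutouts // gradient_accumulation_steps


batch_size = cutouts_per_batch(cutouts=40, gradient_accumulation_steps=4)
```

With 40 cutouts and 4 accumulation steps, each batch holds 10 cutouts, which lowers peak VRAM use at the cost of roughly four times the processing time per step.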
