# Settings
## Prompt Controls
- `scenes`
  Descriptions of scenes you want generated, separated by `||`. Each scene can contain multiple prompts, separated by `|`. See Scene Syntax for details on scene specification syntax and usage examples.
- `scene_prefix`
  Prompts prepended to the beginning of each scene.
- `scene_suffix`
  Prompts appended to the end of each scene.
- `interpolation_steps`
  Number of steps spent smoothly transitioning from the previous scene at the start of each scene. \(200\) is a good default. Set to \(0\) to disable. Transitions are performed by linearly interpolating between the prompts of the two scenes in semantic (CLIP) space.
- `steps_per_scene`
  Total number of steps to spend rendering each scene. Should be at least `interpolation_steps`. Along with `save_every`, this controls the total length of an animation.
- `direct_image_prompts`
  Paths or URLs of images that you want your image to look like in a literal sense, along with `weight_mask` and `stop` values, separated by `|`. Apply masks to direct image prompts with `path or url of image:weight_path or url of mask`. For video masks, the mask must be a path to an mp4 file.
- `init_image`
  Path or URL to an image that will be used to seed the initialization of the image generation process. Useful for creating a central focus or imposing a particular layout on the generated images. If not provided, random noise will be used instead.
- `direct_init_weight`
  Defaults to \(0\). Use the initial image as a direct image prompt. Equivalent to adding `init_image:direct_init_weight` as a `direct_image_prompt`. Supports weights, masks, and stops.
- `semantic_init_weight`
  Defaults to \(0\). Use the initial image as a semantic image prompt. Equivalent to adding `[init_image]:semantic_init_weight` as a prompt to each scene in `scenes`. Supports weights, masks, and stops.
  Important: since this is a semantic prompt, you still need to put the mask in `[ ]` to denote it as a path or URL; otherwise it will be read as text instead of a file.
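For orientation, here is a minimal sketch of how these prompt controls might fit together in a config file. All file names, prompt text, and weights below are illustrative placeholders rather than values taken from this documentation:

```yaml
scenes: "a meadow at dawn | wildflowers:3_[flower_mask.png] || the same meadow at night"
scene_prefix: "detailed watercolor of "
scene_suffix: " with soft lighting"
interpolation_steps: 200
steps_per_scene: 600
direct_image_prompts: "layout.png:2_layout_mask.png"
init_image: "seed_image.png"
direct_init_weight: 0.5
semantic_init_weight: 0.5
```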
## Image Representation Controls
- `width`, `height`
  Image size. Set one of these to \(-1\) to derive it from the aspect ratio of the init image.
- `pixel_size`
  Integer image scale factor. Makes the image bigger. Set to \(1\) for VQGAN, or you will face VRAM issues.
- `smoothing_weight`
  Makes the image smoother. Defaults to \(0\) (no smoothing). Can also be negative for that deep fried look.
- `image_model`
  Select how your image will be represented. Supported image models are:
  - Limited Palette: use CLIP to optimize image pixels directly, constrained to a fixed number of colors. Generally used for pixel art.
  - Unlimited Palette: use CLIP to optimize image pixels directly.
  - VQGAN: use CLIP to optimize a VQGAN's latent representation of an image.
- `vqgan_model`
  Select which VQGAN model to use. (only considered for `image_model: VQGAN`)
- `random_initial_palette`
  If checked, palettes will start out with random colors. Otherwise they will start out as grayscale. (only for `image_model: Limited Palette`)
- `palette_size`
  Number of colors in each palette. (only for `image_model: Limited Palette`)
- `palettes`
  Total number of palettes. The image will have `palette_size*palettes` colors total. (only for `image_model: Limited Palette`)
- `gamma`
  Relative gamma value. Higher values make the image darker and higher contrast; lower values make the image lighter and lower contrast. \(1\) is a good default. (only for `image_model: Limited Palette`)
- `hdr_weight`
  How strongly the optimizer will maintain the `gamma`. Set to \(0\) to disable. (only for `image_model: Limited Palette`)
- `palette_normalization_weight`
  How strongly the optimizer will maintain the palettes' presence in the image. Prevents the image from losing palettes. (only for `image_model: Limited Palette`)
- `show_palette`
  Display a palette sample each time the image is displayed. (only for `image_model: Limited Palette`)
- `target_pallete`
  Path or URL of an image which the model will use to build the palette it uses.
- `lock_pallete`
  Force the model to use the initial palette (most useful when restoring from a backup; otherwise it will force a grayscale image or a wonky palette).
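As an illustration, a Limited Palette setup might be configured along these lines. Every value below is a placeholder chosen for the example, not a recommendation:

```yaml
image_model: "Limited Palette"
width: 180
height: 112
pixel_size: 4
random_initial_palette: false
palette_size: 6
palettes: 9
gamma: 1
hdr_weight: 0.01
palette_normalization_weight: 0.2
show_palette: false
lock_pallete: false
```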
## Animation Controls
- `animation_mode`
  Select animation mode or disable animation. Supported animation modes are:
  - `off`
  - `2D`
  - `3D`
  - `Video Source`
- `sampling_mode`
  How pixels are sampled during animation. `nearest` will keep the image sharp, but may look bad. `bilinear` will smooth the image out, and `bicubic` is untested :)
- `infill_mode`
  Select how new pixels should be filled if they come in from the edge.
  - `mirror`: reflect image over boundary
  - `wrap`: pull pixels from opposite side
  - `black`: fill with black
  - `smear`: sample closest pixel in image
- `pre_animation_steps`
  Number of steps to run before animation starts, to begin with a stable image. \(250\) is a good default.
- `steps_per_frame`
  Number of steps between each image move. \(50\) is a good default.
- `frames_per_second`
  Number of frames to render each second. Controls how \(t\) is scaled.
- `direct_stabilization_weight`
  Keeps the current frame as a direct image prompt. For `Video Source` this will use the current frame of the video as a direct image prompt; for `2D` and `3D` it will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
- `semantic_stabilization_weight`
  Keeps the current frame as a semantic image prompt. For `Video Source` this will use the current frame of the video; for `2D` and `3D` it will use the shifted version of the previous frame. Also supports masks: `weight_[mask.mp4]` or `weight_mask phrase`.
- `depth_stabilization_weight`
  Keeps the depth model output somewhat consistent, at a VERY steep performance cost. For `Video Source` this will use the current frame of the video; for `2D` and `3D` it will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
- `edge_stabilization_weight`
  Keeps the image's contours somewhat consistent at very little performance cost. For `Video Source` this will use the current frame of the video as a direct image prompt with a Sobel filter; for `2D` and `3D` it will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
- `flow_stabilization_weight`
  Used for `animation_mode: 3D` and `Video Source` to prevent flickering. Comes with a slight performance cost for `Video Source`, and a great one for `3D`, due to implementation differences. Also supports masks: `weight_mask.mp4`. For video source, the mask should select the part of the frame you want to move, and the rest will be treated as a still background.
- `video_path`
  Path to an mp4 file for `Video Source`.
- `frame_stride`
  Advance this many frames in the video for each output frame. This is surprisingly useful. Set to \(1\) to render each frame. Video masks will also step at this rate.
- `reencode_each_frame`
  Use each video frame as an `init_image` instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.
- `flow_long_term_samples`
  Sample multiple frames into the past for consistent interpolation even with disocclusion, as described by Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox (2016). Each sample is twice as far back in the past as the last, so the earliest sampled frame is \(2^{\text{flow_long_term_samples}}\) frames in the past. Set to \(0\) to disable.
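By way of example, a Video Source animation might be configured roughly as follows. The paths, weights, and counts are placeholders, not recommended settings:

```yaml
animation_mode: "Video Source"
video_path: "/path/to/input.mp4"
frames_per_second: 15
frame_stride: 1
pre_animation_steps: 250
steps_per_frame: 50
sampling_mode: "bilinear"
infill_mode: "smear"
direct_stabilization_weight: "0.3"
edge_stabilization_weight: "1"
flow_stabilization_weight: "1_foreground_mask.mp4"
flow_long_term_samples: 2
reencode_each_frame: false
```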
## Motion Controls
- `translate_x`
  Horizontal image motion as a function of time \(t\) in seconds.
- `translate_y`
  Vertical image motion as a function of time \(t\) in seconds.
- `translate_z_3d`
  Forward image motion as a function of time \(t\) in seconds. (only for `animation_mode: 3D`)
- `rotate_3d`
  Image rotation as a quaternion \(\left[r,x,y,z\right]\) as a function of time \(t\) in seconds. (only for `animation_mode: 3D`)
- `rotate_2d`
  Image rotation in degrees as a function of time \(t\) in seconds. (only for `animation_mode: 2D`)
- `zoom_x_2d`
  Horizontal image zoom as a function of time \(t\) in seconds. (only for `animation_mode: 2D`)
- `zoom_y_2d`
  Vertical image zoom as a function of time \(t\) in seconds. (only for `animation_mode: 2D`)
- `lock_camera`
  Prevents scrolling or drifting. Makes for more stable 3D rotations. (only for `animation_mode: 3D`)
- `field_of_view`
  Vertical field of view in degrees. (only for `animation_mode: 3D`)
- `near_plane`
  Closest depth distance in pixels. (only for `animation_mode: 3D`)
- `far_plane`
  Farthest depth distance in pixels. (only for `animation_mode: 3D`)
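For instance, a simple 3D camera move could be expressed as functions of \(t\) like this. The expressions and plane distances are arbitrary placeholders:

```yaml
animation_mode: "3D"
translate_x: "-10*sin(t/4)"
translate_y: "0"
translate_z_3d: "50"
rotate_3d: "[1, 0, -0.0005*t, 0]"
lock_camera: true
field_of_view: 60
near_plane: 1
far_plane: 10000
```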
## Audio Reactivity Controls
Experimental Feature: As of 2022-04-24, this section describes features that are available on the `test` branch but have not yet been merged into the main release.
- `input_audio`
  Path to the audio file.
- `input_audio_offset`
  Timestamp (in seconds) where pytti should start reading audio. Defaults to `0`.
- `input_audio_filters`
  List of specifications for individual Butterworth bandpass filters.
### Bandpass filter specification
For technical details on how these filters work, see: Butterworth Bandpass Filters
- `variable_name`
  The variable name through which the value of the filter will be referenced in the `weight` expression of the prompt. Subject to the rules of Python variable naming.
- `f_center`
  The target frequency of the bandpass filter.
- `f_width`
  The range of frequencies about the central frequency to which the filter will be responsive.
- `order`
  The slope of the frequency response. Default is 5. The higher the "order" of the filter, the more closely the frequency response will resemble a square/step function. Decreasing the order will make the filter more permissive of frequencies outside of the range strictly specified by the center and width above. See https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function for details.
Example: Audio reactivity specification

```yaml
scenes: "
  a photograph of a beautiful spring day:2 |
  flowers blooming: 10*fHi |
  colorful sparks: (fHi+fLo) |
  sun rays: fHi |
  forest: fLo |
  ominous: fLo/(fLo + fHi) |
  hopeful: fHi/(fLo + fHi) |
  "
input_audio: '/path/to/audio/source.mp3'
input_audio_offset: 0
input_audio_filters:
  - variable_name: fLo
    f_center: 105
    f_width: 65
    order: 5
  - variable_name: fHi
    f_center: 900
    f_width: 600
    order: 5
frames_per_second: 30
```
This would create two filters named `fLo` and `fHi`, which could then be referenced in the scene specification DSL to tie prompt weights to properties of the input audio at the appropriate timestamp per the specified FPS.
## Output Controls
- `file_namespace`
  Output directory name.
- `allow_overwrite`
  Check to overwrite existing files in `file_namespace`.
- `display_every`
  How many steps between each time the image is displayed in the notebook.
- `clear_every`
  How many steps between each time the notebook console is cleared.
- `display_scale`
  Image display scale in the notebook. \(1\) will show the image at full size. Does not affect saved images.
- `save_every`
  How many steps between each time the image is saved. Set to `steps_per_frame` for consistent animation.
- `backups`
  Number of backups to keep (only the oldest backups are deleted). Large images make very large backups, so be warned. Set to `all` to save all backups. These are used for `flow_long_term_samples`, so be sure that this is at least \(2^{\text{flow_long_term_samples}}+1\) for `Video Source` mode.
- `show_graphs`
  Display graphs of the loss values each time the image is displayed. Disable this for local runtimes.
- `approximate_vram_usage`
  Currently broken. Don't believe its lies.
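A typical output block for an animation might look roughly like the following; the directory name and counts are placeholders:

```yaml
file_namespace: "my_first_animation"
allow_overwrite: false
display_every: 50
clear_every: 0
display_scale: 1
save_every: 50   # match steps_per_frame for consistent animation
backups: 5       # at least 2^flow_long_term_samples + 1 for Video Source
show_graphs: false
```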
## Perceptor Settings
- `ViTB32`, `ViTB16`, `RN50`, `RN50x4`…
  Select which CLIP models to use for semantic perception. Multiple models may be selected. Each model requires significant VRAM.
- `learning_rate`
  How quickly the image changes.
- `reset_lr_each_frame`
  The optimizer will adaptively change the learning rate, so this will thwart it.
- `seed`
  Pseudorandom seed. Using a fixed seed will make your process more deterministic, which can be useful for comparing how changing specific settings impacts the generated images.
- `cutouts`
  The number of cutouts from the image that will be scored by the perceiver. Think of each cutout as a "glimpse" at the image. The more glimpses you give the perceptor, the better it will understand what it is looking at. Reduce this to use less VRAM, at the cost of quality and speed.
- `cut_pow`
  Should be positive. Large values shrink cutouts, making the image more detailed; small values expand the cutouts, making it more coherent. \(1\) is a good default. \(3\) or higher can cause crashes.
- `cutout_border`
  Should be between \(0\) and \(1\). Allows cutouts to poke out over the edges of the image by this fraction of the image size, allowing better detail around the edges of the image. Set to \(0\) to disable. \(0.25\) is a good default.
- `border_mode`
  How to fill cutouts that stick out over the edge of the image. Match with `infill_mode` for consistent infill.
  - `clamp`: move cutouts back onto image
  - `mirror`: reflect image over boundary
  - `wrap`: pull pixels from opposite side
  - `black`: fill with black
  - `smear`: sample closest pixel in image
- `gradient_accumulation_steps`
  How many batches to use to process cutouts. Must divide `cutouts` evenly; defaults to \(1\). If you are using high cutouts and receiving VRAM errors, increasing `gradient_accumulation_steps` may permit you to generate images without reducing the cutouts setting. Setting this higher than \(1\) will slow the process down proportionally.
- `models_parent_dir`
  Parent directory beneath which models will be downloaded. Defaults to `~/.cache/`, a hidden folder in your user namespace. E.g. the default storage location for the AdaBins model is `~/.cache/adabins/AdaBins_nyu.pt`.
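To tie these together, a perceptor block might be sketched as follows. The model selection and cutout counts are illustrative; reduce `cutouts` or raise `gradient_accumulation_steps` if you run into VRAM limits:

```yaml
ViTB32: true
ViTB16: false
RN50: false
RN50x4: false
seed: 12345
cutouts: 40
cut_pow: 1
cutout_border: 0.25
border_mode: "clamp"
gradient_accumulation_steps: 1
models_parent_dir: "~/.cache"
```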