AutoencoderKLLTX2Audio
The 3D variational autoencoder (VAE) model with KL loss used in LTX-2 was introduced by Lightricks. This variant encodes and decodes audio latent representations.
The model can be loaded with the following code snippet.
import torch
from diffusers import AutoencoderKLLTX2Audio

vae = AutoencoderKLLTX2Audio.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda")

AutoencoderKLLTX2Audio
class diffusers.AutoencoderKLLTX2Audio(
    base_channels: int = 128,
    output_channels: int = 2,
    ch_mult: tuple = (1, 2, 4),
    num_res_blocks: int = 2,
    attn_resolutions: tuple[int, ...] | None = None,
    in_channels: int = 2,
    resolution: int = 256,
    latent_channels: int = 8,
    norm_type: str = 'pixel',
    causality_axis: str | None = 'height',
    dropout: float = 0.0,
    mid_block_add_attention: bool = False,
    sample_rate: int = 16000,
    mel_hop_length: int = 160,
    is_causal: bool = True,
    mel_bins: int | None = 64,
    double_z: bool = True,
)
LTX2 audio VAE for encoding and decoding audio latent representations.
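The default sample_rate and mel_hop_length determine how densely mel frames are spaced in time. The sketch below works that out. It assumes mel_hop_length is measured in audio samples, which is the usual spectrogram convention but is not stated on this page. The helper name mel_frame_timing is hypothetical and is not part of diffusers.

```python
def mel_frame_timing(sample_rate: int = 16000, mel_hop_length: int = 160) -> tuple[float, float]:
    """Return (frames_per_second, milliseconds_per_frame).

    Assumption: mel_hop_length counts audio samples between consecutive frames.
    """
    if sample_rate <= 0 or mel_hop_length <= 0:
        raise ValueError("sample_rate and mel_hop_length must be positive")
    frames_per_second = sample_rate / mel_hop_length
    ms_per_frame = 1000.0 * mel_hop_length / sample_rate
    return frames_per_second, ms_per_frame


fps, ms = mel_frame_timing()
print(fps, ms)  # 100.0 10.0
```

Under that assumption, the defaults give one mel frame every 10 ms, or 100 frames per second of audio.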
forward(
    sample: torch.Tensor,
    sample_posterior: bool = False,
    return_dict: bool = True,
    generator: torch.Generator | None = None,
)
Parameters
- sample (torch.Tensor): Input sample.
- sample_posterior (bool, optional, defaults to False): Whether to sample from the posterior.
- return_dict (bool, optional, defaults to True): Whether or not to return a DecoderOutput instead of a plain tuple.
- generator (torch.Generator, optional): A torch.Generator to make sampling deterministic.
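The framework-free sketch below illustrates how sample_posterior and generator could interact. It assumes this class follows the usual KL-autoencoder convention: the encoder yields a diagonal Gaussian posterior, the mode (the mean) is used when sample_posterior is False, and a random draw is taken otherwise. This page does not confirm that convention. The names pick_latent and rng are hypothetical, with rng standing in for the generator argument, and the code does not call diffusers.

```python
import math
import random


def pick_latent(mean, logvar, sample_posterior=False, rng=None):
    """Choose a latent from a diagonal Gaussian posterior N(mean, exp(logvar)).

    `rng` is a hypothetical stand-in for the `generator` argument: a seeded
    source of randomness that makes the draw reproducible.
    """
    if not sample_posterior:
        # The mode of a Gaussian is its mean, so no randomness is involved.
        return list(mean)
    rng = rng if rng is not None else random.Random()
    # Reparameterized draw: mean + std * eps, with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, logvar)]


mean, logvar = [0.5, -1.0], [0.0, -2.0]
mode = pick_latent(mean, logvar)
draw_a = pick_latent(mean, logvar, sample_posterior=True, rng=random.Random(0))
draw_b = pick_latent(mean, logvar, sample_posterior=True, rng=random.Random(0))
print(mode == mean, draw_a == draw_b)  # True True
```

Two draws seeded identically match, which is the sense in which a seeded generator makes sampling deterministic. With sample_posterior left at False, no random numbers are consumed at all.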