AutoencoderKLLTX2Audio

The 3D variational autoencoder (VAE) model with KL loss used in LTX-2 was introduced by Lightricks. This model encodes and decodes audio latent representations.

The model can be loaded with the following code snippet.

import torch
from diffusers import AutoencoderKLLTX2Audio

vae = AutoencoderKLLTX2Audio.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda")

AutoencoderKLLTX2Audio

class diffusers.AutoencoderKLLTX2Audio

( base_channels: int = 128, output_channels: int = 2, ch_mult: tuple = (1, 2, 4), num_res_blocks: int = 2, attn_resolutions: tuple[int, ...] | None = None, in_channels: int = 2, resolution: int = 256, latent_channels: int = 8, norm_type: str = 'pixel', causality_axis: str | None = 'height', dropout: float = 0.0, mid_block_add_attention: bool = False, sample_rate: int = 16000, mel_hop_length: int = 160, is_causal: bool = True, mel_bins: int | None = 64, double_z: bool = True )

LTX2 audio VAE for encoding and decoding audio latent representations.
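The default configuration implies a mel frame rate of sample_rate / mel_hop_length = 16000 / 160 = 100 frames per second. The sketch below works through that arithmetic and estimates the latent grid size under an assumption not confirmed by this page: that the encoder halves the time and mel axes once per ch_mult level after the first, as LDM-style VAE encoders conventionally do. The helpers `mel_frames` and `latent_grid` and the 5-second clip length are hypothetical, for illustration only.

```python
import math

# Default constructor values from the signature above.
SAMPLE_RATE = 16000      # sample_rate
MEL_HOP_LENGTH = 160     # mel_hop_length
MEL_BINS = 64            # mel_bins
CH_MULT = (1, 2, 4)      # ch_mult
LATENT_CHANNELS = 8      # latent_channels


def mel_frames(duration_s: float, sample_rate: int = SAMPLE_RATE, hop: int = MEL_HOP_LENGTH) -> int:
    """Hypothetical helper: approximate mel frame count for a clip.

    Ignores STFT edge padding, which can add a frame or so.
    """
    return math.ceil(duration_s * sample_rate / hop)


def latent_grid(frames: int, mel_bins: int = MEL_BINS, ch_mult: tuple = CH_MULT) -> tuple:
    """Hypothetical helper: latent (time, mel) size.

    ASSUMPTION: one 2x downsample per ch_mult level after the first.
    The real encoder may differ, e.g. because of causal padding in time.
    """
    factor = 2 ** (len(ch_mult) - 1)
    return math.ceil(frames / factor), mel_bins // factor


frames_per_second = SAMPLE_RATE / MEL_HOP_LENGTH
frames = mel_frames(5.0)
t, f = latent_grid(frames)
print(frames_per_second)          # 100.0
print(frames)                     # 500
print((LATENT_CHANNELS, t, f))    # (8, 125, 16)
```

Under these assumptions, each latent time step covers about 40 ms of audio (4 mel frames at 10 ms each).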

encode

( x: Tensor, return_dict: bool = True )

decode

( z: Tensor, return_dict: bool = True )

forward

( sample: Tensor, sample_posterior: bool = False, return_dict: bool = True, generator: typing.Optional[torch.Generator] = None ) → DecoderOutput or tuple

Parameters

sample (torch.Tensor) — Input sample.
sample_posterior (bool, optional, defaults to False) — Whether to sample from the posterior.
return_dict (bool, optional, defaults to True) — Whether or not to return a DecoderOutput instead of a plain tuple.
generator (torch.Generator, optional) — A torch.Generator to make sampling deterministic.

Returns

DecoderOutput or tuple

If return_dict is True, a DecoderOutput is returned, otherwise a plain tuple is returned.
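In a KL autoencoder with double_z=True, the encoder conventionally predicts 2 * latent_channels channels, read as the mean and log-variance of a diagonal Gaussian posterior. The sample_posterior flag then chooses between a reparameterized sample (mean + std * noise, drawn with the optional generator) and the mode (the mean). The sketch below shows that convention in plain PyTorch on CPU. It is an illustration under that assumption, not this class's internal code; `DiagonalGaussianSketch` and the (1, 16, 125, 16) moment shape are hypothetical.

```python
from typing import Optional

import torch


class DiagonalGaussianSketch:
    """Hypothetical stand-in for a KL-VAE posterior over latents (CPU only)."""

    def __init__(self, moments: torch.Tensor):
        # ASSUMPTION (double_z): first half of the channels is the mean,
        # second half the log-variance.
        self.mean, logvar = torch.chunk(moments, 2, dim=1)
        self.logvar = torch.clamp(logvar, -30.0, 20.0)
        self.std = torch.exp(0.5 * self.logvar)

    def sample(self, generator: Optional[torch.Generator] = None) -> torch.Tensor:
        # Reparameterization: z = mean + std * eps, with eps ~ N(0, I).
        eps = torch.randn(self.mean.shape, generator=generator, dtype=self.mean.dtype)
        return self.mean + self.std * eps

    def mode(self) -> torch.Tensor:
        # The mode of a Gaussian is its mean (used when not sampling).
        return self.mean

    def kl(self) -> torch.Tensor:
        # KL(q || N(0, I)) summed over channel, time, and mel dimensions.
        return 0.5 * torch.sum(
            self.mean.pow(2) + self.logvar.exp() - 1.0 - self.logvar, dim=[1, 2, 3]
        )


latent_channels = 8
moments = torch.randn(1, 2 * latent_channels, 125, 16)  # mean + logvar channels
posterior = DiagonalGaussianSketch(moments)

z_a = posterior.sample(generator=torch.Generator().manual_seed(0))
z_b = posterior.sample(generator=torch.Generator().manual_seed(0))
print(z_a.shape)                                      # torch.Size([1, 8, 125, 16])
print(torch.equal(z_a, z_b))                          # True: same seed, same sample
print(torch.equal(posterior.mode(), posterior.mean))  # True
```

Seeding the generator identically reproduces the same latent sample, which matches the documented purpose of the generator parameter: making sampling deterministic.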
