
PixelDiT 1.3B: Diffusers-Compatible Pipeline

Two RTX 3060s. Infinite Lore. Zero Fear.

Unofficial Hugging Face diffusers-compatible conversion of NVIDIA's PixelDiT-1300M-1024px, with dual text encoder support (Gemma-2-2B + Qwen3-2B), LoRA training, and ComfyUI integration.

All credit for the model architecture and weights goes to NVIDIA Research. This repo provides the pipeline wrapper, Qwen encoder integration, LoRA tooling, and scripts.

I do not own this model. Original weights, architecture, and training are the work of NVIDIA Research. For non-commercial use only (NSCLv1).


Gallery: IP-Adapter style transfer (SigLIP only, no text prompt)

All generated with madtune/pixeldit-controlnet (IP-Adapter only, zero text conditioning).


What is PixelDiT?

PixelDiT is a 1.3B parameter pixel-space diffusion transformer: no VAE, generates images directly in pixel space. Runs on 4GB VRAM.

  • Architecture: MMDiT patch blocks + pixel pathway (PiT blocks)
  • Text encoders: Gemma-2-2B (photorealistic) or Qwen3-2B (creative/fantasy)
  • Native resolution: 1024×1024 (non-square supported)
  • Samplers: Euler (default), Heun, LCM
  • Minimum steps: 45–50; below 45 produces garbage output (the LCM fast mode shown in Quick Start is the exception, at 8 steps)
  • LoRA: full PEFT-compatible LoRA training + inference
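
As a rough sanity check on the 4GB figure, the weight footprint of the transformer alone can be estimated from the parameter count and dtype. This is a back-of-the-envelope sketch: it assumes exactly 1.3e9 parameters, ignores activations, the text encoder, and framework overhead, and the helper name `weight_gib` is illustrative rather than part of this repo.

```python
# Rough weight-memory estimate for the transformer only.
# Assumption: 1.3e9 parameters; activations and the text encoder are not counted.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the size of the weights in GiB for a parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_gib(1.3e9, dtype):.2f} GiB")
```

In bfloat16 the weights come to roughly 2.4 GiB, which leaves some headroom on a 4GB card only if the text encoder is not resident on the GPU at the same time.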

Install

python3 -m venv .venv && source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install "diffusers>=0.31.0" "transformers>=4.40.0,<5.0.0" accelerate safetensors pillow peft
git clone https://github.com/madtunebk/pixeldit-diffusers
cd pixeldit-diffusers
python scripts/setup_diffusers_pixeldit.py

Quick Start

# Gemma encoder (photorealistic, default)
python generate.py --prompt "a viking warrior on a cliff at sunset, cinematic"

# Portrait mode
python generate.py --height 1280 --width 768 --steps 60 --cfg 8.5 --prompt "your prompt"

# LCM fast mode (8 steps)
python generate.py --scheduler lcm --steps 8 --cfg 2.0 --prompt "your prompt"
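
For scripted runs, the commands above could be assembled programmatically. The following is a minimal sketch that uses only the flags shown in Quick Start (`--prompt`, `--height`, `--width`, `--steps`, `--cfg`, `--scheduler`). The function name `build_generate_cmd` and its default values are assumptions for illustration; the actual defaults of generate.py are not documented here, so every value is passed explicitly.

```python
import shlex
import sys


def build_generate_cmd(prompt, steps=50, cfg=7.5, height=1024, width=1024, scheduler=None):
    """Build an argv list for generate.py using only the flags shown in Quick Start."""
    # Per the model notes, sampling below 45 steps gives unusable output,
    # except in the LCM fast mode.
    if scheduler != "lcm" and steps < 45:
        raise ValueError("use at least 45 steps unless --scheduler lcm is selected")
    cmd = [
        sys.executable, "generate.py",
        "--height", str(height), "--width", str(width),
        "--steps", str(steps), "--cfg", str(cfg),
    ]
    if scheduler is not None:
        cmd += ["--scheduler", scheduler]
    cmd += ["--prompt", prompt]
    return cmd


if __name__ == "__main__":
    portrait = build_generate_cmd("your prompt", steps=60, cfg=8.5, height=1280, width=768)
    print(shlex.join(portrait))
    # To run it from the repo root: subprocess.run(portrait, check=True)
```

Returning a list rather than a shell string avoids quoting problems when prompts contain commas or quotes.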

Python API

import torch
# PixelDiTPipeline is importable after running scripts/setup_diffusers_pixeldit.py
from diffusers import PixelDiTPipeline

# Load the converted pipeline in bfloat16
pipe = PixelDiTPipeline.from_pretrained("madtune/pixeldit-diffusers", torch_dtype=torch.bfloat16)
# Move each sub-model to the GPU only while it is running, to lower peak VRAM
pipe.enable_model_cpu_offload()

image = pipe(
    "a viking warrior on a cliff overlooking the stormy sea at sunset",
    negative_prompt="blurry, low quality, deformed, watermark",
    height=1024, width=1024,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]

image.save("out.jpg")
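
To render several prompts with one loaded pipeline, the call above can be wrapped in a small loop. This is a sketch, not part of the repo: `generate_batch` and the `outputs` directory are hypothetical names, and it assumes only that `pipe` is callable with a prompt plus keyword arguments and returns an object with an `.images` list, as in the example above.

```python
from pathlib import Path


def generate_batch(pipe, prompts, out_dir="outputs", **call_kwargs):
    """Run `pipe` once per prompt and save each first image as a numbered JPEG."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, prompt in enumerate(prompts):
        # Same call shape as the single-image example: prompt plus keyword options.
        image = pipe(prompt, **call_kwargs).images[0]
        path = out / f"{i:03d}.jpg"
        image.save(path)
        saved.append(path)
    return saved
```

With the pipeline loaded as above, a call might look like `generate_batch(pipe, ["first prompt", "second prompt"], num_inference_steps=50, guidance_scale=7.5)`.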

ComfyUI

ln -s /path/to/pixeldit-diffusers/comfyui_pixeldit /path/to/ComfyUI/custom_nodes/comfyui_pixeldit

Three nodes are available under the PixelDiT category:

  • PixelDiT Text Encoder: loads Gemma or any compatible encoder
  • PixelDiT Model Loader: loads the transformer from HF
  • PixelDiT Sampler: prompt → image, all params exposed
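
For scripted setups, the `ln -s` step above could also be done from Python. A minimal sketch, assuming the repo contains a `comfyui_pixeldit` folder as in the command above; `link_custom_node` is a hypothetical helper, and both paths are placeholders to be supplied by the user.

```python
from pathlib import Path


def link_custom_node(repo_dir, comfyui_dir, name="comfyui_pixeldit"):
    """Symlink <repo_dir>/<name> into <comfyui_dir>/custom_nodes, mirroring ln -s."""
    source = Path(repo_dir).expanduser().resolve() / name
    if not source.is_dir():
        raise FileNotFoundError(f"node folder not found: {source}")
    target = Path(comfyui_dir).expanduser().resolve() / "custom_nodes" / name
    if target.is_symlink() or target.exists():
        return target  # something is already there; leave it untouched
    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source, target_is_directory=True)
    return target
```

Example: `link_custom_node("/path/to/pixeldit-diffusers", "/path/to/ComfyUI")`. On Windows, creating symlinks may require elevated privileges or developer mode.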

Scripts

| Script | Purpose |
| --- | --- |
| generate.py | Main generation script |
| scripts/upscale_images.py | RealESRGAN 4× upscale before LoRA precompute |
| scripts/setup_diffusers_pixeldit.py | Install pipeline into active venv's diffusers |

Credits

  • Original model & all credit: NVIDIA Research
  • Paper: PixelDiT: Pixel-Space Diffusion Transformers for Text-to-Image Generation (NVIDIA)
  • This repo: unofficial diffusers conversion, Qwen integration, LoRA tooling only