# SDXL T2I Adapter - Brightness Control (100k @ 1024×1024)

A T2I Adapter model trained on Stable Diffusion XL to control image generation through brightness/grayscale information. This model is trained at native SDXL resolution (1024×1024) on 100,000 samples, providing a lightweight alternative to ControlNet with superior efficiency and excellent pattern preservation.
## Model Description

This T2I Adapter enables brightness-based conditioning for SDXL image generation. By providing a grayscale image as input, you can control the brightness distribution and lighting structure while maintaining creative freedom through text prompts.
**Key Features:**
- Excellent brightness and pattern control at high conditioning scales (1.5-2.5)
- Native SDXL resolution: trained at 1024×1024 (not upscaled from 512×512)
- 15x smaller than ControlNet: ~300MB vs ~4.7GB
- 50% faster inference: ~12 it/s vs ~8 it/s at 1024×1024
- Superior pattern preservation: outperforms ControlNet at scales 1.5-2.5
- Compatible with standard SDXL pipelines
**Intended Uses:**
- Artistic QR code generation (scale 1.5-2.0 recommended)
- Image recoloring and colorization
- Lighting control in text-to-image generation
- Brightness-based pattern integration
- Watermark and subtle pattern embedding
- Photo enhancement and stylization
## Checkpoint Progression Analysis & Key Findings
We tested all checkpoints at multiple conditioning scales (0.25, 0.5, 0.75, 1.0, 1.25, 1.5) to understand the training progression. Here are the key findings:
### Visual Comparisons

- **Scale 0.25** - weak control, maximum artistic freedom
- **Scale 0.5** - subtle control with strong prompt adherence
- **Scale 0.75** - moderate control, balanced
- **Scale 1.0** - standard control
- **Scale 1.25** - strong control
- **Scale 1.5** - maximum pattern preservation
### Key Observations

**Early checkpoints (25k-50k samples):**
- Best balance between pattern preservation and artistic prompt adherence
- Ideal for artistic QR codes where you want structure but also creativity
- Recommended scales: 0.75-1.25

**Mid checkpoint (75k samples):**
- Stronger pattern control than early checkpoints
- Still maintains good prompt adherence
- Recommended scales: 1.0-1.5

**Final model (100k samples):**
- Maximum pattern preservation capability
- At high scales (1.5+), pattern becomes very dominant
- Best for cases requiring precise brightness control
- May reduce artistic interpretation of prompts at very high scales
### ⚠️ Overfitting Observation
The checkpoint progression reveals clear signs of overfitting as training progresses. The final model (100k samples) shows excessive pattern dominance at high scales, losing the artistic balance present in earlier checkpoints. This suggests:
- Training should use fewer samples (25k-50k range optimal)
- Early stopping around checkpoint-782 (50k samples) provides the best balance
- Single epoch training on 100k samples is too much for this task
- Future training should target 25k-50k samples for better generalization
For most use cases, checkpoint-391 or checkpoint-782 are recommended over the final model, as they maintain artistic prompt adherence while still providing good pattern control.
### Choosing the Right Checkpoint

- **For artistic QR codes:** use `checkpoint-391` or `checkpoint-782` at scale 1.0-1.5
- **For maximum pattern control:** use the final model at scale 1.5-2.0
- **For balanced results:** use `checkpoint-782` or `checkpoint-1173` at scale 0.75-1.25
- **For subtle effects:** use any checkpoint at scale 0.25-0.5
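The guidance above can be folded into a small lookup helper. The sketch below simply encodes the bullet points; the use-case names and the `recommend` function are illustrative, not part of diffusers or this repository:

```python
# Sketch: encode the checkpoint/scale guidance above as a lookup table.
# Keys and values mirror the bullets; nothing here is part of any API.
CHECKPOINT_GUIDE = {
    "artistic_qr": {"checkpoints": ["checkpoint-391", "checkpoint-782"], "scale": (1.0, 1.5)},
    "max_pattern": {"checkpoints": ["final"], "scale": (1.5, 2.0)},
    "balanced": {"checkpoints": ["checkpoint-782", "checkpoint-1173"], "scale": (0.75, 1.25)},
    "subtle": {"checkpoints": ["any"], "scale": (0.25, 0.5)},
}

def recommend(use_case: str):
    """Return (checkpoint list, (min_scale, max_scale)) for a use case."""
    entry = CHECKPOINT_GUIDE[use_case]
    return entry["checkpoints"], entry["scale"]
```

For example, `recommend("artistic_qr")` returns the early checkpoints together with the 1.0-1.5 scale range.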
## Training Details

### Training Data

Trained on 100,000 samples from latentcat/grayscale_image_aesthetic_3M:
- High-quality aesthetic images
- Paired with grayscale/brightness versions
- Native resolution: 1024×1024 (SDXL native, no upscaling)
### Training Configuration
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Architecture | T2I Adapter Full XL (~77M parameters) |
| Training Resolution | 1024×1024 (native SDXL) |
| Training Steps | 1,563 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective batch: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA H100 80GB |
| Training Time | ~3 hours |
| Optimizer | 8-bit Adam |
| Final Loss | ~0.025 |
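The step count in the table follows directly from the dataset size and batch settings; a quick sanity check (pure arithmetic, no training code):

```python
import math

samples = 100_000
per_device_batch = 8
grad_accum = 8

# Effective batch and optimizer steps per epoch, matching the table above
effective_batch = per_device_batch * grad_accum          # 64
steps_per_epoch = math.ceil(samples / effective_batch)   # 1563 (100,000 / 64 = 1562.5, last step partial)

print(effective_batch, steps_per_epoch)  # → 64 1563
```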
### Model Size Comparison

| Model | Parameters | Size | Training | Resolution |
|---|---|---|---|---|
| This T2I Adapter | ~77M | 302MB | 100k @ 1024 | 1024×1024 |
| ControlNet (SDXL) | ~700M | 4.7GB | 100k @ 512 | 512×512 |
| T2I Adapter (10k) | ~77M | 302MB | 10k @ 1024 | 1024×1024 |
## Usage

### Installation

```bash
pip install diffusers transformers accelerate torch
```

### Basic Usage
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl",
    torch_dtype=torch.float16
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()  # optional; requires xformers to be installed
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((1024, 1024))  # resize to 1024×1024

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=1.5,  # higher scales work well with this model
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
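Since the adapter conditions on brightness, any RGB photo can serve as a control image once reduced to grayscale. A minimal preprocessing sketch using standard Pillow calls; the round trip back to three channels is an assumption to match the pipeline's RGB input, and the helper name is illustrative:

```python
from PIL import Image

def to_brightness_control(path: str, size: int = 1024) -> Image.Image:
    """Convert an arbitrary image into a grayscale conditioning image."""
    img = Image.open(path).convert("L")            # collapse to single-channel luma
    img = img.resize((size, size), Image.LANCZOS)  # match the generation resolution
    return img.convert("RGB")                      # back to 3 channels for the pipeline
```

Then pass the result as the `image=` argument: `control_image = to_brightness_control("photo.jpg")`.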
### Artistic QR Code Generation
```python
import qrcode
from PIL import Image

# Generate QR code (high error correction helps the art stay scannable)
qr = qrcode.QRCode(
    version=1,
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=10,
    border=4
)
qr.add_data("https://your-url.com")
qr.make(fit=True)
qr_image = qr.make_image(fill_color="black", back_color="white")
qr_image = qr_image.resize((1024, 1024), Image.LANCZOS).convert("RGB")

# Generate artistic QR code (reuses `pipe` from the Basic Usage example above)
image = pipe(
    prompt="a beautiful garden with colorful flowers and butterflies, highly detailed, professional photography",
    negative_prompt="blurry, low quality, distorted",
    image=qr_image,
    num_inference_steps=30,
    adapter_conditioning_scale=2.0,  # strong pattern preservation
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("artistic_qr.png")
```
## Adapter Conditioning Scale Guide

The `adapter_conditioning_scale` parameter controls how strongly the brightness map influences generation.

### Recommended Scale Ranges
| Scale | Behavior | Best For |
|---|---|---|
| 0.7-1.0 | Subtle artistic integration with hints of pattern | Natural images, soft lighting control |
| 1.0-1.5 | Balanced - visible structure with artistic elements | General purpose, recoloring |
| 1.5-2.0 | Excellent pattern preservation | Artistic QR codes, watermarks |
| 2.0-2.5 | Maximum control - strong patterns with artistic overlay | Strong geometric patterns |
**Discovery:** This model, trained at native 1024×1024 resolution, shows superior brightness pattern preservation at scales 1.5-2.5 compared to ControlNet trained at 512×512.
## Performance Comparison

### vs ControlNet (Both 100k Training)
| Metric | ControlNet | This T2I Adapter | Advantage |
|---|---|---|---|
| Parameters | ~700M | ~77M | 9x smaller |
| Model Size | 4.7GB | 302MB | 15x smaller |
| Training Resolution | 512×512 | 1024×1024 | Native SDXL |
| Training Time | ~49 min | ~3 hours | ControlNet trains faster (at 512×512) |
| Inference Speed @ 1024 | ~8 it/s | ~12 it/s | 50% faster |
| Time per Image | ~4 seconds | ~2.5 seconds | 1.6x faster |
| Pattern Preservation @ Scale 2.0 | Good | Excellent | Superior |
| Memory Requirement | Higher | Lower | Better efficiency |
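The per-image times in the table are consistent with the iteration speeds at the 30 inference steps used in the usage examples; checking the arithmetic:

```python
steps = 30  # num_inference_steps used in the usage examples

controlnet_its = 8.0    # ~it/s at 1024, from the table
adapter_its = 12.0

controlnet_time = steps / controlnet_its  # 3.75 s (the table rounds to ~4 s)
adapter_time = steps / adapter_its        # 2.5 s, as listed
speedup = controlnet_time / adapter_time  # 1.5x wall clock; ~1.6x if using the rounded ~4 s figure

print(controlnet_time, adapter_time, speedup)  # → 3.75 2.5 1.5
```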
### vs T2I Adapter 10k Model
| Metric | 10k Model | This 100k Model | Improvement |
|---|---|---|---|
| Training Samples | 10,000 | 100,000 | 10x more data |
| Training Time | ~18 min | ~3 hours | Longer training |
| Final Loss | 0.0796 | 0.025 | ~3x lower |
| Pattern Quality | Good | Excellent | More consistent |
| Artistic Integration | Chaotic | Cleaner | Better control |
## Key Findings

### Major Discovery

This T2I Adapter trained at native 1024×1024 resolution demonstrates:
- Superior brightness pattern preservation compared to ControlNet at conditioning scales 1.5-2.5
- Excellent QR code structure retention at scale 2.0 while maintaining artistic elements
- Faster inference with lower memory requirements (12 it/s vs 8 it/s on H100)
- Native SDXL resolution avoids upscaling artifacts present in 512×512 trained models
### Evidence

At scale 2.0, this model:
- ✅ Preserves strong black/white patterns from input QR codes
- ✅ Maintains structural integrity while adding artistic elements
- ✅ Shows controlled, non-chaotic pattern integration
- ✅ Outperforms ControlNet for pattern-based tasks at high scales
### Implications

- **Ideal for artistic QR codes:** use scale 1.5-2.0 for best results
- **Excellent for brightness-based pattern control:** watermarks, subtle patterns
- **More efficient than ControlNet:** smaller and faster, with comparable or better quality at high scales
- **Native SDXL resolution:** no upscaling needed, cleaner results
## Available Checkpoints

This repository includes checkpoints from throughout training:
- **Root directory:** final model (1,563 steps, 100k samples)
- **`checkpoint-391`:** 25% complete (~25k samples)
- **`checkpoint-782`:** 50% complete (~50k samples)
- **`checkpoint-1173`:** 75% complete (~75k samples)
All checkpoints are saved in the same format and can be loaded identically. See the Checkpoint Progression Analysis section above for detailed comparisons and recommendations.
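The approximate sample counts above follow from the step number times the effective batch size of 64; checking:

```python
EFFECTIVE_BATCH = 64  # 8 per-device batch × 8 gradient accumulation steps

def samples_seen(step: int) -> int:
    """Approximate training samples consumed by a given optimizer step."""
    return step * EFFECTIVE_BATCH

for name, step in [("checkpoint-391", 391), ("checkpoint-782", 782),
                   ("checkpoint-1173", 1173), ("final", 1563)]:
    print(name, samples_seen(step))  # e.g. checkpoint-391 → 25024 (~25k)
```

The final figure (1563 × 64 = 100,032) slightly overshoots 100k because the last optimizer step runs on a partial batch. Each checkpoint can presumably be loaded by pointing `T2IAdapter.from_pretrained` at the corresponding subfolder (e.g. `subfolder="checkpoint-782"`), assuming the repository stores them under those names.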
## When to Use This Model

### ✅ Use This T2I Adapter When:
- Creating artistic QR codes (scale 1.5-2.0)
- Need fast inference with limited VRAM (works on 16GB+ GPUs)
- Want native 1024×1024 generation without upscaling
- Require strong pattern preservation at high conditioning scales
- Building mobile/edge applications (smaller model size)
- Working with brightness-based patterns or watermarks
### ⚠️ Consider ControlNet When:
- Need precise control at low scales (0.5-1.0)
- Working with complex geometric patterns requiring sub-pixel precision
- Production applications with strict quality requirements
- Have abundant VRAM and storage (24GB+ GPU, 5GB+ storage)
## Limitations

### Current Limitations

- **Overfitting:** final model (100k samples) shows overfitting; early checkpoints (25k-50k) recommended
- **Training duration:** single epoch on 100k samples is excessive; optimal range is 25k-50k samples
- **Scale dependency:** different checkpoints perform best at different scales
- **Dataset bias:** trained on aesthetic images, may not generalize to all image types
- **Pattern vs. prompt balance:** later checkpoints prioritize pattern over artistic interpretation
### For Production Use

Recommendations based on the overfitting analysis:
- Use `checkpoint-391` (25k) or `checkpoint-782` (50k) instead of the final model for best balance
- Target 25k-50k samples when training from scratch (avoid 100k)
- Early stopping is crucial to prevent overfitting
- Fine-tune early checkpoints on domain-specific data rather than training longer
- Test multiple checkpoints on your specific use cases to find the sweet spot
- Avoid multi-epoch training; a single epoch on 25k-50k samples is sufficient
## Training Script

This model was trained using the diffusers T2I Adapter training script:

```bash
accelerate launch --mixed_precision="fp16" train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="<path_to_100k_dataset>" \
  --conditioning_image_column="conditioning_image" \
  --image_column="image" \
  --caption_column="text" \
  --output_dir="./t2i-adapter-brightness-sdxl-100k-1024" \
  --mixed_precision="fp16" \
  --resolution=1024 \
  --learning_rate=1e-5 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=8 \
  --num_train_epochs=1 \
  --checkpointing_steps=391 \
  --validation_steps=391 \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --report_to="wandb"
```
## Citation

```bibtex
@misc{t2i-adapter-brightness-sdxl-100k,
  author = {Oysiyl},
  title = {SDXL T2I Adapter - Brightness Control (100k @ 1024×1024)},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/Oysiyl/t2i-adapter-brightness-sdxl}}
}
```
## Acknowledgments

- Built with 🤗 Diffusers
- Base model: Stable Diffusion XL by Stability AI
- Dataset: grayscale_image_aesthetic_3M by latentcat
- Training infrastructure: NVIDIA H100 80GB
## License

Apache 2.0 License. The base SDXL model has separate license terms at stabilityai/stable-diffusion-xl-base-1.0.