# SDXL T2I Adapter - Brightness Control (100k @ 1024×1024)

A T2I Adapter model trained on Stable Diffusion XL to control image generation through brightness/grayscale information. This model is trained at native SDXL resolution (1024×1024) on 100,000 samples, providing a lightweight alternative to ControlNet with superior efficiency and excellent pattern preservation.
## Model Description

This T2I Adapter enables brightness-based conditioning for SDXL image generation. By providing a grayscale image as input, you can control the brightness distribution and lighting structure while maintaining creative freedom through text prompts.
**Key Features:**
- Excellent brightness and pattern control at high conditioning scales (1.5-2.5)
- Native SDXL resolution: trained at 1024×1024 (not upscaled from 512×512)
- 15x smaller than ControlNet: ~300MB vs ~4.7GB
- 50% faster inference: ~12 it/s vs ~8 it/s at 1024×1024
- Superior pattern preservation: outperforms ControlNet at scales 1.5-2.5
- Compatible with standard SDXL pipelines
**Intended Uses:**
- Artistic QR code generation (scale 1.5-2.0 recommended)
- Image recoloring and colorization
- Lighting control in text-to-image generation
- Brightness-based pattern integration
- Watermark and subtle pattern embedding
- Photo enhancement and stylization
## Checkpoint Progression Analysis & Key Findings
We tested all checkpoints at multiple conditioning scales (0.25, 0.5, 0.75, 1.0, 1.25, 1.5) to understand the training progression. Here are the key findings:
### Visual Comparisons

- **Scale 0.25** - weak control, maximum artistic freedom
- **Scale 0.5** - subtle control with strong prompt adherence
- **Scale 0.75** - moderate control, balanced
- **Scale 1.0** - standard control
- **Scale 1.25** - strong control
- **Scale 1.5** - maximum pattern preservation
### Key Observations

**Early checkpoints (25k-50k samples):**
- Best balance between pattern preservation and artistic prompt adherence
- Ideal for artistic QR codes where you want structure but also creativity
- Recommended scales: 0.75-1.25

**Mid checkpoint (75k samples):**
- Stronger pattern control than early checkpoints
- Still maintains good prompt adherence
- Recommended scales: 1.0-1.5

**Final model (100k samples):**
- Maximum pattern preservation capability
- At high scales (1.5+), pattern becomes very dominant
- Best for cases requiring precise brightness control
- May reduce artistic interpretation of prompts at very high scales
### ⚠️ Overfitting Observation
The checkpoint progression reveals clear signs of overfitting as training progresses. The final model (100k samples) shows excessive pattern dominance at high scales, losing the artistic balance present in earlier checkpoints. This suggests:
- Training should use fewer samples (25k-50k range optimal)
- Early stopping around checkpoint-782 (50k samples) provides the best balance
- Single epoch training on 100k samples is too much for this task
- Future training should target 25k-50k samples for better generalization
For most use cases, checkpoint-391 or checkpoint-782 are recommended over the final model, as they maintain artistic prompt adherence while still providing good pattern control.
### Choosing the Right Checkpoint

- **For artistic QR codes:** use `checkpoint-391` or `checkpoint-782` at scale 1.0-1.5
- **For maximum pattern control:** use the final model at scale 1.5-2.0
- **For balanced results:** use `checkpoint-782` or `checkpoint-1173` at scale 0.75-1.25
- **For subtle effects:** use any checkpoint at scale 0.25-0.5
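The guidance above can be folded into a small lookup helper. The sketch below simply encodes the bullet points; the use-case names and the `recommend` function are illustrative, not part of diffusers or this repository:

```python
# Sketch: encode the checkpoint/scale guidance above as a lookup table.
# Keys and values mirror the bullets; nothing here is part of any API.
CHECKPOINT_GUIDE = {
    "artistic_qr": {"checkpoints": ["checkpoint-391", "checkpoint-782"], "scale": (1.0, 1.5)},
    "max_pattern": {"checkpoints": ["final"], "scale": (1.5, 2.0)},
    "balanced": {"checkpoints": ["checkpoint-782", "checkpoint-1173"], "scale": (0.75, 1.25)},
    "subtle": {"checkpoints": ["any"], "scale": (0.25, 0.5)},
}

def recommend(use_case: str):
    """Return (checkpoint list, (min_scale, max_scale)) for a use case."""
    entry = CHECKPOINT_GUIDE[use_case]
    return entry["checkpoints"], entry["scale"]
```

For example, `recommend("artistic_qr")` returns the early checkpoints together with the 1.0-1.5 scale range.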
## Training Details

### Training Data

Trained on 100,000 samples from latentcat/grayscale_image_aesthetic_3M:
- High-quality aesthetic images
- Paired with grayscale/brightness versions
- Native resolution: 1024×1024 (SDXL native, no upscaling)
### Training Configuration
| Parameter | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-xl-base-1.0 |
| VAE | madebyollin/sdxl-vae-fp16-fix |
| Architecture | T2I Adapter Full XL (~77M parameters) |
| Training Resolution | 1024×1024 (native SDXL) |
| Training Steps | 1,563 (1 epoch) |
| Batch Size | 8 per device |
| Gradient Accumulation | 8 (effective batch: 64) |
| Learning Rate | 1e-5 |
| Mixed Precision | FP16 |
| Hardware | NVIDIA H100 80GB |
| Training Time | ~3 hours |
| Optimizer | 8-bit Adam |
| Final Loss | ~0.025 |
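The step count in the table follows directly from the dataset size and batch settings; a quick sanity check (pure arithmetic, no training code):

```python
import math

samples = 100_000
per_device_batch = 8
grad_accum = 8

# Effective batch and optimizer steps per epoch, matching the table above
effective_batch = per_device_batch * grad_accum          # 64
steps_per_epoch = math.ceil(samples / effective_batch)   # 1563 (100,000 / 64 = 1562.5, last step partial)

print(effective_batch, steps_per_epoch)  # → 64 1563
```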
### Model Size Comparison

| Model | Parameters | Size | Training | Resolution |
|---|---|---|---|---|
| This T2I Adapter | ~77M | 302MB | 100k @ 1024 | 1024×1024 |
| ControlNet (SDXL) | ~700M | 4.7GB | 100k @ 512 | 512×512 |
| T2I Adapter (10k) | ~77M | 302MB | 10k @ 1024 | 1024×1024 |
## Usage

### Installation

```bash
pip install diffusers transformers accelerate torch
```

### Basic Usage
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler
import torch
from PIL import Image

# Load T2I Adapter
adapter = T2IAdapter.from_pretrained(
    "Oysiyl/t2i-adapter-brightness-sdxl",
    torch_dtype=torch.float16
)

# Load SDXL pipeline
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()  # optional; requires xformers to be installed
pipe.to("cuda")

# Load grayscale/brightness control image
control_image = Image.open("path/to/grayscale_image.png")
control_image = control_image.resize((1024, 1024))  # resize to 1024×1024

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    adapter_conditioning_scale=1.5,  # higher scales work well with this model
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
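Since the adapter conditions on brightness, any RGB photo can serve as a control image once reduced to grayscale. A minimal preprocessing sketch using standard Pillow calls; the round trip back to three channels is an assumption to match the pipeline's RGB input, and the helper name is illustrative:

```python
from PIL import Image

def to_brightness_control(path: str, size: int = 1024) -> Image.Image:
    """Convert an arbitrary image into a grayscale conditioning image."""
    img = Image.open(path).convert("L")            # collapse to single-channel luma
    img = img.resize((size, size), Image.LANCZOS)  # match the generation resolution
    return img.convert("RGB")                      # back to 3 channels for the pipeline
```

Then pass the result as the `image=` argument: `control_image = to_brightness_control("photo.jpg")`.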
### Artistic QR Code Generation
```python
import qrcode
from PIL import Image

# Generate QR code (high error correction helps the art stay scannable)
qr = qrcode.QRCode(
    version=1,
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=10,
    border=4
)
qr.add_data("https://your-url.com")
qr.make(fit=True)
qr_image = qr.make_image(fill_color="black", back_color="white")
qr_image = qr_image.resize((1024, 1024), Image.LANCZOS).convert("RGB")

# Generate artistic QR code (reuses `pipe` from the Basic Usage example above)
image = pipe(
    prompt="a beautiful garden with colorful flowers and butterflies, highly detailed, professional photography",
    negative_prompt="blurry, low quality, distorted",
    image=qr_image,
    num_inference_steps=30,
    adapter_conditioning_scale=2.0,  # strong pattern preservation
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("artistic_qr.png")
```
## Adapter Conditioning Scale Guide

The `adapter_conditioning_scale` parameter controls how strongly the brightness map influences generation.

### Recommended Scale Ranges
| Scale | Behavior | Best For |
|---|---|---|
| 0.7-1.0 | Subtle artistic integration with hints of pattern | Natural images, soft lighting control |
| 1.0-1.5 | Balanced - visible structure with artistic elements | General purpose, recoloring |
| 1.5-2.0 | Excellent pattern preservation | Artistic QR codes, watermarks |
| 2.0-2.5 | Maximum control - strong patterns with artistic overlay | Strong geometric patterns |
**Discovery:** This model, trained at native 1024×1024 resolution, shows superior brightness pattern preservation at scales 1.5-2.5 compared to ControlNet trained at 512×512.
## Performance Comparison

### vs ControlNet (Both 100k Training)
| Metric | ControlNet | This T2I Adapter | Advantage |
|---|---|---|---|
| Parameters | ~700M | ~77M | 9x smaller |
| Model Size | 4.7GB | 302MB | 15x smaller |
| Training Resolution | 512×512 | 1024×1024 | Native SDXL |
| Training Time | ~49 min | ~3 hours | ControlNet trains faster (at 512×512) |
| Inference Speed @ 1024 | ~8 it/s | ~12 it/s | 50% faster |
| Time per Image | ~4 seconds | ~2.5 seconds | 1.6x faster |
| Pattern Preservation @ Scale 2.0 | Good | Excellent | Superior |
| Memory Requirement | Higher | Lower | Better efficiency |
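The per-image times in the table are consistent with the iteration speeds at the 30 inference steps used in the usage examples; checking the arithmetic:

```python
steps = 30  # num_inference_steps used in the usage examples

controlnet_its = 8.0    # ~it/s at 1024, from the table
adapter_its = 12.0

controlnet_time = steps / controlnet_its  # 3.75 s (the table rounds to ~4 s)
adapter_time = steps / adapter_its        # 2.5 s, as listed
speedup = controlnet_time / adapter_time  # 1.5x wall clock; ~1.6x if using the rounded ~4 s figure

print(controlnet_time, adapter_time, speedup)  # → 3.75 2.5 1.5
```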
### vs T2I Adapter 10k Model
| Metric | 10k Model | This 100k Model | Improvement |
|---|---|---|---|
| Training Samples | 10,000 | 100,000 | 10x more data |
| Training Time | ~18 min | ~3 hours | Longer training |
| Final Loss | 0.0796 | 0.025 | ~3x lower |
| Pattern Quality | Good | Excellent | More consistent |
| Artistic Integration | Chaotic | Cleaner | Better control |
## Key Findings

### Major Discovery

This T2I Adapter trained at native 1024×1024 resolution demonstrates:
- Superior brightness pattern preservation compared to ControlNet at conditioning scales 1.5-2.5
- Excellent QR code structure retention at scale 2.0 while maintaining artistic elements
- Faster inference with lower memory requirements (12 it/s vs 8 it/s on H100)
- Native SDXL resolution avoids upscaling artifacts present in 512×512 trained models
### Evidence

At scale 2.0, this model:
- ✅ Preserves strong black/white patterns from input QR codes
- ✅ Maintains structural integrity while adding artistic elements
- ✅ Shows controlled, non-chaotic pattern integration
- ✅ Outperforms ControlNet for pattern-based tasks at high scales
### Implications

- **Ideal for artistic QR codes:** use scale 1.5-2.0 for best results
- **Excellent for brightness-based pattern control:** watermarks, subtle patterns
- **More efficient than ControlNet:** smaller and faster, with comparable or better quality at high scales
- **Native SDXL resolution:** no upscaling needed, cleaner results
## Available Checkpoints

This repository includes checkpoints from throughout training:
- **Root directory:** final model (1,563 steps, 100k samples)
- **`checkpoint-391`:** 25% complete (~25k samples)
- **`checkpoint-782`:** 50% complete (~50k samples)
- **`checkpoint-1173`:** 75% complete (~75k samples)
All checkpoints are saved in the same format and can be loaded identically. See the Checkpoint Progression Analysis section above for detailed comparisons and recommendations.
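The approximate sample counts above follow from the step number times the effective batch size of 64; checking:

```python
EFFECTIVE_BATCH = 64  # 8 per-device batch × 8 gradient accumulation steps

def samples_seen(step: int) -> int:
    """Approximate training samples consumed by a given optimizer step."""
    return step * EFFECTIVE_BATCH

for name, step in [("checkpoint-391", 391), ("checkpoint-782", 782),
                   ("checkpoint-1173", 1173), ("final", 1563)]:
    print(name, samples_seen(step))  # e.g. checkpoint-391 → 25024 (~25k)
```

The final figure (1563 × 64 = 100,032) slightly overshoots 100k because the last optimizer step runs on a partial batch. Each checkpoint can presumably be loaded by pointing `T2IAdapter.from_pretrained` at the corresponding subfolder (e.g. `subfolder="checkpoint-782"`), assuming the repository stores them under those names.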
## When to Use This Model

### ✅ Use This T2I Adapter When:
- Creating artistic QR codes (scale 1.5-2.0)
- Need fast inference with limited VRAM (works on 16GB+ GPUs)
- Want native 1024×1024 generation without upscaling
- Require strong pattern preservation at high conditioning scales
- Building mobile/edge applications (smaller model size)
- Working with brightness-based patterns or watermarks
### ⚠️ Consider ControlNet When:
- Need precise control at low scales (0.5-1.0)
- Working with complex geometric patterns requiring sub-pixel precision
- Production applications with strict quality requirements
- Have abundant VRAM and storage (24GB+ GPU, 5GB+ storage)
## Limitations

### Current Limitations

- **Overfitting:** final model (100k samples) shows overfitting; early checkpoints (25k-50k) recommended
- **Training duration:** single epoch on 100k samples is excessive; optimal range is 25k-50k samples
- **Scale dependency:** different checkpoints perform best at different scales
- **Dataset bias:** trained on aesthetic images, may not generalize to all image types
- **Pattern vs. prompt balance:** later checkpoints prioritize pattern over artistic interpretation
### For Production Use

Recommendations based on the overfitting analysis:
- Use `checkpoint-391` (25k) or `checkpoint-782` (50k) instead of the final model for best balance
- Target 25k-50k samples when training from scratch (avoid 100k)
- Early stopping is crucial to prevent overfitting
- Fine-tune early checkpoints on domain-specific data rather than training longer
- Test multiple checkpoints on your specific use cases to find the sweet spot
- Avoid multi-epoch training; a single epoch on 25k-50k samples is sufficient
## Training Script

This model was trained using the diffusers T2I Adapter training script:

```bash
accelerate launch --mixed_precision="fp16" train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="<path_to_100k_dataset>" \
  --conditioning_image_column="conditioning_image" \
  --image_column="image" \
  --caption_column="text" \
  --output_dir="./t2i-adapter-brightness-sdxl-100k-1024" \
  --mixed_precision="fp16" \
  --resolution=1024 \
  --learning_rate=1e-5 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=8 \
  --num_train_epochs=1 \
  --checkpointing_steps=391 \
  --validation_steps=391 \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam \
  --report_to="wandb"
```
## Citation

```bibtex
@misc{t2i-adapter-brightness-sdxl-100k,
  author = {Oysiyl},
  title = {SDXL T2I Adapter - Brightness Control (100k @ 1024×1024)},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/Oysiyl/t2i-adapter-brightness-sdxl}}
}
```
## Acknowledgments

- Built with 🤗 Diffusers
- Base model: Stable Diffusion XL by Stability AI
- Dataset: grayscale_image_aesthetic_3M by latentcat
- Training infrastructure: NVIDIA H100 80GB
## License

Apache 2.0 License. The base SDXL model has separate license terms at stabilityai/stable-diffusion-xl-base-1.0.