---
library_name: diffusers
tags:
- modular-diffusers
- diffusers
- qwenimage-layered
- text-to-image
---
This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

**Pipeline Type**: QwenImageLayeredAutoBlocks

**Description**: Auto Modular pipeline for layered denoising tasks using QwenImage-Layered.

This pipeline uses a 4-block architecture that can be customized and extended.

## Example Usage

[TODO]

## Pipeline Architecture

This modular pipeline is composed of the following blocks:

1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
   - Text encoder step for QwenImage-Layered that encodes the text prompt. If no prompt is provided, it generates one from the input image.
2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
   - VAE encoder step that encodes the image inputs into their latent representations.
3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
   - Core denoising workflow for the QwenImage-Layered img2img task.
4. **decode** (`QwenImageLayeredDecoderStep`)
   - Decodes unpacked latents of shape (B, C, layers+1, H, W) into layer images.

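
The way the four blocks hand data to one another can be pictured with a small toy, shown here only to illustrate the ordering and the shared state. It is a minimal sketch in plain Python, not the Diffusers API: every function below is a hypothetical stand-in that passes placeholder strings instead of tensors, and the assumption that `decode` yields one image per requested layer is made for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Block = Tuple[str, Callable[[State], State]]


def text_encoder(state: State) -> State:
    # Stand-in: use the given prompt, or derive one from the image if missing.
    prompt = state.get("prompt") or f"caption of {state['image']}"
    return {**state, "prompt_embeds": f"embeds({prompt})"}


def vae_encoder(state: State) -> State:
    # Stand-in: encode the input image into a latent representation.
    return {**state, "image_latents": f"latents({state['image']})"}


def denoise(state: State) -> State:
    # Stand-in: the denoiser consumes image latents and prompt embeddings.
    latents = f"denoised({state['image_latents']}, {state['prompt_embeds']})"
    return {**state, "latents": latents}


def decode(state: State) -> State:
    # Stand-in: one placeholder per requested layer (illustrative assumption).
    layers = int(state.get("layers", 4))
    return {**state, "images": [f"layer_{i}" for i in range(layers)]}


# Same order as the block list above.
BLOCKS: List[Block] = [
    ("text_encoder", text_encoder),
    ("vae_encoder", vae_encoder),
    ("denoise", denoise),
    ("decode", decode),
]


def run_blocks(blocks: List[Block], **inputs: Any) -> State:
    """Run each block in order, threading a shared state dict through them."""
    state: State = dict(inputs)
    for _name, step in blocks:
        state = step(state)
    return state


result = run_blocks(BLOCKS, image="input.png", layers=4)
print(result["images"])
```

Because each stand-in only reads and writes keys in the shared state, replacing or inserting a block amounts to editing the `BLOCKS` list, which is the kind of customization the block-based design is meant to allow.
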
## Model Components

1. image_resize_processor (`VaeImageProcessor`)
2. text_encoder (`Qwen2_5_VLForConditionalGeneration`)
3. processor (`Qwen2VLProcessor`)
4. tokenizer (`Qwen2Tokenizer`): The tokenizer to use
5. guider (`ClassifierFreeGuidance`)
6. image_processor (`VaeImageProcessor`)
7. vae (`AutoencoderKLQwenImage`)
8. pachifier (`QwenImageLayeredPachifier`)
9. scheduler (`FlowMatchEulerDiscreteScheduler`)
10. transformer (`QwenImageTransformer2DModel`)

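
To make the roles of the `guider` and `scheduler` components more concrete, the following sketch shows the standard classifier-free guidance combination and a single Euler update of the kind used in flow matching. It is a simplified illustration on plain Python lists, not the components' actual implementation: the function names `cfg_combine` and `euler_step` are hypothetical, and the real classes operate on tensors and carry extra options.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def euler_step(sample: List[float], velocity: List[float], sigma: float, sigma_next: float) -> List[float]:
    """One Euler step from noise level sigma to sigma_next along the predicted velocity."""
    dt = sigma_next - sigma
    return [x + dt * v for x, v in zip(sample, velocity)]


# Toy numbers standing in for the transformer's two predictions.
guided = cfg_combine(cond=[1.0, 2.0], uncond=[0.0, 1.0], guidance_scale=4.0)
next_sample = euler_step(sample=[1.0, 1.0], velocity=guided, sigma=1.0, sigma_next=0.5)
print(guided, next_sample)
```

In a full denoising loop these two steps would repeat once per entry in the sigma schedule, with the transformer producing fresh conditional and unconditional predictions at every step.
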
## Input/Output Specification

**Inputs:**

- `image` (`Image | list`): Reference image(s) for denoising. Can be a single image or a list of images.
- `resolution` (`int`, *optional*, defaults to `640`): Target area to resize the image to. Can be `1024` or `640`.
- `prompt` (`str`, *optional*): The prompt or prompts to guide image generation.
- `use_en_prompt` (`bool`, *optional*, defaults to `False`): Whether to use the English prompt template.
- `negative_prompt` (`str`, *optional*): The prompt or prompts not to guide the image generation.
- `max_sequence_length` (`int`, *optional*, defaults to `1024`): Maximum sequence length for prompt encoding.
- `generator` (`Generator`, *optional*): Torch generator for deterministic generation.
- `num_images_per_prompt` (`int`, *optional*, defaults to `1`): The number of images to generate per prompt.
- `latents` (`Tensor`, *optional*): Pre-generated noisy latents for image generation.
- `layers` (`int`, *optional*, defaults to `4`): Number of layers to extract from the image.
- `num_inference_steps` (`int`, *optional*, defaults to `50`): The number of denoising steps.
- `sigmas` (`list`, *optional*): Custom sigmas for the denoising process.
- `attention_kwargs` (`dict`, *optional*): Additional kwargs for attention processors.
- `**denoiser_input_fields` (`None`, *optional*): Conditional model inputs for the denoiser, e.g. `prompt_embeds`, `negative_prompt_embeds`.
- `output_type` (`str`, *optional*, defaults to `pil`): Output format, one of `'pil'`, `'np'`, or `'pt'`.

**Outputs:**

- `images` (`list`): Generated images.
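
The `resolution` input is described as a target area rather than a fixed width and height, so the working size depends on the input's aspect ratio. The sketch below shows one plausible way such a size could be derived. It rests on two assumptions that this card does not confirm: that the area is `resolution * resolution` pixels, and that both sides are rounded to a multiple of 32. The helper name `calculate_dimensions` is used here for illustration only.

```python
import math
from typing import Tuple


def calculate_dimensions(target_area: int, aspect_ratio: float, multiple: int = 32) -> Tuple[int, int]:
    """Pick (width, height) close to target_area that keeps the aspect ratio.

    aspect_ratio is width / height. Both sides are rounded to the nearest
    multiple of `multiple` (assumed to be 32 here).
    """
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    width = round(width / multiple) * multiple
    height = round(height / multiple) * multiple
    return width, height


resolution = 640  # the documented default
square = calculate_dimensions(resolution * resolution, 1.0)
widescreen = calculate_dimensions(resolution * resolution, 16 / 9)
print(square, widescreen)
```

Under these assumptions a square input stays at 640 by 640, while a wider input trades height for width so that the total pixel count stays close to the target area.
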