diffusers/Qwen-Image-Layered-modular


YiYiXu HF Staff

Upload QwenImageLayeredModularPipeline (#4)

fa40986 3 months ago


	---
	library_name: diffusers
	tags:
	- modular-diffusers
	- diffusers
	- qwenimage-layered
	- text-to-image
	---
	This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

	Pipeline Type: QwenImageLayeredAutoBlocks

	Description: Auto Modular pipeline for layered denoising tasks using QwenImage-Layered.

	This pipeline uses a 4-block architecture that can be customized and extended.

	## Example Usage

	[TODO]
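The card leaves usage as a TODO. The sketch below is one plausible way to call the pipeline, not an official example from the authors. It assumes that `ModularPipeline.from_pretrained` and `load_components` from the Modular Diffusers documentation work with this repository, that the repository id is `diffusers/Qwen-Image-Layered-modular`, and that a CUDA device is available. The helper names `load_layered_pipeline` and `decompose` and the file name `input.png` are hypothetical.

```python
from typing import Any


def load_layered_pipeline(repo_id: str = "diffusers/Qwen-Image-Layered-modular") -> Any:
    """Load the modular pipeline and its components (downloads the weights)."""
    # Imported lazily so the helpers can be defined without torch/diffusers installed.
    import torch
    from diffusers.modular_pipelines import ModularPipeline

    pipe = ModularPipeline.from_pretrained(repo_id)
    # Assumption: calling load_components() without names loads every component
    # listed under "Model Components".
    pipe.load_components(torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    return pipe


def decompose(pipe: Any, image: Any, layers: int = 4, generator: Any = None) -> Any:
    """Call the pipeline with the inputs documented on this card."""
    return pipe(
        image=image,
        layers=layers,
        resolution=640,
        num_inference_steps=50,
        generator=generator,
        output="images",  # ask the modular pipeline for the `images` output only
    )


# Example call (requires a GPU and network access; "input.png" is a placeholder):
#   from PIL import Image
#   pipe = load_layered_pipeline()
#   layer_images = decompose(pipe, Image.open("input.png"))
```

Keeping the loader separate from the call makes it easy to reuse one loaded pipeline across several images.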

	## Pipeline Architecture

	This modular pipeline is composed of the following blocks:

	1. text_encoder (`QwenImageLayeredTextEncoderStep`)
	- QwenImage-Layered text encoder step that encodes the text prompt. If no prompt is provided, one is generated from the image.
	2. vae_encoder (`QwenImageLayeredVaeEncoderStep`)
	- VAE encoder step that encodes the image inputs into their latent representations.
	3. denoise (`QwenImageLayeredCoreDenoiseStep`)
	- Core denoising workflow for the QwenImage-Layered img2img task.
	4. decode (`QwenImageLayeredDecoderStep`)
	- Decodes unpacked latents of shape (B, C, layers+1, H, W) into layer images.
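To illustrate how the four blocks hand data to each other, here is a toy stand-in written in plain Python. It is not the Diffusers implementation: the functions reuse the block names from the list above, but the state keys (`prompt_embeds`, `image_latents`, `latents`) and the string placeholders are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Block = Callable[[State], State]


def text_encoder(state: State) -> State:
    # Falls back to a caption derived from the image when no prompt is given.
    prompt = state.get("prompt") or f"caption_of({state['image']})"
    return {**state, "prompt_embeds": f"embeds({prompt})"}


def vae_encoder(state: State) -> State:
    # Encodes the input image into a latent placeholder.
    return {**state, "image_latents": f"latents({state['image']})"}


def denoise(state: State) -> State:
    # Consumes the outputs of both encoder blocks.
    latents = f"denoised({state['image_latents']}, {state['prompt_embeds']})"
    return {**state, "latents": latents}


def decode(state: State) -> State:
    # Turns the denoised latents into the final `images` output.
    return {**state, "images": [f"decoded({state['latents']})"]}


BLOCKS: List[Tuple[str, Block]] = [
    ("text_encoder", text_encoder),
    ("vae_encoder", vae_encoder),
    ("denoise", denoise),
    ("decode", decode),
]


def run_blocks(state: State) -> State:
    """Run every block in order, threading the shared state through."""
    for _name, block in BLOCKS:
        state = block(state)
    return state
```

Because each block only reads and writes named entries in a shared state, a block can be replaced without touching the others, which is the property the modular framework builds on.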

	## Model Components

	1. image_resize_processor (`VaeImageProcessor`)
	2. text_encoder (`Qwen2_5_VLForConditionalGeneration`)
	3. processor (`Qwen2VLProcessor`)
	4. tokenizer (`Qwen2Tokenizer`): The tokenizer to use
	5. guider (`ClassifierFreeGuidance`)
	6. image_processor (`VaeImageProcessor`)
	7. vae (`AutoencoderKLQwenImage`)
	8. pachifier (`QwenImageLayeredPachifier`)
	9. scheduler (`FlowMatchEulerDiscreteScheduler`)
	10. transformer (`QwenImageTransformer2DModel`)

	## Input/Output Specification

	Inputs:

	- `image` (`Image | list`): Reference image(s) for denoising. Can be a single image or a list of images.
	- `resolution` (`int`, optional, defaults to `640`): The target area to resize the image to. Must be `640` or `1024`.
	- `prompt` (`str`, optional): The prompt or prompts to guide image generation.
	- `use_en_prompt` (`bool`, optional, defaults to `False`): Whether to use the English prompt template.
	- `negative_prompt` (`str`, optional): The prompt or prompts that should not guide image generation.
	- `max_sequence_length` (`int`, optional, defaults to `1024`): Maximum sequence length for prompt encoding.
	- `generator` (`Generator`, optional): Torch generator for deterministic generation.
	- `num_images_per_prompt` (`int`, optional, defaults to `1`): The number of images to generate per prompt.
	- `latents` (`Tensor`, optional): Pre-generated noisy latents for image generation.
	- `layers` (`int`, optional, defaults to `4`): Number of layers to extract from the image.
	- `num_inference_steps` (`int`, optional, defaults to `50`): The number of denoising steps.
	- `sigmas` (`list`, optional): Custom sigmas for the denoising process.
	- `attention_kwargs` (`dict`, optional): Additional kwargs for attention processors.
	- `**denoiser_input_fields` (`None`, optional): Conditional model inputs for the denoiser, e.g. `prompt_embeds`, `negative_prompt_embeds`.
	- `output_type` (`str`, optional, defaults to `pil`): Output format. One of `pil`, `np`, or `pt`.
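The defaults and allowed values listed above can be checked on the caller's side before the pipeline is invoked. The helper below is a minimal sketch of that idea; `build_inputs` is a hypothetical name, and whether the pipeline performs the same checks internally is not stated on this card.

```python
from typing import Any, Dict

ALLOWED_RESOLUTIONS = (640, 1024)
ALLOWED_OUTPUT_TYPES = ("pil", "np", "pt")

# Defaults copied from the input list on this card.
DEFAULTS: Dict[str, Any] = {
    "resolution": 640,
    "use_en_prompt": False,
    "max_sequence_length": 1024,
    "num_images_per_prompt": 1,
    "layers": 4,
    "num_inference_steps": 50,
    "output_type": "pil",
}


def build_inputs(image: Any, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and validate them."""
    if image is None:
        raise ValueError("`image` is required")
    # Extra keys (e.g. prompt, sigmas, prompt_embeds) pass through unchanged.
    inputs = {**DEFAULTS, **overrides, "image": image}
    if inputs["resolution"] not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if inputs["output_type"] not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {ALLOWED_OUTPUT_TYPES}")
    for name in ("layers", "num_inference_steps",
                 "num_images_per_prompt", "max_sequence_length"):
        value = inputs[name]
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return inputs
```

The returned dictionary can be unpacked into the pipeline call, for example `pipe(**build_inputs(image, layers=3))`.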

	Outputs:

	- `images` (`list`): Generated images.