vaibhavpandeyvpz committed
Commit b4e0f77 · 1 Parent(s): 470abea

Setup Flux.2 image generation
Files changed (4)
  1. .gitignore +64 -0
  2. README.md +48 -4
  3. app.py +358 -96
  4. requirements.txt +12 -5
.gitignore ADDED
@@ -0,0 +1,64 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+ .DS_Store
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Environment variables
+ .env
+ .env.local
+
+ # Model cache
+ .cache/
+ models/
+ *.safetensors
+ *.bin
+ *.pt
+ *.pth
+
+ # Logs
+ *.log
+ logs/
+
+ # Temporary files
+ *.tmp
+ *.temp
+ temp/
+ tmp/
+
+ # Hugging Face cache
+ .huggingface/
+
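The model-cache block above is what keeps multi-gigabyte weight files (`*.safetensors`, `*.bin`, `*.pt`, `*.pth`) out of version control. As a rough sanity check of what those globs catch (a sketch only; `fnmatch` approximates gitignore matching for flat file names and is not git's actual matcher):

```python
# Rough check of the model-cache globs above; fnmatch only approximates
# .gitignore semantics for flat file names (no directory rules).
import fnmatch

patterns = ["*.safetensors", "*.bin", "*.pt", "*.pth", "*.log"]
for name in ["flux2-dev.safetensors", "app.py", "tokenizer.bin", "run.log"]:
    ignored = any(fnmatch.fnmatch(name, p) for p in patterns)
    print(f"{name}: {'ignored' if ignored else 'tracked'}")
```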
README.md CHANGED
@@ -1,14 +1,58 @@
  ---
- title: Flux.2 Text To Image
  emoji: 🖼
  colorFrom: purple
  colorTo: red
  sdk: gradio
- sdk_version: 5.44.0
  app_file: app.py
- pinned: false
  license: apache-2.0
- short_description: Generate production-grade AI images using Flux.2 [dev]
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  ---
+ title: FLUX.2 Text to Image
  emoji: 🖼
  colorFrom: purple
  colorTo: red
  sdk: gradio
+ sdk_version: 6.0.2
  app_file: app.py
+ pinned: true
  license: apache-2.0
+ short_description: Generate production-grade AI images using FLUX.2 [dev]
  ---

+ # FLUX.2 [dev] Text-to-Image
+
+ Generate high-quality images using FLUX.2 [dev], a 32B parameter rectified flow model by Black Forest Labs.
+
+ ## Features
+
+ - **Text-to-Image Generation**: Create images from text prompts
+ - **Image Editing**: Upload images for editing and manipulation
+ - **Image Combining**: Combine multiple images based on text instructions
+ - **Prompt Upsampling**: Automatically enhance prompts using a VLM (optional)
+ - **ZeroGPU Support**: Optimized for ZeroGPU inference
+ - **Advanced Controls**: Fine-tune generation with seed, guidance scale, inference steps, and dimensions
+
+ ## Setup
+
+ This Space requires:
+ - **ZeroGPU**: Enable ZeroGPU in your Space settings
+ - **HF_TOKEN**: Set your Hugging Face token as a Space secret for gated model access
+   - Go to Settings → Secrets → Add `HF_TOKEN` with your Hugging Face token
+
+ ## Model
+
+ - **Model**: [black-forest-labs/FLUX.2-dev](https://huggingface.co/black-forest-labs/FLUX.2-dev)
+ - **Blog**: [FLUX.2 Announcement](https://bfl.ai/blog/flux-2)
+
+ ## Usage
+
+ 1. Enter a text prompt describing the image you want to generate
+ 2. (Optional) Upload one or more images for editing/combining
+ 3. Adjust advanced settings as needed:
+    - **Prompt Upsampling**: Enable to automatically enhance your prompt
+    - **Seed**: Control randomness (use randomize for variety)
+    - **Dimensions**: Width and height (must be multiples of 8)
+    - **Inference Steps**: More steps = higher quality but slower (default: 30)
+    - **Guidance Scale**: How closely to follow the prompt (default: 4.0)
+ 4. Click "Run" to generate your image
+
+ ## Notes
+
+ - The model uses pre-compiled blocks for ZeroGPU optimization
+ - Prompt upsampling requires a valid `HF_TOKEN` secret
+ - Image dimensions are automatically adjusted when uploading images
+ - Supports image editing and combining multiple images
+
+
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
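As the Usage and Notes sections say, uploading an image adjusts the width/height sliders automatically: the longer side is pinned to 1024, the other follows the aspect ratio, and both are snapped to multiples of 8 and clamped to the 256–1024 slider range. A minimal sketch of that rule, mirroring `update_dimensions_from_image` in app.py below (`fit_dimensions` is an illustrative name, not part of the app):

```python
# Sketch of the dimension rule described above, mirroring
# update_dimensions_from_image in app.py (fit_dimensions is illustrative).
def fit_dimensions(img_width: int, img_height: int) -> tuple[int, int]:
    aspect_ratio = img_width / img_height
    if aspect_ratio >= 1:  # landscape or square: pin width to 1024
        width, height = 1024, int(1024 / aspect_ratio)
    else:  # portrait: pin height to 1024
        width, height = int(1024 * aspect_ratio), 1024
    # Snap to the nearest multiple of 8, then clamp to the slider range.
    width = max(256, min(1024, round(width / 8) * 8))
    height = max(256, min(1024, round(height / 8) * 8))
    return width, height

print(fit_dimensions(1920, 1080))  # (1024, 576)
```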
app.py CHANGED
@@ -1,151 +1,413 @@
  import gradio as gr
  import numpy as np
  import random
-
- # import spaces #[uncomment to use ZeroGPU]
- from diffusers import DiffusionPipeline
  import torch

- device = "cuda" if torch.cuda.is_available() else "cpu"
- model_repo_id = "stabilityai/sdxl-turbo"  # Replace to the model you would like to use
-
- if torch.cuda.is_available():
-     torch_dtype = torch.float16
- else:
-     torch_dtype = torch.float32

- pipe = DiffusionPipeline.from_pretrained(model_repo_id, torch_dtype=torch_dtype)
- pipe = pipe.to(device)

  MAX_SEED = np.iinfo(np.int32).max
  MAX_IMAGE_SIZE = 1024


- # @spaces.GPU #[uncomment to use ZeroGPU]
- def infer(
-     prompt,
-     negative_prompt,
-     seed,
-     randomize_seed,
      width,
      height,
      guidance_scale,
      num_inference_steps,
      progress=gr.Progress(track_tqdm=True),
  ):
      if randomize_seed:
          seed = random.randint(0, MAX_SEED)

-     generator = torch.Generator().manual_seed(seed)

-     image = pipe(
-         prompt=prompt,
-         negative_prompt=negative_prompt,
-         guidance_scale=guidance_scale,
-         num_inference_steps=num_inference_steps,
-         width=width,
-         height=height,
-         generator=generator,
-     ).images[0]

      return image, seed


  examples = [
-     "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
-     "An astronaut riding a green horse",
-     "A delicious ceviche cheesecake slice",
  ]

  css = """
  #col-container {
      margin: 0 auto;
-     max-width: 640px;
  }
  """

  with gr.Blocks(css=css) as demo:
      with gr.Column(elem_id="col-container"):
-         gr.Markdown(" # Text-to-Image Gradio Template")

          with gr.Row():
-             prompt = gr.Text(
-                 label="Prompt",
-                 show_label=False,
-                 max_lines=1,
-                 placeholder="Enter your prompt",
-                 container=False,
-             )
-
-             run_button = gr.Button("Run", scale=0, variant="primary")
-
-         result = gr.Image(label="Result", show_label=False)
-
-         with gr.Accordion("Advanced Settings", open=False):
-             negative_prompt = gr.Text(
-                 label="Negative prompt",
-                 max_lines=1,
-                 placeholder="Enter a negative prompt",
-                 visible=False,
-             )
-
-             seed = gr.Slider(
-                 label="Seed",
-                 minimum=0,
-                 maximum=MAX_SEED,
-                 step=1,
-                 value=0,
-             )
-
-             randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
-
-             with gr.Row():
-                 width = gr.Slider(
-                     label="Width",
-                     minimum=256,
-                     maximum=MAX_IMAGE_SIZE,
-                     step=32,
-                     value=1024,  # Replace with defaults that work for your model
-                 )

-                 height = gr.Slider(
-                     label="Height",
-                     minimum=256,
-                     maximum=MAX_IMAGE_SIZE,
-                     step=32,
-                     value=1024,  # Replace with defaults that work for your model
-                 )

-             with gr.Row():
-                 guidance_scale = gr.Slider(
-                     label="Guidance scale",
-                     minimum=0.0,
-                     maximum=10.0,
-                     step=0.1,
-                     value=0.0,  # Replace with defaults that work for your model
-                 )

-                 num_inference_steps = gr.Slider(
-                     label="Number of inference steps",
-                     minimum=1,
-                     maximum=50,
-                     step=1,
-                     value=2,  # Replace with defaults that work for your model
-                 )

-         gr.Examples(examples=examples, inputs=[prompt])
      gr.on(
          triggers=[run_button.click, prompt.submit],
          fn=infer,
          inputs=[
              prompt,
-             negative_prompt,
              seed,
              randomize_seed,
              width,
              height,
-             guidance_scale,
              num_inference_steps,
          ],
          outputs=[result, seed],
      )
 
+ import os
+ import subprocess
+ import sys
+ import io
  import gradio as gr
  import numpy as np
  import random
  import torch
+ from diffusers import Flux2Pipeline, Flux2Transformer2DModel
+ import requests
+ from PIL import Image
+ import base64
+ from huggingface_hub import InferenceClient

+ # Install spaces if needed (import only inside the guard, so the
+ # fallback install can actually run when the package is missing)
+ try:
+     import spaces
+ except ImportError:
+     subprocess.check_call([sys.executable, "-m", "pip", "install", "spaces==0.43.0"])
+     import spaces

+ dtype = torch.bfloat16
+ device = "cuda" if torch.cuda.is_available() else "cpu"

  MAX_SEED = np.iinfo(np.int32).max
  MAX_IMAGE_SIZE = 1024

+ # Hugging Face token for gated repo authentication
+ HF_TOKEN = os.environ.get("HF_TOKEN", os.environ.get("HUGGING_FACE_HUB_TOKEN"))

+ hf_client = (
+     InferenceClient(
+         api_key=HF_TOKEN,
+     )
+     if HF_TOKEN
+     else None
+ )
+
+ VLM_MODEL = "baidu/ERNIE-4.5-VL-424B-A47B-Base-PT"
+
+ SYSTEM_PROMPT_TEXT_ONLY = """You are an expert prompt engineer for FLUX.2 by Black Forest Labs. Rewrite user prompts to be more descriptive while strictly preserving their core subject and intent.
+
+ Guidelines:
+ 1. Structure: Keep structured inputs structured (enhance within fields). Convert natural language to detailed paragraphs.
+ 2. Details: Add concrete visual specifics - form, scale, textures, materials, lighting (quality, direction, color), shadows, spatial relationships, and environmental context.
+ 3. Text in Images: Put ALL text in quotation marks, matching the prompt's language. Always provide explicit quoted text for objects that would contain text in reality (signs, labels, screens, etc.) - without it, the model generates gibberish.
+
+ Output only the revised prompt and nothing else."""
+
+ SYSTEM_PROMPT_WITH_IMAGES = """You are FLUX.2 by Black Forest Labs, an image-editing expert. You convert editing requests into one concise instruction (50-80 words, ~30 for brief requests).
+
+ Rules:
+ - Single instruction only, no commentary
+ - Use clear, analytical language (avoid "whimsical," "cascading," etc.)
+ - Specify what changes AND what stays the same (face, lighting, composition)
+ - Reference actual image elements
+ - Turn negatives into positives ("don't change X" → "keep X")
+ - Make abstractions concrete ("futuristic" → "glowing cyan neon, metallic panels")
+ - Keep content PG-13
+
+ Output only the final instruction in plain text and nothing else."""
+
+
+ def remote_text_encoder(prompts):
+     from gradio_client import Client
+
+     client = Client("multimodalart/mistral-text-encoder")
+     result = client.predict(prompt=prompts, api_name="/encode_text")
+
+     # torch.load returns a tensor, on CPU by default
+     prompt_embeds = torch.load(result[0])
+     return prompt_embeds
+
+
+ # Load model
+ repo_id = "black-forest-labs/FLUX.2-dev"
+
+ print("Loading Flux.2 model...")
+ dit = Flux2Transformer2DModel.from_pretrained(
+     repo_id,
+     subfolder="transformer",
+     torch_dtype=torch.bfloat16,
+     token=HF_TOKEN,
+ )
+
+ pipe = Flux2Pipeline.from_pretrained(
+     repo_id,
+     text_encoder=None,
+     transformer=dit,
+     torch_dtype=torch.bfloat16,
+     token=HF_TOKEN,
+ )
+ pipe.to(device)
+
+ # Pull pre-compiled Flux2 Transformer blocks from HF hub for ZeroGPU
+ print("Loading pre-compiled blocks for ZeroGPU...")
+ spaces.aoti_blocks_load(pipe.transformer, "zerogpu-aoti/FLUX.2", variant="fa3")
+
+
+ def image_to_data_uri(img):
+     buffered = io.BytesIO()
+     img.save(buffered, format="PNG")
+     img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
+     return f"data:image/png;base64,{img_str}"
+
+
+ def upsample_prompt_logic(prompt, image_list):
+     """Upsample the prompt using a VLM, if a client is available."""
+     if not hf_client:
+         return prompt
+
+     try:
+         if image_list and len(image_list) > 0:
+             # Image + text editing mode
+             system_content = SYSTEM_PROMPT_WITH_IMAGES
+
+             # Construct user message with text and images
+             user_content = [{"type": "text", "text": prompt}]
+
+             for img in image_list:
+                 data_uri = image_to_data_uri(img)
+                 user_content.append(
+                     {"type": "image_url", "image_url": {"url": data_uri}}
+                 )
+
+             messages = [
+                 {"role": "system", "content": system_content},
+                 {"role": "user", "content": user_content},
+             ]
+         else:
+             # Text-only mode
+             system_content = SYSTEM_PROMPT_TEXT_ONLY
+             messages = [
+                 {"role": "system", "content": system_content},
+                 {"role": "user", "content": prompt},
+             ]
+
+         completion = hf_client.chat.completions.create(
+             model=VLM_MODEL, messages=messages, max_tokens=1024
+         )
+
+         return completion.choices[0].message.content
+     except Exception as e:
+         print(f"Upsampling failed: {e}")
+         return prompt
+
+
+ def update_dimensions_from_image(image_list):
+     """Update width/height sliders based on the uploaded image's aspect ratio.
+     Keeps one side at 1024 and scales the other proportionally, with both sides as multiples of 8.
+     """
+     if image_list is None or len(image_list) == 0:
+         return 1024, 1024  # Default dimensions
+
+     # Get the first image to determine dimensions
+     img = image_list[0][0]  # Gallery returns a list of (image, caption) tuples
+     img_width, img_height = img.size
+
+     aspect_ratio = img_width / img_height
+
+     if aspect_ratio >= 1:  # Landscape or square
+         new_width = 1024
+         new_height = int(1024 / aspect_ratio)
+     else:  # Portrait
+         new_height = 1024
+         new_width = int(1024 * aspect_ratio)
+
+     # Round to the nearest multiple of 8
+     new_width = round(new_width / 8) * 8
+     new_height = round(new_height / 8) * 8
+
+     # Ensure within valid range (minimum 256, maximum 1024)
+     new_width = max(256, min(1024, new_width))
+     new_height = max(256, min(1024, new_height))
+
+     return new_width, new_height
+
+
+ # Duration estimate for ZeroGPU; must match generate_image's arguments (including progress)
+ def get_duration(
+     prompt_embeds,
+     image_list,
      width,
      height,
+     num_inference_steps,
      guidance_scale,
+     seed,
+     progress=gr.Progress(track_tqdm=True),
+ ):
+     num_images = 0 if image_list is None else len(image_list)
+     step_duration = 1 + 0.8 * num_images
+     return max(65, num_inference_steps * step_duration + 10)
+
+
+ @spaces.GPU(duration=get_duration)
+ def generate_image(
+     prompt_embeds,
+     image_list,
+     width,
+     height,
      num_inference_steps,
+     guidance_scale,
+     seed,
+     progress=gr.Progress(track_tqdm=True),
+ ):
+     # Move embeddings to the GPU only inside the GPU-decorated function
+     prompt_embeds = prompt_embeds.to(device)
+
+     generator = torch.Generator(device=device).manual_seed(seed)
+
+     pipe_kwargs = {
+         "prompt_embeds": prompt_embeds,
+         "image": image_list,
+         "num_inference_steps": num_inference_steps,
+         "guidance_scale": guidance_scale,
+         "generator": generator,
+         "width": width,
+         "height": height,
+     }
+
+     # Progress bar for the actual generation steps
+     if progress:
+         progress(0, desc="Starting generation...")
+
+     image = pipe(**pipe_kwargs).images[0]
+     return image
+
+
+ def infer(
+     prompt,
+     input_images=None,
+     seed=42,
+     randomize_seed=False,
+     width=1024,
+     height=1024,
+     num_inference_steps=30,
+     guidance_scale=4.0,
+     prompt_upsampling=False,
      progress=gr.Progress(track_tqdm=True),
  ):
+
      if randomize_seed:
          seed = random.randint(0, MAX_SEED)

+     # Prepare image list (convert None or an empty gallery to None)
+     image_list = None
+     if input_images is not None and len(input_images) > 0:
+         image_list = []
+         for item in input_images:
+             image_list.append(item[0])
+
+     # 1. Upsampling (network-bound, no GPU needed)
+     final_prompt = prompt
+     if prompt_upsampling:
+         progress(0.05, desc="Upsampling prompt...")
+         final_prompt = upsample_prompt_logic(prompt, image_list)
+         print(f"Original Prompt: {prompt}")
+         print(f"Upsampled Prompt: {final_prompt}")
+
+     # 2. Text encoding (network-bound, no GPU needed)
+     progress(0.1, desc="Encoding prompt...")
+     # This returns CPU tensors
+     prompt_embeds = remote_text_encoder(final_prompt)

+     # 3. Image generation (GPU-bound)
+     progress(0.3, desc="Waiting for GPU...")
+     image = generate_image(
+         prompt_embeds,
+         image_list,
+         width,
+         height,
+         num_inference_steps,
+         guidance_scale,
+         seed,
+         progress,
+     )

      return image, seed


  examples = [
+     ["Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"],
+     ["An astronaut riding a green horse"],
+     ["A delicious ceviche cheesecake slice"],
+     [
+         "Create a vase on a table in living room, the color of the vase is a gradient of color, starting with #02eb3c color and finishing with #edfa3c. The flowers inside the vase have the color #ff0088"
+     ],
+     [
+         "Soaking wet capybara taking shelter under a banana leaf in the rainy jungle, close up photo"
+     ],
  ]

  css = """
  #col-container {
      margin: 0 auto;
+     max-width: 1200px;
+ }
+ .gallery-container img {
+     object-fit: contain;
  }
  """

  with gr.Blocks(css=css) as demo:
      with gr.Column(elem_id="col-container"):
+         gr.Markdown(
+             """# FLUX.2 [dev] Text-to-Image
+ FLUX.2 [dev] is a 32B rectified flow model capable of generating, editing, and combining images based on text instructions [[model](https://huggingface.co/black-forest-labs/FLUX.2-dev)], [[blog](https://bfl.ai/blog/flux-2)]
+             """
+         )

          with gr.Row():
+             with gr.Column():
+                 with gr.Row():
+                     prompt = gr.Text(
+                         label="Prompt",
+                         show_label=False,
+                         max_lines=2,
+                         placeholder="Enter your prompt",
+                         container=False,
+                         scale=3,
+                     )

+                     run_button = gr.Button("Run", scale=1, variant="primary")

+                 with gr.Accordion("Input image(s) (optional)", open=False):
+                     input_images = gr.Gallery(
+                         label="Input Image(s)",
+                         type="pil",
+                         columns=3,
+                         rows=1,
+                         info="Upload images for editing or combining",
+                     )

+                 with gr.Accordion("Advanced Settings", open=False):
+                     prompt_upsampling = gr.Checkbox(
+                         label="Prompt Upsampling",
+                         value=False,
+                         info="Automatically enhance the prompt using a VLM (requires HF_TOKEN)",
+                     )
+
+                     seed = gr.Slider(
+                         label="Seed",
+                         minimum=0,
+                         maximum=MAX_SEED,
+                         step=1,
+                         value=0,
+                     )
+
+                     randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
+
+                     with gr.Row():
+                         width = gr.Slider(
+                             label="Width",
+                             minimum=256,
+                             maximum=MAX_IMAGE_SIZE,
+                             step=8,
+                             value=1024,
+                         )
+
+                         height = gr.Slider(
+                             label="Height",
+                             minimum=256,
+                             maximum=MAX_IMAGE_SIZE,
+                             step=8,
+                             value=1024,
+                         )
+
+                     with gr.Row():
+                         num_inference_steps = gr.Slider(
+                             label="Number of inference steps",
+                             minimum=1,
+                             maximum=100,
+                             step=1,
+                             value=30,
+                             info="More steps = higher quality but slower",
+                         )
+
+                         guidance_scale = gr.Slider(
+                             label="Guidance scale",
+                             minimum=0.0,
+                             maximum=10.0,
+                             step=0.1,
+                             value=4.0,
+                             info="How closely to follow the prompt",
+                         )
+
+             with gr.Column():
+                 result = gr.Image(label="Result", show_label=False)
+
+         gr.Examples(examples=examples, inputs=[prompt], cache_examples=False)
+
+     # Auto-update dimensions when images are uploaded
+     input_images.upload(
+         fn=update_dimensions_from_image, inputs=[input_images], outputs=[width, height]
+     )

      gr.on(
          triggers=[run_button.click, prompt.submit],
          fn=infer,
          inputs=[
              prompt,
+             input_images,
              seed,
              randomize_seed,
              width,
              height,
              num_inference_steps,
+             guidance_scale,
+             prompt_upsampling,
          ],
          outputs=[result, seed],
      )
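The `get_duration` helper above feeds `@spaces.GPU(duration=...)`, which reserves ZeroGPU time up front: roughly one second per step, plus 0.8 s of per-step cost for each reference image, plus a 10-second buffer, floored at 65 seconds. A standalone check of the arithmetic (`estimate_duration` is just a renamed copy for illustration):

```python
# Standalone copy of the get_duration arithmetic from app.py:
# each reference image adds 0.8 s of per-step cost.
def estimate_duration(num_inference_steps: int, num_images: int) -> float:
    step_duration = 1 + 0.8 * num_images
    return max(65, num_inference_steps * step_duration + 10)

print(estimate_duration(30, 0))   # 65    (30 * 1.0 + 10 = 40, floored to 65)
print(estimate_duration(30, 2))   # 88.0  (30 * 2.6 + 10)
print(estimate_duration(100, 1))  # 190.0 (100 * 1.8 + 10)
```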
requirements.txt CHANGED
@@ -1,6 +1,13 @@
- accelerate
- diffusers
- invisible_watermark
- torch
  transformers
- xformers

+ git+https://github.com/huggingface/diffusers.git@cb9f124657ee2107ec9a4901b823a427e0fd6468
  transformers
+ accelerate
+ safetensors
+ bitsandbytes
+ torchao
+ kernels
+ spaces==0.43.0
+ gradio
+ gradio-client
+ huggingface-hub
+ pillow
+ numpy
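Note that diffusers is pinned to a specific git commit rather than a PyPI release, presumably because the `Flux2Pipeline` and `Flux2Transformer2DModel` classes that app.py imports had not yet shipped in a stable version at the time of this commit. A quick post-install sanity check (a sketch, not part of the Space):

```python
# Verify the installed diffusers build exposes the FLUX.2 classes app.py
# needs; a stable PyPI release may not include them yet.
try:
    from diffusers import Flux2Pipeline, Flux2Transformer2DModel  # noqa: F401
    import diffusers
    print(f"OK: diffusers {diffusers.__version__} provides the Flux2 classes")
except ImportError as err:
    print(f"Flux2 classes missing; install the pinned commit from requirements.txt ({err})")
```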