## How to use
```python
import spaces
import torch
from diffusers.pipelines.flux.pipeline_flux import FluxPipeline

# Load the base FLUX.1-dev pipeline in bfloat16
pipeline = FluxPipeline.from_pretrained(
    'black-forest-labs/FLUX.1-dev',
    torch_dtype=torch.bfloat16,
).to('cuda')

# Swap in the ahead-of-time compiled transformer from this repo
spaces.aoti_load(
    module=pipeline.transformer,
    repo_id='cbensimon/FLUX.1-aot-h200',
)
```
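
After `spaces.aoti_load` swaps in the compiled transformer, the pipeline is called like any other diffusers `FluxPipeline`. A minimal sketch follows; the prompt and sampling parameters are illustrative assumptions, not values taken from this repo:

```python
# Illustrative usage only; parameter values are assumptions, not repo defaults.
image = pipeline(
    'A photo of an astronaut riding a horse on the moon',
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save('astronaut.png')
```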
## How to reproduce or customize
```bash
# Install the hf CLI
curl -LsSf https://hf.co/cli/install.sh | bash

# Log in
hf auth login

# Get the job file and edit it if needed
hf download cbensimon/FLUX.1-aot-h200 job.py

# Choose a destination repository
OUTPUT_REPO_ID=<output-repo-id>

# Run the job, changing the flavor or image if needed
hf jobs uv run job.py \
    --flavor h200 \
    --image pytorch/pytorch:2.9.1-cuda12.8-cudnn9-devel \
    -e PARENT_REPO_ID=cbensimon/FLUX.1-aot-h200 \
    -e OUTPUT_REPO_ID=$OUTPUT_REPO_ID \
    --secrets HF_TOKEN
```
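
The job script itself is not reproduced here. Purely as a hedged illustration of the kind of ahead-of-time compilation such a job performs, the sketch below shows PyTorch's `torch.export` + AOTInductor compile/package/load flow on a toy module; this is an assumption about the approach, and the actual `job.py` may use different helpers or settings:

```python
# Hypothetical sketch only -- NOT the contents of job.py.
# Illustrates the torch.export + AOTInductor compile/package/load flow that
# ahead-of-time compilation of a module like pipeline.transformer relies on.
import torch
from torch.export import export
from torch._inductor import aoti_compile_and_package, aoti_load_package


class TinyModule(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.silu(x) * 2.0


module = TinyModule().eval()
example_inputs = (torch.randn(8, 16),)

# 1. Trace the module into an ExportedProgram
exported = export(module, example_inputs)

# 2. Compile ahead of time and package the result as a .pt2 artifact
package_path = aoti_compile_and_package(
    exported,
    package_path='tiny_module.pt2',
)

# 3. Load the packaged artifact and run it like a regular callable
compiled = aoti_load_package(package_path)
print(compiled(*example_inputs).shape)
```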
## Environment
<details>
<summary>Click to expand</summary>

```text
PyTorch version: 2.9.1+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.1.2
Libc version: glibc-2.35
Python version: 3.10.0 (default, Oct 18 2021, 02:11:22) [Clang 13.0.0 ] (64-bit runtime)
Python platform: Linux-6.12.64-87.122.amzn2023.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.8.93
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA H200
Nvidia driver version: 580.126.09
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8488C
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
Stepping: 8
BogoMIPS: 4800.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 4.5 MiB (96 instances)
L1i cache: 3 MiB (96 instances)
L2 cache: 192 MiB (96 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
Versions of relevant libraries:
[pip3] Could not collect
[conda] numpy 2.3.4 py311h2e04523_0 conda-forge
[conda] nvidia-cublas-cu12 12.8.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.10.2.21 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.3.83 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.9.90 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.3.90 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.8.93 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.7.1 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.27.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.8.90 pypi_0 pypi
[conda] optree 0.17.0 pypi_0 pypi
[conda] torch 2.9.1+cu128 pypi_0 pypi
[conda] torchaudio 2.9.1+cu128 pypi_0 pypi
[conda] torchelastic 0.2.2 pypi_0 pypi
[conda] torchvision 0.24.1+cu128 pypi_0 pypi
[conda] triton 3.5.1 pypi_0 pypi
```

</details>