fused_dense_lib

Pre-built fused_dense_lib CUDA extension, built from csrc/fused_dense_lib in the odysseyml/flash-attention fork. It backs flash_attn.modules.mlp.FusedMLP and provides the three ops that module uses: linear_bias_wgrad, linear_act_forward, and bias_act_linear_dgrad_bgrad.
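For orientation, the pure-Python sketch below spells out what the three ops are assumed to compute. The assumptions come from their names and from how FusedMLP in upstream flash-attention chains them: a linear layer plus activation in the forward pass, the weight/bias gradient of a linear layer, and the backward pass through the second linear layer's input, the activation, and the first layer's bias. This is an unfused reference only. The `ref_*` names, argument orders, and the use of ReLU as the activation are assumptions for illustration, not the extension's actual signatures.

```python
# Unfused reference sketch (hypothetical names; ReLU assumed as the activation).
# Matrices are lists of rows; weights use the nn.Linear layout (out_features x in_features).


def matmul(a, b):
    """Plain (m x k) @ (k x n) matrix product."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def col_sums(m):
    """Sum over the batch dimension (rows), as a bias gradient does."""
    return [sum(col) for col in zip(*m)]


def ref_linear_act_forward(x, w, b):
    """Assumed semantics of linear_act_forward: pre = x @ w^T + b, out = act(pre).

    Both are returned because the backward pass needs the pre-activation."""
    pre = [[v + b[j] for j, v in enumerate(row)] for row in matmul(x, transpose(w))]
    out = [[v if v > 0 else 0 for v in row] for row in pre]  # ReLU
    return out, pre


def ref_linear_bias_wgrad(x, grad_out):
    """Assumed semantics of linear_bias_wgrad: grad_w = grad_out^T @ x, grad_b = batch sum of grad_out."""
    return matmul(transpose(grad_out), x), col_sums(grad_out)


def ref_bias_act_linear_dgrad_bgrad(w2, grad_out, pre):
    """Assumed semantics of bias_act_linear_dgrad_bgrad: backprop grad_out through the
    second linear layer (dgrad), then through the activation, then take the first layer's bias grad."""
    grad_hidden = matmul(grad_out, w2)  # dgrad of out = hidden @ w2^T
    grad_pre = [
        [g if p > 0 else 0 for g, p in zip(g_row, p_row)]  # multiply by ReLU'(pre)
        for g_row, p_row in zip(grad_hidden, pre)
    ]
    return grad_pre, col_sums(grad_pre)
```

For example, with x = [[1, 2], [3, -4]], w1 = [[1, 0], [0, 1], [1, 1]], and b1 = [0, 0, -1], ref_linear_act_forward returns out = [[1, 2, 2], [3, 0, 0]] and pre = [[1, 2, 2], [3, -4, -2]].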

Neither upstream wheels nor a community kernel exist for this extension. The full torch/CUDA matrix is therefore built by the fork's build-fused-dense-matrix.yml workflow (release fused-dense-matrix-1) and repackaged here in the layout used by the kernels library.

Build variants: torch 2.9–2.12 × cu126–cu132, on x86_64 and aarch64, for CPython 3.12 (cp312). Every variant targets sm80 and sm90; cu128 and later variants also target sm100 and sm120.
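The matrix above can be encoded as a small pre-check, sketched below. The helper names (`target_archs`, `in_build_matrix`) are hypothetical and are not part of the kernels library, which resolves variants on its own. The check only tests whether a combination falls inside the stated ranges. It does not confirm that a specific CUDA tag between cu126 and cu132 was actually built, because the exact set of tags is not listed here.

```python
import re

# Hypothetical encoding of the build matrix stated above.
TORCH_MINOR_RANGE = (9, 12)  # torch 2.9 through 2.12, inclusive
CUDA_RANGE = (126, 132)  # cu126 through cu132, inclusive (exact tags in between not listed)
MACHINES = {"x86_64", "aarch64"}
PYTHON = (3, 12)  # cp312 only


def cuda_tag_value(tag):
    """Turn a tag such as 'cu128' into the integer 128."""
    m = re.fullmatch(r"cu(\d+)", tag)
    if m is None:
        raise ValueError(f"not a CUDA tag: {tag!r}")
    return int(m.group(1))


def target_archs(cuda_tag):
    """sm80/sm90 for every variant; sm100/sm120 added from cu128 on."""
    archs = ["sm80", "sm90"]
    if cuda_tag_value(cuda_tag) >= 128:
        archs += ["sm100", "sm120"]
    return archs


def in_build_matrix(torch_version, cuda_tag, machine, python_version):
    """True if the combination falls inside the ranges listed above."""
    m = re.match(r"(\d+)\.(\d+)", torch_version)
    if m is None:
        return False
    major, minor = int(m.group(1)), int(m.group(2))
    torch_ok = major == 2 and TORCH_MINOR_RANGE[0] <= minor <= TORCH_MINOR_RANGE[1]
    cuda_ok = CUDA_RANGE[0] <= cuda_tag_value(cuda_tag) <= CUDA_RANGE[1]
    return torch_ok and cuda_ok and machine in MACHINES and tuple(python_version[:2]) == PYTHON
```

In practice the arguments could come from torch.__version__, platform.machine(), and sys.version_info. Deriving the cu tag from torch.version.cuda (for example "12.8" to "cu128") is left to the caller.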

from kernels import get_kernel

fused_dense_lib = get_kernel("odyssey-systems/fused-dense-lib")
