fused_dense_lib
Pre-built fused_dense_lib CUDA extension from csrc/fused_dense_lib of the
odysseyml/flash-attention fork.
It backs flash_attn.modules.mlp.FusedMLP and provides three functions: linear_bias_wgrad, linear_act_forward, and bias_act_linear_dgrad_bgrad.
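To make the function names concrete, here is a pure-Python reference sketch of the math that two of them are understood to compute. It assumes the extension keeps the semantics of upstream flash-attention: a fused linear-plus-activation forward pass, and the weight and bias gradients of a linear layer. The names `gelu_tanh`, `ref_linear_act_forward`, and `ref_linear_bias_wgrad` are hypothetical helpers for illustration, not part of the extension. The real functions operate on CUDA tensors and take additional arguments that are not shown here. The third function, bias_act_linear_dgrad_bgrad, is omitted.

```python
import math


def gelu_tanh(v):
    """Tanh approximation of GELU (assumed activation; hypothetical helper)."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def ref_linear_act_forward(x, weight, bias, act=gelu_tanh):
    """Reference for act(x @ weight^T + bias).

    x: [batch, in], weight: [out, in], bias: [out]; returns [batch, out].
    Plain nested lists stand in for tensors.
    """
    return [
        [act(sum(xv * wv for xv, wv in zip(row, w_row)) + b)
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


def ref_linear_bias_wgrad(x, grad_out):
    """Reference weight and bias gradients of y = x @ weight^T + bias.

    grad_weight = grad_out^T @ x          -> [out, in]
    grad_bias   = column sums of grad_out -> [out]
    """
    grad_out_t = list(zip(*grad_out))  # [out, batch]
    x_t = list(zip(*x))                # [in, batch]
    grad_weight = [
        [sum(g * xv for g, xv in zip(g_col, x_col)) for x_col in x_t]
        for g_col in grad_out_t
    ]
    grad_bias = [sum(g_col) for g_col in grad_out_t]
    return grad_weight, grad_bias
```

A reference like this can serve as a small-shape correctness check against the fused kernels, within floating-point tolerance, once the actual call signatures have been confirmed from the fork's source.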
Neither upstream wheels nor a community kernel exists for this extension, so the
full torch/CUDA matrix is built by the fork's build-fused-dense-matrix.yml workflow
(release fused-dense-matrix-1) and repackaged here in kernels layout.
Build variants: torch 2.9–2.12 × cu126–cu132, on x86_64 and aarch64, for cp312. All builds target sm80 and sm90; builds for cu128 and later also target sm100 and sm120.
from kernels import get_kernel
fused_dense_lib = get_kernel("odyssey-systems/fused-dense-lib")