Granite 4.0 1B Speech (ONNX, fp32)

A production-ready ONNX (fp32) build of ibm-granite/granite-4.0-1b-speech for distributed automatic speech recognition, powered by the Aether edge inference runtime on Edgework.ai.

Model Details

Property       Value
Base model     ibm-granite/granite-4.0-1b-speech
Parameters     1B
Architecture   Granite Speech
Precision      ONNX (fp32)
Format         ONNX
Size           ~2 GB
License        apache-2.0

Deployment Architecture

This model runs on the Aether distributed inference runtime — a custom engine that shards model layers across multiple nodes for parallel execution:

  1. Coordinator receives requests and manages token generation
  2. Layer nodes each hold a subset of model layers (2 nodes for this model)
  3. Hidden states flow between nodes via gRPC
  4. Zero cold start via warm pool scheduling
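The hand-off described in steps 1–3 can be sketched as a toy simulation. This is an illustrative sketch only, not the Aether implementation: the layer weights, activation, and class names below are invented for the example, and the node-to-node hand-off that happens over gRPC in production is reduced to a plain method call.

```python
import math
import random

random.seed(0)

HIDDEN = 8  # toy hidden-state width

def make_layer():
    # Random square weight matrix standing in for one model layer.
    return [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def apply_layer(hidden, weights):
    # hidden: vector of length HIDDEN; weights: HIDDEN x HIDDEN matrix.
    return [math.tanh(sum(h * weights[i][j] for i, h in enumerate(hidden)))
            for j in range(HIDDEN)]

layers = [make_layer() for _ in range(4)]

class LayerNode:
    """Holds a contiguous shard of layers; in Aether each shard is a separate node."""
    def __init__(self, shard):
        self.shard = shard

    def forward(self, hidden):
        # Apply this shard's layers in order; the result is forwarded onward.
        for w in self.shard:
            hidden = apply_layer(hidden, w)
        return hidden

# Two nodes with two layers each, mirroring this model's 2-node deployment.
nodes = [LayerNode(layers[:2]), LayerNode(layers[2:])]

def coordinator(hidden):
    # The coordinator streams hidden states node to node (gRPC in production).
    for node in nodes:
        hidden = node.forward(hidden)
    return hidden

x = [random.gauss(0, 1) for _ in range(HIDDEN)]
out = coordinator(x)
print(len(out))  # 8
```

Because the shards are applied in sequence, running the two nodes back to back is equivalent to running all four layers on one machine; the win is that no single node has to hold the full set of weights.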

Deployed via Edgework.ai — bringing fast, cheap, and private inference as close to the user as possible.

About

Published by AFFECTIVELY · Managed by @buley

We quantize and publish production-ready models for distributed edge inference via the Aether runtime. Every release is tested for correctness and stability before publication.

Repository: affectively-ai/granite-4.0-1b-speech-onnx