# Granite 4.0 1B Speech (ONNX, fp32)

Production-ready ONNX export (fp32) of ibm-granite/granite-4.0-1b-speech for distributed automatic speech recognition, powered by the Aether edge inference runtime on Edgework.ai.
## Model Details
| Property | Value |
|---|---|
| Base model | ibm-granite/granite-4.0-1b-speech |
| Parameters | 1B |
| Architecture | Granite Speech |
| Precision | fp32 |
| Format | ONNX |
| Size | ~2 GB |
| License | apache-2.0 |
## Deployment Architecture

This model runs on the Aether distributed inference runtime, a custom engine that shards model layers across multiple nodes for parallel execution:

- A coordinator receives requests and manages token generation
- Layer nodes each hold a contiguous subset of model layers (2 nodes for this model)
- Hidden states flow between nodes via gRPC
- Warm-pool scheduling eliminates cold starts
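The flow above can be sketched in miniature. This is an illustrative, in-process stand-in for the Aether runtime, not its actual API: the `LayerNode` and `Coordinator` names are hypothetical, toy affine "layers" stand in for transformer layers, and plain function calls stand in for the gRPC hops that carry hidden states between nodes.

```python
# Minimal sketch of pipeline-parallel layer sharding, assuming the
# coordinator/layer-node split described above. Not the Aether API.

def make_layer(w, b):
    # Toy "layer": elementwise affine transform on the hidden state.
    return lambda h: [w * x + b for x in h]

class LayerNode:
    """Holds one shard: a contiguous subset of the model's layers."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, hidden):
        # In Aether this call would arrive over gRPC; here it is in-process.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

class Coordinator:
    """Routes the hidden state through each shard in order."""
    def __init__(self, nodes):
        self.nodes = nodes

    def generate_step(self, hidden):
        for node in self.nodes:
            hidden = node.forward(hidden)
        return hidden

layers = [make_layer(2.0, 1.0), make_layer(0.5, 0.0),
          make_layer(3.0, -1.0), make_layer(1.0, 2.0)]
# Shard the 4 toy layers across 2 nodes, as this model is deployed.
coord = Coordinator([LayerNode(layers[:2]), LayerNode(layers[2:])])

sharded = coord.generate_step([1.0, 2.0])

# Sanity check: sharding must not change the math.
full = [1.0, 2.0]
for layer in layers:
    full = layer(full)
assert sharded == full
```

The key property the sketch demonstrates is that sharding is transparent: running the shards in sequence produces exactly the same hidden state as running the unsharded layer stack, so only latency and placement change, never the output.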
Deployed via Edgework.ai, which brings fast, low-cost, private inference as close to the user as possible.
## About
Published by AFFECTIVELY · Managed by @buley
We quantize and publish production-ready models for distributed edge inference via the Aether runtime. Every release is tested for correctness and stability before publication.
## Model tree for affectively-ai/granite-4.0-1b-speech-onnx

- Base model: ibm-granite/granite-4.0-1b-base
- Fine-tuned: ibm-granite/granite-4.0-1b-speech