Granite 4.0 1B Speech (ONNX, fp32)

A production-ready ONNX (fp32) build of ibm-granite/granite-4.0-1b-speech for distributed automatic speech recognition, powered by the Aether edge inference runtime on Edgework.ai.

Model Details

Property       Value
Base model     ibm-granite/granite-4.0-1b-speech
Parameters     1B
Architecture   Granite Speech
Precision      ONNX (fp32)
Format         ONNX
Size           ~2 GB
License        apache-2.0

Deployment Architecture

This model runs on the Aether distributed inference runtime — a custom engine that shards model layers across multiple nodes for parallel execution:

  1. Coordinator receives requests and manages token generation
  2. Layer nodes each hold a subset of model layers (2 nodes for this model)
  3. Hidden states flow between nodes via gRPC
  4. Zero cold start via warm pool scheduling
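The hand-off described in steps 1–3 can be sketched as a toy simulation. This is an illustrative sketch only, not the Aether implementation: the layer weights, activation, and class names below are invented for the example, and the node-to-node hand-off that happens over gRPC in production is reduced to a plain method call.

```python
import math
import random

random.seed(0)

HIDDEN = 8  # toy hidden-state width

def make_layer():
    # Random square weight matrix standing in for one model layer.
    return [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def apply_layer(hidden, weights):
    # hidden: vector of length HIDDEN; weights: HIDDEN x HIDDEN matrix.
    return [math.tanh(sum(h * weights[i][j] for i, h in enumerate(hidden)))
            for j in range(HIDDEN)]

layers = [make_layer() for _ in range(4)]

class LayerNode:
    """Holds a contiguous shard of layers; in Aether each shard is a separate node."""
    def __init__(self, shard):
        self.shard = shard

    def forward(self, hidden):
        # Apply this shard's layers in order; the result is forwarded onward.
        for w in self.shard:
            hidden = apply_layer(hidden, w)
        return hidden

# Two nodes with two layers each, mirroring this model's 2-node deployment.
nodes = [LayerNode(layers[:2]), LayerNode(layers[2:])]

def coordinator(hidden):
    # The coordinator streams hidden states node to node (gRPC in production).
    for node in nodes:
        hidden = node.forward(hidden)
    return hidden

x = [random.gauss(0, 1) for _ in range(HIDDEN)]
out = coordinator(x)
print(len(out))  # 8
```

Because the shards are applied in sequence, running the two nodes back to back is equivalent to running all four layers on one machine; the win is that no single node has to hold the full set of weights.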

Deployed via Edgework.ai — bringing fast, cheap, and private inference as close to the user as possible.

About

Published by AFFECTIVELY · Managed by @buley

We quantize and publish production-ready models for distributed edge inference via the Aether runtime. Every release is tested for correctness and stability before publication.

Repository: affectively-ai/granite-4.0-1b-speech-onnx