# IndicConformer 600M → ONNX (repackaged for Vernacula)
This repo republishes AI4Bharat's 22-language
`ai4bharat/indic-conformer-600m-multilingual`
as a single self-contained ONNX package, in the shape the
Vernacula desktop ASR
app expects its on-disk model directories to be. Only the CTC head is shipped;
the RNNT components from the source repo are not included here.
All numerical behavior is identical to the upstream encoder+CTC graph; only the on-disk packaging differs.
## Contents
| File | Purpose |
|---|---|
| `encoder-model.onnx` (+ `.data` sidecar) | Conformer encoder, `[features, features_lens] -> [encoded, encoded_lens]` |
| `ctc_decoder-model.onnx` | Single Conv1d → 5633-dim logits (22 × 256 language tokens + 1 shared CTC blank at id 5632) |
| `nemo128.onnx` | DFT-conv1d 80-mel preprocessor, `[waveforms, waveforms_lens] -> [features, features_lens]` |
| `vocab.txt` | Flat 5632-line vocab, id = line index; shared CTC blank is implicit at id 5632 |
| `language_spans.json` | 22 × `{start, length}` entries: which slice of `vocab.txt` each language's 256 tokens occupy |
| `config.json` | Preprocessor frontend params + CTC blank id |
| `manifest.json` | Per-file MD5 hashes (used by Vernacula's download verifier) |
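The three graphs chain together in the order preprocessor → encoder → CTC decoder. Below is a hypothetical sketch of that wiring with `onnxruntime`; the tensor names follow the table above, but the function name, the `int64` dtype for the length tensors, and the single-input CTC call are assumptions, not Vernacula's actual code:

```python
def run_ctc_pipeline(model_dir, wav, wav_len):
    """Run waveform -> 80-mel features -> encoder states -> 5633-dim logits.

    wav: float32 array of shape [batch, samples] at 16 kHz.
    Hypothetical sketch: int64 length dtype and the exact run() call
    shapes are assumptions about the exported graphs.
    """
    import os
    import numpy as np
    import onnxruntime as ort  # assumed to be installed; not bundled here

    pre = ort.InferenceSession(os.path.join(model_dir, "nemo128.onnx"))
    enc = ort.InferenceSession(os.path.join(model_dir, "encoder-model.onnx"))
    ctc = ort.InferenceSession(os.path.join(model_dir, "ctc_decoder-model.onnx"))

    lens = np.asarray([wav_len], dtype=np.int64)
    feats, feat_lens = pre.run(None, {"waveforms": wav, "waveforms_lens": lens})
    encoded, enc_lens = enc.run(None, {"features": feats, "features_lens": feat_lens})
    (logits,) = ctc.run(None, {"encoded": encoded})
    return logits, enc_lens
```

The renamed I/O tensors (see the transformations section) are what make this same loading code work for both the 600M and 120M packages.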
## Transformations applied vs upstream
- Encoder ONNX: consolidated the ~360 per-tensor external-data blob files (HF's xet layout in the upstream repo) into a single `encoder-model.onnx.data` sidecar so the file set stays manageable. External data was resolved from Constant-node attributes as well as graph initializers.
- Renamed ONNX I/O tensors so one C# backend loads either this 600M package or a NeMo-fork 120M export without branching:
  - Encoder: `audio_signal` → `features`, `length` → `features_lens`, `outputs` → `encoded`, `encoded_lengths` → `encoded_lens`.
  - CTC decoder: `encoder_output` → `encoded`, `logprobs` → `logits`.
- Vocab flatten: upstream `vocab.json` is a 22-key dict with 257 entries per language (`[<unk>, t1..t256]`). It was flattened to a single 5632-line `vocab.txt`, keeping `<unk>` at local index 0 and the 255 real tokens at local indices 1..255 per language. The 257th upstream slot is unused padding mirroring the RNNT head layout; it would never be decoded by the 256-dim CTC softmax.
- Masks → spans: upstream `language_masks.json` holds 22 per-language length-5633 boolean arrays. They were verified to resolve to contiguous 256-token ranges, then compressed to 22 × `{start, length}` entries.
- Preprocessor: upstream ships TorchScript (`preprocessor.ts`). It is replaced here with a custom DFT-conv1d ONNX graph (no `STFT` op: ONNX Runtime's STFT diverges from PyTorch's on current toolchains). The frontend config is byte-identical to upstream: sample rate 16 kHz, 80 mels, n_fft 512, hop 160, win 400, Hann window, preemphasis 0.97, log with an additive guard, per-feature normalization, power spectrogram.
- Parity: verified end-to-end against upstream on a Hindi Fleurs clip. It decodes to readable Devanagari with realistic WER; the full pipeline (`nemo128` → encoder → `ctc_decoder`) is numerically equivalent to running AI4Bharat's reference `model_onnx.py` against the original `assets/*.onnx` files.
## Citation
Original model by AI4Bharat. Please cite their work when using this repackaged copy; see their model card for details.
## License
MIT, same as upstream.