ProtocolVoice models

Offline models for the ProtocolVoice Android app: voice transcription, speaker diarization, and on-device interview summarization.

All models run on the device, with no cloud calls.

Contents

Russian ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| gigaam_v3_e2e_ctc_int8.onnx | 305 MB | Russian ASR with built-in punctuation | Sber/SaluteDevices GigaAM (v3, e2e CTC, int8-quantized) | MIT |

English ASR

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| en/whisper_base_en_encoder_int8.onnx | 28 MB | Whisper base.en encoder | OpenAI Whisper via sherpa-onnx | MIT |
| en/whisper_base_en_decoder_int8.onnx | 125 MB | Whisper base.en decoder | OpenAI Whisper via sherpa-onnx | MIT |
| en/whisper_base_en_tokens.txt | 0.8 MB | Whisper tokens vocab | OpenAI Whisper | MIT |

Speaker diarization (works for any language)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| speaker_embedding_camplus.onnx | 27 MB | Speaker embedding (CAM++), recommended default | modelscope/3D-Speaker | Apache-2.0 |
| speaker_embedding.onnx | 111 MB | Speaker embedding (ERes2Net V1), best quality | modelscope/3D-Speaker | Apache-2.0 |
| speaker_embedding_v2.onnx | 68 MB | Speaker embedding (ERes2NetV2) | modelscope/3D-Speaker | Apache-2.0 |

Russian summarization (Default tier: NER-based, no LLM)

| File | Size | Purpose | Original source | License |
|---|---|---|---|---|
| summary/navec_news.tar | 25 MB | Navec quantized word embeddings (250K Russian words, 300-dim, PQ-100) | natasha/navec | MIT |
| summary/slovnet_ner.tar | 2.3 MB | Slovnet NER weights (WordCNN + CRF, PER/LOC/ORG) | natasha/slovnet | MIT |

These two files together (28 MB total) enable offline Russian named entity recognition and LexRank-based extractive summarization. ProtocolVoice uses them to extract names, organizations, locations, and key quotes from interview transcripts. No LLM is required: the extraction is fully deterministic and factual.
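To illustrate the LexRank part, here is a minimal Python sketch that scores sentences by their centrality in a TF-IDF cosine-similarity graph (the continuous LexRank variant). It is not the app's Kotlin code. The whitespace tokenizer, the smoothed IDF formula, the 0.85 damping factor, the fixed iteration count, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per sentence."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    # Iterating a Counter yields each distinct word once, so this is document frequency.
    df = Counter(word for doc in docs for word in doc)
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style centrality over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize; a sentence with no similar neighbours links uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


def top_sentences(sentences, k=3):
    """Return the k highest-scoring sentences in their original order."""
    scores = lexrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Sentences that share vocabulary with many others accumulate score, while off-topic sentences stay near the baseline, which is why the top-ranked sentences tend to work as representative quotes.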

Manifest

| File | Size | Purpose |
|---|---|---|
| manifest.json | < 2 KB | SHA-256 hashes and metadata for all models |

Important: attribution

These are NOT new models. This repository redistributes existing models in formats convenient for mobile delivery. The original authors retain all credit and copyright. We did not train, fine-tune, or modify the model weights.

Please cite the original projects listed in the tables above, not this redistribution.

Why this redistribution

The ProtocolVoice mobile app needs to download these models on first run from a mirror that:

  • supports files larger than 100 MB without git-lfs limits,
  • has a fast CDN reachable from Russia,
  • is the conventional hosting platform for ML models.

All redistributed files retain their original licenses. This README serves as the required attribution under those licenses.

How the app uses these models

ASR + diarization (loaded via sherpa-onnx):

  1. App downloads .onnx files from https://huggingface.co/protocolvoice/asr-models/resolve/main/{filename}
  2. Verifies SHA-256 against manifest.json
  3. Loads via sherpa-onnx for offline inference
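
Steps 1 and 2 can be sketched in Python as follows. This is an illustration of the download-then-verify flow, not the app's Android code. The helper names `model_url` and `download_verified` and the `.part` temporary suffix are hypothetical, and the expected hash is assumed to have already been read from manifest.json.

```python
import hashlib
import os
import urllib.request

BASE_URL = "https://huggingface.co/protocolvoice/asr-models/resolve/main/"


def model_url(filename):
    """Build the download URL for a file in the repository."""
    return BASE_URL + filename


def download_verified(url, dest, expected_sha256, chunk_size=1 << 20):
    """Stream url to dest, hashing on the fly; keep the file only if the hash matches."""
    digest = hashlib.sha256()
    tmp = dest + ".part"  # hypothetical temp name for the in-progress download
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        os.remove(tmp)  # never leave an unverified file behind
        raise ValueError(f"SHA-256 mismatch for {url}")
    os.replace(tmp, dest)  # atomic rename: dest never holds a partial file
    return dest
```

Hashing while streaming avoids a second pass over files as large as 305 MB, and writing to a temporary name first means an interrupted download is never mistaken for a valid model.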

Summarization (Default tier, custom Kotlin port):

  1. App downloads summary/navec_news.tar and summary/slovnet_ner.tar
  2. Extracts both .tar archives into the app's private files directory
  3. Loads weights into a pure-Kotlin reimplementation of Slovnet NER (no PyTorch, no Python, just FloatArray math): WordEmbedding → ShapeEmbedding → 3-layer Conv1D → Linear → CRF Viterbi
  4. Combines NER output with TF-IDF + LexRank to extract top quotes, named entities, risks, and numerical data
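
The last stage of the pipeline in step 3, CRF Viterbi decoding, can be sketched in a few lines. The version below is a generic Python illustration, not the Kotlin port. It assumes per-token emission scores (the output of the Linear layer) and a tag-to-tag transition matrix held in plain lists, and it ignores start and end transitions. The three-tag set in the usage example is hypothetical.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][k]   -- score of tag k at token t
    transitions[j][k] -- score of moving from tag j to tag k
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        prev_best = []
        new_score = []
        for k in range(n_tags):
            # Best previous tag for reaching tag k at this token.
            j = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            prev_best.append(j)
            new_score.append(score[j] + transitions[j][k] + emit[k])
        backpointers.append(prev_best)
        score = new_score
    # Walk back from the best final tag.
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for prev_best in reversed(backpointers):
        best = prev_best[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
tags = ["O", "B-PER", "I-PER"]
emissions = [[0.1, 2.0, 0.0], [0.0, 0.5, 1.5], [2.0, 0.0, 0.3]]
transitions = [[0.0, 0.0, -10.0],  # O -> I-PER is heavily penalized
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.5]]
print([tags[i] for i in viterbi(emissions, transitions)])
# -> ['B-PER', 'I-PER', 'O']
```

The transition matrix is what lets the decoder reject sequences that look good token by token but are invalid overall, such as an entity continuation with no entity start before it.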

Inference performance on Xiaomi 12T: ~6 seconds for a 17,900-word transcript (default tier, NER + LexRank, no LLM).

You can also use these files directly with the upstream libraries (sherpa-onnx, slovnet, navec) in any project that respects the original licenses.

Verifying integrity

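import hashlib

# Hash in 1 MiB chunks so the 305 MB file is never loaded into memory at once.
sha256 = hashlib.sha256()
with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)
print(sha256.hexdigest())
# expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0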

Hashes for all files are in manifest.json.
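
To check every downloaded file in one pass, something like the following sketch could work. The layout of manifest.json is not documented here, so the schema in the code (a top-level "files" list whose entries carry "path" and "sha256" keys) is purely an assumption, as are the function names. Adjust the key names to the real manifest.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path, model_dir):
    """Return {path: status}, where status is 'ok', 'missing', or 'mismatch'."""
    # ASSUMED schema: {"files": [{"path": "...", "sha256": "..."}, ...]}
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    report = {}
    for entry in manifest["files"]:
        target = Path(model_dir) / entry["path"]
        if not target.is_file():
            report[entry["path"]] = "missing"
        elif sha256_of(target) != entry["sha256"].lower():
            report[entry["path"]] = "mismatch"
        else:
            report[entry["path"]] = "ok"
    return report
```

Returning a report instead of raising on the first failure makes it easy to re-download only the files that are missing or corrupted.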

Optional: PRO tier (QVikhr 1.5B)

ProtocolVoice has an optional PRO tier that produces a literary, narrative summary using QVikhr-2.5-1.5B-Instruct-r (1.0 GB GGUF, runs via llama.cpp on-device). The PRO tier is layered on top of the Default tier: Default extracts facts, PRO turns them into a coherent narrative.

The QVikhr GGUF is not hosted in this repo; users download it on demand, directly from the Vikhrmodels HF org or from a separate mirror. The QVikhr authors retain copyright; please cite them, not us.

License

This repository's metadata, README, and packaging scripts are released under Apache-2.0. Each model file remains under its original license (see the tables above). By using a model, you accept its original license, not just this repository's.

Removal request

If you are an author of one of the upstream projects and have any concerns about this redistribution (attribution, hosting, anything else), please open a discussion on this Hugging Face repo or email the maintainers; the files will be amended or removed as requested.
