
Qwen3-8B LiteRT Model

This repository contains a LiteRT-LM version of the Qwen3-8B model, specifically optimized for on-device text generation.

Key Features

  • Quantization: Weights are quantized to INT8 (channel-wise) with torchao, while the KV cache is kept in Float32 to balance performance and accuracy.
  • Optimized for Mobile: Designed for efficient execution on mobile and edge devices via LiteRT.
  • Multilingual Support: Inherits strong multilingual capabilities from the base Qwen3-8B model.
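Channel-wise quantization stores one scale factor per output channel rather than a single scale for the whole tensor, which preserves accuracy for channels with small weight ranges. Below is a minimal NumPy sketch of symmetric per-channel INT8 quantization; it illustrates the scheme only and is not the torchao implementation used to produce this model.

```python
import numpy as np

def quantize_channelwise_int8(w):
    """Symmetric per-channel INT8 quantization: one scale per output row."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover approximate float weights from INT8 values and scales."""
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scales = quantize_channelwise_int8(w)
w_hat = dequantize(q, scales)
print(float(np.abs(w - w_hat).max()))  # small per-element reconstruction error
```

Because each row gets its own scale, a row of small weights is not crushed to zero by an outlier elsewhere in the tensor, which is why channel-wise INT8 typically loses less accuracy than tensor-wise INT8.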

Compatibility

Tested on Linux (CPU).

Model Metadata

  • Filename: Qwen3_8b_channelwise_int8_float32kv.litertlm
  • Model Size: 7.74 GB
  • Quantization Mode: Channel-wise INT8 (Weights), Float32 (KV Cache)
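The listed file size is consistent with INT8 weight storage at roughly one byte per parameter. Assuming an approximate parameter count of 8.2 billion for Qwen3-8B (an estimate, not a figure from this card), the weights alone come to about 7.6 GiB, with per-channel scales and other metadata accounting for the remainder:

```python
params = 8.2e9           # approximate Qwen3-8B parameter count (assumption)
bytes_per_param = 1      # INT8 stores one byte per weight
size_gib = params * bytes_per_param / 2**30
print(f"~{size_gib:.2f} GiB of INT8 weights")  # prints "~7.64 GiB of INT8 weights"
```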

Integration

Ready to integrate this into your product? Get started here.
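For a quick local check, the LiteRT-LM runtime repository provides a command-line demo binary. The invocation below is a hypothetical sketch; the flag names are assumptions, so consult the LiteRT-LM repository for the current build instructions and interface.

```shell
# Hypothetical CLI invocation of the LiteRT-LM demo binary (flags are
# assumptions; verify against the LiteRT-LM repository before use).
litert_lm_main \
  --model_path=Qwen3_8b_channelwise_int8_float32kv.litertlm \
  --backend=cpu \
  --input_prompt="Write a haiku about edge AI."
```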

Citation

If you find our work helpful, please consider citing it:

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}