
Qwen3-8B LiteRT Model

This repository contains a LiteRT-LM version of the Qwen3-8B model, specifically optimized for on-device text generation.

Key Features

  • Quantization: Weights are quantized to INT8 (channel-wise) with torchao, while the KV cache is kept in Float32 to balance performance and accuracy.
  • Optimized for Mobile: Designed for efficient execution on mobile and edge devices via LiteRT.
  • Multilingual Support: Inherits strong multilingual capabilities from the base Qwen3-8B model.
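Channel-wise quantization stores one scale factor per output channel rather than a single scale for the whole tensor, which preserves accuracy for channels with small weight ranges. Below is a minimal NumPy sketch of symmetric per-channel INT8 quantization; it illustrates the scheme only and is not the torchao implementation used to produce this model.

```python
import numpy as np

def quantize_channelwise_int8(w):
    """Symmetric per-channel INT8 quantization: one scale per output row."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover approximate float weights from INT8 values and scales."""
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scales = quantize_channelwise_int8(w)
w_hat = dequantize(q, scales)
print(float(np.abs(w - w_hat).max()))  # small per-element reconstruction error
```

Because each row gets its own scale, a row of small weights is not crushed to zero by an outlier elsewhere in the tensor, which is why channel-wise INT8 typically loses less accuracy than tensor-wise INT8.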

Compatibility

Tested on Linux (CPU).

Model Metadata

  • Filename: Qwen3_8b_channelwise_int8_float32kv.litertlm
  • Model Size: 7.74 GB
  • Quantization Mode: Channel-wise INT8 (Weights), Float32 (KV Cache)
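The listed file size is consistent with INT8 weight storage at roughly one byte per parameter. Assuming an approximate parameter count of 8.2 billion for Qwen3-8B (an estimate, not a figure from this card), the weights alone come to about 7.6 GiB, with per-channel scales and other metadata accounting for the remainder:

```python
params = 8.2e9           # approximate Qwen3-8B parameter count (assumption)
bytes_per_param = 1      # INT8 stores one byte per weight
size_gib = params * bytes_per_param / 2**30
print(f"~{size_gib:.2f} GiB of INT8 weights")  # prints "~7.64 GiB of INT8 weights"
```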

Integration

Ready to integrate this into your product? Get started here.
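For a quick local check, the LiteRT-LM runtime repository provides a command-line demo binary. The invocation below is a hypothetical sketch; the flag names are assumptions, so consult the LiteRT-LM repository for the current build instructions and interface.

```shell
# Hypothetical CLI invocation of the LiteRT-LM demo binary (flags are
# assumptions; verify against the LiteRT-LM repository before use).
litert_lm_main \
  --model_path=Qwen3_8b_channelwise_int8_float32kv.litertlm \
  --backend=cpu \
  --input_prompt="Write a haiku about edge AI."
```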

Citation

If you find our work helpful, please consider citing it:

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}