Bert-Base-Uncased-Hf: Optimized for Qualcomm Devices

Bert-Base-Uncased-Hf is a lightweight BERT model designed for efficient self-supervised learning of language representations. It can be used for masked language modeling and as a backbone for various NLP tasks.

This model is based on the implementation of Bert-Base-Uncased-Hf found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export the model with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime	Precision	Chipset	SDK Versions	Download
ONNX	float	Universal	QAIRT 2.42, ONNX Runtime 1.24.1	Download
ONNX	w8a16	Universal	QAIRT 2.42, ONNX Runtime 1.24.1	Download
QNN_DLC	float	Universal	QAIRT 2.43	Download
QNN_DLC	w8a16	Universal	QAIRT 2.43	Download
TFLITE	float	Universal	QAIRT 2.43, TFLite 2.17.0	Download

For more device-specific assets and performance metrics, visit Bert-Base-Uncased-Hf on Qualcomm® AI Hub.
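The Model Stats section lists the input as 1x384, which suggests the pre-exported assets expect a fixed-length sequence of 384 tokens with a batch size of one. Under that assumption, shorter inputs have to be padded and longer ones truncated before inference. The sketch below shows one way to do this in plain Python. The helper name `pad_to_fixed_length` is hypothetical, the special-token IDs are those of the standard bert-base-uncased vocabulary, and the exact input names and dtypes each asset expects are not stated here and should be checked against the asset itself.

```python
# Special-token IDs from the standard bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
SEQ_LEN = 384  # assumption: taken from "Input resolution: 1x384"


def pad_to_fixed_length(token_ids, seq_len=SEQ_LEN):
    """Wrap token IDs in [CLS] ... [SEP], then truncate or pad to seq_len.

    Returns (input_ids, attention_mask), each a nested list shaped [1, seq_len].
    """
    body = list(token_ids)[: seq_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                  # 1 marks real tokens
    pad = seq_len - len(ids)
    ids += [PAD_ID] * pad
    mask += [0] * pad                      # 0 marks padding
    return [ids], [mask]


# Two illustrative token IDs standing in for a tokenized sentence.
input_ids, attention_mask = pad_to_fixed_length([7592, 2088])
print(len(input_ids[0]), sum(attention_mask[0]))  # prints: 384 4
```

In practice the token IDs would come from the tokenizer that matches the checkpoint; the padding logic stays the same.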

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

Custom weights (e.g., fine-tuned checkpoints)
Custom input shapes
Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Bert-Base-Uncased-Hf on GitHub for usage instructions.

Model Details

Model Type: Text generation

Model Stats:

Model checkpoint: google-bert/bert-base-uncased
Input resolution: 1x384 (batch size x sequence length in tokens)
Number of parameters: 110M
Model size (float): 418 MB
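As a rough sanity check on these figures, the float model size can be estimated from the parameter count. The sketch below assumes float weights are stored as 32-bit values and that "MB" here means MiB (2**20 bytes); neither is stated explicitly. The 110M parameter count is rounded, so the estimate only needs to land near the listed 418 MB.

```python
NUM_PARAMS = 110_000_000  # "Number of parameters: 110M" (rounded)
BYTES_PER_FLOAT32 = 4     # assumption: float weights are stored as float32


def weights_size_mib(num_params, bytes_per_param):
    """Raw weight storage in MiB (2**20 bytes), ignoring file-format overhead."""
    return num_params * bytes_per_param / 2**20


float_mib = weights_size_mib(NUM_PARAMS, BYTES_PER_FLOAT32)
int8_mib = weights_size_mib(NUM_PARAMS, 1)  # 8-bit weights, as in w8a16
print(f"float32: {float_mib:.0f} MiB, 8-bit weights: {int8_mib:.0f} MiB")
# prints: float32: 420 MiB, 8-bit weights: 105 MiB
```

The float estimate is within about 1% of the listed size. The 8-bit figure is only a lower-bound estimate for weight storage; actual w8a16 asset sizes are not given in this document.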

Performance Summary

Model	Runtime	Precision	Chipset	Inference Time (ms)	Peak Memory Range (MB)	Primary Compute Unit
Bert-Base-Uncased-Hf	ONNX	float	Snapdragon® X2 Elite	14.602 ms	265 - 265 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Snapdragon® X Elite	31.009 ms	265 - 265 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Snapdragon® 8 Gen 3 Mobile	23.478 ms	0 - 743 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Qualcomm® QCS8550 (Proxy)	31.542 ms	0 - 325 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Qualcomm® QCS9075	35.956 ms	0 - 3 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Snapdragon® 8 Elite For Galaxy Mobile	16.933 ms	0 - 661 MB	NPU
Bert-Base-Uncased-Hf	ONNX	float	Snapdragon® 8 Elite Gen 5 Mobile	13.767 ms	0 - 719 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® X2 Elite	8.747 ms	154 - 154 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® X Elite	20.897 ms	154 - 154 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® 8 Gen 3 Mobile	14.802 ms	0 - 587 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Qualcomm® QCS6490	2309.508 ms	190 - 282 MB	CPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Qualcomm® QCS8550 (Proxy)	19.827 ms	0 - 167 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Qualcomm® QCS9075	20.917 ms	0 - 3 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Qualcomm® QCM6690	1208.392 ms	201 - 215 MB	CPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® 8 Elite For Galaxy Mobile	10.882 ms	0 - 419 MB	NPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® 7 Gen 4 Mobile	1181.56 ms	204 - 218 MB	CPU
Bert-Base-Uncased-Hf	ONNX	w8a16	Snapdragon® 8 Elite Gen 5 Mobile	8.339 ms	0 - 410 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Snapdragon® X2 Elite	10.625 ms	1 - 1 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Snapdragon® X Elite	22.549 ms	0 - 0 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Snapdragon® 8 Gen 3 Mobile	16.997 ms	0 - 583 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® QCS8275 (Proxy)	81.544 ms	0 - 520 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® QCS8550 (Proxy)	23.039 ms	0 - 2 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® SA8775P	28.671 ms	0 - 520 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® QCS9075	28.988 ms	0 - 2 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® QCS8450 (Proxy)	47.896 ms	0 - 565 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® SA7255P	81.544 ms	0 - 520 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Qualcomm® SA8295P	35.329 ms	0 - 501 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Snapdragon® 8 Elite For Galaxy Mobile	11.792 ms	0 - 518 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	float	Snapdragon® 8 Elite Gen 5 Mobile	9.464 ms	0 - 538 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Snapdragon® X2 Elite	6.082 ms	1 - 1 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Snapdragon® X Elite	13.915 ms	0 - 0 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Snapdragon® 8 Gen 3 Mobile	9.152 ms	0 - 499 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Qualcomm® QCS8275 (Proxy)	30.748 ms	0 - 408 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Qualcomm® QCS8550 (Proxy)	13.281 ms	0 - 2 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Qualcomm® SA8775P	13.183 ms	0 - 408 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Qualcomm® QCS9075	15.38 ms	0 - 2 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Qualcomm® SA7255P	30.748 ms	0 - 408 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Snapdragon® 8 Elite For Galaxy Mobile	7.28 ms	0 - 409 MB	NPU
Bert-Base-Uncased-Hf	QNN_DLC	w8a16	Snapdragon® 8 Elite Gen 5 Mobile	5.137 ms	0 - 412 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Snapdragon® 8 Gen 3 Mobile	17.115 ms	0 - 590 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® QCS8275 (Proxy)	81.868 ms	0 - 531 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® QCS8550 (Proxy)	23.109 ms	0 - 3 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® SA8775P	28.875 ms	0 - 533 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® QCS9075	29.242 ms	0 - 259 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® QCS8450 (Proxy)	48.227 ms	0 - 566 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® SA7255P	81.868 ms	0 - 531 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Qualcomm® SA8295P	35.638 ms	0 - 503 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Snapdragon® 8 Elite For Galaxy Mobile	12.114 ms	0 - 527 MB	NPU
Bert-Base-Uncased-Hf	TFLITE	float	Snapdragon® 8 Elite Gen 5 Mobile	9.74 ms	0 - 544 MB	NPU
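
To compare precisions, the latencies in the table above can be turned into a speedup ratio and an approximate token throughput. The sketch below copies three float/w8a16 pairs from the table (chipset names written without the ® mark) and assumes every inference processes a full 384-token sequence, padding included. The function names are illustrative.

```python
# (runtime, chipset) -> (float latency ms, w8a16 latency ms), copied from the table.
LATENCY_MS = {
    ("QNN_DLC", "Snapdragon 8 Elite Gen 5 Mobile"): (9.464, 5.137),
    ("QNN_DLC", "Snapdragon X Elite"): (22.549, 13.915),
    ("ONNX", "Snapdragon 8 Gen 3 Mobile"): (23.478, 14.802),
}
SEQ_LEN = 384  # assumption: one inference handles one 384-token sequence


def speedup(float_ms, quant_ms):
    """How many times faster the quantized model runs than the float model."""
    return float_ms / quant_ms


def tokens_per_second(latency_ms, seq_len=SEQ_LEN):
    """Tokens processed per second, counting padding tokens."""
    return seq_len / (latency_ms / 1000.0)


for (runtime, chipset), (f_ms, q_ms) in LATENCY_MS.items():
    print(
        f"{runtime:8s} {chipset:32s} "
        f"speedup {speedup(f_ms, q_ms):.2f}x, "
        f"{tokens_per_second(q_ms):,.0f} tokens/s at w8a16"
    )
```

These ratios describe latency only; they say nothing about the accuracy impact of quantization, which is not reported in this document.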

License

The license for the original implementation of Bert-Base-Uncased-Hf can be found here.

References

Community

Join our AI Hub Slack community to collaborate, post questions, and learn more about on-device AI.
For questions or feedback, please reach out to us.


Paper for qualcomm/Bert-Base-Uncased-Hf

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805, published Oct 11, 2018)