Bert-Base-Uncased-Hf: Optimized for Qualcomm Devices

Bert-Base-Uncased-Hf is the base-size, uncased variant of BERT, a model designed for self-supervised learning of language representations. It can be used for masked language modeling and as a backbone for various NLP tasks.

This model is based on the implementation of Bert-Base-Uncased-Hf found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download |
| ONNX | w8a16 | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download |
| QNN_DLC | float | Universal | QAIRT 2.43 | Download |
| QNN_DLC | w8a16 | Universal | QAIRT 2.43 | Download |
| TFLITE | float | Universal | QAIRT 2.43, TFLite 2.17.0 | Download |

For more device-specific assets and performance metrics, visit Bert-Base-Uncased-Hf on Qualcomm® AI Hub.
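The pre-exported assets are compiled for a fixed input shape (the Model Stats section lists 1x384), so token sequences generally have to be truncated or padded to 384 entries before inference. The following is a minimal sketch in plain Python, not the repository's own preprocessing. It assumes the exported model takes an `input_ids` tensor and an `attention_mask` tensor (these input names are not confirmed by this card) and uses the special-token IDs of the bert-base-uncased vocabulary ([PAD] = 0, [CLS] = 101, [SEP] = 102). `pad_to_fixed_length` is a hypothetical helper name.

```python
SEQ_LEN = 384                          # from "Input resolution: 1x384"
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102   # bert-base-uncased vocabulary IDs


def pad_to_fixed_length(token_ids, seq_len=SEQ_LEN):
    """Wrap wordpiece IDs in [CLS] ... [SEP], then truncate/pad to seq_len.

    Returns (input_ids, attention_mask), each a 1 x seq_len nested list.
    The tensor names are assumptions; check the exported model's inputs.
    """
    body = list(token_ids)[: seq_len - 2]   # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                   # 1 = real token
    padding = seq_len - len(ids)
    ids += [PAD_ID] * padding
    mask += [0] * padding                   # 0 = padding, to be ignored
    return [ids], [mask]


# "hello world" maps to IDs 7592 and 2088 in this vocabulary.
input_ids, attention_mask = pad_to_fixed_length([7592, 2088])
```

In practice the wordpiece IDs would come from a tokenizer for the `google-bert/bert-base-uncased` checkpoint; the hard-coded IDs here only keep the sketch self-contained.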

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Bert-Base-Uncased-Hf on GitHub for usage instructions.
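As a rough illustration of what a custom export can look like, the sketch below only builds (and prints) a command line for the library's per-model export script. Several details are assumptions not stated in this card: the module path `qai_hub_models.models.bert_base_uncased_hf.export`, the `--device` and `--target-runtime` flags, and the example device and runtime values. Check the GitHub repository for the exact names before running anything. `build_export_command` is a hypothetical helper.

```python
import shlex
import sys

# Assumed module path; confirm it in the GitHub repository.
EXPORT_MODULE = "qai_hub_models.models.bert_base_uncased_hf.export"


def build_export_command(device, target_runtime):
    """Return the argv list for the export script without running it."""
    return [
        sys.executable, "-m", EXPORT_MODULE,
        "--device", device,                  # assumed flag: hosted device name
        "--target-runtime", target_runtime,  # assumed flag: e.g. tflite, onnx
    ]


# Example values are assumptions, chosen to mirror the runtimes listed above.
cmd = build_export_command("Snapdragon 8 Elite QRD", "qnn_dlc")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
# (this needs the library installed and a configured AI Hub account).
```

Building the argument list separately from executing it makes it easy to inspect the command first, since an export submits jobs to hosted devices.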

Model Details

Model Type: Text generation

Model Stats:

  • Model checkpoint: google-bert/bert-base-uncased
  • Input resolution: 1x384 (batch size x sequence length in tokens)
  • Number of parameters: 110M
  • Model size (float): 418 MB
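The parameter count and the float model size above are roughly consistent with each other, which a short back-of-the-envelope calculation shows. This sketch assumes 32-bit floats, treats "MB" as mebibytes, and ignores file-format overhead; `weight_size_mib` is a hypothetical helper. The small gap to the listed 418 MB comes from 110M being a rounded figure.

```python
PARAMS = 110_000_000       # "Number of parameters: 110M" (rounded)
BYTES_PER_MIB = 1024 ** 2


def weight_size_mib(num_params, bits_per_weight):
    """Approximate weight storage in MiB, ignoring file-format overhead."""
    return num_params * bits_per_weight / 8 / BYTES_PER_MIB


float32_mib = weight_size_mib(PARAMS, 32)   # roughly 420 MiB
int8_mib = weight_size_mib(PARAMS, 8)       # roughly 105 MiB for 8-bit weights
print(f"float32: {float32_mib:.0f} MiB, 8-bit weights: {int8_mib:.0f} MiB")
```

The same arithmetic suggests why the w8a16 variants are attractive on memory-constrained devices: 8-bit weights take about a quarter of the float storage.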

Performance Summary

| Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit |
|---|---|---|---|---|---|---|
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® X2 Elite | 14.602 ms | 265 - 265 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® X Elite | 31.009 ms | 265 - 265 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 23.478 ms | 0 - 743 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Qualcomm® QCS8550 (Proxy) | 31.542 ms | 0 - 325 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Qualcomm® QCS9075 | 35.956 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 16.933 ms | 0 - 661 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 13.767 ms | 0 - 719 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® X2 Elite | 8.747 ms | 154 - 154 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® X Elite | 20.897 ms | 154 - 154 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 14.802 ms | 0 - 587 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS6490 | 2309.508 ms | 190 - 282 MB | CPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 19.827 ms | 0 - 167 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS9075 | 20.917 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCM6690 | 1208.392 ms | 201 - 215 MB | CPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 10.882 ms | 0 - 419 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 1181.56 ms | 204 - 218 MB | CPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 8.339 ms | 0 - 410 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® X2 Elite | 10.625 ms | 1 - 1 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® X Elite | 22.549 ms | 0 - 0 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 16.997 ms | 0 - 583 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 81.544 ms | 0 - 520 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 23.039 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA8775P | 28.671 ms | 0 - 520 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS9075 | 28.988 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 47.896 ms | 0 - 565 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA7255P | 81.544 ms | 0 - 520 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA8295P | 35.329 ms | 0 - 501 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 11.792 ms | 0 - 518 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 9.464 ms | 0 - 538 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® X2 Elite | 6.082 ms | 1 - 1 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® X Elite | 13.915 ms | 0 - 0 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Gen 3 Mobile | 9.152 ms | 0 - 499 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8275 (Proxy) | 30.748 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8550 (Proxy) | 13.281 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® SA8775P | 13.183 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS9075 | 15.38 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® SA7255P | 30.748 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 7.28 ms | 0 - 409 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 5.137 ms | 0 - 412 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 17.115 ms | 0 - 590 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 81.868 ms | 0 - 531 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 23.109 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA8775P | 28.875 ms | 0 - 533 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS9075 | 29.242 ms | 0 - 259 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 48.227 ms | 0 - 566 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA7255P | 81.868 ms | 0 - 531 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA8295P | 35.638 ms | 0 - 503 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 12.114 ms | 0 - 527 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 9.74 ms | 0 - 544 MB | NPU |
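
To compare precisions, the latencies in the table can be turned into speedup ratios and rough throughput figures. The sketch below copies four values for Snapdragon® 8 Elite Gen 5 Mobile from the table; the helper names are hypothetical, and the throughput estimate assumes back-to-back single inferences with no pre- or post-processing overhead, which real applications will not match exactly.

```python
# Inference times (ms) copied from the performance table for
# Snapdragon 8 Elite Gen 5 Mobile.
latency_ms = {
    ("ONNX", "float"): 13.767,
    ("ONNX", "w8a16"): 8.339,
    ("QNN_DLC", "float"): 9.464,
    ("QNN_DLC", "w8a16"): 5.137,
}


def speedup(runtime):
    """Ratio of float latency to w8a16 latency for one runtime."""
    return latency_ms[(runtime, "float")] / latency_ms[(runtime, "w8a16")]


def throughput_per_s(runtime, precision):
    """Idealized inferences per second: 1000 ms divided by the latency."""
    return 1000.0 / latency_ms[(runtime, precision)]


for runtime in ("ONNX", "QNN_DLC"):
    print(f"{runtime}: w8a16 is {speedup(runtime):.2f}x faster than float")
print(f"QNN_DLC w8a16: ~{throughput_per_s('QNN_DLC', 'w8a16'):.0f} inferences/s")
```

On this chipset, quantization to w8a16 cuts latency by somewhat less than half in both runtimes; accuracy effects are not covered by the table and would need to be evaluated separately.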

License

  • The license for the original implementation of Bert-Base-Uncased-Hf can be found here.
