NLLB-200 Fine-tuned for Uzbek ↔ English Translation

This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically optimized for bidirectional translation between Uzbek (uz) and English (en).

Model Description

This translation model has been fine-tuned to provide high-quality translations for the Uzbek-English language pair, addressing the limited availability of quality translation models for the Uzbek language.

Base Model: facebook/nllb-200-distilled-600M
Language Pairs:

  • English → Uzbek (en → uz)
  • Uzbek → English (uz → en)
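
NLLB checkpoints identify languages with FLORES-200-style codes (such as `eng_Latn` and `uzn_Latn` for Northern Uzbek in Latin script) rather than bare ISO codes. A small helper like the following, which is illustrative and not part of the model itself, can map between the two for this model's supported pair:

```python
# FLORES-200 language codes expected by NLLB models, for the two
# languages this model supports. The mapping itself is an illustration,
# not shipped with the model.
NLLB_CODES = {"en": "eng_Latn", "uz": "uzn_Latn"}

def to_nllb_code(iso_code):
    """Map a short ISO code ('en', 'uz') to its NLLB/FLORES-200 code."""
    try:
        return NLLB_CODES[iso_code]
    except KeyError:
        raise ValueError(f"Unsupported language: {iso_code!r}") from None
```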

Training Data

The model was fine-tuned on a diverse dataset drawn from four sources:

  • 15% - Curated parallel corpus from the uza.uz information portal
  • 30% - Uzbek texts translated to English with the Gemma-3-27b-it model
  • 35% - English texts translated to Uzbek
  • 20% - Self-improvement dataset: translations generated by the model itself, with low-quality outputs corrected by the Gemini model and used for re-fine-tuning

This multi-source approach ensures robust performance across different domains and translation directions.
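
The self-improvement source above amounts to a filter-and-correct loop: translate with the current model, keep outputs that score well, and route the rest to a stronger corrector before reuse. A minimal sketch, with hypothetical function names that are not from the actual training code:

```python
def build_self_improvement_set(sources, translate, score, correct, threshold=0.8):
    """Sketch of a self-improvement data round (names are illustrative).

    translate(src) -> hypothesis from the current model
    score(src, hyp) -> quality estimate in [0, 1]
    correct(src, hyp) -> corrected translation from a stronger model
    """
    dataset = []
    for src in sources:
        hyp = translate(src)
        if score(src, hyp) >= threshold:
            # Output is good enough to reuse as-is.
            dataset.append((src, hyp))
        else:
            # Low-quality output: have the stronger model fix it first.
            dataset.append((src, correct(src, hyp)))
    return dataset
```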

Performance

The model was evaluated on 200 samples from the openlanguagedata/flores_plus dataset using multiple metrics: BLEU, CHRF, COMET, and BLEURT.

Benchmark Results

English → Uzbek

| Model             | BLEU  | CHRF  | COMET | BLEURT |
|-------------------|-------|-------|-------|--------|
| NLLB-200-uz-en-v1 | 20.22 | 59.30 | 0.906 | 0.766  |
| Tahrirchi Tilmoch | 19.83 | 58.01 | 0.910 | 0.795  |
| NLLB-200 Baseline | 13.07 | 51.73 | 0.881 | 0.707  |

Uzbek → English

| Model             | BLEU  | CHRF  | COMET | BLEURT |
|-------------------|-------|-------|-------|--------|
| NLLB-200-uz-en-v1 | 34.47 | 62.19 | 0.874 | 0.747  |
| Tahrirchi Tilmoch | 33.76 | 61.97 | 0.876 | 0.754  |
| NLLB-200 Baseline | 30.28 | 58.28 | 0.856 | 0.718  |
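
Of the metrics above, CHRF is simple enough to sketch in pure Python. The version below is a simplified character n-gram F-score for intuition only; real evaluations should use a standard implementation such as sacrebleu, which also handles word n-grams and reports scores on a 0-100 scale rather than [0, 1]:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams with whitespace removed, as chrF does."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF in [0, 1]: per-order char n-gram precision and
    recall, macro-averaged, combined into an F_beta score (beta=2
    weights recall twice as much as precision)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        if sum(hyp.values()) > 0:
            precisions.append(overlap / sum(hyp.values()))
        if sum(ref.values()) > 0:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```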

Usage

Installation

pip install transformers sentencepiece torch

Python Example

from functools import lru_cache

from transformers import pipeline

@lru_cache(maxsize=None)
def get_translator(src_lang, tgt_lang):
    """Build the translation pipeline once per language direction and
    cache it, so repeated calls do not reload the model."""
    return pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512,
    )

def translate(text, src_lang, tgt_lang):
    """Translate text between Uzbek and English."""
    result = get_translator(src_lang, tgt_lang)(text)
    return result[0]["translation_text"]

# English → Uzbek
en_text = "Hello, how are you today?"
uz_translation = translate(en_text, "eng_Latn", "uzn_Latn")
print("EN:", en_text)
print("UZ:", uz_translation)

# Uzbek → English
uz_text = "Salom, bugun qandaysiz?"
en_translation = translate(uz_text, "uzn_Latn", "eng_Latn")
print("UZ:", uz_text)
print("EN:", en_translation)

Batch Translation Example

def translate_batch(texts, src_lang, tgt_lang):
    """
    Translate a list of texts using the transformers pipeline.
    Pipeline automatically supports batch input.
    """

    translator = pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512
    )

    results = translator(texts)

    return [item["translation_text"] for item in results]


# Example usage
texts = [
    "Machine learning is fascinating.",
    "I love learning new languages.",
    "This is a great translation model."
]

translations = translate_batch(texts, src_lang="eng_Latn", tgt_lang="uzn_Latn")

for orig, trans in zip(texts, translations):
    print(f"{orig} → {trans}")
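
Because the examples set max_length=512, very long inputs are truncated. A simple workaround is to split a document into sentence-sized chunks before translating and rejoin the outputs. The helper below is a heuristic sketch, not part of the model card's official API:

```python
import re

def chunk_sentences(text, max_chars=400):
    """Greedily pack sentences into chunks of at most max_chars characters,
    so each chunk stays comfortably inside the pipeline's max_length budget.
    Heuristic: a single sentence longer than max_chars still becomes its
    own (oversized) chunk rather than being split mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the batch translation function above and the results joined with spaces.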

Intended Use

This model is intended for:

  • General-purpose translation between Uzbek and English
  • Content localization for web applications
  • Educational purposes and language learning
  • Research in machine translation for low-resource languages

License

This model inherits the license from the base NLLB-200 model. Please refer to the original model card for licensing information.

Contact

For questions, issues, or feedback, please open an issue in the model repository.
