NLLB-200 Fine-tuned for Uzbek ↔ English Translation

This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically optimized for bidirectional translation between Uzbek (uz) and English (en).

Model Description

This translation model has been fine-tuned to provide high-quality translations for the Uzbek-English language pair, addressing the limited availability of quality translation models for the Uzbek language.

Base Model: facebook/nllb-200-distilled-600M
Language Pairs:

  • English → Uzbek (en → uz)
  • Uzbek → English (uz → en)
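
NLLB checkpoints identify languages with FLORES-200-style codes (such as `eng_Latn` and `uzn_Latn` for Northern Uzbek in Latin script) rather than bare ISO codes. A small helper like the following, which is illustrative and not part of the model itself, can map between the two for this model's supported pair:

```python
# FLORES-200 language codes expected by NLLB models, for the two
# languages this model supports. The mapping itself is an illustration,
# not shipped with the model.
NLLB_CODES = {"en": "eng_Latn", "uz": "uzn_Latn"}

def to_nllb_code(iso_code):
    """Map a short ISO code ('en', 'uz') to its NLLB/FLORES-200 code."""
    try:
        return NLLB_CODES[iso_code]
    except KeyError:
        raise ValueError(f"Unsupported language: {iso_code!r}") from None
```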

Training Data

The model was fine-tuned on a diverse dataset drawn from four sources:

  • 15% - Curated parallel corpus from the uza.uz information portal
  • 30% - Uzbek texts translated to English with the Gemma-3-27b-it model
  • 35% - English texts translated to Uzbek
  • 20% - Self-improvement dataset: translations generated by the model itself, with low-quality outputs corrected by the Gemini model and used for re-fine-tuning

This multi-source approach ensures robust performance across different domains and translation directions.
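
The self-improvement source above amounts to a filter-and-correct loop: translate with the current model, keep outputs that score well, and route the rest to a stronger corrector before reuse. A minimal sketch, with hypothetical function names that are not from the actual training code:

```python
def build_self_improvement_set(sources, translate, score, correct, threshold=0.8):
    """Sketch of a self-improvement data round (names are illustrative).

    translate(src) -> hypothesis from the current model
    score(src, hyp) -> quality estimate in [0, 1]
    correct(src, hyp) -> corrected translation from a stronger model
    """
    dataset = []
    for src in sources:
        hyp = translate(src)
        if score(src, hyp) >= threshold:
            # Output is good enough to reuse as-is.
            dataset.append((src, hyp))
        else:
            # Low-quality output: have the stronger model fix it first.
            dataset.append((src, correct(src, hyp)))
    return dataset
```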

Performance

The model was evaluated on 200 samples from the openlanguagedata/flores_plus dataset using multiple metrics: BLEU, CHRF, COMET, and BLEURT.

Benchmark Results

English → Uzbek

| Model             | BLEU  | CHRF  | COMET | BLEURT |
|-------------------|-------|-------|-------|--------|
| NLLB-200-uz-en-v1 | 20.22 | 59.30 | 0.906 | 0.766  |
| Tahrirchi Tilmoch | 19.83 | 58.01 | 0.910 | 0.795  |
| NLLB-200 Baseline | 13.07 | 51.73 | 0.881 | 0.707  |

Uzbek → English

| Model             | BLEU  | CHRF  | COMET | BLEURT |
|-------------------|-------|-------|-------|--------|
| NLLB-200-uz-en-v1 | 34.47 | 62.19 | 0.874 | 0.747  |
| Tahrirchi Tilmoch | 33.76 | 61.97 | 0.876 | 0.754  |
| NLLB-200 Baseline | 30.28 | 58.28 | 0.856 | 0.718  |
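
Of the metrics above, CHRF is simple enough to sketch in pure Python. The version below is a simplified character n-gram F-score for intuition only; real evaluations should use a standard implementation such as sacrebleu, which also handles word n-grams and reports scores on a 0-100 scale rather than [0, 1]:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams with whitespace removed, as chrF does."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF in [0, 1]: per-order char n-gram precision and
    recall, macro-averaged, combined into an F_beta score (beta=2
    weights recall twice as much as precision)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        if sum(hyp.values()) > 0:
            precisions.append(overlap / sum(hyp.values()))
        if sum(ref.values()) > 0:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```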

Usage

Installation

pip install transformers sentencepiece torch

Python Example

from functools import lru_cache

from transformers import pipeline

@lru_cache(maxsize=None)
def get_translator(src_lang, tgt_lang):
    """Build the translation pipeline once per language direction and
    cache it, so repeated calls do not reload the model."""
    return pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512,
    )

def translate(text, src_lang, tgt_lang):
    """Translate text between Uzbek and English."""
    result = get_translator(src_lang, tgt_lang)(text)
    return result[0]["translation_text"]

# English → Uzbek
en_text = "Hello, how are you today?"
uz_translation = translate(en_text, "eng_Latn", "uzn_Latn")
print("EN:", en_text)
print("UZ:", uz_translation)

# Uzbek → English
uz_text = "Salom, bugun qandaysiz?"
en_translation = translate(uz_text, "uzn_Latn", "eng_Latn")
print("UZ:", uz_text)
print("EN:", en_translation)

Batch Translation Example

def translate_batch(texts, src_lang, tgt_lang):
    """
    Translate a list of texts using the transformers pipeline.
    Pipeline automatically supports batch input.
    """

    translator = pipeline(
        "translation",
        model="OvozifyLabs/nllb-en-uz-v1",
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=512
    )

    results = translator(texts)

    return [item["translation_text"] for item in results]


# Example usage
texts = [
    "Machine learning is fascinating.",
    "I love learning new languages.",
    "This is a great translation model."
]

translations = translate_batch(texts, src_lang="eng_Latn", tgt_lang="uzn_Latn")

for orig, trans in zip(texts, translations):
    print(f"{orig} → {trans}")
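
Because the examples set max_length=512, very long inputs are truncated. A simple workaround is to split a document into sentence-sized chunks before translating and rejoin the outputs. The helper below is a heuristic sketch, not part of the model card's official API:

```python
import re

def chunk_sentences(text, max_chars=400):
    """Greedily pack sentences into chunks of at most max_chars characters,
    so each chunk stays comfortably inside the pipeline's max_length budget.
    Heuristic: a single sentence longer than max_chars still becomes its
    own (oversized) chunk rather than being split mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the batch translation function above and the results joined with spaces.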

Intended Use

This model is intended for:

  • General-purpose translation between Uzbek and English
  • Content localization for web applications
  • Educational purposes and language learning
  • Research in machine translation for low-resource languages

License

This model inherits the license from the base NLLB-200 model. Please refer to the original model card for licensing information.

Contact

For questions, issues, or feedback, please open an issue in the model repository.
