---
library_name: transformers
tags:
- yoruba
- tone-restoration
- diacritics
- nlp
- seq2seq
- mt5
---

Model Card for JohnsonPedia01/mT5_base_yoruba_tone_restoration

This model is a fine-tuned version of google/mt5-base for automatic Yoruba tone and diacritic restoration.
It restores missing diacritics in Yoruba text to improve readability and support downstream NLP tasks.

Model Details

Model Description

This mT5-base model has been fine-tuned specifically on Yoruba text to restore tone marks (diacritics) automatically. It can be used to improve text quality in NLP preprocessing, ASR post-processing, and research settings.

  • Developed by: Babarinde Johnson
  • Funded by: N/A
  • Shared by: JohnsonPedia01
  • Model type: Seq2Seq (Text-to-Text, Transformer)
  • Language(s): Yoruba
  • License: Apache-2.0
  • Finetuned from model: google/mt5-base

Uses

Direct Use

  • Restoring Yoruba diacritics in plain text
  • Preprocessing for NLP pipelines (see the batch sketch after this list)
  • Enhancing readability of Yoruba text

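As a minimal sketch of the preprocessing use case, the snippet below restores diacritics for a batch of sentences using the tokenizer and model directly; the example sentences and the max_new_tokens value are illustrative assumptions, not recommended settings.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "JohnsonPedia01/mT5_base_yoruba_tone_restoration"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical undiacritized sentences queued for an NLP pipeline.
sentences = ["omo mi wa nita", "bawo ni o se wa"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
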
Downstream Use

  • Post-processing for automatic speech recognition (ASR) outputs (see the sketch after this list)
  • Input cleaning for text-to-speech (TTS) or machine translation models

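As a sketch of the ASR post-processing flow, the snippet below restores diacritics on raw recognizer transcripts before they reach a TTS or translation model; asr_hypotheses is a hypothetical stand-in for real ASR output.

from transformers import pipeline

restorer = pipeline("text2text-generation", model="JohnsonPedia01/mT5_base_yoruba_tone_restoration")

# Hypothetical undiacritized ASR transcripts.
asr_hypotheses = ["omo mi wa nita", "bawo ni o se wa"]
for hyp in asr_hypotheses:
    restored = restorer(hyp)[0]["generated_text"]
    print(hyp, "->", restored)
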
Out-of-Scope Use

  • Non-Yoruba languages
  • Text with heavy code-mixing or non-standard Yoruba spelling

Bias, Risks, and Limitations

  • May not handle code-mixed text or informal spelling
  • Training data is limited to standard Yoruba text; may propagate biases in source text
  • Can occasionally misplace diacritics in rare words

Recommendations

  • Validate outputs in critical applications (e.g., education, publications); a simple consistency check is sketched after this list
  • Avoid using for languages other than Yoruba

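One way to validate outputs, assuming the model should only add diacritics and never otherwise rewrite the text, is to strip combining marks from the restored string and compare it against the raw input; this check is a sketch under that assumption, not part of the released model.

import unicodedata

def strip_diacritics(text):
    # Decompose characters, then drop combining marks (tone accents, underdots).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def is_consistent(raw, restored):
    # The restored text should differ from the raw input only in its diacritics.
    return strip_diacritics(restored).casefold() == raw.casefold()
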
How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load the fine-tuned tokenizer and model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")
model = AutoModelForSeq2SeqLM.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")

# Wrap them in a text-to-text pipeline for convenience.
yoruba_tone_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

# Undiacritized input; the pipeline returns it with diacritics restored.
example = "omo mi wa nita nitoripe johnson"
output = yoruba_tone_pipe(example)
print(output[0]['generated_text'])
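
For longer inputs, generation arguments can be passed directly through the pipeline call. The settings below (a larger max_new_tokens budget and beam search) are illustrative assumptions, not values the model was tuned with:

# Hypothetical generation settings; adjust to your input length and latency budget.
output = yoruba_tone_pipe(example, max_new_tokens=128, num_beams=4)
print(output[0]['generated_text'])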