---
library_name: transformers
tags:
- yoruba
- tone-restoration
- diacritics
- nlp
- seq2seq
- mt5
---

Model Card for JohnsonPedia01/mT5_base_yoruba_tone_restoration

This model is a fine-tuned version of google/mt5-base for automatic Yoruba tone and diacritic restoration.
It restores missing diacritics in Yoruba text to improve readability and support downstream NLP tasks.

Model Details

Model Description

This mT5-base model has been fine-tuned specifically on Yoruba text to restore tone marks (diacritics) automatically. It can be used to improve text quality in NLP preprocessing, ASR post-processing, and research settings.

  • Developed by: Babarinde Johnson
  • Funded by: N/A
  • Shared by: JohnsonPedia01
  • Model type: Seq2Seq (Text-to-Text, Transformer)
  • Language(s): Yoruba
  • License: Apache-2.0
  • Finetuned from model: google/mt5-base

Uses

Direct Use

  • Restoring Yoruba diacritics in plain text
  • Preprocessing for NLP pipelines (see the batch sketch after this list)
  • Enhancing readability of Yoruba text

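As a minimal sketch of the preprocessing use case, the snippet below restores diacritics for a batch of sentences using the tokenizer and model directly; the example sentences and the max_new_tokens value are illustrative assumptions, not recommended settings.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "JohnsonPedia01/mT5_base_yoruba_tone_restoration"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical undiacritized sentences queued for an NLP pipeline.
sentences = ["omo mi wa nita", "bawo ni o se wa"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
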
Downstream Use

  • Post-processing for automatic speech recognition (ASR) outputs (see the sketch after this list)
  • Input cleaning for text-to-speech (TTS) or machine translation models

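As a sketch of the ASR post-processing flow, the snippet below restores diacritics on raw recognizer transcripts before they reach a TTS or translation model; asr_hypotheses is a hypothetical stand-in for real ASR output.

from transformers import pipeline

restorer = pipeline("text2text-generation", model="JohnsonPedia01/mT5_base_yoruba_tone_restoration")

# Hypothetical undiacritized ASR transcripts.
asr_hypotheses = ["omo mi wa nita", "bawo ni o se wa"]
for hyp in asr_hypotheses:
    restored = restorer(hyp)[0]["generated_text"]
    print(hyp, "->", restored)
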
Out-of-Scope Use

  • Non-Yoruba languages
  • Text with heavy code-mixing or non-standard Yoruba spelling

Bias, Risks, and Limitations

  • May not handle code-mixed text or informal spelling
  • Training data is limited to standard Yoruba text; may propagate biases in source text
  • Can occasionally misplace diacritics in rare words

Recommendations

  • Validate outputs in critical applications (e.g., education, publications); a simple consistency check is sketched after this list
  • Avoid using for languages other than Yoruba

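One way to validate outputs, assuming the model should only add diacritics and never otherwise rewrite the text, is to strip combining marks from the restored string and compare it against the raw input; this check is a sketch under that assumption, not part of the released model.

import unicodedata

def strip_diacritics(text):
    # Decompose characters, then drop combining marks (tone accents, underdots).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def is_consistent(raw, restored):
    # The restored text should differ from the raw input only in its diacritics.
    return strip_diacritics(restored).casefold() == raw.casefold()
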
How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load the fine-tuned tokenizer and model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")
model = AutoModelForSeq2SeqLM.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")

# Wrap them in a text-to-text pipeline for convenience.
yoruba_tone_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

# Undiacritized input; the pipeline returns it with diacritics restored.
example = "omo mi wa nita nitoripe johnson"
output = yoruba_tone_pipe(example)
print(output[0]['generated_text'])
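
For longer inputs, generation arguments can be passed directly through the pipeline call. The settings below (a larger max_new_tokens budget and beam search) are illustrative assumptions, not values the model was tuned with:

# Hypothetical generation settings; adjust to your input length and latency budget.
output = yoruba_tone_pipe(example, max_new_tokens=128, num_beams=4)
print(output[0]['generated_text'])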