---
library_name: transformers
tags:
- yoruba
- tone-restoration
- diacritics
- nlp
- seq2seq
- mt5
---
# Model Card for JohnsonPedia01/mT5_base_yoruba_tone_restoration
This model is a fine-tuned version of google/mt5-base for automatic Yoruba tone and diacritic restoration.
It restores missing diacritics in Yoruba text to improve readability and support downstream NLP tasks.
## Model Details

### Model Description
This mT5-base model has been fine-tuned specifically for Yoruba text to restore tone marks (diacritics) automatically. It can be used to improve text quality for NLP preprocessing, ASR post-processing, and research purposes.
- Developed by: Babarinde Johnson
- Funded by: N/A
- Shared by: JohnsonPedia01
- Model type: Seq2Seq (Text-to-Text, Transformer)
- Language(s): Yoruba
- License: Apache-2.0
- Finetuned from model: google/mt5-base
### Model Sources
- Repository: Hugging Face Model Hub
- Paper: N/A
- Demo: N/A
## Uses

### Direct Use
- Restoring Yoruba diacritics in plain text
- Preprocessing for NLP pipelines
- Enhancing readability of Yoruba text
### Downstream Use
- Post-processing for automatic speech recognition (ASR) outputs
- Input cleaning for text-to-speech (TTS) or machine translation models
### Out-of-Scope Use
- Non-Yoruba languages
- Text with heavy code-mixing or non-standard Yoruba spelling
## Bias, Risks, and Limitations
- May not handle code-mixed text or informal spelling
- Training data is limited to standard Yoruba text; may propagate biases in source text
- Can occasionally misplace diacritics in rare words
### Recommendations
- Validate outputs in critical applications (e.g., education, publications)
- Avoid using for languages other than Yoruba
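One lightweight way to validate outputs is to strip the diacritics from the model's restored text and compare it against the (likewise stripped) input: if they differ, the model changed or dropped words rather than only adding tone marks. A minimal sketch using only the standard library; the helper names are illustrative, not part of this model's API:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks so diacritized and plain Yoruba text can be compared."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def is_faithful(source: str, restored: str) -> bool:
    """True if the restored text differs from the source only in its diacritics."""
    return strip_diacritics(restored) == strip_diacritics(source)
```

A failed check does not pinpoint the error, but it is a cheap first filter before any human review in critical applications.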
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")
model = AutoModelForSeq2SeqLM.from_pretrained("JohnsonPedia01/mT5_base_yoruba_tone_restoration")

yoruba_tone_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

# Undiacritized input; the model restores the missing tone marks.
example = "omo mi wa nita nitoripe johnson"
output = yoruba_tone_pipe(example)
print(output[0]["generated_text"])
```
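Seq2seq models such as mT5 truncate inputs beyond their maximum length, so longer passages are often safer restored sentence by sentence and rejoined afterwards. A naive punctuation-based splitter is sketched below; the segmentation this model was actually trained on is not documented, so sentence-level input is an assumption:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter on sentence-final punctuation; keeps the terminator."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Each chunk would then be passed through yoruba_tone_pipe separately
# and the restored pieces rejoined with spaces.
```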