mbert-base-uncased-swa is a named entity recognition model obtained by fine-tuning the Multilingual BERT base uncased model. It has been trained to recognize four types of entities: dates and times (DATE), locations (LOC), organizations (ORG), and persons (PER).
This model was fine-tuned on the Swahili corpus (swa) of the MasakhaNER dataset. However, we limited the number of entity groups per sentence in the training data to 10.
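One way to picture this threshold is the sketch below. It is not the authors' preprocessing code: it assumes BIO-style tags (`B-PER`, `I-PER`, `O`, ...) and assumes that sentences above the limit are dropped rather than truncated, which the card does not specify. The names `count_entity_groups` and `filter_by_entity_groups` are hypothetical.

```python
from typing import List, Tuple

# A sentence is a pair (tokens, BIO tags); this layout is an assumption.
Sentence = Tuple[List[str], List[str]]


def count_entity_groups(tags: List[str]) -> int:
    """Count entity groups (contiguous entity spans) in a BIO-tagged sentence."""
    count = 0
    prev_type = None  # entity type of the previous token, None when outside
    for tag in tags:
        if tag == "O":
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        # A group starts on B-, or on an I- that does not continue the previous type.
        if prefix == "B" or ent_type != prev_type:
            count += 1
        prev_type = ent_type
    return count


def filter_by_entity_groups(sentences: List[Sentence], max_groups: int = 10) -> List[Sentence]:
    """Keep only sentences with at most `max_groups` entity groups (assumed policy)."""
    return [s for s in sentences if count_entity_groups(s[1]) <= max_groups]
```

Under this reading, a sentence tagged `B-ORG I-ORG O B-LOC` has two entity groups and would be kept.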
This model was trained on a single NVIDIA P5000 GPU from Paperspace.
We evaluated this model on the test split of the Swahili corpus (swa) of the MasakhaNER dataset, with no thresholding applied.
| Model Name | Precision | Recall | F1-score |
|---|---|---|---|
| mbert-base-uncased-swa | 85.59 | 90.80 | 88.12 |
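As a quick consistency check on the table, the reported F1-score matches the harmonic mean of the reported precision and recall. The snippet below is a minimal sketch of that arithmetic only; the card does not say which evaluation tool or averaging scheme produced the numbers, and `f1_score` here is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above (percentages).
reported_precision = 85.59
reported_recall = 90.80

print(round(f1_score(reported_precision, reported_recall), 2))  # 88.12
```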
The model can be used with the Transformers pipeline for NER:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("arnolfokam/mbert-base-uncased-swa")
model = AutoModelForTokenClassification.from_pretrained("arnolfokam/mbert-base-uncased-swa")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Wizara ya afya ya Tanzania imeripoti Jumatatu kuwa, watu takriban 14 zaidi wamepata maambukizi ya Covid-19."
ner_results = nlp(example)
print(ner_results)
```
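Without an aggregation strategy, the pipeline returns one prediction per token. The sketch below shows one possible way to merge such token-level predictions into entity spans. It assumes each prediction is a dict with `entity`, `start`, and `end` keys (character offsets, which require a fast tokenizer) and BIO-style labels. `merge_entities` is a hypothetical helper, and the sample predictions are hand-written stand-ins, not actual output of this model.

```python
from typing import Dict, List


def merge_entities(text: str, predictions: List[Dict]) -> List[Dict]:
    """Merge token-level BIO predictions into labeled character spans."""
    spans: List[Dict] = []
    for pred in predictions:
        prefix, _, label = pred["entity"].partition("-")
        last = spans[-1] if spans else None
        if last is not None and last["label"] == label:
            gap = pred["start"] - last["end"]
            # Continue the span for directly adjacent pieces (e.g. subwords),
            # or for an I- token separated by at most one character (a space).
            if gap == 0 or (prefix == "I" and gap <= 1):
                last["end"] = pred["end"]
                continue
        spans.append({"label": label, "start": pred["start"], "end": pred["end"]})
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written stand-in predictions for illustration only.
text = "Wizara ya afya ya Tanzania imeripoti Jumatatu"
sample = [
    {"entity": "B-ORG", "start": 0, "end": 6},
    {"entity": "I-ORG", "start": 7, "end": 9},
    {"entity": "I-ORG", "start": 10, "end": 14},
    {"entity": "B-LOC", "start": 18, "end": 26},
    {"entity": "B-DATE", "start": 37, "end": 45},
]
for span in merge_entities(text, sample):
    print(span["label"], span["text"])
```

In practice, passing `aggregation_strategy="simple"` when constructing the pipeline asks Transformers to do a similar grouping itself, so a helper like this is mainly useful when custom merging rules are needed.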