How to use MLRS/BERTu with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="MLRS/BERTu")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForMaskedLM.from_pretrained("MLRS/BERTu")
```
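As a minimal sketch of querying the model through the fill-mask pipeline, the snippet below runs the example sentence from this card's `widget` entry and prints the top candidates. The helper `format_predictions` and its `top_n` parameter are hypothetical names introduced here for illustration. The sketch assumes the pipeline returns its usual list of dictionaries with `token_str` and `score` keys for a single-mask input, and that `[MASK]` is the mask token, as the widget text suggests.

```python
def format_predictions(predictions, top_n=3):
    """Format fill-mask pipeline results as 'token (score)' strings.

    Hypothetical helper for illustration; not part of the model release.
    """
    lines = []
    for pred in predictions[:top_n]:
        lines.append(f"{pred['token_str'].strip()} ({pred['score']:.3f})")
    return lines


def main():
    # Imported here so the helper above can be reused without loading the model.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="MLRS/BERTu")
    # Example sentence taken from this card's widget entry.
    predictions = pipe("Malta hija gżira fil-[MASK].")
    for line in format_predictions(predictions):
        print(line)


if __name__ == "__main__":
    main()
```

Running the script downloads the model weights on first use; the actual predictions depend on the released checkpoint and are not reproduced here.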
---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: BERTu
  results:
  - task:
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
    - type: uas
      value: 92.31
      name: Unlabelled Attachment Score
    - type: las
      value: 88.14
      name: Labelled Attachment Score
  - task:
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
    - type: accuracy
      value: 98.58
      name: UPOS Accuracy
      args: upos
    - type: accuracy
      value: 98.54
      name: XPOS Accuracy
      args: xpos
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
    - type: f1
      args: span
      value: 86.77
      name: Span-based F1
  - task:
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
    - type: f1
      args: macro
      value: 78.96
      name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta hija gżira fil-[MASK]."
---
# BERTu

A Maltese monolingual model pre-trained from scratch on the Korpus Malti v4.0 using the BERT (base) architecture.

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://aclanthology.org/2022.deeplo-1.10/).
Cite it as follows:

```bibtex
@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt and
      Gatt, Albert and
      Tanti, Marc and
      van der Plas, Lonneke and
      Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
```