Introducing cosmosGPT: Monolingual Training for Turkish Language Models
Paper: arXiv:2404.17336
This is a Turkish GPT-2 large model. GPT-2 is designed for text generation: given a text snippet, it continues it in a coherent and contextually relevant manner. Because the training data spans diverse sources, including websites and books, the model can reflect biases present in that data; users should be aware of these biases and use the model responsibly.
from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

# Load the model and its tokenizer from the Hugging Face Hub.
model = GPT2LMHeadModel.from_pretrained("ytu-ce-cosmos/turkish-gpt2-large")
tokenizer = AutoTokenizer.from_pretrained("ytu-ce-cosmos/turkish-gpt2-large")

# Build a text-generation pipeline and continue a Turkish prompt
# ("Technological progress has significantly affected our lives.").
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
r = text_generator("Teknolojinin gelişimi hayatımızı önemli ölçüde etkiledi. ", max_length=100)
[{'generated_text': 'Teknolojinin gelişimi hayatımızı önemli ölçüde etkiledi. "Sosyal ağ" adını verdiğimiz yeni bir iletişim çağımız oluştu. '}]
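Greedy-style defaults tend to produce a single deterministic continuation. For more varied output, the pipeline also forwards the standard sampling arguments to `model.generate`; the sketch below shows this with illustrative values (`temperature=0.7`, `top_p=0.9` are example settings, not recommendations from the paper):

```python
from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

# Load the same model and tokenizer as above.
model = GPT2LMHeadModel.from_pretrained("ytu-ce-cosmos/turkish-gpt2-large")
tokenizer = AutoTokenizer.from_pretrained("ytu-ce-cosmos/turkish-gpt2-large")
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# do_sample enables stochastic decoding; temperature and top_p control
# diversity, and num_return_sequences asks for several continuations.
r = text_generator(
    "Teknolojinin gelişimi hayatımızı önemli ölçüde etkiledi. ",
    max_length=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=2,
)
for out in r:
    print(out["generated_text"])
```

Each element of `r` is a dict with a `generated_text` key, as in the example output above.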
Further details can be found in the paper:
@article{kesgin2024introducing,
  title={Introducing cosmosGPT: Monolingual Training for Turkish Language Models},
  author={Kesgin, H Toprak and Yuce, M Kaan and Dogan, Eren and Uzun, M Egemen and Uz, Atahan and Seyrek, H Emre and Zeer, Ahmed and Amasyali, M Fatih},
  journal={arXiv preprint arXiv:2404.17336},
  year={2024}
}
COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department
https://cosmos.yildiz.edu.tr/
cosmos@yildiz.edu.tr