---
library_name: transformers
tags: [translation, hinglish, LoRA, NLP]
---

# Model Card for English to Hinglish Translation Model

## Model Details

### Model Description

This is a fine-tuned **T5-small** model for translating English sentences into Hinglish (a mix of Hindi and English written in Latin script). The model was trained using **LoRA (Low-Rank Adaptation)** to optimize training efficiency.

- **Developed by:** Team AI-Pradarshan (Rashmi Rai, Ayesha, Bitasta)
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [Your Hugging Face Username]
- **Model type:** Sequence-to-sequence language model
- **Language(s) (NLP):** English, Hinglish
- **License:** MIT
- **Finetuned from model [optional]:** google-t5/t5-small

### Model Sources [optional]

- **Repository:** [https://huggingface.co/rairashmi/hinglish_translation_lora](https://huggingface.co/rairashmi/hinglish_translation_lora)
- **Dataset:** [rairashmi/en-to-hinglish-dataset](https://huggingface.co/datasets/rairashmi/en-to-hinglish-dataset)

## Uses

### Direct Use

This model can be used to translate English sentences into Hinglish text directly via Hugging Face Transformers.

### Downstream Use [optional]

The model can be fine-tuned further or integrated into conversational AI systems and chatbots.

### Out-of-Scope Use

- This model is not designed for real-time conversational applications.
- It may not perform well on non-standard or highly domain-specific English text.

## Bias, Risks, and Limitations

- The dataset used may contain inherent biases in Hinglish translation styles.
- Accuracy may vary across dialects and sentence structures.

### Recommendations

Users should be aware of possible translation inconsistencies and verify outputs before relying on them in critical applications.

## How to Get Started with the Model

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "rairashmi/hinglish_translation_lora"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def translate_english_to_hinglish(text):
    # T5 expects a task prefix; this model uses "translate English to Hinglish: "
    inputs = tokenizer(
        f"translate English to Hinglish: {text}",
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,  # matches the training-time max length
    )
    # Without an explicit limit, generate() may cut translations short
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

sentence = "How are you?"
translation = translate_english_to_hinglish(sentence)
print(f"🔹 English: {sentence}")
print(f"🟢 Hinglish: {translation}")
```

## Training Details

### Training Data

The model was trained on the **rairashmi/en-to-hinglish-dataset**, a parallel corpus of English-Hinglish text pairs.

### Training Procedure

#### Preprocessing [optional]

- Tokenized using the **T5 tokenizer**
- Padding and truncation applied with a max length of 128
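The preprocessing above can be sketched as a batched map function. The dataset column names (`en`, `hinglish`) and the reuse of the inference-time task prefix are assumptions, not details confirmed by the card.

```python
def preprocess(examples, tokenizer, max_length=128):
    # Prepend the task prefix used at inference time (assumption: the same
    # prefix was used during training). Column names are illustrative.
    inputs = ["translate English to Hinglish: " + s for s in examples["en"]]
    model_inputs = tokenizer(
        inputs, max_length=max_length, padding="max_length", truncation=True
    )
    # Tokenize the Hinglish side as labels for the seq2seq objective
    labels = tokenizer(
        text_target=examples["hinglish"],
        max_length=max_length,
        padding="max_length",
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the T5 tokenizer this would typically be applied via `dataset.map(lambda ex: preprocess(ex, tokenizer), batched=True)`.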

#### Training Hyperparameters

- **Learning rate:** 2e-5
- **Batch size:** 8
- **Epochs:** 2
- **Mixed precision:** FP16
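For reference, these settings map onto the keyword arguments of Hugging Face `Seq2SeqTrainingArguments` roughly as follows. This is a sketch of the shape of the configuration, not the exact training script.

```python
# The hyperparameters above, named as the transformers API expects them;
# fp16=True enables mixed-precision training.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 2,
    "fp16": True,
}
```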

#### Speeds, Sizes, Times [optional]

- Training took approximately **X hours** on an **A100 GPU**
- Model size: **T5-small with LoRA adapters**

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Evaluated on a held-out validation split of the dataset.

#### Factors

- Evaluated across different sentence lengths and complexities.

#### Metrics

- **BLEU score:** X.XX (evaluated using `sacrebleu`)

### Results

- The model achieves a BLEU score of **X.XX** on the test set.

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

- **Hardware Type:** A100 GPU
- **Hours used:** X
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

- The model is based on the **T5-small** architecture, fine-tuned for machine translation.

### Compute Infrastructure

#### Hardware

- Training was performed on a **single A100 GPU**

#### Software

- Transformers, Datasets, PEFT, Accelerate, Evaluate, Torch

## Citation [optional]

**BibTeX:**

```bibtex
@misc{hinglish_translation,
  author = {Your Name},
  title  = {English to Hinglish Translation Model},
  year   = {2025},
  url    = {https://huggingface.co/rairashmi/hinglish_translation_lora}
}
```

## Glossary [optional]

- **Hinglish**: A mix of Hindi and English written in Latin script.

## More Information [optional]

For further details, see the **[Hugging Face Model Page](https://huggingface.co/rairashmi/hinglish_translation_lora)**.

## Model Card Authors [optional]

- [Your Name or Organization]

## Model Card Contact

For any issues or questions, contact **[Your Contact Information]**.