SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-multilingual-base on the mitrasamgraha dataset of English–Sanskrit (Devanagari) sentence pairs. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-multilingual-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • mitrasamgraha

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
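
For reference, the module list above means: tokenize up to 8192 tokens, run the transformer (the base model's custom NewModel implementation), take the CLS token embedding as the sentence representation, and L2-normalize it. The sketch below reproduces these steps manually with 🤗 Transformers; it is only illustrative (and assumes loading with trust_remote_code=True is acceptable), and the Sentence Transformers usage in the next section is the recommended path.

# Illustrative sketch: replicate the pipeline by hand
# (tokenize -> transformer -> CLS pooling -> L2 normalization).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "sanganaka/gte-multilingual-base-sanskritFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)  # custom NewModel code

batch = tokenizer(
    ["न हि तथता द्वयप्रभाविता नानात्वप्रभाविता ।"],
    padding=True, truncation=True, max_length=8192, return_tensors="pt",
)
with torch.no_grad():
    output = model(**batch)

cls_embedding = output.last_hidden_state[:, 0]       # pooling_mode_cls_token=True
embedding = F.normalize(cls_embedding, p=2, dim=1)   # Normalize() module
print(embedding.shape)  # torch.Size([1, 768])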

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sanganaka/gte-multilingual-base-sanskritFT")
# Run inference
sentences = [
    'O Śākyamuni, conquering the powerful host of Māra, You found peace, immortality, and the happiness of that supreme enlightenment',
    'मारस् त्वयास्तु विजितस् सबलो मुनीन्द्रः प्राप्ता शिवा अमृतशान्तवराग्रबोधिः ।',
    'न हि तथता द्वयप्रभाविता नानात्वप्रभाविता ।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
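
Because English and Devanagari Sanskrit are embedded into the same space, the model can also be used for cross-lingual semantic search. A minimal sketch, reusing the sentences from the example above as a toy corpus and query:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sanganaka/gte-multilingual-base-sanskritFT")

corpus = [
    'मारस् त्वयास्तु विजितस् सबलो मुनीन्द्रः प्राप्ता शिवा अमृतशान्तवराग्रबोधिः ।',
    'न हि तथता द्वयप्रभाविता नानात्वप्रभाविता ।',
]
query = 'O Śākyamuni, conquering the powerful host of Māra, You found peace, immortality, and the happiness of that supreme enlightenment'

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Embeddings are L2-normalized, so cosine similarity is a natural ranking score
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = int(scores.argmax(dim=-1))
print(corpus[best])  # the first (matching) verse is expected to rank highest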

Evaluation

Metrics

Translation (validation split)

Metric            Value
src2trg_accuracy  0.923
trg2src_accuracy  0.9194
mean_accuracy     0.9212

Translation (test split)

Metric            Value
src2trg_accuracy  0.909
trg2src_accuracy  0.9013
mean_accuracy     0.9052
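
These are translation-matching accuracies on held-out English–Sanskrit pairs: src2trg_accuracy is the fraction of English sentences whose nearest neighbour among all Sanskrit candidates is the correct counterpart, trg2src_accuracy is the reverse direction, and mean_accuracy is their average. A sketch of how such scores can be computed with the library's TranslationEvaluator (the parallel lists eng_sentences and san_sentences are placeholders for the held-out split):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TranslationEvaluator

model = SentenceTransformer("sanganaka/gte-multilingual-base-sanskritFT")

# eng_sentences[i] and san_sentences[i] must form a parallel pair (placeholders here)
eng_sentences = ["My patience is almost worn out, like that of a creeper under the winter frost."]
san_sentences = ["जर्जरीकृत्य वस्तूनि त्यजन्ती विभ्रती तथा । मार्गशीर्षान्तवल्लीव धृतिर्विधुरतां गता ॥"]

evaluator = TranslationEvaluator(
    source_sentences=eng_sentences,
    target_sentences=san_sentences,
    name="translate-val",
    batch_size=256,
)
results = evaluator(model)
print(results)  # src2trg, trg2src and mean accuracy, keyed with the evaluator name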

Training Details

Training Dataset

mitrasamgraha

  • Dataset: mitrasamgraha
  • Size: 477,170 training samples
  • Columns: english and sanskrit_Deva
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 20 tokens, mean 43.11 tokens, max 90 tokens
    • sanskrit_Deva (string): min 19 tokens, mean 33.88 tokens, max 78 tokens
  • Samples:
    • english: My patience is almost worn out, like that of a creeper under the winter frost. It is decayed, and neither lives nor perishes at once.
      sanskrit_Deva: जर्जरीकृत्य वस्तूनि त्यजन्ती विभ्रती तथा । मार्गशीर्षान्तवल्लीव धृतिर्विधुरतां गता ॥
    • english: Our minds are partly settled in worldly things, and partly fixed in their giver (the Supreme soul). This divided state of the mind is termed its half waking condition.
      sanskrit_Deva: अपहस्तितसर्वार्थमनवस्थितिरास्थिता । गृहीत्वोत्सृज्य चात्मानं भवस्थितिरवस्थिता ॥
    • english: My mind is in a state of suspense, being unable to ascertain the real nature of my soul. I am like one in the dark, who is deceived by the stump of a fallen tree at a distance, to think it a human figure.
      sanskrit_Deva: चलिताचलितेनान्तरवष्टम्भेन मे मतिः । दरिद्रा छिन्नवृक्षस्य मूलेनेव विडम्ब्यते ॥
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
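
MultipleNegativesRankingLoss treats each (english, sanskrit_Deva) pair in a batch as a positive and every other Sanskrit sentence in the same batch as a negative, optimizing an in-batch softmax over cosine similarities scaled by 20.0 (duplicate texts in a batch would create false negatives, which is why the no_duplicates batch sampler listed under Training Hyperparameters is used). A minimal sketch of how this loss can be set up, assuming the pairs are available as a 🤗 Dataset with english and sanskrit_Deva columns (the rows shown are placeholders taken from the samples above):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Start from the base model; trust_remote_code is needed for its custom NewModel code
model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

# Placeholder rows; the real dataset has 477,170 (english, sanskrit_Deva) pairs
train_dataset = Dataset.from_dict({
    "english": ["My patience is almost worn out, like that of a creeper under the winter frost."],
    "sanskrit_Deva": ["जर्जरीकृत्य वस्तूनि त्यजन्ती विभ्रती तथा । मार्गशीर्षान्तवल्लीव धृतिर्विधुरतां गता ॥"],
})

# scale=20.0 and cosine similarity match the parameters listed above (both are the defaults)
loss = MultipleNegativesRankingLoss(model, scale=20.0)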
    

Evaluation Dataset

mitrasamgraha

  • Dataset: mitrasamgraha
  • Size: 5,560 evaluation samples
  • Columns: english and sanskrit_Deva
  • Approximate statistics based on the first 1000 samples:
    • english (string): min 5 tokens, mean 58.68 tokens, max 387 tokens
    • sanskrit_Deva (string): min 6 tokens, mean 44.88 tokens, max 257 tokens
  • Samples:
    • english: Thereupon he takes the winnowing basket and the Agnihotra ladle, with the text: 'For the work (I take) you, for pervasion (or accomplishment) you two!' For the sacrifice is a work: hence, in saying 'for the work you two,' he says, 'for the sacrifice.' And 'for pervasion you two,' he says, because he, as it were, pervades (goes through, accomplishes) the sacrifice. He then restrains his speech; for (restrained) speech means undisturbed sacrifice; so that (in so doing) he thinks: 'May I accomplish the sacrifice!' He now heats (the two objects on the Grhapatya), with the formula: 'Scorched is the Rakshas, scorched are the enemies!' or: 'Burnt out is the Rakshas, burnt out are the enemies!'
      sanskrit_Deva: अग्ने व्रतपते व्रतं चरिष्यामि तचकेयं तन्मे राध्यतामित्यग्निर्वै देवानां व्रतपतिस्तस्मा एवैतत्प्राह व्रतं चरिष्यामि तच्चकेयं तन्मे राध्यतामिति नात्र तिरोहितमिवास्ति ॥ अथ संस्थिते विसृजते । अग्ने व्रतपते व्रतमचारिषं तादशकम् तन्मे राधीत्यशकद्येतद्यो यज्ञस्य संस्थामगन्नराधि ह्यस्मै यो यज्ञस्य संस्थामगन्नेतेन न्वेव भूयिष्ठा इव व्रतमुपयन्त्यनेन त्वेवोपेयात् ॥ द्वयं वा इदं न तृतीयमस्ति ।
    • english: For the gods, when they were performing the sacrifice, were afraid of a disturbance on the part of the Asuras and Rakshas: hence by this means he expels from here, at the very opening of the sacrifice, the evil spirits, the Rakshas.
      sanskrit_Deva: एतद्धवै देवा व्रतं चरन्ति यत्सत्यं तस्मात्ते यशो यशो ह भवति य एवं विद्वांत्सत्यंवदति ॥ अथ संस्थिते विसृजते ।
    • english: He now steps forward (to the cart), with the text: 'I move along the wide arial realm.' For the Rakshas roams about in the air, rootless and unfettered in both directions (below and above); and in order that this man (the Adhvaryu) may move about the air, rootless and unfettered in both directions, he by this very prayer renders the atmosphere free from danger and evil spirits.
      sanskrit_Deva: स वा आरण्यमेवाश्नीयात् । या वारण्या ओषधयो यद्वा वृक्ष्यं तदु ह स्माहापि बर्कुर्वार्ष्णो मासान्मे पचत न वा एतेसां हविर्गृह्णन्तीति तदु तथा न कुर्याद्व्रीहियवयोर्वा एतदुपजं यचमीधान्यं तद्व्रीहियवावेवैतेन भूयांसौ करोति तस्मादारण्यमेवाश्नीयात् ॥
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
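
The non-default values above map directly onto SentenceTransformerTrainingArguments. A hedged sketch of the corresponding trainer setup, continuing from the loss sketch in the Training Dataset section (output_dir and eval_dataset are placeholders):

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="gte-multilingual-base-sanskritFT",  # placeholder output path
    eval_strategy="epoch",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=256,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts (false negatives) per batch
)

trainer = SentenceTransformerTrainer(
    model=model,                  # base model from the loss sketch above
    args=args,
    train_dataset=train_dataset,  # (english, sanskrit_Deva) pairs
    eval_dataset=eval_dataset,    # held-out pairs (placeholder, not defined above)
    loss=loss,
)
trainer.train()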

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss translate-val_mean_accuracy translate-test_mean_accuracy
0 0 - - 0.1199 -
0.0671 500 2.6703 - - -
0.1341 1000 1.0365 - - -
0.2012 1500 0.7332 - - -
0.2682 2000 1.1093 - - -
0.3353 2500 1.2023 - - -
0.4024 3000 0.881 - - -
0.4694 3500 0.6805 - - -
0.5365 4000 0.4721 - - -
0.6035 4500 0.5204 - - -
0.6706 5000 0.826 - - -
0.7377 5500 0.4227 - - -
0.8047 6000 0.6064 - - -
0.8718 6500 0.5305 - - -
0.9388 7000 0.2833 - - -
1.0 7456 - 0.4987 0.8395 -
1.0059 7500 0.5046 - - -
1.0730 8000 0.6544 - - -
1.1400 8500 0.1318 - - -
1.2071 9000 0.2052 - - -
1.2741 9500 0.4326 - - -
1.3412 10000 0.8093 - - -
1.4083 10500 0.359 - - -
1.4753 11000 0.3121 - - -
1.5424 11500 0.22 - - -
1.6094 12000 0.3279 - - -
1.6765 12500 0.5532 - - -
1.7436 13000 0.1995 - - -
1.8106 13500 0.4978 - - -
1.8777 14000 0.2835 - - -
1.9447 14500 0.2226 - - -
2.0 14912 - 0.3390 0.8889 -
2.0118 15000 0.3366 - - -
2.0789 15500 0.3983 - - -
2.1459 16000 0.069 - - -
2.2130 16500 0.155 - - -
2.2800 17000 0.3068 - - -
2.3471 17500 0.6613 - - -
2.4142 18000 0.2298 - - -
2.4812 18500 0.2289 - - -
2.5483 19000 0.1568 - - -
2.6153 19500 0.2711 - - -
2.6824 20000 0.4366 - - -
2.7495 20500 0.1444 - - -
2.8165 21000 0.5052 - - -
2.8836 21500 0.1476 - - -
2.9506 22000 0.2083 - - -
3.0 22368 - 0.2841 0.9085 -
3.0177 22500 0.2726 - - -
3.0848 23000 0.2729 - - -
3.1518 23500 0.0443 - - -
3.2189 24000 0.1391 - - -
3.2859 24500 0.2466 - - -
3.3530 25000 0.5791 - - -
3.4201 25500 0.1796 - - -
3.4871 26000 0.1869 - - -
3.5542 26500 0.1222 - - -
3.6212 27000 0.2651 - - -
3.6883 27500 0.3499 - - -
3.7554 28000 0.1205 - - -
3.8224 28500 0.4764 - - -
3.8895 29000 0.107 - - -
3.9565 29500 0.2057 - - -
4.0 29824 - 0.2578 0.9184 -
4.0236 30000 0.2405 - - -
4.0907 30500 0.2041 - - -
4.1577 31000 0.0337 - - -
4.2248 31500 0.1362 - - -
4.2918 32000 0.2321 - - -
4.3589 32500 0.507 - - -
4.4260 33000 0.1695 - - -
4.4930 33500 0.1579 - - -
4.5601 34000 0.1044 - - -
4.6271 34500 0.2514 - - -
4.6942 35000 0.3157 - - -
4.7613 35500 0.1305 - - -
4.8283 36000 0.4673 - - -
4.8954 36500 0.0763 - - -
4.9624 37000 0.222 - - -
5.0 37280 - 0.2479 0.9212 0.9052

Framework Versions

  • Python: 3.12.4
  • Sentence Transformers: 3.2.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}