DATALORA: Dynamic Adaptive Token-Level Optimization with Rank-Adaptive LoRA

DATALORA is a novel training optimization method that combines three key innovations into a unified framework for efficient language model fine-tuning:

  1. Unified Saliency Network (USN) – Joint token importance scoring and expert routing
  2. Mixture of LoRA Experts – 8 specialized LoRA adapters with dynamic routing
  3. Dynamic Token Pruning – Curriculum-based token retention scheduling

Model Details

Property          Value
────────────────  ───────────────────────────────────────
Base Model        mistralai/Mistral-7B-v0.3
Method            DATALORA (USN + MoLoRA + Token Pruning)
Training Data     Open-Orca/OpenOrca (5K samples)
LoRA Experts      8
LoRA Rank         16
Target Retention  50%
Quantization      4-bit (NF4)
Training Epochs   3
Curriculum        Warmup → Sparsification → Hardening

Architecture

Input Tokens
    │
    ▼
┌─────────────────────────────┐
│   Unified Saliency Network  │
│   (Shared backbone)         │
├──────┬──────────┬───────────┤
│Token │ Router   │ Rank      │
│Scores│ Logits   │ Scale     │
└──┬───┴────┬─────┴─────┬─────┘
   │        │           │
   ▼        ▼           ▼
Token    Expert      Dynamic
Pruning  Selection   Rank Scaling
   │        │           │
   ▼        ▼           ▼
┌─────────────────────────────┐
│  Mixture of LoRA Experts    │
│  (8 specialized adapters)   │
└─────────────────────────────┘
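The bottom stage of the diagram can be sketched as follows. This is a minimal illustration of a mixture of LoRA experts with top-K routing, not the released implementation; class name, shapes, and the top-2 selection are assumptions based on the configuration above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMixtureOfLoRA(nn.Module):
    """Illustrative sketch: K LoRA experts (A_k, B_k) combined by router weights."""

    def __init__(self, d_model=64, rank=16, num_experts=8, top_k=2):
        super().__init__()
        # Per-expert LoRA factors: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_model))  # zero-init, as in LoRA
        self.top_k = top_k

    def forward(self, x, router_logits):
        # x: (batch, seq, d_model); router_logits: (batch, seq, num_experts)
        top_vals, top_idx = router_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)      # renormalize over the selected experts
        out = torch.zeros_like(x)
        for j in range(self.top_k):
            idx = top_idx[..., j]                  # (batch, seq) chosen expert ids
            delta = torch.einsum('bsd,bsdr->bsr', x, self.A[idx])
            delta = torch.einsum('bsr,bsrd->bsd', delta, self.B[idx])
            out = out + weights[..., j:j+1] * delta
        return out                                 # additive LoRA update for x
```

Because the up-projections start at zero, the mixture initially contributes nothing and the base model's behavior is preserved at the start of training.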

Key Insight

Token importance and task complexity are correlated signals. Complex inputs need more tokens AND stronger adaptation. The USN learns this correlation jointly, enabling:

  • Efficient computation via token pruning (skip unimportant tokens)
  • Specialized adaptation via expert routing (different experts for different input types)
  • Adaptive capacity via rank scaling (more capacity for harder inputs)
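A minimal sketch of this three-head design, assuming a simple MLP trunk (hidden sizes and head shapes are illustrative, not the released implementation):

```python
import torch
import torch.nn as nn

class UnifiedSaliencyNetwork(nn.Module):
    """Illustrative sketch: one shared backbone feeding three prediction heads."""

    def __init__(self, d_model=64, hidden=128, num_experts=8):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d_model, hidden), nn.GELU())  # shared trunk
        self.token_head = nn.Linear(hidden, 1)             # token importance score
        self.router_head = nn.Linear(hidden, num_experts)  # expert routing logits
        self.rank_head = nn.Linear(hidden, 1)              # rank scale in (0, 1)

    def forward(self, h):
        z = self.backbone(h)                               # h: (batch, seq, d_model)
        token_scores = torch.sigmoid(self.token_head(z)).squeeze(-1)  # keep probability
        router_logits = self.router_head(z)                # fed to expert selection
        rank_scale = torch.sigmoid(self.rank_head(z)).squeeze(-1)     # scales the LoRA update
        return token_scores, router_logits, rank_scale
```

Sharing the trunk is what lets the three signals stay correlated: a hard input raises the token scores, the routing confidence, and the rank scale together.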

Training Details

Curriculum Learning (3 Phases)

  1. Warmup (Epoch 1): Dense training, high temperature (soft decisions), all tokens retained
  2. Sparsification (Epoch 2): Gradually increase token pruning, temperature annealing
  3. Hardening (Epoch 3): Low temperature (discrete decisions), target 50% retention
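The three phases above can be sketched as a simple per-epoch schedule. The exact shapes here are assumptions (a linear temperature anneal and a linear retention ramp); the defaults follow the configuration listed below.

```python
def curriculum_state(epoch, total_epochs=3, temp_init=5.0, temp_final=0.5,
                     target_retention=0.5):
    """Illustrative 3-phase schedule: returns (temperature, retention) for an epoch."""
    frac = epoch / max(total_epochs - 1, 1)            # 0.0 -> 1.0 over training
    temperature = temp_init + frac * (temp_final - temp_init)
    if epoch == 0:                                     # 1. Warmup: keep all tokens
        retention = 1.0
    elif epoch < total_epochs - 1:                     # 2. Sparsification: ramp down
        retention = 1.0 - frac * (1.0 - target_retention)
    else:                                              # 3. Hardening: hold the target
        retention = target_retention
    return temperature, retention
```

With three epochs this yields (5.0, 1.0) in warmup, (2.75, 0.75) during sparsification, and (0.5, 0.5) in hardening.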

Loss Function

L_total = L_LM + λ_prune · L_retention + α_balance · L_balance
  • L_LM: Standard language modeling loss
  • L_retention: Encourages target token retention ratio
  • L_balance: Encourages uniform expert utilization
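One plausible instantiation of the total loss, assuming squared-error forms for the two auxiliary terms (the exact penalties and default weights are assumptions, not taken from the source):

```python
import torch

def datalora_loss(lm_loss, keep_probs, expert_probs, target_retention=0.5,
                  lambda_prune=0.1, alpha_balance=0.01):
    """Illustrative L_total = L_LM + λ_prune · L_retention + α_balance · L_balance."""
    # L_retention: pull the mean keep-probability toward the target ratio.
    l_retention = (keep_probs.mean() - target_retention) ** 2
    # L_balance: penalize deviation of per-expert load from uniform (1/E).
    load = expert_probs.mean(dim=(0, 1))               # average routing prob per expert
    l_balance = ((load - 1.0 / load.numel()) ** 2).sum()
    return lm_loss + lambda_prune * l_retention + alpha_balance * l_balance
```

Note that both auxiliary terms vanish exactly when retention hits the target and the experts are used uniformly, so at the optimum the total reduces to the language-modeling loss.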

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load base model
tokenizer = AutoTokenizer.from_pretrained("tugrulkaya/datalora-mistral-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tugrulkaya/datalora-mistral-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

DATALORA Framework

For training your own DATALORA models, see the full framework:

Components

  • UnifiedSaliencyNetwork: Joint token scoring + expert routing + rank scaling
  • SimpleMixtureOfExperts: 8 LoRA experts with weighted combination
  • CurriculumScheduler: 3-phase training with temperature annealing
  • DATALORATrainer: Extended HuggingFace Trainer with curriculum integration

Training Configuration

from models import DATALORAConfig

config = DATALORAConfig(
    base_model="mistralai/Mistral-7B-v0.3",
    num_lora_experts=8,        # Number of LoRA experts
    lora_rank=16,              # Base LoRA rank
    lora_alpha=32,             # LoRA scaling factor
    target_retention=0.5,      # Target 50% token retention
    num_active_experts=2,      # Top-K expert selection
    temperature_init=5.0,      # Initial Gumbel-softmax temperature
    temperature_final=0.5,     # Final temperature
)
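The two temperature settings control how routing decisions are sampled. A sketch using PyTorch's built-in Gumbel-softmax (the function name here is illustrative; the framework's internal API may differ):

```python
import torch
import torch.nn.functional as F

def route_experts(router_logits, temperature):
    """Temperature-controlled expert routing via Gumbel-softmax.
    High tau (warmup) gives soft mixtures; low tau (hardening) approaches one-hot."""
    return F.gumbel_softmax(router_logits, tau=temperature, dim=-1)
```

As `tau` anneals from `temperature_init` to `temperature_final`, the routing distribution sharpens from a soft blend of experts toward near-discrete selection.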

Citation

@misc{kaya2025datalora,
  title={DATALORA: Dynamic Adaptive Token-Level Optimization with Rank-Adaptive LoRA},
  author={Kaya, Tuğrul},
  year={2025},
  url={https://huggingface.co/tugrulkaya/datalora-mistral-7b}
}

License

Apache 2.0 (inherited from Mistral-7B-v0.3)
