# DATALORA: Dynamic Adaptive Token-Level Optimization with Rank-Adaptive LoRA
DATALORA is a novel training optimization method that combines three key innovations into a unified framework for efficient language model fine-tuning:
- Unified Saliency Network (USN): Joint token importance scoring and expert routing
- Mixture of LoRA Experts (MoLoRA): 8 specialized LoRA adapters with dynamic routing
- Dynamic Token Pruning: Curriculum-based token retention scheduling
## Model Details
| Property | Value |
|---|---|
| Base Model | mistralai/Mistral-7B-v0.3 |
| Method | DATALORA (USN + MoLoRA + Token Pruning) |
| Training Data | Open-Orca/OpenOrca (5K samples) |
| LoRA Experts | 8 |
| LoRA Rank | 16 |
| Target Retention | 50% |
| Quantization | 4-bit (NF4) |
| Training Epochs | 3 |
| Curriculum | Warmup → Sparsification → Hardening |
## Architecture

```
         Input Tokens
               │
               ▼
┌─────────────────────────────┐
│  Unified Saliency Network   │
│      (Shared backbone)      │
├─────────┬─────────┬─────────┤
│  Token  │ Router  │  Rank   │
│ Scores  │ Logits  │  Scale  │
└────┬────┴────┬────┴────┬────┘
     │         │         │
     ▼         ▼         ▼
   Token    Expert    Dynamic
  Pruning  Selection  Rank Scaling
     │         │         │
     ▼         ▼         ▼
┌─────────────────────────────┐
│   Mixture of LoRA Experts   │
│  (8 specialized adapters)   │
└─────────────────────────────┘
```
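To make the expert layer concrete, here is a minimal sketch of a frozen linear layer augmented with a mixture of LoRA experts under top-k routing. The class name, initialization, and routing details are illustrative assumptions, not the released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    """Frozen base linear layer plus a mixture of LoRA experts,
    combined via top-k routing weights (illustrative sketch)."""

    def __init__(self, in_f, out_f, num_experts=8, rank=16, alpha=32, top_k=2):
        super().__init__()
        self.base = nn.Linear(in_f, out_f, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        # One (A, B) low-rank pair per expert; B starts at zero so the
        # adapter initially contributes nothing, as in standard LoRA.
        self.A = nn.Parameter(torch.randn(num_experts, in_f, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, out_f))
        self.scaling = alpha / rank
        self.top_k = top_k

    def forward(self, x, router_logits):
        # x: (batch, seq, in_f); router_logits: (batch, seq, num_experts)
        weights, idx = router_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the top-k
        out = self.base(x)
        for k in range(self.top_k):
            A = self.A[idx[..., k]]           # (batch, seq, in_f, rank)
            B = self.B[idx[..., k]]           # (batch, seq, rank, out_f)
            delta = torch.einsum("bsi,bsir,bsro->bso", x, A, B)
            out = out + self.scaling * weights[..., k : k + 1] * delta
        return out
```

Gathering per-token expert weights keeps the routing fully per-token at the cost of extra memory; a production implementation would batch tokens by expert instead.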
## Key Insight
Token importance and task complexity are correlated signals. Complex inputs need more tokens AND stronger adaptation. The USN learns this correlation jointly, enabling:
- Efficient computation via token pruning (skip unimportant tokens)
- Specialized adaptation via expert routing (different experts for different input types)
- Adaptive capacity via rank scaling (more capacity for harder inputs)
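A minimal sketch of such a shared-backbone network with three light heads follows; the layer sizes and head designs are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class UnifiedSaliencyNetwork(nn.Module):
    """Shared backbone with three heads: token saliency scores,
    expert-router logits, and a per-token rank scale (sketch)."""

    def __init__(self, hidden_size: int, num_experts: int = 8):
        super().__init__()
        # One shared projection feeds all three heads, so the correlated
        # signals (importance, routing, capacity) are learned jointly.
        self.backbone = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.GELU(),
        )
        self.token_head = nn.Linear(hidden_size // 4, 1)             # importance
        self.router_head = nn.Linear(hidden_size // 4, num_experts)  # routing
        self.rank_head = nn.Linear(hidden_size // 4, 1)              # capacity

    def forward(self, hidden_states):
        h = self.backbone(hidden_states)                  # (batch, seq, d // 4)
        token_scores = self.token_head(h).squeeze(-1)     # (batch, seq)
        router_logits = self.router_head(h)               # (batch, seq, experts)
        rank_scale = torch.sigmoid(self.rank_head(h)).squeeze(-1)  # in (0, 1)
        return token_scores, router_logits, rank_scale

usn = UnifiedSaliencyNetwork(hidden_size=64)
scores, logits, rank = usn(torch.randn(2, 10, 64))
```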
## Training Details

### Curriculum Learning (3 Phases)
- Warmup (Epoch 1): Dense training, high temperature (soft decisions), all tokens retained
- Sparsification (Epoch 2): Gradually increase token pruning, temperature annealing
- Hardening (Epoch 3): Low temperature (discrete decisions), target 50% retention
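The three phases above can be sketched as a simple schedule function; the linear annealing and hard phase boundaries are illustrative assumptions:

```python
def curriculum_schedule(epoch: float,
                        temp_init: float = 5.0,
                        temp_final: float = 0.5,
                        target_retention: float = 0.5):
    """Illustrative 3-phase schedule: warmup keeps all tokens at high
    temperature, sparsification anneals both linearly, hardening holds
    the final targets (phase boundaries assumed at epochs 1 and 2)."""
    if epoch < 1.0:
        # Warmup: dense training, soft (high-temperature) decisions
        retention, temperature = 1.0, temp_init
    elif epoch < 2.0:
        # Sparsification: linearly anneal retention and temperature
        t = epoch - 1.0
        retention = 1.0 + t * (target_retention - 1.0)
        temperature = temp_init + t * (temp_final - temp_init)
    else:
        # Hardening: near-discrete decisions at the target retention
        retention, temperature = target_retention, temp_final
    return retention, temperature
```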
### Loss Function

```
L_total = L_LM + λ_prune * L_retention + α_balance * L_balance
```

- `L_LM`: Standard language modeling loss
- `L_retention`: Encourages target token retention ratio
- `L_balance`: Encourages uniform expert utilization
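One way the three terms could be combined is sketched below; the mean-squared forms of the retention and balance penalties, and the default weights, are plausible assumptions rather than the exact formulation:

```python
import torch

def datalora_loss(lm_loss, token_keep_probs, router_probs,
                  target_retention=0.5, lambda_prune=0.1, alpha_balance=0.01):
    """Combine the three loss terms from the card (sketch; weights assumed).
    token_keep_probs: (batch, seq) soft keep decisions in [0, 1]
    router_probs: (batch, seq, num_experts) softmaxed router outputs
    """
    # L_retention: penalize deviation of the mean keep ratio from the target
    retention = token_keep_probs.mean()
    l_retention = (retention - target_retention) ** 2
    # L_balance: penalize deviation of mean expert usage from uniform
    num_experts = router_probs.shape[-1]
    usage = router_probs.mean(dim=(0, 1))  # (num_experts,)
    l_balance = ((usage - 1.0 / num_experts) ** 2).sum()
    return lm_loss + lambda_prune * l_retention + alpha_balance * l_balance
```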
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("tugrulkaya/datalora-mistral-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tugrulkaya/datalora-mistral-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate
prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## DATALORA Framework
For training your own DATALORA models, see the full framework:
### Components

- `UnifiedSaliencyNetwork`: Joint token scoring + expert routing + rank scaling
- `SimpleMixtureOfExperts`: 8 LoRA experts with weighted combination
- `CurriculumScheduler`: 3-phase training with temperature annealing
- `DATALORATrainer`: Extended HuggingFace Trainer with curriculum integration
### Training Configuration

```python
from models import DATALORAConfig

config = DATALORAConfig(
    base_model="mistralai/Mistral-7B-v0.3",
    num_lora_experts=8,      # Number of LoRA experts
    lora_rank=16,            # Base LoRA rank
    lora_alpha=32,           # LoRA scaling factor
    target_retention=0.5,    # Target 50% token retention
    num_active_experts=2,    # Top-K expert selection
    temperature_init=5.0,    # Initial Gumbel-softmax temperature
    temperature_final=0.5,   # Final temperature
)
```
## Citation

```bibtex
@misc{kaya2025datalora,
  title={DATALORA: Dynamic Adaptive Token-Level Optimization with Rank-Adaptive LoRA},
  author={Kaya, Tuğrul},
  year={2025},
  url={https://huggingface.co/tugrulkaya/datalora-mistral-7b}
}
```
## License
Apache 2.0 (inherited from Mistral-7B-v0.3)