Instructions for using Venkat9990/finance-specialist-v7 with libraries and local apps.

Libraries

How to use Venkat9990/finance-specialist-v7 with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub  # from_pretrained downloads via huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Venkat9990/finance-specialist-v7",
	filename="gguf/finance-specialist-v7-Q4_K_M.gguf",
)

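response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result follows the OpenAI chat-completion response shape.
print(response["choices"][0]["message"]["content"])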

Local Apps

llama.cpp

How to use Venkat9990/finance-specialist-v7 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Venkat9990/finance-specialist-v7:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Venkat9990/finance-specialist-v7:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Venkat9990/finance-specialist-v7:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Venkat9990/finance-specialist-v7:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Venkat9990/finance-specialist-v7:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Venkat9990/finance-specialist-v7:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Venkat9990/finance-specialist-v7:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Venkat9990/finance-specialist-v7:Q4_K_M

Use Docker

docker model run hf.co/Venkat9990/finance-specialist-v7:Q4_K_M

vLLM

How to use Venkat9990/finance-specialist-v7 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Venkat9990/finance-specialist-v7"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Venkat9990/finance-specialist-v7",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
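
The same request can be sent from Python using only the standard library. This is a sketch, not part of vLLM: it assumes a server on vLLM's default port 8000 (llama-server, shown earlier, listens on port 8080 by default), and the helper names `build_payload`, `extract_reply`, and `ask` are hypothetical. The system prompt mirrors the one in the Transformers example.

```python
import json
import urllib.request

# Assumption: vLLM's default address; llama-server defaults to port 8080 instead.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "Venkat9990/finance-specialist-v7"


def build_payload(question: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a finance specialist AI assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.1,
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def ask(question: str, url: str = BASE_URL) -> str:
    """POST the request to a running server and return the reply text."""
    body = json.dumps(build_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With a server running, `print(ask("What is a bond yield curve inversion?"))` prints the model's answer; pass `url="http://localhost:8080/v1/chat/completions"` to target llama-server instead.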


Ollama
How to use Venkat9990/finance-specialist-v7 with Ollama:
```
ollama run hf.co/Venkat9990/finance-specialist-v7:Q4_K_M
```

Unsloth Studio

How to use Venkat9990/finance-specialist-v7 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Venkat9990/finance-specialist-v7 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Venkat9990/finance-specialist-v7 to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Venkat9990/finance-specialist-v7 to start chatting

Docker Model Runner
How to use Venkat9990/finance-specialist-v7 with Docker Model Runner:
```
docker model run hf.co/Venkat9990/finance-specialist-v7:Q4_K_M
```

Lemonade

How to use Venkat9990/finance-specialist-v7 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Venkat9990/finance-specialist-v7:Q4_K_M

Run and chat with the model

lemonade run user.finance-specialist-v7-Q4_K_M

List all available models

lemonade list

Finance Specialist v7

A fine-tuned Llama 3.2 1B Instruct model specialized for finance conversations, trained with knowledge-preserving LoRA techniques using llm-forge.

Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-1B-Instruct |
| Parameters | 1.24B (1.7M trainable via LoRA) |
| Training Method | LoRA (r=8, alpha=16, attention-only) |
| Training Data | Josephgflowers/Finance-Instruct-500k |
| Samples Used | 5,675 (20K loaded, 72% removed by the data cleaning pipeline) |
| Training Time | 6 min 52 sec on 1x NVIDIA A100 80GB |
| License | Apache 2.0 |
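
The trainable-parameter figure and the cleaning ratio in the table can be checked by hand. This sketch assumes the Llama 3.2 1B attention dimensions from Meta's published config (hidden size 2048, 16 layers, 32 query heads and 8 key/value heads of size 64); those numbers are not stated in this card. Each LoRA adapter on a projection with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters.

```python
# Assumed Llama 3.2 1B dimensions (from its published config, not from this card).
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64

RANK = 8  # LoRA r from the table above


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) next to a frozen weight matrix."""
    return r * (in_features + out_features)


# (in_features, out_features) of each attention projection targeted by LoRA.
attention_shapes = {
    "q_proj": (HIDDEN_SIZE, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),  # grouped-query attention: fewer KV heads
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN_SIZE),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in attention_shapes.values())
trainable = per_layer * NUM_LAYERS

loaded, kept = 20_000, 5_675
removed_fraction = 1 - kept / loaded

print(f"trainable LoRA params: {trainable:,}")        # 1,703,936
print(f"removed by cleaning: {removed_fraction:.0%}")  # 72%
```

The result, about 1.70M parameters, matches the "1.7M trainable" entry; the MLP projections contribute nothing because they are not adapted.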

Key Design: Zero Catastrophic Forgetting

This model was carefully tuned to add finance conversational ability without destroying the base model's general knowledge. Previous versions (v1-v6) suffered from catastrophic forgetting. v7 fixes this with:

- LoRA rank r=8 (a small adapter, so minimal weight perturbation)
- Attention-only targets (q_proj, k_proj, v_proj, o_proj); the MLP layers are left untouched
- Learning rate 1e-5 (5x lower than v6)
- Data cleaning (removed 72% of noisy or duplicate training samples)
- No NEFTune noise (it was found to amplify forgetting on small datasets)
- Single epoch (to limit overfitting)
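
The first two settings can be illustrated with a toy, dependency-free sketch of the standard LoRA formulation: a frozen weight W plus a low-rank update B·A scaled by alpha/r, with B initialized to zero so the adapted layer starts out identical to the base layer. This is not llm-forge's code; the helper names are hypothetical, A is drawn from a Gaussian only for simplicity, and module paths follow Hugging Face's Llama naming (`self_attn.q_proj`, `mlp.gate_proj`).

```python
import random

# LoRA hyperparameters from the card: r=8, alpha=16.
R, ALPHA = 8, 16
SCALING = ALPHA / R  # 2.0 under standard LoRA scaling (use_rslora: false)

# Attention projections adapted in v7; MLP projections (gate/up/down_proj) stay frozen.
TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj")


def is_lora_target(module_name: str) -> bool:
    """Return True if the last component of a module path is an attention projection."""
    return module_name.rsplit(".", 1)[-1] in TARGETS


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def init_adapter(in_features, out_features, r=R, seed=0):
    """A (r x in) starts small and random; B (out x r) starts at zero."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(in_features)] for _ in range(r)]
    b = [[0.0] * r for _ in range(out_features)]
    return a, b


def lora_forward(w, a, b, x):
    """y = W x + (alpha / r) * B (A x). W is frozen; only A and B would be trained."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + SCALING * d for y, d in zip(base, delta)]


w = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
x = [1.0, 2.0, 3.0]
a, b = init_adapter(in_features=3, out_features=2)
print(lora_forward(w, a, b, x))  # [7.0, -1.5]: identical to W x before any training
```

Because the update is confined to a rank-8 product on four attention projections, any change training makes to the network's function flows through this small, scaled delta, which is the intuition behind "minimal weight perturbation".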

Benchmark Results

General Knowledge Preservation (v7 vs Base)

| Benchmark | Base | v7 | Delta (pts) | Verdict |
|---|---|---|---|---|
| MMLU (57 subjects, 5-shot) | 46.05% | 45.86% | -0.19 | Minimal |
| GSM8K (math reasoning) | 33.59% | 31.99% | -1.60 | Minimal |
| IFEval (instruction following) | 43.07% | 41.04% | -2.03 | Moderate |
| ARC Challenge | 37.88% | 37.97% | +0.09 | Preserved |
| ARC Easy | 68.81% | 68.35% | -0.46 | Minimal |
| HellaSwag | 61.59% | 60.88% | -0.71 | Minimal |
| Winogrande | 61.80% | 61.88% | +0.08 | Preserved |
| TruthfulQA MC2 | 43.37% | 42.52% | -0.85 | Minimal |

Finance Domain (v7 vs Base)

| Benchmark | Base | v7 | Delta (pts) |
|---|---|---|---|
| MMLU Business Ethics | 49.00% | 49.00% | 0.00 |
| MMLU Econometrics | 28.95% | 28.95% | 0.00 |
| MMLU Prof. Accounting | 35.11% | 35.46% | +0.35 |

Comparison with v6 (which had catastrophic forgetting)

| Benchmark | v6 | v7 | Recovery |
|---|---|---|---|
| GSM8K | 6.07% | 31.99% | +25.92 pts |
| IFEval | 25.32% | 41.04% | +15.72 pts |
| MMLU | 38.67% | 45.86% | +7.19 pts |
| Business Ethics | 28.00% | 49.00% | +21.00 pts |
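
The Delta and Recovery columns are plain differences in percentage points (new score minus old score), not relative percentages. A short check that recomputes them from the published scores; the dictionary layout is just for illustration:

```python
# Scores copied from the tables above (percent accuracy): (old, new).
general = {
    "MMLU": (46.05, 45.86),
    "GSM8K": (33.59, 31.99),
    "IFEval": (43.07, 41.04),
    "ARC Challenge": (37.88, 37.97),
    "ARC Easy": (68.81, 68.35),
    "HellaSwag": (61.59, 60.88),
    "Winogrande": (61.80, 61.88),
    "TruthfulQA MC2": (43.37, 42.52),
}
v6_vs_v7 = {
    "GSM8K": (6.07, 31.99),
    "IFEval": (25.32, 41.04),
    "MMLU": (38.67, 45.86),
    "Business Ethics": (28.00, 49.00),
}


def point_deltas(pairs):
    """Absolute change in percentage points, rounded to two decimals."""
    return {name: round(new - old, 2) for name, (old, new) in pairs.items()}


base_delta = point_deltas(general)
recovery = point_deltas(v6_vs_v7)
largest_drop = min(base_delta, key=base_delta.get)

print(base_delta)
print(recovery)
print("largest drop vs base:", largest_drop)  # IFEval
```

Every general-knowledge delta stays within about 2 points of the base model, with IFEval showing the largest drop.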

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Venkat9990/finance-specialist-v7",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained("Venkat9990/finance-specialist-v7")

messages = [
    {"role": "system", "content": "You are a finance specialist AI assistant."},
    {"role": "user", "content": "What is a bond yield curve inversion?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already starts with <|begin_of_text|>, so skip the tokenizer's own BOS token.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # do_sample=True is needed for temperature and top_p to take effect.
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_p=0.9)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

With Ollama (GGUF)

# Download GGUF and Modelfile from this repo, then:
ollama create finance-specialist-v7 -f Modelfile
ollama run finance-specialist-v7
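
The repository's own Modelfile should be preferred. If you need to write one yourself, the hypothetical helper below renders a minimal Modelfile using Ollama's `FROM`, `PARAMETER`, and `SYSTEM` directives. The GGUF path is assumed to match the one in the llama-cpp-python example, and the system prompt and temperature mirror the Transformers example. It omits a `TEMPLATE` directive on the assumption that Ollama picks up the chat template embedded in the GGUF; if it does not, one has to be added.

```python
from pathlib import Path

# Assumption: the GGUF sits at the same relative path used in the llama-cpp-python example.
GGUF_PATH = "./gguf/finance-specialist-v7-Q4_K_M.gguf"
SYSTEM_PROMPT = "You are a finance specialist AI assistant."


def render_modelfile(gguf_path: str = GGUF_PATH, temperature: float = 0.1) -> str:
    """Render a minimal Modelfile: weights, one sampling parameter, and a system prompt."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f'SYSTEM """{SYSTEM_PROMPT}"""',
    ]
    return "\n".join(lines) + "\n"


def write_modelfile(target: Path = Path("Modelfile")) -> Path:
    """Write the rendered Modelfile next to the downloaded GGUF and return its path."""
    target.write_text(render_modelfile(), encoding="utf-8")
    return target
```

After `write_modelfile()`, the `ollama create` and `ollama run` commands above work unchanged.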

Training Configuration

model:
  name: unsloth/Llama-3.2-1B-Instruct
  max_seq_length: 2048
  torch_dtype: bf16

lora:
  r: 8
  alpha: 16
  target_modules: [q_proj, v_proj, k_proj, o_proj]
  use_rslora: false

training:
  mode: lora
  learning_rate: 1.0e-5
  num_epochs: 1
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  gradient_checkpointing: true
  assistant_only_loss: true
  completion_only_loss: true
  neftune_noise_alpha: null
  label_smoothing_factor: 0.0

data:
  train_path: Josephgflowers/Finance-Instruct-500k
  format: sharegpt
  max_samples: 20000
  cleaning:
    enabled: true
    quality_preset: permissive
    dedup_enabled: true
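
A few quantities follow directly from this configuration. This sketch assumes all 5,675 cleaned samples go to training (the card also reports an eval loss, so some may have been held out), a single GPU, and that the final partial batch is kept, so the step count is approximate.

```python
import math

# Values from the configuration above and the Model Details table.
per_device_batch = 2
grad_accum_steps = 8
num_devices = 1          # single A100, per the card
train_samples = 5_675    # samples left after cleaning (assumed all used for training)
num_epochs = 1
lora_r, lora_alpha = 8, 16

effective_batch = per_device_batch * grad_accum_steps * num_devices
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs

scaling = lora_alpha / lora_r                      # standard LoRA, as with use_rslora: false
rslora_scaling = lora_alpha / math.sqrt(lora_r)    # what use_rslora: true would use instead

print(effective_batch, total_steps, scaling, round(rslora_scaling, 2))  # 16 355 2.0 5.66
```

Roughly 355 optimizer updates at an effective batch of 16 is a small training budget, consistent with the card's goal of light-touch adaptation.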

Training Metrics

Train loss: 2.16 → 0.72 (avg 1.569)
Eval loss: 1.326
Token accuracy: 67.8% (eval)
Masked tokens: 97.7%
Hardware: 1x NVIDIA A100 80GB on the Hopper HPC cluster
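
The eval loss can be read as a perplexity. This assumes the reported value is a mean per-token cross-entropy in nats (the usual convention for PyTorch and Hugging Face trainers), computed only over the tokens that are not masked out of the loss.

```python
import math

eval_loss = 1.326  # from the metrics above; assumed mean cross-entropy in nats
perplexity = math.exp(eval_loss)
print(f"eval perplexity ≈ {perplexity:.2f}")  # ≈ 3.77
```

A perplexity near 3.8 means that, on average, the model is about as uncertain as choosing uniformly among roughly four tokens at each scored position.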

Built With

llm-forge — Config-driven, YAML-first open-source LLM training platform.

Author

Naga Venkata Sai Chennu (@Venkat9990) — George Mason University


Safetensors
Model size: 1B params
Tensor type: BF16

Model tree for Venkat9990/finance-specialist-v7
Base model: meta-llama/Llama-3.2-1B-Instruct
Finetuned: unsloth/Llama-3.2-1B-Instruct
Adapter: this model

Dataset used to train Venkat9990/finance-specialist-v7: Josephgflowers/Finance-Instruct-500k

Evaluation results (self-reported)
MMLU, accuracy: 45.860
ARC Challenge, accuracy (normalized): 37.970
HellaSwag, accuracy (normalized): 60.880
GSM8K, exact match: 31.990