Finance Specialist v7
A fine-tuned Llama 3.2 1B Instruct model specialized for finance conversations, trained with knowledge-preserving LoRA techniques using llm-forge.
Model Details
| Property | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-1B-Instruct |
| Parameters | 1.24B (1.7M trainable via LoRA) |
| Training Method | LoRA (r=8, alpha=16, attention-only) |
| Training Data | Josephgflowers/Finance-Instruct-500k |
| Samples Used | 5,675 (20K loaded, 72% removed by data cleaning pipeline) |
| Training Time | 6 min 52 sec on 1x NVIDIA A100 80GB |
| License | Apache 2.0 |
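The "1.7M trainable" figure in the table can be sanity-checked from the LoRA rank and the shapes of the adapted attention projections. The sketch below is a back-of-the-envelope check, not output from llm-forge. The architecture numbers (hidden size 2048, 16 layers, 512-wide key/value projections) are assumptions about Llama 3.2 1B and are not stated in this card.

```python
# Assumed Llama 3.2 1B attention shapes (not stated in this model card).
HIDDEN_SIZE = 2048   # model width
NUM_LAYERS = 16      # decoder blocks
KV_DIM = 512         # 8 KV heads x 64 head dim (grouped-query attention)
RANK = 8             # LoRA r from the table above

# (in_features, out_features) of each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_DIM),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,} trainable LoRA parameters")
```

Under these assumptions the total comes to roughly 1.7M, which is about 0.14% of the 1.24B base parameters.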
Key Design: Zero Catastrophic Forgetting
This model was tuned to add finance conversational ability while preserving the base model's general knowledge. Previous versions (v1-v6) suffered from catastrophic forgetting. v7 addresses this with:
- LoRA r=8 (minimal weight perturbation)
- Attention-only targets (q/k/v/o_proj) — MLP reasoning layers untouched
- Learning rate 1e-5 (5x lower than v6)
- Data cleaning (removed 72% of noisy/duplicate training samples)
- NEFTune noise disabled (it amplified forgetting on small datasets)
- Single epoch (to limit overfitting)
Benchmark Results
General Knowledge Preservation (v7 vs Base)
| Benchmark | Base | v7 | Delta (pts) | Verdict |
|---|---|---|---|---|
| MMLU (57 subjects, 5-shot) | 46.05% | 45.86% | -0.19 | Minimal |
| GSM8K (math reasoning) | 33.59% | 31.99% | -1.60 | Minimal |
| IFEval (instruction following) | 43.07% | 41.04% | -2.03 | Moderate |
| ARC Challenge | 37.88% | 37.97% | +0.09 | Preserved |
| ARC Easy | 68.81% | 68.35% | -0.46 | Minimal |
| HellaSwag | 61.59% | 60.88% | -0.71 | Minimal |
| Winogrande | 61.80% | 61.88% | +0.08 | Preserved |
| TruthfulQA MC2 | 43.37% | 42.52% | -0.85 | Minimal |
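The deltas in this table are differences in percentage points, which is not the same as a relative change against the base score. A minimal sketch of both calculations for three rows of the table follows. The helper names are illustrative and not part of any evaluation harness.

```python
def delta_points(base: float, tuned: float) -> float:
    """Absolute difference in percentage points (tuned minus base)."""
    return round(tuned - base, 2)


def relative_change(base: float, tuned: float) -> float:
    """Change relative to the base score, expressed in percent."""
    return round(100 * (tuned - base) / base, 2)


# (base, v7) scores copied from the table above
scores = {
    "MMLU": (46.05, 45.86),
    "GSM8K": (33.59, 31.99),
    "IFEval": (43.07, 41.04),
}

for name, (base, v7) in scores.items():
    print(f"{name}: {delta_points(base, v7):+.2f} pts, "
          f"{relative_change(base, v7):+.2f}% relative")
```

For low-scoring benchmarks the relative figure is noticeably larger than the point difference, so it helps to state which one is being reported.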
Finance Domain (v7 vs Base)
| Benchmark | Base | v7 | Delta (pts) |
|---|---|---|---|
| MMLU Business Ethics | 49.00% | 49.00% | 0.00 |
| MMLU Econometrics | 28.95% | 28.95% | 0.00 |
| MMLU Prof. Accounting | 35.11% | 35.46% | +0.35 |
Comparison with v6 (which had catastrophic forgetting)
| Benchmark | v6 | v7 | Recovery |
|---|---|---|---|
| GSM8K | 6.07% | 31.99% | +25.92 pts |
| IFEval | 25.32% | 41.04% | +15.72 pts |
| MMLU | 38.67% | 45.86% | +7.19 pts |
| Business Ethics | 28.00% | 49.00% | +21.00 pts |
Usage
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bfloat16 and place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "Venkat9990/finance-specialist-v7",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Venkat9990/finance-specialist-v7")

messages = [
    {"role": "system", "content": "You are a finance specialist AI assistant."},
    {"role": "user", "content": "What is a bond yield curve inversion?"},
]

# Render the chat template to text; it already contains the BOS token,
# so special tokens are not added a second time when tokenizing
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # do_sample=True is required for temperature and top_p to take effect
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
With Ollama (GGUF)
# Download GGUF and Modelfile from this repo, then:
ollama create finance-specialist-v7 -f Modelfile
ollama run finance-specialist-v7
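Once the model has been created with the commands above, it can also be queried programmatically through Ollama's local HTTP API. The sketch below is a minimal example and makes three assumptions: that Ollama is serving on its default address `http://localhost:11434`, that the model was registered under the name `finance-specialist-v7`, and that the system prompt from the Transformers example is appropriate. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(question: str, model: str = "finance-specialist-v7") -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a finance specialist AI assistant."},
            {"role": "user", "content": question},
        ],
        "stream": False,  # return one JSON object instead of a token stream
    }


def ask(question: str, url: str = OLLAMA_CHAT_URL) -> str:
    """Send the question to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `print(ask("What is a bond yield curve inversion?"))` should print the model's answer.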
Training Configuration
model:
  name: unsloth/Llama-3.2-1B-Instruct
  max_seq_length: 2048
  torch_dtype: bf16
lora:
  r: 8
  alpha: 16
  target_modules: [q_proj, v_proj, k_proj, o_proj]
  use_rslora: false
training:
  mode: lora
  learning_rate: 1.0e-5
  num_epochs: 1
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  gradient_checkpointing: true
  assistant_only_loss: true
  completion_only_loss: true
  neftune_noise_alpha: null
  label_smoothing_factor: 0.0
data:
  train_path: Josephgflowers/Finance-Instruct-500k
  format: sharegpt
  max_samples: 20000
  cleaning:
    enabled: true
    quality_preset: permissive
    dedup_enabled: true
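The batch settings and sample counts in this configuration imply an effective batch size and an approximate number of optimizer steps. The sketch below works through that arithmetic. It assumes a single GPU (as reported for this run) and that all 5,675 cleaned samples go into the training split. The second assumption may not hold, since an eval loss is also reported, so treat the step count as an upper estimate.

```python
import math

# Values taken from the configuration and the Model Details table
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_epochs = 1
loaded_samples = 20_000   # data.max_samples
kept_samples = 5_675      # samples remaining after cleaning

# Assumptions (not stated in the config): one GPU, no samples held out for eval
num_gpus = 1
train_samples = kept_samples

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
optimizer_steps = math.ceil(train_samples / effective_batch_size) * num_epochs
removed_fraction = 1 - kept_samples / loaded_samples

print(f"effective batch size: {effective_batch_size}")
print(f"approx. optimizer steps: {optimizer_steps}")
print(f"removed by cleaning: {removed_fraction:.1%}")
```

The removed fraction works out to a little under 72%, consistent with the rounded figure quoted in this card.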
Training Metrics
- Train loss: 2.16 → 0.72 (avg 1.569)
- Eval loss: 1.326
- Token accuracy: 67.8% (eval)
- Masked tokens: 97.7%
- Hardware: 1x NVIDIA A100 80GB (Hopper HPC cluster)
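The eval loss above can be read as a perplexity, which some readers find easier to compare across runs. The conversion below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not confirmed by this card.

```python
import math

eval_loss = 1.326  # from the metrics list above

# Assumption: loss is mean per-token cross-entropy measured in nats
eval_perplexity = math.exp(eval_loss)

print(f"eval perplexity: {eval_perplexity:.2f}")
```

Under that assumption the eval perplexity is about 3.77 on the loss-bearing (unmasked) tokens.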
Built With
llm-forge — Config-driven, YAML-first open-source LLM training platform.
Author
Naga Venkata Sai Chennu (@Venkat9990) — George Mason University
Evaluation results (self-reported)
- MMLU: accuracy 45.86
- ARC Challenge: normalized accuracy 37.97
- HellaSwag: normalized accuracy 60.88
- GSM8K: exact match 31.99