QLoRA: Efficient Finetuning of Quantized LLMs
Paper: arXiv 2305.14314
This repo contains an 8-bit quantized (using bitsandbytes) version of Mistral AI's Mistral-7B-Instruct-v0.2.
QLoRA paper on arXiv: QLoRA: Efficient Finetuning of Quantized LLMs
Hugging Face blog post on 8-bit quantization using bitsandbytes: A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
bitsandbytes GitHub repo
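The blog post above covers 8-bit (LLM.int8()) quantization in depth; the core absmax quantization step it describes can be sketched in plain Python. This is illustrative only — bitsandbytes operates on tensors and handles outlier features separately:

```python
# Illustrative sketch of absmax 8-bit quantization, the basic idea behind
# bitsandbytes' LLM.int8(): scale each value by 127 / max(|x|), round to an
# integer code in [-127, 127], and dequantize by dividing the scale back out.

def quantize_absmax(values):
    """Quantize a list of floats to int8-range codes using absmax scaling."""
    scale = 127.0 / max(abs(v) for v in values)  # largest |value| maps to 127
    codes = [round(v * scale) for v in values]
    return codes, scale

def dequantize_absmax(codes, scale):
    """Recover approximate floats from the codes and the stored scale."""
    return [c / scale for c in codes]

weights = [0.5, -1.25, 0.1, 2.0]  # toy "weight" row
codes, scale = quantize_absmax(weights)
approx = dequantize_absmax(codes, scale)
print(codes)   # 8-bit codes
print(approx)  # close to the original weights, within quantization error
```

Each stored weight now needs one byte plus a shared per-row scale, which is where the memory savings over 16/32-bit weights come from.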
Use the code below to get started with the model.
!pip install --quiet bitsandbytes
!pip install --quiet --upgrade transformers # Install latest version of transformers
!pip install --quiet --upgrade accelerate
!pip install --quiet sentencepiece
!pip install --quiet flash-attn --no-build-isolation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
model_id_mistral = "alokabhishek/Mistral-7B-Instruct-v0.2-bnb-8bit"

# Load the tokenizer and the pre-quantized 8-bit model
tokenizer_mistral = AutoTokenizer.from_pretrained(model_id_mistral, use_fast=True)
model_mistral = AutoModelForCausalLM.from_pretrained(
    model_id_mistral,
    device_map="auto"  # place layers on available GPU(s)/CPU automatically
)

# Build a text-generation pipeline from the loaded model and tokenizer
pipe_mistral = pipeline(model=model_mistral, tokenizer=tokenizer_mistral, task="text-generation")
prompt_mistral = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_mistral = pipe_mistral(prompt_mistral, max_new_tokens=512)
print(output_mistral[0]["generated_text"])
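This checkpoint is already quantized, so no quantization config is needed at load time. If you want to produce an 8-bit model like this one from the full-precision base weights yourself, the quantization is applied when loading via `BitsAndBytesConfig` — a sketch, assuming the base model `mistralai/Mistral-7B-Instruct-v0.2` and sufficient GPU memory:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit load-time quantization config (bitsandbytes LLM.int8())
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# Downloads the full-precision base model and quantizes its linear layers
# to 8-bit as they are loaded onto the device(s).
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
```

The quantized model can then be saved with `save_pretrained` and pushed to the Hub.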