Instructions for using AbdulrhmanEldeeb/lora_model with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use AbdulrhmanEldeeb/lora_model with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AbdulrhmanEldeeb/lora_model")
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("AbdulrhmanEldeeb/lora_model", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AbdulrhmanEldeeb/lora_model with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AbdulrhmanEldeeb/lora_model"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbdulrhmanEldeeb/lora_model",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/AbdulrhmanEldeeb/lora_model
```
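The vLLM endpoint above is OpenAI-compatible, so the same call can be made from Python rather than curl. A minimal sketch that only builds the JSON request body shown in the curl example (the helper name `build_completion_request` is ours, not part of vLLM; actually POSTing it, e.g. with `requests`, is left as a comment):

```python
import json

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> str:
    """Build the JSON body for an OpenAI-compatible /v1/completions call."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

body = build_completion_request("AbdulrhmanEldeeb/lora_model", "Once upon a time,")
# POST `body` to http://localhost:8000/v1/completions with the header
# Content-Type: application/json (e.g. via requests.post or urllib).
```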
- SGLang
How to use AbdulrhmanEldeeb/lora_model with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AbdulrhmanEldeeb/lora_model" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbdulrhmanEldeeb/lora_model",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AbdulrhmanEldeeb/lora_model" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AbdulrhmanEldeeb/lora_model",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Unsloth Studio
How to use AbdulrhmanEldeeb/lora_model with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
# Install Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for AbdulrhmanEldeeb/lora_model to start chatting.
```
Install Unsloth Studio (Windows)
```powershell
# Install Unsloth Studio:
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for AbdulrhmanEldeeb/lora_model to start chatting.
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for AbdulrhmanEldeeb/lora_model to start chatting.
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="AbdulrhmanEldeeb/lora_model",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use AbdulrhmanEldeeb/lora_model with Docker Model Runner:
```shell
docker model run hf.co/AbdulrhmanEldeeb/lora_model
```
Fine-Tuned Llama Model for Metallurgy and Materials Science
- Developed by: Abdulrhman37
- License: Apache-2.0
- Base Model: unsloth/meta-llama-3.1-8b-bnb-4bit
This fine-tuned Llama model specializes in metallurgy, materials science, and engineering. It has been enhanced to provide precise and detailed responses to technical queries, making it a valuable tool for professionals, researchers, and enthusiasts in the field.
Training Details
This model was fine-tuned with:
- Unsloth: Enabled 2x faster training using efficient parameter optimization.
- Hugging Face TRL: Used for advanced fine-tuning and training capabilities.
Fine-tuning focused on enhancing domain-specific knowledge using a dataset curated from various metallurgical research and practical case studies.
For a detailed walkthrough of the fine-tuning process, refer to this notebook.
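The parameter-efficient training that Unsloth accelerates here is LoRA (the low-rank adapters this repo's name refers to). The core idea can be sketched in a few lines of plain Python; the dimensions and values below are toy choices for illustration, not anything from the actual training run:

```python
# Toy illustration of the LoRA idea: instead of updating the full weight
# matrix W, train a low-rank update B @ A whose output is added to W's.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    """y = W x + (alpha / r) * B (A x) -- a LoRA-adapted linear layer."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0, 1.0]]               # rank-1 down-projection (r = 1)
B = [[0.0], [0.0]]             # up-projection, initialized to zero
x = [2.0, 3.0]

# With B = 0 the adapter is a no-op, exactly as at the start of training:
print(lora_forward(W, A, B, x))  # [2.0, 3.0]
```

Only A and B are trained, which is why the adapter checkpoint is tiny compared to the 8B-parameter base model.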
Features
- Supports text generation with scientific and technical insights.
- Provides domain-specific reasoning with references to key metallurgical principles and mechanisms.
- Built for fast inference with bnb-4bit quantization for optimized performance.
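The memory benefit of the bnb-4bit quantization mentioned above follows from simple arithmetic: 4-bit weights take a quarter of the space of fp16 weights. A back-of-envelope sketch (the parameter count is an approximation for Llama 3.1 8B, and quantization constants and activation memory are ignored):

```python
# Rough memory arithmetic behind the bnb-4bit claim.
params = 8_030_000_000  # approximate parameter count of Llama 3.1 8B

gib = 1024 ** 3
fp16_gib = params * 2 / gib     # fp16: 2 bytes per weight
int4_gib = params * 0.5 / gib   # 4-bit: 0.5 bytes per weight

print(f"fp16: ~{fp16_gib:.1f} GiB, 4-bit: ~{int4_gib:.1f} GiB")
```

This is why the 4-bit model fits on a single consumer GPU where the fp16 weights alone would not.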
Example Use Cases
- Material property analysis (e.g., "How does adding rare earth elements affect magnesium alloys?").
- Failure mechanism exploration (e.g., "What causes porosity in gas metal arc welding?").
- Corrosion prevention methods (e.g., "How does cathodic protection work in marine environments?").
How to Use
- Install dependencies:

```python
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
```
- Load the model:

```python
from unsloth import FastLanguageModel
import torch

metallurgy_prompt = """You are a highly knowledgeable assistant specializing in metallurgy, materials science,
and engineering. Below is a technical instruction. Your task is to provide an accurate, domain-specific response that appropriately addresses the request.
Ensure your response is detailed, provide scientifically rigorous and quantitative responses, reference fundamental principles and mechanisms,
include potential equations, calculations, or microstructural insights where relevant, support statements with scientific reasoning,
and discuss potential variations or alternative interpretations.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

max_seq_length = 2048  # Choose any! RoPE scaling is supported automatically.
dtype = None  # None for auto detection; float16 for Tesla T4/V100, bfloat16 for Ampere+.
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Abdulrhman37/lora_model",  # the model you used for training
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
```
- Use the fine-tuned model:
```python
# Function to process a question
def answer(q: str):
    """
    Generates a detailed response to a metallurgy-related question using the
    fine-tuned language model.

    Args:
        q (str): The question or instruction to be answered.

    Returns:
        str: The generated response from the model, specifically the content
        after "### Response:".
    """
    # Prepare the language model for fast inference
    FastLanguageModel.for_inference(model)  # Enables 2x faster native inference

    # Format the input question using the metallurgy prompt template
    inputs = tokenizer(
        [
            metallurgy_prompt.format(
                q,   # Instruction: the main question
                "",  # Input: empty, as no specific input is provided
                "",  # Output: placeholder for the generated response
            )
        ],
        return_tensors="pt",  # Return PyTorch tensors
    ).to("cuda")  # Move tensors to the GPU

    # Generate the model's output based on the formatted input
    outputs = model.generate(**inputs, use_cache=True)  # Cached decoding for speed

    # Decode the model's output into readable text
    result = tokenizer.batch_decode(outputs)

    # Split the result into the parts before and after "### Response:"
    split_content = result[0].split("### Response:")
    before_response = split_content[0].strip()  # Content before the response
    after_response = split_content[1].strip().replace("<|end_of_text|>", "")  # Clean response text

    # Detailed dictionary, useful for debugging or additional processing
    detailed = {
        "after_response": after_response,    # The generated response itself
        "before_response": before_response,  # The prompt echoed before the response
        "full_result": result,               # The full raw output from the model
    }

    # Return only the generated response content
    return detailed["after_response"]

# Ask the model a technical question
q = "To improve strength, toughness, and shock-resistance in Mg-Al-Mn system cast magnesium alloys (e.g. AM100A), what should I do?"
from pprint import pprint
pprint(answer(q))
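The post-processing inside `answer` can be exercised on its own, without a GPU. A small sketch of the same split-and-clean logic; the raw string below is a made-up stand-in for a decoded model output:

```python
def extract_response(decoded: str) -> str:
    """Return only the text after the '### Response:' marker,
    with the end-of-text token stripped."""
    after = decoded.split("### Response:")[1]
    return after.strip().replace("<|end_of_text|>", "")

raw = ("### Instruction:\nWhat is pearlite?\n"
       "### Response:\nA lamellar mixture of ferrite and cementite.<|end_of_text|>")
print(extract_response(raw))  # A lamellar mixture of ferrite and cementite.
```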
For further help using the model, follow this notebook.
Contact
For any inquiries, feedback, or collaboration opportunities, feel free to reach out:
- Email: abdodebo3@gmail.com
- GitHub
- Phone: +20 1026821545
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
