Instructions to use pathcosmos/frankenstallm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use pathcosmos/frankenstallm with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="pathcosmos/frankenstallm")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("pathcosmos/frankenstallm")
model = AutoModelForCausalLM.from_pretrained("pathcosmos/frankenstallm")
- llama-cpp-python
How to use pathcosmos/frankenstallm with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="pathcosmos/frankenstallm",
    filename="gguf/frankenstallm-3b-Q4_K_M.gguf",
)
output = llm("Once upon a time,", max_tokens=512, echo=True)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use pathcosmos/frankenstallm with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Use Docker
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use pathcosmos/frankenstallm with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pathcosmos/frankenstallm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pathcosmos/frankenstallm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
- SGLang
How to use pathcosmos/frankenstallm with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "pathcosmos/frankenstallm" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pathcosmos/frankenstallm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "pathcosmos/frankenstallm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pathcosmos/frankenstallm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Ollama
How to use pathcosmos/frankenstallm with Ollama:
ollama run hf.co/pathcosmos/frankenstallm:Q4_K_M
- Unsloth Studio
How to use pathcosmos/frankenstallm with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pathcosmos/frankenstallm to start chatting
- Docker Model Runner
How to use pathcosmos/frankenstallm with Docker Model Runner:
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
- Lemonade
How to use pathcosmos/frankenstallm with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull pathcosmos/frankenstallm:Q4_K_M
Run and chat with the model
lemonade run user.frankenstallm-Q4_K_M
List all available models
lemonade list
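The vLLM and SGLang curl examples above both send the same JSON body to an OpenAI-compatible `/v1/completions` endpoint. As a rough Python equivalent, the sketch below builds that request with the standard library only. The helper names `build_completion_request` and `complete` are illustrative and not part of either server's tooling. The ports (8000 for vLLM, 30000 for SGLang) are taken from the examples above.

```python
import json
import urllib.request

MODEL_ID = "pathcosmos/frankenstallm"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl examples."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request to a running server and return the first completion text."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text in choices[i].text.
    return body["choices"][0]["text"]
```

With a server already running, `complete("http://localhost:8000", "Once upon a time,")` would target the vLLM example, and `http://localhost:30000` the SGLang one.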
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Working principles: team play
Always delegate work that can run in parallel to sub-agents.
- Complex code writing / design decisions → model: sonnet
- Fast exploration, lookup, and simple file creation → model: haiku
- After an agent finishes, check its result; re-invoke it with resume if needed
- Example: run model implementation (sonnet) + data script (sonnet) + config file (haiku) concurrently
Project purpose
A small-scale LLM (Large Language Model) experiment project. The goal is to implement and experiment with LLM pretraining or fine-tuning directly on an 8× NVIDIA B200 GPU environment.
Hardware environment
| Item | Spec |
|---|---|
| GPU | 8× NVIDIA B200 (183 GB VRAM each, ~1.47 TB total) |
| RAM | 2.2 TB |
| CUDA | 13.0 |
| Storage (work) | /PROJECT/0325120031_A/ghong/taketimes/ (3.5 TB, 2.2 TB free) |
| Storage (home) | /home/ghong (5 GB, small code files only) |
Caution: large files such as checkpoints and datasets must be stored under /PROJECT/0325120031_A/ghong/taketimes/llm-bang/. Take care not to exceed the home directory (/home/ghong) quota.
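One way to enforce the storage rule above is to validate output paths before writing anything large. The following is a minimal sketch, not part of the repository. The helper names `is_under_project_root` and `require_project_path` are hypothetical, and the root path is copied from the note above.

```python
from pathlib import Path

# Large-file root, copied from the storage note above.
PROJECT_ROOT = Path("/PROJECT/0325120031_A/ghong/taketimes/llm-bang")


def is_under_project_root(path, root=PROJECT_ROOT):
    """Return True if `path` is the project root or lies somewhere below it."""
    resolved = Path(path).expanduser().resolve()
    resolved_root = Path(root).resolve()
    return resolved == resolved_root or resolved_root in resolved.parents


def require_project_path(path, root=PROJECT_ROOT):
    """Raise ValueError for paths outside the project root (e.g. the 5 GB home dir)."""
    if not is_under_project_root(path, root):
        raise ValueError(f"{path} is outside {root}; store large files under the project directory")
    return Path(path)
```

A training script could call `require_project_path(checkpoint_path)` just before saving, so a misconfigured path fails fast instead of filling the home directory.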
Pre-installed libraries
torch 2.10.0a0+b4e4ee81d3.nv25.12 # NVIDIA custom build (optimized for B200)
flash_attn 2.7.4.post1+25.12 # FlashAttention-2 available
datasets 4.4.1
tokenizers 0.22.1
huggingface_hub 1.2.3
Warning: the installed PyTorch is an NVIDIA custom build (nv25.12). Reinstalling it with pip install torch can break the B200 optimizations, so do not reinstall PyTorch.
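A cheap guard against an accidental reinstall is to check the version string at startup. The sketch below assumes the NVIDIA build can be recognised by the `nv25.12` tag in the local version segment (the part after `+`), as in the version listed above. The helper names are hypothetical.

```python
EXPECTED_BUILD_TAG = "nv25.12"  # tag seen in the pre-installed torch version above


def is_nvidia_custom_build(version, tag=EXPECTED_BUILD_TAG):
    """Return True if the local version segment (after '+') contains the build tag."""
    _, _, local_segment = str(version).partition("+")
    return tag in local_segment


def check_installed_torch():
    """Raise RuntimeError if the importable torch is not the NVIDIA custom build."""
    import torch  # imported lazily so the pure helper above works without torch

    if not is_nvidia_custom_build(torch.__version__):
        raise RuntimeError(
            f"torch {torch.__version__} does not look like the {EXPECTED_BUILD_TAG} build"
        )
```

Calling `check_installed_torch()` at the top of a training script would surface a replaced PyTorch before any GPU time is spent.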
Libraries that still need to be installed
pip install transformers accelerate peft trl deepspeed bitsandbytes sentencepiece wandb
Recommended project structure
llm-bang/
├── CLAUDE.md
├── data/          # Training data (raw text, preprocessed output)
├── tokenizer/     # Tokenizer training and storage
├── model/         # Model architecture definitions (nn.Module)
├── train/         # Training scripts (single GPU / DDP / FSDP)
├── eval/          # Evaluation scripts (perplexity, downstream tasks)
├── configs/       # YAML/JSON training config files
└── checkpoints/   # Model checkpoints (large files)
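The layout above can be created in one step. This is a small sketch using only `pathlib`. The function name `create_skeleton` is illustrative, and the directory names are taken from the tree above.

```python
from pathlib import Path

# Directory names from the recommended structure above.
SUBDIRS = ["data", "tokenizer", "model", "train", "eval", "configs", "checkpoints"]


def create_skeleton(root):
    """Create the recommended sub-directories under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in SUBDIRS:
        directory = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

On the target machine, `root` would be the `llm-bang/` directory under the project storage path rather than the home directory.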
Multi-GPU training launch patterns
# torchrun (DDP), 8 GPUs
torchrun --nproc_per_node=8 train/pretrain.py --config configs/small_lm.yaml
# Single-GPU test
python train/pretrain.py --config configs/small_lm.yaml --device cuda:0
# FSDP (model sharding, large models)
torchrun --nproc_per_node=8 train/pretrain.py --config configs/large_lm.yaml --strategy fsdp
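The three commands above imply that `train/pretrain.py` accepts `--config`, `--device`, and `--strategy`. The script itself is not shown, so the following is only a guess at how its entry point might parse those flags and choose a device. It relies on the `LOCAL_RANK` and `WORLD_SIZE` environment variables that torchrun sets for each worker. The defaults (`ddp`, `cuda:0`) are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags used in the launch commands above (hypothetical interface)."""
    parser = argparse.ArgumentParser(description="Sketch of a pretrain.py entry point")
    parser.add_argument("--config", required=True, help="Path to a YAML/JSON config")
    parser.add_argument("--device", default=None, help="Explicit device for single-GPU runs")
    parser.add_argument("--strategy", choices=["ddp", "fsdp"], default="ddp")
    return parser.parse_args(argv)


def is_distributed(env=None):
    """True when launched with more than one worker (torchrun sets WORLD_SIZE)."""
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1")) > 1


def resolve_device(args, env=None):
    """Prefer torchrun's LOCAL_RANK; otherwise fall back to --device or cuda:0."""
    env = os.environ if env is None else env
    if "LOCAL_RANK" in env:
        return f"cuda:{int(env['LOCAL_RANK'])}"
    return args.device or "cuda:0"
```

Under torchrun each of the 8 workers would resolve to its own `cuda:<LOCAL_RANK>`, while the plain `python` invocation would use the `--device` value.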
Model scale guide (for this hardware)
| Model size | Recommended strategy | Minimum GPUs |
|---|---|---|
| ~1B param | DDP, bf16 | 1 GPU |
| ~7B param | DDP or FSDP, bf16 | 2–4 GPU |
| ~13B param | FSDP, bf16/fp8 | 4 GPU |
| ~70B param | FSDP + ZeRO-3, bf16/fp8 | 8 GPU |
The B200 supports FP8 natively, so torch.float8_e4m3fn can be used during training.
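The table's GPU counts can be sanity-checked with a back-of-the-envelope estimate of training state. The sketch below assumes mixed-precision Adam at about 16 bytes per parameter (bf16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). That is a common rule of thumb, not a figure from this document. It ignores activations and framework overhead, so it gives only a lower bound, and the table's higher counts leave room for that extra memory.

```python
import math

# Assumed per-parameter cost: bf16 weight (2) + bf16 grad (2)
# + fp32 master weight (4) + fp32 Adam first and second moments (4 + 4).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
GPU_VRAM_GB = 183  # per-GPU VRAM from the hardware table


def training_state_gb(num_params):
    """Weights + gradients + optimizer state in decimal GB, excluding activations."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_gpus_for_sharded_state(num_params, vram_gb=GPU_VRAM_GB):
    """Lower bound on GPUs if the training state is sharded evenly (FSDP / ZeRO-3)."""
    return math.ceil(training_state_gb(num_params) / vram_gb)


for label, params in [("1B", 1e9), ("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(label, training_state_gb(params), "GB, at least", min_gpus_for_sharded_state(params), "GPU(s)")
```

Under these assumptions a 13B model's state (208 GB) already exceeds one GPU's 183 GB, which is consistent with the table moving to FSDP at that size.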
Reference (previous project)
/PROJECT/0325120031_A/ghong/taketimes/_deprecated/work/: a project that predicted measured 2CRM thickness values (LightGBM, ClickHouse).
Refer to it when domain data (factory sensors, coil grades) is needed.