Instructions for using sztyberj/IncunabuLM-111M with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use sztyberj/IncunabuLM-111M with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sztyberj/IncunabuLM-111M")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("sztyberj/IncunabuLM-111M", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use sztyberj/IncunabuLM-111M with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "sztyberj/IncunabuLM-111M"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sztyberj/IncunabuLM-111M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/sztyberj/IncunabuLM-111M
- SGLang
How to use sztyberj/IncunabuLM-111M with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "sztyberj/IncunabuLM-111M" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sztyberj/IncunabuLM-111M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "sztyberj/IncunabuLM-111M" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sztyberj/IncunabuLM-111M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use sztyberj/IncunabuLM-111M with Docker Model Runner:
docker model run hf.co/sztyberj/IncunabuLM-111M
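The curl calls against the OpenAI-compatible `/v1/completions` endpoint can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the base URL is assumed to be whichever server was started (port 8000 for vLLM, port 30000 for SGLang in the commands shown here).

```python
import json
import urllib.request

MODEL_ID = "sztyberj/IncunabuLM-111M"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the first generated text.

    Requires a running server at base_url (hypothetical helper, not part of any library).
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a vLLM server running locally, `complete("http://localhost:8000", "Once upon a time,")` would return the generated continuation as a string.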
IncunabuLM
Model Description
IncunabuLM is a decoder-only transformer language model designed for text generation tasks. The model implements a custom architecture with RMSNorm, SiLU feed-forward layers, and causal self-attention, and is optimized for resource-efficient training and inference.
Model Details
Model Type
- Architecture: Decoder-only Transformer
- Language(s): Primarily trained on Polish text (lectures)
- Model size: 111.7M parameters
Source and Base Model
- Base Architecture: Custom implementation inspired by modern transformer architectures
- Training Approach: Trained from scratch on Polish text corpus
- Educational Source: Implementation follows principles from Andrej Karpathy's "Let's build GPT: from scratch, in code, spelled out" tutorial
- Tutorial Reference: https://www.youtube.com/watch?v=kCc8FmEb1nY
- Custom Modifications:
  - RMSNorm instead of LayerNorm for improved stability
  - SiLU activation functions in feed-forward networks
  - Optimized for resource-efficient training and inference
Architecture Details
- Layers: 12 transformer blocks
- Hidden size: 768
- Attention heads: 12
- Head dimension: 64
- Context length: 2048 tokens
- Vocabulary size: 16,384 tokens
- Normalization: RMSNorm (Root Mean Square Layer Normalization)
- Activation: SiLU (Swish) in feed-forward networks
- Attention: Causal self-attention with triangular masking
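The three building blocks named in the list above (RMSNorm, SiLU, and causal masking) can be sketched in a few lines of plain Python. This is an illustrative sketch on Python lists, not the model's actual implementation; the function names and the `eps` value are assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain.

    eps is an assumed stabilizer value; the model card does not state one.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (Swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions; masked positions get probability 0."""
    peak = max(s for s, ok in zip(scores, mask_row) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]
```

Unlike LayerNorm, `rms_norm` subtracts no mean and adds no bias, which is where its lower cost comes from. In a real model the same operations run on tensors of width 768 per token.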
Key Features
- RMSNorm: Cheaper to compute than LayerNorm because it skips mean subtraction and the bias term
- SiLU Activation: Smooth, with a non-zero gradient for negative inputs, unlike ReLU
- BPE Tokenization: Byte-level BPE with 16K vocabulary
- Mixed Precision: Support for bfloat16/float16 training
- Generation Controls: Temperature, top-k sampling, repetition penalty
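To make the generation controls listed above concrete, here is a minimal sketch of how temperature, top-k, and a repetition penalty are commonly combined when picking the next token. The function name `sample_next_token` is hypothetical, and the penalty formula (dividing positive logits and multiplying negative ones) is an assumption based on a widely used convention; the model card does not say which variant the model uses.

```python
import math
import random


def sample_next_token(logits, generated, temperature=1.0, top_k=None,
                      repetition_penalty=1.0, rng=random):
    """Pick the next token id from raw logits using the three controls."""
    logits = list(logits)

    # Repetition penalty (assumed variant): make already generated tokens less likely.
    for token_id in set(generated):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature (must be > 0): below 1 sharpens the distribution, above 1 flattens it.
    logits = [value / temperature for value in logits]

    # Top-k: keep only the k highest-scoring candidates.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Softmax weights over the surviving candidates, then draw one of them.
    peak = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Setting `top_k=1` reduces this to greedy decoding, which is a convenient way to check the other two controls in isolation.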
Training Details
Training Data
- Dataset: Custom Polish text corpus
- Preprocessing: Byte-level BPE tokenization
- Split: 90% training, 10% validation
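As an illustration of the 90/10 split, the sketch below cuts a tokenized corpus into a training part and a validation part. It assumes a contiguous split (the first 90% of tokens for training, the rest for validation); the card does not say whether the split was contiguous or shuffled, and the function name `split_tokens` is hypothetical.

```python
def split_tokens(token_ids, train_percent=90):
    """Split a token sequence into (train, validation) at a fixed percentage.

    Assumption: a contiguous cut; the model card does not specify the method.
    """
    cut = len(token_ids) * train_percent // 100
    return token_ids[:cut], token_ids[cut:]


# Toy example with 20 stand-in token ids.
train_ids, val_ids = split_tokens(list(range(20)))
```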
Training Configuration
- Batch size: 8 (with gradient accumulation steps: 8)
- Effective batch size: 64
- Context length: 2048 tokens
- Training steps: 50,000
- Optimizer: AdamW
- Learning rate: 3e-4 (peak)
- Learning rate schedule: Cosine with linear warmup
- Warmup steps: 2,000
- Weight decay: 0.1
- Gradient clipping: 1.0
- Dropout: 0.2
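The schedule and batch figures above can be worked through in a short sketch. Two assumptions are made that the card does not state: the learning rate decays to zero at the final step (`MIN_LR = 0.0`), and each of the 50,000 training steps is one optimizer update over the full effective batch. The function name `learning_rate` is hypothetical.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 2_000
TOTAL_STEPS = 50_000
MIN_LR = 0.0  # assumption: the card does not state a learning-rate floor


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
CONTEXT_LENGTH = 2048

effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS      # sequences per optimizer update
tokens_per_step = effective_batch * CONTEXT_LENGTH   # tokens per optimizer update
# Assumption: one "training step" is one optimizer update.
total_tokens = tokens_per_step * TOTAL_STEPS
```

Under these assumptions, each update sees 131,072 tokens and the whole run processes about 6.55 billion tokens, counting repeated passes over the corpus if there were any.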
Training Infrastructure
- Hardware: 1x NVIDIA A100 80GB
- Precision: Mixed precision (bfloat16/float16)
- Gradient scaling: Automatic mixed precision with GradScaler
Performance
Model Size and Efficiency
- Parameters: 111.7M (111,718,144 total parameters)
- Context window: 2048 tokens
- Inference speed: Optimized for single-GPU inference
Training Metrics
- Final training loss: 4.5544
- Final validation loss: 4.7100
- Training time: approximately 8 hours
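Loss values are easier to interpret as perplexity. The sketch below assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention but is not stated in the card. Under that assumption, perplexity is simply the exponential of the loss.

```python
import math

VOCAB_SIZE = 16_384


def perplexity(loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


train_ppl = perplexity(4.5544)
val_ppl = perplexity(4.7100)

# Reference point: a model guessing uniformly over the vocabulary
# would have a loss of ln(16384) and a perplexity of 16384.
uniform_loss = math.log(VOCAB_SIZE)
```

This gives a perplexity of roughly 95 on the training set and roughly 111 on the validation set, compared with 16,384 for uniform guessing over the vocabulary.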
Limitations and Biases
Known Limitations
- Context Length: Limited to 2048 tokens; may struggle with very long documents
- Language Scope: Primarily designed for Polish text; may have reduced performance on other languages
- Model Size: At 111M parameters, the model may have limited knowledge compared to larger models
- Training Data: Performance heavily dependent on training corpus quality and diversity
Potential Biases
- Language Bias: Optimized for Polish language patterns
- Domain Bias: Reflects the domain distribution of training data
- Temporal Bias: Training data cutoff affects knowledge of recent events
- Cultural Bias: May reflect cultural perspectives present in training data
Technical Specifications
Hardware Recommendations
- Minimum: 4GB GPU memory for inference
- CPU: Compatible but significantly slower
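A rough memory estimate helps put the recommendation above in context. The sketch below counts only the weights and a full-context key/value cache; it assumes standard multi-head attention (12 heads of dimension 64, so 768 values per token for keys and for values in each layer) and that a KV cache is used at all, neither of which the card confirms. Activations and framework overhead are not included.

```python
PARAMS = 111_718_144
LAYERS = 12
HIDDEN = 768
CONTEXT = 2048

BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_bytes(dtype):
    """Memory needed to hold the model weights in the given dtype."""
    return PARAMS * BYTES_PER_VALUE[dtype]


def kv_cache_bytes(dtype, seq_len=CONTEXT, batch=1):
    """Keys and values for every layer and position (assumed cache layout)."""
    return 2 * LAYERS * seq_len * HIDDEN * BYTES_PER_VALUE[dtype] * batch


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 ** 2)


bf16_total_mib = to_mib(weight_bytes("bfloat16") + kv_cache_bytes("bfloat16"))
fp32_total_mib = to_mib(weight_bytes("float32") + kv_cache_bytes("float32"))
```

Both totals come out well under 1 GiB, so the 4GB figure leaves considerable headroom for activations, the runtime, and larger batches.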
Jakub Sztyber