Crimson
A high-performance, hybrid signal-processing language model architecture.
Overview
Crimson is a generative language model that departs from the traditional Transformer architecture, replacing attention with a hybrid of local and global convolutions. By computing the global convolutions with Fast Fourier Transforms (FFT), Crimson obtains a receptive field spanning the full context window at a fraction of the computational cost of standard attention mechanisms.
The architecture is designed for efficiency, speed, and high-quality generation, and includes a custom vocabulary reduction system that shrinks the embedding space to fit a specific dataset.
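To make the core idea concrete, here is a minimal PyTorch sketch of an FFT-based global convolution. The class name and kernel parameterization are illustrative assumptions, not Crimson's actual implementation:

```python
import torch
import torch.nn as nn

class GlobalFFTConv(nn.Module):
    """Illustrative sketch of an FFT-based global convolution (not the actual Crimson code).

    A learned per-channel kernel as long as the sequence is applied via the
    convolution theorem: multiply in the frequency domain instead of sliding a
    window, giving O(L log L) cost and a receptive field covering the whole sequence.
    """
    def __init__(self, d_model: int, max_seq_len: int):
        super().__init__()
        # One long kernel per channel (depth-wise), learned directly in the time domain.
        self.kernel = nn.Parameter(torch.randn(d_model, max_seq_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        n = 2 * seq_len  # zero-pad to 2*L so the product implements a linear (non-circular) convolution
        x_f = torch.fft.rfft(x.transpose(1, 2), n=n)          # (B, D, n//2 + 1)
        k_f = torch.fft.rfft(self.kernel[:, :seq_len], n=n)   # (D, n//2 + 1)
        y = torch.fft.irfft(x_f * k_f, n=n)[..., :seq_len]    # back to time domain, trim padding
        return y.transpose(1, 2)                               # (B, L, D)
```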
Key Features
- Hybrid Convolutional Blocks: Merges depth-wise local convolutions for immediate context with FFT-powered global convolutions for long-range dependencies.
- FFT-Based Global Context: Uses frequency-domain processing to handle long sequences efficiently.
- Vocabulary Reduction: Custom token remapping (REDUCE_VOCAB) that shrinks the model by keeping only tokens present in the training corpus (see the sketch after this list).
- Hardware Optimized: Full support for Apple Silicon (MPS), NVIDIA GPUs (CUDA with TF32), and efficient CPU execution.
- Lightweight & Fast: The current 8.9M-parameter model balances generation quality with very fast inference.
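As referenced above, the vocabulary reduction can be sketched in a few lines. The helper below is hypothetical and only illustrates the remapping idea behind REDUCE_VOCAB; the repository's implementation may differ:

```python
def build_reduced_vocab(corpus_ids, full_vocab_size):
    """Map only the token ids that actually occur in the corpus to a compact range.

    Illustrative sketch of the REDUCE_VOCAB idea. Returns (old_to_new, new_to_old)
    lookup tables; the embedding and output layers can then be sized to
    len(new_to_old) instead of full_vocab_size.
    """
    used = sorted(set(corpus_ids))                            # token ids seen in training data
    old_to_new = {old: new for new, old in enumerate(used)}
    new_to_old = {new: old for old, new in old_to_new.items()}
    return old_to_new, new_to_old

# Example: a 50k-token tokenizer whose corpus only touches a few thousand ids
# yields embeddings of shape (len(new_to_old), D_MODEL) rather than (50_000, D_MODEL).
```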
Architecture Details
| Parameter | Value |
|---|---|
| Model Size | 8.9 Million Parameters |
| Layers | 4 Blocks |
| Model Dimension (D_MODEL) | 256 |
| Context Length (MAX_SEQ_LEN) | 1024 |
| Local Kernel Size | 5 |
| Global Kernel Size | 256 |
| Global Every N Layers | 2 |
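For reference, the table maps naturally onto a small configuration object. The field names below mirror the table and the constants mentioned in this README (D_MODEL, MAX_SEQ_LEN); it is a sketch, not the repository's actual config class:

```python
from dataclasses import dataclass

@dataclass
class CrimsonConfig:
    # Values from the architecture table above (illustrative sketch).
    n_layers: int = 4               # Blocks
    d_model: int = 256              # D_MODEL
    max_seq_len: int = 1024         # MAX_SEQ_LEN
    local_kernel_size: int = 5      # depth-wise local convolution window
    global_kernel_size: int = 256   # FFT global convolution kernel length
    global_every_n_layers: int = 2  # every 2nd block uses the global convolution
```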
Installation
Download this repository and extract it.
Usage
1. Training the Base Model
Place your .txt data files in the data/ directory and run:
python train_gclm_base.py
This script will build the vocabulary and train the initial foundation model (crimson_base_8.9M.pt).
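As an optional sanity check before launching training, the snippet below (a small convenience script, not part of the repository) confirms that data/ actually contains text files:

```python
from pathlib import Path

# Pre-flight check: train_gclm_base.py expects .txt files under data/.
txt_files = sorted(Path("data").glob("*.txt"))
if not txt_files:
    raise SystemExit("No .txt files found in data/ -- add a corpus before training.")
print(f"Found {len(txt_files)} training file(s): " + ", ".join(p.name for p in txt_files))
```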
2. Fine-tuning for Chat (SFT)
Use your chat-formatted data (e.g., chat_data.txt) to fine-tune the model into an instruct-following assistant:
python finetune_gclm_base.py
3. Interactive Chat Interface
Launch the Tkinter-based UI to interact with your fine-tuned model:
python chat_interface.py
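All three scripts benefit from running on the fastest available backend. The standard PyTorch pattern below shows what the hardware support listed above (MPS, CUDA with TF32, CPU) typically looks like in practice; it is a reference snippet, not code copied from the scripts:

```python
import torch

# Pick the fastest available backend: Apple Silicon (MPS), NVIDIA (CUDA), or CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    # TF32 trades a little precision for large matmul speedups on Ampere+ GPUs.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
else:
    device = torch.device("cpu")

print(f"Running Crimson on: {device}")
```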
Visualization
The model uses a unique "Signal Processing" philosophy, treating text sequences as multidimensional signals that are filtered through both time-domain (Local) and frequency-domain (Global) kernels.
Built with ❤️ by AG from AG Corp