Crimson
A high-performance, hybrid signal-processing language model architecture.
Overview
Crimson is a generative language model that departs from the traditional Transformer architecture, replacing attention with a hybrid of local and global convolutions. By computing the global convolutions with Fast Fourier Transforms (FFT), Crimson obtains a receptive field spanning the full context window at a fraction of the computational cost of standard attention mechanisms.
The architecture is designed for efficiency, speed, and high-quality generation, and includes a custom vocabulary reduction system that shrinks the embedding space to fit a specific dataset.
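To make the core idea concrete, here is a minimal PyTorch sketch of an FFT-based global convolution. The class name and kernel parameterization are illustrative assumptions, not Crimson's actual implementation:

```python
import torch
import torch.nn as nn

class GlobalFFTConv(nn.Module):
    """Illustrative sketch of an FFT-based global convolution (not the actual Crimson code).

    A learned per-channel kernel as long as the sequence is applied via the
    convolution theorem: multiply in the frequency domain instead of sliding a
    window, giving O(L log L) cost and a receptive field covering the whole sequence.
    """
    def __init__(self, d_model: int, max_seq_len: int):
        super().__init__()
        # One long kernel per channel (depth-wise), learned directly in the time domain.
        self.kernel = nn.Parameter(torch.randn(d_model, max_seq_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        n = 2 * seq_len  # zero-pad to 2*L so the product implements a linear (non-circular) convolution
        x_f = torch.fft.rfft(x.transpose(1, 2), n=n)          # (B, D, n//2 + 1)
        k_f = torch.fft.rfft(self.kernel[:, :seq_len], n=n)   # (D, n//2 + 1)
        y = torch.fft.irfft(x_f * k_f, n=n)[..., :seq_len]    # back to time domain, trim padding
        return y.transpose(1, 2)                               # (B, L, D)
```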
Key Features
- Hybrid Convolutional Blocks: Merges depth-wise local convolutions for immediate context with FFT-powered global convolutions for long-range dependencies.
- FFT-Based Global Context: Uses frequency-domain processing to handle long sequences efficiently.
- Vocabulary Reduction: Custom token remapping (REDUCE_VOCAB) that shrinks the model by keeping only tokens present in the training corpus (see the sketch after this list).
- Hardware Optimized: Full support for Apple Silicon (MPS), NVIDIA GPUs (CUDA with TF32), and efficient CPU execution.
- Lightweight & Fast: The current 8.9M-parameter model balances generation quality with very fast inference.
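As referenced above, the vocabulary reduction can be sketched in a few lines. The helper below is hypothetical and only illustrates the remapping idea behind REDUCE_VOCAB; the repository's implementation may differ:

```python
def build_reduced_vocab(corpus_ids, full_vocab_size):
    """Map only the token ids that actually occur in the corpus to a compact range.

    Illustrative sketch of the REDUCE_VOCAB idea. Returns (old_to_new, new_to_old)
    lookup tables; the embedding and output layers can then be sized to
    len(new_to_old) instead of full_vocab_size.
    """
    used = sorted(set(corpus_ids))                            # token ids seen in training data
    old_to_new = {old: new for new, old in enumerate(used)}
    new_to_old = {new: old for old, new in old_to_new.items()}
    return old_to_new, new_to_old

# Example: a 50k-token tokenizer whose corpus only touches a few thousand ids
# yields embeddings of shape (len(new_to_old), D_MODEL) rather than (50_000, D_MODEL).
```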
Architecture Details
| Parameter | Value |
|---|---|
| Model Size | 8.9 Million Parameters |
| Layers | 4 Blocks |
| Model Dimension (D_MODEL) | 256 |
| Context Length (MAX_SEQ_LEN) | 1024 |
| Local Kernel Size | 5 |
| Global Kernel Size | 256 |
| Global Every N Layers | 2 |
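For reference, the table maps naturally onto a small configuration object. The field names below mirror the table and the constants mentioned in this README (D_MODEL, MAX_SEQ_LEN); it is a sketch, not the repository's actual config class:

```python
from dataclasses import dataclass

@dataclass
class CrimsonConfig:
    # Values from the architecture table above (illustrative sketch).
    n_layers: int = 4               # Blocks
    d_model: int = 256              # D_MODEL
    max_seq_len: int = 1024         # MAX_SEQ_LEN
    local_kernel_size: int = 5      # depth-wise local convolution window
    global_kernel_size: int = 256   # FFT global convolution kernel length
    global_every_n_layers: int = 2  # every 2nd block uses the global convolution
```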
Installation
Download this repository and extract it.
Usage
1. Training the Base Model
Place your .txt data files in the data/ directory and run:
python train_gclm_base.py
This script will build the vocabulary and train the initial foundation model (crimson_base_8.9M.pt).
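As an optional sanity check before launching training, the snippet below (a small convenience script, not part of the repository) confirms that data/ actually contains text files:

```python
from pathlib import Path

# Pre-flight check: train_gclm_base.py expects .txt files under data/.
txt_files = sorted(Path("data").glob("*.txt"))
if not txt_files:
    raise SystemExit("No .txt files found in data/ -- add a corpus before training.")
print(f"Found {len(txt_files)} training file(s): " + ", ".join(p.name for p in txt_files))
```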
2. Fine-tuning for Chat (SFT)
Use your chat-formatted data (e.g., chat_data.txt) to fine-tune the model into an instruct-following assistant:
python finetune_gclm_base.py
3. Interactive Chat Interface
Launch the Tkinter-based UI to interact with your fine-tuned model:
python chat_interface.py
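All three scripts benefit from running on the fastest available backend. The standard PyTorch pattern below shows what the hardware support listed above (MPS, CUDA with TF32, CPU) typically looks like in practice; it is a reference snippet, not code copied from the scripts:

```python
import torch

# Pick the fastest available backend: Apple Silicon (MPS), NVIDIA (CUDA), or CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    # TF32 trades a little precision for large matmul speedups on Ampere+ GPUs.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
else:
    device = torch.device("cpu")

print(f"Running Crimson on: {device}")
```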
Visualization
The model uses a unique "Signal Processing" philosophy, treating text sequences as multidimensional signals that are filtered through both time-domain (Local) and frequency-domain (Global) kernels.
Built with ❤️ by AG from AG Corp