Crimson

A high-performance, hybrid signal-processing language model architecture.


🌹 Overview

Crimson is a generative language model that departs from the traditional Transformer architecture by using a hybrid of local and global convolutions. By leveraging Fast Fourier Transforms (FFT) for global context, Crimson achieves a full-sequence receptive field at O(L log L) cost per layer, a fraction of the O(L²) overhead of standard attention.

The architecture is designed for efficiency, speed, and high-quality generation, featuring a custom vocabulary reduction system that optimizes the embedding space for specific datasets.

🚀 Key Features

  • Hybrid Convolutional Blocks: Merges depth-wise local convolutions for immediate context with FFT-powered global convolutions for long-range dependencies.
  • FFT-Based Global Context: Uses frequency-domain processing to handle long sequences efficiently.
  • Vocabulary Reduction: Custom token remapping (REDUCE_VOCAB) that shrinks the model size by focusing only on tokens present in the training corpus.
  • Hardware Optimized: Full support for Apple Silicon (MPS), NVIDIA GPUs (CUDA with TF32), and efficient CPU execution.
  • Lightweight & Fast: At 8.9M parameters, the current model balances generation quality and inference speed.
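
To illustrate the FFT-based global convolution idea, here is a minimal NumPy sketch (illustrative only — the function name and details are assumptions, not Crimson's actual code). A circular convolution over the whole sequence costs O(L log L) via the convolution theorem, instead of O(L · K) for a direct pass:

```python
import numpy as np

def fft_global_conv(x, kernel):
    """Circular convolution of sequence x with a global kernel via the
    convolution theorem: multiply in the frequency domain, O(L log L)."""
    L = len(x)
    k = np.zeros(L)
    k[:len(kernel)] = kernel              # zero-pad kernel to sequence length
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

# Sanity check against a direct O(L^2) circular convolution
rng = np.random.default_rng(0)
x, k = rng.standard_normal(256), rng.standard_normal(64)
direct = np.array([sum(x[(i - j) % 256] * k[j] for j in range(64))
                   for i in range(256)])
assert np.allclose(fft_global_conv(x, k), direct)
```

The same trick scales to the model's full 1024-token context, where the speedup over direct convolution is substantial.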

🛠 Architecture Details

| Parameter | Value |
| --- | --- |
| Model Size | 8.9 million parameters |
| Layers | 4 blocks |
| Model Dimension (D_MODEL) | 256 |
| Context Length (MAX_SEQ_LEN) | 1024 |
| Local Kernel Size | 5 |
| Global Kernel Size | 256 |
| Global Every N Layers | 2 |
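
As an illustration of the local half of the hybrid, the sketch below applies a depth-wise causal convolution with the dimensions from the table above (purely illustrative — function and variable names are assumptions, not Crimson's code):

```python
import numpy as np

D_MODEL, K = 256, 5   # model dimension and local kernel size from the table

def depthwise_local_conv(x, kernels):
    """Depth-wise causal convolution: each of the D_MODEL channels is
    filtered independently with its own length-K kernel."""
    L, D = x.shape
    out = np.zeros_like(x)
    for t in range(L):
        for j in range(K):
            if t - j >= 0:
                out[t] += kernels[j] * x[t - j]   # elementwise per channel
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((32, D_MODEL))       # short sequence for illustration
kernels = rng.standard_normal((K, D_MODEL))  # one kernel column per channel
y = depthwise_local_conv(x, kernels)
assert y.shape == x.shape
```

Because each channel gets its own small kernel, the cost stays linear in sequence length, which is why the local blocks can run at every layer while the FFT-based global blocks run only every second layer.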

📦 Installation

Download this repository and extract it.


🧪 Usage

1. Training the Base Model

Place your .txt data files in the data/ directory and run:

python train_gclm_base.py

This script will build the vocabulary and train the initial foundation model (crimson_base_8.9M.pt).
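The REDUCE_VOCAB remapping can be pictured as follows — a hypothetical sketch, assuming the remap simply compacts the ids that occur in the corpus; the real logic lives in train_gclm_base.py:

```python
def reduce_vocab(corpus_ids):
    """Keep only token ids that occur in the corpus and remap them onto
    a dense 0..V-1 range, shrinking the embedding table accordingly."""
    present = sorted(set(corpus_ids))
    old_to_new = {tok: i for i, tok in enumerate(present)}
    new_to_old = present                       # inverse map for decoding
    remapped = [old_to_new[tok] for tok in corpus_ids]
    return remapped, old_to_new, new_to_old

# E.g. a corpus using only 3 distinct ids out of a 50k-token base vocabulary
ids = [50256, 11, 703, 11, 50256]
remapped, old_to_new, new_to_old = reduce_vocab(ids)
print(remapped)  # [2, 0, 1, 0, 2]
```

The embedding matrix then only needs rows for the tokens that actually appear, which is where the model-size savings come from.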

2. Fine-tuning for Chat (SFT)

Use your chat-formatted data (e.g., chat_data.txt) to fine-tune the model into an instruction-following assistant:

python finetune_gclm_base.py

3. Interactive Chat Interface

Launch the Tkinter-based UI to interact with your fine-tuned model:

python chat_interface.py

🎨 Visualization

The model uses a unique "Signal Processing" philosophy, treating text sequences as multidimensional signals that are filtered through both time-domain (Local) and frequency-domain (Global) kernels.


Built with ❤️ by AG from AG Corp
