Pi-Lumine 4B: Flow-Matching Action Decoder for Elden Ring
A Pi0.5-style flow-matching action decoder trained on top of a frozen Qwen3.5-4B VLM backbone.
Architecture
- Base VLM: Qwen/Qwen3.5-4B (frozen, not included; downloaded at runtime)
- Action Decoder: FiLM-conditioned transformer with cross-attention to VLM hidden states
- 2 decoder layers, VLM dim 2560 → decoder dim 1024, 8 attention heads
- Projection layers decouple decoder from VLM hidden size
- Instruction-conditioned via AdaptiveRMSNorm (FiLM)
- Sinusoidal time embedding for flow matching
- ~64M trainable parameters
- Action Space: 6 steps x 20 dims (4 sticks + 16 buttons per step)
- Training: Flow-matching objective; at inference, actions are sampled by Euler integration of the learned ODE
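As a rough illustration of the inference procedure and action layout listed above, the following is a minimal sketch of Euler integration over one 6 x 20 action chunk. It is not the released decoder: `toy_velocity` is a hypothetical stand-in for the trained network, the time convention (t = 0 is noise, t = 1 is the action chunk), the step count of 10, and the "sticks first, buttons after" ordering in `split_step` are all assumptions that the card does not confirm.

```python
import random

HORIZON, ACTION_DIM = 6, 20  # 6 steps x 20 dims, as listed in the card


def euler_sample(velocity_fn, num_steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (actions).

    The time direction and num_steps are assumptions for illustration.
    """
    rng = random.Random(seed)
    # Start from Gaussian noise shaped like one action chunk.
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)] for _ in range(HORIZON)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        # Forward Euler update: x <- x + dt * v(x, t)
        x = [[xi + dt * vi for xi, vi in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x


def toy_velocity(target):
    """Hypothetical stand-in for the trained decoder.

    Returns a field that moves any point in a straight line toward `target`,
    arriving at t=1. The real model would predict the velocity from VLM
    hidden states and the instruction instead.
    """
    def fn(x, t):
        return [
            [(ti - xi) / (1.0 - t) for xi, ti in zip(xr, tr)]
            for xr, tr in zip(x, target)
        ]
    return fn


def split_step(step, num_sticks=4):
    """Split one 20-dim step into (sticks, buttons).

    Assumption: the 4 stick values come first, then the 16 button values.
    """
    return step[:num_sticks], step[num_sticks:]


target = [[0.5] * ACTION_DIM for _ in range(HORIZON)]
actions = euler_sample(toy_velocity(target), num_steps=10)
sticks, buttons = split_step(actions[0])
```

With this toy field the integration lands on `target` regardless of the starting noise, which makes the loop easy to check; with a learned velocity field, more Euler steps generally trade latency for accuracy.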
Files
- action_decoder.pt: Trained action decoder weights
- decoder_config.json: Architecture and tokenizer config
- tokenizer.json / tokenizer_config.json: Tokenizer with special tokens
- chat_template.jinja: Chat template
- processor_config.json: Processor config
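Before loading anything, it can help to confirm that a local copy of the repository contains every file listed above. The sketch below uses only the Python standard library; the helper name `missing_files` and the idea of a local download directory are illustrative assumptions, not part of the released code.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "action_decoder.pt",
    "decoder_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "processor_config.json",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "." is a placeholder for wherever the files were downloaded.
    absent = missing_files(".")
    print("all files present" if not absent else f"missing: {absent}")
```

The base VLM is not part of this check, since it is downloaded separately at runtime.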