Bodega-Raptor-8B-MXFP4
Practical Reasoning for Everyday Use
Bodega-Raptor-8B-MXFP4 is part of our Raptor series of generalist models, designed to handle the everyday work of development, analysis, and problem-solving. With 8 billion parameters and MXFP4 quantization, this model strikes a balance between capability and efficiency—robust enough for serious work, light enough to run all day on your laptop as part of Bodega OS.
The Raptor Series
The Raptor series represents our approach to generalist models: not specialized for any single task, but competent across the broad range of activities that make up real work. Code generation and analysis. Logical problem-solving. Multi-step reasoning. Technical writing. These models do not try to be the best at everything—they try to be reliably good at everything you actually need.
Raptor-8B sits in the sweet spot. Smaller models sacrifice too much capability. Larger models demand more resources than most tasks justify. Eight billion parameters gives you enough capacity for sophisticated reasoning without the overhead of our larger models.
Architecture and Performance
The model uses MXFP4 quantization at 4.25 bits per parameter, bringing weight memory requirements down to roughly 4GB. On an M1 Max, you can expect around 150 tokens per second of sustained throughput. That is fast enough for interactive development workflows, efficient enough that you are not waiting for the model to catch up with your thinking.
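The arithmetic behind that footprint is straightforward. A minimal back-of-envelope sketch in plain Python, assuming the 4.25 bits per parameter covers the 4-bit elements plus per-block scaling metadata:

```python
# Back-of-envelope weight memory for an 8B-parameter model at MXFP4 precision.
# 4.25 bits/param = 4-bit elements plus shared block-scale overhead (assumed).
params = 8e9
bits_per_param = 4.25
weight_bytes = params * bits_per_param / 8
print(f"{weight_bytes / 1e9:.2f} GB")  # ~4.25 GB of weights, i.e. roughly 4GB
```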
The MLX-based inference leverages the unified memory architecture, eliminating costly data transfers between CPU and GPU. This matters for sustained performance: you can keep the model loaded and ready without it competing for compute in the background. It is there when you need it, invisible when you do not.
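As one concrete illustration, here is a minimal sketch of loading and querying an MLX model through the open-source mlx-lm package. The repository path is a placeholder, not a confirmed identifier, and the Bodega OS integration may expose a different interface:

```python
# Minimal sketch using the mlx-lm package (pip install mlx-lm).
# The model path below is a placeholder, not a confirmed repository name.
from mlx_lm import load, generate

model, tokenizer = load("srswti/Bodega-Raptor-8B-MXFP4")  # hypothetical repo id

prompt = "Explain what a bloom filter is and when to use one."
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```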
What Raptor-8B Does
The model handles code understanding and explanation across major programming languages. It can analyze your code, explain what it does, suggest improvements, and help you debug when things go wrong. Algorithm design and refactoring are core capabilities—the model understands not just syntax but the broader patterns and practices that make code maintainable.
For analytical tasks, Raptor-8B does problem decomposition, logical analysis, and pattern recognition. It can help you think through complex problems by breaking them into manageable pieces, identifying the core issues, and suggesting approaches. This is not about the model solving everything for you—it is about having a capable thinking partner that can help you work through difficult problems.
Technical writing and documentation benefit from the model's ability to understand both the technical content and how to communicate it clearly. Concept explanation, step-by-step reasoning, and structured thinking are natural for a model trained to be a generalist rather than a specialist.
Running On-Premises
Raptor-8B runs entirely on your hardware as part of Bodega OS. Your code, your problems, your half-finished ideas—none of it leaves your machine. The model integrates with Bodega's retrieval engines, allowing it to search through your local codebase and documentation while maintaining privacy.
This is particularly valuable for development work where you are iterating quickly and do not want to context-switch to a web interface or wait for API calls. The model is always available, always fast, always working with your local data.
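Bodega's retrieval API is not documented here, so the following is only an illustrative sketch of the retrieve-then-generate pattern described above; `search_local_index` is a hypothetical stand-in for whatever interface the retrieval engine actually exposes:

```python
# Illustrative local retrieve-then-generate loop. `search_local_index` is a
# hypothetical stand-in for Bodega's retrieval engine, which is not specified here.
from mlx_lm import load, generate

model, tokenizer = load("srswti/Bodega-Raptor-8B-MXFP4")  # hypothetical repo id

def answer_with_local_context(question: str, search_local_index) -> str:
    # Pull the most relevant local snippets (code, docs) for the question.
    snippets = search_local_index(question, top_k=3)
    context = "\n\n".join(snippets)
    prompt = (
        f"Use the following local context to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=256)
```

Everything in this loop runs locally: the index lookup, the prompt assembly, and the generation itself.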
Why MXFP4
MXFP4 quantization represents a practical compromise between quality and efficiency. Four-bit quantization is aggressive enough to significantly reduce memory usage, but the mixed floating-point approach preserves more information than naive integer quantization. The result is a model that fits comfortably on consumer hardware while maintaining reasoning quality.
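For intuition, here is a rough sketch of the microscaling idea behind MXFP4 as commonly described: blocks of 32 values share one 8-bit power-of-two scale, and each element is stored as a 4-bit E2M1 float. The exact format details of this model's quantizer are an assumption, not confirmed by this card:

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> tuple[int, np.ndarray]:
    """Quantize one 32-element block: one shared power-of-two scale + FP4 codes."""
    assert block.size == 32
    max_abs = np.abs(block).max()
    # Shared scale chosen so the largest element fits within the FP4 range.
    exp = int(np.ceil(np.log2(max_abs / FP4_GRID[-1]))) if max_abs > 0 else 0
    scale = 2.0 ** exp
    scaled = block / scale
    # Round each element to the nearest representable FP4 magnitude, keep sign.
    codes = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]), axis=1)
    dequant = np.sign(scaled) * FP4_GRID[codes] * scale
    return exp, dequant

# Storage cost: 32 four-bit elements plus one 8-bit shared exponent per block
# = (32 * 4 + 8) / 32 = 4.25 bits per parameter.
```

The shared scale is what makes the approach "mixed floating-point": each block keeps a coarse exponent while the elements keep fine-grained mantissa information, which is why it preserves more than naive integer quantization at the same bit budget.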
We tested extensively to ensure that quantization does not degrade the model's core capabilities. Code generation remains accurate. Logical reasoning stays coherent. The model does not hallucinate more or lose track of context. The trade-off is purely in the details—slightly less nuanced language, occasional minor inaccuracies—but the fundamental capability remains intact.
Technical Details
Eight billion parameters total. MXFP4 quantization at 4.25 bits per parameter. Memory footprint of roughly 4GB depending on configuration. Sustained throughput of 150 tokens per second on M1 Max. MLX-based inference optimized for unified memory architecture.
The model runs efficiently on Apple's M-series chips. Memory bandwidth matters more than raw compute for inference at this scale, which is why Apple Silicon's unified memory architecture delivers strong performance despite having fewer specialized compute units than discrete GPUs.
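A rough way to see why bandwidth dominates: generating each token requires streaming essentially all of the weights through memory once, so bandwidth divided by weight size gives an upper bound on single-stream decode speed. A sketch with illustrative placeholder numbers, not measured figures for any specific chip:

```python
# Illustrative decode-throughput bound: each token reads ~all weights once.
# Numbers below are placeholders, not benchmarks of any particular chip.
def decode_tokens_per_second_bound(bandwidth_gb_s: float, weight_gb: float) -> float:
    return bandwidth_gb_s / weight_gb

# e.g. a hypothetical 200 GB/s memory system moving ~4.25 GB of weights:
print(decode_tokens_per_second_bound(200, 4.25))  # ~47 tokens/s upper bound
```

Real throughput also depends on KV-cache traffic, kernel efficiency, and batching, but the bound explains why a smaller quantized model decodes faster on the same hardware.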
The context window supports extended reasoning and code analysis. The model can handle substantial codebases, long documents, and multi-turn conversations without losing coherence. This makes it practical for real development workflows where you need to maintain context across multiple interactions.
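In practice, maintaining context across turns means replaying the conversation history through the tokenizer's chat template. A minimal multi-turn sketch with mlx-lm, again using a placeholder model path:

```python
# Minimal multi-turn sketch with mlx-lm; the model path is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("srswti/Bodega-Raptor-8B-MXFP4")  # hypothetical repo id
history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # Replay the full history so earlier turns stay in context.
    prompt = tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    reply = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    history.append({"role": "assistant", "content": reply})
    return reply
```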
Disclaimer
SRSWTI is not the creator or owner of the underlying foundation model architecture. The foundation model is created and provided by third parties. SRSWTI has trained this model on top of the foundation model but does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any outputs. You understand that this model can produce content that might be offensive, harmful, inaccurate, deceptive, or otherwise inappropriate. SRSWTI may not monitor or control all model outputs and cannot, and does not, take responsibility for any such outputs. SRSWTI disclaims all warranties or guarantees about the accuracy, reliability or benefits of this model. SRSWTI further disclaims any warranty that the model will meet your requirements, be secure, uninterrupted or available at any time or location, or be error-free or virus-free, or that any errors will be corrected. You will be solely responsible for any damage resulting from your use of or access to this model, your downloading of this model, or use of this model provided by or through SRSWTI.
Crafted by the Bodega team at SRSWTI Research Labs
Building the world's fastest inference and retrieval engines
Making AI accessible, efficient, and powerful for everyone
