Could you cook a similar version for Step 3.5?
@mratsim Would you apply the same technique to create a version for Step 3.5 Flash? It's a similarly sized model to MiniMax M2.1 with great performance.
https://huggingface.co/stepfun-ai/Step-3.5-Flash-FP8
Would be a level up for 2x RTX 6000 Pro hosts.
+1
That's indeed something I'm interested in, but I will be unavailable for a month or so, and I need to add model-specific support to llmcompressor.
Short term: M2.5 was just released. Will that be a drop-in for your existing method?
Happy to help if you’re busy.
Yes, drop-in, and currently cooking.
Thanks. Working on some NVFP4 quants; hoping the availability of NVFP4 models will encourage more optimization work on it for our cards.
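For anyone curious what that looks like, here's a minimal sketch of an NVFP4 run following the llmcompressor examples (the `QuantizationModifier` / `NVFP4` scheme names are from those examples; the model id and calibration dataset below are placeholders, not the actual ones used):

```python
# Minimal NVFP4 sketch with llmcompressor (API per its published examples;
# model id and dataset are placeholders for illustration).
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# NVFP4: 4-bit floating-point scheme aimed at Blackwell-class cards
# such as the RTX 6000 Pro mentioned above.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head"],
)

oneshot(
    model="your-org/your-bf16-model",  # placeholder HF model id
    dataset="open_platypus",           # example calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```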
Re MiniMax-M2.5: following accuracy degradation concerns after using the new batch_size=32 feature in llmcompressor, I have reuploaded quants calibrated with batch_size=1 to ensure my calibration dataset is passed as-is rather than truncated to the shortest sequence in each batch. Please redownload for the highest quality! (see thread: https://huggingface.co/mratsim/MiniMax-M2.5-BF16-INT4-AWQ/discussions/4)
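For anyone reproducing the rerun, a minimal sketch assuming llmcompressor's `oneshot` entry point and `AWQModifier` (per its AWQ examples); the `batch_size` argument name is taken from the feature discussed above, and the model id and dataset below are stand-ins, not my actual calibration setup:

```python
# Sketch of re-running AWQ INT4 calibration with batch_size=1
# (verify the batch_size argument against your llmcompressor version).
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(
    targets=["Linear"],
    scheme="W4A16_ASYM",
    ignore=["lm_head"],
)

oneshot(
    model="MiniMaxAI/MiniMax-M2.5",  # hypothetical base-model id
    dataset="open_platypus",         # stand-in for the real calibration set
    recipe=recipe,
    max_seq_length=8192,
    num_calibration_samples=256,
    batch_size=1,  # each sample calibrated at full length; no batch-level truncation
)
```

With batch_size=1 there is no batching across samples, so no sequence gets cut down to the length of a shorter neighbor, at the cost of a slower calibration pass.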