Could you cook a similar version for Step 3.5?

#8
by bigstorm - opened

@mratsim Would you apply the same technique to create a version for Step 3.5 Flash? A similar sized model to Minimax M2.1 with great performance.

https://huggingface.co/stepfun-ai/Step-3.5-Flash-FP8

Would be a level up for 2x RTX 6000 Pro hosts.

+1

That's indeed something I'm interested in, but I will be unavailable for a month or so, and I'd need to add model-specific modeling code to llmcompressor first.

Short term - M2.5 was just released - will that be a drop-in for your existing method?

Happy to help if you’re busy.

Yes, it's a drop-in, and I'm currently cooking it.

Thanks - I'm working on some NVFP4 quants. Hoping the availability of NVFP4 will encourage more optimization work on it for our cards.

Re MiniMax-M2.5: following accuracy-degradation concerns after using the new batch_size=32 feature in llmcompressor, I have re-uploaded quants made with batch_size=1, to ensure my calibration dataset is passed as-is and not truncated to the shortest sequence in each batch. Please re-download for the highest quality! (see thread https://huggingface.co/mratsim/MiniMax-M2.5-BF16-INT4-AWQ/discussions/4)
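To illustrate why this matters, here is a minimal sketch (not llmcompressor's actual implementation, just the simplifying assumption that each batch is cut to its shortest member) counting how many calibration tokens survive batching:

```python
def tokens_seen(seq_lens, batch_size):
    """Count calibration tokens actually used when every batch is
    truncated to its shortest sequence (hypothetical model of the issue)."""
    total = 0
    for i in range(0, len(seq_lens), batch_size):
        batch = seq_lens[i:i + batch_size]
        total += min(batch) * len(batch)  # each sequence cut to batch minimum
    return total

lens = [512, 2048, 1024, 4096]      # variable-length calibration samples
print(tokens_seen(lens, 1))         # 7680: every token reaches calibration
print(tokens_seen(lens, 4))         # 2048: all four cut to the 512 minimum
```

With batch_size=1 every sequence passes at full length; with larger batches the long samples that matter most for activation statistics are the ones that get clipped.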
