Could you cook a similar version for Step 3.5?
@mratsim Would you apply the same technique to create a version for Step 3.5 Flash? It's a similarly sized model to MiniMax M2.1 with great performance.
https://huggingface.co/stepfun-ai/Step-3.5-Flash-FP8
Would be a level up for 2x RTX 6000 Pro hosts.
+1
That's indeed something I'm interested in, but I will be unavailable for a month or so, and I need to add model-specific support to llmcompressor.
Short term: M2.5 was just released. Will that be a drop-in for your existing method?
Happy to help if you’re busy.
Yes, drop-in, and currently cooking.
Thanks. Working on some NVFP4 quants; hoping the availability of NVFP4 models will encourage more optimization work on it for our cards.
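For anyone curious what that looks like, here's a minimal sketch of an NVFP4 run following the llmcompressor examples (the `QuantizationModifier` / `NVFP4` scheme names are from those examples; the model id and calibration dataset below are placeholders, not the actual ones used):

```python
# Minimal NVFP4 sketch with llmcompressor (API per its published examples;
# model id and dataset are placeholders for illustration).
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# NVFP4: 4-bit floating-point scheme aimed at Blackwell-class cards
# such as the RTX 6000 Pro mentioned above.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head"],
)

oneshot(
    model="your-org/your-bf16-model",  # placeholder HF model id
    dataset="open_platypus",           # example calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```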
Re MiniMax-M2.5: following accuracy degradation concerns after using the new batch_size=32 feature in llmcompressor, I have reuploaded quants calibrated with batch_size=1 to ensure my calibration dataset is passed as-is rather than truncated to the shortest sequence in each batch. Please redownload for the highest quality! (see thread: https://huggingface.co/mratsim/MiniMax-M2.5-BF16-INT4-AWQ/discussions/4)
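For anyone reproducing the rerun, a minimal sketch assuming llmcompressor's `oneshot` entry point and `AWQModifier` (per its AWQ examples); the `batch_size` argument name is taken from the feature discussed above, and the model id and dataset below are stand-ins, not my actual calibration setup:

```python
# Sketch of re-running AWQ INT4 calibration with batch_size=1
# (verify the batch_size argument against your llmcompressor version).
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(
    targets=["Linear"],
    scheme="W4A16_ASYM",
    ignore=["lm_head"],
)

oneshot(
    model="MiniMaxAI/MiniMax-M2.5",  # hypothetical base-model id
    dataset="open_platypus",         # stand-in for the real calibration set
    recipe=recipe,
    max_seq_length=8192,
    num_calibration_samples=256,
    batch_size=1,  # each sample calibrated at full length; no batch-level truncation
)
```

With batch_size=1 there is no batching across samples, so no sequence gets cut down to the length of a shorter neighbor, at the cost of a slower calibration pass.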