Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs!
llama.cpp has fixed a bug that caused the model to loop and produce poor outputs: the calculation for the vectorized key_gdiff has been corrected.
Thanks to the work of the llama.cpp team and contributors, we have now reconverted and re-uploaded the model.
Please re-download the files and update llama.cpp. Thanks!
All quants have now been updated; see the file history for the most recently updated ones.
Please let us know if you see an improvement!
Q8, MXFP4, and F16 are not updated; however, you still must update llama.cpp.
We also made a new tutorial on running our dynamic FP8 quant, and there is a new MXFP4 GGUF.
Guide: https://unsloth.ai/docs/models/qwen3-coder-next
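If you'd rather script the re-download than grab the files from the website, here is a minimal sketch using huggingface_hub; the repo id and filename pattern below are assumptions, so adjust them to the quant you actually use:

```python
from huggingface_hub import snapshot_download

# Re-fetch only the shards for the quant you use. Files that are already
# up to date are skipped, so only the re-uploaded ones are downloaded again.
snapshot_download(
    repo_id="unsloth/Qwen3-Coder-Next-80B-A3B-Instruct-GGUF",  # assumed repo id
    allow_patterns=["*UD-Q4_K_XL*"],                           # assumed quant pattern
    local_dir="Qwen3-Coder-Next-GGUF",
)
```

After that, pull and rebuild llama.cpp so the fix mentioned above is actually in your binary.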

Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!
Looks marvelous...
Any plans to roll out a REAP version?
Your https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF has fantastic results.
I'm getting a lot of invalid JSON syntax like '"filePath"/home/' in tool calls (with Q6_K_XL in opencode), and the model loops instead of fixing it (even when told to fix it, annoyingly). Is this the bug in question? Will pulling the files down again fix that?
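For anyone wondering what the backend is choking on: a fragment like that simply isn't valid JSON, so any parser will reject it. A quick illustration in Python; the payload strings below are made up to mirror the reported error, not copied from opencode logs:

```python
import json

# Shape of the broken tool-call arguments reported above: no ':' after the
# key and an unquoted value, so parsing fails.
broken = '{"filePath"/home/user/project/main.py}'
valid = '{"filePath": "/home/user/project/main.py"}'

for payload in (broken, valid):
    try:
        print("parsed:", json.loads(payload))
    except json.JSONDecodeError as err:
        print("parse error:", err)
```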
Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!
Those quants do not use an imatrix, so they are fine to use as-is. Only quants built with an imatrix needed to be requantized.
Those are not imatrix quants, so it's not needed.
But Q2-Q6 were re-uploaded too. They don't use imatrix, I think. I don't understand.
They are imatrix quants. The only ones that aren't are 8-bit and above, and MXFP4.
Sorry, I thought only quants whose names start with "I" use imatrix. Are there any docs to better understand quant naming and imatrix?
It seems that IQ quants require an imatrix and Q quants don't, but you can still use an imatrix to improve Q quant accuracy.
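To make the distinction concrete, here is a rough sketch of how an imatrix is produced and then applied during quantization. It assumes the llama-imatrix and llama-quantize binaries from a recent llama.cpp build are on your PATH, and the file names are placeholders; exact flags can differ between versions:

```python
import subprocess

# 1) Collect an importance matrix from calibration text.
subprocess.run([
    "llama-imatrix",
    "-m", "model-f16.gguf",   # placeholder source model
    "-f", "calibration.txt",  # placeholder calibration data
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize with the imatrix. IQ-type quants need one; K-quants (Q2_K-Q6_K)
#    work without it, but are generally more accurate when one is supplied.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf",
    "model-Q4_K_M.gguf",
    "Q4_K_M",
], check=True)
```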
I'm getting a lot of '"filePath"/home/' invalid json syntax in tool calls (with Q6_K_XL in opencode), and looping instead of fixing it (even when told to fix it, annoyingly). Is this why? Will pulling the files down again fix that?
Same for me with the latest version of opencode (1.1.50) and llama.cpp 7941.
It crashes with this model. Error message: JSON Parse error: Unrecognized token '/'
This problem is specific to Qwen3-Coder-Next; I don't get it with other models.
Edit:
Tool calls fail with opencode; discussion here: https://www.reddit.com/r/LocalLLaMA/comments/1qvacqo/does_qwen3codernext_work_in_opencode_currently_or/
Second edit:
I've changed my configuration in opencode and specified the tool_call and reasoning options, and that now seems to fix the problem:
"qwen3-coder-next": {
"name": "qwen3-coder-next (local)",
"tool_call": true,
"reasoning": true,
"limit": {
"context": 136608,
"output": 25536
}
}
With the latest Q6_K_XL GGUF, LM Studio (0.4.1) no longer detects parameters like the architecture or context length. The previous upload was detected without issue.
The MXFP4 GGUF is a little better, but it incorrectly lists the model as "512x2.5B", whereas the Qwen3-Next flavors (not the coder release) are displayed as "80B-A3B" in LM Studio.
I have this problem too; in LM Studio they are unrecognized. The previous models failed in agentic mode via Continue; right now MXFP4 works well after the update.
This is the mainline llama.cpp PR in question for those following along at home: https://github.com/ggml-org/llama.cpp/pull/19324
For me it works perfectly with the latest llama.cpp. Actually, this is the first model that I don't want to reduce the temperature of; it is just perfect.
Will other Qwen3-Next imatrix releases be re-uploaded too?
Is it just me, or is the 1st file incomplete?
Saving to: ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’
Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gg 100%[==============================================================================================>] 5.66M --.-KB/s in 0.05s
2026-02-06 15:26:48 (105 MB/s) - ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’ saved [5936032/5936032]
My bad, I was using old file references from when it was split into 2 parts.
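If anyone else hits this, one way to double-check the current shard names and sizes before downloading is to query the repo metadata; a small sketch with huggingface_hub (repo id assumed, as before):

```python
from huggingface_hub import HfApi

# List the files currently in the repo with their sizes, so stale local
# references (e.g. an old 1-of-2 split) are easy to spot.
info = HfApi().model_info(
    "unsloth/Qwen3-Coder-Next-80B-A3B-Instruct-GGUF",  # assumed repo id
    files_metadata=True,
)
for f in info.siblings:
    if "Q8_K_XL" in f.rfilename:
        print(f.rfilename, f.size)
```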
I have updated llama.cpp and there is still an issue with the Q6_K_XL re-upload. Other quants, like Q4_K_XL, seem good though.
The issue is that the architecture and context length cannot be detected. This prevents setting a context length above the default (2048 tokens), and because the backend can't register the architecture, it doesn't apply any Qwen3-Next-specific handling, so the model is not coherent.
Again, this is just an issue with the newly re-uploaded Q6_K_XL; I understand the other quants are working well for everyone, and the original upload was working fine before this. Please take a look at the newly uploaded Q6_K_XL.
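One way to narrow down whether the metadata is actually missing from the re-uploaded file, or whether LM Studio is just failing to read it, is to dump the GGUF key-value metadata with the gguf Python package; the filename below is only an example:

```python
from gguf import GGUFReader  # pip install gguf

# Print every metadata key in the file. The architecture and context-length
# entries (general.architecture, <arch>.context_length) should appear here
# if the conversion wrote them.
reader = GGUFReader("Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf")
for name in reader.fields:
    print(name)
```

If those keys are present, the problem is on the LM Studio side; if not, it's the conversion.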
In LM Studio, Q6, Q6_K_XL, and Q8 are all not working... only MXFP4 does.
Same, the UD quants seem to suffer from something (I tried the Q3_K_XL and Q4_K_XL ones). MXFP4 is fine for me too
Are the weights updated again?
Honestly, providing a short commit message instead of the default "Upload folder using huggingface_hub" would save a lot of wondering each time you update the weights. Providing no changelog feels a bit like experimenting without knowing what you're doing and relying only on user feedback, but I know that's not the case, lol, so I don't understand the underlying reason.
And I mean... it's not like you're updating thousands of models each day. (Rude, I know... sorry for this.)
Tried the Q8_K_XL quant in LM Studio; it doesn't recognise it as MoE and I can't select more than 2k context.
Sorry, yes, we did do an update. It only affects the smaller quants, and in smallish ways: some tensors are upcast a bit more, so they retain a bit more accuracy.
I can ask LM Studio about Q6_K_XL / Q8_K_XL, but my guess is it's not liking some upcast F16 layers.
Thanks, @danielhanchen. For what it's worth, while I agree that a clear changelog (and version tags) would be good, I really appreciate the updates.
Model releases do tend to be sloppily version controlled; to my mind they should be semantically versioned and tagged rigorously in version control, just like any other "software". That said, you ARE actually maintaining the models, like software. There are so many models on here that have invalid chat templates, bad weights, or some other issue (even @nvidia ;/ ), and people are just left to download tens of GB and find out on their own that it doesn't work. So, you're doing the right thing here by fixing the issues, of course :)
Yes, unsloth do a great job.
My understanding is that IQ quants are made with imatrix files, but the IQ*.gguf files include the relevant scaling. So unless you're making (or debugging) IQ quants, you don't need the imatrix files.
