Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs!

#5
by danielhanchen - opened

llama.cpp has fixed a bug that caused the model to loop and produce poor outputs; the calculation for the vectorized key_gdiff has been corrected.
Thanks to the work of llama.cpp and its contributors, we have now reconverted and re-uploaded the model.

Please re-download the GGUFs and update llama.cpp. Thanks!

All have now been updated.

See the file history to check which files were most recently updated.

Please let us know if you see an improvement!
Q8, MXFP4, and F16 are not updated; however, you still must update llama.cpp.

We also made a new tutorial on running our dynamic FP8 quant and have a new MXFP4 GGUF.

Guide: https://unsloth.ai/docs/models/qwen3-coder-next
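
For reference, a minimal sketch of one way to pull a re-converted quant and run it on an up-to-date llama.cpp build; the repo id, quant tag, and context size below are illustrative assumptions, see the guide above for the exact steps:

# re-download only the files for one quant (repo id and pattern assumed)
huggingface-cli download unsloth/Qwen3-Coder-Next-GGUF --include "*UD-Q4_K_XL*" --local-dir ./Qwen3-Coder-Next-GGUF

# serve it with a recent llama.cpp build that contains the fix
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:Q4_K_XL --ctx-size 32768 --jinja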

[screenshot: qwen3-coder-next fixed]

danielhanchen pinned discussion
danielhanchen changed discussion title from Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs! to Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs! (Still in progress)

Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!

Looks marvelous...
Any plans to roll out a REAP version?
Your https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF has fantastic results.

I'm getting a lot of '"filePath"/home/' invalid json syntax in tool calls (with Q6_K_XL in opencode), and looping instead of fixing it (even when told to fix it, annoyingly). Is this why? Will pulling the files down again fix that?

Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!

Those quants do not use an imatrix, so they are fine to use as-is. Only the quants that use an imatrix needed to be requantized.

Unsloth AI org

Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!

Those are not imatrix quants, so re-uploading them isn't needed.

Unsloth AI org

Those quants do not use an imatrix, so they are fine to use as-is. Only the quants that use an imatrix needed to be requantized.

But Q2-Q6 are re-uploaded too. They don't use imatrix I think. I don't understand.

They are imatrix. The only ones that aren't are 8-bit and above, and MXFP4.


They are imatrix. The only ones that aren't are 8-bit and above, and MXFP4.

Sorry, I thought only quants whose names start with "I" use imatrix. Are there any docs to better understand quant naming and imatrix?

It seems that IQ quants require an imatrix, and Q quants don't, but you can still use an imatrix to improve Q-quant accuracy.


I'm getting a lot of '"filePath"/home/' invalid json syntax in tool calls (with Q6_K_XL in opencode), and looping instead of fixing it (even when told to fix it, annoyingly). Is this why? Will pulling the files down again fix that?

Same for me with the latest version of opencode (1.1.50) and llama.cpp build 7941.
It crashes with this model: Error message: JSON Parse error: Unrecognized token '/']
This problem is specific to Qwen3-Coder-Next, because I don't have it with other models.

Edit:
Tool calls fail with opencode; discussion here: https://www.reddit.com/r/LocalLLaMA/comments/1qvacqo/does_qwen3codernext_work_in_opencode_currently_or/

Edit (bis):
I've changed my opencode configuration and specified the tool_call and reasoning options; that now seems to fix the problem:

"qwen3-coder-next": {

"name": "qwen3-coder-next (local)",

"tool_call": true,

"reasoning": true,

"limit": {

"context": 136608,

"output": 25536

}

}

With the latest Q6_K_XL GGUF, LM Studio (0.4.1) no longer detects parameters like the architecture or context length; the previous upload was detected without issue.
The MXFP4 GGUF is a little better, but incorrectly lists the model as "512x2.5B", whereas the Qwen3-Next flavors (not the coder release) are displayed as "80B-A3B" in LM Studio.

I have this problem too. In LM Studio they are unrecognized. The previous models failed in agentic mode via Continue; right now the MXFP4 works well after the update.

This is the mainline llama.cpp PR in question for those following along at home: https://github.com/ggml-org/llama.cpp/pull/19324

For me it works perfectly with the latest llama.cpp. Actually, this is the first model whose temperature I don't want to reduce; it is just perfect.

Will other qwen3next imatrix releases be reuploaded too?

Is it me, or is the 1st file incomplete?

Saving to: ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’

Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gg 100%[==============================================================================================>] 5.66M --.-KB/s in 0.05s

2026-02-06 15:26:48 (105 MB/s) - ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’ saved [5936032/5936032]

Is it me, or is the 1st file incomplete?

Saving to: ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’

Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gg 100%[==============================================================================================>] 5.66M --.-KB/s in 0.05s

2026-02-06 15:26:48 (105 MB/s) - ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’ saved [5936032/5936032]

My bad, I was using old file references from when it was a 2-part split.

danielhanchen changed discussion title from Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs! (Still in progress) to Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs!

I have updated llama.cpp and there is still an issue with the Q6_K_XL reupload. Other quants, like Q4_K_XL seem good though.
The issue is that the architecture and context length cannot be detected. This prevents setting a context length above the default (2048 tokens), and the failure to register the architecture prevents the model from being coherent, since the backend doesn't do any qwen3next-specific handling.
Again, this is just an issue with the newly reuploaded Q6_K_XL. I understand that the other quants are working well for everyone. The original upload was working fine before this. Please take a look at the newly uploaded Q6_K_XL.
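
If it helps narrow this down, one way to check whether the architecture and context-length metadata are actually present in the file (as opposed to an LM Studio parsing issue) is to dump the GGUF header with the gguf Python package; the filename below is just an example:

pip install gguf
gguf-dump Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf | grep -i -E "architecture|context_length"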


In LM Studio, Q6, Q6_K_XL, and Q8 are all not working... only MXFP4 works.

Same, the UD quants seem to suffer from something (I tried the Q3_K_XL and Q4_K_XL ones). MXFP4 is fine for me too

Weights are updated again?

Honestly, providing a short commit message instead of the default "Upload folder using huggingface_hub" would save a lot of wondering each time you update the weights... Providing no changelog feels a bit like experimenting without knowing what you're doing and relying only on user feedback. But I know that's not the case lol, so I don't understand the underlying reason.

And I mean... It's not like you are updating thousands of models each day. RUDE... Sorry for this...

Tried the Q8_K_XL quant in LM Studio; it doesn't recognise it as MoE and I can't select more than 2k context.

Unsloth AI org

Sorry, yes, we did do an update. It only affects the smaller quants, and in smallish ways: some tensors are upcast a bit more, so they retain a bit more accuracy.

I can ask LM Studio about Q6_K_XL / Q8_K_XL, but my guess is it's not liking some upcasted F16 layers.

Thanks, @danielhanchen . For what it's worth, while I agree that a clear changelog (and version tags) would be good, I really appreciate the updates.

Model releases do tend to be sloppily version controlled; to my mind they should be semantically versioned and tagged rigorously in version control, just like any other "software". That said, you ARE actually maintaining the models, like software. There are so many models on here that have invalid chat templates, bad weights, or some other issue (even @nvidia ;/ ), and people are just left to download tens of GB and realise on their own that it doesn't work. So, you're doing the right thing here by fixing the issues, of course :)

Yes, unsloth do a great job.

They are imatrix. The only ones that aren't are 8-bit and above, and MXFP4.

Sorry, I thought only quants whose names start with "I" use imatrix. Are there any docs to better understand quant naming and imatrix?

It seems that IQ quants require an imatrix, and Q quants don't, but you can still use an imatrix to improve Q-quant accuracy.

My understanding is that IQ quants are made with imatrix files, but the IQ*.gguf files include the relevant scaling. So unless you're making (or debugging) IQ quants, you don't need the imatrix files.
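
For anyone wanting the mechanics, a rough sketch of how an imatrix-based quant is produced with llama.cpp's own tools (the file names and calibration text here are placeholders, not the exact recipe Unsloth uses):

# 1. collect importance statistics over a calibration text with the full-precision model
llama-imatrix -m Qwen3-Coder-Next-F16.gguf -f calibration.txt -o imatrix.dat

# 2. quantize using those statistics; IQ and low-bit K-quants benefit the most
llama-quantize --imatrix imatrix.dat Qwen3-Coder-Next-F16.gguf Qwen3-Coder-Next-IQ4_XS.gguf IQ4_XS

The imatrix only influences how weights are rounded at quantization time; the resulting .gguf needs nothing extra at inference.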
