Feb 4: Qwen3-Coder-Next GGUFs reuploaded - much better outputs!
llama.cpp has fixed a bug that caused the model to loop and produce poor outputs: the calculation for the vectorized key_gdiff has been corrected.
Thanks to the work of the llama.cpp team and contributors, we have now reconverted and re-uploaded the model.
Please re-download the files and update llama.cpp. Thanks!
All quants have now been updated; see the file history for the most recently updated ones.
Please let us know if you see an improvement!
Q8, MXFP4, and F16 are not updated; however, you still must update llama.cpp.
We also made a new tutorial on running our dynamic FP8 quant, and there is a new MXFP4 GGUF.
Guide: https://unsloth.ai/docs/models/qwen3-coder-next
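If you'd rather script the re-download than grab the files from the website, here is a minimal sketch using huggingface_hub; the repo id and filename pattern below are assumptions, so adjust them to the quant you actually use:

```python
from huggingface_hub import snapshot_download

# Re-fetch only the shards for the quant you use. Files that are already
# up to date are skipped, so only the re-uploaded ones are downloaded again.
snapshot_download(
    repo_id="unsloth/Qwen3-Coder-Next-80B-A3B-Instruct-GGUF",  # assumed repo id
    allow_patterns=["*UD-Q4_K_XL*"],                           # assumed quant pattern
    local_dir="Qwen3-Coder-Next-GGUF",
)
```

After that, pull and rebuild llama.cpp so the fix mentioned above is actually in your binary.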

Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!
Looks marvelous...
Any plans to roll out a REAP version?
Your https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF has fantastic results.
I'm getting a lot of invalid JSON syntax like '"filePath"/home/' in tool calls (with Q6_K_XL in opencode), and the model loops instead of fixing it (even when told to fix it, annoyingly). Is this the bug in question? Will pulling the files down again fix that?
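For anyone wondering what the backend is choking on: a fragment like that simply isn't valid JSON, so any parser will reject it. A quick illustration in Python; the payload strings below are made up to mirror the reported error, not copied from opencode logs:

```python
import json

# Shape of the broken tool-call arguments reported above: no ':' after the
# key and an unquoted value, so parsing fails.
broken = '{"filePath"/home/user/project/main.py}'
valid = '{"filePath": "/home/user/project/main.py"}'

for payload in (broken, valid):
    try:
        print("parsed:", json.loads(payload))
    except json.JSONDecodeError as err:
        print("parse error:", err)
```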
Neither MXFP4 nor Q8 variants have been updated, is this intended or should we expect an update for those quants as well? Thanks for your hard work!
Those quants do not use an imatrix, so they are fine to use as-is. Only quants built with an imatrix needed to be requantized.
Those are not imatrix quants, so it's not needed.
But Q2-Q6 were re-uploaded too. They don't use imatrix, I think. I don't understand.
They are imatrix quants. The only ones that aren't are 8-bit and above, and MXFP4.
Sorry, I thought only quants whose names start with "I" use imatrix. Are there any docs to better understand quant naming and imatrix?
It seems that IQ quants require an imatrix and Q quants don't, but you can still use an imatrix to improve Q quant accuracy.
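To make the distinction concrete, here is a rough sketch of how an imatrix is produced and then applied during quantization. It assumes the llama-imatrix and llama-quantize binaries from a recent llama.cpp build are on your PATH, and the file names are placeholders; exact flags can differ between versions:

```python
import subprocess

# 1) Collect an importance matrix from calibration text.
subprocess.run([
    "llama-imatrix",
    "-m", "model-f16.gguf",   # placeholder source model
    "-f", "calibration.txt",  # placeholder calibration data
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize with the imatrix. IQ-type quants need one; K-quants (Q2_K-Q6_K)
#    work without it, but are generally more accurate when one is supplied.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf",
    "model-Q4_K_M.gguf",
    "Q4_K_M",
], check=True)
```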
I'm getting a lot of '"filePath"/home/' invalid json syntax in tool calls (with Q6_K_XL in opencode), and looping instead of fixing it (even when told to fix it, annoyingly). Is this why? Will pulling the files down again fix that?
Same for me with the latest version of opencode (1.1.50) and llama.cpp 7941.
It crashes with this model. Error message: JSON Parse error: Unrecognized token '/'
This problem is specific to Qwen3-Coder-Next; I don't get it with other models.
Edit:
Tool calls fail with opencode; discussion here: https://www.reddit.com/r/LocalLLaMA/comments/1qvacqo/does_qwen3codernext_work_in_opencode_currently_or/
Second edit:
I've changed my configuration in opencode and specified the tool_call and reasoning options, and that now seems to fix the problem:
"qwen3-coder-next": {
"name": "qwen3-coder-next (local)",
"tool_call": true,
"reasoning": true,
"limit": {
"context": 136608,
"output": 25536
}
}
With the latest Q6_K_XL GGUF, LM Studio (0.4.1) no longer detects parameters like the architecture or context length. The previous upload was detected without issue.
The MXFP4 GGUF is a little better, but it incorrectly lists the model as "512x2.5B", whereas the Qwen3-Next flavors (not the coder release) are displayed as "80B-A3B" in LM Studio.
I have this problem too; in LM Studio they are unrecognized. The previous models failed in agentic mode via Continue; right now MXFP4 works well after the update.
This is the mainline llama.cpp PR in question for those following along at home: https://github.com/ggml-org/llama.cpp/pull/19324
For me it works perfectly with the latest llama.cpp. Actually, this is the first model that I don't want to reduce the temperature of; it is just perfect.
Will other Qwen3-Next imatrix releases be re-uploaded too?
Is it just me, or is the 1st file incomplete?
Saving to: ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’
Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gg 100%[==============================================================================================>] 5.66M --.-KB/s in 0.05s
2026-02-06 15:26:48 (105 MB/s) - ‘Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf’ saved [5936032/5936032]
My bad, I was using old file references from when it was split into 2 parts.
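If anyone else hits this, one way to double-check the current shard names and sizes before downloading is to query the repo metadata; a small sketch with huggingface_hub (repo id assumed, as before):

```python
from huggingface_hub import HfApi

# List the files currently in the repo with their sizes, so stale local
# references (e.g. an old 1-of-2 split) are easy to spot.
info = HfApi().model_info(
    "unsloth/Qwen3-Coder-Next-80B-A3B-Instruct-GGUF",  # assumed repo id
    files_metadata=True,
)
for f in info.siblings:
    if "Q8_K_XL" in f.rfilename:
        print(f.rfilename, f.size)
```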
I have updated llama.cpp and there is still an issue with the Q6_K_XL re-upload. Other quants, like Q4_K_XL, seem good though.
The issue is that the architecture and context length cannot be detected. This prevents setting a context length above the default (2048 tokens), and because the backend can't register the architecture, it doesn't apply any Qwen3-Next-specific handling, so the model is not coherent.
Again, this is just an issue with the newly re-uploaded Q6_K_XL; I understand the other quants are working well for everyone, and the original upload was working fine before this. Please take a look at the newly uploaded Q6_K_XL.
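One way to narrow down whether the metadata is actually missing from the re-uploaded file, or whether LM Studio is just failing to read it, is to dump the GGUF key-value metadata with the gguf Python package; the filename below is only an example:

```python
from gguf import GGUFReader  # pip install gguf

# Print every metadata key in the file. The architecture and context-length
# entries (general.architecture, <arch>.context_length) should appear here
# if the conversion wrote them.
reader = GGUFReader("Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf")
for name in reader.fields:
    print(name)
```

If those keys are present, the problem is on the LM Studio side; if not, it's the conversion.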
In LM Studio, Q6, Q6_K_XL, and Q8 are all not working... only MXFP4 does.
Same, the UD quants seem to suffer from something (I tried the Q3_K_XL and Q4_K_XL ones). MXFP4 is fine for me too
Are the weights updated again?
Honestly, providing a short commit message instead of the default "Upload folder using huggingface_hub" would save a lot of wondering each time you update the weights. Providing no changelog feels a bit like experimenting without knowing what you're doing and relying only on user feedback, but I know that's not the case, lol, so I don't understand the underlying reason.
And I mean... it's not like you're updating thousands of models each day. (Rude, I know... sorry for this.)
Tried the Q8_K_XL quant in LM Studio; it doesn't recognise it as MoE and I can't select more than 2k context.
Sorry, yes, we did do an update. It only affects the smaller quants, and in smallish ways: some tensors are upcast a bit more, so they retain a bit more accuracy.
I can ask LM Studio about Q6_K_XL / Q8_K_XL, but my guess is it's not liking some upcast F16 layers.
Thanks, @danielhanchen. For what it's worth, while I agree that a clear changelog (and version tags) would be good, I really appreciate the updates.
Model releases do tend to be sloppily version controlled; to my mind they should be semantically versioned and tagged rigorously in version control, just like any other "software". That said, you ARE actually maintaining the models, like software. There are so many models on here that have invalid chat templates, bad weights, or some other issue (even @nvidia ;/ ), and people are just left to download tens of GB and find out on their own that it doesn't work. So, you're doing the right thing here by fixing the issues, of course :)
Yes, unsloth do a great job.
My understanding is that IQ quants are made with imatrix files, but the IQ*.gguf files include the relevant scaling. So unless you're making (or debugging) IQ quants, you don't need the imatrix files.
