LTX-2.3_-_T2V_Basic and ltx-2.3-22b-distilled-1.1 - Audio Output Bad, Really Bad!
Like the title says, the audio output is very "BUZZ", like there are a lot of bees in the place. Does anyone have a solution for it?
LTX-2.3 Version (Kijai):
ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
VAE Version (Kijai):
LTX23_audio_vae_bf16.safetensors / LTX23_video_vae_bf16.safetensors
Maybe Maybe Maybe: "I think I figured out how to fix the audio issues in LTX 2.3"
I just don't know how to fuse the workflows to make it as simple as LTX-2.3_-_T2V_Basic by RuneXX 🤔
That's the new v1.1 model ... from my own runs so far it's been sounding great.
But I will take a look at that particular workflow (made for the v1.0 model, but it shouldn't be much different).
And I'm pretty sure that Reddit post "fix" is not needed ;-) It mentions ClownShark and Res_2s ... and that's not something I put in my workflows, I don't feel it's needed (and the Reddit post asks you to remove those).
Just to make sure, double check that you are using the LTX-2.3 VAE (audio and video), and not the LTX-2.0 VAE (if you also used that model in the past).
That being said, the T2V workflow is the oldest one in the collection, so I'll take a look. It might be something that has since been changed to better settings in later workflows ;-)
Actually even more important:
- Are you using any custom loras? Made by users?
Most/many of those are trained on images only, and do create a metallic buzz. They have no audio training, and are simple loras
(there are work arounds such as using Advanced Lora Loader from KJ, to "mute" all the audio parts of those loras)
I'll try to see if I can find a way to add the Advanced Lora Loader so that you can easily use more than 1 lora.
But if you are using user-made loras, try removing those and see if it's all fine then ;-)
Thank you for your time!
I'm not using any LoRA, not even this one:
ltx-2.3-22b-distilled-1.1_lora-dynamic_fro09_avg_rank_111_bf16.safetensors (Kijai)
Because I don't even know the purpose of this file 😜
It's used together with the DEV model to run the dev model as if it were the distilled model (i.e., low steps, low CFG).
But if you are using the distilled model, that lora is not needed ;)
Any chance there is something in your video that is really loud? Explosions etc.? That is something that was always a bit "buzzy".
And also make sure you are using version 1.1 of the upscale model. I did notice one thing in that workflow from the "old" days: the sampler is set to LCM in the 2nd pass. Try setting it to euler_cfg_pp. It shouldn't really matter, but I never tried LCM with the v1.1 model.
Interesting 🤔
Tested on my end, and all seems OK with the T2V workflow and the new version 1.1 distilled model (from Kijai). Nothing was changed except the models (v1.1 distilled main and v1.1 upscaler).
Not sure what's different on your end then.
Try double-checking the models... And perhaps re-download the workflow fresh, in case you accidentally bumped one of the nodes or settings somewhere.
(I'll update that workflow though, to euler_cfg_pp instead of LCM, since that's more in line with LTX defaults. But the above video is with LCM and the v1.1 model.)
No bees, but lots of birds ;-) But I prompted for that, so there would be more than just the vocals, to see if something failed.
Tested a few more runs. Maybe there is a little bit of faint humming, but I'm not sure. It's probably just a bit of ambient sound.
And probably not what you are having on your end.
And sorry for the angry sounding videos ;-) Seems like LTX went for that tone when the prompt included "sounding bad" etc.. haha ;-) I should have prompted "happy gothic witch is saying" hehe
I made Zer0 changes in the nodes department, maybe the problem was the Spatial Upscaler x2.
I changed the "Spatial Upscaler x2" to Version 1.1 and made a test with Music and Sound FX.
The result is very crystal clear.
The thing is, the generated video is rendered as 1024x1920... Even if I set it to 1080x1920 🤔
Yeah, that could actually be the reason. The v1.0 model could use both the 1.0 and 1.1 upscaler.
But the 1.1 model has a lot of changes, so they updated all their IC-Loras etc. Perhaps it also needs the v1.1 upscaler.
As for the resolution, LTX needs sizes divisible by 32, and will auto-adjust any size that isn't.
1080 is not divisible by 32, so that ends up as 1024.
You can use 1088 though.
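To make the divisibility point concrete, here is a small sketch that checks a size and lists the closest valid sizes on either side. The helper names (`is_valid`, `nearest_multiples`) are made up for illustration and are not part of ComfyUI or LTX. Note that plain rounding down to a multiple of 32 would give 1056 for 1080, while the 1024 reported above matches rounding down to a multiple of 64. Which rule the workflow actually applies is not confirmed in this thread, so the step is left as a parameter.

```python
def is_valid(size: int, step: int = 32) -> bool:
    """True if `size` is an exact multiple of `step`."""
    return size % step == 0


def nearest_multiples(size: int, step: int = 32) -> tuple[int, int]:
    """Closest multiples of `step` at or below and at or above `size`."""
    lower = (size // step) * step
    upper = lower if lower == size else lower + step
    return lower, upper


# Hypothetical check of the sizes discussed in this thread.
for size in (1080, 1088, 1920):
    print(size, is_valid(size), nearest_multiples(size), nearest_multiples(size, step=64))
```

Either way, 1088 and 1920 are multiples of both 32 and 64, which is why 1088x1920 is the safe choice for a 9:16 portrait.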
I'm generating some kind of Liminal Spaces / Analog Horror videos and the sound was/is getting out a little crazy/non-sense.
But, I generated a Happy Golden Retriever video with Music and SFX and the result is kind of remarkable! 😅
Thank you again for your time, looks like everything is solved 😎
"I'm generating some kind of Liminal Spaces / Analog Horror videos and the sound was/is getting out a little crazy/non-sense."
Ah yes, that might have been very noisy if the model didn't like the v1.0 upscaler ;-)
"(there are work arounds such as using Advanced Lora Loader from KJ, to "mute" all the audio parts of those loras)"
Interesting. I myself use several loras in your workflows. I do find there is sometimes a robotic voice and a buzzing sound in the background for a lot of my generations. How would I go about adding this to your workflows? Is it similar to the Power Lora Loader (rgthree) but replaces that? What's the exact name? It's not popping up in Comfy when I search for it. I also checked Comfy Manager but nada.
Instead of loading those "simpler" user-made loras in the Power Lora Loader, add "LTX2 Lora Loader Advanced" and load those loras in that node. ("Simpler" because they are usually not trained on audio data.)
It's similar to Power Lora Loader, except that it takes 1 lora per node.
So I was thinking of making a subgraph that had 6 or so of the Advanced Lora Loader, and only showing the drop-down to load each lora at the parent top-level workflow, to make it easier.
But for now, you can try adding the Advanced Lora Loader yourself. Just connect it before the Power Lora Loader: model in (whatever was previously connected to the Power Lora Loader) and then model out (to the Power Lora Loader).
And lastly the critical part: mute all the audio weights (set their strength to zero).
You can chain multiple of them together (which is essentially what I might add to the workflows, but "hidden" in a subgraph, since I bet "simpler" user-made loras will just be more and more common).
(The "video to audio" one you can probably leave at 1...)
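For anyone wondering what "muting the audio weights" means conceptually, here is a toy sketch. It is not the node's actual implementation, and the key names and the `"audio"` marker are assumptions made up for illustration (real LTX lora key names may look different). The idea is just that the audio-related parts of the lora are scaled by zero while the video parts are left alone.

```python
def mute_audio_weights(lora, audio_marker="audio", audio_strength=0.0):
    """Scale every entry whose key contains `audio_marker`; copy the rest unchanged.

    `lora` is a plain dict of key -> list of floats, standing in for real tensors.
    """
    return {
        key: [v * audio_strength for v in values] if audio_marker in key else list(values)
        for key, values in lora.items()
    }


# Toy lora with hypothetical key names, one video part and one audio part.
toy_lora = {
    "blocks.0.video_attn.lora_A": [0.5, -0.25],
    "blocks.0.audio_attn.lora_A": [0.75, 0.125],
}

muted = mute_audio_weights(toy_lora)
print(muted)
```

In the node itself you do this with the strength settings rather than code; the sketch only shows why a lora with no audio training stops affecting the sound once those strengths are at zero.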
Helpful as always. Thank you Rune. Just tried it out and it does indeed remove the audio from the loras. I get different voice overs now.
@RuneXX awesome work on the workflows!
I'm trying to make some portrait 9:16 aspect ratio videos, but when I try anything over 832x1472, such as 1088x1920, I either get a floating section at the top or extremely stretched videos.
Are there any improvements we can make to the workflow so portrait video works better at HD please? Thx
With the T2V Basic workflow?
And using LTX-2.3 right? (LTX 2.0 had a problem with large HD portraits)
Sorry yes LTX-2.3_-_T2V_Simple_single_pass & ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
I've checked all the VAEs etc. and they are all LTX-2.3
ltx-2.3-spatial-upscaler-x2-1.1.safetensors
And the upscaler? Is it also at v1.1?
But I will try here with an HD portrait to see if I can reproduce it.
