LTX-2.3_-_T2V_Basic and ltx-2.3-22b-distilled-1.1 - Audio Output Bad, Really Bad!

#101

by Hearcharted - opened Apr 16

Apr 16

As the title says, the audio output is very "buzzy", as if there were a lot of bees in the room. Does anyone have a solution for it?

LTX-2.3 Version (Kijai):

ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors

VAE Version (Kijai):

LTX23_audio_vae_bf16.safetensors / LTX23_video_vae_bf16.safetensors

Hearcharted

Apr 16

Maybe this is the fix: "I think I figured out how to fix the audio issues in LTX 2.3"

https://www.reddit.com/r/StableDiffusion/comments/1s50fji/i_think_i_figured_out_how_to_fix_the_audio_issues/

I just don't know how to merge the workflows so it stays as simple as LTX-2.3_-_T2V_Basic by RuneXX 🤔

RuneXX

Owner Apr 16

That's the new v1.1 model. From my own runs so far, it's been sounding great.

But I'll take a look at that particular workflow (it was made for the v1.0 model, but it shouldn't be much different).

And I'm pretty sure that Reddit post's "fix" is not needed ;-) It mentions ClownShark and Res_2s, which I don't put in my workflows because I don't feel they're needed (and the Reddit post asks you to remove those anyway).

Just to make sure, double-check that you are using the LTX-2.3 VAEs (audio and video) and not the LTX-2.0 VAEs (if you also used that model in the past).

That being said, the T2V workflow is the oldest one in the collection, so I'll take a look. It might use settings that have since been changed to better ones in later workflows ;-)

RuneXX

Owner Apr 16

Actually, even more important:

Are you using any custom LoRAs made by users?

Many of those are trained on images only, and they do create a metallic buzz. They have no audio training; they are simple LoRAs.
(There are workarounds, such as using the Advanced Lora Loader from KJ to "mute" all the audio parts of those LoRAs.)

I'll try to see if I can find a way to add the Advanced Lora Loader so that you can easily use more than one LoRA.

But if you are using user-made LoRAs, try removing them and see if it's all fine then ;-)

Hearcharted

Apr 16

Thank you for your time!

I'm not using any LoRA, not even this one:

ltx-2.3-22b-distilled-1.1_lora-dynamic_fro09_avg_rank_111_bf16.safetensors (Kijai)

Because I don't even know the purpose of this file 😜

RuneXX

Owner Apr 16

ltx-2.3-22b-distilled-1.1_lora-dynamic_fro09_avg_rank_111_bf16.safetensors (Kijai)
Because I don't even know the purpose of this file 😜

It's used together with the dev model to run the dev model as if it were the distilled model (i.e., low steps, low CFG).
But if you are using the distilled model, that LoRA is not needed ;)
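The file name hints at how that LoRA was probably made, though nobody in this thread confirms it: the per-layer difference between the distilled and dev weights, compressed with a dynamic rank chosen to keep 90% ("fro09") of that difference's Frobenius norm, averaging rank 111 across layers. Below is a minimal sketch of that rank-selection rule, assuming it works like the `sv_fro` dynamic method in kohya-ss sd-scripts' LoRA resize script. It is not Kijai's confirmed method, and `rank_for_frobenius_fraction` is a hypothetical helper:

```python
import math


def rank_for_frobenius_fraction(singular_values, fraction=0.9):
    """Return the smallest rank r whose top-r singular values keep at least
    `fraction` of the Frobenius norm of a weight difference (W_distilled - W_dev).

    Hypothetical helper: assumes "fro09" in the file name means fraction=0.9.
    """
    s = sorted(singular_values, reverse=True)
    # ||dW||_F^2 equals the sum of the squared singular values.
    total_sq = sum(x * x for x in s)
    if total_sq == 0.0:
        return 0
    kept_sq = 0.0
    for rank, x in enumerate(s, start=1):
        kept_sq += x * x
        # Compare norms, not squared norms: ||kept||_F / ||dW||_F >= fraction.
        if math.sqrt(kept_sq / total_sq) >= fraction:
            return rank
    return len(s)


# Toy singular values for one layer's weight difference (made-up numbers).
print(rank_for_frobenius_fraction([4.0, 2.0, 1.0, 0.5]))  # -> 2
```

Applied layer by layer, this gives each layer its own rank, which would explain why the name reports an average rank (111) rather than a fixed one. If this reading is right, loading the LoRA on the dev model adds back most of W_distilled - W_dev. The distilled model already has that difference baked in, which is why the LoRA is not needed there.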

Any chance there is something really loud in your video? Explosions, etc.? Those always came out a bit "buzzy".

Also make sure you are using version 1.1 of the upscale model. I did notice one thing in that workflow from the "old" days: the sampler is set to LCM in the 2nd pass. Try setting it to euler_cfg_pp. It shouldn't really matter, but I never tried LCM with the v1.1 model.

Hearcharted

Apr 16

Interesting 🤔

RuneXX

Owner Apr 16

Tested on my end, and all seems OK with the T2V workflow and the new version 1.1 distilled model (from Kijai). Nothing else changed (except the models: the v1.1 distilled main model and the v1.1 upscaler).
Not sure what's different on your end, then.

Try double-checking the models, and perhaps re-download a fresh copy of the workflow, in case you accidentally changed one of the nodes or settings somewhere.
(I'll update that workflow to use euler_cfg_pp instead of LCM, though, since that's more in line with the LTX defaults. But the video above was made with LCM and the v1.1 model.)

No bees, but lots of birds ;-) I prompted for those so there would be more than just vocals, to see if something failed.

RuneXX

Owner Apr 16

Tested a few more runs. Maybe there is a bit of faint humming, but I'm not sure; it's probably just ambient sound.
And probably not what you are getting on your end.

And sorry for the angry-sounding videos ;-) It seems LTX went for that tone because the prompt included "sounding bad" etc. Haha ;-) I should have prompted "happy gothic witch is saying", hehe.

Hearcharted

Apr 16

I made zero changes to the nodes; maybe the problem was the Spatial Upscaler x2.
I changed the "Spatial Upscaler x2" to version 1.1 and ran a test with music and sound FX.
The result is crystal clear.
The thing is, the generated video is rendered at 1024x1920, even though I set it to 1080x1920 🤔

RuneXX

Owner Apr 16

Yeah, that could actually be the reason. The v1.0 model could use both the 1.0 and 1.1 upscalers.

But the 1.1 model has a lot of changes, so they updated all their IC-LoRAs etc.; perhaps it also needs the v1.1 upscaler.

As for the resolution, LTX needs dimensions divisible by 32 and will auto-adjust any other size.

1080 is not divisible by 32, so it ends up as 1024.
You can use 1088, though.
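Rounding down to a multiple of 32 shows what happens to a requested size, but plain rounding turns 1080 into 1056, not the 1024 reported here. One explanation that fits the numbers (an assumption, not confirmed in this thread) is that the two-pass T2V workflow runs its first pass at half resolution, snaps that size to a multiple of 32, and the x2 spatial upscaler then doubles it: 540 becomes 512, then 1024. The floor rounding and the helper names `snap_down` and `two_pass_size` are hypothetical:

```python
def snap_down(size: int, multiple: int = 32) -> int:
    """Round a dimension down to the nearest multiple (floor behaviour is an assumption)."""
    return (size // multiple) * multiple


def two_pass_size(size: int, multiple: int = 32, upscale: int = 2) -> int:
    """Hypothetical model of a two-pass workflow: the first pass runs at
    size / upscale, snapped to `multiple`, and the spatial upscaler scales it back up."""
    first_pass = snap_down(size // upscale, multiple)
    return first_pass * upscale


for requested in (1080, 1088, 1920):
    print(requested, snap_down(requested), two_pass_size(requested))
# 1080 1056 1024
# 1088 1088 1088
# 1920 1920 1920
```

Under this assumption, the sizes that pass through unchanged are multiples of 64 on both sides, such as 1088x1920 or 832x1472.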

Hearcharted

Apr 16

I'm generating liminal spaces / analog horror style videos, and the sound was coming out a little crazy and nonsensical.

But I generated a happy golden retriever video with music and SFX, and the result is kind of remarkable! 😅

Hearcharted

Apr 16

Thank you again for your time; it looks like everything is solved 😎

RuneXX

Owner Apr 16

I'm generating some kind of Liminal Spaces / Analog Horror videos and the sound was/is getting out a little crazy/non-sense.

Ah yes, that might have been very noisy if the model didn't like the v1.0 upscaler ;-)

Portland01

Apr 17

"(there are work arounds such as using Advanced Lora Loader from KJ, to "mute" all the audio parts of those loras)"

Interesting. I use several LoRAs in your workflows myself, and I do find there is sometimes a robotic voice and a buzzing sound in the background in a lot of my generations. How would I go about adding this to your workflows? Is it similar to the Power Lora Loader (rgthree), and does it replace that? What's its exact name? It doesn't show up when I search for it in Comfy, and I also checked ComfyUI Manager, but nada.

RuneXX

Owner Apr 17

Instead of loading those "simpler" user-made LoRAs in the Power Lora Loader, add "LTX2 Lora Loader Advanced" and load those LoRAs in that node ("simpler" because they are usually not trained on audio data).
It's similar to the Power Lora Loader, except that it takes one LoRA per node.

So I was thinking of making a subgraph containing six or so Advanced Lora Loaders, and only exposing the dropdown for each LoRA at the top level of the parent workflow, to make it easier.

But for now, you can try adding the Advanced Lora Loader yourself. Just connect it before the Power Lora Loader: model in (the connection that previously went to the Power Lora Loader) and then model out (to the Power Lora Loader).

And lastly, the critical part: mute all the audio weights (set their strength to zero).
You can chain several of them together (which is essentially what I might add to the workflows, but "hidden" in a subgraph, since I bet there will be more and more of these "simpler" user-made LoRAs out there).

(The "video to audio" weights you can probably leave at 1.)
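Conceptually, muting the audio weights means the LoRA's changes to the audio parts of the model are scaled by zero before being merged, while the video parts (and, per the note above, the video-to-audio parts) are applied normally. The sketch below illustrates that idea with plain Python lists standing in for tensors. The key names (`video_blocks.0.attn`, `audio_blocks.0.attn`, `video_to_audio.0.attn`) and both helper functions are hypothetical, and this is not the actual code of KJ's LTX2 Lora Loader Advanced node:

```python
def component_of(key: str) -> str:
    """Classify a weight key into a model component (hypothetical naming scheme)."""
    if "video_to_audio" in key:  # check first, since this key also contains "audio"
        return "video_to_audio"
    if "audio" in key:
        return "audio"
    return "video"


def apply_lora(model: dict, lora_deltas: dict, strengths: dict) -> dict:
    """Merge LoRA deltas into a copy of the model: W' = W + strength * dW.

    `strengths` maps component name -> strength; missing components default to 1.0.
    """
    merged = {key: list(values) for key, values in model.items()}
    for key, delta in lora_deltas.items():
        strength = strengths.get(component_of(key), 1.0)
        if strength == 0.0 or key not in merged:
            continue  # muted component or unknown key: keep the base weights
        merged[key] = [w + strength * d for w, d in zip(merged[key], delta)]
    return merged


base = {
    "video_blocks.0.attn": [1.0, 1.0],
    "audio_blocks.0.attn": [1.0, 1.0],
    "video_to_audio.0.attn": [1.0, 1.0],
}
# Toy deltas from an "image-only" LoRA that still touch every component.
image_only_lora = {key: [0.5, -0.5] for key in base}

# Chain two loaders in series, each with its audio strength set to zero.
mute_audio = {"video": 1.0, "audio": 0.0, "video_to_audio": 1.0}
model = apply_lora(base, image_only_lora, mute_audio)
model = apply_lora(model, image_only_lora, mute_audio)
print(model["video_blocks.0.attn"])  # [2.0, 0.0]
print(model["audio_blocks.0.attn"])  # [1.0, 1.0]
```

Each call returns a new model dict, so chaining is just repeated application. This mirrors connecting several Advanced Lora Loader nodes in series, each with its audio strength at zero.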

Portland01

Apr 20

Helpful as always. Thank you, Rune. I just tried it out, and it does indeed remove the audio from the LoRAs. I get different voice-overs now.

holycowdude

Apr 23

@RuneXX awesome work on the workflows!

I'm trying to make some portrait 9:16 videos, but when I try anything above 832x1472, such as 1088x1920, I get either a floating section at the top or extremely stretched videos.
Are there any improvements we can make to the workflow so portrait video works better at HD, please? Thanks.

RuneXX

Owner Apr 23

With the T2V Basic workflow?

And using LTX-2.3, right? (LTX-2.0 had a problem with large HD portraits.)

holycowdude

Apr 23

Sorry, yes: LTX-2.3_-_T2V_Simple_single_pass & ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
I've checked all the VAEs etc., and they are all LTX-2.3.

ltx-2.3-spatial-upscaler-x2-1.1.safetensors

RuneXX

Owner Apr 23

And the upscaler? Is it also at v1.1?
But I'll try an HD portrait here and see if I can reproduce it.
