Workflow : FLF2V - First-Last Frame & First-Middle-Last Frame

#17
by RuneXX - opened

FLF2V & FMLF2V - Frame Injection (In-Place node)
https://huggingface.co/RuneXX/LTX-2.3-Workflows

I'll make a guider version too; that approach is often a bit smoother, giving the model a bit more freedom (using the guider nodes)

Was that circular transition effect made using prompt? 🤔

The model will do its own thing ;-) From the middle to the end frame there wasn't much time to do much else, I suppose ;-)

did you try LTXVAddGuideMulti ? two pics work, but 3 pic no.

Not yet, I'll give it a go. I think the guider works a bit better than frame injection. Hopefully it works.

LTXVImgToVideoInplaceKJ works, but I find LTX 2.3 still doesn't follow instructions that well, not as good as Wan 2.2.

Awesome! Btw you can try my TEs too: https://huggingface.co/Sikaworld1990/gemma-3-12b-it-abliterated-sikaworld-high-fidelity-edition-Ltx-2 and https://huggingface.co/Sikaworld1990/gemma3-12B-hereticx-sikaworld-ltx-2

Hi, how are your text encoders different from the regular TE? Are they purely for NSFW prompts? Won't the regular Gemma TE work with NSFW at all?

What does everyone consider best practice: full-size image injection, or resized down to match the 50% latent space (when using the 2x upscaler second pass)?

It's all explained in the description; it's so-called mixed precision, and it's not only for NSFW.

I tried the 1.5x upscaler with 0.667x scale for the first pass. It's a bit slower to render (since the first pass has a larger starting input than the "traditional" half size), but the results are even higher quality (with the caveat that I only did 2-3 tests...).
Might be worth a shot, and logically it sort of makes sense, I guess ;-)
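The trade-off above can be sketched with some quick arithmetic (a sketch under assumptions: a final target of 1920x1088, and dimensions rounded down to a multiple of 32, which is a common latent-size constraint; the exact rounding rule is an assumption):

```python
# Hedged sketch of the pass-size arithmetic. Assumes the final target is
# 1920x1088 and that first-pass dimensions snap down to a multiple of 32.

def snap32(x: float) -> int:
    """Round down to the nearest multiple of 32."""
    return int(x) // 32 * 32

def first_pass(target_w: int, target_h: int, scale: float) -> tuple[int, int]:
    """Dimensions of the low-resolution first pass, before the upscaler."""
    return snap32(target_w * scale), snap32(target_h * scale)

# "Traditional" half-size first pass, followed by the 2x upscaler:
print(first_pass(1920, 1088, 0.5))    # (960, 544) -> 2x -> 1920x1088

# Larger 0.667x first pass, followed by the 1.5x upscaler (slower, but the
# second pass starts from more detail):
print(first_pass(1920, 1088, 0.667))  # (1280, 704) -> 1.5x -> 1920x1056
```

So the 0.667x first pass works on roughly 1.8x as many pixels as the half-size one, which matches the slower render time reported above.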

I'm having an error with this workflow:

got prompt
invalid prompt: {'type': 'missing_node_type', 'message': "Node 'LTXV Spatio Temporal Tiled VAE Decode' has no class_type. The workflow may be corrupted or a custom node is missing.", 'details': "Node ID '#209'", 'extra_info': {'node_id': '209', 'class_type': None, 'node_title': 'LTXV Spatio Temporal Tiled VAE Decode'}}

I didn't change a thing, just the paths for the models.

Yes, in retrospect that was a bad idea. I left the LTXV Spatio Temporal Tiled VAE Decode at the very end of the workflow, in case someone wanted to use it instead of the default tiled VAE decode.
Just delete the node; it's at the far right of the workflow and it's not in use.
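For anyone who wants to check a workflow file before queueing it, the error comes from a validation step along these lines (a minimal sketch, assuming ComfyUI's API-format prompt JSON, i.e. a dict mapping node IDs to objects with a `class_type` field; the sample prompt below is hypothetical):

```python
# Minimal sketch of the check behind the "missing_node_type" error,
# assuming ComfyUI's API-format prompt JSON: {node_id: {"class_type": ...}}.
# The sample prompt below is hypothetical.

def find_broken_nodes(prompt: dict) -> list[str]:
    """Return IDs of nodes with no class_type (corrupt entry or missing custom node)."""
    return [node_id for node_id, node in prompt.items()
            if not node.get("class_type")]

prompt = {
    "208": {"class_type": "VAEDecodeTiled", "inputs": {}},
    "209": {"class_type": None, "inputs": {},
            "_meta": {"title": "LTXV Spatio Temporal Tiled VAE Decode"}},
}
print(find_broken_nodes(prompt))  # ['209']
```

Any node ID this returns is either corrupted or comes from a custom node pack that isn't installed, which is exactly the two causes the error message names.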

Alternatively, install https://github.com/Lightricks/ComfyUI-LTXVideo/ which has this node ;-)

(I will remove it and update the workflow; users who want to use that node instead can add it themselves, since they probably already know it in that case.)

Btw, LTX-2.3 has an official Inpainting IC-LoRA, right? 🤔 Can the injected frames be inpainted too?

It was mentioned in the release,
but it looks like that LoRA is not out yet.

Although the motion-tracking LoRA does say "inpainting", so perhaps it's used with some spline editor for motion.
Haven't checked it out myself yet.

Added a guider version as well...
(I forgot to prompt her to open the beer lol... but you get the idea ;-))

https://huggingface.co/RuneXX/LTX-2.3-Workflows

Thanks for the effort @RuneXX, and for a WF. Thanks Kijai, and thanks to the LTX team.
I tried FIRST LAST frame on an RTX 3090 24 GB (RuneXX injection FLF WF) with 128 GB RAM, and it worked with the default parameters. It also works nicely with the official DEV model and WF (121 frames, 1088x1920). But with FLF, when changing the resolution (and rotating to 9:16 portrait) I get this strange output. What are the best parameters to set for "Resize Image by Longer", img_compression, CFG (for best results), steps (20-40 I guess)... what else?
Any advice (or a WF for 24 GB VRAM) I will try and test. Thanks in advance, and sorry if my English is bad.

Strange. I will try portrait mode also, to see if anything's wrong ;-)

The new FLM2V WF is working. Try it with the "DEV transformer only fp8" model (enable the 384 LoRA), 8 steps, 1088x1920, and it works. 15 min for 10 sec.
EDITED: I just noticed one thing; maybe that was the problem in FLF2V. The output resolution is 1056x1920 (even though in the settings I set 1088 width x 1920 height).
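One hedged guess at the 1056 vs. 1088 discrepancy (an assumption, not confirmed from the workflow): a resize-by-longer-side step that preserves the source aspect ratio and rounds each dimension down to a multiple of 32 would turn a typical 1080x1920 portrait source into 1056x1920, regardless of a 1088 setting elsewhere:

```python
# Hypothetical resize-by-longer-side behaviour: preserve the source aspect
# ratio, then round each dimension down to a multiple of 32.

def resize_longer(src_w: int, src_h: int, longer: int = 1920, multiple: int = 32):
    scale = longer / max(src_w, src_h)
    return (int(src_w * scale) // multiple * multiple,
            int(src_h * scale) // multiple * multiple)

print(resize_longer(1080, 1920))  # (1056, 1920)
```

If that's what is happening, the quick check would be whether the output resolution changes with the source image's aspect ratio rather than with the width setting.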

Yes, I just tried FLM2V with portrait and it worked fine.
(I did update it as well, but only a very small update: euler_cfg as the sampler, and new sigma values to match those from the LTX team. Plus I set the middle frame strength to 0.5, but that one you can adjust as you want.)

And thinking about it, try downloading the workflow again. There was an error the first time I uploaded it, with the width set for both the height and the width of the 2nd image (but that was fixed long ago ;-)).
(I'll update it now with euler_cfg as sampler and the new sigma values though, so wait a few minutes.)

Just a quick low-resolution test to see if all was OK:

With LTXVAddGuideMulti, the last 1 second of the result is very bad. LTXVImgToVideoInplaceKJ doesn't have that problem, but kijai recommends LTXVAddGuideMulti.

Depends on the output you want. LTXVImgToVideoInplaceKJ puts the image into a frame and is very rigid (but for first/last frame that's OK).
LTXVAddGuide is a guider; it gives the model more freedom (usually better for the middle frame).

You can even combine them and use both.
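A conceptual way to picture the difference (a toy sketch, not the actual node code; the blend rule and variable names are assumptions for illustration):

```python
import numpy as np

# Toy illustration (not the real node code): "in-place" injection overwrites
# a latent frame outright, while a guide blends toward the target with an
# adjustable strength, leaving the model some freedom.

rng = np.random.default_rng(0)
latent_frame = rng.normal(size=4)   # model's current latent for one frame
target_frame = np.ones(4)           # encoded reference image

# In-place injection: replace the frame entirely (rigid).
injected = target_frame.copy()

# Guide: blend toward the target; strength in [0, 1] (0.5 = halfway).
strength = 0.5
guided = (1 - strength) * latent_frame + strength * target_frame
```

With strength 1.0 a guide behaves like injection; lower values are why it tends to suit the middle frame, where the model needs room to transition in and out.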

Does anyone know if it's possible to use a custom audio while setting first and last frame? First/last frame works fine, first frame + audio works fine, but is it possible to have both at the same time?

You can combine them, yes; entirely possible ;-)
You just need to connect the custom audio input in the First-Last frame workflow.

(See how it's connected in the custom audio workflow: basically to the LTXVConcatAVLatent instead of the empty audio. Then copy that over to your FLF workflow. If you have both workflows open and select that area, ComfyUI lets you copy from one workflow to the other.)

If you could confirm that, you'd have a friend for life. I've spent 2 full days trying to get it working, and while it overlays the audio no problem, I cannot get it to actually use the audio (for lip sync and such, which works fine with just first frame+audio). I feel like I've tried every possible combination.

Will give it a try ;-) gotta head out for a bit, but will do asap ;-)

I did the FLF by accident, but it worked. So I'll do the FMLF also ;-)

The source audio file is from Qwen TTS:

Granted, the model struggled a little; it seems it can be a bit of a challenge not to end up with a voice-over narrator.
So transcribing the dialogue seemed to help: "Then the man says '... write out what the audio says ...'"

Uploaded the First-Middle-Last frame version also, with custom audio.
The video above was just a first run; with some better prompting like "slowly putting down the soda can", or a different seed, the transition would be different.
With the middle frame being so drastically different, and just 10 seconds, the model seemed to favour a transition effect.

But you get the idea ;-) It seems to work at least ;-) (although it's a bit challenging; transcribing the audio seems to help).
It might take a few trials and errors with prompt and seed before you get the exact thing you want ;-)

You can also try negative prompting such as "no talking" etc. to see if that bumps the model in the right direction ;-)

Thank you! The funny thing is, I had it set up correctly the whole time; it's just that all of the various combinations of images/audio I was using to test refused to work (50+ gens, 0% success). I tried your workflow with my images and still no luck, but I tried with a different set and it works! Still a bit hit or miss, but it's working. It was driving me insane; I knew it was possible. Thank you!

Yes, it was a bit more challenging than I would have thought to get the lip-sync instead of a voice-over narrator.
But with some trials it gets there ;-)

I have a First-Last Frame with Custom Audio; does anyone have one that works with the FP8 DEV model, not distilled or scaled?

How do I fix the invalid tokenizer error?

Not entirely sure, but perhaps a wrong text encoder?
Try these https://huggingface.co/Comfy-Org/ltx-2/tree/main/split_files/text_encoders for Gemma,
and combine with this https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders ... in a DualCLIPLoader.

And check that the CLIP type is set to LTX.

(And, as always, keep KJNodes, ComfyUI, and the ComfyUI-GGUF nodes up to date.)
