How to make sure transcription_delay_ms is changed when serving with vLLM?
I created a Docker container for serving Voxtral-Mini-4B-Realtime through vLLM and edited tekken.json directly inside the container. I then restarted the container. When I inspected it again after the restart, my change to tekken.json was still there, but the delay between transcriptions still appeared to be 480ms instead of 2400ms.
How can I ensure that transcription_delay_ms is actually being changed? In my use case I do not need the streaming to be as fast as 480ms; a delay of 2400ms is fine.
Once you've edited tekken.json correctly (for example, in your HF cache where the file was downloaded), the corresponding delay should be applied automatically.
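To double-check which copies of tekken.json exist and what value each one holds, something like the following sketch may help. It only reads the files on disk; it does not confirm which file vLLM actually loads or whether the server honors the value. The cache path `~/.cache/huggingface/hub` is an assumption (it differs if HF_HOME is set or if the model is mounted elsewhere in the container), and the helper names `find_key` and `report_delay` are made up for this example. Because I am not assuming where the key sits inside the JSON, the search is recursive.

```python
import json
from pathlib import Path


def find_key(node, key, path=""):
    """Yield (json_path, value) for every occurrence of `key` in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else k
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key(v, key, f"{path}[{i}]")


def report_delay(root, key="transcription_delay_ms"):
    """Return {file: [(json_path, value), ...]} for each tekken.json under root."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return {}
    results = {}
    for file in sorted(root.rglob("tekken.json")):
        with open(file, encoding="utf-8") as fh:
            results[str(file)] = list(find_key(json.load(fh), key))
    return results


if __name__ == "__main__":
    # Assumed default HF cache location; adjust for your container layout.
    for file, hits in report_delay("~/.cache/huggingface/hub").items():
        print(file, hits or "key not found")
```

If this prints more than one tekken.json (for example, one per cached snapshot), it is worth checking that the edited copy is the one belonging to the model path passed to vLLM.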
@patrickvonplaten I am also trying to modify transcription_delay_ms. I edited tekken.json in the model cache where it was downloaded, but no matter what value I put in, it always seems to wait 480ms before transcribing (I tried values from 80 all the way up to 2400). I am sure I am editing the correct file: if I change 'streaming_n_left_pad_tokens' to 0, I can see it takes effect, as the TTFT latency drops by roughly 80ms.