NAMENAME

VLAD545645645

AI & ML interests

None yet

Recent Activity

reacted to owensong's post with 🔥 about 9 hours ago

I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.

The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.

Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script

Why I made it:
Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech. It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.

What works:
It can generate arbitrary English speech locally, and the model is small enough to be interesting for:
- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines

Limitations:
The quality is still limited. It can sound robotic and stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable. So I would frame this as a research/demo release, not a production TTS engine.

I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization

If people find it useful, I’m interested in putting more training budget into a stronger v2.

Model page: https://huggingface.co/owensong/Inflect-Nano-v1

liked a model 1 day ago

wxli318/PixelWizard

liked a model 2 days ago

Boogu/Boogu-Image-0.1-Edit

Organizations

None yet

VLAD545645645's models

None public yet