NAMENAME

VLAD545645645

AI & ML interests

None yet

Organizations

None yet

liked a model about 10 hours ago

HaojunChen/PixVerve-L2P

Updated 1 day ago • 3

liked a model 1 day ago

hustvl/Moebius

Updated 3 days ago • 8

liked a model 2 days ago

MiniT2I/MiniT2I

Text-to-Image • Updated 5 days ago • 174 • 10

reacted to owensong's post with 🔥 2 days ago

I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.

The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.

Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script
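
The parameter split in the quick facts can be sanity-checked with a few lines of arithmetic. The sketch below sums the two components and estimates raw weight storage. The bytes-per-parameter values are standard dtype widths, but the post does not say which precision the released checkpoint uses, so the storage figures are an assumption rather than a measured file size.

```python
# Parameter counts quoted in the post, in millions.
ACOUSTIC_M = 3.46
VOCODER_M = 1.17

# Standard dtype widths. Which one the checkpoint actually uses is an assumption.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def total_params_m(*components_m: float) -> float:
    """Sum component sizes given in millions of parameters."""
    return round(sum(components_m), 2)


def weight_size_mb(params_m: float, dtype: str) -> float:
    """Raw weight storage in MB (10**6 bytes), ignoring any file overhead."""
    return round(params_m * BYTES_PER_PARAM[dtype], 2)


total = total_params_m(ACOUSTIC_M, VOCODER_M)
print(f"total: {total}M parameters")
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_size_mb(total, dtype)} MB")
```

Under these assumptions the acoustic model and vocoder add up to the quoted 4.63M total, and the full stack would need under 20 MB of weights even at 32-bit precision.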

Why I made it:
Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech.

It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.

What works:
It can generate arbitrary English speech locally, and the model is small enough to be interesting for:

- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines

Limitations:
The quality is still limited. It can sound robotic, stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable.

So I would frame this as a research/demo release, not a production TTS engine.

I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization

If people find it useful, I’m interested in putting more training budget into a stronger v2.

Model page:
owensong/Inflect-Nano-v1

liked a model 3 days ago

wxli318/PixelWizard

Updated 24 days ago • 2

liked a model 4 days ago

Boogu/Boogu-Image-0.1-Edit

Updated about 16 hours ago • 374 • 79

liked a model 5 days ago

yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

Text Generation • 12B • Updated 3 days ago • 359k • 2.08k

reacted to SeaWolf-AI's post with 😎 6 days ago

Darwin V9 — GPQA Diamond 90.9%, #1 on the leaderboard, with pure greedy decoding
Darwin-398B-JGOS reaches 90.9% (180/198) on GPQA Diamond, the PhD-level scientific reasoning benchmark, ranking #1 on the Hugging Face GPQA Diamond leaderboard. No self-consistency, no test-time compute scaling — this was achieved with a single greedy decode (temperature 0, single sample, max 16,384 tokens). The full eval config is published in the model card, so anyone can reproduce it. Raw reasoning, no score inflation.
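
The headline number follows directly from the counts in the post: 180 correct answers out of 198 questions. The sketch below checks that arithmetic and illustrates what greedy decoding means at a single step, namely picking the highest-scoring token with no sampling. The toy logits and the function names are made up for illustration and are not taken from the model or its published eval config.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def greedy_pick(logits: dict[str, float]) -> str:
    """Greedy decoding step: return the token with the highest logit."""
    return max(logits, key=logits.get)


# Counts quoted in the post.
print(accuracy_pct(180, 198))

# Hypothetical next-token scores over four answer letters.
toy_logits = {"A": 1.2, "B": 3.4, "C": 0.7, "D": 2.9}
print(greedy_pick(toy_logits))
```

Because a greedy step is deterministic for fixed logits, a single run is in principle enough to reproduce a score, which is the point the post makes about not using self-consistency or multiple samples.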
The result comes from Darwin V9, a patented evolutionary model-development platform. Its core idea: it never trains a model from scratch.
Why Darwin V9 beats training from scratch:

- Cost & speed: no trillion-token pretraining run and no months of compute; a purpose-built, high-performance model is produced in a fraction of the time.
- Reuse of proven intelligence: instead of re-learning every capability from a blank slate, it selects and combines only the strengths of already-trained, already-validated models, so results are stable and predictable.
- Surgical transplantation: it identifies, at the FFN (feed-forward network) layer level, which region of which model holds which capability, and grafts in only the segments that contribute to the target skill.

How it works: a large model (Qwen 3.5 397B) serves as the mother model (the substrate); several father models specialized in reasoning, coding, and language are analyzed layer-by-layer across their FFN regions; the segments that contribute to the target performance are extracted and transplanted into the mother model to produce a new child model. The result is a ~400B MoE that activates only ~17B parameters per token at inference — large-model capacity with efficient inference.
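
The general shape of layer-level FFN grafting described above can be illustrated with a toy sketch. This is not Darwin V9's actual method, which the post does not publish: the per-layer contribution scores, the threshold, and the `graft_ffn` helper are all hypothetical, and strings stand in for real weight tensors.

```python
from copy import deepcopy


def graft_ffn(mother, fathers, scores, threshold=0.5):
    """Return a child model: the mother with selected FFN blocks replaced.

    mother:  list of layers, each a dict {"attn": ..., "ffn": ...}
    fathers: dict of name -> list of layers with the same layout as mother
    scores:  dict of name -> per-layer contribution scores (hypothetical)
    """
    child = deepcopy(mother)
    for i in range(len(mother)):
        # Pick the donor whose FFN scores highest at this layer.
        best = max(fathers, key=lambda name: scores[name][i])
        # Graft only if the donor clears the threshold; otherwise keep the mother's FFN.
        if scores[best][i] >= threshold:
            child[i]["ffn"] = deepcopy(fathers[best][i]["ffn"])
    return child


# Placeholder "weights": strings instead of tensors, three layers per model.
mother = [{"attn": f"m_attn{i}", "ffn": f"m_ffn{i}"} for i in range(3)]
fathers = {
    "reasoning": [{"attn": f"r_attn{i}", "ffn": f"r_ffn{i}"} for i in range(3)],
    "coding": [{"attn": f"c_attn{i}", "ffn": f"c_ffn{i}"} for i in range(3)],
}
scores = {"reasoning": [0.9, 0.2, 0.4], "coding": [0.3, 0.1, 0.8]}

child = graft_ffn(mother, fathers, scores)
print([layer["ffn"] for layer in child])
```

In this toy run, layer 0 takes its FFN from the reasoning donor, layer 2 from the coding donor, and layer 1 keeps the mother's FFN because no donor clears the threshold. Attention blocks are never touched, mirroring the post's claim that only FFN segments are transplanted.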
If training from scratch means rebuilding everything from a blank page, Darwin V9 means precisely recombining intelligence that has already been proven. GPQA Diamond #1 is the proof.
Model: FINAL-Bench/Darwin-398B-JGOS