

Mayowa Daniel

mayowadan


AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a Space 21 days ago

Chatterbox-Multilingual-TTS

Chatterbox TTS supporting 23 languages

New activity in ibm-granite/granite-docling-258M 3 months ago

Hallucinations

#39 opened 3 months ago

Why is granite-docling-258M so slow?

#37 opened 3 months ago

liked a model 4 months ago

calcuis/chatterbox-gguf

Text-to-Speech • 0.3B • Updated Sep 18 • 3.43k • 48

replied to Xenova's post 4 months ago

Amazing work!
Where can I find the unminified code for the demo?

reacted to hexgrad's post with ➕ 6 months ago


IMHO, being able & willing to defeat CAPTCHA, hCaptcha, or any other reasoning puzzle is a must-have for any Web-Browsing / Computer-Using Agent (WB/CUA).

I realize it subverts the purpose of CAPTCHA, but I do not think you can claim to be building AGI/agents without smoothly passing humanity checks. It would be like getting in a self-driving car that requires human intervention over speed bumps. Claiming AGI or even "somewhat powerful AI" seems hollow if you are halted by a mere CAPTCHA.

I imagine OpenAI's Operator is *able* but *not willing* to defeat CAPTCHA. Like their non-profit status, I expect that policy to evolve over time—and if not, rival agent-builders will attack that opening to offer a better product.


reacted to hexgrad's post with 👍 6 months ago


I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, has 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.

reacted to Jaward's post with 🚀 6 months ago


Awesome intro-to-LLMs course, "Language Modeling from Scratch" by Stanford. Love the aesthetics behind the lecture notes; the notes-in-code idea is genius 👍
Course site: https://stanford-cs336.github.io/spring2025/
Repo: https://github.com/stanford-cs336/spring2025-lectures
Videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_


upvoted 2 articles 6 months ago

Article

Upgrading Kokoro: natural TTS for short bursts

Nov 22, 2024 • 31

Article

G2P Shrinks Speech Models

Feb 5 • 82

commented on G2P Shrinks Speech Models 6 months ago

Love this explainer.

Way more approachable than an academic paper.

I noticed an issue with "live" in particular, so I look forward to the neural augmentation for homographs you mentioned.

Let me know if you'd like any help with it whatsoever.

commented on Upgrading Kokoro: natural TTS for short bursts 6 months ago

This makes a huge difference.

I noticed Kokoro handles single Roman numerals better than anything OpenAI has to offer.

Thanks so much!

liked a Space 6 months ago

Kokoro TTS

Upgraded to v1.0!

liked a model 6 months ago

mistralai/Mistral-Small-3.1-24B-Instruct-2503

24B • Updated 9 days ago • 84.2k • 1.34k