AI voices usually aim to be realistic in a friendly way, mimicking relaxed, happy, helpful people. But a new open-source model named Dia is leaning into the more emotional end of the vocal spectrum, including some really intense screaming.
Dia’s creators at Nari Labs are a tiny group, but they have given AI voices the option to sound like a somewhat melodramatic performer, capable of realistic laughing, coughing, throat-clearing, sniffing, and yes, yelling.
You might not think that yelling is a big deal for AI at this point, but screaming is hard to fake. It can’t just be talking loudly; it’s an entirely different speech mode.
Emotionally expressive speech is a gap in most AI voices. It’s easy for a voice model to read a bedtime story. However, it’s much harder for it to sound like it’s trying to calm a friend down, or like it just saw something shocking. Most commercial models avoid sounding robotic by smoothing the tone of the voice, which leaves little room for the uneven pitch, pacing, and volume of emotional speech.
Dia treats nonverbal communication as part of the performance. It knows that “(coughs)” isn’t something to be ignored or read literally. It knows that a scream isn’t just a louder line. And it performs these things with a level of timing, pitch modulation, and breath control that makes them feel more real.
One enterprising user even used it to recreate part of the famous Leeroy Jenkins video from World of Warcraft.
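To make the idea of treating a cue like “(coughs)” as a stage direction rather than text a little more concrete, here is a minimal Python sketch. It is not Dia’s code or API. It only illustrates the general idea of separating spoken lines from nonverbal cues before anything is voiced. The function name `split_script`, the `Segment` type, and the list of cues are all hypothetical; only “(coughs)” comes from the description above, and the rest are guesses based on the sounds mentioned earlier.

```python
import re
from dataclasses import dataclass

# Hypothetical cue vocabulary, loosely based on the sounds the article lists.
# This is an assumption for illustration, not Dia's actual tag set.
KNOWN_CUES = {"laughs", "coughs", "clears throat", "sniffs", "screams"}

# Matches a parenthetical with no nested parentheses, e.g. "(coughs)".
CUE_PATTERN = re.compile(r"\(([^()]+)\)")


@dataclass
class Segment:
    kind: str  # "speech" for words to be spoken, "cue" for a nonverbal sound
    text: str


def split_script(script: str) -> list[Segment]:
    """Split a script into spoken text and recognized nonverbal cues.

    Parentheticals that are not in KNOWN_CUES are left inside the
    surrounding speech instead of being treated as cues.
    """
    segments: list[Segment] = []
    cursor = 0
    for match in CUE_PATTERN.finditer(script):
        cue = match.group(1).strip().lower()
        if cue not in KNOWN_CUES:
            continue  # unknown parenthetical: keep it as part of the speech
        spoken = script[cursor:match.start()].strip()
        if spoken:
            segments.append(Segment("speech", spoken))
        segments.append(Segment("cue", cue))
        cursor = match.end()
    tail = script[cursor:].strip()
    if tail:
        segments.append(Segment("speech", tail))
    return segments


if __name__ == "__main__":
    for segment in split_script("Sorry, one second. (coughs) Okay, where was I?"):
        print(segment.kind, "->", segment.text)
```

A real model would not work from a lookup table like this. The point of the sketch is only that the cue ends up as its own unit to be performed, instead of being skipped or read aloud as the word “coughs.”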
That’s not to say that OpenAI, ElevenLabs, Google, Sesame, and others haven’t produced amazing AI voice models. You can customize OpenAI’s Advanced Voice Mode to speak with different emotions, and ElevenLabs is good at interpreting capitalization and punctuation to adjust speech, but that’s not the same as yelping in surprise or wheezing with laughter.
Sesame is particularly good at sounding and reacting like a real person, but even its models err towards cheerful and generally positive demeanors.
Of course, realism is subjective, and you might work out pretty quickly that Dia is an AI voice. Then again, fake screams and laughs are also pretty human sounds to make in the right context.
“Two undergrads. One still in the military. Zero funding. One ridiculous goal: build a TTS model that rivals NotebookLM Podcast, ElevenLabs Studio, and Sesame CSM. Somehow… we pulled it off. Here’s how 👇 pic.twitter.com/8cfJSegciX” (April 21, 2025)
Scream for AI
What makes this a bigger story than just “AI voice learns a party trick” is what it signals for the broader race in AI for emotional intelligence.
We’re rapidly entering an era where it won’t be enough for your assistant to say the right thing; it’ll need to say it in the right way. Think customer support bots that sound genuinely sorry, teachers that sound encouraging instead of instructional, and in-game characters that convey sincerity.
Of course, giving AI the power to emote convincingly makes it more persuasive and thus potentially more manipulative. If emotional speech can be just another AI tool, then more than a few people may feel like screaming themselves.
Still, I can imagine having some fun writing a ghost story for Dia to not just read, but perform, screams and all.