Nari Labs has introduced Dia-1.6B, an open-source text-to-speech AI model that it claims outperforms established competitors such as ElevenLabs and Sesame. Despite its compact size of 1.6 billion parameters, the model generates realistic speech with emotional inflections, from laughter to screams of terror, addressing a long-standing challenge in emotional speech synthesis.

Many existing AI voices fall into the "uncanny valley": they sound human yet fail to express genuine emotion. Dia-1.6B narrows that gap by interpreting nonverbal cues in its input, allowing it to deliver more nuanced emotional expression. The model runs on a single GPU and is freely available on Hugging Face and GitHub, positioning it as a competitive alternative in the emotional AI landscape.

Effective emotional AI speech remains hard to achieve; researchers attribute the difficulty to training datasets that lack the detail needed to capture the complexity of human emotion. Dia's ability to handle nonverbal communication gives it an advantage here, making real-time emotional speech synthesis potentially more realistic and engaging for users.
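The nonverbal cues mentioned above are typically supplied inline in the input text. As a minimal sketch, assuming the `[S1]`/`[S2]` speaker tags and parenthetical cues (such as `(laughs)`) shown in the project's repository examples, a dialogue script might be composed like this; the `make_script` helper is hypothetical, not part of the Dia API:

```python
# Hypothetical helper for composing a Dia-style dialogue script.
# Assumes the [S1]/[S2] speaker tags and parenthetical nonverbal
# cues (e.g. "(laughs)") seen in the Dia repository's examples.

def make_script(turns):
    """Join (speaker, text) pairs into a single tagged script string."""
    return " ".join(f"[{speaker}] {text}" for speaker, text in turns)

script = make_script([
    ("S1", "Have you tried the new model?"),
    ("S2", "I have. (laughs) It even screams convincingly."),
])
print(script)
```

The resulting string would then be passed to the model's generation call, which renders each tagged cue as the corresponding nonverbal sound rather than reading it aloud.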
