1. Introduction
AI is working its way into our daily lives and businesses at an unprecedented pace.
Competition is fierce, led by OpenAI’s ChatGPT and Google’s Gemini.
Speech synthesis, which generates voice from text, has also advanced remarkably alongside generative AI.
This article compares Google’s recently released Gemini 2.5 TTS, OpenAI TTS, and the long-popular Zundamon voice from VOICEVOX.
Name | Release Date | Main Features |
---|---|---|
Zundamon | June 2021 (VOICEVOX) | ・Free to use, including commercial use (VOICEVOX) ・Character related to Tohoku Zunko ・Rich character art and MMD materials |
OpenAI TTS | November 6, 2023 (OpenAI Dev Day) | ・Realistic synthetic voice (6 types) ・Integrated text-to-speech conversion ・Voice personality & emotion expression possible |
Gemini 2.5 Pro | March 25, 2025 (Experimental), May 6, 2025 (Preview) | ・Google’s cutting-edge AI model ・Advanced reasoning & coding capabilities ・Multimodal support (text, voice, image, video) |
2. Audio Output Test
Seeing is believing, but hearing is even better. I generated sample outputs with each engine, so please have a listen.
Description | Female Voice | Male Voice |
---|---|---|
Zundamon | - | |
OpenAI TTS | | |
Gemini 2.5 Pro | | |
Among these, Gemini seems to be a step ahead in how natural the voice sounds.
Zundamon does not sound like a natural human voice, but it is familiar, and in terms of clarity I think it is still an excellent synthesis engine.
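For readers who would rather script a comparison than click through a web UI, here is a minimal sketch of requesting an OpenAI TTS sample over HTTP. It is not necessarily how the samples in this article were produced. It assumes the `https://api.openai.com/v1/audio/speech` endpoint, the `tts-1` model, the six original voice names, and an API key in the `OPENAI_API_KEY` environment variable. The helper names `build_speech_request` and `fetch_openai_sample` are made up for this illustration.

```python
import json
import os
import urllib.request

# Assumption: the REST endpoint for OpenAI text-to-speech.
OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"
# The six voices announced with OpenAI TTS.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_speech_request(text, voice="alloy", model="tts-1", api_key=None):
    """Build (but do not send) an HTTP request for one speech sample."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_openai_sample(text, voice="alloy"):
    """Send the request and return the raw audio bytes (needs network and a valid key)."""
    with urllib.request.urlopen(build_speech_request(text, voice)) as resp:
        return resp.read()

# Example (not executed here):
#   audio = fetch_openai_sample("Hello from OpenAI TTS", voice="nova")
#   open("openai_nova.mp3", "wb").write(audio)
```

Keeping request construction separate from the network call makes it easy to loop over `VOICES` and generate one file per voice for side-by-side listening.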
Some notes on each UI: I think all three are of comparable quality. VOICEVOX additionally lets you adjust the intonation.
- Zundamon UI
- OpenAI TTS UI
- Gemini 2.5 Pro UI
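The intonation control mentioned above is also reachable outside the GUI. The following is a minimal sketch, assuming a VOICEVOX engine running locally on its default port 50021, the two-step `audio_query` then `synthesis` HTTP flow, and the `intonationScale` field of the query JSON. The style ID 3 for Zundamon is an assumption and should be checked against the `/speakers` endpoint of your installation. The helper names are hypothetical. Note that this scales intonation for the whole utterance, whereas the editor UI allows finer adjustment.

```python
import json
import urllib.parse
import urllib.request

ENGINE_URL = "http://127.0.0.1:50021"  # assumption: default local engine address
ZUNDAMON_STYLE_ID = 3                  # assumption: verify with GET /speakers


def build_query_url(text, speaker, base=ENGINE_URL):
    """URL for step 1: ask the engine for a synthesis query for this text."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return f"{base}/audio_query?{params}"


def with_intonation(query, scale):
    """Return a copy of the query with the overall intonation scaled."""
    adjusted = dict(query)
    adjusted["intonationScale"] = scale
    return adjusted


def synthesize_voicevox(text, speaker=ZUNDAMON_STYLE_ID, intonation=1.0, base=ENGINE_URL):
    """Run both steps and return WAV bytes (needs a running engine)."""
    req = urllib.request.Request(build_query_url(text, speaker, base), method="POST")
    with urllib.request.urlopen(req) as resp:
        query = json.load(resp)
    body = json.dumps(with_intonation(query, intonation)).encode("utf-8")
    req = urllib.request.Request(
        f"{base}/synthesis?speaker={speaker}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (not executed here):
#   wav = synthesize_voicevox("Hello, I am Zundamon", intonation=1.5)
#   open("zundamon.wav", "wb").write(wav)
```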
Final Thoughts
The evolution of speech synthesis is amazing…!
I think applications of speech synthesis, such as simultaneous interpretation systems and spoken dialogue with AI assistants, will continue to evolve.
These applications demand both lower latency and higher quality, so techniques that synthesize speech efficiently with limited computational resources should become more important.
I look forward to future developments.
References
Generate Media | Google AI Studio
OpenAI Text To Speech | Advanced Voice Engine Technology
VOICEVOX | Free Text-to-Speech & Singing Voice Synthesis Software
Introduction to VOICEVOX Speech Synthesis Engine | Hiho’s Blog