1. Introduction

AI evolution is penetrating our lives and business at an unprecedented speed.

The competition is fierce with OpenAI’s ChatGPT and Google’s Gemini.

The field of “speech synthesis” that generates voice from text has also achieved remarkable development alongside generative AI innovation.

This article summarizes Google’s recently released Gemini 2.5 TTS, OpenAI TTS, and the long-popular VoiceVox’s Zundamon.

Name Release Date Main Features
Zundamon June 2021 (VOICEVOX) ・Free & commercial use (VOICEVOX)
・Brother character of Tohoku Zunko
・Rich character art and MMD materials
OpenAI TTS November 6, 2023 (OpenAI Dev Day) ・Realistic synthetic voice (6 types)
・Integrated text-to-speech conversion
・Voice personality & emotion expression possible
Gemini 2.5 Pro March 25, 2025 (Experimental)
May 6, 2025 (Preview)
・Google’s cutting-edge AI model
・Advanced reasoning & coding capabilities
・Multimodal support (text, voice, image, video)

2. Audio Output Test

Seeing is believing, but hearing is even better. I’ve generated some outputs, so please listen.

Description Female Voice Male Voice
Zundamon -
OpenAI TTS
Gemini 2.5 Pro

Among these, Gemini seems to be a step ahead in terms of natural-sounding voice.

While Zundamon sounds unnatural as human voice, it’s familiar and I think it’s still an excellent synthesis engine in terms of clarity.

UI notes for each. I think all UIs have the same quality. VOICEVOX allows you to specify intonation.

— Zundamon UI Zundamon

— OpenAI TTS UI OpenAI TTS UI

— Gemini Pro 2.5 UI Gemini Pro 2.5 UI

Final Thoughts

The evolution of speech synthesis is amazing…!

I think applications of speech synthesis like simultaneous interpretation systems, AI assistant dialogue, etc., will continue to evolve.

Since more real-time and better quality will be required, technologies that efficiently synthesize with limited computational resources should become more important in the future.

I look forward to future developments.

References

Generate Media | Google AI Studio

OpenAI Text To Speech | Advanced Voice Engine Technology

VOICEVOX | Free Text-to-Speech & Singing Voice Synthesis Software

Introduction to VOICEVOX Speech Synthesis Engine | Hiho’s Blog