<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://tts.wiki/index.php?action=history&amp;feed=atom&amp;title=Chatterbox</id>
	<title>Chatterbox - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://tts.wiki/index.php?action=history&amp;feed=atom&amp;title=Chatterbox"/>
	<link rel="alternate" type="text/html" href="https://tts.wiki/index.php?title=Chatterbox&amp;action=history"/>
	<updated>2026-04-03T18:51:50Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.5</generator>
	<entry>
		<id>https://tts.wiki/index.php?title=Chatterbox&amp;diff=42&amp;oldid=prev</id>
		<title>Ttswikiadmin: Add Chatterbox</title>
		<link rel="alternate" type="text/html" href="https://tts.wiki/index.php?title=Chatterbox&amp;diff=42&amp;oldid=prev"/>
		<updated>2025-09-20T16:27:59Z</updated>

		<summary type="html">&lt;p&gt;Add Chatterbox&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Infobox TTS model&lt;br /&gt;
| name = Chatterbox&lt;br /&gt;
| developer = [[Resemble AI]]&lt;br /&gt;
| release_date = May 2025&lt;br /&gt;
| latest_version = Multilingual 2.0&lt;br /&gt;
| architecture = [[CosyVoice 2.0]]-based&lt;br /&gt;
| parameters = 500 million&lt;br /&gt;
| training_data = 500,000 hours cleaned data&lt;br /&gt;
| languages = 23 languages (multilingual version)&lt;br /&gt;
| voices = Zero-shot voice cloning&lt;br /&gt;
| voice_cloning = Yes (5-second reference)&lt;br /&gt;
| emotion_control = Yes (exaggeration parameter)&lt;br /&gt;
| streaming = Yes&lt;br /&gt;
| latency = Sub-200ms&lt;br /&gt;
| license = [[MIT License]]&lt;br /&gt;
| open_source = Yes&lt;br /&gt;
| code_repository = [https://github.com/resemble-ai/chatterbox GitHub]&lt;br /&gt;
| model_weights = [https://huggingface.co/ResembleAI/chatterbox Hugging Face]&lt;br /&gt;
| demo = [https://huggingface.co/spaces/ResembleAI/Chatterbox HF Spaces]&lt;br /&gt;
| website = [https://www.resemble.ai/chatterbox/ resemble.ai/chatterbox]&lt;br /&gt;
}}&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Chatterbox&amp;#039;&amp;#039;&amp;#039; is an open-source [[text-to-speech]] (TTS) model developed by [[Resemble AI]] and released in May 2025. It has 500 million parameters and is built on a modified [[Llama]] architecture in the style of [[CosyVoice|CosyVoice 2.0]]. It is marketed as the first open-source TTS model to include controllable emotion exaggeration, and it has drawn attention for the claim that it outperforms established commercial systems in user preference evaluations.&lt;br /&gt;
&lt;br /&gt;
== Development and Release ==&lt;br /&gt;
&lt;br /&gt;
Chatterbox was developed by a three-person team at Resemble AI, a voice technology company founded by Zohaib Ahmed and Saqib Muhammad.&amp;lt;ref&amp;gt;https://www.digitalocean.com/community/tutorials/resemble-chatterbox-tts-text-to-speech&amp;lt;/ref&amp;gt; The initial English-only version was released in May 2025 under the [[MIT License]], followed by a multilingual version supporting 23 languages in September 2025.&amp;lt;ref&amp;gt;https://www.resemble.ai/introducing-chatterbox-multilingual-open-source-tts-for-23-languages/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The project quickly gained popularity in the open-source community, accumulating over 1 million downloads on [[Hugging Face]] and more than 11,000 stars on [[GitHub]] within weeks of release.&amp;lt;ref name=&amp;quot;multilingual&amp;quot;&amp;gt;https://www.resemble.ai/introducing-chatterbox-multilingual-open-source-tts-for-23-languages/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Technical Architecture ==&lt;br /&gt;
&lt;br /&gt;
Chatterbox is a 500-million-parameter model based on a CosyVoice-style modified Llama architecture, making it smaller than many contemporary TTS systems. It was trained on approximately 500,000 hours of cleaned audio data and uses what the developers term &amp;quot;alignment-informed inference&amp;quot; to improve stability during generation.&lt;br /&gt;
&lt;br /&gt;
Key technical features include:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Zero-shot voice cloning&amp;#039;&amp;#039;&amp;#039;: Ability to clone voices using as little as 5 seconds of reference audio&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Emotion exaggeration control&amp;#039;&amp;#039;&amp;#039;: A novel parameter allowing users to adjust emotional intensity from monotone to dramatically expressive&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Fast inference&amp;#039;&amp;#039;&amp;#039;: Sub-200ms latency for real-time applications&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Multilingual support&amp;#039;&amp;#039;&amp;#039;: The updated version supports 23 languages including Arabic, Chinese, Hindi, and major European languages&lt;br /&gt;
&lt;br /&gt;
== Performance Claims and Evaluation ==&lt;br /&gt;
&lt;br /&gt;
Resemble AI commissioned a comparative evaluation from [[Podonos]], a third-party evaluation service, testing Chatterbox against [[ElevenLabs]], a leading commercial TTS system. In blind A/B testing, 63.75% of evaluators reportedly preferred Chatterbox&amp;#039;s output over that of ElevenLabs.&amp;lt;ref&amp;gt;https://www.podonos.com/blog/chatterbox&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;https://www.resemble.ai/chatterbox/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, these results should be interpreted with caution, as the evaluation was limited in scope and conducted by a single third-party service. The testing methodology, sample size, and demographic composition of evaluators have not been independently verified. Additionally, the comparison was limited to a single competitor rather than a comprehensive benchmark against multiple state-of-the-art systems.&lt;br /&gt;
&lt;br /&gt;
== Commercial and Research Impact ==&lt;br /&gt;
&lt;br /&gt;
The release of Chatterbox has been significant for the open-source TTS community, representing one of the first production-grade systems to be freely available under a permissive license. This has enabled developers to integrate high-quality TTS capabilities into applications without licensing costs or vendor dependencies.&lt;br /&gt;
&lt;br /&gt;
The system has found applications in various domains including:&lt;br /&gt;
&lt;br /&gt;
* Audiobook generation and voice narration&lt;br /&gt;
* Game development for non-player character dialogue&lt;br /&gt;
* Educational content creation&lt;br /&gt;
* Accessibility tools for visually impaired users&lt;br /&gt;
* Research and development in speech synthesis&lt;br /&gt;
&lt;br /&gt;
Resemble AI also offers a commercial &amp;quot;Pro&amp;quot; version with enhanced features, service-level agreements, and custom fine-tuning for enterprise customers that require guaranteed performance and support. This version is available through Resemble AI&amp;#039;s inference partners, such as FAL.&lt;br /&gt;
&lt;br /&gt;
== External Links ==&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/resemble-ai/chatterbox Official Chatterbox repository]&lt;br /&gt;
* [https://huggingface.co/ResembleAI/chatterbox Model on Hugging Face]&lt;br /&gt;
* [https://huggingface.co/spaces/ResembleAI/Chatterbox Interactive demo]&lt;br /&gt;
* [https://resemble-ai.github.io/chatterbox_demopage/ Demo page with audio samples]&lt;br /&gt;
&lt;br /&gt;
[[Category:Speech synthesis]]&lt;br /&gt;
[[Category:Open-source software]]&lt;br /&gt;
[[Category:Artificial intelligence]]&lt;br /&gt;
[[Category:Voice technology]]&lt;br /&gt;
[[Category:MIT License software]]&lt;/div&gt;</summary>
		<author><name>Ttswikiadmin</name></author>
	</entry>
</feed>