Search results

  • ...ma-3.2-3B]] architecture, it takes a novel approach, using large language models with audio tokens instead of traditional TTS-specific architectures. ...onventional text-to-speech systems by using a modified version of Meta's Llama-3.2-3B language model as its foundation. It takes in a text prompt and generates audio toke ...
    4 KB (486 words) - 16:06, 20 September 2025
  • ...oice-community/VibeVoice</ref><ref>https://huggingface.co/aoi-ot/VibeVoice-Large</ref> VibeVoice uses a hybrid architecture combining large language models with diffusion-based audio generation. The system uses two specialized toke ...
    7 KB (847 words) - 02:53, 23 September 2025
  • ..."Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model," published at AAAI 2025. ...ce Quantization (FSQ) for stability and compatibility with causal language models. ...
    4 KB (533 words) - 02:33, 23 December 2025