MegaTTS 3 Voice Cloning

MegaTTS 3 is a text-to-speech model trained by ByteDance with exceptional voice cloning capabilities. The original authors did not release the WavVAE encoder, so voice cloning was not publicly available; however, thanks to @ACoderPassBy's WavVAE encoder, we can now clone voices with MegaTTS 3!

This is by no means the best voice cloning solution, but it works well for some specific use cases. Try several voice cloning solutions and see which one works best for you.

Please use this Space responsibly and do not abuse it! This demo is for research and educational purposes only!

h/t to MysteryShack on Discord for the info about the unofficial WavVAE encoder!

Upload a reference audio clip and enter text to generate speech with the cloned voice.
