F5-TTS vs Fish-Speech: I Cloned My Voice With Both on RTX 4090 (2026)
I cloned my voice with two different AI tools on the same RTX 4090 last Tuesday night. F5-TTS did it in 3 seconds using 2.4 GB of VRAM. Fish-Speech took 30 seconds and consumed 18 GB. Then I played both clips to a colleague without telling him which was which, and he picked Fish-Speech as the “real”