I'm still looking for the "perfect" setup to clone my voice and use it locally to send voice replies in Telegram via openclaw. Does anyone have such a setup?
You need to provide info on your hardware. Pocket-TTS does cloning on CPU, but for me it randomly produces something pretty weird-sounding, mixed in with roughly 90% good outputs. So it hasn't been stable enough to run without checking the output. But maybe it depends on your voice sample.
Qwen 3 TTS is good for voice cloning but requires a GPU of some sort.
I want to be my own personal assistant...
EDIT: I can provide it with an RTX 3080 Ti.