For details, please refer to our paper.
Input: input audio
Ours-AC: our method, with absolute pitch and AIC loss
Ours-AC-48k: our method, with absolute pitch and AIC loss, bandwidth extended to 48 kHz
Ours-A: our method, with absolute pitch
Ours-AC-noGAN: our method, with absolute pitch and AIC loss, but without GAN training (all other variants of our method use GAN training)
Ours-AC2: our method, with absolute pitch and AIC loss 2
Ours-ACE: our method, with absolute pitch, energy condition and AIC loss
Ours-R: our method, with relative pitch
Ours-RC: our method, with relative pitch and AIC loss
PitchShift[1]: a pitch shifter using PSOLA
AutoVC-F0[2]: an F0-conditioned autoencoder voice conversion method
Wav2Vec[3]: voice reconstruction from Wav2Vec features
Target: target audio
MOS and Similarity Scores
MOS and similarity scores show that our best models compare favorably with the baselines across gender combinations, for both seen and unseen speakers.
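A mean opinion score (MOS) is the arithmetic mean of listener ratings, commonly on a 1 to 5 scale. The sketch below shows one way such a score and a normal-approximation 95% confidence interval could be computed. The function name `mos_with_ci`, the example ratings, and the interval method are illustrative assumptions, not the evaluation protocol used in the paper.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of a normal-approximation CI).

    Hypothetical helper: z=1.96 gives an approximate 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    score = mean(ratings)
    # Standard error of the mean uses the sample standard deviation.
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return score, half_width


# Hypothetical listener ratings (1 = bad, 5 = excellent) for one system.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
score, ci = mos_with_ci(ratings)
print(f"MOS = {score:.2f} ± {ci:.2f}")  # prints "MOS = 4.00 ± 0.52"
```

With only a handful of listeners, a t-distribution quantile would give a somewhat wider, more conservative interval than the fixed z value used here.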
Audio Samples (VCTK Dataset)