HiFi-GAN-2: Studio-quality Speech Enhancement
via Generative Adversarial Networks
Conditioned on Acoustic Features
Jiaqi Su, Zeyu Jin, Adam Finkelstein
Real Demo for TED Talk
Original input:
HiFi-GAN-2 result at 48k:
HiFi-GAN (previous work) result at 48k:
Real Demo for VCTK Noisy
Original input:
HiFi-GAN-2 result at 48k:
HiFi-GAN (previous work) result at 48k:
Real Demo for DAPS
Original input:
HiFi-GAN-2 result at 48k:
HiFi-GAN (previous work) result at 48k:
* Results use a model trained on our augmented synthetic dataset, built from speech in the DAPS dataset [5] and room impulse responses from the MIT IR Survey dataset [6].
SAMPLES
REFERENCES
[1] X. Hao, X. Su, R. Horaud, and X. Li, "FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement," arXiv:2010.15508, 2020.
[2] A. Défossez, G. Synnaeve, and Y. Adi, "Real time speech enhancement in the waveform domain," in Proc. Interspeech 2020, pp. 3291–3295.
[3] A. Polyak, L. Wolf, Y. Adi, O. Kabeli, and Y. Taigman, "High fidelity speech regeneration with application to speech enhancement," arXiv:2102.00429, 2021.
[4] J. Su, Z. Jin, and A. Finkelstein, "HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks," in Proc. Interspeech 2020.
[5] G. J. Mysore, "Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? A dataset, insights, and challenges," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1006–1010, 2015.
[6] J. Traer and J. H. McDermott, "Statistics of natural reverberation enable perceptual separation of sound and space," Proceedings of the National Academy of Sciences, vol. 113, no. 48, pp. E7856–E7865, 2016.