BWE on DAPS

Learning Bandwidth Expansion
Using Perceptually-Motivated Loss

Berthy Feng, Zeyu Jin, Jiaqi Su, Adam Finkelstein

[Paper] [GitHub coming soon]

We introduce a perceptually motivated approach to bandwidth expansion for speech. Our method pairs a new 3-way split variant of the FFTNet neural vocoder structure with a perceptual loss function that combines objectives from both the time and frequency domains. Mean opinion score (MOS) tests show that it outperforms baseline methods from both domains, even for extreme bandwidth expansion. This page contains the audio clips used in the MOS test described in Section 4 of the paper.
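To make the idea of combining time-domain and frequency-domain objectives concrete, here is a minimal sketch. It is not the loss from the paper, whose exact form is not given on this page. It simply adds an L1 waveform term to an L1 distance between log-magnitude spectrograms. The function names, the FFT size, the hop size, and the `spec_weight` parameter are all illustrative assumptions.

```python
import numpy as np


def log_spectrogram(x, n_fft=512, hop=128, eps=1e-5):
    """Log-magnitude STFT of a 1-D signal (Hann window, no padding).

    n_fft, hop and eps are assumed values chosen for illustration.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)


def combined_loss(pred, target, spec_weight=1.0):
    """Hypothetical combined objective: time-domain L1 plus spectral L1."""
    # Time-domain term: mean absolute sample error.
    time_term = np.mean(np.abs(pred - target))
    # Frequency-domain term: mean absolute log-spectrogram error.
    spec_term = np.mean(
        np.abs(log_spectrogram(pred) - log_spectrogram(target))
    )
    return time_term + spec_weight * spec_term
```

In this sketch, a prediction that matches the reference exactly gives a loss of zero, and a prediction that lacks the high-frequency content of the reference is penalized mainly through the spectral term.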

SAMPLES

INPUT: Low-resolution (8 kHz) speech recording
OUR: 3-way FFTNet, trained on L1 + perceptual loss
WAV: Waveform-based DNN (Kuleshov)
SPEC: Spectrogram-based DNN (Li)
REF: Original high-resolution (44.1 kHz) speech recording