Acoustic Matching (AM) on DAPS

Acoustic Matching by Embedding Impulse Responses

Jiaqi Su, Zeyu Jin, Adam Finkelstein

Acoustic matching aims to transform audio recorded in one acoustic environment so that it sounds as if it had been recorded in a different environment, based on reference recordings made in the target environment. This paper introduces a two-part deep learning solution to the acoustic matching problem. First, we characterize acoustic environments by mapping recordings into a low-dimensional embedding that is invariant to speech content and speaker identity. Next, a waveform-to-waveform neural network conditioned on this embedding learns to transform an input waveform to match the acoustic qualities encoded in the target embedding. Listening tests on both simulated and real environments show that the proposed approach improves on state-of-the-art baseline methods.
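To make the task concrete, the sketch below simulates the same utterance "recorded" in two environments by convolving it with two impulse responses (IRs). It assumes a purely linear, time-invariant room model with no added noise, and the exponentially decaying noise IR produced by the hypothetical helper `synthetic_ir` is a toy stand-in for a measured IR, not the data used in the paper. Acoustic matching then means mapping `y_src` to something close to `y_tgt` without access to the clean signal or either IR.

```python
import numpy as np

def synthetic_ir(rt60, sr=16000, length_s=0.5, seed=0):
    """Hypothetical toy IR: white noise under an exponential envelope.

    The envelope exp(-6.908 * t / rt60) falls by 60 dB at t = rt60,
    since 20 * log10(exp(-6.908)) is approximately -60.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * sr)) / sr
    envelope = np.exp(-6.908 * t / rt60)
    ir = rng.standard_normal(t.size) * envelope
    ir[0] = 1.0  # treat the first tap as the direct path
    return ir / np.max(np.abs(ir))  # peak-normalize

def record_in(clean, ir):
    """Simulate a recording as clean speech convolved with an IR,
    trimmed back to the input length (assumed LTI model, no noise)."""
    return np.convolve(clean, ir)[: clean.size]

# Usage: random noise stands in for one second of clean speech at 16 kHz.
clean = np.random.default_rng(42).standard_normal(16000)
y_src = record_in(clean, synthetic_ir(rt60=0.3, seed=1))  # "dry" room
y_tgt = record_in(clean, synthetic_ir(rt60=1.0, seed=2))  # reverberant room
```

Real environments also differ in noise and microphone coloration, which this convolution-only model ignores.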

SAMPLES

NAIVE: No matching
NMD: IR retrieved from NMD and applied to the clean signal
EQ-M: Source-differentiated equalization matching
E2E: End-to-end acoustic matching
NN: IR retrieved by nearest-neighbor (NN) search in the pre-trained embedding
NN-CO: IR retrieved by nearest-neighbor (NN) search in the co-trained embedding
REF: Ground-truth matching
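The NN and NN-CO conditions retrieve an IR by nearest-neighbor search in an embedding space. A minimal sketch of that retrieval step follows, assuming each candidate IR already has a precomputed embedding vector (the embedding network itself is not shown), that distance is Euclidean, and that the retrieved IR is applied to the clean signal by convolution as in the NMD condition. The names `retrieve_ir` and `nn_match` are hypothetical.

```python
import numpy as np

def retrieve_ir(target_embedding, bank_embeddings, bank_irs):
    """Return (index, IR) of the bank entry whose embedding is closest
    to the target embedding (Euclidean distance is an assumption)."""
    dists = np.linalg.norm(bank_embeddings - target_embedding[None, :], axis=1)
    idx = int(np.argmin(dists))
    return idx, bank_irs[idx]

def nn_match(clean, target_embedding, bank_embeddings, bank_irs):
    """Apply the retrieved IR to the clean signal, trimmed to its length."""
    idx, ir = retrieve_ir(target_embedding, bank_embeddings, bank_irs)
    return idx, np.convolve(clean, ir)[: clean.size]

# Usage with toy 2-D embeddings and tiny IRs (illustrative values only).
bank_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
bank_irs = [np.array([1.0]), np.array([0.5]), np.array([0.0, 1.0])]
idx, out = nn_match(np.array([1.0, 2.0, 3.0]), np.array([0.1, 0.8]),
                    bank_emb, bank_irs)
```

Here the target lies closest to the third bank entry, so its one-sample delay IR is applied. The quality of this baseline depends entirely on how well the embedding space groups recordings by environment, which is the difference between the pre-trained (NN) and co-trained (NN-CO) variants.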
