Speech reverberation suppression for time-varying environments using weighted prediction error method with time-varying autoregressive model

Zhu, Wei-Ping (ORCID: https://orcid.org/0000-0001-7955-7044), Parchami, Mahdi and Amindavar, Hamidreza (2019) Speech reverberation suppression for time-varying environments using weighted prediction error method with time-varying autoregressive model. Speech Communication, 109. pp. 1-14. ISSN 0167-6393 (In Press)

Text (application/pdf): Zhu 2019.pdf - Accepted Version, 1MB. Available under License Spectrum Terms of Access.

Official URL: https://doi.org/10.1016/j.specom.2019.03.002

Abstract

This paper proposes a novel approach to speech reverberation suppression in non-stationary (changing) acoustic environments. The approach builds on the popular weighted prediction error (WPE) method but, instead of assuming fixed reverberation prediction weights, adopts the more general time-varying autoregressive (TV-AR) model, which allows the prediction weights to be estimated and updated dynamically over time. An initial estimate of the prediction weights is used both to select the TV-AR model order optimally and to calculate the TV-AR coefficients. The final estimate of the reverberation prediction weights is then obtained by properly interpolating the calculated coefficients. The approach is evaluated not only in fixed acoustic rooms but also in environments where the source and/or sensors are moving. The experiments show further reverberation suppression and higher quality in the enhanced speech samples compared with recent speech dereverberation methods in the literature.
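For orientation, the conventional fixed-weight WPE baseline that the abstract contrasts with can be sketched for one channel and one STFT frequency bin. This is a minimal illustration of standard WPE (iteratively reweighted delayed linear prediction), not the TV-AR method proposed in the paper. The function name `wpe_single_bin`, the default parameter values, and the small diagonal loading term are assumptions made for this sketch only.

```python
import numpy as np


def wpe_single_bin(x, delay=2, order=10, iterations=3, eps=1e-8):
    """Fixed-weight WPE for one STFT bin of a single-channel signal.

    x          : complex STFT coefficients over time frames, shape (n_frames,)
    delay      : prediction delay in frames (protects early reflections)
    order      : number of prediction taps
    iterations : number of reweighting passes
    eps        : variance floor and diagonal loading (assumed value)
    Returns the dereverberated frames d and the prediction weights g.
    """
    x = np.asarray(x, dtype=complex)
    n_frames = x.shape[0]

    # Delayed observation matrix: X[n, tap] = x[n - delay - tap],
    # with zeros where the index would precede the first frame.
    X = np.zeros((n_frames, order), dtype=complex)
    for tap in range(order):
        shift = delay + tap
        if shift < n_frames:
            X[shift:, tap] = x[:n_frames - shift]

    d = x.copy()                         # initial desired-signal estimate
    g = np.zeros(order, dtype=complex)   # one weight vector for all frames
    for _ in range(iterations):
        # Per-frame variance of the desired signal, floored for stability.
        lam = np.maximum(np.abs(d) ** 2, eps)
        # Weighted normal equations: (X^H W X) g = X^H W x, W = diag(1 / lam).
        Xw = X / lam[:, None]
        R = Xw.conj().T @ X
        r = Xw.conj().T @ x
        g = np.linalg.solve(R + eps * np.eye(order), r)
        # Subtract the predicted late reverberation.
        d = x - X @ g
    return d, g
```

In this baseline the weight vector `g` is held constant over all frames of the utterance, which is the fixed-weight assumption that the abstract replaces with a time-varying model.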

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science
Item Type: Article
Refereed: Yes
Authors: Zhu, Wei-Ping and Parchami, Mahdi and Amindavar, Hamidreza
Journal or Publication: Speech Communication
Date: 2019
Digital Object Identifier (DOI): 10.1016/j.specom.2019.03.002
Keywords: Dereverberation; Speech enhancement; Time-varying autoregressive model; Weighted prediction error
ID Code: 985076
Deposited By: ALINE SOREL
Deposited On: 08 Apr 2019 15:40
Last Modified: 10 Mar 2021 02:00
