Wang, Miao (2020) Speech Enhancement using Fiber Acoustic Sensor. Masters thesis, Concordia University.
Text (application/pdf): Miao_MASc_S2020.pdf (3MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
With the development of Internet of Things (IoT) services and devices, voice commands are becoming an increasingly important tool for human-computer interaction. However, the audio signal recorded by a conventional omnidirectional microphone is easily corrupted by environmental noise such as interfering speech. Although conventional beamforming techniques can point the main lobe of the beam pattern at the desired speaker, they require several omnidirectional microphones to form an array, which occupies a large amount of space on an IoT device. Many researchers are therefore working on small microphones that can produce a directional beam pattern. Recently, researchers drew inspiration from the way spiders sense acoustic waves and invented a new small acoustic sensor made of spider silk. This sensor has a frequency-independent dipole beam pattern for wideband audio signals. Using this fiber acoustic sensor, two compact microphone arrays and corresponding speech enhancement systems can be constructed. The first array consists of one omnidirectional microphone collocated with one fiber acoustic sensor; the second consists of two collocated fiber acoustic sensors with orthogonal dipole beam patterns.
Using the first microphone array, this thesis designs a first-order adaptive beamformer to reduce speech interference and separate speech signals. An adaptive first-order beam pattern is formed using the normalized least mean square (NLMS) method. In a scenario where the desired speech and the interfering speech are present at the same time, the beamformer points the null of its beam pattern at the undesired speaker to reduce the interference. To verify this idea, numerical simulations are conducted under an ideal condition (clean speech without reverberation) and a realistic condition (clean speech corrupted by white noise and reverberation). The results show that the design improves speech quality significantly in the ideal case. With white noise and reverberation, an improvement is still achieved, but on a much smaller scale.
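The null-steering idea can be illustrated with a minimal sketch, which is not the thesis implementation. It assumes an idealized, noise-free and anechoic model in which the omnidirectional channel receives each source with unit gain and the fiber sensor receives it with gain cos θ. It also assumes the single weight is adapted on a segment where only the interferer is active; how the thesis handles adaptation when both talkers overlap is not stated in this abstract. The function name `nlms_null_weight`, the angles, and the white-noise stand-ins for speech are all hypothetical.

```python
import numpy as np

def nlms_null_weight(omni, dipole, mu=0.5, eps=1e-8):
    """Single-tap NLMS: adapt w so that (dipole - w * omni) has minimum power."""
    w = 0.0
    for o, d in zip(omni, dipole):
        e = d - w * o                      # instantaneous beamformer output
        w += mu * e * o / (o * o + eps)    # normalized LMS update
    return w

rng = np.random.default_rng(0)
theta_s = np.deg2rad(20.0)    # desired talker direction (assumed)
theta_i = np.deg2rad(110.0)   # interfering talker direction (assumed)
speech = rng.standard_normal(4000)   # white-noise stand-in for desired speech
interf = rng.standard_normal(4000)   # white-noise stand-in for interfering speech

# Adapt on a segment where only the interferer is active (assumption).
w = nlms_null_weight(interf, np.cos(theta_i) * interf)

# Apply the fixed weight to a mixture of both talkers.
omni = speech + interf
dipole = np.cos(theta_s) * speech + np.cos(theta_i) * interf
y = dipole - w * omni   # first-order pattern cos(theta) - w, null where cos(theta) = w
```

Under these assumptions the weight converges to cos θ_i, so the interferer cancels and the output is a scaled copy of the desired signal. With noise and reverberation the cancellation would be only partial, which is consistent with the smaller gains reported for the realistic condition.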
Using the second collocated microphone array, a speech enhancement system is proposed that enables the collocated fiber acoustic sensors to capture speech from any direction. The system has three main parts. The first part performs direction of arrival (DOA) estimation aided by a machine learning method. Here, the inter-channel acoustic intensity difference is used to compute raw DOA estimates in the presence of white noise and reverberation. A wrapped Gaussian mixture model is then fitted to the raw estimates to give a more accurate DOA estimate. The proposed method is robust to both white noise and reverberation, has low computational complexity, and resolves the phase ambiguity problem (in which 0 and π are indistinguishable). In the second part, the orthogonality of the dipole patterns of the two collocated fiber acoustic sensors (one is sin θ and the other is cos θ) is combined with the DOA θ estimated by the wrapped Gaussian mixture model to generate a steerable dipole beam pattern whose main lobe points at the speaker. In the third part, a noise reduction procedure is applied to the output signal of the steerable beamformer. The procedure is based on a time-frequency mask, which filters out the time-frequency bins dominated by white noise and keeps those of the speech signal. To verify the effectiveness of the designed system, numerical simulations are conducted in the presence of both white noise and reverberation. The results show that the proposed DOA estimation method is robust to both white noise and reverberation, which implies that this type of microphone array can obtain precise spatial information about the speaker. Meanwhile, the audio quality of the system's output signal is improved by at least 50%.
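The steering step in the second part follows from the identity cos θ̂ cos θ + sin θ̂ sin θ = cos(θ − θ̂): weighting the two sensor outputs by cos θ̂ and sin θ̂ yields a dipole whose main lobe points at θ̂. The sketch below illustrates this under an idealized noise-free, anechoic model. The DOA estimator shown is a simple covariance-based stand-in, not the intensity-difference and wrapped Gaussian mixture method of the thesis, and it only recovers θ modulo π. The function names, angle, and test signals are hypothetical.

```python
import numpy as np

def estimate_doa_mod_pi(x_cos, x_sin):
    """Covariance-based DOA estimate in [-pi/2, pi/2]; theta and theta + pi look the same."""
    cross = np.mean(x_cos * x_sin)                     # proportional to sin(2 * theta) / 2
    diff = np.mean(x_cos ** 2) - np.mean(x_sin ** 2)   # proportional to cos(2 * theta)
    return 0.5 * np.arctan2(2.0 * cross, diff)

def steer_dipole(x_cos, x_sin, theta_hat):
    """Combine the orthogonal dipoles into one dipole pointing at theta_hat."""
    return np.cos(theta_hat) * x_cos + np.sin(theta_hat) * x_sin

rng = np.random.default_rng(1)
theta = np.deg2rad(40.0)               # true talker direction (assumed)
source = rng.standard_normal(2000)     # white-noise stand-in for speech

# Idealized sensor outputs: one dipole follows cos(theta), the other sin(theta).
x_cos = np.cos(theta) * source
x_sin = np.sin(theta) * source

theta_hat = estimate_doa_mod_pi(x_cos, x_sin)
y = steer_dipole(x_cos, x_sin, theta_hat)   # gain cos(theta - theta_hat) toward the talker

# A second source 90 degrees away falls in the null of the steered dipole.
other = rng.standard_normal(2000)
side = theta + np.pi / 2
leak = steer_dipole(np.cos(side) * other, np.sin(side) * other, theta_hat)
```

In this idealized case the steered output reproduces the source and a source at 90° from the look direction is suppressed. Steering to θ̂ + π instead would only flip the sign of the output, since a dipole has two opposite lobes.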
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Wang, Miao |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Electrical and Computer Engineering |
Date: | 13 April 2020 |
Thesis Supervisor(s): | Zhu, Wei-Ping |
ID Code: | 986722 |
Deposited By: | Miao Wang |
Deposited On: | 30 Jun 2021 15:02 |
Last Modified: | 01 Mar 2022 01:00 |