Speech Enhancement through Elimination of Impulsive Disturbance Using Log-MMSE Filtering

Sonali N. Malshikare, V. M. Sardar

Abstract


The purpose of speech is communication, i.e., the transmission of messages. A message represented as a sequence of discrete symbols can be quantified by its information content in bits, and the rate of transmission of information is measured in bits per second (bps). In speech production, as in many human-engineered electronic communication systems, the information to be transmitted is encoded as a continuously varying (analog) waveform that can be transmitted, recorded, manipulated, and ultimately decoded by a human listener. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we call the speech signal. Speech signals can be converted to an electrical waveform by a microphone, manipulated by both analog and digital signal processing, and converted back to acoustic form by a loudspeaker, a telephone handset, or headphones, as desired. In the real world, signals are usually corrupted by noise. To reduce the influence of noise, two research topics have arisen: speech enhancement and speech recognition in noisy environments. The proposed method, based on log-MMSE filtering, yields better results than prior methods in terms of performance parameters, processing time, and speech signal quality.
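As background for the filtering named in the title, the log-MMSE (log-spectral amplitude) gain of Ephraim and Malah can be sketched as below. This is an illustrative sketch only, not the authors' implementation: it computes the per-bin gain G = xi/(1+xi) * exp(0.5 * E1(v)) with v = gamma * xi/(1+xi), using a standard rational approximation of the exponential integral E1; the function names and toy SNR values are assumptions for illustration.

```python
import numpy as np

def exp_integral(x):
    """E1(x) for x > 0 via the Abramowitz & Stegun rational
    approximations (5.1.53 for x <= 1, 5.1.56 for x > 1)."""
    x = np.asarray(x, dtype=float)
    small = (-np.log(x) - 0.57721566
             + x * (0.99999193
             + x * (-0.24991055
             + x * (0.05519968
             + x * (-0.00976004 + 0.00107857 * x)))))
    large = (np.exp(-x) / x
             * (x * x + 2.334733 * x + 0.250621)
             / (x * x + 3.330657 * x + 1.681534))
    return np.where(x <= 1.0, small, large)

def logmmse_gain(xi, gamma):
    """Log-spectral-amplitude MMSE gain (Ephraim & Malah, 1985).
    xi: a priori SNR per bin, gamma: a posteriori SNR per bin (both > 0)."""
    v = gamma * xi / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp_integral(v))

# Toy usage: low-SNR bins are suppressed, high-SNR bins pass almost unchanged.
xi = np.array([0.01, 1.0, 100.0])     # a priori SNR per bin
gamma = np.array([1.0, 2.0, 100.0])   # a posteriori SNR per bin
gain = logmmse_gain(xi, gamma)        # gain rises monotonically with SNR
enhanced_mag = gain * np.abs(np.array([1.0, 1.0, 1.0]))  # apply to noisy magnitudes
```

In a full enhancer this gain would be applied to each short-time Fourier frame of the noisy signal, with xi tracked by a decision-directed rule and the noise power estimated during speech pauses.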





Copyright © IJETT, International Journal on Emerging Trends in Technology