A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
| Published in | *Nature Machine Intelligence*, Vol. 3, No. 2, pp. 134–143 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 01.02.2021 |
Summary: Auditory models are commonly used as feature extractors for automatic speech-recognition systems or as front-ends for robotics, machine-hearing and hearing-aid applications. Although auditory models can capture the biophysical and nonlinear properties of human hearing in great detail, these biophysical models are computationally expensive and cannot be used in real-time applications. We present a hybrid approach where convolutional neural networks are combined with computational neuroscience to yield a real-time end-to-end model for human cochlear mechanics, including level-dependent filter tuning (CoNNear). The CoNNear model was trained on acoustic speech material and its performance and applicability were evaluated using (unseen) sound stimuli commonly employed in cochlear mechanics research. The CoNNear model accurately simulates human cochlear frequency selectivity and its dependence on sound intensity, an essential quality for robust speech intelligibility at negative speech-to-background-noise ratios. The CoNNear architecture is based on parallel and differentiable computations and has the power to achieve real-time human performance. These unique CoNNear features will enable the next generation of human-like machine-hearing applications.
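The summary describes a convolutional encoder-decoder that maps an audio waveform to per-section cochlear (basilar-membrane) responses. The paper's exact CoNNear architecture is not reproduced here; the following is a minimal numpy sketch of that general idea, where the layer sizes, kernel width, tanh nonlinearity, and the choice of 201 output channels (mimicking a number of simulated cochlear sections) are illustrative assumptions, not the published configuration.

```python
import numpy as np

def conv1d(x, w, stride=2):
    """Strided 1-D convolution with tanh nonlinearity.
    x: (channels_in, time); w: (channels_out, channels_in, kernel)."""
    c_out, c_in, k = w.shape
    t_out = (x.shape[1] - k) // stride + 1
    y = np.empty((c_out, t_out))
    for i in range(t_out):
        seg = x[:, i * stride : i * stride + k]
        y[:, i] = np.tensordot(w, seg, axes=([1, 2], [0, 1]))
    return np.tanh(y)

def tconv1d(x, w, stride=2):
    """Transposed (upsampling) 1-D convolution with tanh nonlinearity."""
    c_out, c_in, k = w.shape
    t_out = (x.shape[1] - 1) * stride + k
    y = np.zeros((c_out, t_out))
    for i in range(x.shape[1]):
        y[:, i * stride : i * stride + k] += np.tensordot(w, x[:, i], axes=([1], [0]))
    return np.tanh(y)

rng = np.random.default_rng(0)
audio = rng.standard_normal((1, 2048)) * 0.1       # one-channel audio frame

# Hypothetical weights: 8 encoder filters, 201 decoder output channels.
w_enc = rng.standard_normal((8, 1, 16)) * 0.1
w_dec = rng.standard_normal((201, 8, 16)) * 0.1

latent = conv1d(audio, w_enc)   # downsampled latent representation, (8, 1017)
bm = tconv1d(latent, w_dec)     # (201, 2048): one output waveform per cochlear section
```

Because every operation here is a (transposed) convolution plus a smooth nonlinearity, the whole mapping is differentiable and parallelizable across time and channels, which is the property the summary credits for CoNNear's real-time potential.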
Extensive work has gone into developing peripheral auditory models that capture the nonlinear processing of the ear, but the resulting models are prohibitively slow to use at scale in most machine-hearing systems. The authors present a convolutional neural network model that replicates hallmark features of cochlear signal processing, potentially enabling real-time applications.
ISSN: 2522-5839
DOI: 10.1038/s42256-020-00286-8