DeLiang Wang received the B.S. degree and the M.S. degree from Peking (Beijing) University and the Ph.D. degree in 1991 from the University of Southern California, all in computer science. Since 1991, he has been with the Department of Computer Science & Engineering and the Center for Cognitive and Brain Sciences at The Ohio State University, where he is a Professor and University Distinguished Scholar. He also holds a visiting appointment at the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University. He received the Office of Naval Research Young Investigator Award in 1996, the 2005 Outstanding Paper Award from IEEE Transactions on Neural Networks, and the 2008 Helmholtz Award from the International Neural Network Society. He is an IEEE Fellow, and currently serves as Co-Editor-in-Chief of Neural Networks.
Speech separation, or the cocktail party problem, has evaded a solution for decades in speech and audio processing. Motivated by auditory perception, I have been advocating a new formulation to this old challenge that estimates an ideal time-frequency mask (binary or ratio). This new formulation has an important implication that the speech separation problem is open to modern machine learning techniques, and deep neural networks (DNNs) are particularly well-suited for this task due to their representational capacity. I will describe recent algorithms that employ DNNs for supervised speech separation, including speech enhancement and speaker separation. DNN-based mask estimation elevates speech separation performance to new levels, and produces the first demonstration of substantial speech intelligibility improvements for both hearing-impaired and normal-hearing listeners in background interference. These advances represent major progress towards solving the cocktail party problem.
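To make the notion of an ideal time-frequency mask concrete, the following is a minimal sketch of how binary and ratio masks are commonly defined when the clean speech and the noise are both known in advance (as they are when building training targets). It is an illustration under stated assumptions, not the specific algorithm presented in the talk: the function names, the 0 dB local criterion `lc_db`, the exponent `beta = 0.5`, and the toy energy values are all hypothetical choices made for this example.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero (assumed value)


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1 where the local SNR in dB exceeds lc_db, else 0.

    Inputs are 2-D lists of non-negative energies indexed as
    [time_frame][frequency_channel].
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + EPS) / (n + EPS))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Return (S / (S + N)) ** beta per time-frequency unit, in [0, 1]."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        mask.append([(s / (s + n + EPS)) ** beta
                     for s, n in zip(s_row, n_row)])
    return mask


# Toy 2-frame x 2-channel energies (made-up numbers for illustration).
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [1.0, 9.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In supervised speech separation, a mask of this kind would serve as the training target: a DNN is trained to estimate it from features of the noisy mixture alone, and the estimated mask is then applied to the mixture's time-frequency representation.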