Ravi Shankar – “Non-Parallel Emotion Conversion in Speech via Variational Cycle-GAN”

May 18, 2020

When:
May 26, 2020 @ 12:00 pm – 12:30 pm
Contact:
Jeremias Sulam

Talk 1: Non-Parallel Emotion Conversion in Speech via Variational Cycle-GAN, by Ravi Shankar (ECE, JHU)

Abstract – The quality of speech synthesis has improved tremendously in the recent past, owing mostly to the ability to train deep neural networks. The availability of large amounts of transcribed data, coupled with efficient representations of linguistic features, has played a key role in this regard. While current state-of-the-art models can generate emotionally neutral speech with ease, injecting an emotional style remains an open challenge. Further, speech synthesis is an autoregressive task, which makes it slow and computationally cumbersome. In this work, we will look at how converting the emotion in speech directly can provide a better alternative when resources are limited. Specifically, we propose an unsupervised framework that converts the underlying emotion of a speech utterance by exploiting the relationship between the feature representations. We further propose a new variant of the cycle-GAN that entangles the generators globally by minimizing the KL divergence between the input and output distributions. We demonstrate that our method generalizes to unseen speakers as well.

Bio – Ravi is a third-year PhD student in the Department of Electrical and Computer Engineering under the supervision of Dr. Archana Venkataraman, and he is a MINDS Data Science Fellow. His work focuses on the problem of emotion conversion in speech. His research interests include speech processing, signal processing, and unsupervised machine learning.
