Kristen Grauman

January 22, 2021

When:
April 13, 2021 @ 12:00 pm – 1:00 pm

Title: Sights, sounds, and space: Audio-visual learning in 3D environments

Abstract: Moving around in the world is naturally a multisensory experience, but today’s embodied agents are deaf, restricted solely to their visual perception of the environment. We explore audio-visual learning in complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object, use echolocation to anticipate its 3D surroundings, and discover the link between its visual inputs and spatial sound.
To support this goal, we introduce SoundSpaces: a platform for audio rendering based on geometrical acoustic simulations for two sets of publicly available 3D environments (Matterport3D and Replica). SoundSpaces makes it possible to insert arbitrary sound sources in an array of real-world scanned environments. Building on this platform, we pursue a series of audio-visual spatial learning tasks. Specifically, in audio-visual navigation, the agent is tasked with traveling to a sounding target in an unfamiliar environment (e.g., go to the ringing phone). In audio-visual floorplan reconstruction, a short video with audio is converted into a house-wide map, where audio allows the system to “see” behind the camera and behind walls. For self-supervised feature learning, we explore how echoes observed in training can enrich an RGB encoder for downstream spatial tasks including monocular depth estimation. Our results suggest how audio can benefit visual understanding of 3D spaces, and our work lays groundwork for new research in embodied AI with audio-visual perception.

Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist in Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She currently serves as an Associate Editor-in-Chief for PAMI and previously served as a Program Chair of CVPR 2015 and NeurIPS 2018.
