Saksham Singh Kushwaha | The University of Texas at Dallas

About Me

I am a second-year CS Ph.D. candidate at the University of Texas at Dallas, advised by Dr. Yapeng Tian. Previously, I did my MS in CS at NYU Courant, where I was fortunate to work with Prof. Juan Pablo Bello, Dr. Magdalena Fuentes, and Dr. Iran Roman at the Music and Audio Research Lab. My research interests lie in multimodal deep learning and machine listening.

Prior to joining NYU, I worked as a Machine Learning Engineer II at Zomato and as a Senior Data Scientist at ShareChat. In industry, I applied ML to problems in user recommendation and personalization. I received my B.Tech degree in Mathematics and Computing from IIT Delhi.

In my spare time, I like to play guitar and tennis.

I am always looking for motivated MS and undergraduate students to supervise on deep learning projects. If you have strong coding skills and are interested in research, please reach out via email.

News

[May 2025] Started internship at Adobe Research.
[Feb 2025] One paper about holistic audio generation accepted at CVPR-25.
[Dec 2024] One paper about spatial audio generation accepted at ICASSP-25.
[Oct 2024] One paper about audio-visual dataset distillation accepted at TMLR.
[May 2024] One paper about audio-visual dataset distillation accepted at the Sight and Sound Workshop @ CVPR-24.
[Mar 2024] Serving as a reviewer for ACMMM-24 and the ELVM Workshop @ CVPR-24.
[Sep 2023] Serving as a reviewer for ICASSP-24.
[Jul 2023] Serving as a reviewer for AAAI-24.
[Jul 2023] One paper about sound source distance estimation accepted at WASPAA-23.
[May 2023] One paper about multimodal sound recognition accepted at INTERSPEECH-23.
[May 2023] Joining UT Dallas as a Ph.D. student.
[May 2023] Completed my MS in CS at NYU Courant.
[May 2023] Received the Exceptional Contribution Award as guitarist for the NYU pop/rock ensemble.
[May 2023] The NYU pop/rock ensemble won the 2023 DownBeat award for outstanding performance.
[Mar 2023] Serving as a reviewer for Machine Learning for Signal Processing (MLSP) 2023.
[Nov 2022] One paper about sound event localization accepted at the DCASE 2022 Workshop.

Publications

CVPR

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

PDF Code Project Page Demo Poster

ICASSP

Diff-SAGe: End-to-end spatial audio generation using diffusion models

Saksham Singh Kushwaha, Jianbo Ma, Mark Thomas, Yapeng Tian, Avery Bruni

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.

PDF Project Page Demo Slides Poster

TMLR

Audio-Visual Dataset Distillation

Saksham Singh Kushwaha, Siva Vasireddy, Kai Wang, Yapeng Tian

Transactions on Machine Learning Research (TMLR), 2024.

PDF Code Video

CVPRW

Dataset distillation for audio-visual datasets

Saksham Singh Kushwaha, Siva Vasireddy, Kai Wang, Yapeng Tian

CVPR Sight and Sound Workshop (WSS), 2024.

PDF Slides

WASPAA

Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

Saksham Singh Kushwaha, Iran Roman, Magdalena Fuentes, Juan Pablo Bello

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023.

PDF Code Poster Slides Video

INTERSPEECH

A multimodal prototypical approach for unsupervised sound classification

Saksham Singh Kushwaha, Magdalena Fuentes

INTERSPEECH, 2023.

PDF Code Poster

DCASE

Analyzing the effect of equal-angle spatial discretization on sound event localization and detection

Saksham Singh Kushwaha, Iran Roman, Juan Pablo Bello

Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2022.

PDF Code Poster