Dr. Jehanzeb Mirza

MIT CSAIL

Research Expertise

Computer Vision

Machine Learning

Deep Learning

Multi-Modal Learning

About

Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at [MIT CSAIL](https://www.csail.mit.edu/), in the Spoken Language Systems Group, led by Dr. [James Glass](https://www.csail.mit.edu/person/jim-glass). I received my Ph.D. in Computer Science (Computer Vision) from [TU Graz, Austria](https://www.tugraz.at/home), where I was advised by Professor [Horst Bischof](https://scholar.google.com/citations?user=_pq05Q4AAAAJ&hl=en), and Professor [Serge Belongie](https://sergebelongie.github.io/) served as an external referee. I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding.

Publications

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

2022

An Efficient Domain-Incremental Learning Approach to Drive in All Weather Conditions

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

2022

Robustness of Object Detectors in Degrading Weather Conditions

2021 IEEE International Intelligent Transportation Systems Conference (ITSC)

2021

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

2023 IEEE/CVF International Conference on Computer Vision (ICCV)

2023

Video Test-Time Adaptation for Action Recognition

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

2023

ActMAD: Activation Matching to Align Distributions for Test-Time-Training

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

2023

Towards Multimodal In-context Learning for Vision and Language Models

Lecture Notes in Computer Science

2025

MATE: Masked Autoencoders are Online 3D Test-Time Learners

2023 IEEE/CVF International Conference on Computer Vision (ICCV)

2023

Meta-prompting for Automating Zero-Shot Visual Recognition with LLMs

Lecture Notes in Computer Science

2024

Comment on the Paper Titled ‘The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language’ (arXiv preprint arXiv:2407.14924, 2024)

Unknown Venue

2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Advances in Neural Information Processing Systems 37

2024

Can Biases in ImageNet Models Explain Generalization?

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

2024

Comparison Visual Instruction Tuning

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

2025

Preprint site arXiv is banning computer-science reviews: here’s why

Nature

2025

TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models

2025 International Conference on 3D Vision (3DV)

2025

Exploring Modality Guidance to Enhance VFM-Based Feature Fusion for UDA in 3D Semantic Segmentation

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

2025

Rules for Preserving Archival Documents in Repositories (original title: ARXİV SƏNƏDLƏRİNİN MÜHAFİZƏXANALARDA MÜHAFİZƏ QAYDALARI)

ADMİU Elmi Əsərlər

2025

Test-Time Adversarial Detection and Robustness for Localizing Humans Using Ultra Wide Band Channel Impulse Responses

2023 31st European Signal Processing Conference (EUSIPCO)

2023

Sit Back and Relax: Learning to Drive Incrementally in All Weather Conditions

2023 IEEE Intelligent Vehicles Symposium (IV)

2023

Influence Prediction in Collaboration Networks: An Empirical Study on arXiv

Unknown Venue

2025

Evaluation of Spatio-Temporal Small Object Detection in Real-World Adverse Weather Conditions

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

2025

DRT: Detection Refinement for Multiple Object Tracking

Proceedings of the British Machine Vision Conference 2021

2021

Detector-Free Weakly Supervised Grounding by Separation

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

2021

Semi-Supervised Audio-Visual Action Recognition with Audio Source Localization Guided Mixup

2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP)

2025

Affine calibration from moving objects

Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001

2001

Shape-Biased Texture Agnostic Representations for Improved Textureless and Metallic Object Detection and 6D Pose Estimation

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

2025

Online Continual Learning of Diffusion Models: Multi-Mode Adaptive Generative Distillation

2025 IEEE International Conference on Image Processing (ICIP)

2025

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

Advances in Neural Information Processing Systems 37

2024

Education

Graz University of Technology (TU Graz), Austria

Ph.D. in Computer Science (Computer Vision) / 2024

Karlsruhe Institute of Technology (KIT), Germany

M.Sc. in Electrical Engineering and Information Technology (ETIT) / 2020

National University of Sciences and Technology (NUST), Pakistan

B.Sc. in Electrical Engineering / 2017

Experience

Massachusetts Institute of Technology (MIT)

Postdoctoral Researcher / November, 2024 — December

Leading research on multimodal learning that combines speech, vision, and language for scalable AI systems. Designing and evaluating methods to improve fine-grained reasoning in large language models and vision-language models.

Graz University of Technology

Computer Vision Project Assistant / January, 2021 — October, 2024

Developed self-supervised and unsupervised learning techniques to improve neural network robustness to distribution shifts at test time. Conducted extensive research on LLMs and multimodal VLMs, resulting in multiple publications at NeurIPS, ICCV, and CVPR.

Sony AI

Research Scientist Intern / May, 2024 — August, 2024

Designed multimodal learning methods integrating vision, audio, and language signals. Prototyped and evaluated models for cross-modal understanding in real-world scenarios.

Intel Labs

Master Thesis Researcher / January, 2020 — July, 2020

Evaluated robustness of state-of-the-art 2D and 3D object detectors for autonomous driving under adverse weather.

C++ Developer Intern / October, 2019 — December, 2019

Implemented state estimation with an Unscented Kalman Filter in C++ and OpenCV for real-time object tracking.

Intel

Platform Application Engineer Intern / March, 2019 — August, 2019

Built an automation framework, including PCB design and microcontroller integration, to streamline internal workflows.

Links & Social Media

Research Web Site
