Dr. Jehanzeb Mirza

MIT CSAIL

Research Expertise

Computer Vision
Machine Learning
Deep Learning
Multi-Modal Learning

About

Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at [MIT CSAIL](https://www.csail.mit.edu/), in the Spoken Language Systems Group, led by Dr. [James Glass](https://www.csail.mit.edu/person/jim-glass). I received my Ph.D. in Computer Science (Computer Vision) from [TU Graz, Austria](https://www.tugraz.at/home), where I was advised by Professor [Horst Bischof](https://scholar.google.com/citations?user=_pq05Q4AAAAJ&hl=en), and Professor [Serge Belongie](https://sergebelongie.github.io/) served as an external referee. I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding.

Publications

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) / Jun 01, 2022

Mirza, M. J., Micorek, J., Possegger, H., & Bischof, H. (2022). The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14745–14755. https://doi.org/10.1109/cvpr52688.2022.01435

An Efficient Domain-Incremental Learning Approach to Drive in All Weather Conditions

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) / Jun 01, 2022

Mirza, M. J., Masana, M., Possegger, H., & Bischof, H. (2022). An Efficient Domain-Incremental Learning Approach to Drive in All Weather Conditions. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3000–3010. https://doi.org/10.1109/cvprw56347.2022.00339

Robustness of Object Detectors in Degrading Weather Conditions

2021 IEEE International Intelligent Transportation Systems Conference (ITSC) / Sep 19, 2021

Mirza, M. J., Buerkle, C., Jarquin, J., Opitz, M., Oboril, F., Scholl, K.-U., & Bischof, H. (2021). Robustness of Object Detectors in Degrading Weather Conditions. 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2719–2724. https://doi.org/10.1109/itsc48978.2021.9564505

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

2023 IEEE/CVF International Conference on Computer Vision (ICCV) / Oct 01, 2023

Lin, W., Karlinsky, L., Shvetsova, N., Possegger, H., Kozinski, M., Panda, R., Feris, R., Kuehne, H., & Bischof, H. (2023). MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2839–2850. https://doi.org/10.1109/iccv51070.2023.00267

Video Test-Time Adaptation for Action Recognition

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) / Jun 01, 2023

Lin, W., Mirza, M. J., Kozinski, M., Possegger, H., Kuehne, H., & Bischof, H. (2023). Video Test-Time Adaptation for Action Recognition. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22952–22961. https://doi.org/10.1109/cvpr52729.2023.02198

ActMAD: Activation Matching to Align Distributions for Test-Time-Training

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) / Jun 01, 2023

Mirza, M. J., Soneira, P. J., Lin, W., Kozinski, M., Possegger, H., & Bischof, H. (2023). ActMAD: Activation Matching to Align Distributions for Test-Time-Training. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 24152–24161. https://doi.org/10.1109/cvpr52729.2023.02313

Towards Multimodal In-context Learning for Vision and Language Models

Lecture Notes in Computer Science / Jan 01, 2025

Doveh, S., Perek, S., Mirza, M. J., Lin, W., Alfassy, A., Arbelle, A., Ullman, S., & Karlinsky, L. (2025). Towards Multimodal In-context Learning for Vision and Language Models. In Computer Vision – ECCV 2024 Workshops (pp. 250–267). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-93806-1_19

MATE: Masked Autoencoders are Online 3D Test-Time Learners

2023 IEEE/CVF International Conference on Computer Vision (ICCV) / Oct 01, 2023

Mirza, M. J., Shin, I., Lin, W., Schriebl, A., Sun, K., Choe, J., Kozinski, M., Possegger, H., Kweon, I. S., Yoon, K.-J., & Bischof, H. (2023). MATE: Masked Autoencoders are Online 3D Test-Time Learners. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 16663–16672. https://doi.org/10.1109/iccv51070.2023.01532

Meta-prompting for Automating Zero-Shot Visual Recognition with LLMs

Lecture Notes in Computer Science / Oct 20, 2024

Mirza, M. J., Karlinsky, L., Lin, W., Doveh, S., Micorek, J., Kozinski, M., Kuehne, H., & Possegger, H. (2024). Meta-prompting for Automating Zero-Shot Visual Recognition with LLMs. In Computer Vision – ECCV 2024 (pp. 370–387). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-72627-9_21

Comment on the Paper Titled ‘The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language’ (arXiv preprint arXiv:2407.14924, 2024)

Dec 02, 2024

Sienicki, K. (2024). Comment on the Paper Titled ‘The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language’ (arXiv preprint arXiv:2407.14924, 2024). https://doi.org/10.20944/preprints202411.2377.v1

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Advances in Neural Information Processing Systems 37 / Jan 01, 2024

Arbelle, A., Butoi, V., Darrell, T., Doveh, S., Feris, R., Gan, C., Hansen, J., Herzig, R., Huang, I., Karlinsky, L., Kuehne, H., Lin, W., Mirza, M., & Oliva, A. (2024). ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs. Advances in Neural Information Processing Systems 37, 22927–22946. https://doi.org/10.52202/079017-0721

Can Biases in ImageNet Models Explain Generalization?

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) / Jun 16, 2024

Gavrikov, P., & Keuper, J. (2024). Can Biases in ImageNet Models Explain Generalization? 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22184–22194. https://doi.org/10.1109/cvpr52733.2024.02094

Comparison Visual Instruction Tuning

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) / Jun 11, 2025

Lin, W., Mirza, M. J., Doveh, S., Feris, R., Giryes, R., Hochreiter, S., & Karlinsky, L. (2025). Comparison Visual Instruction Tuning. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2964–2974. https://doi.org/10.1109/cvprw67362.2025.00280

Preprint site arXiv is banning computer-science reviews: here’s why

Nature / Nov 07, 2025

Castelvecchi, D. (2025). Preprint site arXiv is banning computer-science reviews: here’s why. Nature. https://doi.org/10.1038/d41586-025-03664-7

TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models

2025 International Conference on 3D Vision (3DV) / Mar 25, 2025

Weijler, L., Mirza, M. J., Sick, L., Ekkazan, C., & Hermosilla, P. (2025). TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models. 2025 International Conference on 3D Vision (3DV), 1264–1274. https://doi.org/10.1109/3dv66043.2025.00120

Exploring Modality Guidance to Enhance VFM-Based Feature Fusion for UDA in 3D Semantic Segmentation

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) / Jun 11, 2025

Spoecklberger, J., Lin, W., Hermosilla, P., Doveh, S., Possegger, H., & Mirza, M. J. (2025). Exploring Modality Guidance to Enhance VFM-Based Feature Fusion for UDA in 3D Semantic Segmentation. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 4789–4798. https://doi.org/10.1109/cvprw67362.2025.00465

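ARXİV SƏNƏDLƏRİNİN MÜHAFİZƏXANALARDA MÜHAFİZƏ QAYDALARI [Rules for Preserving Archival Documents in Storage Facilities]

ADMİU Elmi Əsərlər / Jan 01, 2025

ARXİV SƏNƏDLƏRİNİN MÜHAFİZƏXANALARDA MÜHAFİZƏ QAYDALARI [Rules for Preserving Archival Documents in Storage Facilities]. (2025). ADMİU Elmi Əsərlər. https://doi.org/10.52094/2221-7584.2025.37.31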

Test-Time Adversarial Detection and Robustness for Localizing Humans Using Ultra Wide Band Channel Impulse Responses

2023 31st European Signal Processing Conference (EUSIPCO) / Sep 04, 2023

Kolli, A., Mirza, M. J., Possegger, H., & Bischof, H. (2023). Test-Time Adversarial Detection and Robustness for Localizing Humans Using Ultra Wide Band Channel Impulse Responses. 2023 31st European Signal Processing Conference (EUSIPCO), 1365–1369. https://doi.org/10.23919/eusipco58844.2023.10290092

Sit Back and Relax: Learning to Drive Incrementally in All Weather Conditions

2023 IEEE Intelligent Vehicles Symposium (IV) / Jun 04, 2023

Leitner, S., Mirza, M. J., Lin, W., Micorek, J., Masana, M., Kozinski, M., Possegger, H., & Bischof, H. (2023). Sit Back and Relax: Learning to Drive Incrementally in All Weather Conditions. 2023 IEEE Intelligent Vehicles Symposium (IV), 1–8. https://doi.org/10.1109/iv55152.2023.10186818

Influence Prediction in Collaboration Networks: An Empirical Study on arXiv

Sep 17, 2025

Lin, M., Schaposnik, L. P., & Wu, R. (2025). Influence Prediction in Collaboration Networks: An Empirical Study on arXiv. https://doi.org/10.21203/rs.3.rs-7401473/v1

Evaluation of Spatio-Temporal Small Object Detection in Real-World Adverse Weather Conditions

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) / Feb 28, 2025

Van Lier, M., Van Leeuwen, M., Van Manen, B., Kampmeijer, L., & Boehrer, N. (2025). Evaluation of Spatio-Temporal Small Object Detection in Real-World Adverse Weather Conditions. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 786–797. https://doi.org/10.1109/wacvw65960.2025.00094

DRT: Detection Refinement for Multiple Object Tracking

Proceedings of the British Machine Vision Conference 2021 / Jan 01, 2021

Wang, B., Fruhwirth-Reisinger, C., Possegger, H., Bischof, H., & Cao, G. (2021). DRT: Detection Refinement for Multiple Object Tracking. Proceedings of the British Machine Vision Conference 2021. https://doi.org/10.5244/c.35.43

Detector-Free Weakly Supervised Grounding by Separation

2021 IEEE/CVF International Conference on Computer Vision (ICCV) / Oct 01, 2021

Arbelle, A., Doveh, S., Alfassy, A., Shtok, J., Lev, G., Schwartz, E., Kuehne, H., Levi, H. B., Sattigeri, P., Panda, R., Chen, C.-F., Bronstein, A., Saenko, K., Ullman, S., Giryes, R., Feris, R., & Karlinsky, L. (2021). Detector-Free Weakly Supervised Grounding by Separation. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 1781–1792. https://doi.org/10.1109/iccv48922.2021.00182

Semi-Supervised Audio-Visual Action Recognition with Audio Source Localization Guided Mixup

2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP) / Aug 31, 2025

Kang, S., & Kim, T. (2025). Semi-Supervised Audio-Visual Action Recognition with Audio Source Localization Guided Mixup. 2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. https://doi.org/10.1109/mlsp62443.2025.11204238

Affine calibration from moving objects

Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001

Manning, R. A., & Dyer, C. R. (2001). Affine calibration from moving objects. Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 1, 494–500. https://doi.org/10.1109/iccv.2001.937557

Shape-Biased Texture Agnostic Representations for Improved Textureless and Metallic Object Detection and 6D Pose Estimation

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) / Feb 26, 2025

Hönig, P., Thalhammer, S., Weibel, J.-B., Hirschmanner, M., & Vincze, M. (2025). Shape-Biased Texture Agnostic Representations for Improved Textureless and Metallic Object Detection and 6D Pose Estimation. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 8806–8815. https://doi.org/10.1109/wacv61041.2025.00853

Online Continual Learning of Diffusion Models: Multi-Mode Adaptive Generative Distillation

2025 IEEE International Conference on Image Processing (ICIP) / Sep 14, 2025

Yang, R., Grard, M., Dellandrea, E., & Chen, L. (2025). Online Continual Learning of Diffusion Models: Multi-Mode Adaptive Generative Distillation. 2025 IEEE International Conference on Image Processing (ICIP), 1001–1006. https://doi.org/10.1109/icip55913.2025.11084576

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

Advances in Neural Information Processing Systems 37 / Jan 01, 2024

Hao, Y., Tan, Y., Wang, S., Zhang, H., Zhu, B., & Zhu, X. (2024). Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting. Advances in Neural Information Processing Systems 37, 2001–2025. https://doi.org/10.52202/079017-0064

Education

TU Graz (Graz University of Technology), Austria

Ph.D. in Computer Science (Computer Vision) / 2024

KIT (Karlsruhe Institute of Technology), Germany

M.Sc. in Electrical Engineering and Information Technology (ETIT) / 2020

NUST (National University of Science and Technology), Pakistan

B.Sc. in Electrical Engineering / 2017

Experience

Massachusetts Institute of Technology (MIT)

Postdoctoral Researcher / November, 2024 – December

Leading research on multimodal learning that combines speech, vision, and language for scalable AI systems. Designing and evaluating methods to improve fine-grained reasoning in large language and vision-language models.

Graz University of Technology

Computer Vision Project Assistant / January, 2021 – October, 2024

Developed self-supervised and unsupervised learning techniques to improve neural network robustness to distribution shifts at test time. Conducted extensive research on LLMs and multimodal VLMs, resulting in multiple publications at NeurIPS, ICCV, and CVPR.

Sony AI

Research Scientist Intern / May, 2024 – August, 2024

Designed multimodal learning methods integrating vision, audio, and language signals. Prototyped and evaluated models for cross-modal understanding in real-world scenarios.

Intel Labs

Master Thesis Researcher / January, 2020 – July, 2020

Evaluated robustness of state-of-the-art 2D and 3D object detectors for autonomous driving under adverse weather.

C++ Developer Intern / October, 2019 – December, 2019

Implemented state estimation using an Unscented Kalman Filter in C++ and OpenCV for real-time object tracking.

Intel

Platform Application Engineer Intern / March, 2019 – August, 2019

Built an automation framework, including PCB design and microcontroller integration, to streamline internal workflows.
