Trustworthy and Robust AI Systems
Investigating the reliability, safety, and robustness of AI systems in software engineering and beyond.
Overview
AI systems are increasingly deployed in safety-critical settings, from autonomous vehicles to software development tools. Our research studies how to make these systems more robust, how to detect and repair their failures, and how to understand the security risks introduced by the AI toolchain itself.
Our work spans DNN testing, adversarial robustness, autonomous driving simulation, compiler-level attacks on ML models, and causal approaches to system configuration.
Key Directions
- DNN Testing & Repair: Systematic methods for finding confusion and bias errors in neural networks, and techniques for repairing them.
- Autonomous Driving: Simulation-based testing, traffic generation, and scenario-based evaluation for self-driving systems.
- Adversarial Robustness: Metric learning and multitask approaches that strengthen model robustness under adversarial and natural variations.
- ML Toolchain Security: Discovering and exploiting vulnerabilities in deep learning compilers and infrastructure.
- Causal Configuration: Using causal reasoning to understand and optimize the performance of configurable systems.
Impact
DeepTest pioneered automated testing of autonomous driving systems and has been highly cited. Our work on compiler backdoors, accepted at IEEE S&P 2026, revealed a new attack surface in the ML supply chain. The Unicorn and Cameo systems brought causal reasoning to system performance optimization.
Contributors
Baishakhi Ray
Simin Chen
Ira Ceka
Ziyuan Zhong
Yuchi Tian
Rahul Krishna
Saikat Chakraborty
Selected Publications
Trustworthy AI Software Engineers
A Aleti, B Ray, R Hoda, S Chen · Preprint, 2026
Your compiler is backdooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers
S Chen, J Peng, Y He, J Yang, B Ray · IEEE Symposium on Security and Privacy (S&P) 2026
DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
S Chen, P Pusarla, B Ray · ICML 2025
Towards causal deep learning for vulnerability detection
MM Rahman, I Ceka, C Mao, S Chakraborty, B Ray, W Le · ICSE 2024
Language-guided traffic simulation via scene-level diffusion
Z Zhong, D Rempe, Y Chen, B Ivanovic, Y Cao, D Xu, M Pavone, B Ray · Conference on Robot Learning (CoRL) 2023, 144-177
Guided conditional diffusion for controllable traffic simulation
Z Zhong, D Rempe, D Xu, Y Chen, S Veer, T Che, B Ray, M Pavone · ICRA 2023, 3560-3566
Cameo: A causal transfer learning approach for performance optimization of configurable computer systems
MS Iqbal, Z Zhong, I Ahmad, B Ray, P Jamshidi · ACM Symposium on Cloud Computing 2023, 555-571
Neural network guided evolutionary fuzzing for finding traffic violations of autonomous vehicles
Z Zhong, G Kaiser, B Ray · IEEE Transactions on Software Engineering 49(4), 1860-1875, 2023
Detecting multi-sensor fusion errors in advanced driver-assistance systems
Z Zhong, Z Hu, S Guo, X Zhang, Z Zhong, B Ray · ISSTA 2022
Automatic map generation for autonomous driving system testing
Y Tang, Y Zhou, K Yang, Z Zhong, B Ray, Y Liu, P Zhang, J Chen · Preprint, 2022
Unicorn: Reasoning about configurable system performance through the lens of causality
MS Iqbal, R Krishna, MA Javidian, B Ray, P Jamshidi · EuroSys 2022, 199-217
Repairing Group-Level Errors for DNNs Using Weighted Regularization
Z Zhong, Y Tian, CJ Sweeney, V Ordonez, B Ray · Preprint, 2022