Trustworthy and Robust AI Systems

Investigating the reliability, safety, and robustness of AI systems in software engineering and beyond.

Overview

AI systems are increasingly deployed in safety-critical settings, from autonomous vehicles to software development tools. Our research studies how to make these systems more robust, how to detect and repair their failures, and how to understand the security risks introduced by the AI toolchain itself.

The research spans DNN testing, adversarial robustness, autonomous driving simulation, compiler-level attacks on ML models, and causal approaches to system configuration.

Key Directions

  • DNN Testing & Repair: Systematic methods for finding confusion and bias errors in neural networks, and techniques for repairing them.
  • Autonomous Driving: Simulation-based testing, traffic generation, and scenario-based evaluation for self-driving systems.
  • Adversarial Robustness: Metric learning and multitask approaches that strengthen model robustness under adversarial and natural variations.
  • ML Toolchain Security: Discovering and exploiting vulnerabilities in deep learning compilers and infrastructure.
  • Causal Configuration: Using causal reasoning to understand and optimize the performance of configurable systems.

Impact

DeepTest was among the first works on automated testing of autonomous driving systems and has been widely cited. Our work on compiler backdoors, accepted at IEEE S&P 2026, reveals a new attack surface in the ML supply chain. The Unicorn and Cameo systems brought causal reasoning to performance optimization of configurable systems.

Contributors

Baishakhi Ray, Simin Chen, Ira Ceka, Ziyuan Zhong, Yuchi Tian, Rahul Krishna, Saikat Chakraborty

Selected Publications

Trustworthy AI Software Engineers

A Aleti, B Ray, R Hoda, S Chen · Preprint, 2026

Your compiler is backdooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers

S Chen, J Peng, Y He, J Yang, B Ray · IEEE Symposium on Security and Privacy (S&P) 2026

DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination

S Chen, P Pusarla, B Ray · ICML 2025

Towards causal deep learning for vulnerability detection

MM Rahman, I Ceka, C Mao, S Chakraborty, B Ray, W Le · ICSE 2024

Language-guided traffic simulation via scene-level diffusion

Z Zhong, D Rempe, Y Chen, B Ivanovic, Y Cao, D Xu, M Pavone, B Ray · Conference on Robot Learning (CoRL) 2023, 144-177

Guided conditional diffusion for controllable traffic simulation

Z Zhong, D Rempe, D Xu, Y Chen, S Veer, T Che, B Ray, M Pavone · ICRA 2023, 3560-3566

Cameo: A causal transfer learning approach for performance optimization of configurable computer systems

MS Iqbal, Z Zhong, I Ahmad, B Ray, P Jamshidi · ACM Symposium on Cloud Computing 2023, 555-571

Neural network guided evolutionary fuzzing for finding traffic violations of autonomous vehicles

Z Zhong, G Kaiser, B Ray · IEEE Transactions on Software Engineering 49(4), 2023, 1860-1875

Detecting multi-sensor fusion errors in advanced driver-assistance systems

Z Zhong, Z Hu, S Guo, X Zhang, Z Zhong, B Ray · ISSTA 2022

Automatic map generation for autonomous driving system testing

Y Tang, Y Zhou, K Yang, Z Zhong, B Ray, Y Liu, P Zhang, J Chen · Preprint, 2022

Unicorn: Reasoning about configurable system performance through the lens of causality

MS Iqbal, R Krishna, MA Javidian, B Ray, P Jamshidi · EuroSys 2022, 199-217

Repairing Group-Level Errors for DNNs Using Weighted Regularization

Z Zhong, Y Tian, CJ Sweeney, V Ordonez, B Ray · Preprint, 2022
