Cross-Code and Program Translation
Translating code across programming languages with a focus on safety and correctness.
Overview
Migrating legacy codebases to modern, safer languages is a pressing need across industry and government. Our research develops techniques for automatically translating code between programming languages, with particular emphasis on C-to-Rust translation, where memory-safety guarantees are at stake.
We study both the capabilities and pitfalls of ML-based translation, building neurosymbolic systems that combine the flexibility of neural models with the rigor of formal specifications.
Key Directions
- C-to-Rust Translation: Neurosymbolic techniques that produce safer Rust code from C projects, preserving functionality while gaining memory safety.
- Multi-modal Specifications: Generating specifications in multiple modalities (types, contracts, tests) to guide and validate translations.
- Translation Quality: Empirical studies measuring the correctness and idiomaticity of machine-translated code.
- Unsupervised Translation: Back-translation techniques that enable code translation without parallel training corpora.
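To make the first two directions concrete, here is a minimal sketch of what "safer Rust" can mean for a single C routine, and how a test can act as a lightweight specification for the translation. It is an illustration only, not output from any of our tools: the function names `sum_raw`, `sum_safe`, and `agree_on` are hypothetical, and the use of wrapping arithmetic is an assumption made so that both versions have defined behavior on overflow.

```rust
// Hypothetical C original, shown for reference:
//
//     int sum(const int *xs, size_t n) {
//         int s = 0;
//         for (size_t i = 0; i < n; i++) s += xs[i];
//         return s;
//     }

/// A literal, pointer-for-pointer transliteration. It compiles as Rust but
/// keeps the C contract: the caller must guarantee that `xs` points to at
/// least `n` readable `i32` values.
pub unsafe fn sum_raw(xs: *const i32, n: usize) -> i32 {
    let mut s: i32 = 0;
    let mut i: usize = 0;
    while i < n {
        // Raw pointer arithmetic and dereference: nothing checks the bounds.
        s = s.wrapping_add(unsafe { *xs.add(i) });
        i += 1;
    }
    s
}

/// A safer rewrite. The slice type carries its own length, so the
/// "pointer plus count" contract is enforced by the type system.
pub fn sum_safe(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// A test-style specification: on any given input, the safe version must
/// return the same result as the literal transliteration.
pub fn agree_on(xs: &[i32]) -> bool {
    // Sound here because the pointer and length come from a valid slice.
    let raw = unsafe { sum_raw(xs.as_ptr(), xs.len()) };
    raw == sum_safe(xs)
}
```

Calling `agree_on` over a set of representative inputs (including the empty slice and values near `i32::MAX`) gives a simple equivalence check between the two versions. Checks of this kind only sample behavior; types and contracts are complementary modalities that constrain it more broadly.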
Impact
C2SaferRust is among the first systems to translate real-world C projects into safer Rust using neurosymbolic techniques, addressing a key need in the software security community. Our empirical studies on ML-based translation quality have helped set realistic expectations for automated migration tools.
ARiSE Lab