Created on May 01, 2025
2025
"RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals" was accepted to ICML 2025.