Recent systems incorporate deep reinforcement learning (DRL) for policy learning (Huang et al., 2021) and symbolic planners for interpretability (Miller & Zhou, 2023). However, many suffer from latency, among other limitations, when scaling to complex, multi-object scenarios; these issues are directly addressed by Polanski's later work.
The architecture allows (Polanski et al., 2022) while preserving the adaptability of DRL.
\[ u = \pi_\theta(s, \phi(o)) \]
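To make the composition in this expression concrete, the following is a minimal sketch rather than the authors' implementation. The section does not define the symbols, so the reading here is an assumption: s is taken to be a state vector, o a raw observation, \phi an encoder that maps o to features, and \pi_\theta a parameterized policy that outputs the control u. The linear maps, the dimensions, and the names `phi`, `pi_theta`, `W_enc`, and `theta` are all hypothetical choices for illustration.

```python
import math
import random


def phi(o, W_enc):
    """Hypothetical encoder phi: linear map of the observation o, squashed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, o))) for row in W_enc]


def pi_theta(s, z, theta):
    """Hypothetical deterministic policy: linear map of the concatenation [s, z]."""
    x = list(s) + list(z)
    return [sum(w * xi for w, xi in zip(row, x)) for row in theta]


# Assumed dimensions, chosen only for illustration.
state_dim, obs_dim, feat_dim, action_dim = 3, 5, 2, 2

# Randomly initialized parameters stand in for learned weights.
rng = random.Random(0)
W_enc = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(feat_dim)]
theta = [[rng.uniform(-1, 1) for _ in range(state_dim + feat_dim)]
         for _ in range(action_dim)]

s = [0.1, -0.2, 0.3]             # example state
o = [1.0, 0.0, -1.0, 0.5, 0.25]  # example raw observation

# u = pi_theta(s, phi(o)): encode the observation, then query the policy.
u = pi_theta(s, phi(o, W_enc), theta)
print(u)
```

Keeping the encoder separate from the policy mirrors the structure of the expression: the policy sees the observation only through \phi(o), so in principle the encoder could be replaced without changing the policy's interface.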