SRIF: A Safer Ranking Inference Framework for Diffusion Policy Models without Retraining
Published:

Abstract
Recently, diffusion policy models have been applied in robotics. Most existing methods feed all observations to the diffusion model as conditioning inputs, and the denoising process gradually turns Gaussian noise into an action sequence. However, because these constraints enter the diffusion model only implicitly, the safety of the generated action sequences cannot be guaranteed. Moreover, diffusion policy models struggle to handle dangerous situations that are absent from the training data, and their generation process is difficult to control. We therefore propose a framework that requires no retraining. To the best of our knowledge, we are the first in robotics to attempt to transform diffusion models into ranking models, which combine naturally with other constraints. Specifically, to obtain a diffusion ranking, we use the conditional density estimate provided by the diffusion model to rank candidate action sequences relative to one another. A constraint ranking can be obtained from other information, for example by restricting action sequences to a safe space. Finally, the action sequence with the highest fused ranking is executed. We validate the proposed method on a robot visual exploration task in both simulated and real environments. Experimental results show that the method inherits the capabilities of the diffusion model and generalizes to unknown environments without retraining. Moreover, when combined with the constraints provided by a local occupancy map, it surpasses state-of-the-art visual exploration methods, achieving the highest exploration success rate and exploration distance across twelve challenging scenarios.
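The ranking-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`to_ranks`, `safety_scores`, `select_action`), the sum-of-ranks fusion rule, the `weight` parameter, and the occupancy-grid layout are all assumptions made for illustration. The per-candidate density scores are assumed to have been computed already from the diffusion model, with higher values meaning more likely.

```python
import numpy as np

def to_ranks(scores):
    """Convert scores to ranks: rank 0 is the highest score; ties keep index order."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks

def safety_scores(candidates, occupancy, cell_size=1.0):
    """Fraction of each candidate's waypoints lying in free, in-bounds grid cells.

    Assumed layout: occupancy[row, col] == 0 means free, waypoints are (row, col)
    positions in the same metric frame as the grid.
    """
    scores = []
    for traj in candidates:
        cells = np.floor(np.asarray(traj) / cell_size).astype(int)
        rows, cols = cells[:, 0], cells[:, 1]
        in_bounds = (
            (rows >= 0) & (rows < occupancy.shape[0])
            & (cols >= 0) & (cols < occupancy.shape[1])
        )
        free = np.zeros(len(cells), dtype=bool)  # out-of-bounds counts as unsafe
        free[in_bounds] = occupancy[rows[in_bounds], cols[in_bounds]] == 0
        scores.append(free.mean())
    return np.array(scores)

def select_action(candidates, density_scores, occupancy, weight=1.0):
    """Fuse the two rankings by a weighted sum of ranks (an assumed fusion rule)."""
    diffusion_rank = to_ranks(density_scores)
    constraint_rank = to_ranks(safety_scores(candidates, occupancy))
    fused = diffusion_rank + weight * constraint_rank  # lower is better
    return int(np.argmin(fused)), fused

# Toy example: a 3x3 local map with one occupied cell in the centre.
occupancy = np.zeros((3, 3), dtype=int)
occupancy[1, 1] = 1
candidates = np.array([
    [[0.5, 0.5], [1.5, 1.5]],  # passes through the occupied cell
    [[0.5, 0.5], [0.5, 1.5]],  # stays in free space
    [[0.5, 0.5], [2.5, 2.5]],  # stays in free space
])
density_scores = [0.9, 0.5, 0.1]  # hypothetical values, not from the paper
best, fused = select_action(candidates, density_scores, occupancy)
print(best, fused)
```

In this toy case the candidate that the density scores favour most crosses an occupied cell, so the fused ranking selects the second candidate instead. Because the constraint enters only at ranking time, the diffusion model itself is left untouched, which is the sense in which no retraining is needed.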
Reference
Zengmao Wang, Qifei Tang, Wei Gao*, Shuhan Shen, SRIF: A Safer Ranking Inference Framework for Diffusion Policy Models without Retraining, IEEE Robotics and Automation Letters, 2025. DOI
