Classical planning is one of the earliest subareas of AI, in which domain knowledge is leveraged to perform long-horizon reasoning and solve complex planning problems. Despite these impressive successes, one limitation of classical planners is the assumption that state transitions take place in closed worlds, which makes them less robust to unforeseen situations in open worlds. Classical planners also assume that the current world state is provided beforehand, which can be unrealistic in practice. To address these two limitations, we propose a novel framework, called DKPrompt, that visually grounds a classical planner through a vision-language model (VLM) for open-world planning. A unique feature of DKPrompt is its use of the action description knowledge of classical planners to tailor VLM prompts before and after each action, equipping classical planners with active perception and situational awareness. Results from quantitative experiments show that DKPrompt outperforms naive classical planners, pure VLM-based planners, and several other competitive baselines in task completion rate.
An overview of DKPrompt. By querying the robot's current observation against the domain knowledge~(i.e., action preconditions and effects) as VQA tasks, DKPrompt can call the classical planner to generate a new valid plan from the updated world state. Note that DKPrompt queries only about predicates. The left shows how DKPrompt checks every precondition of the action to be executed next; the right shows how it verifies that all expected action effects hold after execution. When any precondition or effect is unsatisfied, the planner's world state is updated and replanning is triggered.
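The check-and-replan loop described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: `vqa` stands in for a real VLM answering yes/no predicate questions from an image, `planner` stands in for a real classical (e.g., PDDL) planner, and the world, actions, and predicate names are all hypothetical.

```python
# Toy world state: predicate name -> truth value. In the real system, truth
# values come from VLM queries over camera observations, not a dictionary.
world = {"door_open": False, "holding_cup": False, "cup_on_table": False}

# Hypothetical action descriptions with preconditions ("pre") and effects ("eff").
actions = {
    "open_door": {"pre": [], "eff": ["door_open"]},
    "pick_cup": {"pre": ["door_open"], "eff": ["holding_cup"]},
    "place_cup": {"pre": ["holding_cup"], "eff": ["cup_on_table"]},
}


def vqa(obs, predicate):
    # Stub for one VQA query: "Is <predicate> true in the current observation?"
    # A real system would prompt a vision-language model here.
    return obs[predicate]


def execute(name):
    # Stub action execution: directly applies the action's effects to the world.
    for e in actions[name]["eff"]:
        world[e] = True


def planner(state, goal):
    # Stub classical planner: returns the fixed action sequence, skipping
    # actions whose effects already hold in the (updated) state.
    order = ["open_door", "pick_cup", "place_cup"]
    return [a for a in order
            if not all(state.get(e, False) for e in actions[a]["eff"])]


def dkprompt(goal):
    plan = planner(dict(world), goal)
    log = []
    while plan:
        a = plan[0]
        # Before execution: verify every precondition with a VQA query.
        if not all(vqa(world, p) for p in actions[a]["pre"]):
            plan = planner(dict(world), goal)  # replan with updated state
            continue
        execute(a)
        log.append(a)
        # After execution: verify every expected effect with a VQA query.
        if all(vqa(world, e) for e in actions[a]["eff"]):
            plan = plan[1:]
        else:
            plan = planner(dict(world), goal)  # replan with updated state
    return log
```

In this toy run, `dkprompt(["cup_on_table"])` executes the three actions in order; the key point is the structure of the loop, where a precondition or effect that the VQA check reports as unsatisfied sends control back to the planner with the updated state.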