
Learning action models is important when goals change. When an agent has acted for a while, it can
use its accumulated knowledge about actions in the domain to make better decisions. Thus, learning
action models differs from Reinforcement Learning: it enables reasoning about actions instead of
expensive trials in the world.
Learning actions’ effects and preconditions is difficult in partially observable domains. The
difficulty stems from the absence of useful conditional independence structures in such domains.
Most fully observable domains include such structures, e.g., the Markov property (independence of
the state at time t+1 from the state at time t−1, given the (observed) state at time t). These are
fundamental to tractable solutions of learning and decision making.
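In standard notation, writing s_t for the (observed) world state at time t, the Markov property above reads

P(s_{t+1} | s_t, s_{t-1}, ..., s_0) = P(s_{t+1} | s_t),

i.e., given the state at time t, the earlier history carries no additional information about the state at time t+1.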
In partially observable domains those structures fail (e.g., the state of the world at time t+1
depends on the state at time t−1 because we do not observe the state at time t), and complex
approximate approaches are the only feasible path. For these reasons, much work so far has been
limited to fully observable domains (e.g., Wang, 1995; Pasula, Zettlemoyer, & Kaelbling, 2004),
hill-climbing (EM) approaches that have unbounded error in deterministic domains (e.g., Ghahramani,
2001; Boyen, Friedman, & Koller, 1999), and approximate action models (Dawsey, Minsker,
& Amir, 2007; Hill, Minsker, & Amir, 2007; Kuffner & LaValle, 2000; Thrun, 2003).
This paper examines the application of an old-new structure to learning in partially observable
domains, namely, determinism and logical formulation. It focuses on some such deterministic
domains in which tractable learning is feasible, and shows that a traditional assumption about the form
of determinism (the STRIPS assumption, generalized to ADL, Pednault, 1989) leads to tractable
learning and state estimation. Learning in such domains has immediate applications (e.g., exploration
by planning, Shahaf, Chang, & Amir, 2006; Chang & Amir, 2006), and it can also serve as
the basis for learning in stochastic domains. Thus, a fundamental advance in the application of
such a structure is important for opening the field to new approaches of broader applicability. The
following details the technical aspects of our advance.
The main contribution of this paper is an approach called SLAF (Simultaneous Learning and
Filtering) for exact learning of action models in partially observable deterministic domains. This
approach determines the set of possible transition relations, given an execution sequence of actions
and partial observations. For example, the input could come from watching another agent act or
from watching the results of our own actions’ execution. The approach is online, and updates a
propositional logical formula called a transition belief formula. This formula represents the possible
transition relations and world states at every time step. In this way, it is similar in spirit to Bayesian
learning of HMMs (e.g., Ghahramani, 2001) and Logical Filtering (Amir & Russell, 2003).
The algorithms that we present differ in their range of applicability and their computational
complexity. First, we present a deduction-based algorithm that is applicable to any nondeterministic
learning problem, but that takes time that is worst-case exponential in the number of domain fluents.
Then, we present algorithms that update a logical encoding of all consistent transition relations in
polynomial time per step, but that are limited in applicability to special classes of deterministic
actions.
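To make the contrast concrete, the following is a minimal, illustrative sketch (in Python) of exact simultaneous learning and filtering by brute-force enumeration. It is not the paper's algorithm or representation: the two-fluent toy domain, the fluent and action names, and the helper functions are hypothetical. The sketch maintains the set of <state, transition-model> pairs that remain consistent with an executed action and a partial observation, and it makes explicit why an unfactored, enumerative representation grows exponentially with the number of domain fluents.

from itertools import product

FLUENTS = ("locked", "open")   # hypothetical toy domain
ACTIONS = ("unlock", "push")

def all_states():
    # a state is the frozenset of fluents that hold in it
    return [frozenset(f for f, v in zip(FLUENTS, bits) if v)
            for bits in product((False, True), repeat=len(FLUENTS))]

def all_deterministic_models():
    # a model maps (state, action) -> successor state; the number of such
    # models is |states|^(|states| * |actions|), so exhaustive enumeration
    # is feasible only for toy domains
    states = all_states()
    keys = [(s, a) for s in states for a in ACTIONS]
    for successors in product(states, repeat=len(keys)):
        yield dict(zip(keys, successors))

def slaf_step(belief, action, observation):
    # one update step: progress every <state, model> pair through `action`,
    # then keep only the pairs whose successor state satisfies `observation`
    # (here, a set of fluents observed to be true)
    new_belief = []
    for state, model in belief:
        successor = model[(state, action)]
        if observation <= successor:
            new_belief.append((successor, model))
    return new_belief

# Start fully ignorant of both the world state and the transition model,
# then filter on a single executed action and a partial observation.
belief = [(s, m) for m in all_deterministic_models() for s in all_states()]
belief = slaf_step(belief, "unlock", frozenset({"open"}))
print(len(belief), "consistent <state, model> pairs remain")

The transition belief formula described above plays the role of the belief set in this sketch, but it is represented and updated symbolically rather than by enumeration.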
One set of polynomial-time algorithms that we present applies to action-learning scenarios in
which actions are ADL (Pednault, 1989) (with no conditional effects) and one of the following
holds: (a) the preconditions of the actions are already known, and we observe action failures (e.g.,
when we perform actions in the domain), or (b) action execution always succeeds (e.g., when an
expert or tutor performs actions).