AI Magazine Volume 20 Number 2 (1999) (© AAAI)
Copyright © 1999, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1999 / $2.00

Decision-Theoretic Planning

Jim Blythe

■ The recent advances in computer speed and algorithms for probabilistic inference have led to a resurgence of work on planning under uncertainty. The aim is to design AI planners for environments where there might be incomplete or faulty information, where actions might not always have the same results, and where there might be trade-offs between the different possible outcomes of a plan. Addressing uncertainty in AI planning algorithms will greatly increase the range of potential applications, but there is plenty of work to be done before we see practical decision-theoretic planning systems. This article outlines some of the challenges that need to be overcome and surveys some of the recent work in the area.
A planning problem in AI is usually specified as follows: Given a description of the current state of some system, a set of actions that can be performed on the system, and a description of a goal set of states for the system, find a sequence of actions that can be performed to transform the system into one of the goal states. In the textbook example of the blocks world, the system consists of a set of named blocks, each of which can be on the ground or on top of another block. The possible actions move a block from one location to another, and the goal is a particular configuration of the blocks.
When planning programs are used to provide courses of action in real-world settings, such as medical treatments, high-level robot control, or disaster relief, they must take into account the fact that actions can have several different outcomes, some of which might be more desirable than others. They must balance the potential of some plan achieving a goal state against the risk of producing an undesirable state and against the cost of performing the plan. In problems where actions can lead to a number of different possible outcomes, or where the benefits of executing a plan must be weighed against the costs, the framework of decision theory can be used to compare alternative plans.

Decision theory (Luce and Raiffa 1957) provides an attractive framework for weighing the strengths and weaknesses of a particular course of action, with roots in probability theory and utility theory. Given a probability distribution over the possible outcomes of an action in any state, and a reasonable preference function over outcomes, we can define a utility function over outcomes such that whenever one would prefer one plan over another, the preferred plan has higher expected utility. The task of the planner then seems straightforward: find the plan with the maximum expected utility (MEU).
Unfortunately, decision theory says nothing about the task of constructing a plan with high expected utility. Therefore, AI planning and decision theory would appear to be complementary, and there has been interest in merging the two approaches for a considerable time (Feldman and Sproull 1977). However, there are hard problems to overcome. It is usually not feasible to search the entire space of plans to find the MEU plan. Indeed, computing the expected utility of a single plan can be prohibitively expensive because the number of possible outcomes can grow very large. To avoid specifying the value of each possible state, we need to find compact specifications of utilities, as well as actions, and consider the interaction between them. The recent growth of research in decision-theoretic planning follows the recent success of work in reasoning under uncertainty and draws on compact representations such as those of belief nets (see the article by D'Ambrosio, also in this issue).

The AI community has maintained an interest in decision-theoretic planning, with Feldman and Sproull's (1977) work being one of the earliest efforts in this area. Although much of the work that followed focused on efficient algorithms for planning under quite restrictive assumptions, the past five years have seen a resurgence in work on planning under uncertainty and decision-theoretic planning for several reasons: The recent advances in Bayesian inference have made it feasible to compute the expected utility of a plan under conditions of uncertainty. The success of Markovian approaches in areas such as speech recognition and the closely related reinforcement-learning techniques has encouraged work in planning using Markov decision processes (MDPs). Faster computers have made it feasible to build decision-theoretic planners, which in general require considerably more computation than their classical counterparts.

In this article, I provide a brief overview of some of the recent work in decision-theoretic planning. I highlight some of the advances made and some technical problems that still lie ahead. In the next section, I describe the problem addressed by decision-theoretic planning. The sections that follow describe work based on classical planning algorithms and the use of Markov decision processes.
Since the early planners GPS (Newell and Simon 1963) and STRIPS (Fikes and Nilsson 1971), most planners that have been designed make essentially the same assumptions about the world in which they operate. From the point of view of decision-theoretic planning, three assumptions of note are (1) the goal of the planner is a logical description of a world state, (2) the actions taken by the planner are the only sources of change in the world, and (3) each action can be described by the conditions under which it can be applied and its effects on the world. These effects are described by a set of facts that are either to be added to, or deleted from, the world state in which the action is applied. The last is known as the STRIPS assumption. I refer to planners that share these assumptions as the classical planners. They include PRODIGY (Veloso et al. 1995), UCPOP (Penberthy and Weld 1992), GRAPHPLAN (Blum and Furst 1997), and the hierarchical task network planner NONLIN (Tate 1977).

Here, I describe work that relaxes one or more of these assumptions in a classical planner. First, we can relax the assumption that actions are deterministic by specifying a set of possible outcomes rather than the single outcome in the deterministic case. Consider the planning problem of moving my kitchen china from an old apartment to a new one. The goal is to move the china unbroken, specified as (at china new-apartment) ∧ ¬(broken china). Figure 1 shows three increasingly detailed descriptions of the drive-china action that can achieve this goal. The first is deterministic and could be used in a plan that achieves the goal with certainty: (put-in china), (drive-china …), using the extra operators in figure 2. The second drive-china operator has two alternative sets of effects, modeling two different world states that can arise if the operator is applied. The numbers on the arcs are the probabilities of each state of affairs. Under this model, the two-step plan will only achieve the goal with probability 0.7. In the third formulation of drive-china, different sets of outcomes are possible based on the conditions that hold when the operator is applied, and the arcs now show conditional probabilities. The two-step plan still has a probability of 0.7 of succeeding. The three-step plan (pack china), (put-in china), (drive-china …) succeeds with probability 0.95.

Figure 1. Three Versions of the Operator Drive-China. The third version has nondeterministic effects with conditional probabilities.

Second, we can relax the assumption that goals are logical descriptions of world states, so that a plan either succeeds or fails. We can attach utilities to different world states representing degrees of desirability. For example, suppose the state in which all my china is moved unbroken has a utility of 10, and one in which half my china is moved and none is broken has a utility of 6; any state in which china is broken has a utility of –100; and any other state has a utility of 0. If the pack operator can only pack half the china, the 2-step plan has an expected utility of –23, and the 3-step plan has an expected utility of 1. In this case, the plan to pack some of the china is slightly better than doing nothing, which has a utility of 0, and the plan to simply move the china without packing it is much worse.
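To see where the figure of –23 comes from, note that without packing, drive-china delivers the china unbroken with probability 0.7 (a state of utility 10) and breaks it with probability 0.3 (utility –100), so the expected utility of the two-step plan is

0.7 × 10 + 0.3 × (–100) = 7 – 30 = –23,

which is indeed worse than the utility of 0 for doing nothing.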
In the next four subsections, I describe four approaches to planning under uncertainty that can be viewed as extending a classical planning approach. These are SNLP-based planners, DRIPS, WEAVER, and MAXPLAN, although the last of these is arguably not a classical planner.
Organizing the article around specific research programs makes it easier to keep in mind the specific assumptions behind each one, at the cost of making it harder to follow how they address some of the issues and challenges that must be faced in creating decision-theoretic planners. In this section, I briefly describe some of these issues as a way of forming a road map for understanding the different planners.

As the china-moving example shows, to represent uncertain actions, one has to represent several alternative outcomes and their probabilities, conditional on the state. Providing a richer representation of action inevitably makes the planning problem harder, and planners must use a representation that expresses the different outcomes as concisely as possible yet is still fully expressive. The representation of the final drive-china operator is not very concise, for example, because the four possible outcomes are described explicitly, even though the two branches differ only in their probabilities, and each outcome differs only on whether the china is broken. When different state variables are affected independently by an action, it is possible for the number of outcomes to grow exponentially in the number of variables.

Several systems based on classical planning ignore outcome utility beyond the binary measure of goal achievement, although exceptions include PYRRHUS (Williamson and Hanks 1994). In planners based on Markov decision processes, discussed later in this article, a numeric reward function is specified for an action in a state. Although this is adequate from a purely decision-theoretic point of view, a more structured representation of utility is needed to make planning tractable, one in which the utility of a state is built up from smaller building blocks so that it is less tedious to specify and the utility of abstract states can be estimated. The problem of finding such a representation has been studied by several researchers (Haddawy and Hanks 1998a).

Some planners relax the binary measure of goal achievement to allow partial satisfaction of goals. In the china-moving example, the full goal is to move all the china unbroken, but we might attach partial credit for moving some of it unbroken. Given a conjunctive set of goals, sometimes the utility from solving a subset can be specified as the sum of some utility for each goal. Care must be taken with this approach; for example, in the china-moving example, the top-level goals are (at china new-apartment) and (not (broken china)), but a planner that achieves the first goal without the second should not receive half the credit. Sometimes the utility of an outcome can be made more modular by summing the cost of each operator independently (Perez and Carbonell 1994). Williamson and Hanks (1994) describe a utility model that further assumes a deadline by which a goal should ideally be achieved, with a separate penalty function that offsets the utility for the goal when it is achieved after the deadline. A representation that encompasses most of these approaches independently of any planner is proposed in Haddawy and Hanks (1998a).

Whether outcome utility is multivalued or simply binary, one must compute or estimate the expected utility of a plan. For the degenerate case, this is simply the probability of success. There are several different ways to compute or estimate utility, as we will see, and none completely dominates any other. For the problem of deciding which partial plan to expand, all that is required is a comparative value (deciding which plan has higher expected utility), which is sometimes simpler than finding the absolute value. For efficient planning, one must also consider the trade-off between time spent estimating plan value and time spent expanding a plan whose value turns out to be low.

Most of these systems distinguish between observable and nonobservable domain variables. A classical STRIPS plan essentially assumes no observability because the plan is a sequence of actions executed blind. In normal MDPs, the output is a policy, a mapping from state to action, which assumes that every state variable can be observed without error, although some work that relaxes this assumption is described later. In the BURIDAN planner (Kushmerick, Hanks, and Weld 1995), actions can have sensing results, as well as effects in the world, and noisy sensors can be modeled.

In an uncertain world, high-utility plans created in advance might need to have some actions that are conditional on future observations. Most of the planners discussed are capable of creating such plans, employing a variety of techniques. Note that planning agents must be able to make observations of their domain during execution to use conditional plans.

Figure 2. Other Operators in the Moving-House Example.

Perhaps the largest question mark hanging over the planners described here is their tractability. Few have been applied in large, high-dimensional domains, although exceptions include DRIPS (Haddawy, Doan, and Kahn 1996) and WEAVER (Blythe 1998). In addition, it is rare that the planners are directly compared with each other, although MAXPLAN is an exception (Majercik and Littman 1998). The lack of comparative analysis is partly because, as we will see, the planners have concentrated on different aspects of the planning problem. The tractability of the planners is affected by the kind of planning algorithm used: SNLP, PRODIGY, refinement planning, hierarchical task network (HTN) planning, and compilation to satisfiability have all been used. It is also affected by the use of search control, which has been studied in several of these planners.
SNLP maintains a plan as a partially ordered set of actions, along with ordering constraints between steps, bindings for variables in steps, and explicit data structures, called causal links, that represent what the step is supposed to accomplish in the plan (Weld 1994). A causal link records the subgoal being achieved, the consumer step that has this subgoal as a precondition, and the provider step that has the subgoal as an effect. For example, the arrow between drive-china and goal in figure 3 is a causal link for the subgoal (at china new-apartment), with drive-china as provider and goal as consumer. Goal is a dummy step added to the plan to represent the top-level goals. Similarly, initial is a dummy step representing the initial state. The SNLP algorithm follows the loop shown in figure 4. It repeatedly picks an unsupported condition in the plan, adds a causal link supporting it from some step that achieves the condition, and resolves threats to the link by forcing any step that negates the condition to take place either before or after both linked steps. By considering all possible choices for causal links and threat orderings, SNLP is complete; that is, it will find a plan if one exists. For more details, see Weld (1994).

Figure 3. A Plan to Move China, as Found by SNLP.

BURIDAN (Kushmerick, Hanks, and Weld 1995) is a modified version of the SNLP planning algorithm that can create plans that meet a threshold probability of success when actions are nondeterministic, as are the last two actions in figure 1. BURIDAN differs from SNLP by allowing more than one causal link for each condition in the plan. Under different execution scenarios, different actions can cause the condition to be true, so the links combine to increase support for the condition. SNLP's termination criterion that all conditions have exactly one link is no longer appropriate. Instead, BURIDAN explicitly computes the probability of the success of the plan and terminates if it is above the given threshold. Figure 5 shows a plan found by BURIDAN in the china-moving example if the third model of action is used. The china is unbroken in the initial state, but with probability 0.3, executing drive-china can lead to the china being broken. This step is therefore a threat to the link for (broken china) from Initial to Goal. In addition to SNLP's reorderings to remove threats, BURIDAN can confront a threat by decreasing the probability of an outcome in which the threatened condition is negated. In this case, this is done by adding a link from the outcome with probability 0.95 that does not add (broken china) and planning for its trigger (packed china). BURIDAN can then find the plan [pack-china, put-in china, drive-china], which succeeds with probability 0.95.

To this point, we haven't discussed how a planner finds the probability that a plan succeeds. A number of possible strategies are based on forward projection or on the analysis of the dependencies among state descriptors. Empirical experiments show no clear winner among them.
■ If the current plan has no unsupported conditions, return it.
■ Pick an unsupported condition P for some step U and add a causal link to the plan that supports it, achieving the condition with either an existing or a new step, A. (On backtracking, try all possible links.)
■ Resolve any threats to the new link. A threat comes from any step T in the plan that can take place between A and U and can negate P. Resolve the threat by ordering T either before A or after U if possible (on backtracking, try both orderings). If any threat cannot be resolved, fail and backtrack.

Figure 4. An Outline of the SNLP Algorithm.
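The outline in figure 4 can be made concrete with a small sketch of the data structures involved. The following Python fragment is purely illustrative: the step and condition names echo the china-moving example, but the encoding, and the omission of variable bindings and backtracking, are my own simplifications rather than any particular SNLP implementation.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter, CycleError

@dataclass(frozen=True)
class Step:
    name: str
    preconditions: frozenset
    adds: frozenset
    deletes: frozenset

@dataclass(frozen=True)
class CausalLink:
    provider: Step      # step whose effect supplies the condition
    condition: str      # the protected subgoal
    consumer: Step      # step whose precondition is being supported

def threatens(step, link):
    """A third step threatens a link if it can delete the protected condition."""
    return step not in (link.provider, link.consumer) and link.condition in step.deletes

def consistent(orderings):
    """True if the set of (before, after) constraints contains no cycle."""
    graph = {}
    for before, after in orderings:
        graph.setdefault(after, set()).add(before)
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False

def resolve_threat(orderings, threat, link):
    """Order the threat before the provider (demotion) or after the consumer
    (promotion); return the added constraint, or None if neither is consistent."""
    for constraint in ((threat.name, link.provider.name),
                       (link.consumer.name, threat.name)):
        if consistent(orderings | {constraint}):
            return constraint
    return None

initial = Step("initial", frozenset(), frozenset({"(at china old-apartment)"}), frozenset())
put_in = Step("put-in-china", frozenset({"(at china old-apartment)"}),
              frozenset({"(in china car)"}), frozenset())
drive = Step("drive-china", frozenset({"(in china car)"}),
             frozenset({"(at china new-apartment)"}),
             frozenset({"(at china old-apartment)"}))

link = CausalLink(initial, "(at china old-apartment)", put_in)
orderings = {("initial", "put-in-china"), ("put-in-china", "drive-china")}
print(threatens(drive, link))                  # True: drive-china deletes the condition
print(resolve_threat(orderings, drive, link))  # ('put-in-china', 'drive-china'): promotion
```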
Figure 5. A Plan to Move China, Found by BURIDAN.

Table 1. Using Forward Projection to Evaluate the Plan for the China-Moving Problem.
Forward projection begins with the set of possible initial states and simulates executing the plan one step at a time, maintaining the set of possible states and their probabilities after each step is completed. When the simulation is finished, summing the probability of each state in which the goals are satisfied gives the plan's probability of succeeding. The top of table 1 shows the sets of states that are built as the three-step plan is evaluated by forward projection. To make the example more interesting, we assume that the pack-china action succeeds with probability 0.5 and, otherwise, has no effect. It therefore leads to two states in the forward projection, which are propagated through the put-in-china step and lead to four possible states after the drive-china action. The probability of success, 0.825, is found by summing the probabilities of the states in which the goals hold.
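A minimal sketch of forward projection for the three-step plan, using only the probabilities quoted above (pack-china succeeds with probability 0.5, and drive-china breaks the china with probability 0.05 if it is packed and 0.3 otherwise). The state encoding is a hypothetical simplification for illustration, not the representation used by any of the planners discussed.

```python
def apply_action(state, action):
    """Return (probability, successor-state) pairs for one action.
    States are dicts of fluents; the numbers are those of the example."""
    s = dict(state)
    if action == "pack-china":
        return [(0.5, dict(s, packed=True)), (0.5, s)]   # succeeds half the time, else no effect
    if action == "put-in-china":
        return [(1.0, dict(s, in_car=True))]
    if action == "drive-china":
        p_break = 0.05 if s.get("packed") else 0.3
        ok = dict(s, at_new_apartment=True)
        broken = dict(s, at_new_apartment=True, broken=True)
        return [(1.0 - p_break, ok), (p_break, broken)]
    raise ValueError(action)

def forward_project(plan, initial):
    states = [(1.0, initial)]
    for action in plan:
        states = [(p * q, t) for p, s in states for q, t in apply_action(s, action)]
    return states

initial = {"packed": False, "in_car": False, "at_new_apartment": False, "broken": False}
plan = ["pack-china", "put-in-china", "drive-china"]
success = sum(p for p, s in forward_project(plan, initial)
              if s["at_new_apartment"] and not s["broken"])
print(round(success, 3))   # 0.825, matching table 1
```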
In general, the number of states considered can clearly grow exponentially with the length of the plan. Many of the states created might differ only on features that are irrelevant to the plan. This observation suggests other plan-evaluation strategies that exploit the structure of the plan. One such strategy is to evaluate the plan using a belief net (Pearl 1988). The STRIPS representation for action, in which the preconditions contain sufficient information to completely determine the outcomes of the action, is a special case of the Markov assumption, which is preserved in the action representation used here (Wellman 1990). One can use this observation to create a belief net that can be queried to determine the probability of the success of the plan. There are several different ways in which the belief net can be created; one example is shown in figure 6. Each node is a random variable representing either the value of a state descriptor at a point in time or the outcomes of an action taken in the plan. The final action goal is included, with two possible values: true or false. The probability that this node has value true is equal to the plan's probability of success.

Figure 6. Using a Belief Net to Evaluate the Plan. The nodes with bold text represent actions in the plan, while the others represent fluent values at a given point in time.
Figure 7. A Conditional Plan Represented Using Context Labels Attached to Actions.
CNLP was the first SNLP-based planner to represent conditional plans (Peot and Smith 1992). Each step in the plan has a set of context labels associated with it that denote the conditions under which the step will be executed. For example, figure 7 shows a conditional plan for moving the china in which the agent loads the china in the car and then listens to the weather report before deciding whether to take the route over the mountain or around the mountain. The operator get-weather-report has two possible outcomes, each producing one of the observation labels ok or bad. Each of the two driving operators is marked with one of the labels as its context. Thus, they must be executed after get-weather-report, and drive-china-over-mountain, for example, will only be executed if the label ok was produced. If any subsequent actions had no context labels, they would be executed on both branches of the conditional plan. Contexts are not as direct a representation as putting an explicit branch in a plan, but they can easily represent plans that branch and then re-merge as well as partially ordered plans.

C-BURIDAN refines the approach of CNLP by keeping observations distinct from effects and allowing different outcomes of an action to have the same observation label, thus modeling partial observability (Draper, Hanks, and Weld 1994). In C-BURIDAN, new contexts are introduced to distinguish actions that can be put in separate branches of a conditional plan. Obviously, the ability to create conditional plans can make the planning problem even less tractable. Most conditional planners do not require a plan to be produced for every contingency (although CASSANDRA does [Pryor and Collins 1996]). Onder and Pollack (1997) have studied how to choose which contingencies to plan for, both because of the need to improve efficiency and because of the observation that some contingencies can safely be left for replanning at execution time.
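A small sketch of how context labels might control execution of the conditional plan in figure 7. The operator names follow the figure, but the data structures and the way sensing is simulated are assumptions made for illustration only.

```python
# Each step carries the set of context labels under which it should execute.
plan = [
    ("put-in-china",                set()),     # no context: always execute
    ("get-weather-report",          set()),
    ("drive-china-over-mountain",   {"ok"}),
    ("drive-china-around-mountain", {"bad"}),
]

def execute(plan, observe):
    """Execute steps whose contexts are satisfied by the labels observed so far."""
    labels = set()
    for name, contexts in plan:
        if contexts and not (contexts & labels):
            continue                             # context not active: skip this branch
        print("executing", name)
        labels |= observe(name)                  # sensing actions produce observation labels

# Example run in which the weather report produces the label "ok".
execute(plan, observe=lambda step: {"ok"} if step == "get-weather-report" else set())
```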
Figure 8. Ground and Abstract Operators from the China-Moving Example That Could Be Used by the Refinement Planner DRIPS. The thumbs-up symbol denotes an outcome in which the china is moved successfully, and the thumbs-down icon denotes an outcome in which the china is broken.
The planners described earlier search for plans that exceed a given minimum expected utility. To find the plan with MEU, however, one must somehow evaluate all possible plans to select the best one. In DRIPS, ranges of utility values are computed for partial plans, encompassing the best and worst expected utilities of all possible completions of the partial plan (Haddawy and Suwandi 1994). If the lowest value in some partial plan's range exceeds the highest value in another's, the partial plan with the lower range can be dropped from consideration without expanding all the complete plans below it, an approach that can lead to significant savings. The dropped plan is said to be dominated by the other plan, an idea also explored by other researchers.

DRIPS maintains ranges of utility values using an abstraction hierarchy of operators (Friedland and Iwasaki 1985). In this approach, a partial plan is a sequence of operators, one or more of which can be an abstraction of a number of more specific operators. A plan is refined by repeatedly choosing a more specific version of an operator in the plan until there are no abstract operators left. The planner begins with one of a set of skeletal plans containing abstract operators, chosen from a library. Figure 8 shows some example ground and abstract operators from the china-moving example, extended with a richer model of the domain. Figure 9 shows the abstraction network that describes all possible specializations of a skeletal plan to move the china. Solid lines denote decomposition of an operator, and dashed lines denote possible choices for an abstract operator. The skeletal plan encodes four potential plans: One can choose whether to pack the china, and one can independently choose which route to take.

Figure 9. Ground and Abstract Operators from the China-Moving Example That Could Be Used by the Refinement Planner DRIPS.
Suppose DRIPS begins refining this plan by choosing between "pack-and-load" and "load" to refine the "load-up-china" operator. The planner will typically add both refinements to a queue of partial plans, but first it computes ranges of expected utility for each one. To do this, it explicitly considers each set of possible outcomes, or chronicle, for each plan, as shown in table 2. Each plan has two possible chronicles: either the china gets broken, or it does not. For each chronicle, DRIPS computes ranges for the duration, utility, and probabilities based on the ranges or values found in the actions in the plan. A series of simple linear programming problems are solved to compute ranges for the expected utility of each abstract plan. In this case, the ranges are [–17.6, 0.7] for plan A and [–37, –23] for plan B, so DRIPS will not add plan B to the list of plans to refine because no refinement of plan B can do better than any refinement of plan A. DRIPS determined this without exploring the refinements of plan B, an ability that can lead to large computational savings in more complicated domains.

Search control has also been studied within the DRIPS framework. DRIPS has two choice points: (1) which abstract plan to specialize and (2) which abstract action in a plan to specialize. The plan with the current highest upper bound on expected utility must be expanded to provably find the optimal plan; so, choosing such a plan to expand gives a search heuristic similar to A* (Goodwin and Simmons 1998; Haddawy, Doan, and Goodwin 1995). The representation of utilities in DRIPS allows for goal deadlines and maintenance goals, following Haddawy and Hanks (1998a). The system has been applied in challenging medical domains with promising results (Haddawy, Doan, and Kahn 1996).
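The pruning step itself can be stated in a few lines. The sketch below assumes only that each partial plan carries lower and upper bounds on the expected utility of its completions, as DRIPS computes them; the numbers are the two ranges quoted above.

```python
def dominated(candidate, others):
    """A partial plan is dominated if some other plan's lower bound
    exceeds the candidate's upper bound."""
    lo_c, hi_c = candidate
    return any(lo > hi_c for lo, hi in others)

plan_a = (-17.6, 0.7)    # plan A: pack and load china
plan_b = (-37.0, -23.0)  # plan B: load china without packing
plans = [plan_a, plan_b]
queue = [p for p in plans if not dominated(p, [q for q in plans if q is not p])]
print(queue)   # only plan A survives: every completion of B is worse than any completion of A
```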
The use of skeletal refinement planning in DRIPS trades planning power for simplicity and the ability to compute utility ranges for partial plans. In some complex domains, it can be burdensome to represent the range of possible plans as skeletal plans that can be refined in this way. However, this form allows utility ranges to be computed for partial plans based on the ranges computed for abstract actions. It is not clear how to compute utility ranges for, say, partial BURIDAN plans because they include no commitment for the action or actions used to achieve some of the subgoals.

Table 2. A Summary of the Possible Outcomes for Each Refinement of the Skeletal Plan Found by Refining the Load-China Abstract Action. Plan A packs and loads the china; plan B loads the china without packing.

WEAVER is a probabilistic planner based on PRODIGY (Veloso et al. 1995). Like C-BURIDAN, it has no representation of utilities. It has no explicit model of observability but assumes the world is completely observable at plan-execution time. However, it has an explicit representation for uncertain exogenous events, in other words, for events that take place outside the volition of the planning agent and that can be modeled as occurring with some conditional probability given the world state. External events often play a significant role in real domains and should be considered by a decision-theoretic planner because (1) they can be influenced at least indirectly by altering the world-state conditions they depend on and (2) contingent plans can be built based on their possible effects.

In the china-moving example, suppose that to unload the china at the new apartment, the landlord must be waiting there with the key when the car arrives. Initially, the landlord is waiting, but as time passes, he might get restless and leave, in which case he might come back later. A planning agent can reduce the probability that the landlord will leave by calling when the car is close to the apartment. If he is not waiting when the car arrives, the agent can search for him. Figure 10 shows an exogenous event specification modeling the landlord's whereabouts over time as well as the operators call-landlord and search-for-landlord. Assuming a discrete-time model, the landlord-moves events mean that if at any time point, the landlord is at the apartment and has not been called, he is at the apartment at the next time point with probability 0.95, and his whereabouts are unknown with probability 0.05. If he has been called, the probability of his leaving drops to 0.02. If his whereabouts are unknown, he can reappear at the apartment with probability 0.01.

In theory, the use of exogenous events does not increase the representational power of the planner because the effects of these events can be modeled in the effects of the actions in the domain. For example, the drive-china actions could include effects describing the landlord's whereabouts, as could the pack-china operator if it takes a significant amount of time. In practice, modeling exogenous events in this way can be extremely inefficient because the effects of these events are duplicated in every operator in the domain and, conversely, each operator must account for every event that could take place while it is being executed.

WEAVER builds and evaluates plans when there are exogenous events using a three-step process: First, a plan is built and partially evaluated ignoring exogenous events. In the china-moving domain, the same three-step plan that is found by BURIDAN will be found: (1) pack-china, (2) put-in china, and (3) drive-china. Second, the goal structure for this plan is used to determine the relevant exogenous events, which are those whose effects can alter the domain features required by the plan or can influence other relevant events. Third, the plan is improved by either adding steps to reduce the probability of an undesirable effect or adding a conditional branch. The plan-improvement operations are similar to those of C-BURIDAN (Draper, Hanks, and Weld 1994) but use the PRODIGY planner (Veloso et al. 1995). The main difference is that the exogenous events that affect the plan evaluation are also considered as well as the actions in the plan. In the china-moving example, a possible improvement is to add a conditional branch to search for the landlord if he is not waiting when the car arrives.
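Under the discrete-time model just described, the landlord's whereabouts form a small Markov chain, and the probability that he is still at the apartment after a given number of steps can be computed directly. The sketch below uses only the transition probabilities given in the text; the five-step horizon and the assumption that calling takes effect from the first step are arbitrary choices for illustration.

```python
def landlord_distribution(steps, called=False):
    """Distribution over {'at-apartment', 'unknown'} after the given number of
    time steps, starting with the landlord at the apartment."""
    p_leave = 0.02 if called else 0.05   # probability of leaving per step
    p_return = 0.01                      # probability of reappearing per step
    at, away = 1.0, 0.0
    for _ in range(steps):
        at, away = (at * (1 - p_leave) + away * p_return,
                    at * p_leave + away * (1 - p_return))
    return {"at-apartment": round(at, 3), "unknown": round(away, 3)}

for t in (1, 5):
    print(t, landlord_distribution(t), landlord_distribution(t, called=True))
```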
Figure 10. Two Exogenous Events Modeling the Landlord's Whereabouts and Operators to Call and Search for the Landlord.

The belief net on the left in figure 11 shows an evaluation structure for the plan before improvement, after the relevant events governing the landlord's whereabouts have been added. It is possible to make this evaluation more efficient by taking advantage of the underlying Markov process. Rather than explicitly modeling the events' effects at each time step, independent Markov chains describe parts of the world state's evolution over time (in this case, the landlord's whereabouts). The Markov chains are used to compute links in the belief net on the right in figure 11. Such a belief net can be much more efficiently evaluated, especially when some actions have long durations compared with those of the exogenous events. WEAVER uses a sound algorithm to create these streamlined belief nets automatically (Blythe 1996).

Figure 11. Two Time-Slice Bayes's Nets That Can Be Used to Evaluate the Plan under Exogenous Events. In the first, exogenous events are represented directly; in the second, their effects are compiled into Markov chains that are used to compute probabilities over the time intervals of interest. The numbers along the x-axis represent time points, while nodes for fluents, actions, and events are arranged along the y-axis.
Search heuristics for planning under uncertainty have also been studied in WEAVER. When steps are added to a plan to increase the number of chronicles in which it succeeds, a useful strategy is to seek steps that succeed in cases that are complementary to those in which the original plan succeeds. These steps can be found using a local analysis of the possible steps and were shown to improve performance both in WEAVER and BURIDAN (Blythe 1995). WEAVER also makes use of analogy between different branches of a conditional plan (Blythe and Veloso 1997). These techniques have been applied to problems requiring several rounds of improvement in plans some tens of steps in length (Blythe 1998). The results are encouraging in the search for tractable planning under uncertainty but leave plenty of room for further progress.

Also building on a recent classical planning algorithm is work to create a probabilistic planner based on GRAPHPLAN (Smith and Weld 1998; Weld, Anderson, and Smith 1998).
One way to improve the efficiency of planning under uncertainty can come from inspiration from recently developed classical planning algorithms, such as SATPLAN (Kautz and Selman 1996), which compiles a planning problem into a logical satisfiability problem. After this step, standard algorithms can be used to solve the problem. MAXPLAN (Majercik and Littman 1998) is a probabilistic planner that is similar in spirit to SATPLAN. Littman, Goldsmith, and Mundhenk (1998) show that under reasonable conditions, probabilistic planning is NP^PP-complete and introduce an NP^PP-complete decision problem, E-MAJSAT. MAXPLAN compiles a probabilistic planning problem into an instance of E-MAJSAT and solves it. Just as GRAPHPLAN and SATPLAN are more efficient than classical planners in some domains, so MAXPLAN should have an advantage in some domains, as preliminary experiments suggest.

An instance of E-MAJSAT is a satisfiability problem, a Boolean formula whose variables are divided into two types: (1) choice variables and (2) chance variables. The chance variables are each assigned an independent probability of being true. The task is to find an assignment of truth values to the choice variables that maximizes the probability of the Boolean formula being true. MAXPLAN transforms a planning problem into an instance of E-MAJSAT whose solution yields a plan with maximum probability of success, equal to the maximum probability of the E-MAJSAT problem. The number of clauses in the formula is polynomial in the size of the planning problem. The resulting instance of E-MAJSAT is solved with a variant of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm (Davis, Logemann, and Loveland 1962). This algorithm systematically finds the best assignment by constructing a binary tree in which each node represents a variable, and each subtree represents the subproblem found by conditioning on the variable. DPLL either maximizes or computes the weighted average of the subtrees depending on whether the node is a choice or a chance variable. At each node, the algorithm eliminates all irrelevant variables (that only appear in satisfied clauses), forced variables (that appear alone in an unsatisfied clause), and "pure" variables (that appear either always negated or always not negated). The authors have experimented with different orderings on the variables.
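To make the E-MAJSAT formulation concrete, here is a tiny, purely illustrative evaluator in the same spirit: it maximizes over choice variables and takes probability-weighted averages over chance variables, without any of the clause-simplification steps that make the real algorithm practical. The formula, variable names, and probabilities are invented for the example.

```python
# A tiny E-MAJSAT instance: (c1 or x1) and (not c1 or x2).
choice_vars = ["c1"]                     # variables we control
chance_vars = {"x1": 0.7, "x2": 0.9}     # true independently with these probabilities
clauses = [[("c1", True), ("x1", True)],
           [("c1", False), ("x2", True)]]

def satisfied(assignment):
    return all(any(assignment[v] == val for v, val in clause) for clause in clauses)

def best_probability(order, assignment):
    """Maximize over choice variables, average over chance variables."""
    if not order:
        return 1.0 if satisfied(assignment) else 0.0
    var, rest = order[0], order[1:]
    if var in chance_vars:
        p = chance_vars[var]
        return (p * best_probability(rest, {**assignment, var: True}) +
                (1 - p) * best_probability(rest, {**assignment, var: False}))
    return max(best_probability(rest, {**assignment, var: value}) for value in (True, False))

print(best_probability(choice_vars + list(chance_vars), {}))  # 0.9: set c1 true, success needs only x2
```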
There is a growing literature on extending classical planning approaches to handle uncertainty, and I apologize to those whose work I have not described. Goldman and Boddy (1994) explore the idea of relaxing the assumption that the outcomes of different actions in a plan are probabilistically independent, using a knowledge-based model construction approach (Wellman, Breese, and Goldman 1992). This assumption is made in all the planners discussed previously, although Goldman and Boddy make use of causal influences that bear similarities to the representations described earlier in this article.

Another line of work represents plans using a rich language that can express do-while loops and include arbitrary Lisp code. The system selects a plan from a user-defined library and applies transformations to improve its performance. Because of the rich plan representation, it is hard to evaluate plans analytically, so execution scenarios called projections are used to compare candidate plans.

CYPRESS (Wilkins et al. 1995) is a framework for probabilistic planning built around three components: (1) SIPE-2 is an HTN planner, (2) PRS-CL is a reactive plan execution system, and (3) GISTER-CL is an uncertain reasoning system. CYPRESS is broader than the planners described here in that it allows planning and execution to be done concurrently. The HTN planning style, where domain operators describe a problem decomposition rather than specify a set of subgoals, is successful in applications of classical planning. It typically offers a reduced search space at the cost of requiring more information to be specified as part of the domain description.
Almost every approach that has been used to solve classical planning problems has been adapted for decision-theoretic planning. This section described decision-theoretic planners based on SNLP, skeletal refinement planning, PRODIGY, and compilation to satisfiability. The systems described have necessarily been only a small sample of the work being done in this area. Each described system concentrates on a different aspect of planning under uncertainty, and although each has been successful, their relatively narrow focus has made meaningful comparison difficult. Future work should lead to broader systems that would allow us to compare alternative strategies directly. There are also open research questions. For example, is it possible to exploit utility ranges to search for optimal plans in SNLP or PRODIGY? Can plan compilation approaches deal with structured utility models? These approaches also have the potential to be combined, although this area has not yet been explored.

In this section, I review approaches to planning under uncertainty based on Markov decision processes. The aim is to give the flavor of MDP approaches, the problems they engender, and solutions being currently explored and compare these approaches with those of the previous section. More detailed treatments can be found in Littman (1996) and Boutilier, Dean, and Hanks (1995).

An MDP M is a tuple M = ⟨S, A, Φ, R⟩, where S is a finite set of states of the system; A is a finite set of actions; Φ : A × S → Π(S) is the state-transition function, mapping an action and a state to a probability distribution over S for the possible resulting state (the probability of reaching state s′ by performing action a in state s is written Φ(a, s, s′)); and R : S × A → ℝ is the reward function: R(s, a) is the reward the system receives if it takes action a in state s.

A policy for an MDP is a mapping π : S → A that selects an action for each state. Given a policy, we can define its finite-horizon value function V_π^n : S → ℝ, where V_π^n(s) is the expected value of applying the policy π for n steps starting in state s. The value function is defined inductively with V_π^1(s) = R(s, π(s)) and

V_π^m(s) = R(s, π(s)) + Σ_{u∈S} Φ(π(s), s, u) V_π^{m–1}(u).

Over an infinite horizon, a discounted model is frequently used to ensure policies have a bounded expected value. For some β chosen so that β < 1, the value of any reward from the transition after the next is discounted by a factor of β, the one after that by a factor of β², and so on. Thus, if V_π(s) is the discounted expected value in state s following policy π forever, then

V_π(s) = R(s, π(s)) + β Σ_{u∈S} Φ(π(s), s, u) V_π(u),

which yields a set of linear equations in the values V_π(s).

A solution to an MDP is a policy that maximizes its expected value. For the discounted infinite-horizon case with any given discount factor β, there is a policy π* that is optimal regardless of the starting state (Howard 1960). Two popular methods for solving this equation and finding an optimal policy for an MDP are (1) value iteration and (2) policy iteration.

In policy iteration, the current policy is repeatedly improved by finding some action in each state that has a higher value than the action chosen by the current policy for the state. The policy is initially chosen at random, and the process terminates when no improvement can be found. The algorithm is shown in figure 12. This process converges to an optimal policy.

Policy-Iteration(S, A, Φ, R, β):
1. For each s ∈ S, π(s) = RandomElement(A).
2. Compute V_π(s) for each s ∈ S, either by solving the set of |S| linear equations given above or by approximating V_π by a series of value functions V_0, V_1, ….
3. For each s ∈ S, find an action a such that R(s, a) + β Σ_{u∈S} Φ(a, s, u) V_π(u) > V_π(s); set π′(s) = a if such an a exists, and otherwise set π′(s) = π(s).
4. If π′(s) ≠ π(s) for some s ∈ S, set π = π′ and go to 2; otherwise, return π.

Figure 12. The Policy Iteration Algorithm.
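A compact, self-contained rendering of the loop in figure 12 on a two-state MDP invented purely to exercise the code. The policy value is obtained here by successive approximation rather than by solving the linear equations, which is one of the options mentioned above.

```python
import random

# A toy MDP: Phi[action][state] maps successor states to probabilities.
S = ["s0", "s1"]
A = ["stay", "go"]
Phi = {"stay": {"s0": {"s0": 1.0}, "s1": {"s1": 1.0}},
       "go":   {"s0": {"s1": 0.9, "s0": 0.1}, "s1": {"s0": 0.9, "s1": 0.1}}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 1.0, ("s1", "go"): -1.0}
beta = 0.9

def policy_value(pi, sweeps=200):
    """Approximate V_pi by repeatedly applying the Bellman equation for pi."""
    V = {s: 0.0 for s in S}
    for _ in range(sweeps):
        V = {s: R[s, pi[s]] + beta * sum(p * V[u] for u, p in Phi[pi[s]][s].items())
             for s in S}
    return V

def policy_iteration():
    pi = {s: random.choice(A) for s in S}           # step 1: random initial policy
    while True:
        V = policy_value(pi)                        # step 2: evaluate the current policy
        new_pi = {}
        for s in S:                                 # step 3: look for an improving action
            q = {a: R[s, a] + beta * sum(p * V[u] for u, p in Phi[a][s].items()) for a in A}
            best = max(q, key=q.get)
            new_pi[s] = best if q[best] > V[s] + 1e-9 else pi[s]
        if new_pi == pi:                            # step 4: stop when the policy is stable
            return pi, V
        pi = new_pi

print(policy_iteration()[0])   # {'s0': 'go', 's1': 'stay'} for this toy reward structure
```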
In value iteration, optimal policies are produced for successively longer finite horizons until they converge. It is relatively simple to find an optimal policy over n steps, π*_n(.), with value function V*_n(.), using the recurrence relation

π*_n(s) = argmax_a { R(s, a) + β Σ_{u∈S} Φ(a, s, u) V*_{n–1}(u) }

with starting condition V*_0(s) = 0 ∀ s ∈ S, where V*_n is derived from the policy π*_n as described earlier. Figure 13 shows the value iteration algorithm, which takes an MDP, a discount value β, and a parameter ε and produces successive finite-horizon optimal policies, terminating when the maximum difference between the current and previous value functions is below ε. It can also be shown that the algorithm converges to the optimal policy for the discounted infinite case in a number of steps that is polynomial in |S|, |A|, log max_{s,a} |R(s, a)|, and 1/(1 – β).

Value-Iteration(S, A, Φ, R, β, ε):
1. For each s ∈ S, V_0(s) = 0.
2. t = 0.
3. t = t + 1.
4. For each s ∈ S and a ∈ A, Q_t(s, a) = R(s, a) + β Σ_{u∈S} Φ(a, s, u) V_{t–1}(u).
5. For each s ∈ S, π_t(s) = argmax_a Q_t(s, a).
6. For each s ∈ S, V_t(s) = Q_t(s, π_t(s)).
7. If ∃ s ∈ S such that |V_t(s) – V_{t–1}(s)| ≥ ε, go to 3; otherwise, return π_t.

Figure 13. The Value Iteration Algorithm.
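The same toy problem used earlier can be solved with the procedure of figure 13. The MDP below is again invented for illustration; epsilon controls when successive value functions are considered to have converged.

```python
S = ["s0", "s1"]
A = ["stay", "go"]
Phi = {"stay": {"s0": {"s0": 1.0}, "s1": {"s1": 1.0}},
       "go":   {"s0": {"s1": 0.9, "s0": 0.1}, "s1": {"s0": 0.9, "s1": 0.1}}}
R = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 1.0, ("s1", "go"): -1.0}

def value_iteration(beta=0.9, epsilon=1e-6):
    V = {s: 0.0 for s in S}                                   # step 1: V_0 = 0
    while True:
        Q = {(s, a): R[s, a] + beta * sum(p * V[u] for u, p in Phi[a][s].items())
             for s in S for a in A}                           # Q_t from V_{t-1}
        pi = {s: max(A, key=lambda a: Q[s, a]) for s in S}    # greedy policy pi_t
        new_V = {s: Q[s, pi[s]] for s in S}                   # V_t
        if max(abs(new_V[s] - V[s]) for s in S) < epsilon:    # stop when change < epsilon
            return pi, new_V
        V = new_V

pi, V = value_iteration()
print(pi)                                   # {'s0': 'go', 's1': 'stay'}
print({s: round(v, 2) for s, v in V.items()})
```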
Policy iteration and value iteration can find optimal policies in polynomial time in the size of the state space of the MDP. However, this state space is usually exponentially large in the input to a planning problem, which includes a set of literals whose cross-product describes the state space. Attempts to build on these and other techniques for solving MDPs have concentrated on ways to gain leverage from the structure of the planning problem to reduce the size of the state space that must be considered.

Solving an MDP is essentially the same problem as planning under uncertainty that was discussed in the first part of this article, with some minor differences. The standard algorithms for MDPs find a policy, which chooses an action for every possible state of the underlying system, whereas methods based on classical planners expect a set of possible initial states as input to the problem and find a sequence of actions (possibly branching) based on them. The difference is in part that the MDP solution methods emphasized the use of dynamic programming and in part that the AI planning methods emphasized the use of structured representations for the state space and actions. However, the improvements to policy and value iteration discussed here exploit structured representations of the planning problem.

Another difference is in the specification of goals. MDPs attach a reward to an action in a state, and classical planning takes a logical description of a set of states as a goal. Some of the planners described in the last section attach utilities to goals, however, and again, as the MDP algorithms described later make use of structure, their objective functions become more similar.

The standard MDP algorithms seek a policy with MEU. Half the planners in the last section also seek a plan with MEU, and half seek a plan that passes a threshold probability of success. Again, this difference is minor because given a thresholding planner, one could produce an optimizing planner by repeatedly increasing the threshold until no plan is found. Conversely, one could terminate policy iteration early once a policy's value exceeds a threshold.

Dean et al. (1993) used policy iteration in a restricted state space called an envelope. A subset of the states is selected, and each transition in the MDP that leaves the subset is replaced with a transition to a new state Out with zero reward. No transitions leave the Out state. They developed an algorithm that alternated between solving the restricted-space MDP with policy iteration and expanding the envelope by including the n most likely elements of the state space to be reached by the optimal policy that were not in the envelope. The algorithm converges to an optimal policy considerably more quickly than standard policy iteration on the whole state space, but as Dean et al. (1995) point out, it makes some assumptions that limit its applicability, including a sparse MDP in which each state has only a small number of outward transitions. Tash and Russell (1994) extend the idea of an envelope with an initial estimate of distance to goal for each state and a model that takes the time of computation into account. Whereas the envelope approach ignores portions of the state space, other techniques exploit structure to aggregate states.
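The envelope construction itself is easy to sketch: transitions that leave the chosen subset of states are redirected to an absorbing Out state with zero reward. The MDP fragment and envelope below are hypothetical, and a full implementation would alternate this restriction with policy iteration and envelope expansion in the way Dean et al. describe.

```python
OUT = "Out"

def restrict_to_envelope(S, A, Phi, R, envelope):
    """Build the restricted MDP: states outside the envelope collapse into OUT."""
    S_r = list(envelope) + [OUT]
    Phi_r, R_r = {}, {}
    for a in A:
        Phi_r[a] = {OUT: {OUT: 1.0}}                  # no transitions leave the Out state
        for s in envelope:
            dist = {}
            for u, p in Phi[a][s].items():
                target = u if u in envelope else OUT  # leaving transitions go to OUT
                dist[target] = dist.get(target, 0.0) + p
            Phi_r[a][s] = dist
    for s in S_r:
        for a in A:
            R_r[s, a] = 0.0 if s == OUT else R[s, a]  # zero reward in OUT
    return S_r, Phi_r, R_r

# Example with a hypothetical three-state chain and a two-state envelope.
S = ["s0", "s1", "s2"]
A = ["right"]
Phi = {"right": {"s0": {"s1": 1.0}, "s1": {"s2": 1.0}, "s2": {"s2": 1.0}}}
R = {(s, "right"): (1.0 if s == "s2" else 0.0) for s in S}
print(restrict_to_envelope(S, A, Phi, R, ["s0", "s1"])[1])
```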
Other techniques gain leverage from the set of literals that can directly or indirectly affect the reward function R. In this way, the algorithm constructs abstractions of the state space that are reminiscent of abstraction hierarchies for classical planners (Boutilier and Dearden 1994). Because the state-space size is exponential in the set of literals, restricting attention to this subset greatly reduces the number of states to be considered in each abstract problem, although different literals can be relevant in different parts of the state space. When there is structure in the state-utility description, it is also possible to exploit it in the solution algorithm itself; structured policy iteration is a technique that uses a structured action representation for this purpose (Boutilier, Dearden, and Goldszmidt 1995).

In a partially observable MDP, the state of the system cannot be observed directly. Instead, there is a set of observation labels O and a set of probabilities P(o | a, s), o ∈ O, a ∈ A, s ∈ S, such that when the system arrives in state s with action a, it receives the observation label o with probability P(o | a, s). Cassandra, Kaelbling, and Littman (1994) show how a probability distribution over the underlying states can be maintained from the observation labels using Bayes's rule; a form of value iteration can then be applied in which the value function is again built in a series of approximations.

The systems surveyed in this article adopt different styles and attack different pieces of the decision-theoretic-planning problem.
References

Blum, A., and Furst, M. 1997. Fast Planning through Planning Graph Analysis. Artificial Intelligence 90(1–2): 281–300.

Blythe, J. 1994. Planning with External Events. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, eds. R. L. de Mantaras and D. Poole, 94–101. San Francisco, Calif.: Morgan Kaufmann.

Blythe, J. 1995. The Footprint Principle for Heuristics for Probabilistic Planners. In New Directions in AI Planning, eds. M. Ghallab and A. Milani. Amsterdam: IOS Press.

Blythe, J. 1996. Decompositions of Markov Chains for Reasoning about External Change in Planners. In Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, ed. B. Drabble, 27–34. Menlo Park, Calif.: AAAI Press.

Blythe, J. 1998. Planning under Uncertainty in Dynamic Domains. Ph.D. dissertation, Computer Science Department, Carnegie Mellon University.

Blythe, J., and Veloso, M. 1997. Analogical Replay for Efficient Conditional Planning. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 668–673. Menlo Park, Calif.: American Association for Artificial Intelligence.

Boutilier, C., and Dearden, R. 1994. Using Abstractions for Decision-Theoretic Planning with Time Constraints. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1016–1022. Menlo Park, Calif.: American Association for Artificial Intelligence.

Boutilier, C.; Brafman, R. I.; and Geib, C. 1997. Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Boutilier, C.; Brafman, R.; and Geib, C. 1998. Structured Reachability Analysis for Markov Decision Processes. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 24–32. San Francisco, Calif.: Morgan Kaufmann.

Boutilier, C.; Dean, T.; and Hanks, S. 1995. Planning under Uncertainty: Structural Assumptions and Computational Leverage. In New Directions in AI Planning, eds. M. Ghallab and A. Milani, 157–172. Amsterdam: IOS Press.

Boutilier, C.; Dearden, R.; and Goldszmidt, M. 1995. Exploiting Structure in Policy Construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1104–1111. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Brafman, R. 1997. A Heuristic Variable Grid Solution Method for POMDPs. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 727–733. Menlo Park, Calif.: American Association for Artificial Intelligence.

Cassandra, A.; Kaelbling, L. P.; and Littman, M. 1994. Acting Optimally in Partially Observable Stochastic Domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1023–1028. Menlo Park, Calif.: American Association for Artificial Intelligence.

Davis, M.; Logemann, G.; and Loveland, D. 1962. A Machine Program for Theorem Proving. Communications of the ACM 5(7): 394–397.

Dean, T., and Givan, R. 1997. Model Minimization in Markov Decision Processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park, Calif.: American Association for Artificial Intelligence.

Dean, T., and Lin, S.-H. 1995. Decomposition Techniques for Planning in Stochastic Domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1993. Planning with Deadlines in Stochastic Domains. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 574–579. Menlo Park, Calif.: American Association for Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1995. Planning under Time Constraints in Stochastic Domains. Artificial Intelligence 76(1–2): 35–74.

Draper, D.; Hanks, S.; and Weld, D. 1994. Probabilistic Planning with Information Gathering and Contingent Execution. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 31–37. Menlo Park, Calif.: AAAI Press.

Feldman, J. A., and Sproull, R. F. 1977. Decision Theory and Artificial Intelligence II: The Hungry Monkey. Cognitive Science 1:158–192.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2:189–208.

Friedland, P. E., and Iwasaki, Y. 1985. The Concept and Implementation of Skeletal Plans. Journal of Automated Reasoning 1(2): 161–208.

Givan, R., and Dean, T. 1997. Model Minimization, Regression, and Propositional STRIPS Planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Goldman, R. P., and Boddy, M. S. 1994. Epsilon-Safe Planning. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, eds. R. L. de Mantaras and D. Poole, 253–261. San Francisco, Calif.: Morgan Kaufmann.

Goodwin, R., and Simmons, R. 1998. Search Control of Plan Generation in Decision-Theoretic Planning. In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems, eds. R. Simmons and S. Smith. Menlo Park, Calif.: AAAI Press.

Haddawy, P., and Hanks, S. 1998a. Utility Models for Goal-Directed Decision-Theoretic Planners. Computational Intelligence 14(3).

Haddawy, P., and Suwandi, M. 1994. Decision-Theoretic Refinement Planning Using Inheritance Abstraction. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond. Menlo Park, Calif.: AAAI Press.

Haddawy, P.; Doan, A.; and Goodwin, R. 1995. Efficient Decision-Theoretic Planning: Techniques and Empirical Analysis. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, eds. P. Besnard and S. Hanks, 229–236. San Francisco, Calif.: Morgan Kaufmann.

Haddawy, P.; Doan, A.; and Kahn, C. E. 1996. Decision-Theoretic Refinement Planning in Medical Decision Making: Management of Acute Deep Venous Thrombosis. Medical Decision Making 16(4): 315–325.

Howard, R. A. 1960. Dynamic Programming and Markov Processes. Cambridge, Mass.: MIT Press.

Kautz, H. A., and Selman, B. 1996. Pushing the Envelope: Planning, Propositional Logic, and Stochastic Search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. Menlo Park, Calif.: American Association for Artificial Intelligence.

Knoblock, C. A. 1991. Automatically Generating Abstractions for Problem Solving. Ph.D. dissertation, Computer Science Department, Carnegie Mellon University.

Kushmerick, N.; Hanks, S.; and Weld, D. 1994. An Algorithm for Probabilistic Least-Commitment Planning. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1073–1078. Menlo Park, Calif.: American Association for Artificial Intelligence.

Kushmerick, N.; Hanks, S.; and Weld, D. 1995. An Algorithm for Probabilistic Planning. Artificial Intelligence 76(1–2): 239–286.

Littman, M. L. 1996. Algorithms for Sequential Decision Making. Ph.D. dissertation, Department of Computer Science, Brown University.

Littman, M. L.; Goldsmith, J.; and Mundhenk, M. 1998. The Computational Complexity of Probabilistic Planning. Journal of Artificial Intelligence Research 9:1–36.

Luce, R. D., and Raiffa, H. 1957. Games and Decisions: Introduction and Critical Survey. New York: Wiley.

Majercik, S. M., and Littman, M. L. 1998. MAXPLAN: A New Approach to Probabilistic Planning. In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems, eds. R. Simmons and S. Smith. Menlo Park, Calif.: AAAI Press.

Newell, A., and Simon, H. A. 1963. GPS: A Program That Simulates Human Thought. In Computers and Thought, eds. E. A. Feigenbaum and J. Feldman, 279–293. New York: McGraw-Hill.

Onder, N., and Pollack, M. 1997. Contingency Selection in Plan Generation. Paper presented at the Fourth European Conference on Planning, Toulouse, France.

Parr, R., and Russell, S. 1995. Approximating Optimal Policies for Partially Observable Stochastic Domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1088–1094. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. San Francisco, Calif.: Morgan Kaufmann.

Penberthy, J. S., and Weld, D. S. 1992. UCPOP: A Sound, Complete, Partial-Order Planner for ADL. In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning. San Francisco, Calif.: Morgan Kaufmann.

Peot, M. A., and Smith, D. E. 1992. Conditional Nonlinear Planning. In Proceedings of the First International Conference on Artificial Intelligence Planning Systems, ed. J. Hendler. San Mateo, Calif.: Morgan Kaufmann.

Pérez, M. A., and Carbonell, J. 1994. Control Knowledge to Improve Plan Quality. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 323–328. Menlo Park, Calif.: AAAI Press.

Pryor, L., and Collins, G. 1996. Planning for Contingencies: A Decision-Based Approach. Journal of Artificial Intelligence Research 4:287–339.

Puterman, M. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.

Smith, D. E., and Weld, D. 1998. Conformant GRAPHPLAN. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 889–896. Menlo Park, Calif.: American Association for Artificial Intelligence.

Tash, J. K., and Russell, S. 1994. Control Strategies for a Stochastic Planner. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1079–1085. Menlo Park, Calif.: American Association for Artificial Intelligence.

Tate, A. 1977. Generating Project Networks. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 888–893. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Veloso, M.; Carbonell, J.; Pérez, A.; Borrajo, D.; Fink, E.; and Blythe, J. 1995. Integrating Planning and Learning: The PRODIGY Architecture. Journal of Experimental and Theoretical Artificial Intelligence 7(1): 81–120.

Weld, D. 1994. An Introduction to Least Commitment Planning. AI Magazine 15(4): 27–61.

Weld, D.; Anderson, C.; and Smith, D. E. 1998. Extending GRAPHPLAN to Handle Uncertainty and Sensing Actions. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 897–904. Menlo Park, Calif.: American Association for Artificial Intelligence.

Wellman, M. P. 1990. Formulation of Trade-Offs in Planning under Uncertainty. New York: Pitman.

Wellman, M. P.; Breese, J. S.; and Goldman, R. P. 1992. From Knowledge Bases to Decision Models. Knowledge Engineering Review 7(1): 35–53.

Wilkins, D. E.; Myers, K. L.; Lowrance, J. D.; and Wesley, L. P. 1995. Planning and Reacting in Uncertain and Dynamic Environments. Journal of Experimental and Theoretical Artificial Intelligence 7(1).

Williamson, M., and Hanks, S. 1994. Optimal Planning with a Goal-Directed Utility Model. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 176–181. Menlo Park, Calif.: AAAI Press.

Jim Blythe is a research scientist at the University of Southern California's Information Sciences Institute. His research interests include planning, knowledge acquisition, reasoning under uncertainty, and machine learning.