Decision-theoretic planning

AI Magazine Volume 20 Number 2 (1999) (© AAAI)
■ The recent advances in computer speed and algorithms for probabilistic inference have led to a resurgence of work on planning under uncertainty. The aim is to design AI planners for environments where there might be incomplete or faulty information, where actions might not always have the same results, and where there might be trade-offs between the different possible outcomes of a plan. Addressing uncertainty in AI planning algorithms will greatly increase the range of potential applications, but there is plenty of work to be done before we see practical decision-theoretic planning systems. This article outlines some of the challenges that need to be overcome and surveys some of the recent work in the area.

In this article, I provide a brief overview of some of the recent work in decision-theoretic planning. I highlight some of the advances made and some technical problems that still lie ahead. In the next section, I describe the problem addressed by decision-theoretic planning. The sections that follow survey work based on classical planning algorithms and work based on Markov decision processes.
A planning problem in AI is usually specified as follows: Given a description of the current state of some system, a set of actions that can be performed on the system, and a description of a goal set of states for the system, find a sequence of actions that can be performed to transform the system into one of the goal states. In the textbook example of the blocks world, the system consists of a set of named blocks, each of which can be on the ground or on top of another block. The possible actions move a block from one location to another, and the goal is a particular configuration of blocks.

When planning programs are used to provide courses of action in real-world settings, such as medical treatments, high-level robot control, or disaster relief, they must take into account the fact that actions can have several different outcomes, some of which might be more desirable than others. They must balance the potential of some plan achieving a goal state against the risk of producing an undesirable state and against the cost of performing the plan. In problems where actions can lead to a number of different possible outcomes, or where the benefits of executing a plan must be weighed against the costs, the framework of decision theory can be used to compare alternative plans.

The AI community has maintained an interest in decision-theoretic planning, with Feldman and Sproull's (1977) work being one of the earliest efforts in this area. Although much early work focused on efficient algorithms for planning under quite restrictive assumptions, the past five years have seen a resurgence in work on planning under uncertainty and decision-theoretic planning for several reasons: The recent advances in Bayesian inference have made it feasible to compute the expected utility of a plan under conditions of uncertainty. The success of Markovian approaches in areas such as speech recognition, and the closely related reinforcement-learning techniques, has encouraged work in planning using Markov decision processes (MDPs). Faster computers have made it feasible to build decision-theoretic planners, which in general have greater computational demands than classical planners.

Decision theory (Luce and Raiffa 1957) provides an attractive framework for weighing the strengths and weaknesses of a particular course of action, with roots in probability theory and utility theory. Given a probability distribution over the possible outcomes of an action in any state, and a reasonable preference function over outcomes, we can define a utility function over outcomes such that whenever the agent would prefer one plan over another, the preferred plan has higher expected utility. The task of the planner then seems straightforward: find the plan with the maximum expected utility (MEU).

Unfortunately, decision theory says nothing about the task of constructing a plan with high expected utility. Therefore, AI planning and decision theory would appear to be complementary, and there has been interest in merging the two approaches for a considerable time (Feldman and Sproull 1977). However, there are hard problems to overcome. It is usually not feasible to search the entire space of plans to find the MEU plan. Indeed, computing the expected utility of a single plan can be prohibitively expensive because the number of possible outcomes can grow very large. To avoid specifying the value of each possible state, we need to find compact specifications of utilities, as well as actions, and consider the interaction between them. The recent growth of research in decision-theoretic planning follows the recent success of work in reasoning under uncertainty and draws on compact representations such as those of belief nets (see the article by D'Ambrosio, also in this issue).

In the following sections, I describe work that relaxes one or more of the assumptions made by classical planners. First, we can relax the assumption that actions are deterministic by specifying a set of possible outcomes rather than the single outcome in the deterministic case. Consider the planning problem of moving my kitchen china from an old apartment to a new one. The goal is to move the china unbroken, specified as (at china new-apartment) ∧ ¬(broken china). Figure 1 shows three increasingly detailed descriptions of the drive-china action that can achieve this goal. The first is deterministic and could be used in a plan that achieves the goal with certainty: (put-in china), (drive-china old-apartment new-apartment), making use of the extra operators shown in figure 2. The second drive-china operator has two alternative sets of effects, modeling two different world states that can arise if the operator is applied. The numbers on the arcs are the probabilities of each state of affairs. Under this model, the two-step plan will achieve the goal only with probability 0.7. In the third formulation of drive-china, different sets of outcomes are possible based on the conditions that hold when the operator is applied, and the arcs now show conditional probabilities. The two-step plan still has a probability of 0.7 of succeeding. The three-step plan (pack china), (put-in china), (drive-china …) succeeds with probability 0.95.
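To make the conditional-outcome model concrete, here is a minimal sketch of the third drive-china model. The encoding (a function returning an outcome distribution) is an illustrative assumption; the probabilities come from figure 1.

```python
# Hypothetical encoding of the third drive-china model: the outcome
# distribution is conditioned on whether the china has been packed.
def drive_china_outcomes(packed):
    """Return {outcome: probability} for the drive-china action."""
    if packed:
        return {"china-intact": 0.95, "china-broken": 0.05}
    return {"china-intact": 0.7, "china-broken": 0.3}

# Two-step plan (put-in china, drive-china): the china is never packed.
p_two_step = drive_china_outcomes(packed=False)["china-intact"]

# Three-step plan (pack china, put-in china, drive-china): china is packed.
p_three_step = drive_china_outcomes(packed=True)["china-intact"]

print(p_two_step)    # 0.7
print(p_three_step)  # 0.95
```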
We can also relax the assumption that goals are logical descriptions of world states, so that a plan either succeeds or fails. We can attach utilities to different world states representing degrees of desirability. For example, suppose the state in which all my china is moved unbroken has a utility of 10, and one in which half my china is moved and none is broken has a utility of 6; any state in which china is broken has a utility of –100; and any other state has a utility of 0. If the pack operator can only pack half the china, the two-step plan has an expected utility of –23, and the three-step plan has an expected utility of 0.7. In this case, the plan to pack some of the china is slightly better than doing nothing, which has a utility of 0, and the plan to simply move the china without packing it is considerably worse.

Since the early planners GPS (Newell and Simon 1963) and STRIPS (Fikes and Nilsson 1971), most planners that have been designed make essentially the same assumptions about the world in which they operate. From the point of view of decision-theoretic planning, three assumptions of note are (1) the goal of the planner is a logical description of a world state, (2) the actions taken by the planner are the only sources of change in the world, and (3) each action can be described by the conditions under which it can be applied and its effects on the world. These effects are described by a set of facts that are either to be added to, or deleted from, the world state in which the action is applied.
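The expected utilities follow from one line of arithmetic each (0.7 × 10 − 0.3 × 100 = −23 and 0.95 × 6 − 0.05 × 100 = 0.7). A short sketch checks them; the outcome names are invented.

```python
# Utilities for the china-moving outcomes, as given in the text.
UTILITY = {
    "all-moved-unbroken": 10,
    "half-moved-unbroken": 6,
    "broken": -100,
}

def expected_utility(outcome_probs):
    """Expected utility of a plan given its outcome distribution."""
    return sum(p * UTILITY[o] for o, p in outcome_probs.items())

# Two-step plan: unpacked china arrives intact with probability 0.7.
two_step = {"all-moved-unbroken": 0.7, "broken": 0.3}

# Three-step plan: pack can only pack half the china; packed china
# arrives intact with probability 0.95.
three_step = {"half-moved-unbroken": 0.95, "broken": 0.05}

print(expected_utility(two_step))    # approximately -23
print(expected_utility(three_step))  # approximately 0.7
```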
The last is known as the STRIPS assumption. I refer to planners that share these assumptions as the classical planners. They include PRODIGY (Veloso et al. 1995), UCPOP (Penberthy and Weld 1992), GRAPHPLAN (Blum and Furst 1997), and the hierarchical task network planner NONLIN.

Figure 1. Three Versions of the Operator Drive-China.

In the next four subsections, I describe four approaches to planning under uncertainty that can be viewed as extending a classical planning approach. These are SNLP-based planners, DRIPS, WEAVER, and MAXPLAN, although the last of these is arguably not a classical planner. Organizing the article around specific research programs makes it easier to keep in mind the specific assumptions behind each one, at the cost of making it harder to follow how they address some of the issues and challenges that must be faced in creating decision-theoretic planners. In this section, I briefly describe some of these issues as a way of forming a road map for understanding the different approaches.

As the china-moving example shows, to represent uncertain actions, one has to represent several alternative outcomes and their probabilities, conditional on the state. Providing a richer representation of action inevitably makes the planning problem harder, and planners must use a representation that expresses the different outcomes as concisely as possible yet is still fully expressive. The representation of the final drive-china operator is not very concise, for example, because the four possible outcomes are described explicitly, even though the two branches differ only in their probabilities, and each outcome differs only on whether the china is broken. When different state variables are affected independently by an action, the number of outcomes can grow exponentially in the number of variables.

Several systems based on classical planning ignore outcome utility beyond the binary measure of goal achievement, although exceptions include PYRRHUS (Williamson and Hanks 1994). In planners based on Markov decision processes, discussed later in this article, a numeric reward function is specified for an action in a state. Although this is adequate from a purely decision-theoretic point of view, a more structured representation of utility is needed to make planning tractable, one where the utility of a state is built up from smaller building blocks so that it is less tedious to specify and the utility of abstract states can be estimated. Some planners relax the binary measure of goal achievement to allow partial satisfaction of goals. In the china-moving example, the full goal is to move all the china unbroken, but we might attach partial credit for moving some of it unbroken. Given a conjunctive set of goals, sometimes the utility from solving a subset can be specified as the sum of some utility for each goal. Care must be taken with this approach; for example, in the china-moving example, the top-level goals are (at china new-apartment) and (not (broken china)), but a planner that achieves the first goal without the second should not receive half the credit. Sometimes the utility of an outcome can be made more modular by summing the cost of each operator independently (Perez and Carbonell 1994) or by attaching utilities to goals in a representation that further assumes a deadline by which a goal should ideally be achieved, with a separate penalty function that offsets the utility for the goal when it is achieved after the deadline. Haddawy and Hanks (1998a) propose a representation that encompasses most of these approaches independently of any planner.

Most of these systems distinguish between observable and nonobservable domain variables. A classical STRIPS plan essentially assumes no observability because the plan is a sequence of actions executed blind. In normal MDPs, the output is a policy, a mapping from state to action that assumes that every state variable can be observed without error, although some work that relaxes this assumption is described later. In the BURIDAN planner (Kushmerick, Hanks, and Weld 1995), actions can have sensing results, as well as effects in the world, and noisy sensors can be modeled.

In an uncertain world, high-utility plans created in advance might need to have some actions that are conditional on future observations. Most of the planners discussed are capable of creating such plans, employing a variety of techniques. Note that planning agents must be able to make observations of their domain during execution to use conditional plans.

Figure 2. Other Operators in the Moving-House Example.

Perhaps the largest question mark hanging over the planners described here is their tractability. Few have been applied in large, high-dimensional domains, although exceptions include DRIPS (Haddawy, Doan, and Kahn 1996) and WEAVER (Blythe 1998). In addition, it is rare that the planners are directly compared with each other, although MAXPLAN is an exception (Majercik and Littman 1998). The lack of comparative analysis is partly because, as we will see, the planners have concentrated on different aspects of the planning problem. The tractability of the planners is affected by the kind of planning algorithm used: SNLP, PRODIGY, refinement planning, hierarchical task network (HTN) planning, and compilation to satisfiability have all been used. It is also affected by the use of search control, which has been studied for several of these planners.

Whether outcome utility is multivalued or simply binary, one must compute or estimate the expected utility of a plan. For the degenerate case, this is simply the probability of success. There are several different ways to compute or estimate utility, as we will see, and none completely dominates any other. For the problem of deciding which partial plan to expand, all that is required is a comparative value, deciding which plan has higher expected utility, which is sometimes simpler than finding the absolute value. For efficient planning, one must also consider the trade-off between time spent estimating plan value and time spent expanding a plan whose value is poorly known.

SNLP maintains a plan as a partially ordered set of actions, along with ordering constraints between steps, bindings for variables in steps, and explicit data structures, called causal links, that represent what each step is supposed to accomplish in the plan (Weld 1994). A causal link records the subgoal being achieved, the consumer step that has this subgoal as a precondition, and the provider step that has the subgoal as an effect. For example, the arrow between drive-china and goal in figure 3 is a causal link for the subgoal (at china new-apartment), with drive-china as provider and goal as consumer. Goal is a dummy step added to the plan to represent the top-level goals. Similarly, initial is a dummy step representing the initial state. The SNLP algorithm follows the loop shown in figure 4.
It repeatedly picks an unsupported condition in the plan, adds a causal link supporting it from some step that achieves the condition, and resolves threats to the link by forcing any step that negates the condition to take place either before or after both linked steps. By considering all possible choices for causal links and threat orderings, SNLP is complete; that is, it will find a plan if one exists. For more details, see Weld (1994).

Figure 3. A Plan to Move China, as Found by SNLP.

■ If the current plan has no unsupported conditions, return it.
■ Pick an unsupported condition P for some step U and add a causal link to the plan that supports it, achieving the condition with either an existing or a new step, A. (On backtracking, try all possible links.)
■ Resolve any threats to the new link. A threat comes from any step T in the plan that can take place between A and U and can negate P. Resolve the threat by ordering T either before A or after U if possible (on backtracking, try both orderings). If any threat cannot be resolved, fail and backtrack.

Figure 4. An Outline of the SNLP Algorithm.

BURIDAN (Kushmerick, Hanks, and Weld 1995) is a modified version of the SNLP planning algorithm that can create plans that meet a threshold probability of success when actions are nondeterministic, as are the last two actions in figure 1. BURIDAN differs from SNLP by allowing more than one causal link for each condition in the plan. Under different execution scenarios, different actions can cause the condition to be true, so the links combine to increase support for the condition. SNLP's termination criterion that all conditions have exactly one link is no longer appropriate. Instead, BURIDAN explicitly computes the probability of success of the plan and terminates if it is above the given threshold. Figure 5 shows a plan found by BURIDAN in the china-moving example if the third model of action is used. With probability 0.3, executing drive-china can lead to the china being broken. This step is therefore a threat to the link for ¬(broken china) from initial to goal. In addition to SNLP's reorderings to remove threats, BURIDAN can confront a threat by decreasing the probability of an outcome in which the threatened condition is negated. In this case, this is done by adding a link from the outcome with probability 0.95 that does not add (broken china) and planning for its trigger (packed china). BURIDAN can then find the plan [pack-china, put-in china, drive-china], which succeeds with probability 0.95.

Figure 5. A Plan to Move China, Found by BURIDAN.

To this point, we haven't discussed how a planner finds the probability that a plan succeeds. A number of possible strategies are based on forward projection or on the analysis of the dependencies among state descriptors. Empirical experiments show no clear winner among these strategies.
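The causal-link bookkeeping at the heart of the loop in figure 4 can be sketched in a few lines. This is only the threat-detection fragment, not the full algorithm, and the step and fluent names are invented.

```python
# Toy illustration of causal links and threats in an SNLP-style plan.
from collections import namedtuple

Step = namedtuple("Step", "name adds deletes")
CausalLink = namedtuple("CausalLink", "provider condition consumer")

initial = Step("initial", {"at-china-old"}, set())
drive = Step("drive-china", {"at-china-new"}, {"at-china-old"})
goal = Step("goal", set(), set())

# drive-china provides (at-china-new) for the dummy goal step.
link = CausalLink("drive-china", "at-china-new", "goal")

def threatens(step, link):
    """A step threatens a link if it can delete the link's condition
    (and could be ordered between provider and consumer)."""
    return link.condition in step.deletes

# An unordered step driving back to the old apartment would threaten
# the link, so SNLP must order it before the provider or after the
# consumer to resolve the threat.
drive_back = Step("drive-back", {"at-china-old"}, {"at-china-new"})
print(threatens(drive_back, link))  # True
print(threatens(initial, link))     # False
```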
Table 1. Using Forward Projection to Evaluate the Plan for the China-Moving Problem.

Forward projection begins with the set of possible initial states and simulates executing the plan one step at a time, maintaining the set of possible states and their probabilities after each step is completed. When the simulation is finished, summing the probability of each state in which the goals are satisfied gives the plan's probability of succeeding. The top of table 1 shows the sets of states that are built as the three-step plan is evaluated by forward projection. To make the example more interesting, we assume that the pack-china action succeeds with probability 0.5 and, otherwise, has no effect. It therefore leads to two states in the forward projection, which are propagated through the put-in-china step and lead to four possible states after the drive-china action. The probability of success, 0.825, is found by summing the probabilities of the states in which the goals hold.
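The projection just described can be reproduced directly. The state encoding as a (packed, broken) pair is an illustrative assumption; put-in-china is omitted because it is deterministic here.

```python
# Forward projection sketch for the three-step plan, under the text's
# assumptions: pack-china succeeds with probability 0.5 (else no
# effect); drive-china breaks unpacked china with probability 0.3 and
# packed china with probability 0.05.
def apply_pack(states):
    out = {}
    for (packed, broken), p in states.items():
        for new_packed, q in ((True, 0.5), (packed, 0.5)):
            key = (new_packed, broken)
            out[key] = out.get(key, 0.0) + p * q
    return out

def apply_drive(states):
    out = {}
    def add(s, p):
        out[s] = out.get(s, 0.0) + p
    for (packed, broken), p in states.items():
        if broken:
            add((packed, True), p)
            continue
        p_break = 0.05 if packed else 0.3
        add((packed, True), p * p_break)
        add((packed, False), p * (1 - p_break))
    return out

states = {(False, False): 1.0}   # initially: unpacked, unbroken
states = apply_pack(states)      # two states, probability 0.5 each
states = apply_drive(states)     # four states after drive-china
success = sum(p for (packed, broken), p in states.items() if not broken)
print(round(success, 3))  # 0.825
```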
In general, the number of states considered can clearly grow exponentially with the length of the plan. Many of the states created might differ only on features that are irrelevant to the plan. This observation suggests other plan-evaluation strategies that exploit the structure of the plan. One such strategy is to evaluate the plan using a belief net (Pearl 1988). The STRIPS representation for action, in which the preconditions contain sufficient information to completely determine the outcomes of the action, is a special case of the Markov assumption, which is preserved in the action representation used here (Wellman 1990). One can use this observation to create a belief net that can be queried to determine the probability of success of the plan. There are several different ways in which the belief net can be created; one example is shown in figure 6. Each node is a random variable representing either the value of a state descriptor at a point in time or the outcome of an action taken in the plan. The final action goal is included, with two possible values: true or false. The probability that this node has value true is equal to the plan's probability of success.

Figure 6. Using a Belief Net to Evaluate the Plan. The nodes with bold text represent actions in the plan, while the others represent fluent values at a given point in time.

Figure 7. A Conditional Plan Represented Using Context Labels Attached to Actions.

CNLP was the first SNLP-based planner to represent conditional plans (Peot and Smith 1992).
Each step in the plan has a set of context labels associated with it that denote the conditions under which the step will be executed. For example, figure 7 shows a conditional plan for moving the china in which the agent loads the china in the car and then listens to the weather report before deciding whether to take the route over the mountain or around the mountain. The operator get-weather-report has two possible outcomes, each producing one of the observation labels ok or bad. Each of the two driving operators is marked with one of the labels as its context. Thus, they must be executed after get-weather-report, and drive-china-over-mountain, for example, will be executed only if the label ok was produced. If any subsequent actions had no context labels, they would be executed on both branches of the conditional plan. Contexts are not as direct a representation as putting an explicit branch in a plan, but they can easily represent plans that branch and then re-merge, as well as partially ordered plans.

C-BURIDAN refines the approach of CNLP by keeping observations distinct from effects and by allowing different outcomes of an action to have the same observation label, thus modeling partial observability (Draper, Hanks, and Weld 1994). In C-BURIDAN, new contexts are created in response to threats between actions that can then be put in separate branches of a conditional plan. Obviously, the ability to create conditional plans can make the planning problem even less tractable. Most conditional planners do not require a plan to be produced for every contingency (although CASSANDRA does [Pryor and Collins 1996]). Onder and Pollack (1997) have studied how to choose which contingencies to plan for, both because of the need to improve efficiency and because of the observation that some contingencies can safely be left for replanning at execution time.
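A toy interpreter shows how context labels gate execution. The step and label names follow figure 7, but the encoding itself is an invented sketch, not CNLP's actual representation.

```python
# Each plan step carries a set of context labels; a step runs only if
# all its labels were produced by earlier observation actions.
def execute(plan, observations):
    """Return the steps whose contexts are satisfied by observations."""
    trace = []
    for step, contexts in plan:
        if contexts <= observations:
            trace.append(step)
    return trace

plan = [
    ("put-in-china", set()),            # no context: runs on all branches
    ("get-weather-report", set()),
    ("drive-china-over-mountain", {"ok"}),
    ("drive-china-around-mountain", {"bad"}),
]

# If the weather report produced the label ok, only the over-mountain
# branch of the conditional plan executes.
print(execute(plan, {"ok"}))
# ['put-in-china', 'get-weather-report', 'drive-china-over-mountain']
```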
repeatedly choosing a more specific version ofan operator in the plan until there are no The planners described earlier search for plans abstract operators left. The planner begins that exceed a given minimum expected utility.
with one of a set of skeletal plans containing To find the plan with MEU, however, one must somehow evaluate all possible plans to select library. Figure 8 shows some example ground the best one. In DRIPS, ranges of utility values and abstract operators from the china-moving are computed for partial plans, encompassing example, extended with a richer model of the the best and worst expected utilities of all pos- sible completions of the partial plan (Haddawy and Suwandi 1994). If the lowest value in some tion network that describes all possible special- partial plan’s range exceeds the highest value izations of a skeletal plan to move the china.
in another’s, the partial plan with the lower Solid lines denote decomposition of an opera- range can be dropped from consideration with- tor, and dashed lines denote possible choices out expanding all the complete plans below it, for an abstract operator. The skeletal plan an approach that can lead to significant sav- encodes four potential plans: One can choose ings. The dropped plan is said to be dominated whether to pack the china, and one can inde- by the other plan, an idea also explored by pendently choose which route to take.
Suppose DRIPS begins refining this plan by DRIPS maintains ranges of utility values using choosing between “pack-and-load” and “load” to refine the “load-up-china” operator. The abstraction hierarchy of operators (Friedland planner will typically add both refinements to and Iwasaki 1985). In this approach, a partial a queue of partial plans, but first it computes plan is a sequence of operators, one or more of ranges of expected utility for each one. To do which can be an abstraction of a number of this, it explicitly considers each set of possible Figure 9. Ground and Abstract Operators from the China-Moving Example That Could Be Used by the Refinement Planner DRIPS. outcomes, or chronicle, for each plan, as shown in table 2. Each plan has two possible chroni- search control within the DRIPS framework.
cles, either the china gets broken, or it does DRIPS has two choice points: (1) which abstract not. For each chronicle, DRIPS computes ranges plan to specialize and (2) which abstract action for the duration, utility, and probabilities in a plan to specialize. The plan with the cur- based on the ranges or values found in the rent highest upper bound on expected utility actions in the plan. A series of simple linear must be expanded to provably find the opti- programming problems are solved to compute mal plan; so, choosing such a plan to expand ranges for the expected utility of each abstract gives a search heuristic similar to A* (Goodwin plan. In this case, the ranges are [–17.6 0.7] for and Simmons 1998; Haddawy, Doan, andGoodwin 1995). The representation of utilities plan A and [–37 –23] for plan B. In this case, in DRIPS allows for goal deadlines and mainte- DRIPS will not add plan B to the list of plans to refine because no refinement can do better (1998a). The system has been applied in chal- than any refinement of plan A. DRIPS deter- mined this without exploring the refinements results (Haddawy, Doan, and Kahn 1996).
of plan B, an ability that can lead to large com-putational savings in more complicateddomains.
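The dominance test itself can be stated in a few lines, using the utility ranges quoted above as input data (the range computation from chronicles is omitted).

```python
# DRIPS-style dominance pruning over expected-utility ranges.
def dominated(range_a, range_b):
    """Plan B is dominated by plan A if B's best possible completion
    is worse than A's worst possible completion."""
    return range_b[1] < range_a[0]

queue = {"plan-A": (-17.6, 0.7), "plan-B": (-37.0, -23.0)}

# Keep only plans not dominated by any other plan in the queue.
keep = {name: r for name, r in queue.items()
        if not any(dominated(other, r) for other in queue.values())}
print(sorted(keep))  # ['plan-A']
```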
The use of skeletal refinement planning in DRIPS trades planning power for simplicity and the ability to compute utility ranges for partial plans. In some complex domains, it can be burdensome to represent the range of possible plans as skeletal plans that can be refined in this way. However, this form allows utility ranges to be computed for partial plans based on the ranges computed for abstract actions. It is not clear how to compute utility ranges for, say, partial BURIDAN plans because they include no commitment for the action or actions used to achieve an open condition.

Plan A: Pack and Load China. Plan B: Load China, Don't Pack.
Table 2. A Summary of the Possible Outcomes for Each Refinement of the Skeletal Plan Found by Refining the Load-China Abstract Action.

WEAVER is a probabilistic planner based on PRODIGY (Veloso et al. 1995). Like C-BURIDAN, it has no representation of utilities. It has no explicit model of observability but assumes the world is completely observable at plan-execution time. However, it has an explicit representation for uncertain exogenous events, in other words, for events that take place outside the volition of the planning agent and that can be modeled as occurring with some conditional probability given the world state. External events often play a significant role in real domains and should be considered by a decision-theoretic planner because (1) they can be influenced at least indirectly by altering the world-state conditions they depend on and (2) contingent plans can be built based on their possible outcomes.

For example, suppose that to unload the china at the new apartment, the landlord must be waiting there with the key when the car arrives. Initially, the landlord is waiting, but as time passes, he might get restless and leave, in which case he might come back later. A planning agent can reduce the probability that the landlord will leave by calling when the car is close to the apartment. If he is not waiting when the car arrives, the agent can search for him. Figure 10 shows an exogenous event specification modeling the landlord's whereabouts over time, as well as the operators call-landlord and search-for-landlord. Assuming a discrete-time model, the landlord-moves events mean that if at any time point the landlord is at the apartment and has not been called, he is at the apartment at the next time point with probability 0.95, and his whereabouts are unknown with probability 0.05. If he has been called, the probability of his leaving drops to 0.02. If his whereabouts are unknown, he can reappear at the apartment with probability 0.01.

In theory, the use of exogenous events does not increase the representational power of the planner because the effects of these events can be modeled in the effects of the actions in the domain. For example, the drive-china actions could include effects describing the landlord's whereabouts, as could the pack-china operator if it takes a significant amount of time. In practice, modeling exogenous events in this way can be extremely inefficient because the effects of these events are duplicated in every operator in the domain and, conversely, each operator must model every event that could take place while it is being executed.

Figure 10. Two Exogenous Events Modeling the Landlord's Whereabouts and Operators to Call and Search for the Landlord.

WEAVER evaluates plans in the presence of exogenous events using a three-step process: First, a plan is built and partially evaluated ignoring exogenous events. In the china-moving domain, the same three-step plan that is found by BURIDAN will be found: (1) pack-china, (2) put-in china, and (3) drive-china. Second, the goal structure for this plan is used to find the relevant exogenous events, which are those whose effects can alter the domain features required by the plan or can influence other relevant events. Third, the plan is improved by either adding steps to reduce the probability of an undesirable effect or adding a conditional branch. The plan-improvement operations are similar to those of C-BURIDAN (Draper, Hanks, and Weld 1994) but use the PRODIGY planner (Veloso et al. 1995). The main difference is that the exogenous events that affect the plan evaluation are considered as well as the actions in the plan. In the china-moving example, a possible improvement is to add a conditional branch to search for the landlord if he is not at the apartment when the car arrives.

The belief net on the left in figure 11 shows an evaluation structure for the plan before improvement, after the relevant events governing the landlord's whereabouts have been added. It is possible to make this evaluation more efficient by taking advantage of the fact that the exogenous events form a Markov process. Rather than explicitly modeling the events' effects at each time step, WEAVER compiles Markov chains that describe parts of the world state's evolution over time, in this case, the landlord's whereabouts. The Markov chains are used to compute links in the belief net on the right in figure 11. Such a belief net can be much more efficiently evaluated, especially when some actions have long durations compared with those of the exogenous events. WEAVER uses a sound algorithm to create these streamlined belief nets automatically (Blythe 1996).

Figure 11. Two Time-Slice Bayes Nets That Can Be Used to Evaluate the Plan under Exogenous Events. In the first, exogenous events are represented directly; in the second, their effects are compiled into Markov chains that are used to compute probabilities over the time intervals of interest. The numbers along the x-axis represent time points, while nodes for fluents, actions, and events are arranged along the y-axis.

Search heuristics for planning under uncertainty have also been studied in WEAVER. When steps are added to a plan to increase the number of chronicles in which it succeeds, a useful strategy is to seek steps that succeed in cases that are complementary to those in which the original plan succeeds. Such steps can be found using a local analysis of the possible outcomes and have been shown to improve performance both in WEAVER and in BURIDAN (Blythe 1995). WEAVER also makes use of analogy between different branches of a conditional plan (Blythe and Veloso 1997). These techniques have been applied to problems requiring several rounds of improvement in plans some tens of steps in length (Blythe 1998). The results are encouraging in the search for tractable planning under uncertainty but leave plenty of room for further work.

MAXPLAN, discussed below, solves its compiled planning problems with a variant of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm (Davis, Logemann, and Loveland 1962). This algorithm systematically finds the best assignment by constructing a binary tree in which each node represents a variable and each subtree represents a subproblem found by conditioning on the variable. DPLL either maximizes or computes the weighted average of the subtrees depending on whether the node is a choice or a chance variable. At each node, the algorithm eliminates all irrelevant variables (that appear only in satisfied clauses), forced variables (that appear alone in an unsatisfied clause), and pure variables (that appear either always negated or always not negated). The authors have experimented with different orderings on the variables.
One way to improve the efficiency of planning under uncertainty is to take inspiration from recently developed classical planning algorithms, such as SATPLAN (Kautz and Selman 1996), which transforms a planning problem into a logical satisfiability problem. After this step, standard algorithms can be used to solve it. MAXPLAN (Majercik and Littman 1998) is a probabilistic planner that is similar in spirit to SATPLAN. Littman, Goldsmith, and Mundhenk (1998) show that under reasonable conditions, probabilistic planning is NPPP-complete and introduce an NPPP-complete decision problem, E-MAJSAT. MAXPLAN compiles a probabilistic planning problem into an instance of E-MAJSAT and solves it. Just as GRAPHPLAN and SATPLAN are more efficient than classical planners in some domains, so MAXPLAN should have an advantage in some domains, as preliminary experiments indicate.

An instance of E-MAJSAT is a satisfiability problem, a Boolean formula whose variables are divided into two types: (1) choice variables and (2) chance variables. The chance variables are each assigned an independent probability of being true. The task is to find an assignment of truth values to the choice variables that maximizes the probability of the Boolean formula being true. MAXPLAN transforms a planning problem into an instance of E-MAJSAT whose solution yields a plan with maximum probability of success, equal to the maximum probability of the E-MAJSAT problem. The number of clauses in the formula is polynomial in the size of the planning problem. The resulting instance of E-MAJSAT is solved with a modified version of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm (Davis, Logemann, and Loveland 1962). This algorithm systematically finds the best assignment by constructing a binary tree in which each node represents a variable, and the value of each subtree is found by conditioning on the variable. DPLL either maximizes or computes the weighted average of the subtrees depending on whether the node is a choice or a chance variable. At each node, the algorithm eliminates all irrelevant variables (that appear only in satisfied clauses), forced variables (that appear alone in an unsatisfied clause), and "pure" variables (that appear either always negated or always not negated). The authors have experimented with different orderings on the variables.

Also building on a recent classical planning algorithm is work to create a probabilistic planner based on GRAPHPLAN (Smith and Weld 1998; Weld, Anderson, and Smith 1998).

There is a growing literature on extending classical planning algorithms to deal with uncertainty; I apologize to those whose work I do not have space to mention. Goldman and Boddy (1994) explore the idea of relaxing the assumption that the outcomes of different actions in a plan are probabilistically independent, using a knowledge-based model construction approach (Wellman, Breese, and Goldman 1992). This assumption is made in all the planners discussed previously, although the causal influences used by Goldman and Boddy bear similarities to the probabilistic action representations used by those planners.

Another line of work represents plans using a rich language that can express do-while loops and include arbitrary Lisp code. The system selects a plan from a user-defined library and applies transformations to improve its performance. Because of the rich plan representation, it is hard to evaluate plans analytically; instead, simulation scenarios called projections are used to compare candidate plans.

CYPRESS (Wilkins et al. 1995) is a framework for probabilistic planning built around three components: (1) SIPE-2, an HTN planner; (2) PRS-CL, a reactive plan execution system; and (3) GISTER-CL, an uncertain reasoning system. CYPRESS is broader than the planners described here in that it allows planning and execution to be done concurrently. The HTN planning style, where domain operators describe a problem decomposition rather than specify a set of subgoals, is successful in applications of classical planning. It typically offers a reduced search space at the cost of requiring more information to be specified as part of the domain knowledge.
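The choice/chance branching scheme described above can be made concrete with a brute-force evaluator for a tiny E-MAJSAT instance. This sketch (our own illustration, without MAXPLAN's pruning of irrelevant, forced, and pure variables) maximizes over choice variables and takes probability-weighted averages over chance variables; for E-MAJSAT semantics, every choice variable must come before every chance variable in the branching order:

```python
def best_success_prob(clauses, var_kinds, probs, order):
    """Evaluate an E-MAJSAT instance by conditioning on one variable at a
    time. `clauses` is CNF: a list of clauses, each a list of
    (variable, wanted_truth_value) literals. `var_kinds[v]` is "choice" or
    "chance"; `probs[v]` = P(v is true) for chance variables."""
    def satisfied(assignment):
        return all(any(assignment[v] == want for v, want in clause)
                   for clause in clauses)

    def evaluate(i, assignment):
        if i == len(order):                      # leaf: formula fully assigned
            return 1.0 if satisfied(assignment) else 0.0
        v = order[i]
        branch = {}
        for value in (True, False):              # condition on v and recurse
            assignment[v] = value
            branch[value] = evaluate(i + 1, assignment)
        del assignment[v]
        if var_kinds[v] == "choice":
            return max(branch.values())          # our decision: maximize
        p = probs[v]                             # chance node: weighted average
        return p * branch[True] + (1 - p) * branch[False]

    return evaluate(0, {})

# Toy instance: the plan succeeds when choice c matches chance w (P(w) = 0.7),
# encoded as (c or not w) and (not c or w).
clauses = [[("c", True), ("w", False)], [("c", False), ("w", True)]]
p = best_success_prob(clauses, {"c": "choice", "w": "chance"},
                      {"w": 0.7}, ["c", "w"])
print(p)  # 0.7, achieved by choosing c = True
```

Moving a choice variable after a chance variable would instead compute the value of a clairvoyant plan that observes the chance outcome first, which is why the orderings mentioned in the text must respect the variable types.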
Almost every approach that has been used to solve classical planning problems has been adapted for decision-theoretic planning. This section described decision-theoretic planners based on SNLP, skeletal refinement planning, PRODIGY, and compilation to satisfiability. The systems described have necessarily been only a small sample of the work being done in this area. Each described system concentrates on a different aspect of planning under uncertainty, and although each has been successful, their relatively narrow focus has made meaningful comparison difficult. Future work should lead to broader systems that would allow us to compare alternative strategies directly, and it suggests interesting research questions. For example, is it possible to exploit utility ranges to search for optimal plans in SNLP or PRODIGY? Can plan compilation approaches deal with structured utility models? These approaches also have the potential to be combined, although this area has not yet been explored.

In this section, I review approaches to planning under uncertainty based on Markov decision processes (MDPs); fuller treatments can be found in, for example, Boutilier, Dean, and Hanks (1995). The aim is to give the flavor of MDP approaches, the problems they engender, and the solutions currently being explored and to compare these approaches with those of the previous section.

An MDP M is a tuple M = <S, A, Φ, R>, where S is a finite set of states of the system; A is a finite set of actions; Φ : A × S → Π(S) is the state-transition function, mapping an action and a state to a probability distribution over S for the possible resulting state (the probability of reaching state s' by performing action a in state s is written Φ(a, s, s')); and R : S × A → R is the reward function, where R(s, a) is the reward the system receives if it takes action a in state s.

A policy for an MDP is a mapping π : S → A that selects an action for each state. Given a policy, we can define its finite-horizon value function Vπn : S → R, where Vπn(s) is the expected value of applying the policy π for n steps starting in state s. The value function is defined inductively with Vπ0(s) = R(s, π(s)) and

Vπn(s) = R(s, π(s)) + Σu Φ(π(s), s, u) Vπn–1(u).

Over an infinite horizon, a discounted model is frequently used to ensure that policies have a bounded expected value. For some β chosen so that β < 1, the value of any reward from the transition after the next is discounted by a factor of β, the one after that by a factor of β², and so on. Thus, if Vπ(s) is the discounted expected value in state s following policy π forever,

Vπ(s) = R(s, π(s)) + β Σu Φ(π(s), s, u) Vπ(u),

which yields a set of linear equations in the values Vπ(s).

A solution to an MDP is a policy that maximizes its expected value. For the discounted infinite-horizon case with any given discount factor β, there is a policy π* that is optimal regardless of the starting state (Howard 1960).

Two popular methods for finding an optimal policy for an MDP are (1) value iteration and (2) policy iteration. In policy iteration, the current policy is repeatedly improved by finding, for each state, some action with a higher value than the action chosen by the current policy for that state. The policy is initially chosen at random, and the process terminates when no improvement can be found. The algorithm, shown in figure 12, converges to an optimal policy.

Policy-Iteration(S, A, Φ, R, β):
1. For each s ∈ S, π(s) = RandomElement(A)
2. Compute Vπ(.) by solving the set of |S| linear equations Vπ(s) = R(s, π(s)) + β Σu Φ(π(s), s, u) Vπ(u)
3. For each s ∈ S,
4.   if R(s, a) + β Σu Φ(a, s, u) Vπ(u) > Vπ(s) for some action a,
5.   set π'(s) = a if such an a exists;
6.   otherwise, set π'(s) = π(s)
7. If π'(s) ≠ π(s) for some s ∈ S, set π = π' and go to 2
8. Return π

Figure 12. The Policy Iteration Algorithm.
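Figure 12 can be transcribed almost line for line for an explicit tabular MDP. The sketch below is our own illustration (the two-state MDP at the end is invented for the example), using a linear solve for step 2:

```python
import random
import numpy as np

def policy_iteration(S, A, phi, R, beta=0.9):
    """Sketch of figure 12. S, A: lists of state and action indices;
    phi[a][s][u] = probability of reaching u by doing a in s;
    R[s][a] = reward for doing a in s; beta: discount factor."""
    rng = random.Random(0)
    pi = [rng.choice(A) for _ in S]                  # 1. random initial policy
    while True:
        # 2. evaluate pi by solving (I - beta * P_pi) V = R_pi
        P_pi = np.array([[phi[pi[s]][s][u] for u in S] for s in S])
        R_pi = np.array([float(R[s][pi[s]]) for s in S])
        V = np.linalg.solve(np.eye(len(S)) - beta * P_pi, R_pi)
        # 3-6. switch to any action strictly better than pi(s)
        new_pi = list(pi)
        for s in S:
            for a in A:
                q = R[s][a] + beta * sum(phi[a][s][u] * V[u] for u in S)
                if q > V[s] + 1e-9:
                    new_pi[s] = a
        if new_pi == pi:                             # 7. no state improved
            return pi, V
        pi = new_pi

# Toy two-state MDP (ours): action 1 moves state 0 to state 1;
# state 1 is absorbing and pays reward 1 under either action.
phi = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
       [[0.0, 1.0], [0.0, 1.0]]]   # action 1: go to state 1
R = [[0.0, 0.0], [1.0, 1.0]]
pi, V = policy_iteration([0, 1], [0, 1], phi, R, beta=0.9)
print(pi[0])  # 1: move toward the rewarding state
```

With beta = 0.9, the absorbing state is worth 1/(1 - 0.9) = 10, so state 0 under the optimal policy is worth 0 + 0.9 × 10 = 9.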
In value iteration, optimal policies are produced for successively longer finite horizons until they converge. It is relatively simple to find an optimal policy over n steps, πn*(.), with value function Vn*(.), using the recurrence relation

πn*(s) = argmaxa { R(s, a) + β Σu∈S Φ(a, s, u) Vn–1*(u) }

with the starting condition V0*(s) = 0 for all s ∈ S, where Vn* is derived from the policy πn* as described earlier. Figure 13 shows the value iteration algorithm, which takes an MDP, a discount value β, and a parameter ε and produces successive finite-horizon optimal policies, terminating when the maximum difference between the current and previous value functions is below ε.

Value-Iteration(S, A, Φ, R, β, ε):
1. For each s ∈ S, V0(s) = 0
2. t = 0
3. t = t + 1
4. For each s ∈ S,
5.   for each a ∈ A,
6.     Qt(s, a) = R(s, a) + β Σu∈S Φ(a, s, u) Vt–1(u)
7.   πt(s) = argmaxa Qt(s, a)
8.   Vt(s) = Qt(s, πt(s))
9. If maxs | Vt(s) – Vt–1(s) | ≥ ε, go to 3
10. Return πt

Figure 13. The Value Iteration Algorithm.

It can also be shown that the algorithm converges to the optimal policy for the discounted infinite-horizon case in a number of steps that is polynomial in |S|, |A|, log maxs,a |R(s, a)|, and 1/(1 – β).
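A direct transcription of figure 13, again for an explicit tabular MDP of our own devising, needs only a few lines:

```python
def value_iteration(S, A, phi, R, beta=0.9, eps=1e-6):
    """Sketch of figure 13: iterate the finite-horizon recurrence until the
    largest change in the value function drops below eps.
    phi[a][s][u] = P(u | s, a); R[s][a] = reward for doing a in s."""
    V = {s: 0.0 for s in S}                        # step 1: V_0(s) = 0
    while True:
        Q = {s: {a: R[s][a] + beta * sum(phi[a][s][u] * V[u] for u in S)
                 for a in A} for s in S}           # Q_t(s, a)
        pi = {s: max(A, key=lambda a: Q[s][a]) for s in S}
        newV = {s: Q[s][pi[s]] for s in S}         # V_t(s) = Q_t(s, pi_t(s))
        if max(abs(newV[s] - V[s]) for s in S) < eps:
            return pi, newV                        # successive values converged
        V = newV

# The same toy two-state MDP used above: action 1 reaches the absorbing,
# rewarding state 1.
phi = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
       [[0.0, 1.0], [0.0, 1.0]]]   # action 1: go to state 1
R = [[0.0, 0.0], [1.0, 1.0]]
pi, V = value_iteration([0, 1], [0, 1], phi, R)
print(pi[0])  # 1
```

The returned values approach the same fixed point as policy iteration (here, 10 and 9), with a residual error governed by eps and the discount factor.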
Policy iteration and value iteration can find optimal policies in time polynomial in the size of the state space of the MDP. However, this state space is usually exponentially large in the input to a planning problem, which includes a set of literals whose cross-product describes the state space. Attempts to build on these and other techniques for solving MDPs have therefore concentrated on ways to gain leverage from the structure of the planning problem to reduce the computation required.

Solving an MDP is essentially the same problem as the planning under uncertainty discussed in the first part of this article, with some minor differences. The standard algorithms for MDPs find a policy, which chooses an action for every possible state of the underlying system, whereas methods based on classical planners expect a set of possible initial states as input to the problem and find a sequence of actions (possibly branching) based on them.
The difference is in part that the MDP solution methods emphasized the use of dynamic programming and in part that the AI planning methods emphasized the use of structured representations for the state space and actions. However, the improvements to policy and value iteration discussed here exploit structured representations as well.

Another difference is in the specification of goals. MDPs attach a reward to an action in a state, and classical planning takes a logical description of a set of states as a goal. Some of the planners described in the last section attach utilities to goals, however, and again, as the MDP algorithms described later make use of structure, their objective functions become more alike. The standard MDP algorithms seek a policy with maximum expected utility (MEU). Half the planners in the last section also seek a plan with MEU, and half seek a plan that passes a threshold probability of success. Again, this difference is minor because given a thresholding planner, one could produce an optimizing planner by repeatedly increasing the threshold until no plan is found. Conversely, one could terminate policy iteration as soon as a policy exceeds a given threshold value.

Dean et al. (1993) used policy iteration in a restricted state space called an envelope. A subset of the states is selected, and each transition in the MDP that leaves the subset is replaced with a transition to a new state Out with zero reward. No transitions leave the Out state. They developed an algorithm that alternates between solving the restricted-space MDP with policy iteration and expanding the envelope by including the n most likely elements of the state space to be reached by the optimal policy that were not yet in the envelope. The algorithm converges to an optimal policy considerably more quickly than standard policy iteration on the whole state space, but as Dean et al. (1995) point out, it makes some assumptions that limit its applicability, including a sparse MDP in which each state has only a small number of outward transitions. Tash and Russell (1994) extend the idea of an envelope with an initial estimate of the distance to goal for each state and a model that takes the time of computation into account.

Whereas the envelope technique ignores portions of the state space, other techniques ignore some of the literals that describe the states, in the spirit of abstraction hierarchies for classical planners. States that differ only in literals that can at most indirectly affect the reward function R are grouped together, reducing the set of states to be considered in each subproblem.

In a partially observable MDP (POMDP), the planning agent cannot directly observe the state of the system. Instead, there is a set of observation labels O and a set of probabilities P(o | a, s), o ∈ O, a ∈ A, s ∈ S, such that when the system arrives in state s after action a, the agent receives the observation label o with probability P(o | a, s). Cassandra, Kaelbling, and Littman (1994) solve POMDPs by maintaining a probability distribution over the states the system might occupy, updating the distribution after each action and observation label using Bayes's rule; a value function is again built in a series of approximations over this belief space.
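The belief-state bookkeeping can be sketched directly. The dictionary encoding below is our own illustration of the Bayes-rule update, not code from any of the systems discussed: the new probability of being in state u is proportional to P(o | a, u) times the probability of reaching u under the old belief.

```python
def belief_update(belief, a, o, phi, obs):
    """One Bayes-rule belief update after performing action a and receiving
    observation label o. Encoding (ours): belief[s] = P(current state is s);
    phi[a][s][u] = P(u | s, a); obs[a][u][o] = P(o | a, u)."""
    unnormalized = {
        u: obs[a][u][o] * sum(phi[a][s][u] * belief[s] for s in belief)
        for u in belief
    }
    z = sum(unnormalized.values())   # = P(o | a, belief), the normalizer
    if z == 0.0:
        raise ValueError("observation impossible under current belief")
    return {u: p / z for u, p in unnormalized.items()}

# Toy example: two persistent states, an observation "hot" that is four
# times likelier in state 0 than in state 1.
phi = {"a": {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}}
obs = {"a": {0: {"hot": 0.8}, 1: {"hot": 0.2}}}
b = belief_update({0: 0.5, 1: 0.5}, "a", "hot", phi, obs)
print(b)  # {0: 0.8, 1: 0.2}
```

Value iteration for a POMDP then operates over these belief states rather than over the underlying states themselves.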
Different literals can be relevant in different parts of the state space, and because several algorithms have exploited structure in the state-utility description, it is natural to exploit structure in the action description as well: a technique called structured policy iteration uses a structured action representation to avoid enumerating the state space explicitly. These systems represent a variety of styles and attack different pieces of the decision-theoretic-planning problem.
References

Blum, A., and Furst, M. 1997. Fast Planning through Planning Graph Analysis. Artificial Intelligence 90(1–2): 281–300.

Blythe, J. 1994. Planning with External Events. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, eds. R. L. de Mantaras and D. Poole, 94–101. San Francisco, Calif.: Morgan Kaufmann.

Blythe, J. 1995. The Footprint Principle for Heuristics for Probabilistic Planners. In New Directions in AI Planning, eds. M. Ghallab and A. Milani. Amsterdam: IOS Press.

Blythe, J. 1996. Decompositions of Markov Chains for Reasoning about External Change in Planners. In Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, ed. B. Drabble, 27–34. Menlo Park, Calif.: AAAI Press.

Blythe, J. 1998. Planning under Uncertainty in Dynamic Domains. Ph.D. dissertation, Computer Science Department, Carnegie Mellon University.

Blythe, J., and Veloso, M. 1997. Analogical Replay for Efficient Conditional Planning. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 668–673. Menlo Park, Calif.: American Association for Artificial Intelligence.

Boutilier, C., and Dearden, R. 1994. Using Abstractions for Decision-Theoretic Planning with Time Constraints. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1016–1022. Menlo Park, Calif.: American Association for Artificial Intelligence.

Boutilier, C.; Brafman, R. I.; and Geib, C. 1997. Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Boutilier, C.; Brafman, R.; and Geib, C. 1998. Structured Reachability Analysis for Markov Decision Processes. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 24–32. San Francisco, Calif.: Morgan Kaufmann.

Boutilier, C.; Dean, T.; and Hanks, S. 1995. Planning under Uncertainty: Structural Assumptions and Computational Leverage. In New Directions in AI Planning, eds. M. Ghallab and A. Milani, 157–172. Amsterdam: IOS Press.

Boutilier, C.; Dearden, R.; and Goldszmidt, M. 1995. Exploiting Structure in Policy Construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1104–1111. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Brafman, R. 1997. A Heuristic Variable Grid Solution Method for POMDPs. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 727–733. Menlo Park, Calif.: American Association for Artificial Intelligence.

Cassandra, A.; Kaelbling, L. P.; and Littman, M. 1994. Acting Optimally in Partially Observable Stochastic Domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1023–1028. Menlo Park, Calif.: American Association for Artificial Intelligence.

Davis, M.; Logemann, G.; and Loveland, D. 1962. A Machine Program for Theorem Proving. Communications of the ACM 5(7): 394–397.

Dean, T., and Givan, R. 1997. Model Minimization in Markov Decision Processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park, Calif.: American Association for Artificial Intelligence.

Dean, T., and Lin, S.-H. 1995. Decomposition Techniques for Planning in Stochastic Domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1993. Planning with Deadlines in Stochastic Domains. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 574–579. Menlo Park, Calif.: American Association for Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1995. Planning under Time Constraints in Stochastic Domains. Artificial Intelligence 76(1–2): 35–74.

Draper, D.; Hanks, S.; and Weld, D. 1994. Probabilistic Planning with Information Gathering and Contingent Execution. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 31–37. Menlo Park, Calif.: AAAI Press.

Feldman, J. A., and Sproull, R. F. 1977. Decision Theory and Artificial Intelligence II: The Hungry Monkey. Cognitive Science 1:158–192.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2:189–208.

Friedland, P. E., and Iwasaki, Y. 1985. The Concept and Implementation of Skeletal Plans. Journal of Automated Reasoning 1(2): 161–208.

Givan, R., and Dean, T. 1997. Model Minimization, Regression, and Propositional STRIPS Planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Goldman, R. P., and Boddy, M. S. 1994. Epsilon-Safe Planning. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, eds. R. L. de Mantaras and D. Poole, 253–261. San Francisco, Calif.: Morgan Kaufmann.

Goodwin, R., and Simmons, R. 1998. Search Control of Plan Generation in Decision-Theoretic Planning. In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems, eds. R. Simmons and S. Smith. Menlo Park, Calif.: AAAI Press.

Haddawy, P., and Hanks, S. 1998a. Utility Models for Goal-Directed Decision-Theoretic Planners. Computational Intelligence 14(3).

Haddawy, P., and Hanks, S., eds. 1998b. Special Issue on Decision-Theoretic Planning. Computational Intelligence 14(3).

Haddawy, P.; Doan, A.; and Goodwin, R. 1995. Efficient Decision-Theoretic Planning: Techniques and Empirical Analysis. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, eds. P. Besnard and S. Hanks, 229–236. San Francisco, Calif.: Morgan Kaufmann.

Haddawy, P.; Doan, A.; and Kahn, C. E. 1996. Decision-Theoretic Refinement Planning in Medical Decision Making: Management of Acute Deep Venous Thrombosis. Medical Decision Making 16(4): 315–325.

Howard, R. A. 1960. Dynamic Programming and Markov Processes. Cambridge, Mass.: MIT Press.

Kautz, H. A., and Selman, B. 1996. Pushing the Envelope: Planning, Propositional Logic, and Stochastic Search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. Menlo Park, Calif.: American Association for Artificial Intelligence.

Knoblock, C. A. 1991. Automatically Generating Abstractions for Problem Solving. Ph.D. dissertation, School of Computer Science, Carnegie Mellon University.

Kushmerick, N.; Hanks, S.; and Weld, D. 1994. An Algorithm for Probabilistic Least-Commitment Planning. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1073–1078. Menlo Park, Calif.: American Association for Artificial Intelligence.

Kushmerick, N.; Hanks, S.; and Weld, D. 1995. An Algorithm for Probabilistic Planning. Artificial Intelligence 76(1–2): 239–286.

Littman, M. L.; Goldsmith, J.; and Mundhenk, M. 1998. The Computational Complexity of Probabilistic Planning. Journal of Artificial Intelligence Research 9:1–36.

Luce, R. D., and Raiffa, H. 1957. Games and Decisions: Introduction and Critical Survey. New York: Wiley.

McDermott, D. 1992. Transformational Planning of Reactive Behavior. Research Report, Department of Computer Science, Yale University.

Majercik, S. M., and Littman, M. L. 1998. MAXPLAN: A New Approach to Probabilistic Planning. In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems, eds. R. Simmons and S. Smith. Menlo Park, Calif.: AAAI Press.

Newell, A., and Simon, H. A. 1963. GPS: A Program That Simulates Human Thought. In Computers and Thought, eds. E. A. Feigenbaum and J. Feldman, 279–293. New York: McGraw-Hill.

Onder, N., and Pollack, M. 1997. Contingency Selection in Plan Generation. Paper presented at the Fourth European Conference on Planning.

Parr, R., and Russell, S. 1995. Approximating Optimal Policies for Partially Observable Stochastic Domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1088–1094. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. San Francisco, Calif.: Morgan Kaufmann.

Penberthy, J. S., and Weld, D. S. 1992. UCPOP: A Sound, Complete, Partial-Order Planner for ADL. In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning. San Francisco, Calif.: Morgan Kaufmann.

Peot, M. A., and Smith, D. E. 1992. Conditional Nonlinear Planning. In Proceedings of the First International Conference on Artificial Intelligence Planning Systems, ed. J. Hendler. San Francisco, Calif.: Morgan Kaufmann.

Pérez, M. A., and Carbonell, J. 1994. Control Knowledge to Improve Plan Quality. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 323–328. Menlo Park, Calif.: AAAI Press.

Pryor, L., and Collins, G. 1996. Planning for Contingencies: A Decision-Based Approach. Journal of Artificial Intelligence Research 4:287–339.

Puterman, M. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.

Smith, D. E., and Weld, D. 1998. Conformant GRAPHPLAN. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 889–896. Menlo Park, Calif.: American Association for Artificial Intelligence.

Tash, J. K., and Russell, S. 1994. Control Strategies for a Stochastic Planner. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1079–1085. Menlo Park, Calif.: American Association for Artificial Intelligence.

Tate, A. 1977. Generating Project Networks. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 888–893. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Veloso, M.; Carbonell, J.; Pérez, A.; Borrajo, D.; Fink, E.; and Blythe, J. 1995. Integrating Planning and Learning: The PRODIGY Architecture. Journal of Experimental and Theoretical Artificial Intelligence 7(1).

Weld, D. 1994. An Introduction to Least Commitment Planning. AI Magazine 15(4): 27–61.

Weld, D.; Anderson, C.; and Smith, D. E. 1998. Extending GRAPHPLAN to Handle Uncertainty and Sensing Actions. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 897–904. Menlo Park, Calif.: American Association for Artificial Intelligence.

Wellman, M. P. 1990. Formulation of Trade-Offs in Planning under Uncertainty. New York: Pitman.

Wellman, M. P.; Breese, J. S.; and Goldman, R. P. 1992. From Knowledge Bases to Decision Models. Knowledge Engineering Review 7(1): 35–53.

Wilkins, D. E.; Myers, K. L.; Lowrance, J. D.; and Wesley, L. P. 1995. Planning and Reacting in Uncertain and Dynamic Environments. Journal of Experimental and Theoretical Artificial Intelligence 7(1): 197–227.

Williamson, M., and Hanks, S. 1994. Optimal Planning with a Goal-Directed Utility Model. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, ed. K. Hammond, 176–181. Menlo Park, Calif.: AAAI Press.

Jim Blythe is a research scientist at the University of Southern California's Information Sciences Institute. His research interests include planning, knowledge acquisition, reasoning under uncertainty, and machine learning.

