决策理论（Decision theory）&自动规划和调度（Automated planning and scheduling）（双语）

Date: 2015-08-27 23:11:22

Apologies for the rough translation.

Most of the content comes from Wikipedia.

decision theory决策理论部分：

Normative and descriptive decision theory

规范和描述性决策理论

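Normative, or prescriptive, decision theory is concerned with identifying the best decision to take (in practice, there are situations in which "best" is not necessarily the maximum; the optimum may also include values other than the maximum that fall within a specific or approximate range), assuming an ideal decision maker who is fully informed, able to compute with perfect accuracy, and fully rational. The practical application of this prescriptive approach (how people ought to make decisions) is called decision analysis, and it aims at finding tools, methodologies and software that help people make better decisions. The most systematic and comprehensive software tools developed in this way are called decision support systems.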

What kinds of decisions need a theory?

在什么条件下需要决策？

1.Choice under uncertainty

1.在不确定性条件之下

This area represents the heart of decision theory. The procedure now referred to as expected value was known from the 17th century. Blaise Pascal invoked it in his famous wager (see below), which is contained in his Pensées, published in 1670. The idea of expected value is that, when faced with a number of actions, each of which could give rise to more than one possible outcome with different probabilities, the rational procedure is to identify all possible outcomes, determine their values (positive or negative) and the probabilities that will result from each course of action, and multiply the two to give an expected value. The action to be chosen should be the one that gives rise to the highest total expected value. In 1738, Daniel Bernoulli published an influential paper entitled Exposition of a New Theory on the Measurement of Risk, in which he uses the St. Petersburg paradox to show that expected value theory must be normatively wrong. He also gives an example in which a Dutch merchant is trying to decide whether to insure a cargo being sent from Amsterdam to St Petersburg in winter, when it is known that there is a 5% chance that the ship and cargo will be lost. In his solution, he defines a utility function and computes expected utility rather than expected financial value (see [2] for a review).

期望值的理念是,当面对若干行动,而每一个行动都可能以不同的概率引起多个可能的结果时,合理的过程是识别所有可能的结果,确定它们的值(正面或负面)以及每个行动导致各结果的概率,并将两者相乘得到一个预期值。

应该选择的行动是总预期值最高的。在1738年,丹尼尔·伯努利发表一篇有影响力的论文题为博览会上的一个新的理论预测风险,他使用圣彼得堡悖论表明,预期值理论是被错误规范的。他也给了一个例子, 一位荷兰商人试图决定是否在冬天投保货物从阿姆斯特丹发送往圣彼得堡,当他知道有5%的可能性,他的船和货物都将丢失。他在解决方案,定义了一个效用函数和计算期望效用,而不是预期财务值。
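The difference between expected value and expected utility in the merchant example can be sketched numerically. This is a minimal illustration, not Bernoulli's own calculation: the wealth, cargo value and premium are hypothetical figures, and the logarithmic utility function is an assumption chosen because it is the textbook example of a risk-averse utility. Only the 5% loss probability comes from the text.

```python
import math

def expected_value(outcomes):
    """Sum of probability * value over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def expected_utility(outcomes, utility):
    """Sum of probability * utility(value) over (probability, value) pairs."""
    return sum(p * utility(v) for p, v in outcomes)

# Hypothetical figures (only p_loss = 5% is taken from the text).
wealth = 3000.0    # merchant's other wealth
cargo = 10000.0    # value of the cargo
premium = 800.0    # price of the insurance
p_loss = 0.05      # chance that ship and cargo are lost

# Each option is a list of (probability, final wealth) pairs.
uninsured = [(1 - p_loss, wealth + cargo), (p_loss, wealth)]
insured = [(1.0, wealth + cargo - premium)]

ev_uninsured = expected_value(uninsured)
ev_insured = expected_value(insured)

# Assumed utility: natural logarithm of final wealth.
eu_uninsured = expected_utility(uninsured, math.log)
eu_insured = expected_utility(insured, math.log)

print("expected value   :", ev_uninsured, "vs", ev_insured)
print("expected utility :", eu_uninsured, "vs", eu_insured)
```

With these assumed numbers, not insuring has the higher expected financial value, while insuring has the higher expected utility, which is the kind of reversal the utility-function argument is meant to capture.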

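Utility function (效用函数): a utility function usually expresses the quantitative relationship between the utility a consumer obtains from consumption and the bundle of goods consumed, and it measures the degree of satisfaction the consumer obtains from consuming a given bundle of goods.

Definitions of "utility function" in reference works

a. A function expressing the quantitative relationship between the utility a consumer obtains from consumption and the bundle of goods consumed. It is used to measure the degree of satisfaction the consumer obtains from consuming a given bundle of goods. Indifference curves can only analyze combinations of two goods, whereas a utility function can analyze combinations of more goods. Its expression is U = U(x, y, z, …), where x, y, z denote the quantities of the various goods the consumer owns or consumes, and the U on the left-hand side is ......

Definitions of "utility function" in the academic literature

b. The utility function is defined as follows: let ≽ be a preference relation defined on the consumption set X. If, for any x, y in X, x ≽ y if and only if u(x) ≥ u(y), then the function u: X → R is called a utility function representing the preference relation ≽.

c. F(X) is called the utility function. The key to the weighted p-norm method is determining the weight coefficients. There are two basic methods; one is the method the source calls 老学习法 [1,2], which selects the weight coefficients according to the relative importance of the objective functions.

d. A person's utility should be a function of wealth x; this function is called the utility function. In theory, each person's utility function can be approximated through a series of psychological tests. Different decision makers should have different utility functions. We first look for the properties that utility functions satisfy, or the properties satisfied by certain special classes of utility functions.

e. This is a theoretical assumption; the model built from the mathematical functions they use is called a "utility function". Under this kind of model, every person can be assumed to be able to determine the level of benefit produced by each possible allocation of time, and to pursue the choice that maximizes benefit.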

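f. The cost of the i-th mode of transport, sometimes also called the utility function: $u = a_0 + a_1 T + a_2 C$, where T is the travel time of the i-th mode of transport and C is the transport cost of the i-th mode of transport.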

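g. To evaluate the control, a set of functions is needed as evaluation criteria: $J(t) = \sum_{k=0}^{\infty} \gamma^{k} U(t+k) = U(t) + \gamma J(t+1)$ (2), where $U(t) = U[R(t), A(t), t]$ evaluates each control step and is called the utility function. The function J(t) accumulates the utility values of every step from the current moment onward and is called the cost function.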
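The cumulative evaluation in item g is a discounted sum of per-step utilities that satisfies the recursion J(t) = U(t) + γ·J(t+1). A small sketch can check that identity numerically. The utility sequence and the discount factor below are hypothetical, and the infinite sum is truncated to a finite horizon purely for illustration.

```python
def discounted_sum(utilities, gamma):
    """J = sum over k of gamma**k * utilities[k] (finite-horizon truncation)."""
    total = 0.0
    for k, u in enumerate(utilities):
        total += (gamma ** k) * u
    return total

# Hypothetical per-step utilities U(t), U(t+1), ... and discount factor.
utilities = [1.0, 2.0, 3.0, 4.0]
gamma = 0.9

j_t = discounted_sum(utilities, gamma)          # J(t)
j_next = discounted_sum(utilities[1:], gamma)   # J(t+1)

# The recursive form should agree with the direct sum.
recursive = utilities[0] + gamma * j_next
print(j_t, recursive)
```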

2.Intertemporal choice

2.跨期选择

Intertemporal choice is concerned with the kind of choice where different actions lead to outcomes that are realised at different points in time. If someone received a windfall of several thousand dollars, they could spend it on an expensive holiday, giving them immediate pleasure, or they could invest it in a pension scheme, giving them an income at some time in the future. What is the optimal thing to do? The answer depends partly on factors such as the expected rates of interest and inflation, the person's life expectancy, and their confidence in the pensions industry. However, even with all those factors taken into account, human behavior again deviates greatly from the predictions of prescriptive decision theory, leading to alternative models in which, for example, objective interest rates are replaced by subjective discount rates.

跨时期的选择导致不同的时间点有不同的结果。如果有人收到几千美元的横财,他们可以花在昂贵的假期,让他们马上感到快乐,或者他们可以投资在一个养老计划,让他们一个在未来一段时间进行投资。最优的事情是什么?答案部分取决于预期的利率和通货膨胀等因素,人的平均寿命,和他们的养老金行业的信任。然而即使所有这些因素考虑在内,人类行为又大大背离规范性决策理论的预测,导致替换的模型,例如,目标利率被主观折扣率（subjective discount rates）所取代。
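The effect of replacing an objective interest rate with a subjective discount rate can be illustrated with a present-value calculation. This is only a sketch under simple assumptions: the amounts, the horizon and both rates are hypothetical, and annual compounding is assumed.

```python
def present_value(amount, rate, years):
    """Value today of `amount` received after `years`, discounted annually at `rate`."""
    return amount / (1 + rate) ** years

# Hypothetical choice: spend 5000 now, or receive 8000 from a pension in 10 years.
spend_now = 5000.0
pension_payout = 8000.0
years = 10

objective_rate = 0.03    # assumed market interest rate
subjective_rate = 0.08   # assumed personal discount rate

pv_objective = present_value(pension_payout, objective_rate, years)
pv_subjective = present_value(pension_payout, subjective_rate, years)

print("PV at objective rate :", round(pv_objective, 2))
print("PV at subjective rate:", round(pv_subjective, 2))
```

Under the assumed objective rate the pension is worth more than the windfall, while under the higher subjective rate it is worth less, so the same person can rationally (by their own discounting) prefer the holiday.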

Interaction of decision makers

Some decisions are difficult because of the need to take into account how other people in the situation will respond to the decision that is taken. The analysis of such social decisions is more often treated under the label of game theory, rather than decision theory, though it involves the same mathematical methods. From the standpoint of game theory most of the problems treated in decision theory are one-player games (or the one player is viewed as playing against an impersonal background situation). In the emerging socio-cognitive engineering, the research is especially focused on the different types of distributed decision-making in human organizations, in normal and abnormal/emergency/crisis situations.

有些决定是困难的,因为需要考虑别人将如何应对的决定的情况。这样的决策的分析往往是对博弈论的理论而言,而不是决策理论,尽管它涉及相同的数学方法。从博弈论的角度看的大部分问题在决策理论是一个玩家的游戏。

Other-regarding preferences

涉他偏好

Also called social preferences. In decisions which affect others, people will sometimes give up some direct personal benefit or take on a cost in order to achieve a fair or equal outcome. Bolton and Ockenfels (2000) and Fehr and Schmidt (1999) explore decision-makers who are concerned with fairness of distributions and have disutility from others' being much better off or much worse off. A closely related area of research is concerned with reciprocal fairness; the decision-makers desire to reward kind actions or intentions and punish unkind ones.

也称为社会偏好。决策影响他人,人们有时会直接放弃一些个人利益或承担成本以达到公平或平等的结果。一个密切相关的研究领域涉及相互公平;决策者希望奖励这种行为或者意图和惩罚不友善的人。

Complex decisions

复杂决策

Other areas of decision theory are concerned with decisions that are difficult simply because of their complexity, or the complexity of the organization that has to make them. Individuals making decisions may be limited in resources or are boundedly rational. In such cases the issue is not the deviation between real and optimal behaviour, but the difficulty of determining the optimal behaviour in the first place. The Club of Rome, for example, developed a model of economic growth and resource usage that helps politicians make real-life decisions in complex situations [citation needed]. Decisions are also affected by whether options are framed together or separately. This is known as the distinction bias.

Heuristics

启发

Main article: Heuristic

One method of decision-making is heuristic. The heuristic approach makes decisions based on routine thinking. While this is quicker than step-by-step processing, heuristic decision-making opens the risk of inaccuracy. Mistakes that otherwise would have been avoided in step-by-step processing can be made. One common and incorrect thought process that results from heuristic thinking is the gambler's fallacy. The gambler's fallacy makes the mistake of believing that a random event is affected by previous random events. For example, there is a fifty percent chance of a coin landing on heads. Gambler's fallacy suggests that if the coin lands on tails, the next time it flips, it will land on heads, as if it's "the coin's turn" to land on heads. This is simply not true. Such a fallacy is easily disproved in a step-by-step process of thinking.[5]

In another example, when choosing between options involving extremes, decision-makers may have a heuristic that moderate alternatives are preferable to extreme ones. The Compromise Effect operates under a mindset driven by the belief that the most moderate option, amid extremes, carries the most benefits from each extreme.[6]

决策是启发式方法之一。启发式方法使得决策基于常规思维。虽然这比一步一步处理快,启发式决策存在错误的风险，是在一步一步的处理可以避免的错误。一个常见的不正确的思维过程,启发式思维-赌徒谬论。

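The gambler's fallacy holds that future probabilities change because of what happened in the past; this is not so. A fixed probability, such as that of a tossed coin landing heads, does not change: the probability of heads is always 50%, regardless of whether your previous ten tosses all came up tails. Believing that probabilities shift is a common bias, especially in gambling. Take roulette: if the last four spins all stopped on black, is the next spin bound to stop on red? Obviously not! The probability of stopping on red is still 47.37%. This may sound self-evident, but it is exactly this bias, the naive belief that the probability will change, that has cost many gamblers large sums of money.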
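The 47.37% figure in the roulette example matches a wheel with 18 red, 18 black and 2 green pockets (18/38); that wheel layout is an assumption here, since the text does not state it. The sketch below computes that probability and then estimates, by simulation, how often red follows a run of four blacks. The number of spins and the random seed are arbitrary choices.

```python
import random

# Assumed wheel: 18 red, 18 black, 2 green pockets (38 in total).
WHEEL = ["red"] * 18 + ["black"] * 18 + ["green"] * 2

p_red = WHEEL.count("red") / len(WHEEL)

def red_after_black_streak(n_spins, streak=4, seed=0):
    """Estimate P(red | previous `streak` spins were all black) by simulation."""
    rng = random.Random(seed)
    spins = [rng.choice(WHEEL) for _ in range(n_spins)]
    hits = 0
    total = 0
    for i in range(streak, n_spins):
        if all(s == "black" for s in spins[i - streak:i]):
            total += 1
            if spins[i] == "red":
                hits += 1
    return hits / total

estimate = red_after_black_streak(200_000)
print("P(red) on any spin     :", round(p_red * 100, 2), "%")
print("P(red | 4 blacks before):", round(estimate * 100, 2), "%")
```

The simulated conditional frequency should stay close to the unconditional probability, which is the point of the example: independent spins carry no memory of earlier results.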

这种谬论是容易证明的思维是一个循序渐进的过程。

General criticism

一般的批评

Main article: Ludic fallacy（戏局谬误）

A general criticism of decision theory based on a fixed universe of possibilities is that it considers the "known unknowns", not the "unknown unknowns": it focuses on expected variations, not on unforeseen events, which some argue (as in black swan theory) have outsized impact and must be considered; significant events may be "outside model". This line of argument, called the ludic fallacy, is that there are inevitable imperfections in modeling the real world by particular models, and that unquestioning reliance on models blinds one to their limits.

决策理论的一般批评基于一个固定的可能性是,它认为是“已知的未知”,而不是“未知的未知”:它关注预期的变化,不是无法预料的事件,一些人认为(如黑天鹅理论)产生巨大的影响,必须考虑重大事件可能是“外模型”。这个的观点,所谓的戏局谬误,是有不可避免的缺陷由特定的模型,在建模现实世界,盲目的依赖模型使人看不到他们的限制。（戏局谬误 Ludicfallacy 过度使用统计与机率预测未来。）

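Black swan theory: before Australia was discovered, Europeans believed that all swans were white, and they often used "black swan" to refer to something that could not exist. That belief collapsed when the first black swan was seen. The black swan therefore stands for a major, rare and unpredictable event that comes as a surprise yet changes everything. People tend to overlook such things and habitually explain these unexpected shocks with limited life experience and fragile beliefs. This is the "black swan theory".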

Automated planning and scheduling自动规划和调度：

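Automated planning and scheduling, in the relevant literature often denoted as simply planning, is a branch of artificial intelligence concerned with the realization of strategies or action sequences.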

自动规划和调度,在相关文献中通常表示为简单的规划,是人工智能的一个分支,关注的实现策略或行动序列,

Planning is also related to decision theory.

规划还与决策理论有关（上文中的）

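In known environments with available models, planning can be done offline. Solutions can be found and evaluated before execution. In dynamically unknown environments, the strategy often needs to be revised online in real time. Models and policies must be adaptable.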

Given a description of the possible initial states of the world, a description of the desired goals, and a description of a set of possible actions, the planning problem is to find a plan that is guaranteed (from any of the initial states) to generate a sequence of actions that leads to one of the goal states.

The difficulty of planning is dependent on the simplifying assumptions employed. Several classes of planning problems can be identified depending on the properties the problems have in several dimensions.

给定的初始状态的描述,描述理想的目标,和描述的一组可能的行动,计划的问题是找到一个计划,保证(从任何初始状态)来生成一个操作序列,来达到目标状态。
规划的难度取决于使用的简化假设。在几个方面，规划问题的几类属性能基于特性被定义。

For nondeterministic actions, are the associated probabilities available?

对不确定性的行为,相关的概率可用吗?

Are the state variables discrete or continuous?

状态变量是离散或连续的吗?

If they are discrete, do they have only a finite number of possible values?

如果他们是离散的,他们只有有限数量的可能值吗?

Can the current state be observed unambiguously?

可以观察到当前正确的状态吗?

Is there only one agent or are there several agents?

只有一个代理还是可以有几个代理?

Are the agents cooperative or selfish?

代理合作的还是自私的?

Can several actions be taken concurrently, or is only one action possible at a time?

可以同时采取一些行动,还是只能进行一个行动?

Is the objective of a plan to reach a designated goal state, or to maximize a reward function?

计划的目标是达到指定目标的状态,还是最大化回报函数?

Do all of the agents construct their own plans separately, or are the plans constructed centrally for all agents?

所有的代理构建他们自己的计划,还是为所有代理集中组建计划?

The simplest possible planning problem, known as the Classical Planning Problem, is determined by:

尽可能简单的规划问题,被称为经典的规划问题，决定于：

a unique known initial state,

一个独特的已知的初始状态,

durationless actions,

无持续时间的行动,

deterministic actions,

确定的行动,

which can be taken only one at a time,

一次只有一个可以采取,

and a single agent.

一个代理。

Since the initial state is known unambiguously, and all actions are deterministic, the state of the world after any sequence of actions can be accurately predicted, and the question of observability is irrelevant for classical planning.

Further, plans can be defined as sequences of actions, because it is always known in advance which actions will be needed.

With nondeterministic actions or other events outside the control of the agent, the possible executions form a tree, and plans have to determine the appropriate actions for every node of the tree.

由于初始状态是已知的明确,并且所有操作是确定性的,世界任何序列操作后的状态可以准确地预测,并且可观察不相关的传统计划。此外,计划可以被定义为行动的序列,因为它总是提前知道需要哪些动作。伴随着不确定性行为之外的控制或其他事件代理,可能执行的形成一个树,并计划决定的每一个树的节点作为适当的行动。
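Because classical planning assumes a single known initial state and deterministic actions, a plan is just a sequence of actions, and one simple way to find it is breadth-first search forward from the initial state. The sketch below is a generic illustration rather than any particular planner's algorithm; the water-jug domain, the jug capacities and all function names are hypothetical.

```python
from collections import deque

CAP_A, CAP_B = 4, 3  # hypothetical jug capacities; a state is (litres in A, litres in B)

def pour(src, dst, dst_cap):
    """Pour from src into dst until src is empty or dst is full."""
    moved = min(src, dst_cap - dst)
    return src - moved, dst + moved

# Deterministic actions: each maps a state to exactly one successor state.
ACTIONS = {
    "fill_a": lambda s: (CAP_A, s[1]),
    "fill_b": lambda s: (s[0], CAP_B),
    "empty_a": lambda s: (0, s[1]),
    "empty_b": lambda s: (s[0], 0),
    "pour_a_to_b": lambda s: pour(s[0], s[1], CAP_B),
    "pour_b_to_a": lambda s: pour(s[1], s[0], CAP_A)[::-1],
}

def forward_search(initial, is_goal, actions):
    """Breadth-first forward search; returns a shortest list of action names or None."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        for name, apply_action in actions.items():
            nxt = apply_action(state)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

def is_goal(state):
    """Goal: either jug holds exactly 2 litres."""
    return 2 in state

plan = forward_search((0, 0), is_goal, ACTIONS)
print(plan)
```

Since every action is deterministic, replaying the returned plan from the initial state always ends in a goal state, so no observation of intermediate states is needed.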

Discrete-time Markov decision processes (MDP) are planning problems with:

离散时间马尔可夫决策过程(MDP)规划问题:

durationless actions,

nondeterministic actions with probabilities,

full observability,

maximization of a reward function,

and a single agent.

无持续时间的行动,

不确定性行为的概率,
完整的可观察性,
最大化的回报函数,
和一个代理。
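The MDP setting listed above (probabilistic action outcomes, full observability, reward maximization, one agent) can be made concrete with value iteration, a standard method for solving small MDPs. The text does not name a solution method, so this is a sketch of one common choice. The two-state model, its transition probabilities, rewards and the discount factor are all hypothetical.

```python
def q_value(s, a, V, P, R, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Return (state values, greedy policy) for a finite MDP."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, V, P, R, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma))
              for s in states}
    return V, policy

# Hypothetical model: staying in B pays 1 per step; moving from A reaches B with p = 0.8.
states = ["A", "B"]
actions = ["stay", "move"]
P = {  # P[state][action] = {next_state: probability}
    "A": {"stay": {"A": 1.0}, "move": {"B": 0.8, "A": 0.2}},
    "B": {"stay": {"B": 1.0}, "move": {"A": 1.0}},
}
R = {  # R[state][action] = immediate reward
    "A": {"stay": 0.0, "move": 0.0},
    "B": {"stay": 1.0, "move": 0.0},
}

V, policy = value_iteration(states, actions, P, R)
print(V, policy)
```

Unlike a classical plan, the result is a policy: an action for every state, which is what is needed when outcomes are uncertain but the current state can always be observed.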

When full observability is replaced by partial observability, planning corresponds to partially observable Markov decision process (POMDP).

If there is more than one agent, we have multi-agent planning, which is closely related to game theory.

当完全可观察性被部分可观察性取代时,规划对应于部分可观察马尔可夫决策过程(POMDP)。
如果有不止一个的代理,我们有多智能体规划、这与博弈理论密切相关。

Planning languages

规划性语言

The most commonly used languages for representing planning problems, such as STRIPS and PDDL for Classical Planning, are based on state variables. Each possible state of the world is an assignment of values to the state variables, and actions determine how the values of the state variables change when that action is taken. Since a set of state variables induce a state space that has a size that is exponential in the set, planning, similarly to many other computational problems, suffers from the curse of dimensionality and the combinatorial explosion.

An alternative language for describing planning problems is that of hierarchical task networks, in which a set of tasks is given, and each task can be either realized by a primitive action or decomposed into a set of other tasks. This does not necessarily involve state variables, although in more realistic applications state variables simplify also the description of task networks.

最常用的规划问题语言,如基于状态变量的STRIPS 和PDDL经典规划。世界中每个可能的状态是一个变量的赋值, 并且行动决定当采取这个行动时状态变量的值该怎么改变。自一组状态变量引起的状态空间规模指数的设置,规划,类似于许多其他计算问题,遭受维度和组合爆炸的困扰。

另一种描述分层任务网络规划问题的可选的语言,给出了一组任务,每个任务可以是实现了一个原始的行动或分解为一组其他任务。这并不一定包括状态变量,尽管在更现实的应用程序状态变量简化依然任务网络描述
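The state-variable view can be sketched in Python. This is a STRIPS-style illustration only, not actual STRIPS or PDDL syntax: a state is the set of facts that are currently true, and an action has preconditions, facts it adds and facts it deletes. The door-and-key domain and all names are hypothetical.

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]   # facts that must hold before the action
    add_effects: FrozenSet[str]     # facts made true by the action
    delete_effects: FrozenSet[str]  # facts made false by the action

def applicable(state, action):
    """An action is applicable when all its preconditions hold in the state."""
    return action.preconditions <= state

def apply_action(state, action):
    """Successor state: remove deleted facts, then add new ones."""
    return (state - action.delete_effects) | action.add_effects

# Hypothetical domain with three boolean state variables.
FACTS = ["has_key", "door_closed", "door_open"]

pick_up_key = Action("pick_up_key", frozenset(),
                     frozenset({"has_key"}), frozenset())
open_door = Action("open_door", frozenset({"has_key", "door_closed"}),
                   frozenset({"door_open"}), frozenset({"door_closed"}))

initial = frozenset({"door_closed"})
can_open_at_start = applicable(initial, open_door)
after_key = apply_action(initial, pick_up_key)
after_open = apply_action(after_key, open_door)

# n boolean state variables induce 2**n possible states.
state_space_size = 2 ** len(FACTS)
print(sorted(after_open), state_space_size)
```

Each additional boolean variable doubles the number of possible states, which is the exponential growth the text attributes to the curse of dimensionality.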

Preference-based planning

基于偏好的规划

Main article: Preference-based planning

In preference-based planning, the objective is not only to produce a plan but also to satisfy user-specified preferences. In contrast to the more common reward-based planning, for example corresponding to MDPs, preferences don't necessarily have a precise numerical value.

在个性化的计划,目标是不仅产生计划也要满足用户指定的偏好。与更普遍的基于奖励的计划的一点不同，比如对应,MDPs偏好并不绝对的有一个精确的数值。

MDPs：Markov Decision Process马尔科夫决策过程

Algorithms for planning

Classical planning

forward chaining state space search, possibly enhanced with heuristics,

backward chaining search, possibly enhanced by the use of state constraints (see STRIPS, graphplan),

partial-order planning (in contrast to noninterleaved planning).