Bellman equation

From Wikipedia, the free encyclopedia

A Bellman equation, also known as a dynamic programming equation, named after its discoverer, Richard Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into simpler subproblems, as Bellman's Principle of Optimality prescribes.

The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory.

Almost any problem which can be solved using optimal control theory can also be solved by analyzing the appropriate Bellman equation. However, the term "Bellman equation" usually refers to the dynamic programming equation associated with discrete-time optimization problems. In continuous-time optimization problems, the analogous equation is a partial differential equation which is usually called the Hamilton–Jacobi–Bellman equation.

Contents

1 Analytical concepts in dynamic programming
2 Deriving the Bellman equation
  2.1 A dynamic decision problem
  2.2 Bellman's Principle of Optimality
  2.3 The Bellman equation
  2.4 The Bellman equation in a stochastic problem
3 Solution methods
4 Applications in economics
5 Example
6 See also
7 References

Analytical concepts in dynamic programming

To understand the Bellman equation, several underlying concepts must be understood. First, any optimization problem has some objective: minimizing travel time, minimizing cost, maximizing profits, maximizing utility, et cetera. The mathematical function that describes this objective is called the objective function.

Dynamic programming breaks a multi-period planning problem into simpler steps at different points in time. Therefore, it requires keeping track of how the decision situation is evolving over time. The information about the current situation which is needed to make a correct decision is called the state (see Bellman, 1957, Ch. III.2).[1][2] For example, to decide how much to consume and spend at each point in time, people would need to know (among other things) their initial wealth. Therefore, wealth would be one of their state variables, but there would probably be others.

The variables chosen at any given point in time are often called the control variables. For example, given their current wealth, people might decide how much to consume now. Choosing the control variables now may be equivalent to choosing the next state; more generally, the next state is affected by other factors in addition to the current control. For example, in the simplest case, today's wealth (the state) and consumption (the control) might exactly determine tomorrow's wealth (the new state), though typically other factors will affect tomorrow's wealth too.

The dynamic programming approach describes the optimal plan by finding a rule that tells what the controls should be, given any possible value of the state. For example, if consumption (c) depends only on wealth (W), we would seek a rule that gives consumption as a function of wealth. Such a rule, determining the controls as a function of the states, is called a policy function (see Bellman, 1957, Ch. III.2).[1]

Finally, by definition, the optimal decision rule is the one that achieves the best possible value of the objective. For example, if someone chooses consumption, given wealth, in order to maximize happiness (assuming happiness H can be represented by a mathematical function, such as a utility function), then each level of wealth will be associated with some highest possible level of happiness. The best possible value of the objective, written as a function of the state, is called the value function.

Richard Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form by writing down the relationship between the value function in one period and the value function in the next period. The relationship between these two value functions is called the Bellman equation.

Deriving the Bellman equation

A dynamic decision problem

Let the state at time t be x_t. For a decision that beg
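The recursive relationship between the value function in one period and the next can be written compactly. As a sketch, using notation not yet introduced in the text (state x, control a, per-period payoff F, discount factor β, and transition function T giving the next state), the standard discrete-time form is:

\[
V(x) \;=\; \max_{a}\,\bigl\{\, F(x,a) \;+\; \beta\, V\bigl(T(x,a)\bigr) \,\bigr\}
\]

The left side is the best achievable value starting from state x; the right side splits it into the payoff from the current choice plus the discounted value of the remaining problem, exactly the decomposition the article describes.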
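The concepts above (state, control, policy function, value function) can be illustrated with a small numerical sketch. The problem below is an assumption for illustration, not from the article: wealth W lives on an integer grid, utility of consumption is u(c) = √c, tomorrow's wealth is exactly W − c, and β discounts the future. Value iteration repeatedly applies the Bellman recursion until the value function converges.

```python
import numpy as np

# Illustrative assumptions (not from the article): wealth grid 0..10,
# utility u(c) = sqrt(c), next state W' = W - c, discount factor beta.
beta = 0.9
grid = np.arange(0, 11)               # wealth levels W = 0, 1, ..., 10

V = np.zeros(len(grid))               # value function: one entry per state
policy = np.zeros(len(grid), dtype=int)

# Value iteration: apply the Bellman operator until V stops changing.
for _ in range(1000):
    V_new = np.empty_like(V)
    for W in grid:
        c = np.arange(0, W + 1)                   # feasible consumption
        candidates = np.sqrt(c) + beta * V[W - c]  # payoff + discounted future
        V_new[W] = candidates.max()                # best possible value
        policy[W] = candidates.argmax()            # optimal c at this wealth
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print(V)       # the value function: best objective as a function of the state
print(policy)  # the policy function: control as a function of the state
```

Here `V` plays the role of the value function and `policy` the policy function: once the iteration converges, `policy[W]` gives optimal consumption at each wealth level without re-solving the whole multi-period problem.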