OptimalityInequalitiesforAverageCostMarkovDecisionProcessesandtheStochasticCashBalanceProblemEugeneA.FeinbergDepartmentofAppliedMathematics&StatisticsStateUniversityofNewYorkatStonyBrookStonyBrook,NY11794-3600efeinberg@notes.cc.sunysb.edu631-632-7189MarkE.LewisSchoolofOperationsResearch&IndustrialEngineeringCornellUniversity226RhodesHall,Ithaca,NY14853mel47@cornell.edu607-255-0757March6,2007Keywords:Markovdecisionprocess,averagecostperunittime,optimalityinequality,optimalpolicy,inventorycontrolMSC2000SubjectClassifications:Primary:90C40;Secondary:90B05OR/MSsubjectclassifications:Primary:DynamicProgramming/optimalcontrol/Markov/Infinitestate;Secondary:Inventory/production/Uncertainty/StochasticAbstractForgeneralstateandactionspaceMarkovdecisionprocesses,wepresentsufficientconditionsfortheexistenceofsolutionsoftheaveragecostoptimalityinequalities.Theseconditionsalsoimplytheconvergenceofboththeoptimaldiscountedcostvaluefunctionandpoliciestothecorrespondingobjectsfortheaveragecostsperunittimecase.Inventorymodelsarenaturalapplicationsofourresults.Wedescribestructuralpropertiesofaveragecostoptimalpoliciesforthecashbalanceproblem;aninven-torycontrolproblemwherethedemandmaybenegativeandthedecision-makercanproduceorscrapinventory.Wealsoshowtheconvergenceofoptimalthresholdsinthefinitehorizoncasetothoseundertheexpecteddiscountedcostcriterionandthoseundertheexpecteddiscountedcoststothoseundertheaveragecostsperunittimecriterion.1IntroductionInadiscrete-timeMarkovdecisionprocess(MDP)theusualmethodtostudytheaveragecostcriterionistofindasolutiontotheaveragecostoptimalityequations.Apolicythatachievestheminimuminthissystemofequationsisthenaveragecostoptimal.Whenthestateandactionspacesareinfinite,onemayberequiredtoreplacetheequationswithinequalities,yettheconclusionsarethesame;apolicythatachievestheminimum1intheinequalitiesisaveragecostoptimal.Sch¨al[27]providestwogroupsofgeneralconditionsthatimplytheexistenceofasolutiontotheaveragecostoptimalityinequalities(ACOI).Thefirstgroup,referredtoasAssumptions(W)inSch¨al[27],requireweakcontinuityofthetransitionprobabilities.Thesecondgroup,Assumptions(S),requiresetwisecontinuityofthetransitionprobabilities.Ineithercase,foreachstateacompactactionsetwasassumedin[27].ThepurposeofthispaperistoadaptSch¨al’s[27]conditionstoproblemswithnoncompactactionsets;inparticulartothoserelatedtoinventorycontrol.Aswasnotedin[12],typicalinventorycontrolmodels(withgeneraldemanddistributions)requireweakcontinuity;setwisecontinuityisnotenoughtoyieldtheconclusionsthattheACOIhaveasolution.Ontheotherhand,whenthedemanddistributionisrestrictedtobecontinuousortheinventoryisrestrictedtobeinteger,weshowthatsetwisecontinuitydoessuffice.ThebooksbySennott[28]andHern´andez-LermaandLasserre[19]dealwithcountableandgeneralstateMDPs,respectively.Hern´andez-LermaandLasserre[19,Chapter5],Hern´andez-Lerma[18],andFern´andez-Gaucherand[14]presentresultsfornon-compactactionsetsbutassumesetwisecontinuity.TheresultsofHern´andez-Lerma[18]extendSch¨al’s[27]resultsonMDP’swithsetwisecontinuoustransitionprobabilitiesfromcompactactionsetstononcompactactionsets.InthispaperwestudyMDPswithweaklycontinuoustransitionprobabilities.Themajormotivationforthisstudyistheirrelevancetoinventorycontrolproblems.Section5.7inHern´andez-LermaandLasserre[19]providesconditionsfortheexistenceofstationaryoptimalpoliciesforanMDPwithweaklycontinuoustransitionprobabilitiesbutthederivationisdonedirectly;withoutderivingtheoptimalityequationsorin-equalities.Weareinterestednotonlyintheexistenceofoptimalpoliciesbutinthevalidityoftheoptimalityinequalities.Thisisanimportantstepsincetheseinequalitiescanbeusedtoprovestructuralpropertiesofoptimalstationarypoliciesandtoproveconvergenceofdiscountedcostoptimalpoliciestoaveragecostoptimalpolicies.Werecallthat,accordingtotheexampleconstructedbyCavazos-Cadena[4],optimal-ityinequalitiesmayholdforanMDPforwhichoptimalityequalitiesdonothold.Inaddition,optimalityinequalitiesimplytheexistenceofoptimalpolicies[27,Proposition1.3].Inthispaperweconsideraclassofproblemswithnoncompactactionsetsandintroduceanadditionalcondition,Assumption(LB),thatstatesthatacertainfunction,relevanttotherelativevaluefunctions,is2locallyboundedfromabove.Thisassumptionisimportantforthefollowingreasons:(i)ifitissatisfied,noncompactactionsetscanbereducedtocompactactionsubsetsinawaythatthevaluefunctionsremainunchangedandSch¨al’sassumptions[27]hold,(ii)itcanbeverifiedeasily,and(iii)ittypicallyholdsforinventorycontrolproblems.Thus,thispaperprovidesstraightforwardtoolstoanalyzeinventorycontrolproblemswiththeaveragecostsperunittimecriterionandwithouttheassumptionthatthedemandiseitherdiscreteorcontinuous.Thoughtheoptimalityof(s;S)policiesforperiodicreviewinventorycontrolproblemsisawell-knownfact,foraveragecostproblemsitsrigorousproofwithouttheassumptionthatthedemandiseitherdiscreteorcontinuouswasestablishednotlongagobyChenandSimchi-Levi[7].Evenso,theproofprovidedin[7]isnontrivialandproblem-specific.FeinbergandLewis[13]providedastraightforwardproofofthisfactbyusingtheresultsdescri