HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

A Dissertation Presented

by

MOHAMMAD GHAVAMZADEH

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

September 2005

Computer Science

© Copyright by Mohammad Ghavamzadeh 2005

All Rights Reserved

HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

A Dissertation Presented

by

MOHAMMAD GHAVAMZADEH

Approved as to style and content by:

Sridhar Mahadevan, Chair

Andrew G. Barto, Member

Victor R. Lesser, Member

Weibo Gong, Member

W. Bruce Croft, Department Chair, Computer Science

To my parents.

ACKNOWLEDGMENTS

I must begin by thanking my mother, and then proceed to ask her to forgive me for yet another failing: I am absolutely incapable of expressing the depth of my gratitude for her endless love, support, and encouragement.

I am deeply grateful to my advisor Sridhar Mahadevan, whose guidance, support, and patience were instrumental in bringing this work to fruition. Sridhar gave me tremendous freedom to explore and try new ideas, which has had an essential role in my growth as a researcher. Thank you, Sridhar.

During my graduate studies at UMass, I have had the opportunity to collaborate with Andy Barto. I have found Andy an outstanding and visionary researcher, and a wonderful human being. It was a great honor and a real pleasure for me to have him as a member of my thesis committee.

I am also indebted to the other members of my committee for their patience in reading drafts of my thesis, their insightful comments, and their stimulating questions during my defense. I thank Victor Lesser for his constant support, and for helping me better understand research directions in multi-agent systems; and Weibo Gong for inspiring conversations.

I must thank Doina Precup heartily for her unwavering support while a long visa delay had interrupted my research and almost every other aspect of my life. It is amazing how one's career and dignity can fall at the mercy of such a seemingly banal uncertainty as a visa delay. I am indebted for her support at such a time: she made every effort to make me feel part of the community at the computer science department at McGill University.

Many others have shared their insights and contributed to the development of the ideas in the thesis. I especially thank Balaraman Ravindran and my old buddy Khashayar Rohanimanesh for many useful conversations and, more importantly, for their precious friendship. I thank Andy Fagg and Mike Rosenstein for exposing me to a wide variety of topics in continuous state and action reinforcement learning. I will never forget Andy's friendship, his down-to-earth manner, and his tasty and fresh salsas. I thank Mike, who made organizing a workshop at AAAI-2004 a joyful and educational experience for me.

I want to thank Caro Lucas and Ali M. Eydgahi, my M.S. and B.S. advisors from the University of Tehran, Iran. They taught me how to be a researcher and how to better express my ideas, and helped me in writing my first research papers. I also want to thank Abdol Esfahanian, without whom it would not have been possible for me to pursue my education in the United States of America.

I would like to thank all the members of the Autonomous Learning Laboratory at UMass, past and present, for their friendship, for their constant support and encouragement, for giving useful feedback during my practice talks and lab-meeting presentations, and finally for taking care of my cubicle during my unwanted one-year absence. Thank you Colin Barringer, Jad Davis, Andy Fagg, Jeffrey Johns, Anders Jonsson, George Konidaris, Victoria Manfredi, Amy McGovern, Sarah Osentoski, Ted Perkins, Marc Pickett, Balaraman Ravindran, Khashayar Rohanimanesh, Mike Rosenstein, Suchi Saria, Ashvin Shah, Özgür Şimşek, Andrew Stout, Chris Vigorito, and Pippin Wolfe for making our lab such an excellent and enjoyable environment for research.

I am also grateful to the members of our small Autonomous Agents Laboratory at Michigan State University, with whom I learned about new research directions, open problems, and solution techniques in Artificial Intelligence, Machine Learning, and Reinforcement Learning: Natalia Hernandez Gardiol, Rajbala Makar, Silviu Minut, Khashayar Rohanimanesh, and Georgios Theocharous.

I am proud to belong to an intellectual community that treats hopeful, young graduate students with the same respect as senior researchers. Some of the members of this community who have been particularly helpful and kind to me, and whose useful comments contributed to the quality of this document, are David Andre, Bernhard Hengst, Shie Mannor, Doina Precup, Richard Sutton, and Prasad Tadepalli.

The material in this work is based upon work carried out in the Autonomous Agents Laboratory in the Department of Computer Science and Engineering at Michigan State University, under the DARPA contract DAANO2-98-C-4025, and the Autonomous Learning Laboratory in the Department of Computer Science at the University of Massachusetts Amherst, under the NASA contract NAg-1445#1 and the NSF grant ECS-0218125.

ABSTRACT

HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

SEPTEMBER 2005

MOHAMMAD GHAVAMZADEH

B.Sc., UNIVERSITY OF TEHRAN, IRAN
M.Sc., UNIVERSITY OF TEHRAN, IRAN
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Sridhar Mahadevan

This dissertation investigates the use of hierarchy and abstraction as a means of solving complex sequential decision making problems such as those with continuous state and/or continuous action spaces, and domains with multiple cooperative agents. This thesis develops several novel extensions to hierarchical reinforcement learning (HRL), and designs algorithms that are appropriate for such problems. It has b