98IEEETRANSACTIONSONEVOLUTIONARYCOMPUTATION,VOL.18,NO.1,FEBRUARY2014ANovelGraph-BasedEstimationoftheDistributionAlgorithmandItsExtensionUsingReinforcementLearningXiannengLi,StudentMember,IEEE,ShingoMabu,Member,IEEE,andKotaroHirasawa,Member,IEEEAbstract—Inrecentyears,numerousstudieshavedrawnthesuccessofestimationofdistributionalgorithms(EDAs)toavoidthefrequentbreakageofbuildingblocksoftheconventionalstochasticgeneticoperators-basedevolutionaryalgorithms(EAs).Inthispaper,anovelgraph-basedEDAcalledprobabilisticmodelbuildinggeneticnetworkprogramming(PMBGNP)isproposed.Usingthedistinguishedgraph(network)structureofagraph-basedEAcalledgeneticnetworkprogramming(GNP),PMBGNPensureshigherexpressionabilitythantheconventionalEDAstosolvesomespecificproblems.Furthermore,anextendedalgorithmcalledreinforcedPMBGNPisproposedtocombinePMBGNPandreinforcementlearningtoenhancetheperfor-manceintermsoffitnessvalues,searchspeed,andreliability.Theproposedalgorithmsareappliedtosolvetheproblemsofcontrollingtheagents’behavior.Twoproblemsareselectedtodemonstratetheeffectivenessoftheproposedalgorithms,includingthebenchmarkone,i.e.,theTileworldsystem,andarealmobilerobotcontrol.IndexTerms—Agentcontrol,estimationofdistributionalgorithm(EDA),geneticnetworkprogramming(GNP),graphstructure,reinforcementlearning(RL).I.IntroductionINTHELASTfewyears,therehasbeenasignificantdevel-opmentoftheestimationofdistributionalgorithm(EDA)inboththeoryandpractice[1]–[4].Unliketheconventionalevolutionaryalgorithms(EAs)thatusestochasticwaystosimulatethebiologicalgeneticoperatorsfornewpopulationgeneration,EDAconstructsaprobabilisticmodelusingthetechniquesofstatisticsormachinelearningtoestimatetheprobabilitydistributionofthecurrentpopulation,andsamplesthemodeltogenerateanewpopulation.ManystudieshaveinvestigatedwhetherEDAcanoutperformconventionalEAbyavoidingtheprematureconvergenceandspeedingupoftheevolutionprocessinsomeproblems[5]–[8].AlargenumberofstudieshavebeenconductedonEDAtoproposeManuscriptreceivedJuly5,2012;revisedOctober27,2012;acceptedDecember24,2012.DateofpublicationJanuary9,2013;dateofcurrentversionJanuary27,2014.TheauthorsarewiththeGraduateSchoolofInformation,ProductionandSystems,WasedaUniversity,Fukuoka808-0135,Japan(e-mail:sen-nou@asagi.waseda.jp;mabu@aoni.waseda.jp;hirasawa@waseda.jp).Thispaperhassupplementarydownloadablematerialavailableatfier10.1109/TEVC.2013.2238240numerousalgorithms.Particularly,fromtheperspectiveofindividualrepresentation,EDAcanbesimplyclassifiedintotwocategories,whichareprobabilisticmodelbuildinggeneticalgorithm(PMBGA,orgeneticalgorithm-basedEDA)[9]andprobabilisticmodelbuildinggeneticprogramming(PMBGP,orgeneticprogramming-basedEDA)[10].PMBGAemploysGA’sstringstructuretorepresentitsindividualsandismainlyappliedtosolveoptimizationproblems,whilePMBGPusesGP’streestructuretorepresentitsindividualsforprogramevolution.Inthispaper,anovelgraph-basedEDAcalledprobabilisticmodelbuildinggeneticnetworkprogramming(PMBGNP)[11]isdescribed.TheaimofdevelopingPMBGNPistoextendEDAfromthestringandtreestructurestothegraphstructure,wherethedirectedgraph(network)structureofgeneticnetworkprogramming(GNP)[12],[13]isemployed.Somepreviousresearchhasshownthesuperiorityofgraph-basedEAsintermsofhigherexpressionabilitythanthatofconventionalGP[12]–[16].GNPisonesuchgraph-basedEA,whichextendsGAandGPbyusingadirectedgraph(network)structuretorepresentitsindividuals.Differentfromtheothergraph-basedEAs,GNPisfirstdesignedforsolvingtheproblemsofcontrollingtheagents’behavior,whileinrecentyearsithasbeenextendedtomanyotherproblems,suchasmultiagentsystems[17],datamining[18],elevatorsystemcontrol[19],intrusiondetectionsystem[20],etc.Therefore,fromtheperspectiveofindividualrepresentation,PMBGNPhashigherexpressionabilitythantheconventionalEDAstoefficientlysolvesomeproblemsduetothedirectedgraphstructureofGNP.Ontheotherhand,anotherchallengeinEDAisusingittoexploremanyotherproblems.ThispaperappliesPMBGNPtosolvetheproblemsofcontrollingtheagents’behavior,wheremostofthecurrentEDAsaredesignedtosolvetheotherproblems.Twoproblemsareselectedtodemonstratetheeffectivenessofthispaper,includingthebenchmarkone,i.e.,theTileworldsystem[21],andarealmobilerobotcontrol,Kheperarobotcontrol[22],[23].Therefore,therearemainlytwoprimaryfeaturesofPMBGNP.1)EDAisextendedtograph-basedEA.2)EDAisappliedtosolvetheproblemsofcontrollingtheagents’behavior.Inaddition,weproposeanextendedalgorithmcalledrein-forcedPMBGNP(RPMBGNP)thatcombinesreinforcement1089-778Xc2013IEEE.Personaluseispermitted,butrepublication/redistributionrequiresIEEEpermission.See:NOVELGRAPH-BASEDEDAANDITSEXTENSIONUSINGREINFORCEMENTLEARNING99learning(RL)[24]andPMBGNPinordertoenhanceitsper-formanc