Strategic Bidding of Electricity Suppliers Based on an Improved Q-Learning Algorithm


Shanghai Jiao Tong University, Master's Thesis
Title: Strategic Bidding of Electricity Suppliers Based on an Improved Q-Learning Algorithm
Author: Lu Li
Degree applied for: Master
Major: Power Systems and Automation
Supervisor: Jiang Chuanwen
Date: 2007-02-01

STRATEGIC BIDDING OF ELECTRICITY SUPPLIER IN COMPETITIVE MARKET BASED ON IMPROVED Q-LEARNING

ABSTRACT

The reform of the electric power industry is gradually deepening throughout the world. At present China is actively constructing regional power markets and has already carried out generation unbundling and competitive bidding, so that the generation side now operates under a competition mechanism. As a hot topic in power-market research, the study of generators' bidding strategies has important theoretical and practical value.

The strategic bidding behaviour of generators in long-term trading is difficult to model mathematically. Because of the complexity of the strategic interaction among the many market participants, multi-agent systems offer a promising outlook in this research area. This thesis develops a model based on reinforcement learning to simulate long-term trading in an oligopolistic electricity market. The model can be used to determine the optimal bidding strategy of each producer and, at the same time, to find the market equilibrium and to assess market performance.

Fuzzy reasoning and reinforcement learning are commonly used methods in multi-agent systems. Starting from the strengths and weaknesses of both, the thesis puts forward a fuzzy Q-learning method for studying bidding strategies. To improve the global convergence and ergodicity of the algorithm, a chaotic method is proposed and incorporated into the fuzzy Q-learning algorithm. Simulations on the IEEE 14-bus and IEEE 30-bus systems verify that the fuzzy Q-learning and chaotic fuzzy Q-learning algorithms introduced in this thesis are feasible and efficient for studying generation bidding strategies.

Finally, risk evaluation is addressed. After analysing the advantages and disadvantages of several evaluation methods, a new risk-evaluation model that takes the uncertainties of long-term trading into account is constructed and tested on the IEEE 14-bus and IEEE 30-bus systems.

KEYWORDS: Power Market, Bidding Strategy, Fuzzy Q-learning, Chaotic, Risk Evaluation

Chapter 1  Introduction

1.1 Background of the research.

1.2 Electricity market reform at home and abroad.

1.2.1 Reform abroad. The 1990 restructuring in England and Wales broke the Central Electricity Generating Board (CEGB) up into National Power (NP), PowerGen (PG), Nuclear Electric (NE) and the National Grid Company (NGC), with twelve Regional Electricity Companies (REC) on the distribution side, and introduced the Pool trading mechanism; the resulting structure is shown in Fig. 1.1 (reformed power market structure in England). In Australia the reform was led by the National Grid Management Council (NGMC), and the resulting structure is shown in Fig. 1.2 (reformed power market structure in Australia). Other markets, such as the PJM market in the United States, followed ISO-based designs.

1.2.2 Reform in China. Starting from the late 1990s, China has carried out plant-grid separation and competitive bidding on the generation side step by step, and regional power markets have been set up successively [1-4].

1.3 Research status of generation bidding strategies.

1.3.1 Since F. C. Schweppe put forward the spot pricing theory of electricity (Spot Pricing of Electricity) [5-12], a large body of work has followed [13-15], covering generation bidding strategies [16-42], applications in energy management systems (EMS) [43-46] and related problems [47-55].

1.3.2 Studies of bidding strategies in pool markets [16-18] can be divided roughly into four categories: (1) methods based on forecasting the market clearing price (MCP) [19-25]; (2) methods based on estimating the bidding behaviour of rivals, for example with Monte Carlo simulation [26-28] or Markov models [29-30]; (3) methods based on game theory [31-38]; and (4) other intelligent methods [39-42], such as Markov decision models [39], fuzzy c-means (FCM) clustering [40] and Nash equilibrium (NE) analysis [41-42]. Because the strategic interaction among many participants is hard to capture in closed form, multi-agent simulation based on autonomous intelligent agents (Richter and Sheble) has attracted increasing attention; such agents must balance "exploring" and "exploiting", and reinforcement learning (RL), in particular Q-learning, provides a natural framework for them.

1.4 Main work of the thesis. The thesis first reviews reinforcement learning, including dynamic programming, the Monte Carlo method, temporal-difference learning and Q-learning; it then develops fuzzy Q-learning and chaotic fuzzy Q-learning algorithms for the generation bidding problem and verifies them by simulation on the IEEE 14-bus and IEEE 30-bus systems; finally it studies the risk evaluation of bidding strategies, again on the IEEE 14-bus and IEEE 30-bus systems.

Chapter 2  Reinforcement Learning and the Q-Learning Algorithm

2.1 Basic principle of reinforcement learning

2.1.1 The reinforcement learning model. Reinforcement learning is learning through interaction with the environment [56]. As shown in Fig. 2.1 (basic type of reinforcement learning), at each step the agent observes the state s of the environment and selects an action a; the environment then returns a reward r and moves to a new state, which is fed back to the agent, and the cycle repeats. Temporal-difference methods (TDM), discussed in Section 2.2, are the most common way of learning from this kind of interaction.

2.1.2 Elements of reinforcement learning [56]. Besides the agent and the environment, a reinforcement learning system has four main elements:

(1) A policy. A policy π is a mapping F from the state set S to the action set A; it defines "what to do" in every state, and the aim of learning is the optimal policy π*.

(2) A reward function R(s,a), which tells the agent "what is good" in the immediate sense.

(3) A value function, which measures what is good in the long run. Under a policy π, the value of the state s_t occupied at time t is the expected discounted sum of future rewards, with discount factor γ (0 ≤ γ ≤ 1):

$$V^\pi(s) = E_\pi\{r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \mid s_t = s\}
          = E_\pi\{r_{t+1} + \gamma V^\pi(s_{t+1}) \mid s_t = s\}
          = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'}\,[R^a_{ss'} + \gamma V^\pi(s')] \qquad (2.1)$$

The state-action value function (Q function) introduced by Watkins is defined analogously for a state-action pair (s,a):

$$Q^\pi(s,a) = E_\pi\{R_t \mid s_t = s, a_t = a\}
            = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s, a_t = a\Big\} \qquad (2.2)$$

(4) A model of the environment, which predicts the next state and reward and is used for planning.
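To make the interaction of Fig. 2.1 and the value definitions (2.1)-(2.2) concrete, the following Python sketch (not taken from the thesis) runs a fixed policy in a toy two-state environment and computes one sampled discounted return; the environment `TinyMarket`, its dynamics and the random policy are illustrative assumptions only.

```python
import random

# Minimal sketch of the agent-environment loop of Fig. 2.1: at step t the agent
# observes state s_t, chooses action a_t, and the environment returns reward
# r_{t+1} and the next state s_{t+1}.  TinyMarket is a hypothetical stand-in
# for the market simulator used later in the thesis.

class TinyMarket:
    """Toy 2-state, 2-action environment."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Hypothetical dynamics: action 1 usually moves to state 1, which pays more.
        next_state = 1 if (action == 1 and random.random() < 0.8) else 0
        reward = 1.0 if next_state == 1 else 0.1
        self.state = next_state
        return next_state, reward


def discounted_return(rewards, gamma=0.9):
    """r_1 + gamma*r_2 + gamma^2*r_3 + ..., the quantity whose expectation
    under the policy defines V^pi(s) in Eq. (2.1)."""
    total, discount = 0.0, 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    return total


env = TinyMarket()
policy = lambda s: random.choice([0, 1])   # a fixed (random) policy pi
state, rewards = env.state, []
for t in range(50):                        # one 50-step episode of interaction
    action = policy(state)                 # agent acts ...
    state, reward = env.step(action)       # ... environment responds with (s', r)
    rewards.append(reward)
print("sample discounted return from s_0:", discounted_return(rewards))
```

Averaging such sampled returns over many episodes would estimate V^π(s_0), which is exactly the Monte Carlo idea discussed in Section 2.2.2 below.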
2.1.3 The purpose of reinforcement learning. Given the immediate rewards r_t and the discount factor γ (0 ≤ γ ≤ 1), the learner estimates V^π(s) or Q^π(s,a) and derives the optimal policy from the optimal value functions V*(s) and Q*(s,a), as illustrated in Fig. 2.2 (the purpose of reinforcement learning):

$$\pi^* = \arg\max_\pi V^\pi(s) \qquad (2.3)$$
$$\pi^*(s) = \arg\max_a Q^*(s,a) \qquad (2.4)$$

Formally, the task of reinforcement learning is to find a policy π: S × A → [0,1] that maximizes the value (expected future reward) of every state:

$$V^\pi(s) = E_\pi\{r_1 + \gamma r_2 + \cdots + \gamma^{t-1} r_t + \cdots \mid s_0 = s\}$$
$$Q^\pi(s,a) = E_\pi\{r_1 + \gamma r_2 + \cdots + \gamma^{t-1} r_t + \cdots \mid s_0 = s, a_0 = a\}$$
$$V^*(s) = \max_\pi V^\pi(s), \qquad Q^*(s,a) = \max_\pi Q^\pi(s,a)$$

2.2 Basic solution methods. The main families of methods for reinforcement learning problems are dynamic programming, the Monte Carlo method and temporal-difference learning, of which Q-learning is the most widely used representative.

2.2.1 Dynamic programming. The ideas of dynamic programming (DP) go back to Bellman [57] (1957) and were linked with learning by Minsky [58] (1961), Samuel and, later, Watkins (1989). Given the policy π(s,a) and the transition probabilities P^a_{ss'} over the state set S, policy evaluation starts from an arbitrary V_0 and produces the sequence V_0, V_1, V_2, … by the iteration

$$V_{k+1}(s) \leftarrow E_\pi\{r_{t+1} + \gamma V_k(s_{t+1}) \mid s_t = s\}
            = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'}\,[R^a_{ss'} + \gamma V_k(s')] \qquad (2.5)$$

which converges to V_k = V^π, the fixed point of the Bellman equation (2.1).

2.2.2 Monte Carlo methods. Unlike DP, Monte Carlo (MC) methods need no model of the environment: they estimate V^π by averaging the returns observed in complete episodes. For a state s visited at time t within an episode, first-visit MC (FVMC) averages the returns following only the first visit to s, whereas every-visit MC (EVMC) averages the returns following every visit to s [59].

2.2.3 Temporal-difference learning [60]. TD learning combines the ideas of MC and DP [59]: like MC it learns directly from experience without a model of the environment, and like DP it updates estimates partly on the basis of other estimates. The simplest form, TD(0), updates the value of the current state as

$$V(s_t) \leftarrow V(s_t) + \alpha\,[\,r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\,] \qquad (2.6)$$

that is, it moves V(s_t) toward the TD target r_{t+1} + γV(s_{t+1}). The difference between the methods can be seen from the two forms of the value function,

$$V^\pi(s) = E_\pi\{R_t \mid s_t = s\} = E_\pi\Big\{\sum_{k=0}^{\infty}\gamma^k r_{t+k+1}\,\Big|\, s_t = s\Big\} \qquad (2.7)$$
$$\phantom{V^\pi(s)} = E_\pi\Big\{r_{t+1} + \gamma\sum_{k=0}^{\infty}\gamma^k r_{t+k+2}\,\Big|\, s_t = s\Big\} = E_\pi\{r_{t+1} + \gamma V^\pi(s_{t+1}) \mid s_t = s\} \qquad (2.8)$$

MC methods use an estimate of (2.7) as the update target, DP methods use an estimate of (2.8), and TD(0) combines sampling (as in MC) with bootstrapping (as in DP). TD(0) is a one-step TD method; between one-step TD and MC lie the n-step TD methods. The one-step return used by TD(0) is

$$R_t^{(1)} = r_{t+1} + \gamma V_t(s_{t+1})$$

in which V_t(s_{t+1}) replaces the remaining terms γr_{t+2} + γ^2 r_{t+3} + ⋯ of the full return r_{t+1} + γr_{t+2} + γ^2 r_{t+3} + ⋯. The 2-step and n-step returns are

$$R_t^{(2)} = r_{t+1} + \gamma r_{t+2} + \gamma^2 V_t(s_{t+2})$$
$$R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^n V_t(s_{t+n})$$

and the corresponding updates give the family of n-step TD methods [61].

2.3 The Q-learning algorithm. Q-learning was proposed by Watkins in 1989 for Markov decision processes; it is a model-free temporal-difference method and one of the most widely used reinforcement learning algorithms [62].

2.3.1 Markov decision processes. A Markov decision process (MDP) is described by a state set S, an action set A, a transition function T defined on S × A and a reward function R defined on S × A. At step t the agent observes the state s_t ∈ S and chooses an action a_t ∈ A; the environment returns the reward r and moves to the next state s_{t+1} according to T, i.e. with probability $P_{s_t s_{t+1}}(a_t)$. The objective is to maximize the expected discounted return $E\{\sum_{k=0}^{\infty}\gamma^k r_{t+k}\}$ with discount factor γ (0 ≤ γ ≤ 1). When the transition probabilities P and the rewards R are known, the MDP can be solved by dynamic programming (DP); when P and R are unknown, Q-learning can learn the optimal policy directly from interaction.

2.3.2 Q-learning. Q-learning estimates the optimal state-action value function Q*(x,a). For a policy π,

$$Q^\pi(x,a) = R(x,a) + \gamma \sum_{y\in S} P_{xy}(a)\, V^\pi(y) \qquad (2.9)$$
$$V^\pi(y) = \max_{b\in A} Q^\pi(y,b) \qquad (2.10)$$
$$R(x,a) = E\{r(x,a)\} \qquad (2.11)$$

As in TD(0), the Q values are updated from each observed transition. With learning rate α_t, the update is

$$Q_t(x,a) \leftarrow (1-\alpha_t)\, Q_{t-1}(x,a) + \alpha_t\,[\,r_t + \gamma V_{t-1}(y)\,] \qquad (2.12)$$
$$V_{t-1}(y) = \max_b\{Q_{t-1}(y,b)\} \qquad (2.13)$$

The update (2.12) is applied only to the state-action pair (x,a) actually visited at step t; the other entries of Q_{t-1} remain unchanged. Under suitable conditions on α_t, Q_t converges to Q*, and the optimal policy is then obtained greedily as π*(s) = arg max_a Q*(s,a).
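As a concrete illustration of the update rule (2.12)-(2.13), the sketch below implements plain tabular Q-learning on a toy two-state problem. The environment dynamics and the ε-greedy exploration rule are assumptions made only for illustration; they are not the fuzzy or chaotic mechanisms developed later in the thesis.

```python
import random

# Minimal tabular Q-learning following Eqs. (2.12)-(2.13).
n_states, n_actions = 2, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    """Toy dynamics (hypothetical): action 1 usually leads to the better-paying state 1."""
    next_state = 1 if (action == 1 and random.random() < 0.8) else 0
    reward = 1.0 if next_state == 1 else 0.1
    return next_state, reward

state = 0
for t in range(5000):
    # choose a_t: explore with probability epsilon, otherwise act greedily on Q
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    v_next = max(Q[next_state])                     # Eq. (2.13): max_b Q(y, b)
    # Eq. (2.12): Q(x,a) <- (1 - alpha) Q(x,a) + alpha [ r + gamma V(y) ]
    Q[state][action] = (1 - alpha) * Q[state][action] + alpha * (reward + gamma * v_next)
    state = next_state

print("learned Q-table:", Q)
```

Only the visited entry Q[state][action] changes at each step, exactly as stated after Eq. (2.12); with a slowly decreasing α_t and sufficient exploration the table approaches Q*.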

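The abstract states that a chaotic method is introduced into fuzzy Q-learning to improve its global convergence and ergodicity, but this excerpt does not give the exact scheme. The sketch below therefore only shows one common way such an idea can be realised: a logistic map generates a deterministic, ergodic sequence that replaces the pseudo-random draw in action selection. The function names, the ε threshold and the mapping from the chaotic variable to an action index are all assumptions, not the thesis's algorithm.

```python
# Hypothetical chaotic exploration for Q-learning, sketched only to illustrate
# the idea named in the abstract (chaotic search for better ergodicity).

def logistic_map(z, mu=4.0):
    """One iteration of z <- mu*z*(1-z); with mu = 4 the sequence is chaotic
    and densely covers the interval (0, 1)."""
    return mu * z * (1.0 - z)

def chaotic_action(q_row, z, epsilon=0.1):
    """Pick an action for one state, using the chaotic variable z instead of a
    pseudo-random number to decide between exploration and exploitation."""
    z = logistic_map(z)
    if z < epsilon:                        # explore: spread actions chaotically
        action = int(z / epsilon * len(q_row))
    else:                                  # exploit the current Q estimates
        action = max(range(len(q_row)), key=lambda a: q_row[a])
    return action, z

z = 0.3141                                 # any seed in (0, 1) away from fixed points
q_row = [0.2, 0.5]                         # Q values of one state, for illustration
for _ in range(5):
    a, z = chaotic_action(q_row, z)
    print(a, round(z, 3))
```

Because the logistic map with μ = 4 visits the whole interval (0, 1) densely, an agent driven by it eventually tries every action in every state, which is the ergodicity property the abstract refers to.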