LEARNING TO CONTROL pH PROCESSES AT MULTIPLE TIME SCALES: PERFORMANCE ASSESSMENT IN A LABORATORY PLANT

1S. Syafiie, 1F. Tadeo and 2E. Martinez

1 Department of Systems Engineering and Automatic Control, Science Faculty, University of Valladolid, Prado de la Magdalena s/n., 47011 Valladolid, Spain. Email: {syam,fernando}@autom.uva.es
2 Consejo Nacional de Investigaciones Científicas y Técnicas, Avellaneda 3657, 3000 Santa Fe, Argentina. Email: ecmarti@ceride.gov.ar

ABSTRACT

This article presents a solution to pH control based on Model-Free Learning Control (MFLC). The MFLC technique is proposed because the algorithm gives a general solution for acid-base systems, yet is simple enough for implementation in existing control hardware. MFLC is based on reinforcement learning (RL), in which the controller learns by direct interaction with the environment. The MFLC algorithm is model-free and satisfies incremental control, input and output constraints. A novel MFLC formulation using multi-step actions (MSA) is presented: actions on multiple time scales are composed of several identical primitive actions. This solves the problem of determining a suitable fixed time scale for selecting control actions, trading off control accuracy against learning complexity. An application of MFLC to a laboratory-scale pH process is presented, showing that the proposed MFLC learns to control the neutralization process adequately and to maintain the process in the goal band, while manipulating the control signal smoothly.

KEYWORDS: learning control, goal-seeking control, intelligent control, online learning, pH control, process control, neutralization process

1. INTRODUCTION

Control of pH in neutralization processes is a ubiquitous problem in the chemical and biotechnological industries. For example, pH is controlled in chemical processes such as fermentation, precipitation, oxidation, flotation and solvent extraction. Controlling pH is also an important issue in food and beverage production (bread, liquor, beer, soy sauce, cheese and milk), because the enzymatic reactions involved are affected by the pH of the process, and each has an optimum pH critical to the yield. Another industrial pH control application is the decomposition section of the Sucono/UOP Phenol Process: the acid catalyst added in the decomposition section must be neutralized to prevent yield loss due to side reactions and to protect against corrosion in the fractionation section (Schmidt, 2005).

In most pH neutralization processes, controlling pH is not only a control problem but also involves chemical equilibrium, kinetic, thermodynamic and mixing problems. These characteristics must be considered when designing a controller (Gustafsson et al., 1995), and they make pH processes interesting and challenging for researchers seeking solutions. An important problem is that the process buffer capacity may vary with time in an unknown way, which dramatically changes the process gain and makes controller design difficult. For example, if either the concentration of the inlet stream or the composition of the feed changes, the shape of the titration curve is drastically altered. This means that the process nonlinearity becomes time-varying and the system moves among several titration curves. Moreover, because of the nonlinear dependence of the pH value on the amount of titrated reactant, the process is inherently nonlinear. It is therefore difficult to develop a mathematical model of the pH process appropriate for designing a well-performing controller.

Many strategies based on intelligent control have been proposed, applying a wide array of techniques such as fuzzy control, neural networks, or combinations of intelligent and model-based methods. For example, fuzzy logic (Fuente et al., 2006) and neural networks (Ramirez and Jackson, 1999) have been applied to pH control. Fuzzy self-tuning PI control (Babuska et al., 2002) and fuzzy internal model control (Edgar and Postlethwaite, 2000) have also been implemented to control pH processes. The approaches cited above present several difficulties for practical application and complicate control system design: the resulting control structures are complex and difficult to supervise, and they may be conservative or require many tuning parameters. Thus, tight and robust pH control is often difficult to achieve due to the inherently uncertain, nonlinear and time-varying characteristics of pH neutralization processes. This paper discusses an alternative approach to
solve the pH control problem by applying Model-Free Learning Control (MFLC) (Syafiie et al., 2004; 2005; 2006a; 2006b), based on the reinforcement learning framework (Sutton and Barto, 1998). Standard RL algorithms, such as Q-learning, present a difficulty for process control implementations: they scale very badly with increasing problem size and with the granularity of states or control actions. One intuitive reason for this is that the number of decisions from the start state to the goal state increases exponentially with problem size. To keep the number of decisions needed to reach the goal state tractable, hierarchical approaches based on temporal abstraction have been proposed. Temporal abstraction can be defined as an explicit representation of extended actions, as policies together with a termination condition (Precup, 2000); the original one-step action is called a primitive action. Semi-Markov Decision Processes (SMDPs) are the theory used to deal with temporal abstraction as a minimal extension of the RL framework. An SMDP is a Markov Decision Process (MDP) appropriate for modeling continuous-time discrete-event sys