Missing data methods in PCA and PLS: Score calculations with incomplete observations


ELSEVIER
Chemometrics and Intelligent Laboratory Systems 35 (1996) 45-65

Missing data methods in PCA and PLS: Score calculations with incomplete observations

Philip R.C. Nelson, Paul A. Taylor *, John F. MacGregor

Department of Chemical Engineering, McMaster University, Hamilton, ON, Canada, L8S 4L8

Received 8 June 1995; revised 27 November 1995; accepted 11 January 1996

* Corresponding author. Fax.: +1 905 521 1350.

0169-7439/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved. PII S0169-7439(96)00007-X

Abstract

A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented, and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.

Keywords: PCA; PLS; Missing data; NIPALS algorithm; EM algorithm

1. Introduction

There are many reasons why measurements may be missing from a data set. Missing measurements occur periodically when sensors fail or are taken off-line for routine maintenance. In other situations, measurements are removed from a data set because gross measurement errors occur or samples are simply not collected at the required time. In these cases, the measurements are missed at random times. In other situations, missing measurements occur on a very regular basis. A common example occurs when sensors have different sampling periods.

PCA and PLS have been widely used to develop models from data sets composed of observations on large numbers of highly correlated variables. In many of these situations, particularly those involving industrial processes, missing measurements are a common occurrence. To insist on using only complete data sets when building or applying PCA or PLS models would entail throwing away large amounts of the data. Therefore, it is important that efficient methods for handling missing data be available for analysing and building multivariate models from such data.

Once a model has been built, it can be applied to future process data in inferential control schemes to predict process responses [1,2], or in multivariate statistical process control schemes to monitor and diagnose future process operating performance [3-8]. Since some future multivariate observations will also have missing measurements, these applications would be of limited value unless methods were available to handle missing data.

In this paper we consider the second problem, that of using future multivariate observations with missing data to estimate latent variable scores and to predict responses from an existing PCA or PLS model. We analyse the properties of various algorithms for handling missing data when the underlying model can be assumed to be fixed and known. The additional issues involved in iteratively building these models from data sets with missing measurements will be treated in a subsequent paper.

During model building, where the loading vectors are unknown, missing data is often treated using the NIPALS algorithm which computes one vector at a time [9,10]. Once a model has been built, and the loading vectors defined, missing data can be treated in PCA by simultaneously accounting for their effects in all latent variable dimensions [9,10] by projection to the model plane.

In this paper, we first discuss a method derived from the NIPALS algorithm for model building with missing data. We designate it the single component projection method. We develop expressions for the score estimation error arising from the missing data with this algorithm. This analysis reveals how errors enter and propagate in the single component projection method, thereby providing justification for using simultaneous projection methods and supplying insight into the sources of error that arise during model building with sequential methods. Two approaches which are not limited to considering a single direction at a time are then treated: (i) projection to the model plane and (ii) data replacement using the conditional mean. The mean squared score estimation errors are calculated for each of the methods when applied to simulation examples carefully constructed to accentuate the effects of certain types of errors. Finally, the methods are applied to an industrial set to illustrate how the methods work in practice.

2. Nomenclature

Lowercase bold variables, both Roman and Greek, are column vectors and uppercase ones, matrices. A superscript asterisk indicates a vector or matrix with rows corresponding to missing mea
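
The introduction names three ways of estimating scores from an incomplete observation, but the formulas themselves lie beyond this excerpt. As a rough illustration only, the three ideas might be sketched in Python with NumPy as below. Everything here is an assumption made for the sketch rather than the authors' formulation: the function names (`scp_scores`, `pmp_scores`, `cmr_scores`), the boolean `observed` mask, the use of the sample covariance `S` for the conditional mean, and the simulated data. Observations are assumed to be mean-centred and the PCA loadings `P` to have orthonormal columns.

```python
import numpy as np


def scp_scores(z, P, observed):
    """Single component projection (sketch): one loading vector at a time,
    using only the observed entries and deflating after each component."""
    r = np.array(z[observed], dtype=float)   # residual over observed variables
    t = np.zeros(P.shape[1])
    for a in range(P.shape[1]):
        p = P[observed, a]                   # observed part of loading a
        t[a] = (p @ r) / (p @ p)             # regress residual on this loading
        r = r - t[a] * p                     # deflate before the next component
    return t


def pmp_scores(z, P, observed):
    """Projection to the model plane (sketch): least-squares fit of all
    scores simultaneously, using the observed rows of P."""
    t, *_ = np.linalg.lstsq(P[observed], z[observed], rcond=None)
    return t


def cmr_scores(z, P, S, observed):
    """Conditional mean replacement (sketch): fill the missing entries with
    E[z_missing | z_observed] for zero-mean data with covariance S, then
    project the completed vector onto the loadings."""
    missing = ~observed
    z_hat = np.array(z, dtype=float)
    S_oo = S[np.ix_(observed, observed)]     # covariance among observed variables
    S_mo = S[np.ix_(missing, observed)]      # covariance of missing with observed
    z_hat[missing] = S_mo @ np.linalg.solve(S_oo, z_hat[observed])
    return P.T @ z_hat


# Hypothetical simulated data: 6 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
n, k, A = 500, 6, 2
T = rng.normal(size=(n, A)) * np.array([3.0, 1.5])
W = rng.normal(size=(A, k))
X = T @ W + 0.1 * rng.normal(size=(n, k))
X -= X.mean(axis=0)                          # mean-centre the columns

# Fixed, known PCA model: loadings from the SVD, plus the sample covariance.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:A].T                                 # k x A loading matrix
S = X.T @ X / (n - 1)

# A new observation with two measurements missing (marked as NaN).
z = X[0].copy()
observed = np.ones(k, dtype=bool)
observed[[1, 4]] = False
z[~observed] = np.nan

print("complete-data scores:", P.T @ X[0])
print("single component    :", scp_scores(z, P, observed))
print("model-plane          :", pmp_scores(z, P, observed))
print("conditional mean     :", cmr_scores(z, P, S, observed))
```

With no measurements missing, all three functions reduce to the ordinary projection `P.T @ z`, which is a convenient sanity check. The sketch treats the model as fixed and known, in line with the problem statement above; it does not attempt model building with missing data.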
