An efficient rough feature selection algorithm with a multi-granulation view

International Journal of Approximate Reasoning 53 (2012) 912–926

Contents lists available at SciVerse ScienceDirect

An efficient rough feature selection algorithm with a multi-granulation view

Jiye Liang a,∗, Feng Wang a,b, Chuangyin Dang b, Yuhua Qian a

a Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
b Department of System Engineering and Engineering Management, City University of Hong Kong, Hong Kong

∗ Corresponding author. Tel./fax: +86 0351 7018176.
E-mail addresses: ljy@sxu.edu.cn (J. Liang), sxuwangfeng@126.com (F. Wang), mecdang@cityu.edu.hk (C. Dang), jinchengqyh@126.com (Y. Qian).
0888-613X/$ - see front matter © 2012 Elsevier Inc. All rights reserved.

ARTICLE INFO

Article history:
Received 15 April 2011
Received in revised form 27 February 2012
Accepted 29 February 2012
Available online 13 March 2012

Keywords:
Feature selection
Multi-granulation view
Rough set theory
Large-scale data sets

ABSTRACT

Feature selection is a challenging problem in many areas such as pattern recognition, machine learning and data mining. Rough set theory, as a valid soft computing tool to analyze various types of data, has been widely applied to select helpful features (also called attribute reduction). In rough set theory, many feature selection algorithms have been developed in the literature; however, they are very time-consuming when data sets are large in scale. To overcome this limitation, we propose in this paper an efficient rough feature selection algorithm for large-scale data sets, which is inspired by multi-granulation. A sub-table of a data set can be considered as a small granularity. Given a large-scale data set, the algorithm first selects different small granularities and then estimates on each small granularity the reduct of the original data set. Fusing all of the estimates on small granularities together, the algorithm can get an approximate reduct. Because the total time spent on computing reducts for sub-tables is much less than that for the original large-scale one, the algorithm yields a feature subset (the approximate reduct) in much less time. According to several decision performance measures, experimental results show that the proposed algorithm is feasible and efficient for large-scale data sets.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

As a common technique for data preprocessing in pattern recognition, machine learning and data mining, feature selection has attracted much attention in recent years [5–7,20,21,23,26,30,40]. In practice, databases nowadays grow quickly not only in rows (objects) but also in columns (features). Tens, hundreds, even thousands of features are stored in databases in some real-world applications, which has resulted in high-dimensional data. However, only a limited number of features is useful in practice; an excessive number of features may cause a significant slowdown in the learning process, and irrelevant or redundant features may deteriorate the performance of learning algorithms [12,13,38]. To ease this situation, it is desirable to remove redundant features and select informative ones, thereby decreasing the cost of measuring, storing and transmitting data, shortening processing time and yielding more compact classification models with better generalization.

Rough set theory, proposed by Pawlak [31–33], is a relatively new soft computing tool for the analysis of a vague description of an object, and has become a popular mathematical framework for pattern recognition, image processing, feature selection, rule extraction, neuro-computing, conflict analysis, decision support, granular computing, data mining and knowledge discovery from large data sets [3,4,8,28,36,50,51]. In rough set theory, an important concept is attribute reduction (or approximate reduct), which can be considered a specific kind of feature selection. In other words, based on rough set theory, one can select useful features from a given data table. Attribute reduction does not attempt to maximize the class separability but rather to retain the discernibility of the original features for the objects from the universe [15,16,41,44,52].

As one of the most important research topics accompanying the fast development of rough set theory, attribute reduction has attracted wide attention and study, and many attribute reduction techniques have been developed in the last twenty years. Applying the discernibility matrix, Skowron [42] proposed an attribute reduction algorithm that computes a disjunctive normal form and is able to obtain all attribute reducts of a given table, whereas finding the minimal reduct of a decision table is an NP-hard problem. Kryszkiewicz and Lasek [22] proposed an approach to computing the minimal set of attributes that functionally determine a decision attribute. These two attribute reduction algorithms are usually computationally very expensive, especially when dealing with large-scale, high-dimensional data sets. Therefore, to overcome this difficulty, many heuristic attribute reduction algorithms have been developed in rough set theory [11,13,24,25,39,35,43,45,46,48]. A heuristic attribute reduction algorithm can extract a single reduct from a given table in a relatively short time. In order to further reduce computational time, based on four kinds of common heuristic reduction algorithms, Qian et al. [37] developed a common accelerator to improve the time efficiency of a heuristic
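
The multi-granulation idea outlined in the abstract (draw several sub-tables, estimate a reduct on each, fuse the estimates) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and not the authors' method: the positive-region dependency measure, the forward greedy search, uniform random sampling of rows, fusion by set union, and the names `dependency`, `greedy_reduct`, `approximate_reduct`, `ROWS` and `ATTRS` are all hypothetical.

```python
import random
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region: rows whose equivalence
    class under `attrs` carries exactly one decision value."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Forward greedy heuristic (an assumed stand-in for any heuristic
    reduct algorithm): add the attribute with the largest dependency
    until the dependency of the full attribute set is reached."""
    target = dependency(rows, attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max((a for a in attrs if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen


def approximate_reduct(rows, attrs, decision,
                       n_granules=3, granule_size=4, seed=0):
    """Estimate a reduct on several random sub-tables (small
    granularities) and fuse the estimates by union (assumed rule)."""
    rng = random.Random(seed)
    fused = set()
    for _ in range(n_granules):
        sub_table = rng.sample(rows, min(granule_size, len(rows)))
        fused.update(greedy_reduct(sub_table, attrs, decision))
    return [a for a in attrs if a in fused]  # keep original attribute order


# Toy decision table: the decision "d" is determined by attribute "a" alone.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
ATTRS = ["a", "b", "c"]

print(greedy_reduct(ROWS, ATTRS, "d"))
print(approximate_reduct(ROWS, ATTRS, "d"))
```

Each sub-table is much smaller than the full table, so each call to `greedy_reduct` touches fewer rows. That is the source of the time saving the abstract describes, at the price of obtaining only an approximate reduct.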
