决策树课件(1)

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

SomeBasicIssuesinCI/KDD/DM/ML/PRandDecision-treealgorithmsCollegeofMathematics&ComputerScienceFuzhouUniversity2013-02-28智能技术课程AbriefintroductiontoCI/KDD/DM/ML/PR•Acquireintelligence(Knowledge)fromdatabycomputationratherthanbyexperts•Anextremelysimpledataset:2-AND2013-02-28智能技术课程x1x2y111100010000Function?(analyticlaw):y=x1x2y=sgn(x1+x2-1.5)Rules?(descriptivelaw):Ifx1=0,theny=0;Ifx2=0,theny=0;Ifx1=1∧x2=1,theny=1;ormorecompactIfx1=1∧x2=1,theny=1,otherwisey=0Classifier?(geometry/PR):f(x1,x2)=x1+x2-1.5f(x1,x2)=0astheseparatingline2013-02-28智能技术课程Completevs.Incomplete•2-ANDiscompletein2-dimBooleanspace;•Adatasetin3-dimBooleanspacewhichisnowincomplete:x1x2x3y1111100001000000Function?(analyticlaw):y=x1x2x3y=sgn(x1+x2+x3-1.5)Rules?(descriptivelaw):Ifx1=1∧x2∧x2=1,theny=1,otherwisey=0;Classifier?(geometry/PR):f(x1,x2,x3)=x1+x2+x3-1.5f(x1,x2,x3)=0astheseparatingsuperlinex1x2x3y111110000100000001112013-02-28智能技术课程Linearlyvs.Nonlinearlyseparable•2-XORproblem,completeinformationin2-dimBooleanspace,butdifferentfrom2-AND!x1x2y110101011000Function?(analyticlaw):y=|x1-x2|Rules?(descriptivelaw):??Classifier?(geometry/PR):f(x1,x2)=?f(x1,x2)=0astheseparatingcurve?×PatternCompactness•Compactnessassumptionforpatterns:a)patternsbelongingtothesamepatternclassformacompactsetinthepatternspace;foranytwopatternswithinthesameclass,thetransitionfromonepatterntotheothercouldbemadealmostevenlydistributed.b)asmallperturbationtoapatterndoesnotalteritsmembershiptothepatternclassConsiderasetin3-Dspace:000,001,010,011100,101,110,1112013-02-28智能技术课程2013-02-28智能技术课程A1={111,101,110,011}A2={000,010,100,001}Easytoseparate,oneplanex1+x2+x3–c=0,1c2;Critical/InteriorpointsOneinteriorpointforeachclassA1={111,001,010,100}A2={000,011,110,101}Hardtoseparate,threeplanes?Findthem!Nointeriorpoints!Allcritical!Any2-D:XORproblem2013-02-28智能技术课程Compact,fewcriticalpointsLesscompact,somemorecriticalpointsToomanycriticalpoints!Propertiesofcompactness:1.Smallproportionofcriticalpoints,largeproportionofinteriorpoints;2.Anytwointeriorpointsinthesameclasscanbesmoothlyconnectedviaaseriesofpointsintheclass;3.Anyinteriorpointhasanappropriateneighborhoodinwhichallpointsbelongtothesameclass.Inreal-worldapplications,thisassumptionisnotalwaystrue!Difficulty!OtherImportantIssues:Featureselection??Featureextraction??Similaritymeasure??Metric(distance)??Transformation(mapping)ofonefeaturespacetoanotherinwhichcompactnessassumptionholdsfortransformedpatterns??VC(Vapnik–Chervonenkis)dimension•ameasureofthecapacityofastatisticalclassificationalgorithm,definedasthecardinalityofthelargestsetofpointsthatthealgorithmcanshatter.•Aclassificationmodelfwithsomeparametervectorsissaidtoshatterasetofdatapointsif,forallassignmentsoflabelstothosepoints,thereexistsassuchthatthemodelfmakesnoerrorswhenevaluatingthatsetofdatapoints.•TheVCdimensionofamodelfisthemaximumnumberofpointsthatcanbearrangedsothatfshattersthem.Moreformally,itish*whereh*isthemaximumhsuchthatsomedatapointsetofcardinalityhcanbeshatteredbyf.2013-02-28智能技术课程),,,(21nxxx2013-02-28智能技术课程Model:astraightlineintheEuclideanplane;VC=3Thelineshouldseparatepositivedatapointsfromnegativedatapoints.Thereexistsetsof3pointsthatcanindeedbeshatteredusingthismodel(any3pointsthatarenotcollinearcanbeshattered).However,nosetof4pointscanbeshattered:byRadon'stheorem,anyfourpointscanbepartitionedintotwosubsetswithintersectingconvexhulls,soitisnotpossibletoseparateoneofthesetwosubsetsfromtheother.Thus,theVCdimensionofthisparticularclassifieris3!Ingeneral,VCofahyper-planeinddimensionsisd+1(anyd+2pts,partitionedinto2subsetswithintersectingconvexhulls!)2013-02-28智能技术课程RadonTheorem•Anysetofd+2pointsinRdcanbepartitionedintotwo(disjoint)setswhoseconvexhullsintersect.ApointintheintersectionofthesehullsiscalledaRadonpointoftheset.2013-02-28智能技术课程Proof:•Consideranysetofd+2pointsind-dimensionalspace.Thenthereexistsasetofmultipliersa1,...,ad+2,notallofwhicharezero,solvingthesystemoflinearequations:•becausethereared+2unknowns(themultipliers)butonlyd+1equationsthattheymustsatisfy(oneforeachcoordinateofthepoints,togetherwithafinalequationrequiringthesumofthemultiplierstobezero).Fixsomeparticularnonzerosolutiona1,...,ad+2.LetIbethesetofpointswithpositivemultipliers,andletJbethesetofpointswithmultipliersthatarenegativeorzero.ThenIandJformtherequiredpartitionofthepointsintotwosubsetswithintersectingconvexhulls.2013-02-28智能技术课程TheconvexhullsofIandJmustintersect,becausetheybothcontainthepointwhereThelefthandsideoftheformulaforpexpressesthispointasaconvexcombinationofthepointsinI,andtherighthandsideexpressesitasaconvexcombinationofthepointsinJ.Therefore,pbelongstobothconvexhulls,completingtheproof.VCofarectangular(//axis)inaplane2013-02-28智能技术课程Note:allpositivepatternsarewithintherectangularwhilenegativesoutside.VC(rectangular//axis)=4why?Onepossiblecase2013-02-28智能技术课程IDHeadacheTemperaturecoughCTBPGetCold?1alittle37YNormalNormalY2Yes37.5Y?HighY3Hurt37.3YabnormalHighN4Hurt39N?NormalY5No37.4Y?AbitN6No36.9N?HighN7Yes37.5Y?HighN8alittle37.3NNormalNormalY9alittle39.5Y?NormalY

1 / 53
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功