Web Data Mining: 4. Supervised Learning (2)

Supervised Learning (2)

Road Map
- Basic concepts
- Decision tree induction
- Evaluation of classifiers
- Classification using association rules
- Naïve Bayesian classification
- Support vector machines
- K-nearest neighbor
- Ensemble methods: Bagging and Boosting
- Summary

Bayesian Theorem: Basics
- Let X be a data sample whose class label is unknown.
- Let H be some hypothesis, for example that X belongs to class C.
- P(H|X) is the probability that hypothesis H holds given the sample X.
- Example: suppose the samples are fruits, each described by its shape and color. If X stands for "red and round" and H is the hypothesis "X is an apple", then P(H|X) is the probability that X is an apple given that it is red and round.

Bayesian Theorem: Basics
- P(H): the probability that an arbitrary fruit is an apple, regardless of its color and shape.
- P(X): the probability that an arbitrary fruit, whatever kind it is, is red and round.
- P(X|H): the probability that a fruit is red and round, given that it is an apple.

Bayesian Theorem: Basics
- The task: knowing the color and shape of each fruit in the data set, determine what kind of fruit it is, i.e., compute the probability of its belonging to each fruit class and pick the class with the highest probability. So we need P(H|X).
- The other three probabilities, P(H), P(X), and P(X|H), can all be estimated from the data, but P(H|X) cannot be obtained from the data directly. Bayes' theorem gives it to us:

  $P(H|X) = \dfrac{P(X|H)\,P(H)}{P(X)}$

Naïve Bayes Classifier
- Each data sample is represented by an n-dimensional feature vector recording n measurements of the sample on n attributes.
- Suppose there are m classes. Given an unknown data sample X (one with no class label), the classifier predicts that X belongs to the class with the highest posterior probability given X. That is, naïve Bayes assigns X to class $C_i$ if and only if

  $P(C_i|X) > P(C_j|X)$ for $1 \le j \le m$, $j \ne i$.

- In other words, we maximize $P(C_i|X)$; the class $C_i$ that maximizes it is called the maximum posteriori hypothesis. By Bayes' theorem,

  $P(C_i|X) = \dfrac{P(X|C_i)\,P(C_i)}{P(X)}$

Naïve Bayes Classifier
- Since P(X) is constant across classes, only $P(X|C_i)\,P(C_i)$ needs to be maximized.
- If the class prior probabilities are unknown, the classes are usually assumed to be equally likely, i.e., $P(C_1) = P(C_2) = \cdots = P(C_m)$, and only $P(X|C_i)$ is maximized; otherwise $P(X|C_i)\,P(C_i)$ is maximized.
- The class priors can be estimated by $P(C_i) = s_i/s$, where $s_i$ is the number of training samples in class $C_i$ and $s$ is the total number of training samples.

Conditional Independence Assumption
- All attributes are conditionally independent given the class $C = c_j$. Formally, we assume

  $\Pr(A_1 = a_1 \mid A_2 = a_2, \ldots, A_{|A|} = a_{|A|}, C = c_j) = \Pr(A_1 = a_1 \mid C = c_j)$

  and so on for $A_2$ through $A_{|A|}$. That is,

  $\Pr(A_1 = a_1, \ldots, A_{|A|} = a_{|A|} \mid C = c_j) = \prod_{i=1}^{|A|} \Pr(A_i = a_i \mid C = c_j)$

Final Naïve Bayesian Classifier
- Putting the pieces together, we are done:

  $\Pr(C = c_j \mid A_1 = a_1, \ldots, A_{|A|} = a_{|A|}) = \dfrac{\Pr(C = c_j) \prod_{i=1}^{|A|} \Pr(A_i = a_i \mid C = c_j)}{\sum_{r=1}^{|C|} \Pr(C = c_r) \prod_{i=1}^{|A|} \Pr(A_i = a_i \mid C = c_r)}$

- How do we estimate $\Pr(A_i = a_i \mid C = c_j)$? Easy: from counts in the training data.

Classify a Test Instance
- If we only need a decision on the most probable class for the test instance, we only need the numerator, as the denominator is the same for every class.
- Thus, given a test example, we compute the following to decide its most probable class:

  $c = \arg\max_{c_j} \Pr(C = c_j) \prod_{i=1}^{|A|} \Pr(A_i = a_i \mid C = c_j)$

An Example
[The slide shows a small training table with two attributes A1 and A2 and a class C taking values t and f; the table itself did not survive extraction.]

An Example (cont…)
- For class C = t we have

  $\Pr(C = t) \prod_{j=1}^{2} \Pr(A_j = a_j \mid C = t) = \dfrac{1}{2} \times \dfrac{2}{5} \times \dfrac{2}{5} = \dfrac{2}{25}$

- For class C = f we have

  $\Pr(C = f) \prod_{j=1}^{2} \Pr(A_j = a_j \mid C = f) = \dfrac{1}{2} \times \dfrac{1}{5} \times \dfrac{2}{5} = \dfrac{1}{25}$

- C = t is more probable, so t is the final class.

Training Dataset
Classes: C1: buys_computer = "yes"; C2: buys_computer = "no"
Data sample X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair")

  age     income   student  credit_rating  buys_computer
  <=30    high     no       fair           no
  <=30    high     no       excellent      no
  31…40   high     no       fair           yes
  >40     medium   no       fair           yes
  >40     low      yes      fair           yes
  >40     low      yes      excellent      no
  31…40   low      yes      excellent      yes
  <=30    medium   no       fair           no
  <=30    low      yes      fair           yes
  >40     medium   yes      fair           yes
  <=30    medium   yes      excellent      yes
  31…40   medium   no       excellent      yes
  31…40   high     yes      fair           yes
  >40     medium   no       excellent      no

Naïve Bayesian Classifier: An Example
Compute P(Ci) and P(X|Ci) for each class:
- P(buys_computer = "yes") = 9/14 = 0.643
- P(buys_computer = "no") = 5/14 = 0.357
- P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
- P(age = "<=30" | buys_computer = "no") = 3/5 = 0.600
- P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
- P(income = "medium" | buys_computer = "no") = 2/5 = 0.400
- P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
- P(student = "yes" | buys_computer = "no") = 1/5 = 0.200
- P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
- P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.400

For X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair"):
- P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
- P(X | buys_computer = "no") = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
- P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.044 × 0.643 = 0.028
- P(X | buys_computer = "no") × P(buys_computer = "no") = 0.019 × 0.357 = 0.007

Therefore, X belongs to class buys_computer = "yes".
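The whole computation above is just counting. Below is a minimal sketch in Python, not part of the original slides, that stores the training table as a list of tuples and reproduces the slide's numbers; the names `data` and `classify` are illustrative.

```python
from collections import Counter

# Training table from the slide: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

def classify(x):
    """Return the class maximizing P(C=c) * prod_i P(A_i = x_i | C=c)."""
    class_counts = Counter(row[-1] for row in data)   # n_j for each class
    n = len(data)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n                               # prior P(C=c), e.g. 9/14
        for i, value in enumerate(x):
            # n_ij: examples of class c whose i-th attribute equals `value`
            n_ic = sum(row[i] == value for row in data if row[-1] == c)
            score *= n_ic / n_c                       # P(A_i = value | C=c)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = classify(("<=30", "medium", "yes", "fair"))
print(scores)               # {'no': 0.00686..., 'yes': 0.02822...}
print("predicted:", label)  # -> yes
```

The values differ from the slide's 0.028 and 0.007 only in the last digits, because the slide rounds each conditional probability to three decimals before multiplying.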
Additional Issues
- Numeric attributes: naïve Bayesian learning assumes that all attributes are categorical, so numeric attributes need to be discretized.
- Zero counts: a particular attribute value may never occur together with a class in the training set. The count-based estimate

  $\Pr(A_i = a_i \mid C = c_j) = \dfrac{n_{ij}}{n_j}$

  (where $n_{ij}$ is the number of training examples with $A_i = a_i$ and class $c_j$, and $n_j$ is the number of training examples in class $c_j$) is then zero, so we need smoothing.
- Missing values: ignored.

Avoiding the 0-Probability Problem
- Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability

  $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$

  will be zero.
- Example: suppose a data set with 1000 tuples has income = low (0 tuples), income = medium (990), and income = high (10).
- Use the Laplacian correction (Laplacian estimator): adding 1 to each case gives
  Prob(income = low) = 1/1003
  Prob(income = medium) = 991/1003
  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to their "uncorrected" counterparts; a small code sketch is given at the end of this section.

On the Naïve Bayesian Classifier
- Advantages:
  - Easy to implement
  - Very efficient
  - Good results obtained in many applications
- Disadvantages:
  - It assumes class conditional independence, so accuracy is lost when the assumption is seriously violated (highly correlated data sets).

Road Map
- Basic concepts
- Decision tree induction
- Evaluation of classifiers
- Classification using association rules
- Naïve Bayesian classification
- Support vector machines
- K-nearest neighbor
- Ensemble methods: Bagging and Boosting
- Summary

Introduction
- Support vector machines were invented by V. Vapnik and his co-workers in the 1970s in Russia and became known to the West in 1992.
- SVMs are linear classifiers that find a hyperplane to separate two classes of data, positive and negative.
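Referring back to the Laplacian correction above: a minimal sketch, assuming per-class raw counts are available in a dict; `laplace_probs` is an illustrative helper, not a library function.

```python
# Laplacian correction: add 1 to each value's count, which adds the number
# of distinct values (here 3) to the denominator: 1000 + 3 = 1003.
def laplace_probs(counts):
    """counts: raw count of each attribute value within one class."""
    total = sum(counts.values()) + len(counts)
    return {value: (n + 1) / total for value, n in counts.items()}

income = {"low": 0, "medium": 990, "high": 10}  # the slide's 1000-tuple example
for value, p in laplace_probs(income).items():
    print(f"Prob(income={value}) = {p:.4f}")    # 1/1003, 991/1003, 11/1003
```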
