Machine Learning Basic Theory
Lecture Notes 1: Basic Theory
Professor: Zhihua Zhang
Scribes: Cheng Chen, Luo Luo, Cong Xie

1 Introduction

In machine learning, data is typically expressed in matrix form. Suppose we have $n$ samples and $p$ variables (or features). Then we have

$$ X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} \in \mathbb{R}^{n \times p}. $$

The $i$-th sample can be denoted as $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$.

Machine learning is mainly concerned with the following problems:

(1) Dimension reduction: Dimension reduction is the process of reducing the number of random variables (or features) under consideration. Formally, let $X_i \in \mathbb{R}^p$; we want to find $Z_i \in \mathbb{R}^q$ (with $q < p$) to represent $X_i$. If we use a linear transformation, we need to find a matrix $A$ such that $Z_i = A X_i$; note that $A$ should have full row rank. If we use a nonlinear transformation, we need to find a nonlinear function $f$ such that $Z_i = f(X_i)$. (A sketch of the linear case appears after this list.)

(2) Clustering: Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters). We can view the $n$ samples as $n$ points, and our objective is to cluster them into $k$ clusters.

(3) Classification: Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Formally, in the training set we have a label $Y_i$ for each $X_i$, where $Y_i \in \mathcal{C}$ and $\mathcal{C}$ is a non-empty finite set. If $Y_i \in \{-1, 1\}$ or $\{0, 1\}$, it is a binary classification problem. If $Y_i \in \{1, 2, \ldots, k\}$, it is a multi-class classification problem. There are also problems in which one observation belongs to more than one category; these are called multi-label or multi-output classification.

(4) Regression: Regression is the analogous problem in which the label is continuous, $Y_i \in \mathbb{R}$.

(5) Ranking: also called isotonic regression (IR). Isotonic regression involves finding a weighted least-squares fit $x \in \mathbb{R}^n$ to a vector $a \in \mathbb{R}^n$ with weight vector $w \in \mathbb{R}^n$, subject to a set of non-contradictory constraints of the kind $x_i \le x_j$.
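As an illustration of the linear case in (1), here is a minimal sketch, not from the notes, that takes $A$ to be the top-$q$ principal directions of the centred data; NumPy, the sizes $n$, $p$, $q$, and the PCA-based choice of $A$ are all assumptions of the sketch.

```python
import numpy as np

# Toy data matrix X: n samples as rows, p features per sample.
rng = np.random.default_rng(0)
n, p, q = 100, 10, 3
X = rng.normal(size=(n, p))

# Centre the data and take the top-q right singular vectors as the rows of A,
# so A is q x p with full row rank (q < p).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
A = Vt[:q]

# Linear dimension reduction: Z_i = A X_i for every (centred) sample.
Z = Xc @ A.T
print(A.shape, Z.shape)   # (3, 10) (100, 3)
```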
Note that (1) and (2) are unsupervised learning, while (3), (4) and (5) are supervised learning. Unsupervised learning tries to find hidden structure in unlabelled data. Supervised learning is the machine learning task of inferring a function from labelled training data.

For supervised learning, the data is usually split into two or three parts (a splitting sketch follows this list):

(1) Training data: a set of examples used for learning, that is, to fit the parameters (e.g., the weights of a neural network) of the model.

(2) Validation data: sometimes we also need a validation set to tune the model, for example to choose the number of hidden units in a neural network or for pruning a decision tree. It is usually used to prevent overfitting and to enhance the generalization ability.

(3) Test data: this data set is used only for testing the final solution, in order to confirm the actual performance.
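The split itself is mechanical; the following is a minimal sketch, assuming NumPy, with an illustrative 60/20/20 partition (the function name and the fractions are not from the notes).

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Randomly split (X, y) into training, validation and test parts."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    n_test, n_val = int(n * test_frac), int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Example: n = 100 samples with p = 5 features and real-valued labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=100)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = train_val_test_split(X, y)
print(X_tr.shape, X_va.shape, X_te.shape)   # (60, 5) (20, 5) (20, 5)
```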

1.1 Frequentist's view vs. Bayesian view

1.1.1 Frequentist's view

The frequentist approach views the model parameters as unknown constants and estimates them by matching the model to the training data using an appropriate metric.

Example 1.1 Suppose we have $n$ pairs of samples $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^p$, $y_i \in \mathbb{R}$, and we want to fit a linear function $x_i^T a$ (more strictly, it should be $x_i^T a + b$, or we can include a constant variable 1 in $x_i$) to predict $y_i$.

Using least squares, we have the loss function
$$ L(a) = \sum_{i=1}^n (y_i - x_i^T a)^2, $$
where $a$ is an unknown fixed parameter. We can solve for $a$ by minimizing the loss function.

Using maximum likelihood estimation, let $y_i \sim N(x_i^T a, \sigma^2)$, namely
$$ p(y_i \mid x_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\Big( -\frac{(y_i - x_i^T a)^2}{2\sigma^2} \Big). $$
So the log-likelihood is (assuming the samples are independent)
$$ \ell = \log \prod_{i=1}^n p(y_i \mid x_i) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T a)^2. $$
We can solve for $a$ by maximizing the joint likelihood. Since the first term does not depend on $a$, maximizing $\ell$ over $a$ is equivalent to minimizing $\sum_{i=1}^n (y_i - x_i^T a)^2$; under the above conditions, maximum likelihood estimation is therefore the same as least squares.
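A quick numerical check of this equivalence, not from the notes: the gradient of the Gaussian log-likelihood with respect to $a$ is proportional to $X^T(y - Xa)$, and it vanishes at the least-squares solution. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
a_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ a_true + 0.3 * rng.normal(size=n)

# Least-squares estimate: minimizes sum_i (y_i - x_i^T a)^2.
a_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of the Gaussian log-likelihood w.r.t. a is (1/sigma^2) X^T (y - X a);
# it is zero (up to round-off) at the least-squares solution, so the MLE coincides.
grad = X.T @ (y - X @ a_ls)
print(a_ls)
print(np.max(np.abs(grad)))   # ~1e-12
```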
1.1.2 Bayesian view

The Bayesian approach views the model parameters as random variables and estimates them by using Bayes' theorem.

Example 1.2 Let us continue Example 1.1 with $y_i \sim N(x_i^T a, \sigma^2)$ again. Here $a$ and $\sigma^2$ are random variables, not constants. Let $a \sim N(0, \sigma^2 I)$ and $\sigma^2 \sim \Gamma(\alpha, \beta)$. Our interest is the posterior probability
$$ P(a \mid x_i, y_i) \propto P(x_i, y_i \mid a)\, P(a). $$
We can use maximum a posteriori (MAP) estimation or Bayesian estimation to solve for $a$.
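As a sketch of MAP estimation in this setting (an assumption-laden simplification, not the notes' derivation): condition on a fixed noise variance $\sigma^2$ and give $a$ a Gaussian prior $N(0, \tau^2 I)$ with its own variance $\tau^2$. The posterior mode then has the closed form of ridge regression, $\hat{a}_{\mathrm{MAP}} = (X^T X + (\sigma^2/\tau^2) I)^{-1} X^T y$. The numeric values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.3 * rng.normal(size=n)

sigma2 = 0.09   # assumed (fixed) noise variance
tau2 = 1.0      # assumed prior variance of a

# MAP estimate under y_i ~ N(x_i^T a, sigma2) and a ~ N(0, tau2 * I):
# maximizing log P(y | X, a) + log P(a) gives the ridge-regression solution.
lam = sigma2 / tau2
a_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(a_map)
```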

1.2 Parametrics vs. Nonparametrics

In a parametric model, the number of parameters is fixed once and for all, independent of the number of training data. In a nonparametric model, the number of parameters can change according to the number of training data.

Example 1.3 In the nearest neighbor method, the number of parameters is the number of training samples, so this is a nonparametric model. In logistic regression, the number of parameters is the dimension of the training samples, so this is a parametric model.
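The contrast can be made concrete with a minimal one-nearest-neighbor sketch (illustrative, not from the notes): the "parameters" are the stored training samples themselves, so their number grows with $n$, whereas a logistic regression model would keep only $p$ weights (plus an intercept) regardless of $n$.

```python
import numpy as np

class OneNearestNeighbor:
    """Nonparametric: the model stores all n training samples."""
    def fit(self, X, y):
        self.X, self.y = X, y          # the 'parameters' are the training set itself
        return self

    def predict(self, X_new):
        # For each query point, return the label of the closest training point.
        d = np.linalg.norm(X_new[:, None, :] - self.X[None, :, :], axis=2)
        return self.y[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = OneNearestNeighbor().fit(X, y)
print(clf.predict(np.array([[2.0, 2.0], [-2.0, -2.0]])))   # expected: [1 0]
```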

2 Random Variables and Basic Properties

2.1 Random Variables

Definition 2.1 Let $X = (X_1, \ldots, X_p)^T$ be a random vector. The cumulative distribution function (C.D.F.) associated with $X$ is
$$ F_X(x) = \Pr(X \le x) = \Pr(X_1 \le x_1, \ldots, X_p \le x_p). $$

There are two kinds of random variables: absolutely continuous and discrete.

2.1.1 Absolutely Continuous Random Variables

Definition 2.2 $X$ is an absolutely continuous random variable if there exists a probability density function (P.D.F.) $f_X(x)$ such that
$$ F_X(x) = \int_{-\infty}^{x} f_X(u)\, du, $$
where $F_X(+\infty) = \int_{-\infty}^{+\infty} f_X(u)\, du = 1$.

2.1.2 Discrete Random Variables

Definition 2.3 $X$ is a discrete random variable if there exists a probability mass function (P.M.F.) such that $X$ takes values in a countable (or finite) set of points $\{x_j,\ j = 1, 2, \ldots\}$, with
$$ f_X(x_j) = P(X = x_j), \qquad f_X(x) = 0 \ \text{otherwise}, $$
where $P(X \in D) = \sum_{x_j \in D} f_X(x_j)$ for $D \subset \mathbb{R}^p$.
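A small numerical illustration of Definition 2.2, not from the notes: taking $f_X$ to be the standard normal density, a Riemann sum over a wide grid recovers $F_X(+\infty) = 1$ and $F_X(0) = 1/2$. NumPy and the grid limits are assumptions of the sketch.

```python
import numpy as np

# Standard normal density f(u); by Definition 2.2 its integral over R is 1.
f = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

u = np.linspace(-10.0, 10.0, 200001)   # fine grid standing in for (-inf, +inf)
du = u[1] - u[0]

print(np.sum(f(u)) * du)               # ~1.0, i.e. F_X(+inf) = 1
print(np.sum(f(u[u <= 0.0])) * du)     # ~0.5 = F_X(0) = Pr(X <= 0)
```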
2.1.3 Support Set

Definition 2.4 The support set is $S = \{x \in \mathbb{R}^p : f_X(x) > 0\}$.

2.1.4 Marginal and Conditional Distribution

For $x = (x_1, x_2)$ where $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{p-k}$:

Definition 2.5 Marginal distribution: the marginal distribution of $X_1$ is
$$ \Pr(X_1 \le x_1) = F_X(x_1, \ldots, x_k, \infty_{k+1}, \ldots, \infty_p). $$
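A quick empirical sanity check of Definition 2.5, not from the notes: for a two-dimensional standard normal sample, evaluating the joint C.D.F. with the second argument sent to a very large value recovers the marginal C.D.F. of the first coordinate. NumPy and the sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(1_000_000, 2))   # X = (X1, X2), independent N(0, 1)

x1, big = 0.5, 1e6                          # 'big' stands in for +inf

# Joint C.D.F. evaluated at (x1, +inf) ...
joint = np.mean((samples[:, 0] <= x1) & (samples[:, 1] <= big))
# ... equals the marginal C.D.F. of X1 at x1.
marginal = np.mean(samples[:, 0] <= x1)

print(joint, marginal)                      # both ~0.69, i.e. Phi(0.5)
```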