Machine Learning

Lecture Notes 1: Basic Theory

Professor: Zhihua Zhang
Scribe: Cheng Chen, Luo Luo, Cong Xie

1 Introduction

In machine learning, data is typically expressed in matrix form. Suppose we have $n$ samples and $p$ variables (or features). Then we have
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} \in \mathbb{R}^{n \times p}.$$
The $i$-th sample can be denoted as $X_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$.

Machine learning mainly solves the following problems:

(1) Dimension Reduction: Dimension reduction is the process of reducing the number of random variables (or features) under consideration. Formally, given $X_i \in \mathbb{R}^p$, we want to find $Z_i \in \mathbb{R}^q$ ($q \ll p$) to represent $X_i$. If we use a linear transformation, then we need to find a matrix $A$ such that $Z_i = A X_i$; note that $A$ should have full row rank. If we use a nonlinear transformation, then we need to find a nonlinear function $f$ such that $Z_i = f(X_i)$.

(2) Clustering: Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). We can view the $n$ samples as $n$ points, and our objective is to cluster them into $k$ clusters.

(3) Classification: Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Formally, in the training set we have a label $Y_i$ for each $X_i$, where $Y_i \in \mathcal{C}$ and $\mathcal{C}$ is a non-empty finite set. If $Y_i \in \{-1, 1\}$ or $\{0, 1\}$, it is a binary classification problem. If $Y_i \in \{1, 2, \dots, k\}$, it is a multi-class classification problem. There are also problems in which one observation belongs to more than one category; these are called multi-label or multi-output classification.

(4) Regression: Regression is the analogous problem in which the label is continuous, $Y_i \in \mathbb{R}$.

(5) Ranking: also called isotonic regression (IR). Isotonic regression involves finding a weighted least-squares fit $x \in \mathbb{R}^n$ to a vector $a \in \mathbb{R}^n$ with weight vector $w \in \mathbb{R}^n$, subject to a set of non-contradictory constraints of the kind $x_i \ge x_j$.

Note that (1) and (2) are unsupervised learning, while (3), (4), and (5) are supervised learning. Unsupervised learning tries to find hidden structure in unlabelled data. Supervised learning is the machine learning task of inferring a function from labelled training data.

For supervised learning, the data is usually split into two or three parts.

(1) Training data: a set of examples used for learning, that is, to fit the parameters (e.g., the weights of a neural network) of the model.

(2) Validation data: sometimes we also need a validation set to tune the model, for example to choose the number of hidden units in a neural network or for pruning a decision tree. It is usually used to prevent overfitting and enhance the generalization ability.

(3) Test data: this data set is used only for testing the final solution, in order to confirm the actual performance.

1.1 Frequentist's View vs. Bayesian View

1.1.1 Frequentist's View

The frequentist approach views the model parameters as unknown constants and estimates them by matching the model to the training data using an appropriate metric.

Example 1.1 Suppose we have $n$ pairs of samples $\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^p$, $y_i \in \mathbb{R}$, and we want to fit a linear function $x_i^T a$ (more strictly, it should be $x_i^T a + b$, or we can include a constant variable 1 in $x_i$) to predict $y_i$.

Using least squares, we have the loss function $L = \sum_{i=1}^n (y_i - x_i^T a)^2$, where $a$ is an unknown fixed parameter. We can solve for $a$ by minimizing the loss function.

Using maximum likelihood estimation, let $y_i \sim N(x_i^T a, \sigma^2)$, namely,
$$p(y_i \mid x_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{(y_i - x_i^T a)^2}{2\sigma^2}}.$$
So the log-likelihood is (assuming the samples are independent)
$$l = \log \prod_{i=1}^n p(y_i \mid x_i) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i^T a)^2.$$
We can solve for $a$ by maximizing the joint likelihood. Under the above conditions, one can prove that maximum likelihood estimation is the same as least squares: with $\sigma^2$ held fixed, maximizing $l$ in $a$ is exactly minimizing $\sum_{i=1}^n (y_i - x_i^T a)^2$.
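To make the equivalence concrete, here is a minimal numerical sketch (not part of the original notes; the sizes $n$, $p$, the true parameter, and the noise level are made up for illustration). It fits $a$ once by the closed-form least-squares solution and once by numerically maximizing the Gaussian log-likelihood; the two estimates agree.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data; n, p, a_true, and sigma are made up for this illustration.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
a_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = X @ a_true + sigma * rng.normal(size=n)

# Least squares: minimize sum_i (y_i - x_i^T a)^2; closed form a = (X^T X)^{-1} X^T y.
a_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Maximum likelihood under y_i ~ N(x_i^T a, sigma^2), with sigma^2 held fixed:
# the negative log-likelihood equals the squared loss up to an additive constant
# and a positive scale factor.
def neg_log_lik(a):
    r = y - X @ a
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + (r @ r) / (2 * sigma**2)

a_mle = minimize(neg_log_lik, x0=np.zeros(p)).x

print(np.allclose(a_ls, a_mle, atol=1e-5))  # True: MLE coincides with least squares
```

The agreement is no accident: since the log-likelihood differs from $-L$ only by a constant and a positive factor $1/(2\sigma^2)$, the two objectives have the same maximizer/minimizer in $a$.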
1.1.2 Bayesian View

The Bayesian approach views the model parameters as random variables and estimates them by using Bayes' theorem.

Example 1.2 Let's continue Example 1.1 and let $y_i \sim N(x_i^T a, \sigma^2)$ again. Here $a$ and $\sigma$ are random variables, not constants. Let $a \sim N(0, \sigma_0^2 I)$ and $\sigma^2 \sim \Gamma(\alpha, \beta)$. Our interest is the posterior probability $P(a \mid x_i, y_i) \propto P(x_i, y_i \mid a) P(a)$. We can use maximum a posteriori estimation or Bayesian estimation to solve for $a$.

1.2 Parametrics vs. Nonparametrics

In a parametric model, the number of parameters is fixed once and for all, independent of the number of training data. In a nonparametric model, the number of parameters can change with the number of training data.

Example 1.3 In the nearest-neighbor method, the number of parameters is the number of training samples, so this model is nonparametric. In logistic regression, the number of parameters is the dimension of the training samples, so this model is parametric.

2 Random Variables and Basic Properties

2.1 Random Variables

Definition 2.1 Let $X = (X_1, \dots, X_p)^T$ be a random vector. The cumulative distribution function (C.D.F.) associated with $X$ is
$$F_X(x) = \Pr(X \le x) = \Pr(X_1 \le x_1, \dots, X_p \le x_p).$$

There are two kinds of random variables: absolutely continuous and discrete.

2.1.1 Absolutely Continuous Random Variables

Definition 2.2 $X$ is an absolutely continuous random variable if there exists a probability density function (P.D.F.) $f_X(x)$ such that
$$F_X(x) = \int_{-\infty}^{x} f_X(u) \, du,$$
where $F_X(\infty) = \int_{-\infty}^{+\infty} f_X(u) \, du = 1$. More generally, $P(X \in D) = \int_D f_X(u) \, du$ for $D \subseteq \mathbb{R}^p$.

2.1.2 Discrete Random Variables

Definition 2.3 $X$ is a discrete random variable if there exists a probability mass function (P.M.F.) such that $X$ takes a countable (or finite) set of points $\{x_j,\ j = 1, \dots\}$, on which the P.M.F. is
$$f_X(x_j) = P(X = x_j), \qquad f_X(x) = 0 \ \text{otherwise},$$
where $P(X \in D) = \sum_{x_j \in D} f_X(x_j)$ for $D \subseteq \mathbb{R}^p$.

2.1.3 Support Set

Definition 2.4 The support set is $S = \{x \in \mathbb{R}^p : f_X(x) > 0\}$.

2.1.4 Marginal and Conditional Distribution

For $x = (x_1; x_2)$ where $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{p-k}$:

Definition 2.5 Marginal distribution: the marginal C.D.F. of $X_1$ is
$$\Pr(X_1 \le x_1) = F_X(x_1, \dots, x_k, \infty_{k+1}, \dots, \infty_p),$$
i.e., the joint C.D.F. evaluated with the remaining $p - k$ coordinates sent to $+\infty$.
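The following small sketch (not from the notes; the joint P.M.F. values are made up) illustrates Definitions 2.3–2.5 for a discrete random vector: the P.M.F. sums to one, the support set collects the points with positive mass, and the marginal of $X_1$ is obtained by summing out $X_2$, the discrete counterpart of sending the remaining coordinates to infinity in the joint C.D.F.

```python
import numpy as np

# Made-up joint P.M.F. of a discrete random vector X = (X1, X2),
# with X1 in {0, 1, 2} (rows) and X2 in {0, 1} (columns).
pmf = np.array([[0.10, 0.20],
                [0.30, 0.00],
                [0.15, 0.25]])
assert np.isclose(pmf.sum(), 1.0)  # a valid P.M.F. sums to 1 (Definition 2.3)

# Support set S = {x : f_X(x) > 0}  (Definition 2.4)
support = list(zip(*np.nonzero(pmf)))
print(support)  # [(0,0), (0,1), (1,0), (2,0), (2,1)] -- (1,1) excluded, its mass is 0

# Marginal P.M.F. of X1: sum out X2  (Definition 2.5, discrete analogue)
pmf_x1 = pmf.sum(axis=1)       # [0.30, 0.30, 0.40]

# Marginal C.D.F. of X1: Pr(X1 <= x1) = F_X(x1, infinity)
cdf_x1 = np.cumsum(pmf_x1)     # [0.30, 0.60, 1.00]
print(cdf_x1)
```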