The Elements of Statistical Learning: Chapter 2
oxstar@SJTU
January 3, 2011

Ex. 2.1
Suppose each of $K$ classes has an associated target $t_k$, which is a vector of all zeros, except a one in the $k$th position. Show that classifying to the largest element of $\hat{y}$ amounts to choosing the closest target, $\min_k \|t_k - \hat{y}\|$, if the elements of $\hat{y}$ sum to one.

Proof
Our goal is to prove $\arg\max_k(\hat{y}_k) = \arg\min_k \|t_k - \hat{y}\|$. In fact,
\begin{align*}
\arg\min_k \|t_k - \hat{y}\|
&= \arg\min_k \|t_k - \hat{y}\|^2 && \text{// } \|t_k - \hat{y}\| \text{ is nonnegative} \\
&= \arg\min_k \sum_{i=1}^K \bigl((t_k)_i - \hat{y}_i\bigr)^2 && \text{// expanding the norm} \\
&= \arg\min_k \sum_{i=1}^K \bigl((t_k)_i^2 - 2 (t_k)_i \hat{y}_i + \hat{y}_i^2\bigr) \\
&= \arg\min_k \sum_{i=1}^K \bigl((t_k)_i^2 - 2 (t_k)_i \hat{y}_i\bigr) && \text{// the term } \textstyle\sum_i \hat{y}_i^2 \text{ does not depend on } k \\
&= \arg\min_k \Bigl( \sum_{i=1}^K (t_k)_i^2 - 2 \sum_{i=1}^K (t_k)_i \hat{y}_i \Bigr) \\
&= \arg\min_k \Bigl( 1 - 2 \sum_{i=1}^K (t_k)_i \hat{y}_i \Bigr) && \text{// by the definition of } t_k \\
&= \arg\min_k \Bigl( -2 \sum_{i=1}^K (t_k)_i \hat{y}_i \Bigr) \\
&= \arg\max_k \sum_{i=1}^K (t_k)_i \hat{y}_i \\
&= \arg\max_k (\hat{y}_k) && \text{// } (t_k)_i = 1 \text{ when } i = k \text{, else } (t_k)_i = 0
\end{align*}

Ex. 2.2
Show how to compute the Bayes decision boundary for the simulation example in Figure 2.5.

Answer
The book describes how the training data are generated. The model contains two classes, labeled BLUE and ORANGE respectively. Let $k \in \{\mathrm{BLUE}, \mathrm{ORANGE}\}$ and write $G_k$ for class $k$. Each class is coded via dummy variables:
\begin{align}
\mathrm{BLUE} &= (1, 0)^T \tag{1} \\
\mathrm{ORANGE} &= (0, 1)^T \tag{2}
\end{align}
First, 10 means $m_k = \{(m_k)_1, \dots, (m_k)_{10}\}$ are generated for each class, drawn independently from a bivariate Gaussian distribution $N(\mu_k, I)$, so
\begin{equation}
P(m_k \mid G_k) = \prod_{i=1}^{10} f\bigl((m_k)_i; \mu_k, I\bigr), \tag{3}
\end{equation}
where $f(x; \mu, \Sigma)$ denotes the bivariate Gaussian density with mean $\mu$ and covariance $\Sigma$,
\begin{equation}
f(x; \mu, \Sigma) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\Bigl( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Bigr). \tag{4}
\end{equation}
Then, for each class, 100 observations are generated as follows: for each observation one of the $(m_k)_i$ is picked at random with probability $1/10$, and the observation is drawn from $N\bigl((m_k)_i, I/5\bigr)$, leading to a mixture of Gaussian clusters for each class. So we have
\begin{align}
P\bigl(X = x, \text{component } i \mid m_k, G_k\bigr) &= \tfrac{1}{10}\, f\bigl(x; (m_k)_i, I/5\bigr) \tag{5} \\
P(X = x \mid m_k, G_k) &= \sum_{i=1}^{10} \tfrac{1}{10}\, f\bigl(x; (m_k)_i, I/5\bigr) \tag{6}
\end{align}
The values of $m_k$ are unknown, so we marginalize them out:
\begin{equation}
P(X = x \mid G_k) = \int P(X = x \mid m_k, G_k)\, P(m_k \mid G_k)\, dm_k \tag{7}
\end{equation}
From equation (2.23) in the textbook, the Bayes-optimal decision boundary is the set of points $x$ that satisfy
\begin{equation}
P(G_{\mathrm{BLUE}} \mid X = x) = P(G_{\mathrm{ORANGE}} \mid X = x) \tag{8}
\end{equation}
Because both classes contain the same amount of training data,
\begin{equation}
P(G_{\mathrm{BLUE}}) = P(G_{\mathrm{ORANGE}}) \tag{9}
\end{equation}
Bayes' formula tells us
\begin{equation}
P(G_k \mid X = x) = \frac{P(X = x \mid G_k)\, P(G_k)}{\sum_{k'} P(X = x \mid G_{k'})\, P(G_{k'})} \tag{10}
\end{equation}
Combining equations (8)-(10), the boundary consists of the points $x$ with
\begin{equation}
P(X = x \mid G_{\mathrm{BLUE}}) = P(X = x \mid G_{\mathrm{ORANGE}}) \tag{11}
\end{equation}
Thus the Bayes-optimal decision boundary can be computed from equations (1)-(11).
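For the figure itself, the boundary is usually computed conditional on the realized means $m_k$ rather than by carrying out the integral in (7): evaluate the two mixture densities of (6) on a grid and trace the level set where they coincide, as in (11). Below is a minimal sketch in Python; the means are simply redrawn from $N(\mu_k, I)$ (not the ones used for the textbook figure), and the grid range is an arbitrary choice.
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# 10 component means per class, as in the textbook simulation:
# BLUE means ~ N((1,0)^T, I), ORANGE means ~ N((0,1)^T, I).
m_blue = rng.multivariate_normal([1, 0], np.eye(2), size=10)
m_orange = rng.multivariate_normal([0, 1], np.eye(2), size=10)

def mixture_density(x, means, cov=np.eye(2) / 5):
    """Equation (6): equal-weight mixture of ten N((m_k)_i, I/5) densities."""
    return np.mean([multivariate_normal.pdf(x, mean=m, cov=cov) for m in means],
                   axis=0)

# Evaluate both class densities on a grid and take their difference.
xx, yy = np.meshgrid(np.linspace(-3, 4, 300), np.linspace(-3, 4, 300))
grid = np.dstack([xx, yy])
diff = mixture_density(grid, m_blue) - mixture_density(grid, m_orange)

# The Bayes boundary, equation (11), is the zero level set of the difference.
plt.contour(xx, yy, diff, levels=[0.0], colors="purple")
plt.title("Bayes decision boundary given the realized means")
plt.show()
\end{verbatim}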
Ex. 2.4
The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution $X \sim N(0, I_p)$. The squared distance from any sample point to the origin has a $\chi^2_p$ distribution with mean $p$. Consider a prediction point $x_0$ drawn from this distribution, and let $a = x_0 / \|x_0\|$ be an associated unit vector. Let $z_i = a^T x_i$ be the projection of each of the training points on this direction. Show that the $z_i$ are distributed $N(0, 1)$ with expected squared distance from the origin 1, while the target point has expected squared distance $p$ from the origin. Hence for $p = 10$, a randomly drawn test point is about 3.1 standard deviations from the origin, while all the training points are on average one standard deviation along direction $a$. So most prediction points see themselves as lying on the edge of the training set.

Proof
Since $x_i \sim N(0, I_p)$, each coordinate satisfies $(x_i)_j \sim N(0, 1)$ independently, for $j \in \{1, \dots, p\}$. The direction $a$ is determined by $x_0$ and is therefore independent of the training point $x_i$; conditioning on $a$,
\[
a_j (x_i)_j \sim N(0, a_j^2)
\quad\Longrightarrow\quad
z_i = \sum_{j=1}^p a_j (x_i)_j \sim N\Bigl(0, \sum_{j=1}^p a_j^2\Bigr) = N(0, 1),
\]
because $\|a\| = 1$. Hence the expected squared distance of $z_i$ from the origin is $E(z_i^2) = \operatorname{Var}(z_i) = 1$.

The squared distance $d = \|x_0\|^2$ of the prediction point from the origin has a $\chi^2_p$ distribution with mean $p$, so the target point has expected squared distance $E(d) = p$ from the origin. For $p = 10$ the test point therefore lies, in root-mean-square terms, at distance $\sqrt{p} = \sqrt{10} \approx 3.1$ from the origin, i.e. about 3.1 standard deviations of the $z_i$ (which have $\operatorname{Std}(z_i) = 1$), while the training points are on average only one standard deviation from the origin along the direction $a$. So most prediction points see themselves as lying on the edge of the training set.

Ex. 2.6
Consider a regression problem with inputs $x_i$ and outputs $y_i$, and a parameterized model $f_\theta(x)$ to be fit by least squares. Show that if there are observations with tied or identical values of $x$, then the fit can be obtained from a reduced weighted least squares problem.

Proof
"If there are multiple observation pairs $x_i, y_{i\ell}$, $\ell = 1, \dots, N_i$, at each value of $x_i$, the risk is limited." (Page 32)
We estimate the parameters $\theta$ of $f_\theta$ by minimizing the residual sum of squares, i.e. by computing $\arg\min_\theta \sum_i \sum_{\ell=1}^{N_i} (y_{i\ell} - f_\theta(x_i))^2$. Writing $\bar{y}_i = \frac{1}{N_i} \sum_{\ell=1}^{N_i} y_{i\ell}$ for the average response at $x_i$,
\begin{align*}
\arg\min_\theta \sum_i \sum_{\ell=1}^{N_i} (y_{i\ell} - f_\theta(x_i))^2
&= \arg\min_\theta \sum_i \sum_{\ell=1}^{N_i} \bigl( y_{i\ell}^2 - 2 y_{i\ell} f_\theta(x_i) + f_\theta(x_i)^2 \bigr) \\
&= \arg\min_\theta \sum_i \Bigl( \sum_{\ell=1}^{N_i} y_{i\ell}^2 - 2 N_i \bar{y}_i f_\theta(x_i) + N_i f_\theta(x_i)^2 \Bigr) \\
&= \arg\min_\theta \sum_i \Bigl( \Bigl( \sum_{\ell=1}^{N_i} y_{i\ell}^2 - N_i \bar{y}_i^2 \Bigr) + N_i \bigl( \bar{y}_i^2 - 2 \bar{y}_i f_\theta(x_i) + f_\theta(x_i)^2 \bigr) \Bigr) \\
&= \arg\min_\theta \sum_i N_i \bigl( \bar{y}_i - f_\theta(x_i) \bigr)^2 && \text{// } \textstyle\sum_\ell y_{i\ell}^2 - N_i \bar{y}_i^2 \text{ does not depend on } \theta
\end{align*}
So the original fit is equivalent to a reduced weighted least squares problem: one observation $(x_i, \bar{y}_i)$ per distinct input value, with weight $N_i$. "In this case, the solutions pass through the average values of the $y_i$ at each $x_i$." Hence we can estimate the parameters $\theta$ of $f_\theta$ by weighted least squares on the reduced data.

Ex. Additional
Derive equation (2.47):
\[
\mathrm{EPE}_k(x_0) = \sigma^2 + \Bigl[ f(x_0) - \frac{1}{k} \sum_{\ell=1}^k f(x_{(\ell)}) \Bigr]^2 + \frac{\sigma^2}{k}
\]

Answer
"Suppose the data arise from a model $Y = f(X) + \varepsilon$", with $f(X)$ and $\varepsilon$ independent, so that
\begin{equation}
E(\varepsilon f(X)) = E(f(X))\, E(\varepsilon) \tag{12}
\end{equation}
All the randomness in $Y$ given $X$ comes from $\varepsilon$, with $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, so $\operatorname{Var}(Y) = \operatorname{Var}(\varepsilon) = E[(\varepsilon - E(\varepsilon))^2] = E(\varepsilon^2)$ and
\begin{equation}
\operatorname{Var}(Y) = E(\varepsilon^2) = \sigma^2 \tag{13}
\end{equation}
The responses at the $k$ nearest neighbors of $x_0$, namely $y_{(\ell)} = f(x_{(\ell)}) + \varepsilon_{(\ell)}$, also arise from this model, hence $\operatorname{Var}(y_{(\ell)}) = \sigma^2$. Conditioning on the training inputs, the $k$-nearest-neighbor estimate $\hat{f}_k(x_0) = \frac{1}{k} \sum_{\ell=1}^k y_{(\ell)}$ satisfies
\begin{align}
\operatorname{Var}(\hat{f}_k(x_0)) &= \frac{1}{k^2} \sum_{\ell=1}^k \operatorname{Var}(y_{(\ell)}) = \frac{\sigma^2}{k} \tag{14} \\
E(\hat{f}_k(x_0)) &= \frac{1}{k} \sum_{\ell=1}^k f(x_{(\ell)}) \tag{15}
\end{align}
Therefore we can write $\hat{f}_k(x_0) = f_0(x_0) + \varepsilon_0$, where $f_0(x_0)$ is nonrandom, $\varepsilon_0$ is the averaged training error, and
\begin{align}
E(\varepsilon_0) &= 0 \tag{16} \\
\operatorname{Var}(\varepsilon_0) &= E(\varepsilon_0^2) = \frac{\sigma^2}{k} \tag{17} \\
f_0(x_0) &= E(\hat{f}_k(x_0)) = \frac{1}{k} \sum_{\ell=1}^k f(x_{(\ell)}) \tag{18}
\end{align}
Now we can derive equation (2.47) as follows:
\begin{align*}
\mathrm{EPE}_k(x_0) &= E\bigl[ (Y - \hat{f}_k(x_0))^2 \mid X = x_0 \bigr] \\
&= E\bigl[ (f(x_0) + \varepsilon - \hat{f}_k(x_0))^2 \bigr] \\
&= E\bigl[ f(x_0)^2 + 2 \varepsilon f(x_0) + \varepsilon^2 - 2 f(x_0) \hat{f}_k(x_0) - 2 \varepsilon \hat{f}_k(x_0) + \hat{f}_k(x_0)^2 \bigr] \\
&= E\bigl[ f(x_0)^2 + \varepsilon^2 - 2 f(x_0) \hat{f}_k(x_0) + \hat{f}_k(x_0)^2 \bigr] && \text{// } E(\varepsilon) = 0 \text{ and } \varepsilon \text{ independent of } \hat{f}_k(x_0) \text{, cf. (12)} \\
&= E\bigl[ \varepsilon^2 + (f(x_0) - \hat{f}_k(x_0))^2 \bigr] \\
&= E\bigl[ \varepsilon^2 + (f(x_0) - f_0(x_0) - \varepsilon_0)^2 \bigr] \\
&= E\bigl[ \varepsilon^2 + (f(x_0) - f_0(x_0))^2 + \varepsilon_0^2 \bigr] && \text{// expanding; the cross term vanishes by (16)} \\
&= \sigma^2 + \bigl( f(x_0) - f_0(x_0) \bigr)^2 + \frac{\sigma^2}{k} && \text{// (13), (17)} \\
&= \sigma^2 + \Bigl[ f(x_0) - \frac{1}{k} \sum_{\ell=1}^k f(x_{(\ell)}) \Bigr]^2 + \frac{\sigma^2}{k} && \text{// (18)}
\end{align*}
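As a sanity check on (2.47), the decomposition can be verified by simulation: fix the training inputs (the derivation above conditions on them), repeatedly redraw the training errors and a test response at $x_0$, and compare the average squared prediction error of the $k$-NN fit with $\sigma^2 + \mathrm{bias}^2 + \sigma^2/k$. The sketch below is only illustrative; the choices $f(x) = \sin(2x)$, $\sigma = 0.3$, $k = 5$, the sample size, and $x_0$ are arbitrary and not taken from the text.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (not from the text): 1-D inputs, f(x) = sin(2x), sigma = 0.3.
def f(x):
    return np.sin(2 * x)

sigma, k, n, x0 = 0.3, 5, 50, 0.8

# Fix the training inputs once; the derivation conditions on them.
X = rng.uniform(-2, 2, n)
nn = np.argsort(np.abs(X - x0))[:k]            # indices of the k nearest neighbors of x0

# Redraw the training errors and the test response many times.
reps = 100_000
y_train = f(X) + sigma * rng.standard_normal((reps, n))
fhat = y_train[:, nn].mean(axis=1)             # k-NN estimate at x0, one per replicate
y0 = f(x0) + sigma * rng.standard_normal(reps)

epe_sim = np.mean((y0 - fhat) ** 2)
bias2 = (f(x0) - f(X[nn]).mean()) ** 2
epe_theory = sigma**2 + bias2 + sigma**2 / k   # equation (2.47)
print(f"simulated EPE_k(x0) = {epe_sim:.4f}, theory = {epe_theory:.4f}")
\end{verbatim}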