The Elements of Statistical Learning: Chapter 4

oxstar@SJTU

January 6, 2011

Ex. 4.1
Show how to solve the generalized eigenvalue problem $\max_a a^T B a$ subject to $a^T W a = 1$ by transforming it to a standard eigenvalue problem.

Answer
$W$ is the common (within-class) covariance matrix and is positive semi-definite; assuming it is in fact positive definite, $W^{1/2}$ and its inverse $W^{-1/2}$ exist, so we can define
\[
b = W^{1/2} a, \qquad a = W^{-1/2} b, \qquad a^T = b^T W^{-1/2}.
\]
Hence the generalized eigenvalue problem becomes
\[
\max_a \, a^T B a = \max_b \, b^T W^{-1/2} B W^{-1/2} b
\quad\text{subject to}\quad
a^T W a = b^T W^{-1/2} W W^{-1/2} b = b^T b = 1.
\]
This is a standard eigenvalue problem for the symmetric matrix $W^{-1/2} B W^{-1/2}$: the optimal $b$ is its leading eigenvector, and $a = W^{-1/2} b$.

Ex. 4.2
Suppose we have features $x \in \mathbb{R}^p$, a two-class response, with class sizes $N_1, N_2$, and the target coded as $-N/N_1$, $N/N_2$.

1. Show that the LDA rule classifies to class 2 if
\[
x^T \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \tfrac{1}{2}\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \tfrac{1}{2}\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N},
\]
and class 1 otherwise.

2. Consider minimization of the least squares criterion
\[
\sum_{i=1}^N (y_i - \beta_0 - \beta^T x_i)^2.
\]
Show that the solution $\hat\beta$ satisfies
\[
\Bigl[(N-2)\hat\Sigma + \frac{N_1 N_2}{N}\hat\Sigma_B\Bigr]\hat\beta = N(\hat\mu_2 - \hat\mu_1)
\]
(after simplification), where $\hat\Sigma_B = (\hat\mu_2 - \hat\mu_1)(\hat\mu_2 - \hat\mu_1)^T$.

3. Hence show that $\hat\Sigma_B\hat\beta$ is in the direction $(\hat\mu_2 - \hat\mu_1)$ and thus
\[
\hat\beta \propto \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1).
\]
Therefore the least-squares regression coefficient is identical to the LDA coefficient, up to a scalar multiple.

4. Show that this result holds for any (distinct) coding of the two classes.

5. Find the solution $\hat\beta_0$, and hence the predicted values $\hat f = \hat\beta_0 + \hat\beta^T x$. Consider the following rule: classify to class 2 if $\hat y_i > 0$ and class 1 otherwise. Show this is not the same as the LDA rule unless the classes have equal numbers of observations. (Fisher, 1936; Ripley, 1996)

Proof
1. Consider the log-ratio of the class densities (equation 4.9 in the textbook):
\[
\log\frac{\Pr(G=2 \mid X=x)}{\Pr(G=1 \mid X=x)}
= \log\frac{\pi_2}{\pi_1} - \frac{1}{2}(\mu_2 + \mu_1)^T\Sigma^{-1}(\mu_2 - \mu_1) + x^T\Sigma^{-1}(\mu_2 - \mu_1).
\]
When this log-ratio is greater than 0, the LDA rule classifies $x$ to class 2. Estimating the Gaussian parameters from the training data (with $\hat\pi_k = N_k/N$), the condition becomes
\[
x^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1)
> \frac{1}{2}(\hat\mu_2 + \hat\mu_1)^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) + \log\hat\pi_1 - \log\hat\pi_2
= \frac{1}{2}\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \frac{1}{2}\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N}.
\]

2. Compute the partial derivatives of $\mathrm{RSS}(\beta_0,\beta) = \sum_{i=1}^N (y_i - \beta_0 - \beta^T x_i)^2$ and set them to zero:
\begin{align*}
\frac{\partial \mathrm{RSS}}{\partial \beta_0} &= -2\sum_{i=1}^N (y_i - \beta_0 - \beta^T x_i) = 0 \tag{1}\\
\frac{\partial \mathrm{RSS}}{\partial \beta} &= -2\sum_{i=1}^N x_i(y_i - \beta_0 - \beta^T x_i) = 0 \tag{2}
\end{align*}
From (1),
\[
\beta_0 = \frac{1}{N}\sum_{i=1}^N (y_i - \beta^T x_i). \tag{3}
\]
Substituting (3) into (2), and writing $\bar{x} = \frac{1}{N}\sum_i x_i$,
\begin{align*}
\sum_{i=1}^N x_i\bigl[\beta^T(x_i - \bar{x})\bigr]
&= \sum_{i=1}^N x_i\Bigl(y_i - \frac{1}{N}\sum_{j=1}^N y_j\Bigr) &&\text{// from (2), (3)} \tag{4}\\
&= \sum_{k=1}^2 \sum_{g_i=k} x_i\Bigl(y_k - \frac{N_1 y_1 + N_2 y_2}{N}\Bigr) \tag{5}\\
&= \sum_{k=1}^2 N_k\hat\mu_k \,\frac{N y_k - (N_1 y_1 + N_2 y_2)}{N} &&\text{// } \textstyle\sum_{g_i=k} x_i = N_k\hat\mu_k \tag{6}\\
&= \frac{N_1 N_2}{N}(y_2 - y_1)(\hat\mu_2 - \hat\mu_1) \tag{7}\\
&= N(\hat\mu_2 - \hat\mu_1) &&\text{// } y_1 = -N/N_1,\; y_2 = N/N_2 \tag{8}
\end{align*}
We also have
\begin{align*}
(N-2)\hat\Sigma
&= \sum_{k=1}^2 \sum_{g_i=k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^T \tag{9}\\
&= \sum_{k=1}^2 \sum_{g_i=k} \bigl(x_i x_i^T - x_i\hat\mu_k^T - \hat\mu_k x_i^T + \hat\mu_k\hat\mu_k^T\bigr) \tag{10}\\
&= \sum_{k=1}^2 \Bigl(\sum_{g_i=k} x_i x_i^T - N_k\hat\mu_k\hat\mu_k^T\Bigr) &&\text{// } \textstyle\sum_{g_i=k} x_i = N_k\hat\mu_k \tag{11}\\
&= \sum_{i=1}^N x_i x_i^T - \bigl(N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T\bigr) \tag{12}
\end{align*}
and
\begin{align*}
\sum_{i=1}^N x_i\bar{x}^T
&= \frac{1}{N}\Bigl(\sum_{k=1}^2 \sum_{g_i=k} x_i\Bigr)\Bigl(\sum_{k=1}^2 \sum_{g_i=k} x_i\Bigr)^T \tag{13}\\
&= \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T. \tag{14}
\end{align*}
Meanwhile,
\begin{align*}
\sum_{i=1}^N x_i\bigl[\beta^T(x_i - \bar{x})\bigr]
&= \sum_{i=1}^N x_i(x_i - \bar{x})^T\beta
= \Bigl(\sum_{i=1}^N x_i x_i^T - \sum_{i=1}^N x_i\bar{x}^T\Bigr)\beta \tag{15}\\
&= \Bigl[(N-2)\hat\Sigma + \bigl(N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T\bigr) - \frac{1}{N}(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T\Bigr]\beta &&\text{// from (12), (14)} \tag{16}\\
&= \Bigl[(N-2)\hat\Sigma + \frac{N_1 N_2}{N}\bigl(\hat\mu_2\hat\mu_2^T - \hat\mu_1\hat\mu_2^T - \hat\mu_2\hat\mu_1^T + \hat\mu_1\hat\mu_1^T\bigr)\Bigr]\beta \tag{17}
\end{align*}
Combining (8) and (17), the least-squares solution $\hat\beta$ satisfies
\[
\Bigl[(N-2)\hat\Sigma + \frac{N_1 N_2}{N}\hat\Sigma_B\Bigr]\hat\beta = N(\hat\mu_2 - \hat\mu_1). \tag{18}
\]

3. $\hat\Sigma_B\hat\beta = (\hat\mu_2 - \hat\mu_1)\bigl[(\hat\mu_2 - \hat\mu_1)^T\hat\beta\bigr]$, and $(\hat\mu_2 - \hat\mu_1)^T\hat\beta$ is a scalar, therefore $\hat\Sigma_B\hat\beta$ is in the direction $(\hat\mu_2 - \hat\mu_1)$, and
\begin{align*}
\hat\beta
&= \frac{1}{N-2}\hat\Sigma^{-1}\Bigl[N(\hat\mu_2 - \hat\mu_1) - \frac{N_1 N_2}{N}\hat\Sigma_B\hat\beta\Bigr] &&\text{// from (18)} \tag{19}\\
&= \frac{1}{N-2}\Bigl[N - \frac{N_1 N_2}{N}(\hat\mu_2 - \hat\mu_1)^T\hat\beta\Bigr]\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) \tag{20}\\
&\propto \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1). \tag{21}
\end{align*}

4. For an arbitrary (distinct) coding, (8) no longer applies, but (7) still holds: replacing $N(\hat\mu_2 - \hat\mu_1)$ by $\frac{N_1 N_2}{N}(y_2 - y_1)(\hat\mu_2 - \hat\mu_1)$ in (18)-(20), we still have
\[
\hat\beta = \frac{N_1 N_2}{N(N-2)}\Bigl[(y_2 - y_1) - (\hat\mu_2 - \hat\mu_1)^T\hat\beta\Bigr]\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) \propto \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1).
\]

5. Under the given coding $\sum_i y_i = N_1 y_1 + N_2 y_2 = 0$, so from (3)
\[
\hat\beta_0 = -\frac{1}{N}\sum_{i=1}^N \hat\beta^T x_i = -\hat\beta^T\bar{x} = -\hat\beta^T\hat\mu,
\qquad \hat\mu := \frac{N_1\hat\mu_1 + N_2\hat\mu_2}{N},
\]
and the boundary of the proposed rule is $\hat f(x) = \hat\beta_0 + \hat\beta^T x = 0$. When the classes have equal numbers of observations, i.e. $N_1 = N_2 = N/2$,
\[
\hat\mu = \tfrac{1}{2}(\hat\mu_1 + \hat\mu_2). \tag{22}
\]
Then the predicted values satisfy
\begin{align*}
\hat f &= (x - \hat\mu)^T\hat\beta \propto (x - \hat\mu)^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) \tag{23}\\
&\propto x^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) - \tfrac{1}{2}(\hat\mu_1 + \hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) &&\text{// from (22)} \tag{24}
\end{align*}
while the LDA rule takes the zero of the log-ratio of the class densities as its boundary ($N_1 = N_2$, so $\pi_1 = \pi_2$ and the prior term vanishes):
\[
\log\frac{\Pr(G=2 \mid X=x)}{\Pr(G=1 \mid X=x)}
= -\frac{1}{2}(\mu_2 + \mu_1)^T\Sigma^{-1}(\mu_2 - \mu_1) + x^T\Sigma^{-1}(\mu_2 - \mu_1). \tag{25}
\]
Comparing (24) with (25), we see that they define the same rule. But when $N_1 \neq N_2$, the least-squares boundary passes through $\hat\mu = (N_1\hat\mu_1 + N_2\hat\mu_2)/N$ rather than $\tfrac{1}{2}(\hat\mu_1 + \hat\mu_2)$ and carries no $\log(N_1/N_2)$ correction, so the two rules differ.
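Before turning to Ex. 4.3, here is a small numerical sanity check of part 3. It is only a sketch under assumed simulated data: the dimensions, class means, and variable names below are arbitrary illustrations, not part of the derivation above. It verifies that the least-squares coefficient under the $-N/N_1$, $N/N_2$ coding is proportional to $\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1)$.
\begin{verbatim}
# Illustrative check of Ex. 4.2, part 3 (simulated data; names are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
p, N1, N2 = 3, 40, 60
N = N1 + N2

X1 = rng.normal(size=(N1, p))                                  # class 1 sample
X2 = rng.normal(size=(N2, p)) + np.array([1.0, 2.0, -1.0])     # class 2 sample, shifted
X = np.vstack([X1, X2])
y = np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)])  # target coding

# Least-squares fit with an intercept; drop the intercept afterwards.
Xa = np.hstack([np.ones((N, 1)), X])
beta_hat = np.linalg.lstsq(Xa, y, rcond=None)[0][1:]

# Class means and pooled within-class covariance (denominator N - 2).
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (N - 2)
lda_dir = np.linalg.solve(Sigma, mu2 - mu1)

# By part 3, beta_hat and lda_dir agree up to a scalar multiple.
ratio = beta_hat / lda_dir
print(ratio)                          # all entries are (numerically) equal
assert np.allclose(ratio, ratio[0])
\end{verbatim}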
Ex. 4.3
Suppose we transform the original predictors $X$ to $\hat Y$ via linear regression. In detail, let $\hat Y = X(X^TX)^{-1}X^TY = X\hat B$, where $Y$ is the indicator response matrix. Similarly, for any input $x \in \mathbb{R}^p$, we get a transformed vector $\hat y = \hat B^T x \in \mathbb{R}^K$. Show that LDA using $\hat Y$ is identical to LDA in the original space.

Proof
The transformed vector is $\hat y = \hat B^T x$, so the class means and the pooled within-class covariance in the transformed space are
\begin{align*}
\hat\mu_{\hat y,k} &= \frac{\sum_{g_i=k}\hat y_i}{N_k} = \frac{\sum_{g_i=k}\hat B^T x_i}{N_k} = \hat B^T\hat\mu_{x,k} \tag{26}\\
\hat\mu_{\hat y,\ell} &= \frac{\sum_{g_i=\ell}\hat y_i}{N_\ell} = \frac{\sum_{g_i=\ell}\hat B^T x_i}{N_\ell} = \hat B^T\hat\mu_{x,\ell} \tag{27}\\
\hat\Sigma_{\hat y} &= \frac{\sum_{k=1}^K\sum_{g_i=k}(\hat y_i - \hat\mu_{\hat y,k})(\hat y_i - \hat\mu_{\hat y,k})^T}{N-K} \tag{28}\\
&= \frac{\sum_{k=1}^K\sum_{g_i=k}\hat B^T(x_i - \hat\mu_{x,k})(x_i - \hat\mu_{x,k})^T\hat B}{N-K} \tag{29}\\
&= \hat B^T\hat\Sigma_x\hat B. \tag{30}
\end{align*}
Substituting (26), (27), and (30) for the means and covariance in equation (4.9),
\[
\log\frac{\Pr(G=k \mid \hat Y=\hat y)}{\Pr(G=\ell \mid \hat Y=\hat y)}
= \log\frac{\pi_k}{\pi_\ell}
- \frac{1}{2}(\hat\mu_{x,k} + \hat\mu_{x,\ell})^T\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_{x,k} - \hat\mu_{x,\ell})
+ x^T\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_{x,k} - \hat\mu_{x,\ell}).
\]
The class priors $\hat\pi_k = N_k/N$ are unchanged by the transformation, so if
\[
\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_{x,k} - \hat\mu_{x,\ell}) = \hat\Sigma_x^{-1}(\hat\mu_{x,k} - \hat\mu_{x,\ell}),
\]
then LDA using $\hat Y$ is identical to LDA in the original space. We prove this identity below.

$Y$ is an indicator response matrix; writing $y_k$ for its $k$-th column,
\[
N_k\hat\mu_{x,k} = \sum_{g_i=k} x_i = X^T y_k, \tag{31}
\]
and $Y^TY = \mathrm{diag}(N_1,\dots,N_K) =: D$. By the same argument as (9)-(12), now with $K$ classes and denominator $N-K$,
\begin{align*}
\hat\Sigma_x &= \frac{1}{N-K}\Bigl[\sum_{i=1}^N x_i x_i^T - \sum_{k=1}^K N_k\hat\mu_{x,k}\hat\mu_{x,k}^T\Bigr] \tag{32}\\
&= \frac{1}{N-K}\Bigl[X^TX - X^T\Bigl(\sum_{k=1}^K \frac{1}{N_k}\, y_k y_k^T\Bigr)X\Bigr] &&\text{// from (31)} \tag{33}\\
&= \frac{1}{N-K}\bigl(X^TX - X^TYD^{-1}Y^TX\bigr). \tag{34}
\end{align*}
Consider the matrix $H = \hat\Sigma_x\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T$, which satisfies $HH = H$. As an intermediate step we compute $HX^TY$, using (by the definition of $\hat B$)
\begin{align*}
\hat B^T &= \bigl((X^TX)^{-1}X^TY\bigr)^T = Y^TX(X^TX)^{-1},\\
\hat\Sigma_x\hat B &= \frac{1}{N-K}\bigl(X^TX - X^TYD^{-1}Y^TX\bigr)(X^TX)^{-1}X^TY
= \frac{1}{N-K}X^TY\bigl(I - D^{-1}Y^TX(X^TX)^{-1}X^TY\bigr),\\
\hat B^T\hat\Sigma_x\hat B &= \frac{1}{N-K}\,Y^TX(X^TX)^{-1}X^TY\bigl(I - D^{-1}Y^TX(X^TX)^{-1}X^TY\bigr).
\end{align*}
Let $P = \hat B^TX^TY = Y^TX(X^TX)^{-1}X^TY$; then
\[
\hat\Sigma_x\hat B = \frac{1}{N-K}X^TY(I - D^{-1}P),
\qquad
\hat B^T\hat\Sigma_x\hat B = \frac{1}{N-K}P(I - D^{-1}P).
\]
$P(I - D^{-1}P) = (N-K)\hat B^T\hat\Sigma_x\hat B$ is invertible (otherwise LDA in the transformed space would not be defined), so $P$ and $I - D^{-1}P$ are invertible, and hence
\[
(\hat B^T\hat\Sigma_x\hat B)^{-1} = (N-K)(I - D^{-1}P)^{-1}P^{-1},
\]
\[
HX^TY = \frac{1}{N-K}X^TY(I - D^{-1}P)\,(N-K)(I - D^{-1}P)^{-1}P^{-1}\,Y^TX(X^TX)^{-1}X^TY = X^TY.
\]
From this we conclude
\begin{align*}
HX^TY = X^TY
&\;\Rightarrow\; HX^Ty_k = X^Ty_k\\
&\;\Rightarrow\; H N_k\hat\mu_{x,k} = N_k\hat\mu_{x,k} &&\text{// from (31)}\\
&\;\Rightarrow\; \hat\Sigma_x\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T\hat\mu_{x,k} = \hat\mu_{x,k} &&\text{// definition of } H\\
&\;\Rightarrow\; \hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T\hat\mu_{x,k} = \hat\Sigma_x^{-1}\hat\mu_{x,k}\\
&\;\Rightarrow\; \hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_{x,k} - \hat\mu_{x,\ell}) = \hat\Sigma_x^{-1}(\hat\mu_{x,k} - \hat\mu_{x,\ell}).
\end{align*}
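As with Ex. 4.2, a short numerical sketch can illustrate the identity just proved; the simulated data, class sizes, and variable names below are arbitrary assumptions. It forms the indicator matrix $Y$, the regression coefficients $\hat B = (X^TX)^{-1}X^TY$, and the pooled within-class covariance $\hat\Sigma_x$, and checks that $\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_{x,k} - \hat\mu_{x,\ell}) = \hat\Sigma_x^{-1}(\hat\mu_{x,k} - \hat\mu_{x,\ell})$.
\begin{verbatim}
# Illustrative check of Ex. 4.3 (simulated data; names are arbitrary).
import numpy as np

rng = np.random.default_rng(1)
p, K = 5, 3
Nk = np.array([30, 50, 40])
N = Nk.sum()
g = np.repeat(np.arange(K), Nk)                  # class label of each observation
X = rng.normal(size=(N, p)) + 1.5 * g[:, None]   # shift the class means apart
Y = np.eye(K)[g]                                 # N x K indicator response matrix

B = np.linalg.solve(X.T @ X, X.T @ Y)            # B_hat = (X^T X)^{-1} X^T Y
mu = np.array([X[g == k].mean(axis=0) for k in range(K)])
Sigma = sum((X[g == k] - mu[k]).T @ (X[g == k] - mu[k]) for k in range(K)) / (N - K)

d = mu[1] - mu[0]                                # any difference of class means
lhs = B @ np.linalg.solve(B.T @ Sigma @ B, B.T @ d)   # direction used by LDA on Y_hat
rhs = np.linalg.solve(Sigma, d)                       # direction used by LDA on X
print(np.allclose(lhs, rhs))                          # True (up to round-off)
\end{verbatim}
Since the class priors are the same in both spaces, equality of these two directions is exactly the condition used in the proof above.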