10-701/15-781 Machine Learning - Midterm Exam, Fall 2010
Aarti Singh
Carnegie Mellon University

1. Personal info:
   Name:
   Andrew account:
   E-mail address:
2. There should be 15 numbered pages in this exam (including this cover sheet).
3. You can use any material you brought: any book, class notes, your printouts of class materials that are on the class website, including annotated slides and relevant readings, and Andrew Moore's tutorials. You cannot use materials brought by other students. Calculators are not necessary. Laptops, PDAs, phones and Internet access are not allowed.
4. If you need more room to work out your answer to a question, use the back of the page and clearly mark on the front of the page if we are to look at what's on the back.
5. Work efficiently. Some questions are easier, some more difficult. Be sure to give yourself time to answer all of the easy ones, and avoid getting bogged down in the more difficult ones before you have answered the easier ones.
6. You have 90 minutes.
7. Good luck!

Question  Topic                          Max. score  Score
1         Short questions                20
2         Bayes Optimal Classification   15
3         Logistic Regression            18
4         Regression                     16
5         SVM                            16
6         Boosting                       15
Total                                    100

1 Short Questions [20 pts]

Are the following statements True/False? Explain your reasoning in only 1 sentence.

1. Density estimation (using, say, the kernel density estimator) can be used to perform classification.

   True: Estimate the joint density P(Y, X), then use it to calculate P(Y | X) (see the sketch following question 8 below).

2. The correspondence between logistic regression and Gaussian Naive Bayes (with identity class covariances) means that there is a one-to-one correspondence between the parameters of the two classifiers.

   False: Each LR model parameter corresponds to a whole set of possible GNB classifier parameters; there is no one-to-one correspondence because logistic regression is discriminative and therefore doesn't model P(X), while GNB does model P(X).

3. The training error of the 1-NN classifier is 0.

   True: Each point is its own nearest neighbor, so the 1-NN classifier achieves perfect classification on the training data.

4. As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.

   False: A simple counterexample is the prior which assigns probability 1 to a single choice of parameter.

5. Cross validation can be used to select the number of iterations in boosting; this procedure may help reduce overfitting.

   True: The number of iterations in boosting controls the complexity of the model; therefore, a model selection procedure like cross validation can be used to select the appropriate model complexity and reduce the possibility of overfitting.

6. The kernel density estimator is equivalent to performing kernel regression with the value $Y_i = \frac{1}{n}$ at each point $X_i$ in the original dataset.

   False: Kernel regression predicts the value at a point as the weighted average of the values at nearby points, so if all of the points have the same value, kernel regression will predict a constant (in this case, 1/n) everywhere.

7. We learn a classifier f by boosting weak learners h. The functional form of f's decision boundary is the same as h's, but with different parameters (e.g., if h was a linear classifier, then f is also a linear classifier).

   False: For example, the functional form of a decision stump is a single axis-aligned split of the input space, but the boosted classifier is a linear combination of decision stumps, which can form a more complex (piecewise linear) decision boundary.

8. The depth of a learned decision tree can be larger than the number of training examples used to create the tree.

   False: Each split of the tree must correspond to at least one training example; therefore, if there are n training examples, a path in the tree can have length at most n.

   Note: There is a pathological situation in which the depth of a learned decision tree can be larger than the number of training examples n: if the number of features is larger than n and there exist training examples which have the same feature values but different labels. Points were given if you answered True and provided this explanation.
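The claim in short question 1 can be made concrete with a small generative-classification sketch (not part of the exam): fit one kernel density estimate per class to approximate P(X | Y), combine it with the empirical class priors P(Y), and predict the class with the largest posterior. The two-Gaussian toy data, the scikit-learn KernelDensity estimator, and the bandwidth of 0.5 are assumptions chosen only for illustration.

```python
# Minimal sketch of density-estimation-based classification (illustrative, not from the exam).
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_kde_classifier(X, y, bandwidth=0.5):
    """Fit one KDE per class (approximating P(X | Y=c)) and record the class priors P(Y=c)."""
    classes = np.unique(y)
    kdes = {c: KernelDensity(bandwidth=bandwidth).fit(X[y == c]) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    return classes, kdes, priors

def predict(X, classes, kdes, priors):
    """Predict argmax_c [log P(x | c) + log P(c)], i.e. the class with the largest posterior."""
    log_post = np.column_stack(
        [kdes[c].score_samples(X) + np.log(priors[c]) for c in classes]
    )
    return classes[np.argmax(log_post, axis=1)]

if __name__ == "__main__":
    # Toy two-class data: two well-separated Gaussian blobs (an assumption for the sketch).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    classes, kdes, priors = fit_kde_classifier(X, y)
    print("training accuracy:", np.mean(predict(X, classes, kdes, priors) == y))
```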
For the following problems, circle the correct answers:

1. Consider the following dataset. Circle all of the classifiers that will achieve zero training error on this dataset. (You may circle more than one.)

   (a) Logistic regression
   (b) SVM (quadratic kernel)
   (c) Depth-2 ID3 decision trees
   (d) 3-NN classifier

   Solution: SVM (quadratic kernel) and Depth-2 ID3 decision trees.

2. For the following dataset, circle the classifier which has the larger Leave-One-Out Cross-Validation error.

   (a) 1-NN
   (b) 3-NN

   Solution: 1-NN, since the 1-NN LOOCV error is 5/10 and the 3-NN LOOCV error is 1/10.

2 Bayes Optimal Classification [15 pts]

In classification, the loss function we usually want to minimize is the 0/1 loss:

$$\ell(f(x), y) = \mathbf{1}\{f(x) \neq y\}$$

where $f(x), y \in \{0, 1\}$ (i.e., binary classification). In this problem we will consider the effect of using an asymmetric loss function:

$$\ell_{\alpha,\beta}(f(x), y) = \alpha \, \mathbf{1}\{f(x) = 1, y = 0\} + \beta \, \mathbf{1}\{f(x) = 0, y = 1\}$$

Under this loss function, the two types of errors receive different weights, determined by $\alpha, \beta \geq 0$.

1. [4 pts] Determine the Bayes optimal classifier, i.e. the classifier that achieves minimum risk assuming P(x, y) is known, for the loss $\ell_{\alpha,\beta}$ where $\alpha, \beta \geq 0$.

   Solution: We can write

   $$\begin{aligned}
   \arg\min_f \mathbb{E}\,\ell_{\alpha,\beta}(f(x), y)
   &= \arg\min_f \mathbb{E}_{X,Y}\left[\alpha \, \mathbf{1}\{f(X) = 1, Y = 0\} + \beta \, \mathbf{1}\{f(X) = 0, Y = 1\}\right] \\
   &= \arg\min_f \mathbb{E}_X\left[\mathbb{E}_{Y|X}\left[\alpha \, \mathbf{1}\{f(X) = 1, Y = 0\} + \beta \, \mathbf{1}\{f(X) = 0, Y = 1\}\right]\right] \\
   &= \arg\min_f \mathbb{E}_X\left[\int_y \alpha \, \mathbf{1}\{f(X) = 1, y = 0\} + \beta \, \mathbf{1}\{f(X) = 0, y = 1\} \, dP(y \mid x)\right] \\
   &= \arg\min_f \int_x \left[\alpha \, \mathbf{1}\{f(x) = 1\}\, P(y = 0 \mid x) + \beta \, \mathbf{1}\{f(x) = 0\}\, P(y = 1 \mid x)\right] dP(x)
   \end{aligned}$$

   We may minimize the integrand at each x by taking:

   $$f(x) = \begin{cases} 1 & \beta \, P(y = 1 \mid x) \geq \alpha \, P(y = 0 \mid x) \\ 0 & \alpha \, P(y = 0 \mid x) > \beta \, P(y = 1 \mid x) \end{cases}$$

   (See the numerical sketch below.)

2. [3 pts] Suppose that the class y = 0 is extremely uncommon (i.e., P(y = 0) is small). This means that the classifier f(x) = 1 for all x will have good risk. We may try to put the two classes on even footing by considering the risk:

   $$R = P(f(x) = 1 \mid y = 0) + P(f(x) = 0 \mid y = 1)$$
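To make the decision rule from part 1 concrete: predicting 1 exactly when $\beta P(y=1 \mid x) \geq \alpha P(y=0 \mid x)$ is the same as thresholding $P(y=1 \mid x)$ at $\alpha / (\alpha + \beta)$, which reduces to the familiar 1/2 threshold when $\alpha = \beta$. The sketch below is an illustrative check of this equivalence, not part of the exam; the function names and the example values of alpha, beta, and the posterior are assumptions.

```python
# Illustrative check (not from the exam): the asymmetric-loss Bayes rule from part 1,
# f(x) = 1 iff beta * P(y=1|x) >= alpha * P(y=0|x), is equivalent to thresholding
# P(y=1|x) at alpha / (alpha + beta).

def bayes_rule(p1, alpha, beta):
    """Decision from the derived rule: compare beta * P(y=1|x) against alpha * P(y=0|x)."""
    return 1 if beta * p1 >= alpha * (1.0 - p1) else 0

def threshold_rule(p1, alpha, beta):
    """Equivalent form: predict 1 iff P(y=1|x) >= alpha / (alpha + beta)."""
    return 1 if p1 >= alpha / (alpha + beta) else 0

if __name__ == "__main__":
    # With alpha = beta (symmetric 0/1 loss) the threshold is 1/2; making false positives
    # costlier (alpha > beta) raises the threshold, so fewer points receive label 1.
    for alpha, beta in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]:
        for p1 in [0.2, 0.5, 0.8]:
            assert bayes_rule(p1, alpha, beta) == threshold_rule(p1, alpha, beta)
            print(f"alpha={alpha}, beta={beta}, P(y=1|x)={p1} -> predict {bayes_rule(p1, alpha, beta)}")
```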