CS 189 Summer 2019
Introduction to Machine Learning
Midterm

- Please do not open the exam before you are instructed to do so.
- The exam is closed book, closed notes except your two-page cheat sheet.
- Electronic devices are forbidden on your person, including cell phones, iPods, headphones, and laptops. Turn your cell phone off and leave all electronics at the front of the room, or risk getting a zero on the exam.
- You have 3 hours.
- Please write your initials at the top right of each page after this one (e.g., write "MK" if you are Marc Khoury). Finish this by the end of your 3 hours.
- Mark your answers on the exam itself in the space provided. Do not attach any extra sheets.
- The total number of points is 150. There are 26 multiple choice questions worth 3 points each, and 5 written questions worth a total of 72 points.
- For multiple answer questions, fill in the bubbles for ALL correct choices: there may be more than one correct choice, but there is always at least one correct choice. NO partial credit on multiple answer questions: the set of all correct answers must be checked.

First name: ____________
Last name: ____________
SID: ____________
First and last name of student to your left: ____________
First and last name of student to your right: ____________

Q1. [60 pts] Multiple Answer

Fill in the bubbles for ALL correct choices: there may be more than one correct choice, but there is always at least one correct choice. NO partial credit: the set of all correct answers must be checked.

(a) [3 pts] Let $X \sim \mathrm{Bernoulli}\!\left(\frac{1}{1+\exp(-\theta)}\right)$ for some $\theta \in \mathbb{R}$. What is the MLE estimator of $\theta$?

○ $X$
○ $1$
○ $0$
○ Does not exist.

Solution: $\ell(\theta; 1) = \frac{1}{1+\exp(-\theta)}$, which has no maximizer in $\mathbb{R}$. $\ell(\theta; 0) = 1 - \frac{1}{1+\exp(-\theta)}$, which has no maximizer in $\mathbb{R}$.

(b) [3 pts] Let $Y \sim N(X\theta, I_n)$ for some unknown $\theta \in \mathbb{R}^d$ and some known $X \in \mathbb{R}^{n \times d}$ that has full column rank and $d \le n$. What is the MLE estimator of $\theta$?

○ $(X^\top X)^{-1} X^\top Y$
○ $X (X^\top X)^{-1} Y$
○ $Y + Z \;\; \forall\, Z \in \mathrm{Null}(X)$
○ Does not exist.

Solution: Maximizing the likelihood function is equivalent to minimizing $\|Y - X\theta\|_2^2$, which we can do with linear least squares. Interestingly, this means that the MLE estimator of $\theta$ when $y = X\theta + \varepsilon$, where $\varepsilon \sim N(0, I_n)$, is the standard least squares solution.

(c) [3 pts] Let $f(x) = -\sum_{i=1}^n x_i \log x_i$. For some $x$ such that $\sum_{i=1}^n x_i = 1$ and $x_i > 0$, the Hessian of $f$ is:

○ positive definite
○ negative definite
○ positive semidefinite
○ negative semidefinite
○ indefinite (neither positive semidefinite nor negative semidefinite)
○ invertible
○ nonexistent
○ None of the above.

Solution: $\nabla_x f(x) = -\vec{1} - \log(x)$ and $\nabla_x^2 f(x) = \mathrm{diag}\!\left(-\frac{1}{x_1}, \ldots, -\frac{1}{x_n}\right)$.
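A quick way to confirm which of the definiteness options this diagonal Hessian satisfies (a short added verification, using only the formula above): for any nonzero $z \in \mathbb{R}^n$,

$$z^\top \nabla_x^2 f(x)\, z \;=\; -\sum_{i=1}^n \frac{z_i^2}{x_i} \;<\; 0 \qquad \text{since every } x_i > 0,$$

so the Hessian is negative definite, hence also negative semidefinite, and as a diagonal matrix with all nonzero diagonal entries it is invertible as well.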
(d) [3 pts] Which of the following statements about optimization algorithms are correct?

○ Newton's method always requires fewer iterations than gradient descent.
○ Stochastic gradient descent always requires fewer iterations than gradient descent.
○ Stochastic gradient descent, even with a small step size, sometimes increases the loss in some iteration for convex problems.
○ Gradient descent, regardless of the step size, decreases the loss in every iteration for convex problems.

Solution: Arguments that one reasonable optimization algorithm dominates another for all problems are in general wrong. Gradient descent can work for some losses on which Newton's method does not even converge. SGD, due to its stochasticity, does not necessarily decrease the loss in each iteration: one can construct a case where a single data point contradicts all the others, so optimizing for that particular data point may increase the overall loss. Gradient descent with an extremely large step size may never reach a small neighborhood of the minimum; it may just jump around outside that neighborhood.

(e) [3 pts] Assume we run the hard-margin SVM algorithm on 100 $d$-dimensional points from 2 different classes. The algorithm outputs a solution. After which transformation to the training data would the algorithm still output a solution?

○ Centering the data points
○ Transforming each data point from $x$ to $Ax$ for some matrix $A \in \mathbb{R}^{d \times d}$
○ Dividing all entries of each data point by some negative constant $c$
○ Adding an additional feature

Solution: If all entries of $A$ are 0, the data points all land on the origin. If a set of data points is linearly separable in $d$ dimensions, it is linearly separable in $d+1$ dimensions regardless of what the $(d+1)$-th feature is.

(f) [3 pts] Which of the following holds true when running an SVM algorithm?

○ Increasing or decreasing the value of $\alpha$ only allows the decision boundary to translate.
○ Given $n$-dimensional points, the SVM algorithm finds a hyperplane passing through the origin in the $(n+1)$-dimensional space that separates the points by their class.
○ The decision boundary rotates if we change the constraint to $w^\top x + \alpha \ge 3$.
○ The set of weights that fulfill the constraints of the SVM algorithm is convex.

(g) [3 pts] Consider the set $\{x \in \mathbb{R}^d : (x-\mu)^\top \Sigma (x-\mu) = 1\}$ given some vector $\mu \in \mathbb{R}^d$ and matrix $\Sigma \in \mathbb{R}^{d \times d}$. Which of the following are true?

○ If $\Sigma$ is the identity matrix scaled by some constant $c$, then the set is isotropic.
○ Increasing the eigenvalues of $\Sigma$ increases the radii of the ellipsoid.
○ Increasing the eigenvalues of $\Sigma$ decreases the radii of the ellipsoid.
○ A singular $\Sigma$ produces an ellipsoid with an infinite radius.

(h) [3 pts] Consider the linear regression problem with a full rank design matrix. Which of the following regularizations in general encourage more sparsity than the non-regularized objective?

○ $L_0$ regularization (the number of non-zero coordinates)
○ $L_1$ regularization
○ $L_2$ (Tikhonov) regularization
○ $L_3$ regularization
○ $L_4$ regularization
○ $L_\infty$ regularization (the maximum absolute value across all coordinates)

Solution: Think about the corresponding equivalent constrained optimization problem.

(i) [3 pts] Which of the following statements are correct?

○ In ridge regression, the regularization parameter is usually set as 0.1.
○ SVM in general does not enforce sparsity over the parameters $w$ and $\alpha$.
○ In binary linear classification, the support vectors of SVM might contain samples from only one class even if the training data has both classes.
○ In binary linear classification, suppose $1\{w^\top x + \alpha \ge 0\}$ is one maximum margin linear classifier; then the margin only depends on $w$ but not $\alpha$.

Solution: Basic definitions.

(j) [3