15-781 Final Exam, Fall 2002

1. Write your name and your andrew email address below.
   Name:
   Andrew ID:
2. There should be 17 pages in this exam (excluding this cover sheet).
3. If you need more room to work out your answer to a question, use the back of the page and clearly mark on the front of the page if we are to look at what's on the back.
4. You should attempt to answer all of the questions.
5. You may use any and all notes, as well as the class textbook.
6. All questions are worth an equal amount. They are not all equally difficult.
7. You have 3 hours.
8. Good luck!

1 Computational Learning Theory

1.1 PAC Learning for Decision Lists

A decision list is a list of if-then rules where each condition is a literal (a variable or its negation). It can be thought of as a decision tree with just one path. For example, say that I like to go for a walk if it's warm or if it's snowing and I have a jacket, as long as it's not raining. We could describe this as the following decision list:

if rainy then no
else if warm then yes
else if not(have-jacket) then no
else if snowy then yes
else no.

(a) Describe an algorithm to learn DLs given a dataset, for example

    a b c | class
    1 0 0 |   +
    0 1 1 |   -
    1 1 1 |   +
    0 0 0 |   -
    1 1 0 |   +

Your algorithm should have the characteristic that it should always classify examples that it has already seen correctly (i.e., it should be consistent with the data). If it's not possible to continue to produce a decision list that's consistent with the data, your algorithm should terminate and announce that it has failed.

(b) Find the size of the hypothesis space, |H|, for decision lists of k attributes.

(c) Find an expression for the number of examples needed to learn a decision list of k attributes with error at most 0.10 with probability 90%.

(d) What if the learner is trying to learn a decision list, but the representation that it is using is a conjunction of k literals? Find the expression for the number of examples needed to learn the decision list with error at most 0.10 with 90% probability.

2 K-means and Gaussian Mixture Models

(a) What is the effect on the means found by k-means (as opposed to the true means) of overlapping clusters?

(b) Run k-means manually for the following dataset. Circles are data points and squares are the initial cluster centers. Draw the cluster centers and the decision boundaries that define each cluster. Use as many pictures as you need until convergence.

[Figure: scatter plot of data points (circles) and initial cluster centers (squares).]

Note: Execute the algorithm such that if a
mean has no points assigned to it, it stays where it is for that iteration.

(c) Now draw (approximately) what a Gaussian mixture model of three gaussians with the same initial centers as for the k-means problem would converge to. Assume that the model puts no restrictions on the form of the covariance matrices and that EM updates both the means and covariance matrices.

(d) Is the classification given by the mixture model the same as the classification given by k-means? Why or why not?

3 HMMs

Andrew lives a simple life. Some days he's Angry and some days he's Happy. But he hides his emotional state, and so all you can observe is whether he smiles, frowns, laughs, or yells. We start on day 1 in the Happy state, and there's one transition per day.

[Figure: two-state HMM. Each state has self-transition probability 0.8 and transitions to the other state with probability 0.2.]

Happy: p(smile) = 0.5, p(frown) = 0.1, p(laugh) = 0.2, p(yell) = 0.2
Angry: p(smile) = 0.1, p(frown) = 0.5, p(laugh) = 0.2, p(yell) = 0.2

Definitions: q_t = state on day t. O_t = observation on day t.

(a) What is P(q2 = Happy)?

(b) What is P(O2 = frown)?

(c) What is P(q2 = Happy | O2 = frown)?

(d) What is P(O100 = yell)?

(e) Assume that O1 = frown, O2 = frown, O3 = frown, O4 = frown, and O5 = frown. What is the most likely sequence of states?

4 Bayesian Inference

(a) Consider a dataset over 3 boolean attributes, X, Y, and Z. Of these sets of information, which are sufficient to specify the joint distribution? Circle all that apply.

A. P(X|Z) P(X|Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Z)
B. P(X|Z) P(X|Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Z)
C. P(X|Z) P(X|Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Z)
D. P(X|Z) P(X|Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Y|X∧Z) P(Z)

Given this dataset of 16 records:

    A B C
    0 0 1
    0 0 1
    0 0 1
    0 1 0
    0 1 1
    0 1 1
    0 1 1
    0 1 1
    1 0 0
    1 0 0
    1 0 0
    1 0 0
    1 0 0
    1 1 0
    1 1 0
    1 1 1

(b) Write down the probabilities needed to make a joint density Bayes classifier.

(c) Write down the probabilities needed to make a naive Bayes classifier.

(d) Write the classification that the joint density Bayes classifier would make for C given A = 0, B = 1.

(e) Write the classification that the naive Bayes classifier would make for C given A = 0, B = 1.

5 Support Vector Machines

This picture shows a dataset with two real-valued inputs (x1 and x2) and one categorical output class. The positive points are shown as solid dots and the negative points are small circles.

[Figure: scatter plot of the dataset; axes x1 and x2, each ranging from 0 to 6.]

(a) Suppose you are using a linear SVM with no provision for noise (i.e. a linear SVM that is trying to maximize its margin while ensuring all data points are on their correct sides of the margin). Draw three lines on the above diagram, showing the classification boundary and the two sides of the margin. Circle the support vector(s).

(b) Using the familiar LSVM classifier notation of class = sign(w·x + b), calculate the values of w and b learned for part (a).

(c) Assume you are using a noise-tolerant LSVM which tries to minimize

    (1/2) w·w + C Σ_{k=1}^{R} ε_k    (1)

using the notation of your notes and the Burges paper. Question: is it possible to invent a dataset and a positive value of C in which (a) the dataset is linearly separable but (b) the LSVM would nevertheless misclassify at least one training point? If it is possible to invent such an example, please sketch the example and suggest a value for C. If it is not possible, explain why not.

6 Instance-based Learning

This picture shows a dataset with one real-valued input x and one real-valued output y. There are seven training points.

[Figure: scatter plot of the seven training points; axes x and y, each ranging from 0 to 6.]

Suppose you are training using kernel regression using some unspecified kernel function. The only thing you know about the kernel function is that it is a monotonically decreasing function
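For reference, the kernel regression referred to above can be sketched as a Nadaraya-Watson weighted average: each training output is weighted by the kernel applied to its distance from the query point. The Gaussian kernel, its width, and the data points below are illustrative assumptions only; they are not the seven points from the exam's figure.

```python
import math

def kernel_regression(x_query, xs, ys, kernel):
    """Nadaraya-Watson kernel regression: a weighted average of the
    training outputs ys, weighted by kernel(|x_query - x_i|)."""
    weights = [kernel(abs(x_query - xi)) for xi in xs]
    total = sum(weights)
    if total == 0:
        # No training point carries any weight; fall back to the mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total

# A Gaussian kernel is one example of a monotonically decreasing
# function of distance (width 1.0 is an arbitrary choice).
gaussian = lambda d: math.exp(-d * d / 2.0)

# Hypothetical training data (not the exam's figure).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.5, 2.0, 4.0, 3.5]

print(kernel_regression(3.0, xs, ys, gaussian))
```

Because the prediction is a convex combination of the training outputs, it always lies between the minimum and maximum observed y, whatever monotonically decreasing kernel is used.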