Machine Learning Problem Bank

I. Maximum Likelihood

1. ML estimation of exponential model (10)

A Gaussian distribution is often used to model data on the real line, but it is sometimes inappropriate when the data are often close to zero yet constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is

$$p(x) = \frac{1}{b} e^{-x/b}.$$

Given $N$ observations $x_i$ drawn from such a distribution:

(a) Write down the likelihood as a function of the scale parameter $b$.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate of $b$.

2. The same question for the Poisson distribution:

$$p(x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}, \qquad x = 0, 1, 2, \dots$$

$$l(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \left( x_i \log\theta - \theta - \log x_i! \right) = \log\theta \sum_{i=1}^{N} x_i - N\theta - \sum_{i=1}^{N} \log x_i!$$

II. Bayes

1. Applying Bayes' rule

Suppose that on a multiple-choice exam a student knows the correct answer with probability $p$ and guesses with probability $1-p$. Assume that a student who knows the answer answers correctly with probability 1, while a guessing student picks the correct answer with probability $1/m$, where $m$ is the number of choices. Given that the student answered the question correctly, find the probability that he knew the answer.

Solution:

$$P(\text{known} \mid \text{correct}) = \frac{p}{p + (1-p)\frac{1}{m}}.$$

2. Conjugate priors

Given a likelihood $p(x \mid \theta)$ for a class of models with parameters $\theta$, a conjugate prior is a distribution $p(\theta \mid \gamma)$ with hyperparameters $\gamma$ such that the posterior distribution

$$p(\theta \mid X, \gamma) \propto p(X \mid \theta)\, p(\theta \mid \gamma)$$

belongs to the same distribution family as the prior.

(a) Suppose that the likelihood is the exponential distribution with rate parameter $\lambda$: $p(x \mid \lambda) = \lambda e^{-\lambda x}$. Show that the gamma distribution

$$\text{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$$

is a conjugate prior for the exponential. Derive the parameter update given observations $x_1, \dots, x_N$ and the prediction distribution $p(x_{N+1} \mid x_1, \dots, x_N)$ (a numerical sketch of this update appears after part (e) below).

(b) Show that the beta distribution is a conjugate prior for the geometric distribution

$$p(x = k \mid \theta) = (1-\theta)^{k-1}\theta,$$

which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is $\theta$. Derive the parameter update rule and the prediction distribution.

(c) Suppose $p(\theta \mid \gamma)$ is a conjugate prior for the likelihood $p(x \mid \theta)$; show that the mixture prior

$$p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)$$

is also conjugate for the same likelihood, assuming the mixture weights $w_m$ sum to 1.

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. Note that some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed $\alpha$.

(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
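As a sanity check on part (a), the following is a minimal numerical sketch, not part of the original problem: it assumes the conjugate update the exercise asks you to derive (posterior $\text{Gamma}(\alpha + N,\ \beta + \sum_i x_i)$) and the resulting closed-form predictive density. The prior hyperparameters, sample size, and seed are arbitrary illustrative choices.

```python
import numpy as np

# Check of the Gamma-exponential conjugacy in part (a): prior Gamma(alpha, beta)
# on the rate lambda, exponential likelihood p(x | lambda) = lambda * exp(-lambda * x).

rng = np.random.default_rng(0)
true_rate = 2.0
x = rng.exponential(scale=1.0 / true_rate, size=50)   # N = 50 observations

alpha, beta = 2.0, 1.0          # prior hyperparameters (arbitrary illustrative choice)
alpha_n = alpha + len(x)        # conjugate update: alpha_N = alpha + N
beta_n = beta + x.sum()         # conjugate update: beta_N = beta + sum_i x_i

# The posterior mean alpha_N / beta_N should approach the true rate as N grows.
print("posterior mean of lambda:", alpha_n / beta_n)

def predictive(x_new):
    # Predictive density p(x_{N+1} | x_1..x_N), obtained by integrating lambda out
    # against the Gamma posterior: alpha_N * beta_N^alpha_N / (beta_N + x)^(alpha_N + 1).
    return alpha_n * beta_n**alpha_n / (beta_n + x_new) ** (alpha_n + 1)

print("predictive density at x = 0.5:", predictive(0.5))
```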
III. True/False

(1) Given $n$ data points, if half are used for training and half for testing, the gap between the training error and the test error decreases as $n$ increases.

(2) The maximum likelihood estimate is unbiased and has the smallest variance among all unbiased estimators, so the maximum likelihood estimate has the smallest risk.

(3) For regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.

(4) Global linear regression uses all of the sample points to predict the output for a new input, whereas locally weighted linear regression uses only the samples near the query point; therefore global linear regression is computationally more expensive than local linear regression.

(5) Boosting and Bagging both combine multiple classifiers by voting, and both assign a weight to each individual classifier according to its accuracy.

(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F) While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.

(7) One advantage of Boosting is that it does not overfit. (F)

(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)

(9) In regression analysis, best-subset selection can perform feature selection but is computationally expensive when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.

(10) Overfitting is more likely when the amount of training data is small.

(11) Gradient descent can get stuck at local minima, but the EM algorithm cannot.

(12) In kernel regression, the parameter that most affects the trade-off between overfitting and underfitting is the width of the kernel.

(13) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)

(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution $\hat{w}$ on the training data. (T) The unregularized least-squares solution already minimizes the training error, so the penalized solution can do no better on the training data.

(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution $\hat{w}$ on unseen test data. (F)

(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)

(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel. True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.

(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost cannot achieve zero training error.

(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)

(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)

(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)

IV. Regression

1. Consider a regularized regression problem. The figure below shows the log likelihood (mean log-probability) on the training set and on the test set for different values of the regularization parameter $C$, where the penalty function is quadratic. (10 points)

(1) Is the statement "as $C$ increases, the training-set log likelihood in Figure 2 never increases" correct? Explain why or why not.

(2) Explain why the test-set log likelihood in Figure 2 decreases when $C$ takes large values.

2. Consider the linear regression model $y \sim N(w_0 + w_1 x,\ \sigma^2)$, with the training data shown in the figures below. (10 points)

(1) Estimate the parameters by maximum likelihood and sketch the resulting model in figure (a). (3 points)

(2) Estimate the parameters by regularized maximum likelihood, i.e., add the regularization penalty $\frac{C}{2} w_1^2$ to the log-likelihood objective, and sketch the resulting model in figure (b) for a very large value of $C$. (3 points)

(3) After regularization, does the variance $\sigma^2$ of the Gaussian increase, decrease, or stay the same? (4 points)

[Figure (a)] [Figure (b)]

3. Consider a regression problem on two-dimensional inputs $\mathbf{x} = (x_1, x_2)^T$, where $x_j \in [-1, 1]$, $j = 1, 2$, in the unit square. The training and test samples are uniformly distributed over the unit square, and the output model is

$$y \sim N(10 x_1^5 - 7 x_1^3 x_2 + 5 x_1 x_2^2 - 3 x_2,\ 1).$$

We learn the relationship between $\mathbf{x}$ and $y$ with linear regression on polynomial features of degree 1 through 10 (each higher-order feature model includes all lower-order features), using the squared-error loss.

(1) Train models with features of degree 1, 2, 8, and 10 on $n = 20$ samples, and then test them on a large independent test set. Choose the appropriate model(s) in each of the three columns below (there may be more than one option), and explain the …
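The trade-off this (truncated) problem probes can be made concrete with a quick simulation. The sketch below is not part of the original problem: it assumes the target polynomial as reconstructed above, draws $n = 20$ training points, fits degree-1, 2, 8, and 10 models, and compares training and test mean squared error; the scikit-learn pipeline and the random seed are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def sample(n):
    # Inputs uniform on the unit square [-1, 1]^2; outputs from the degree-5
    # target reconstructed above, plus unit-variance Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    mean = (10 * x[:, 0]**5 - 7 * x[:, 0]**3 * x[:, 1]
            + 5 * x[:, 0] * x[:, 1]**2 - 3 * x[:, 1])
    return x, mean + rng.normal(0.0, 1.0, size=n)

x_train, y_train = sample(20)      # n = 20 training points, as in the problem
x_test, y_test = sample(10_000)    # large independent test set

for degree in (1, 2, 8, 10):
    # PolynomialFeatures(degree) includes all lower-order terms, matching the
    # problem's convention that higher-order models contain all lower-order features.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = np.mean((model.predict(x_train) - y_train) ** 2)
    test_mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:8.4f}, test MSE {test_mse:10.4f}")
```

The expected pattern is the one the problem is after: the low-degree models underfit (high error on both sets), while the degree-8 and degree-10 models drive the training error toward zero but, with only 20 samples, produce a much larger test error.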