10.9SupportVectorMachines22020/1/30ir.dlut.edu.cnoutline10.9SupportVectorMachines◆10.9.1OptimalSeparatingHyperplane◆10.9.2TheNonseparableCase:SoftMarginHyperplane◆10.9.3KernelFunctions◆10.9.4SupportVectorMachinesforRegression32020/1/30ir.dlut.edu.cn10.9.1OptimalSeparatingHyperplane◆Letusstartagainwithtwoclassesanduselabels-1/+1forthetwoclasses.Thesampleis.◆whichcanberewrittenas◆Maximizesthemargin.◆when,thedistanceoftothediscriminantcanbewrittenas,10.9SupportVectorMachines{,}ttxr{1,1}trtx42020/1/30ir.dlut.edu.cn◆whichwewouldliketobeatleastsomevaluep.◆maximizep•butthereareaninfinitenumberofsolutionsthatwecangetbyscalingwandforauniquesolution,wefixpllwll=1◆Thus,tomaximizethemargin,weminimize|lwll.◆UsingLagrangemultipliersgetanunconstrainedproblem:10.9.1OptimalSeparatingHyperplane0(),||||tTrwxwtw0()()tTtgxrwxw52020/1/30ir.dlut.edu.cn有不等式约束问题的极值求法,问题如下:拉格朗日公式:◆和都是拉格朗日算子。如果按这个公式求解,会出现问题,因为我们求解的是最小值,而这里的已经不是0了,我们可以将调整成很大的正值,来使最后的函数结果是负无穷。因此我们需要排除这种情况,我们定义下面的函数:拉格朗日对偶min()wfw..st()0,1,...,()0,1,...,iigwikhwil11(,,)()()()kliiiiiiLwfwgwhwii()igwi,:0()max(,,)iPawLw62020/1/30ir.dlut.edu.cn得到:进而得到:问题转化:◆考虑并定义另一个问题:◆将问题转化为先求拉格朗日关于w的最小值,将和看作是固定值。之后在求最大值。对偶问题:拉格朗日对偶,:0min()min()minmax(,,)ip(,)min(,,)DwLwii(,)D,:0,:0max(,)maxmin(,,)iiDwLw,:0,:0maxmin(,,)minmax(,,)iiwwLwLw72020/1/30ir.dlut.edu.cnKKT条件:实现问题转化:拉格朗日对偶,:0,:0maxmin(,,)minmax(,,)min()ii82020/1/30ir.dlut.edu.cn拉格朗日对偶问题:将问题具体化:问题转化:10.9.1OptimalSeparatingHyperplanemin()wfw..st()0,1,...,()0,1,...,ttgwtkhwtl21()||||2fww0()1()tTttgwrwxwt=1,...,N,,:0,:0maxmin(,,)minmax(,,)min()ii92020/1/30ir.dlut.edu.cn◆SubjecttotheconstraintsthatthegradientofLpwithrespecttowandWoare0andalsothat:thedual:◆whichwemaximizewithrespecttoonly,subjecttotheconstraintsSummary:10.9.1OptimalSeparatingHyperplane0it0,1,11(maxmin)2ttttttTtTttrtrwrxwwxwx00()TtttTtwxwrxxw0,ttttyxxw102020/1/30ir.dlut.edu.cn等式约束非线性规划的拉格朗日问题:QP问题112020/1/30ir.dlut.edu.cnAsimpleexample4x3x2x1xx1=(0,0),r1=+1x2=(1,0),r2=+1x3=(2,0),r3=-1x4=(0,2),r4=-1222d1234223341234t1L()()(444)200,t122020/1/30ir.dlut.edu.cnAsimpleexample00001010201200002Q1111q1111A0b1234012013/41/41120312002144231113w,02224()3220wgxxx12340000111101012101201100021011110132020/1/30ir.dlut.edu.cnCoordinateascent◆S.t.SMO1()2tststTstdtLrrxx142020/1/30ir.dlut.edu.cnDefineslackvariables,Newmodel:UsingLagrangemultipliersgetanunconstrainedproblem:thedual:10.9.2TheNonseparableCase:SoftMarginHyperplane0t0()1tTttrwxw21min||||2ttwC..st0()10tTtttrwxw152020/1/30ir.dlut.edu.cnThechangeofKKTconditions:10.9.2TheNonseparableCase:SoftMarginHyperplane00()1ttTtrwxw0()1ttTtCrwxw00()1ttTtCrwxw()0ttgw0()1()ttTtgwrwxwt=1,...,N,162020/1/30ir.dlut.edu.cn10.9.3KernelFunctions172020/1/30ir.dlut.edu.cnFeaturemapping◆Mappingfunction:◆ThechangeofSVM:DefinethecorrespondingKerneltobe:10.9.3KernelFunctions2()(,)xxx122212();xxxxxxxxx00,Tttttwwyxxw0(),()ttttyxxw182020/1/30ir.dlut.edu.cnAsimpleexample:polynomialsofdegreeq:radial-basisfunctions:sigmoidalfunctions:10.9.3KernelFunctions(,)()()Kxyxy1122122122122()2xxxxxxxxx00(,)TttttwwyKxxw192020/1/30ir.dlut.edu.cnThevalidofkernel:◆ifthereissomefeaturemappingsothatforallx,z?◆Kernelmatrix:◆TheoremMercer10.9.3KernelFunctionskerKavalidnelthecorrespondingkernelmatrixissymmetricpositivesemi-denite.202020/1/30ir.dlut.edu.cnUsealinearmodel:UsetheE-sensitivelossfunction:Analogoustothesoftmarginhyperplane,weintroduceslackvariablestoaccountfordeviationsoutoftheE-zoneandweget10.9.4SupportVectorMachinesforRegression212020/1/30ir.dlut.edu.cn