Pattern Recognition & Machine Learning
Wen Wen (温雯), wwen@gdut.edu.cn
School of Computer Science, Guangdong University of Technology

Introduction to Logistic Regression
- Simple and multiple linear regression
- Simple logistic regression
- The logistic function
- Estimation of parameters

Simple linear regression
Table 1. Age and systolic blood pressure (SBP) among 33 adult women

  Age  SBP     Age  SBP     Age  SBP
   22  131      41  139      52  128
   23  128      41  171      54  105
   24  116      46  137      56  145
   27  106      47  111      57  141
   28  114      48  115      58  153
   29  123      49  133      59  157
   30  117      49  128      63  155
   32  122      50  183      67  176
   33   99      51  130      71  172
   35  121      51  133      77  178
   40  147      51  144      81  217

Simple linear regression
[Figure: scatter plot of SBP (mm Hg) against Age (years) with the fitted line $\mathrm{SBP} = 81.54 + 1.222 \times \mathrm{Age}$. Adapted from Colton T. Statistics in Medicine. Boston: Little, Brown, 1974.]

Simple linear regression
- Relation between 2 continuous variables (SBP and age)
- Regression coefficient $\beta_1$
  - Measures the association between y and x
  - Amount by which y changes on average when x changes by one unit
  - Estimated by the least squares method

$y = \alpha + \beta_1 x_1$, where $\beta_1$ is the slope.

Multiple linear regression
- Relation between a continuous variable and a set of i continuous variables
- Partial regression coefficients $\beta_i$
  - Amount by which y changes on average when $x_i$ changes by one unit and all the other $x_i$ remain constant
  - Measures the association between $x_i$ and y, adjusted for all other $x_i$
- Example: SBP versus age, weight, height, etc.

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Multiple linear regression
Terminology for the two sides of $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$:

  y                     x_1, ..., x_i
  Dependent             Independent variables
  Predicted             Predictor variables
  Response variable     Explanatory variables
  Outcome variable      Covariables

Logistic regression
- Models the relationship between a set of variables $x_i$, which may be
  - dichotomous (eat: yes/no)
  - categorical (social class, ...)
  - continuous (age, ...)
  and a dichotomous variable Y
- A dichotomous (binary) outcome is the most common situation in biology and epidemiology

Logistic regression (1)
Table 2. Age and signs of coronary heart disease (CD; 0 = no, 1 = yes)

  Age  CD     Age  CD     Age  CD
   22   0      40   0      54   0
   23   0      41   1      55   1
   24   0      46   0      58   1
   27   0      47   0      60   1
   28   0      48   0      60   0
   30   0      49   1      62   1
   30   0      49   0      65   1
   32   0      50   1      67   1
   33   0      51   0      71   1
   35   1      51   1      77   1
   38   0      52   0      81   1

How can we analyse these data?
- Comparison of the mean age of diseased and non-diseased women
  - Non-diseased: 38.6 years
  - Diseased: 58.7 years (p < 0.0001)
- Linear regression?

Dot-plot: Data from Table 2
[Figure: signs of coronary disease (No/Yes) plotted against AGE (years).]

Logistic regression (2)
Table 3. Prevalence (%) of signs of CD according to age group

  Age group   # in group   Diseased #   Diseased %
  20-29            5            0            0
  30-39            6            1           17
  40-49            7            2           29
  50-59            7            4           57
  60-69            5            4           80
  70-79            2            2          100
  80-89            1            1          100

Dot-plot: Data from Table 3
[Figure: Diseased % plotted against Age (years).]

The logistic function (1)
[Figure: probability of disease, from 0.0 to 1.0, plotted against x.]

$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

The logistic function (2)

$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

$\ln\left[\dfrac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$

The left-hand side is the logit of $P(y \mid x)$.

The logistic function (3)
Advantages of the logit
- Simple transformation of $P(y \mid x)$
- Linear relationship with x
- Can be continuous (logit between $-\infty$ and $+\infty$)
- Known binomial distribution (P between 0 and 1)
- Directly related to the notion of odds of disease

$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta x \qquad \dfrac{P}{1 - P} = e^{\alpha + \beta x}$

Fitting equation to the data
- Linear regression: least squares
- Logistic regression: maximum likelihood
- Likelihood function
  - Estimates the parameters $\alpha$ and $\beta$ with the property that the likelihood (probability) of the observed data is higher than for any other values
  - In practice it is easier to work with the log-likelihood

$\ln L(\alpha, \beta) = l(\alpha, \beta) = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln\left[1 - \pi(x_i)\right] \right\}$, where $\pi(x_i) = P(y_i = 1 \mid x_i)$.

Maximum likelihood
- Iterative computing
  - Choice of an arbitrary value for the coefficients (usually 0)
  - Computing of the log-likelihood
  - Variation of the coefficients' values
  - Reiteration until maximisation (plateau)
- Results
  - Maximum likelihood estimates (MLE) for $\alpha$ and $\beta$
  - Estimates of P(y) for a given value of x

Readings
Logistic regression model
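
Supplementary sketch: the iterative maximum-likelihood steps listed under "Maximum likelihood" (start from 0, compute the log-likelihood, vary the coefficients, repeat until a plateau) can be illustrated on the Table 2 data. The slides do not say which rule is used to vary the coefficients; Newton-Raphson is assumed here as one common choice. The names `AGE`, `CD`, `logistic`, `log_likelihood` and `fit_logistic` are illustrative and do not come from the slides.

```python
import math

# Table 2: age (years) and signs of coronary disease (0 = no, 1 = yes).
AGE = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
       40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
       54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def logistic(z):
    """Logistic function e^z / (1 + e^z), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(alpha, beta, xs, ys):
    """Sum of y*ln(P) + (1-y)*ln(1-P), with P = logistic(alpha + beta*x).

    Computed as y*z - ln(1 + e^z), which is algebraically the same sum.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        z = alpha + beta * x
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Newton-Raphson search for the (alpha, beta) maximising the log-likelihood."""
    alpha, beta = 0.0, 0.0                       # arbitrary starting values
    ll = log_likelihood(alpha, beta, xs, ys)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(alpha + beta * x)
            w = p * (1.0 - p)
            g0 += y - p                          # d l / d alpha
            g1 += (y - p) * x                    # d l / d beta
            h00 += w                             # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
        new_ll = log_likelihood(alpha, beta, xs, ys)
        converged = abs(new_ll - ll) < tol       # plateau reached
        ll = new_ll
        if converged:
            break
    return alpha, beta, ll


alpha_hat, beta_hat, ll_hat = fit_logistic(AGE, CD)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, log-likelihood = {ll_hat:.3f}")
print(f"Estimated P(CD = 1 | age = 50) = {logistic(alpha_hat + beta_hat * 50):.3f}")
```

The last line corresponds to the "Estimates of P(y) for a given value of x" result: once the two coefficients are estimated, the fitted probability for any age follows from the logistic function.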