I.J.ModernEducationandComputerScience,2013,10,9-18PublishedOnlineNovember2013inMECS()DOI:10.5815/ijmecs.2013.10.02Copyright©2013MECSI.J.ModernEducationandComputerScience,2013,10,9-18AnEfficientMachineLearningBasedClassificationSchemeforDetectingDistributedCommand&ControlTrafficofP2PBotnetsPijushBarthakur1,ManojDahal2,MrinalKantiGhose11DepartmentofComputerScienceandEngineering,SikkimManipalInstituteofTechnology,Sikkim,India2NovellIDC,BagmaneTechPark,CVRamannagar,Bangalore,Indiapijush.barthakur@gmail.com,mdahal@novell.com,mkghose2000@yahoo.comAbstract—BiggestinternetsecuritythreatistheriseofBotnetshavingmodularandflexiblestructures.Thecombinedpowerofthousandsofremotelycontrolledcomputersincreasesthespeedandseverityofattacks.Inthispaper,weprovideacomparativeanalysisofmachine-learningbasedclassificationofbotnetcommand&control(C&C)trafficforproactivedetectionofPeer-to-Peer(P2P)botnets.WecombinesomeofselectedbotnetC&Ctrafficflowfeatureswiththatofcarefullyselectedbotnetbehavioralcharacteristicfeaturesforbetterclassificationusingmachinelearningalgorithms.Oursimulationresultsshowthatourmethodisveryeffectivehavingverygoodtestaccuracyandverylittletrainingtime.WecomparetheperformancesofDecisionTree(C4.5),BayesianNetworkandLinearSupportVectorMachinesusingperformancemetricslikeaccuracy,sensitivity,positivepredictivevalue(PPV)andF-Measure.WealsoprovideacomparativeanalysisofourpredictivemodelsusingAUC(areaunderROCcurve).Finally,weproposearuleinductionalgorithmfromoriginalC4.5algorithmofQuinlan.Ourproposedalgorithmproducesbetteraccuracythantheoriginaldecisiontreeclassifier.IndexTerms—Botnet,Peer-to-Peer(P2P),WEKA,Linearsupportvectormachine,J48,Bayesnet,ROCcurve,AUCI.INTRODUCTIONABotisaprogramthatoncegetsinstalledintoacomputerleadstoestablishmentofsomesortofcommand-and-control(C&C)channelwithitsC&CserverandkeepslisteningtotheC&Cchannelforfuturecommands.C&Cserversarethecontrolcentersfromwherecommandsfromcommand-setofthebotnetareissuedtoperformcoordinatedtasksandattacks.Theattackersininternethaveincludedwormlikeabilitiestobotbyincludingpropagationcomponenttospreadthroughdifferentinfectionvectors.Thisleadstocreationofbotnetsthatgivespowertothebot-herdertoaccesstheinfectedcomputersandtocarry-outdifferenttypesofattackslikestealingofsensitiveinformation,DistributedDenialofService(DDoS)attacks,spamcampaign,clickfraudsetc.BotnetshaveevolvedthroughtimetoimitatedifferenttopographicalstructureslikeIRC,HTTP,P2Petc.IRCandHTTPbasedbotnetsarecentralizedinnatureandruntheriskofsinglepointoffailurei.e.ifC&Cheadisdetectedandtakendownthebotnetcripples.However,theyhavetheirrelativemeritsaswell.IRCbasedbotnetsbeingsimpleinsetupandmaintenance,continuetoevolve.HTTPbasedbotnetsaredifficulttodetectasitsC&Ctrafficcanhidebehindnormalwebtraffictoevadedetectionmechanisms.P2PbasedbotnetsarethenewestofitskindthatmimicPeer-to-Peer(P2P)technologicallyandthusfollowadistributedC&Cstructure.AbsenceoffixedC&CserverinP2Pbotnetsmakesitverydifficulttodetectanddismantlesuchstructures.SomeoftheknownP2PbotsareStorm[1],Nugache[2]andWaledac[3].InthispaperweproposeapayloadindependentapproachthatcandetectbotnetevenincaseofencryptedC&Ctraffic.WeusemachinelearningalgorithmsforclassificationandpredictionoflargevolumeofC&CtrafficflowsgeneratedbyP2Pbotnets.Anetworkflowprovidesessentialinformationinanetworklikewhoistalkingtowhomi.e.conversationbetweenanytwohostsinthenetworkinanyspecificmomentoftime.ThealgorithmsthatareusedtoclassifynetworkflowsduringnormalC&CoperationsofbotnetsareJ48,BayesNetandLinearSupportVectorMachines.OurapproachcanbeusedforproactivedetectionofP2Pbotnetsbycorrelatingsimilarbotflowsofthesamebotnet.Ourinvestigationisbasedonfollowingassumptions:(i)P2Pbotnetestablishesnumeroussmallsessions,(ii)aP2Pbotneedstokeepcommunicatingforhavingthemaliciousnetworkalive,(iii)toavoiddetection,aP2Pbotpassesminimalamountofinformationineachsession,(iv)ineverysessionbetweenapairofP2Pbots,dataflowhappensinboththedirections,(v)EverybotnethasitsownspecificsetofcommandsandC&Cinteractionsofbotsarepreprogrammedtothesetofcommandstheyreceive,and(vi)P2Pbotnetsusuallyrunseachsessionforaverysmallduration.Someofourassumptionsaresimilartotheoneusedinpaper[4][5].BasedontheabovementionedcharacteristicfeaturesofP2Pbotnets,weproposearulegenerationalgorithmforbotnettrafficclassification.Weusedanindirectmethodofderivingtheinitialrulesetfromdecisiontree10AnEfficientMachineLearningBasedClassificationSchemeforDetectingDistributedCommand&ControlTrafficofP2PBotnetsCopyright©2013MECSI.J.ModernEducationandComputerScience,2013,10,9-18generatedusingC4.5algorithm.Then,wefollowedastep-by-stepapproachforoptimizationoftheruleset.OurfinalrulesethasauniformstructureprovidingsignificantinsightintosimilaritieswithinP2PbotnetC&Ctraffic.Restofthepaperisorganizedasfollows:SectionIIprovidesabriefoverviewofrelatedworks.InSectionIII,wediscusstheinherentdifferencesbetweenbotnetC&Cflowsandnormalnetworkflows.InSectionIVweprovidethearchitecturaloverviewofourclassificationprocessandtheapproachwehave