This paper is included in the Proceedings of the 23rd USENIX Security Symposium.
August 20–22, 2014 • San Diego, CA
ISBN 978-1-931971-15-7
Open access to the Proceedings of the 23rd USENIX Security Symposium is sponsored by USENIX.

Man vs. Machine: Practical Adversarial Detection of Malicious Crowdsourcing Workers

Gang Wang†, Tianyi Wang†‡, Haitao Zheng† and Ben Y. Zhao†
†Computer Science, University of California, Santa Barbara
‡Electronic Engineering, Tsinghua University
{gangw, tianyi, htzheng, ravenben}@cs.ucsb.edu

Abstract

Recent work in security and systems has embraced the use of machine learning (ML) techniques for identifying misbehavior, e.g. email spam and fake (Sybil) users in social networks. However, ML models are typically derived from fixed datasets, and must be periodically retrained. In adversarial environments, attackers can adapt by modifying their behavior or even sabotaging ML models by polluting training data.

In this paper¹, we perform an empirical study of adversarial attacks against machine learning models in the context of detecting malicious crowdsourcing systems, where sites connect paying users with workers willing to carry out malicious campaigns. By using human workers, these systems can easily circumvent deployed security mechanisms, e.g. CAPTCHAs. We collect a dataset of malicious workers actively performing tasks on Weibo, China’s Twitter, and use it to develop ML-based detectors. We show that traditional ML techniques are accurate (95%–99%) in detection but can be highly vulnerable to adversarial attacks, including simple evasion attacks (workers modify their behavior) and powerful poisoning attacks (where administrators tamper with the training set). We quantify the robustness of ML classifiers by evaluating them in a range of practical adversarial models using ground truth data. Our analysis provides a detailed look at practical adversarial attacks on ML models, and helps defenders make informed decisions in the design and configuration of ML detectors.

¹ Our work received approval from our local IRB review board.

1 Introduction

Today’s computing networks and services are extremely complex systems with unpredictable interactions between numerous moving parts. In the absence of accurate deterministic models, applying Machine Learning (ML) techniques such as decision trees and support vector machines (SVMs) produces practical solutions to a variety of problems. In the security context, ML techniques can extract statistical models from large noisy datasets, which have proven accurate in detecting misbehavior and attacks, e.g. email spam [35, 36], network intrusion attacks [22, 54], and Internet worms [29]. More recently, researchers have used them to model and detect malicious users in online services, e.g. Sybils in social networks [42, 52], scammers in e-commerce sites [53] and fraudulent reviewers on online review sites [31].

Despite a wide range of successful applications, machine learning systems have a weakness: they are vulnerable to adversarial countermeasures by attackers aware of their use. First, through either reading publications or self-experimentation, attackers may become aware of details of the ML detector, e.g. choice of classifier and parameters used, and modify their behavior to evade detection. Second, more powerful attackers can actively tamper with the ML models by polluting the training set, reducing or eliminating its efficacy. Adversarial machine learning has been studied by prior work from a theoretical perspective [6, 12, 27], using simplistic all-or-nothing assumptions about adversaries’ knowledge about the ML system in use. In reality, however, attackers are likely to gain incomplete information or have partial control over the system. An accurate assessment of the robustness of ML techniques requires evaluation under realistic threat models.

In this work, we study the robustness of machine learning models against practical adversarial attacks, in the context of detecting malicious crowdsourcing activity. Malicious crowdsourcing, also called crowdturfing, occurs when an attacker pays a group of Internet users to carry out malicious campaigns. Recent crowdturfing attacks ranged from “artificial grassroots” political campaigns [32, 38], product promotions that spread false rumors [10], to spam dissemination [13, 39]. Today, these campaigns are growing in popularity in dedicated crowdturfing sites, e.g. ZhuBaJie (ZBJ)² and SanDaHa (SDH)³, and generic crowdsourcing sites [26, 48].

The detection of crowdturfing activity is an ideal context to study the impact of adversarial attacks on machine learning tools. First, crowdturfing is a growing threat to today’s online services. Because tasks are performed by intelligent individuals, these attacks are undetectable by normal measures such as CAPTCHAs or rate limits. The results of these tasks, fake blogs, slanderous reviews, fake social network accounts, are often indistinguishable from the real thing. Second, centralized crowdturfing sites like ZBJ and SDH profit directly from malicious crowdsourcing campaigns, and therefore have strong monetary incentive and the capability to launch adversarial attacks. These sites have the capability to modify aggregate behavior of their users through interface changes or explicit policies, thereby eith