Text put in public domain by Didier Stevens, no Copyright

Context: this is the first draft of a chapter I wrote as co-author of a malware analysis book. The book has been canceled by the main author. This chapter on Malicious PDF Analysis was my contribution to the book.

(EXE files) are often blocked by many e-mail servers and clients, they had to look for alternatives and PDF files turned out to be a viable solution.

But why is a PDF file a good alternative to an executable? The Portable Document Format is not a programming language; it is a page description language, specifying how to render the content of a page, like the pages you find in this book. So how can this be used to deliver a malicious payload? The answer lies in programming errors made in the applications that process PDF files, like PDF rendering software, of which Adobe Reader is by far the most popular. What malware authors do is exploit vulnerabilities (programming errors) in Adobe Reader in such a way that they can execute arbitrary code on a Windows machine with a vulnerable installation of Adobe Reader.

The PDF language is based on the PostScript language, which is a programming language, but PDF is a subset of PostScript, without the features that make it a programming language.

Brief introduction to the PDF file format

Although you do not need to fully understand the PDF language to be able to analyze malicious PDF files, some basic notions will get you far. If some aspect of the PDF language is not clear to you and you suspect it is essential to understand what the malware author did, refer to the official PDF reference documents ().

A PDF file can be a binary file or an ASCII file. Every PDF document can be encoded so that it is a pure ASCII file, but such files are rare and are mostly used for educational purposes. All the PDF files you are likely to be confronted with will be binary PDF files.

To ease the learning curve, I have produced a pure-ASCII PDF file with just the essential elements to render a page with the text “Hello World”.

%PDF-1.1

1 0 obj
<<
 /Type /Catalog
 /Outlines 2 0 R
 /Pages 3 0 R
>>
endobj

2 0 obj
<<
 /Type /Outlines
 /Count 0
>>
endobj

3 0 obj
<<
 /Type /Pages
 /Kids [4 0 R]
 /Count 1
>>
endobj

4 0 obj
<<
 /Type /Page
 /Parent 3 0 R
 /MediaBox [0 0 612 792]
 /Contents 5 0 R
 /Resources
 <<
  /ProcSet 6 0 R
  /Font << /F1 7 0 R >>
 >>
>>
endobj

5 0 obj
<< /Length 46 >>
stream
BT /F1 24 Tf 100 700 Td (Hello World) Tj ET
endstream
endobj

6 0 obj
[/PDF /Text]
endobj

7 0 obj
<<
 /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont /Helvetica
 /Encoding /MacRomanEncoding
>>
endobj

xref
0 8
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000381 00000 n
0000000485 00000 n
0000000518 00000 n

trailer
<<
 /Size 8
 /Root 1 0 R
>>

startxref
642

%%EOF

I will not explain this PDF file from A to Z in this book; we will only focus on some essential elements. But if you want a full explanation, read my Hakin9 magazine article “Anatomy of Malicious PDF Documents” ().

A PDF document starts with a header: %PDF-X.Y. X.Y is the version of the PDF language used by the PDF document. If this header is not present (or is corrupted), the file is not a valid PDF file and most PDF rendering software will not accept it.

The elements you will need to understand in your analysis of a PDF file are indirect objects:

1 0 obj
…
endobj

Indirect objects have an index number (1 in our example) and a version number (0 in our example), and their content is contained between the keywords obj and endobj. Objects can refer to other indirect objects by using their index and version number, like this: 1 0 R (this is a reference to object 1 0).

This referencing creates a tree structure of objects, which is known as the logical structure of a PDF document. The root element of this tree structure is identified in the PDF document by the /Root entry in the trailer. In our example, /Root refers to indirect object 1 0 (/Root 1 0 R). As its name implies, the trailer is found at the end of the PDF document.

The physical structure of a PDF document is the order in which the indirect objects appear in the file, and it is independent of the logical structure.

One type of object essential for analyzing malicious PDF files is the stream object. Indirect object 5 0 in our example file is a stream object:

5 0 obj
<< /Length 46 >>
stream
BT /F1 24 Tf 100 700 Td (Hello World) Tj ET
endstream
endobj

A stream object contains a stream of data between the keywords stream and endstream. This data stream is often compressed and thus looks like a meaningless bunch of bytes to the untrained eye:

5 0 obj
<< /Subtype /Type1C /Length 5416 /Filter /FlateDecode >>
stream
H‰|T}T#W#Ÿ!d&FI#ʼnNFW#åC…
endstream
endobj

In this example, the compression used is the Flate method of the zlib library. You can see this because of the /Filter /FlateDecode entry. In PDF parlance, a filter is a compression method. A stream of data can be compressed by more than one filter.

There are many features of the PDF language and several tricks malware authors can use (consciously and unconsciously) to make your analysis of a PDF file more difficult, but we will only get into these after having analyzed a simple malicious PDF file.

Analyzing a simple, pure-ASCII malicious PDF file

Now, before I lose your interest in PDF file analysis by explaining too much in too great detail before we get into the actual analysis of a real, in-the-wild malicious PDF file, I want to perform an analysis of a malicious PDF file that requires no dedicated tools, only an ASCII editor. I designed the following PDF document to use only ASCII and to exploit a well-known vulnerability of Adobe Reader:

%PDF-1.1

1 0 obj
<<
 /Type /Cat