Module 5: Advanced MapReduce Features


Introduction

In Module 4 you learned the basics of programming with Hadoop MapReduce. That module explains how data moves through a general MapReduce architecture, and which methods and classes facilitate the use of Hadoop for processing. In this module we will look more closely at how to override Hadoop's functionality in various ways. These techniques allow you to customize Hadoop for application-specific purposes.

Goals for this Module:

- Understand advanced Hadoop features
- Be able to use Hadoop on Amazon EC2 and S3

Outline

1. Introduction
2. Goals for this Module
3. Outline
4. Custom Data Types
   1. Writable Types
   2. Custom Key Types
   3. Using Custom Types
   4. Faster Comparison Operations
   5. Final Writable Notes
5. Input Formats
   1. Custom File Formats
   2. Alternate Data Sources
6. Output Formats
7. Partitioning Data
8. Reporting Custom Metrics
9. Distributing Auxiliary Job Data
10. Distributing Debug Scripts
11. Using Amazon Web Services
12. References

Custom Data Types

Hadoop MapReduce uses typed data at all times when it interacts with user-provided Mappers and Reducers: data read from files into Mappers, emitted by mappers to reducers, and emitted by reducers into output files is all stored in Java objects.

WRITABLE TYPES

Objects which can be marshaled to or from files and across the network must obey a particular interface, called Writable, which allows Hadoop to read and write the data in a serialized form for transmission. Hadoop provides several stock classes which implement Writable: Text (which stores String data), IntWritable, LongWritable, FloatWritable, BooleanWritable, and several others. The entire list is in the org.apache.hadoop.io package of the Hadoop source (see the API reference).

In addition to these types, you are free to define your own classes which implement Writable. You can organize a structure of virtually any layout to fit your data and be transmitted by Hadoop. As a motivating example, consider a mapper which emits key-value pairs where the key is the name of an object, and the value is its coordinates in some 3-dimensional space. The key is some string-based data, and the value is a structure of the form:

    struct point3d {
      float x;
      float y;
      float z;
    }

The key can be represented as a Text object, but what about the value? How do we build a Point3D class which Hadoop can transmit? The answer is to implement the Writable interface, which requires two methods:

    public interface Writable {
      void readFields(DataInput in) throws IOException;
      void write(DataOutput out) throws IOException;
    }

The first of these methods initializes all of the fields of the object based on data contained in the binary stream in. The latter writes all the information needed to reconstruct the object to the binary stream out. The DataInput and DataOutput interfaces (part of java.io) contain methods to serialize most basic types of data; the important contract between your readFields() and write() methods is that they read and write the data from and to the binary stream in the same order. The following code implements a Point3D class usable by Hadoop:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class Point3D implements Writable {
      public float x;
      public float y;
      public float z;

      public Point3D(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
      }

      // No-argument constructor, so an empty instance can be created
      // and then filled in by readFields().
      public Point3D() {
        this(0.0f, 0.0f, 0.0f);
      }

      public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
      }

      // Reads the fields in the same order that write() emits them.
      public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
      }

      public String toString() {
        return Float.toString(x) + ", "
            + Float.toString(y) + ", "
            + Float.toString(z);
      }
    }

Listing 5.1: A Point class which implements Writable

CUSTOM KEY TYPES

As written, the Point3D type will work as a value type, as the mapper problem described above requires. But what if we want to emit Point3D objects as keys too? In Hadoop MapReduce, if (key, value) pairs sent to a single reduce task include multiple keys, the reducer will process the keys in sorted order. So key types must implement a stricter interface, WritableComparable. In addition to being Writable so they can be transmitted over the network, they also obey Java's Comparable interface. The following code listing extends Point3D to meet this interface:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    public class Point3D implements WritableComparable<Point3D> {
      public float x;
      public float y;
      public float z;

      public Point3D(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
      }

      public Point3D() {
        this(0.0f, 0.0f, 0.0f);
      }

      public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
      }

      public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
      }

      public String toString() {
        return Float.toString(x) + ", "
            + Float.toString(y) + ", "
            + Float.toString(z);
      }

      /** return the Euclidean distance from (0, 0, 0) */
      public float distanceFromOrigin() {
        return (float) Math.sqrt(x * x + y * y + z * z);
      }

      // Keys are ordered by their distance from the origin.
      public int compareTo(Point3D other) {
        float myDistance = distanceFromOrigin();
        float otherDistance = other.distanceFromOrigin();

        return Float.compare(myDistance, otherDistance);
      }

      public boolean equals(Object o) {
        if (!(o instanceof Point3D)) {
          return false;
        }

        Point3D other = (Point3D) o;
        return this.x == other.x && this.y == other.y
            && this.z == other.z;
      }

      public int hashCode() {
        return Float.floatToIntBits(x)
            ^ Float.floatToIntBits(y)
            ^ Float.floatToIntBits(z);
      }
    }

Listing 5.2: A WritableComparable version of Point3D

It is important for key types to implement hashCode() as well; the section on Partitioners later in this
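
The contract between write() and readFields(), and the sort order that compareTo() gives to keys, can be checked without a cluster. The following is a minimal sketch, not part of the original listings: it assumes a local stand-in interface named SimpleWritable (hypothetical, declaring the same two methods as Hadoop's Writable) so that it runs with only the JDK, and the class name WritableRoundTrip and the helper roundTrip() are likewise made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WritableRoundTrip {

  /** Hypothetical stand-in declaring the same two methods as Hadoop's Writable. */
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Same serialization order and ordering rule as Listing 5.2. */
  static class Point3D implements SimpleWritable, Comparable<Point3D> {
    float x, y, z;

    Point3D() { this(0.0f, 0.0f, 0.0f); }

    Point3D(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeFloat(x);
      out.writeFloat(y);
      out.writeFloat(z);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      // Must read in exactly the order that write() wrote.
      x = in.readFloat();
      y = in.readFloat();
      z = in.readFloat();
    }

    float distanceFromOrigin() {
      return (float) Math.sqrt(x * x + y * y + z * z);
    }

    @Override
    public int compareTo(Point3D other) {
      return Float.compare(distanceFromOrigin(), other.distanceFromOrigin());
    }

    @Override
    public String toString() { return x + ", " + y + ", " + z; }
  }

  /** Serializes p to a byte array, then rebuilds a fresh object from those bytes. */
  static Point3D roundTrip(Point3D p) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      p.write(out);
    }
    Point3D copy = new Point3D();  // empty instance, filled in by readFields()
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      copy.readFields(in);
    }
    return copy;
  }

  public static void main(String[] args) throws IOException {
    Point3D copy = roundTrip(new Point3D(1.0f, 2.0f, 3.0f));
    System.out.println("round trip: " + copy);

    List<Point3D> keys = new ArrayList<>();
    keys.add(new Point3D(3.0f, 4.0f, 0.0f));  // distance 5 from the origin
    keys.add(new Point3D(1.0f, 0.0f, 0.0f));  // distance 1
    keys.add(new Point3D(0.0f, 2.0f, 0.0f));  // distance 2
    Collections.sort(keys);                   // ordering comes from compareTo()
    System.out.println("sorted: " + keys);
  }
}
```

If readFields() read the floats in a different order than write() emitted them, the round trip would silently swap coordinates rather than fail, which is why the ordering contract matters. With the real Hadoop classes the same check should apply after replacing SimpleWritable with org.apache.hadoop.io.Writable.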
