Research and Implementation of Distributed Big Video Data Synopsis Generation Based on Hadoop

Master's Thesis of Chongqing University (Professional Degree)

A Thesis Submitted to Chongqing University in Partial Fulfillment of the Requirements for the Professional Degree

By Feng Qiang (冯强)
Supervised by Prof. Yang Dan (杨丹)
Specialty: ME (Software Engineering Field)
School of Software Engineering, Chongqing University, Chongqing, China
November 2013

Abstract (Chinese)

Processing big video data has gone far beyond the capacity of a single machine. How to extract the desired information from massive video data has become an important factor in raising productivity and promoting social development. Researchers have already proposed single-machine video synopsis generation algorithms, but these often come at the cost of losing key information in the video, and their capacity to process big video data is also insufficient. Based on the distributed computing principles of the Hadoop cloud platform, this thesis analyzes the structure and content of video files and studies how to decode video in a distributed manner, and how to use a strategy of moving-object extraction, combination, and fusion in a distributed environment to regenerate a synopsis video much shorter than the original. The main work is as follows:

1) Analyze the composition of video files and the principles of video compression; point out that, when a video file is uploaded directly to HDFS, splitting produces incomplete boundary frames, boundary GOPs that lack a key frame, and split data blocks that lack header data; and propose a data reading strategy that enables distributed video decoding.

2) Propose a Hadoop-based technical scheme for big video data synopsis generation. The scheme generates the synopsis by applying foreground detection, motion tracking, and synopsis generation in sequence, and relies on the JNA library to address the long running time of the detection and tracking algorithms. HBase serves as the storage intermediary, and a storage layout for the motion-tracking results is designed on top of it. A genetic algorithm is chosen for the trajectory combination optimization problem, and Gaussian fusion is used to blend foreground images seamlessly into background images. Finally, the Map/Reduce computing framework is used to generate the synopsis in a distributed manner.

3) Design and implement a distributed video synopsis generation system, describing the main flow and detailed implementation of its technical framework; build a cluster experiment environment and run comparative tests on video datasets.

The experimental results show that the proposed Hadoop-based scheme for big video data synopsis generation significantly increases the speed of video synopsis generation and, compared with single-machine processing, is better suited to massive video data.

Keywords: Hadoop, big video data, video synopsis, distributed computing

ABSTRACT

The workload of processing big video data is far beyond the capacity of single-machine processing. How to extract information from huge amounts of video data has become a significant factor in increasing productivity and promoting social development. Some researchers have proposed video synopsis algorithms based on single-machine processing, but these often pay the price of losing key information in the video, and their ability to process big video data is insufficient. Based on the distributed computing framework of the Hadoop cloud platform, this thesis analyzes the structure and content of video files to study how to decode big video data in a distributed manner, and how to use a strategy of target extraction, track rearrangement, and fusion to generate a much shorter video synopsis in a distributed environment. The main work is as follows:

1) Analyze the structure and composition of video files and the principles of video compression; point out the problems caused by segmentation when video files are uploaded directly to HDFS, namely incomplete boundary frames, missing key frames in boundary GOPs, and missing header data in segmented data blocks; and put forward a data reading strategy that realizes distributed video decoding.

2) Put forward a solution for big video data synopsis generation based on Hadoop, which generates the video synopsis by applying foreground detection, motion tracking, and synopsis generation in sequence, and which relies on the JNA library to solve the time-consuming problem of the detection and tracking algorithms. HBase is used as the storage medium, and a design is given for saving the results of motion tracking in it. A genetic algorithm is adopted to optimize the trajectory combination problem. In addition, Gaussian fusion is used to realize the seamless fusion of foreground images and background images. Finally, the Map/Reduce computational framework is used to realize the distributed generation of the synopsis.

3) Design and implement the distributed video synopsis generation system, and expound the main process and detailed implementation of its technical framework; build the cluster experiment environment and run comparison tests on video datasets.

The experimental results show that the proposed solution for big video data synopsis generation improves the speed of video synopsis generation significantly, and that it is more suitable for processing massive video data than the single-machine mode.

Keywords: Hadoop, big video data, video synopsis, distributed computing

Contents

Abstract (Chinese)
Abstract (English)
1 Introduction
1.1. Research Background and Significance
1.2. Research Status at Home and Abroad
1.3. Research Content of the Thesis
1.4. Organization and Structure of the Thesis
2 Related Theory and Technology
2.1. Hadoop Cloud Platform
2.1.1. HDFS Module
2.1.2. Map/Reduce Module
2.2. HBase
2.2.1. HBase Data Model