Action Recognition Based on the Modified Two-stream CNN (IJMSC-V6-N6-3)




I.J. Mathematical Sciences and Computing, 2020, 6, 15-23
Published Online December 2020 in MECS
DOI: 10.5815/ijmsc.2020.06.03
Copyright © 2020 MECS

Action Recognition Based on the Modified Two-stream CNN

Dan Zheng a, Hang Li a,*, and Shoulin Yin a,*

a Software College, Shenyang Normal University, Shenyang 110034, China
* Corresponding Author: lihangsoft@163.com; yslinhit@163.com

Received: 20 October 2020; Accepted: 03 November 2020; Published: 08 December 2020

Abstract: Human action recognition is an important research direction in computer vision. Its main goal is to simulate the human brain in analyzing and recognizing human actions in video, which usually include individual actions and interactions between people and the external environment. A space-time dual-channel neural network can represent the features of a video from both spatial and temporal perspectives, and compared with other neural network models it has advantages in human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments, and from each segment a frame image representing the static features of the video and a stacked optical flow image representing the motion features are sampled at random. These two kinds of images are then input into the spatial-domain and temporal-domain convolutional neural networks, respectively, for feature extraction, and the segment features of each video are fused within each of the two channels to obtain the category prediction features of the spatial domain and the temporal domain. Finally, the video action recognition result is obtained by integrating the predictive features of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are discussed as ways to address the over-fitting problem caused by insufficient training samples, and the effects of the number of segments, the pre-trained network, the segment feature fusion scheme, and the dual-channel integration strategy on action recognition performance are analyzed. The experimental results show that the proposed model can better learn human action features in complex video and better recognize the action.

Index Terms: Action recognition, dual-channel, convolutional neural network.

1. Introduction

When human beings obtain information from the outside world, visual information accounts for 80% of the total information obtained by the various sensory organs. This information is of great significance for understanding the nature of things. With the rapid development of the mobile Internet and electronic technology, mobile phones and other video capture devices have become widespread, and Internet short-video applications have proliferated, greatly reducing the cost of shooting and sharing video and leading to explosive growth in online video resources. These resources enrich people's lives, but because of their sheer volume and variety, how to intelligently analyze, understand, and recognize this video data has become an urgent challenge [1-5].

Human action recognition is an important research direction in the field of computer vision. Its major research objective is to simulate the human brain in analyzing and recognizing human actions in videos, which usually include individual actions of human beings and interactions between human beings and the outside world and environment. Among the traditional action recognition methods based on hand-crafted features, the early features based on human body geometry or motion information are only suitable for recognizing simple body movements in simple scenes, while the spatio-temporal interest point method is more effective against relatively complex backgrounds. In this method, interest points or densely sampled points in the space-time volume of the video are obtained first, and local features are calculated from the space-time blocks around these points. The feature vector describing the video action is then formed using classic feature encoding methods such as Bag of Features (BoF), VLAD (Vector of Locally Aggregated Descriptors), or Fisher Vector [6-8].

Currently, among the local-feature-based approaches, action recognition methods based on Dense Trajectory (DT) have obtained better recognition results on many public real-scene action datasets. They obtain dense trajectories by tracking the densely sampled points in each frame of the video, and then calculate trajectory features to describe the action in the video. For example, Cai [9] used the multi-view super vector (MVSV) as a global descriptor to encode dense trajectory features. Wang [10] encoded Improved Dense Trajectory (IDT) features using Fisher Vector (FV) encoding. Peng [11] used Bag of Visual Words (BoVW) to encode spatio-temporal interest points or improved dense trajectory features. Based on dense trajectory features, Wang [12] proposed a multistage video representation model, MoFAP (Motion Features, Atoms, and Phrases), which represents visual information in a hierarchical manner. Dense trajectories can extract action features with wider coverage and finer granularity, but there is usually considerable redundancy among the trajectories, which limits recognition performance.

With the successful application of deep learning, especially the convolutional neural network (CNN), in fields such as speech and image recognition, a variety of human action recognition m