I.J. Mathematical Sciences and Computing, 2020, 6, 15-23
Published Online December 2020 in MECS
DOI: 10.5815/ijmsc.2020.06.03

Action Recognition Based on the Modified Two-stream CNN

Dan Zheng a, Hang Li a,*, and Shoulin Yin a,*
a Software College, Shenyang Normal University, Shenyang 110034, China
Corresponding Authors: lihangsoft@163.com; yslinhit@163.com
Received: 20 October 2020; Accepted: 03 November 2020; Published: 08 December 2020

Abstract: Human action recognition is an important research direction in computer vision. Its main goal is to simulate the human brain in analyzing and recognizing human actions in video, which usually include individual actions as well as interactions between people and the external environment. A space-time dual-channel (two-stream) neural network can represent video features from both the spatial and the temporal perspective, and compared with other neural network models it has clear advantages for human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments, and from each segment a frame image representing the static appearance of the video and a stacked optical-flow image representing its motion are randomly sampled. These two kinds of images are then fed into the spatial-domain and temporal-domain convolutional neural networks respectively for feature extraction, after which the segment-level features of each video are fused within each channel to obtain the category prediction features of the spatial domain and the temporal domain. Finally, the video action recognition result is obtained by integrating the prediction features of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are discussed to alleviate the over-fitting caused by insufficient training samples, and the effects of different segment numbers, pre-trained networks, segment feature fusion schemes and dual-channel integration strategies on recognition performance are analyzed. The experimental results show that the proposed model can better learn human action features in complex videos and recognize the actions more accurately.

Index Terms: Action recognition, dual-channel, convolutional neural network.

1. Introduction

When human beings obtain information from the outside world, visual information accounts for about 80% of the total information acquired by the various sense organs, and it is of great significance for understanding the nature of things. With the rapid development of the mobile Internet and electronic technology, mobile phones and other video capture devices have become widespread and Internet short-video applications have mushroomed, greatly reducing the cost of shooting and sharing video and leading to explosive growth of online video resources. These resources enrich people's lives, but because of their huge volume and varied content, intelligent analysis, understanding and recognition of these video data have become an urgent challenge [1-5].

Human action recognition is an important research direction in the field of computer vision. The major research objective is to simulate the human brain in analyzing and recognizing human actions in videos, which usually include individual actions as well as interactions between human beings and the outside environment. Among traditional action recognition methods based on hand-crafted features, the early features based on human body geometry or motion information are only suitable for recognizing simple body movements in simple scenes, whereas spatio-temporal interest point methods are more effective when the background is relatively complex. In these methods, interest points or densely sampled points in space-time are first detected in the video, and local descriptors are computed from the space-time volumes around these points. The characteristic vector describing the video action is then formed by encoding the local descriptors with classic feature encoding methods such as Bag of Features (BoF), VLAD (Vector of Locally Aggregated Descriptors) or the Fisher Vector [6-8].
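To make this classic local-feature pipeline concrete, the following minimal sketch shows how the local space-time descriptors of one video could be aggregated into a single Bag-of-Features vector. The descriptor dimensionality, the codebook size and the use of scikit-learn's KMeans are illustrative assumptions, not details taken from the cited works.

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, k=64, seed=0):
    # Learn a visual codebook by clustering local space-time descriptors
    # pooled from the training videos (k and seed are arbitrary choices here).
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)

def bof_encode(descriptors, codebook):
    # Assign each local descriptor to its nearest visual word and build an
    # L1-normalized histogram that represents the whole video.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

# Toy usage: random 96-dimensional descriptors standing in for HOG/HOF-like features.
rng = np.random.default_rng(0)
codebook = build_codebook(rng.normal(size=(5000, 96)))
video_vector = bof_encode(rng.normal(size=(500, 96)), codebook)
print(video_vector.shape)  # (64,) video-level vector fed to a classifier such as an SVM

The resulting fixed-length vector plays the same role as the VLAD or Fisher Vector encodings mentioned above: it turns a variable number of local descriptors into a single representation that a conventional classifier can consume.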
Currently, among local feature-based approaches, action recognition based on Dense Trajectories (DT) has obtained better recognition results on many public real-scene action databases. These methods obtain dense trajectories by tracking densely sampled points in each frame of the video and then compute trajectory-aligned descriptors to describe the action. For example, Cai [9] used the multi-view super vector (MVSV) as a global descriptor to encode dense trajectory features. Wang [10] encoded improved dense trajectory (IDT) features with Fisher Vector encoding. Peng [11] used the Bag of Visual Words (BoVW) to encode spatio-temporal interest points or improved dense trajectory features. Based on dense trajectory features, Wang [12] proposed a multi-stage video representation model, MoFAP (Motion Features, Atoms, and Phrases), which can represent visual information in a hierarchical manner. Dense trajectories can extract action features with wide coverage and fine granularity, but there is usually a large amount of trajectory redundancy, which limits recognition performance. With the success of deep learning in fields such as speech and image recognition, and especially of the convolutional neural network (CNN), a variety of human action recognition methods
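As a bridge to such CNN-based methods, the following minimal sketch illustrates the segment fusion and dual-channel integration steps summarized in the abstract, assuming per-segment class scores have already been produced by a spatial and a temporal network. The fusion modes and channel weights used here are illustrative assumptions, not the settings evaluated in this paper.

import numpy as np

def fuse_segments(segment_scores, mode="average"):
    # Fuse per-segment class scores of one channel into a video-level score.
    # segment_scores has shape (num_segments, num_classes).
    if mode == "average":
        return segment_scores.mean(axis=0)
    if mode == "max":
        return segment_scores.max(axis=0)
    raise ValueError("unknown fusion mode: %s" % mode)

def two_stream_predict(spatial_scores, temporal_scores, w_spatial=1.0, w_temporal=1.5):
    # Weighted late fusion of the spatial (RGB frame) channel and the temporal
    # (stacked optical flow) channel; the weights are placeholders.
    fused = (w_spatial * fuse_segments(spatial_scores)
             + w_temporal * fuse_segments(temporal_scores))
    return int(np.argmax(fused))

# Toy usage: 3 segments and a 101-class label space (e.g. UCF101-sized).
rng = np.random.default_rng(1)
print(two_stream_predict(rng.normal(size=(3, 101)), rng.normal(size=(3, 101))))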