Selfie Sign Language Recognition Based on Convolutional Neural Networks (IJISA-V10-N10-7)




I.J. Intelligent Systems and Applications, 2018, 10, 63-71
Published Online October 2018 in MECS
DOI: 10.5815/ijisa.2018.10.07
Copyright © 2018 MECS

Selfie Sign Language Recognition with Convolutional Neural Networks

P.V.V. Kishore, G. Anantha Rao, E. Kiran Kumar, M. Teja Kiran Kumar, D. Anil Kumar
Department of Electronics and Communication Engineering, KL University, Green Fields, Vaddeswaram, Guntur, India
E-mail: pvvkishore@kluniversity.in, ananth.gondu@gmail.com, kiraneepuri@kluniversity.in, mtejakiran@kluniversity.in, danilmurali@kluniversity.in

Received: 22 November 2017; Accepted: 09 February 2018; Published: 08 October 2018

Abstract: Extracting complex head and hand movements, along with their constantly changing shapes, for the recognition of sign language is considered a difficult problem in computer vision. This paper proposes the recognition of Indian sign language gestures using a powerful artificial intelligence tool, convolutional neural networks (CNN). Selfie mode continuous sign language video is the capture method used in this work, so that a hearing-impaired person can operate the sign language recognition (SLR) mobile application independently. Because no datasets on mobile selfie sign language were available, we created a dataset with five different subjects performing 200 signs in 5 different viewing angles under various background environments. Each sign occupies 60 frames or images in a video. CNN training is performed with 3 different sample sizes, each consisting of multiple sets of subjects and viewing angles. The remaining 2 samples are used for testing the trained CNN. Different CNN architectures were designed and tested with our selfie sign language data to obtain better recognition accuracy. We achieved a recognition rate of 92.88%, compared with other classifier models reported on the same dataset.

Index Terms: Selfie sign language, Convolutional Neural Networks (CNN), Stochastic pooling, Sign language recognition (SLR), Deep learning.

I. INTRODUCTION

Sign language recognition (SLR) is an evolving research area in computer vision. The challenges in SLR are video trimming, sign extraction, sign video background modelling, sign feature representation and sign classification. All of these problems [1] have been attempted in the past with a considerable amount of success, and those attempts have been instrumental in the development of state-of-the-art algorithms for SLR. Gesture recognition uses powerful imaging and artificial intelligence based algorithms for classification [2]. Current trends show an urge to bring gesture recognition into mobile environments [3].

Sign language is a visual mode of communication between two hearing-impaired or hard-of-hearing people. The foundations of this communication are finger shapes, hand shapes, hand movements in space with respect to the body, hand orientations and facial expressions. Humans are trained exclusively for years to handle such huge amounts of information. For machine translation, the problem transforms into a 2D natural language processing problem. Many 1D/2D/3D models have been proposed in the literature, with little success in bringing a model close to real-time implementation [4-7].

In this work, the focus is on recognizing signs of Indian sign language using 2D selfie video captured with a mobile front camera. Even though the development of a mobile app is far from reality, the objective is to simulate algorithms that can execute optimally on a mobile platform. The primary module extracts information frames to reduce the input video data per frame. A visual attention based framework proposed in [8] is chosen for its accuracy and computation time. The model works well for constant video backgrounds, and we limit our work to this type of video set.

Fig. 1. Sample database of selfie sign language

The unavailability of benchmark datasets for selfie mode Indian sign language (ISL) prompted us to create our own dataset. The dataset has 200 commonly used ISL words performed by 5 native ISL users (i.e. 5 sets) in 5 different viewing angles (user dependent angles) at a rate of 30 fps. Training is initiated with three different batch sizes. Batch-I of training uses only one set, i.e. 200 signs performed by 1 user in 5 different viewing angles for 2 seconds at 30 fps, a total of 200 × 1 × 5 × 2 × 30 = 60000 sign images. Batch-II of training uses 2 sets, i.e. a total of 200 × 2 × 5 × 2 × 30 = 120000 sign images. Batch-III of training uses 3 sets of sign images. The trained CNNs are tested with two discrete video sets having different signers and viewing angles with varying backgrounds. The robustness testing is performed in two cases. In case-I of testing, the same dataset, i.e. the already trained dataset, is used, and in case-II of testing a different dataset is used. Figure 1 shows the sample database created for this work. The performance of the CNN algorithms is measured based on their accuracy in recall and recognition rates.

The rest of the paper is organized as follows: Section 2 discusses the related works. In Section 3, the proposed CNN architecture is described. Section 4 discusses the results obtained in different training and testing cases. Finally, Section 5 concludes the outcomes of this paper.

II. RELATED WORKS

Sign language recognition (SLR) has transformed with technology upgradation from 1D and 2D to 3D models in the last 2 decades. In 1D, SLR is based on 1D signals acquired from hand gloves [8] and classified using signal processing methods [9]. Bhuyan et al. [10] used hand shapes and hand trajectories to recognize static and dynamic hand signs from ISL. Zhou and Chen [11] proposed a signer adaptation method, in whic