Introduction to Computer Vision
邹丰美
Contact: fmzou@xmu.edu.cn, 13459246202
Materials download:

Introduction to the Background and Geometric Foundations of Computer Vision

Topics and dates of the 5 lectures
1. Background and geometric foundations of computer vision (2/13, week 1)
2. Geometric camera calibration (3/6, week 4)
3. Rigid-body motion and pose estimation (3/27, week 7)
4. Pose estimation (II) (or the correspondence problem) (4/17, week 10)
5. Applications (5/8, week 13)

Requirements
• Attend all 5 lectures, ask questions actively, and take part in the discussion (each lecture leaves about 15-20 minutes for questions and discussion)
• Complete at least one of the 3 lab assignments (program + report)
• (The lab location will be decided within the first two weeks; I will notify you then)
• Write one "academic" paper (related to the lab assignment)
• Final grade:
– Undergraduates: 60% (lab) + 40% (paper)
– Graduate students: 40% (lab) + 60% (paper)

Outline
• What is CV?
– What is CV? When did it begin to develop?
– What does it study?
– Which disciplines/fields is it related to?
– Some open problems in CV and prospects for applications
• Geometric foundations / probabilistic foundations
• Some related resources

Definitions of CV (1)
• “Today, the study of extracting 3-D information from video images and building a 3-D model of the scene, called computer vision or image understanding, is one of the research areas that attract the most attention all over the world.”
– from K. Kanatani, “Statistical Optimization for Geometric Computation: Theory and Practice”, 1996.

Definitions of CV (2)
• “Vision refers not only to the sensing of light signals; it also covers the whole process of acquiring, transmitting, processing, storing, and understanding visual information. After signal processing theory and the computer appeared, people tried to capture images of the environment with cameras, convert them into digital signals, and carry out the whole process of visual information processing on a computer. In this way a new discipline took shape: computer vision.” “The research goal of computer vision is to give computers the ability to perceive three-dimensional environmental information from two-dimensional images.”
– from 马颂德 (Ma Songde) and 张正友 (Zhang Zhengyou), “计算机视觉-计算理论与算法基础” (Computer Vision: Computational Theory and Algorithmic Foundations), 1998.
• “Computer vision is currently a very active field of computer science research. The discipline aims to give computers and robots visual capabilities comparable to those of humans. Researchers around the world began studying computer vision in the early 1960s, but most of the important advances in the underlying basic research were made after the 1980s.”
– “=1332”

Research topics
• Early stage: low-level image processing, such as image transformation, image restoration, image enhancement, thresholding, region labelling, and shape characterization.
• “Tried to identify and classify objects in images by techniques of Pattern Recognition, which had been developed for the purpose of recognizing 2-D characters and symbols by feature extraction and statistical decision making by learning”.
• “Many pattern recognition researchers believed that the paradigm of pattern recognition would also lead to intelligent vision systems that could understand 3-D scenes”.
• “However, they soon realized the crucial fact that 3-D objects look very different from viewpoint to viewpoint beyond the capability of 2-D feature-based learning; 3-D meanings of 2-D images cannot be understood unless some a priori knowledge about the scene is given. Thus, ‘Knowledge’ came to play an essential role”.
• “This type of knowledge-based ‘high-level’ reasoning is called the top-down (or goal-driven) approach.”
• “In a sense, this approach corresponds to the psychological view toward human perception that humans understand the environment by unconsciously ‘matching’ the vast amount of knowledge accumulated from experience in the process of growth.”
• “This view can be compared to what is known as the Gestalt psychology, which regards human perception as integration of the environment and experience.”
• Thus, the problem of how to represent and organize such knowledge became a major concern, and many symbolic schemes were derived. Establishing such symbolic representations is one of the central themes of artificial intelligence, and machine vision was regarded as problem solving by artificial intelligence.
• “However, the inherent difficulty of this approach was soon realized: the amount of necessary knowledge, most of which has the form of ‘if…then…else…’, is limitless, heavily depending on the domain of each application (‘office scene’, ‘outdoor scene’, etc.) and constantly changing (e.g., today, many telephones are no longer black and do not have dials). However large the amount of knowledge is, exceptions are bound to appear, and computation time blows up exponentially as the amount of knowledge increases.”
• Many combinatorial techniques were proposed so as to find plausible interpretations efficiently without doing exhaustive search. Such techniques include various types of heuristic search as well as special techniques such as constraint propagation and probabilistic relaxation.
• “Realizing that such computational problems are inevitable as long as knowledge is directly matched with features extracted from raw images, researchers began to pay attention to ‘physical/optical laws’ governing 3-D scenes. In analyzing 2-D images, such laws can provide clues to the 3-D shapes and positions of objects.”
• “For example, the surface gradients of objects can be estimated by analyzing shading intensities (shape from shading). The orientation of a surface in the scene can also be estimated by analyzing the perspective distortion of a texture on it (shape from texture). If objects are moving in the scene (or the camera is moving relative to the objects), the 3-D shapes of the objects and their 3-D motions (or the camera motion) can be computed (shape from motion or structure from motion).”
• “Although such analyses require appropriate assumptions about surface reflectance, illumination, perspective distortion, and rigid motion, they do not depend on specific application domains; they are called constraints in contrast to ‘knowledge’ for the top-down approach.”
• “This approach is in line with the psychological view toward human vision that human perception occurs automatically when visual signals trigger ‘computation’ in the brain and that this computational functionality is innate, acquired in the process of evolution.”
• This view was asserted by J. J. Gibson, who had a great influence on not only psychologists but also machine vision researchers.
• Thus, a new paradigm was established. First, primitive features are extracted from raw images by edge detection and image segmentation, resulting in primal sketches; next, approximate shapes and surface orientations are estimated by applying available constraints (shading, texture, motion, stereo, etc.), resulting in 2.5-D sketches;
• then, appropriate 3-D models (e.g., generalized cylinders) are fitted to such data, resulting in a numerical and symbolic r