2012 Workshop on Corpora and Foreign Language Research: An Overview of Corpus Research Methods

Topic Selection, Design, and Method: Put It All Together
Li Wenzhong (李文中)
National Research Centre for Foreign Language Education (中国外语教育研究中心), 2012

Corpora are not meant for humans to learn; regular expressions are not meant for women to learn.

Corpus-driven is basically corpus-based.

Any corpus-based research is necessarily driven by corpus data.

Goals: through corpus analysis and research, to
– test hypotheses and intuitions
– make new discoveries
– form new hypotheses
– build new theories
– verify existing findings
– solve difficult problems

Innovation: data / method / technique / interpretation, theory, perspective
√ new √ √ √ √ √ √ √ √ √ √

The corpus-based approach is a verification procedure.
The corpus-driven approach is a discovery procedure.

Rationale: any perception is but inferencing.

[Diagram: world of reality vs. world of text, separated by the "Einstein Gulf" (unbridgeable)]

[Diagram:
eye, ear, nose, tongue, body, mind
form, sound, smell, taste, touch, mental objects
study, inquire, reflect, discern, practise
text]

Basic steps:
1. Decide on a topic
2. Pose the questions
3. Define the population and the sample
4. Choose the tools
5. Process the data
6. Describe the results: classify and summarize features (description)
7. Explain the results: observe, describe, explain (explanation)
8. Interpret the results: meaning, value, application (interpretation)

Identifying a problem
• Something or some phenomenon that is:
– contrary to expectation
– incongruent
– in need of a solution
– puzzling

Reading to be better informed
• What has been contributed so far
• What has been left undone
• What has been done wrong
• Never count someone else’s money.

Formulating research questions
• Naming: What is …?
• Classificatory: How are they interrelated (patterned)?
• Explanatory: To what extent do they co-occur?
• Predictive: What will happen if …?
• Never ask a question to which you already know the answer; never ask a 'how to' question.

Finding a method
• Population
• Sample
• Sampling

[Diagram: P (population), S (sample), R (result), I (interpretation); link labels: sampling, validity, reliability, validity, generalizability]

• IF
• P → S
• S → R
• R → I
• THEN
• I → P

Descriptive research
– single text
– text vs. text
– people vs. text

Research questions
1. How many different word forms are used in the text? How many running words are used? What is their distribution?
2. To what extent can the level of difficulty of the text be computed on the basis of the graded word lists?
3. How many different word classes are used? How frequent is each word class?

Method
– To answer RQ1, generate a word list of the given text and observe:
• the number of types
• the number of tokens
• the type/token ratio (TTR)
• the standardized TTR, if the text is very large
• the types and the cumulative percentage of their frequencies
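The RQ1 measures listed above (types, tokens, TTR, standardized TTR, and cumulative percentage) can be sketched in Python as follows. This is a minimal illustration and not a description of how AntConc computes them. The regex tokenizer, the function names, and the default chunk size of 1,000 tokens for standardization are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(tokens):
    """Return (word, frequency, cumulative %) rows, sorted by decreasing frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    rows, running = [], 0
    # Ties in frequency are broken alphabetically so the output is deterministic.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        running += freq
        rows.append((word, freq, 100.0 * running / total))
    return rows

def ttr(tokens):
    """Type/token ratio: number of distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks; plain TTR if the text is shorter than one chunk."""
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)

# Toy input for illustration only.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)
print("tokens:", len(tokens), "types:", len(set(tokens)))
print("TTR:", round(ttr(tokens), 3))
for word, freq, cum_pct in word_list(tokens):
    print(f"{word}\t{freq}\t{cum_pct:.1f}%")
```

Averaging the TTR over equal-sized chunks is one common way to keep the ratio comparable across texts of different lengths, since the plain TTR falls as a text grows.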

Method (continued)
– To answer RQ2, compare the word list against a set of graded word lists, and observe:
• How many types on the Level 1, 2, and 3 lists are used in the text? What percentage do they make up?
• What about their tokens?
• How many types that are not on any list are used in the text? Summarize their features.

– To answer RQ3, retrieve each word class from the POS-tagged text, and sort the results by frequency in decreasing order:
• Retrieve all the nouns, verbs, and adjectives
• Sort the list

Instruments
– Use AntConc 3.0 to generate the word list;
– Use Range to compare and contrast the word list against a set of graded word lists;
– Use PowerGREP to retrieve the word classes from the POS-tagged text.

Explanatory research
– interrelationship (IR) between words
– IR between phraseologies
– IR between genres

Research on relationship:
– shape
– direction
– strength

Research questions
– What are the words that are unique to the text in terms of its subject matter?
– To what extent are these words related to the subject or topic of the text?
– What patterns of relationships exist among the keywords?

Method
– Compare and contrast the word list of the observed text or corpus against the word list of a larger reference text or corpus;
– Observe and group the words within a classification framework.

Instruments
– AntConc 3.0
Other applications
– Literary analysis
– Automatic summarization

Research on word use
Objectives:
– Observe the collocates of a word;
– Study its patterns of use;
– Study the meanings associated with its patterns of use;
– Study the semantic prosody of its meaning.

Research questions
1. What words collocate with the search word (SW)? How strong is the collocation?
2. What is the pattern of the SW? And what is its semantic preference?
3. What is the semantic prosody of the pattern?

Method
– Search for the word (KW, SW, or node word) in KWIC display;
– Observe its collocates and their word classes;
– Observe the meaning that is associated with the pattern;
– Observe its semantic prosody.

Instruments
– AntConc 3.0
• Concordance
– Sort: Level 1, Level 2, Level 3
– Frequency count
• Collocates
– Sort
– Sort POS tags

Research on chunks
Objectives:
– To retrieve the multiword sequences;
– To examine the internal structure of such sequences;
– To obtain the sequences unique to a specific text.

Research questions
– What multiword sequences (in terms of n-grams) are found in the given text?
– How are these sequences structured in terms of lexico-grammatical patterns?
– How is the message conveyed related to the overall structure of the sequences?

Method
– Segment the text and generate a set of lists of multiword sequences (of various lengths);
– Observe the structure of the retrieved n-grams and examine their regularities;
– Study the semantic and pragmatic features.

Instruments
– kfNgram
– WordSmith Tools v3.0
– PowerGREP 3.5

Research on parallel texts
Objectives:
– To observe how the source text was translated into the target text;
– To observe the probability of the translation units and corresponding units found in the text;
– To study the dynamics of translation in a given community.
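The second objective above, observing the probability of translation units and their corresponding units, can be illustrated with a small Python sketch. It assumes that the parallel texts have already been aligned into (source unit, target unit) pairs, and it estimates probability as a simple relative frequency. The function name and the hand-made pairs are hypothetical and serve only as an illustration; they are not data from the workshop.

```python
from collections import Counter, defaultdict

def correspondence_probabilities(aligned_pairs):
    """Estimate P(target unit | source unit) as a relative frequency over aligned pairs."""
    counts = defaultdict(Counter)
    for source_unit, target_unit in aligned_pairs:
        counts[source_unit][target_unit] += 1
    probs = {}
    for source_unit, targets in counts.items():
        total = sum(targets.values())
        # most_common() lists the most frequent correspondence first.
        probs[source_unit] = {t: n / total for t, n in targets.most_common()}
    return probs

# Hypothetical, hand-made alignments for illustration only (not real corpus data).
pairs = [
    ("语料库", "corpus"),
    ("语料库", "corpus"),
    ("语料库", "corpora"),
    ("文本", "text"),
]
probs = correspondence_probabilities(pairs)
for source_unit, targets in probs.items():
    for target_unit, p in targets.items():
        print(f"{source_unit} -> {target_unit}: {p:.2f}")
```

In practice the aligned pairs would come from a sentence- or phrase-aligned parallel corpus, and the reliability of the estimates depends on the quality of that alignment.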