Information Retrieval Strategies and Techniques

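黃慕萱 (Huang Mu-hsuan), Chap. 6; Harter, Chap. 7

Search strategy vs. search technique
  - The terms were originally military usage
  - Views of various authors
    - Marcia Bates, 1979, "Information Search Tactics"
    - Hartly: methods for avoiding irrelevant articles; possible remedies when too many or too few relevant articles are retrieved
    - Palmer: refers to building blocks search and citation pearl growing
    - Pao: refers to Boolean logic, citation, and probabilistic retrieval strategies
  - Search strategy
    - An overall consideration or comprehensive plan for a given search question
    - Examples: building blocks search, citation pearl growing, etc.
  - Search heuristics (search techniques)
    - Actions taken to accomplish a specific purpose

Brief search
  - The most common way of searching
  - Fast and inexpensive
  - But often low recall and low precision
  - Suitable when
    - The topic is well defined
    - The searcher wants to learn the descriptors and index terms used by the database producer
    - Bibliographic data needs to be verified
    - The title, author, etc. are already known

Building Blocks Search
  - Also called "block building" or "building block"
  - Method
    - Break the search question down into several subject facets
    - Determine the relationships among the facets
      - The relationship between facets is usually AND; OR and NOT occur less often
    - Find search terms that can represent each facet
      - Combine them with Boolean OR (union) for completeness
  - The most widely used strategy; early reference interview forms were often designed around it

Building Blocks Search Strategy (1/4)
  1. Conduct reference interviews
  2. Formulate search objectives
     - High recall
     - High precision
     - Moderate levels of recall and precision
  3. Select database(s) and search system
  4. Identify major concepts or facets and their logical relationships with one another

Building Blocks Search Strategy (2/4)
  5. Identify search strings that represent the concepts
     - Words
     - Full-text phrases
     - Pieces of words
     - Descriptors
     - Identifiers
     - Codes
     - Non-semantic bibliographic characteristics
       - Fields unrelated to subject, such as document type, language, and year
     - Include synonyms, near-synonyms, narrower terms, and related terms
     - Fields to be searched

Building Blocks Search Strategy (3/4)
  6. For each distinct facet of the search, a set of postings will be created for each search string within that facet. The sets are then combined into a single set representing that facet using Boolean OR
  7. Following step #6, the facet sets themselves will be combined with Boolean AND and NOT
  8. Plan alternatives

Building Blocks Search Strategy (4/4)
  9. Formulate the initial statements of the search in the command language of the system
  10. Log on and put the search to the system
  11. Evaluate the intermediate results
  12. Iterate
      - Use the interactive features of the system to carry out search heuristics
      - Tactics, maneuvers, strategies, tricks, devices, approaches, to try to improve search results

Building blocks approach
  Facet A: Term A1 OR Term A2 OR ... Term Ap
  Facet B: Term B1 OR Term B2 OR ... Term Bq
  Facet C: Term C1 OR Term C2 OR ... Term Cr
  Answer set: Boolean combination of facets (AND, OR, NOT)

Building Blocks search sample
  Topic: Measurement of Risk Tendencies (looking for high recall)
  Facet 1  RISK: risk
  Facet 2  MEASUREMENT: measurement, assessment, choice, decision, outcome
  Facet 3  RISK AVERSION: risk aversion, risk avoidance, risk neutrality, risk prone, risk tendency
  Facet 4  BEHAVIORAL DECISION THEORY: behavioral decision theory
  Facet 5  INSURANCE: insurance, contract, bank, finance, stock, investment, advertisement
  Boolean combination: ((RISK AND MEASUREMENT) OR RISK AVERSION OR BEHAVIORAL DECISION THEORY) NOT INSURANCE

Reviewing the results and searching again
  - To increase recall
    - Find additional concepts or search terms to add to one or more facets
    - Delete a facet
  - To increase precision
    - Delete some of the broader or more ambiguous terms in the facets
    - Add an additional facet to be intersected with the others

Successive facet strategies (1/3)
  - Other names
    - Fewest postings first
    - Most specific concept first
    - Successive fractions (successive searching that does not begin with a subject facet)
  - Building blocks vs. successive facets
    - Building blocks search uses all of the facets
    - Successive facet search tries to use as few facets as possible
  - After identifying the facets of the search question, rank them by priority, then decide from the results whether to continue searching

Successive facet strategies (2/3)
  Diagram: First Facet AND Second Facet (optional) AND Other Facet (optional) -> Solution Set
  Example 1: "members and activities of 4-H clubs"
  Example 2: "the emotional, physical, and intellectual characteristics of children who have studied violin with the Suzuki method"

Successive facet strategies (3/3)
  - Suitable when
    - Combining all of the facets with Boolean operators is likely to yield zero records
    - One or two facets of the search question are quite vague in meaning
    - The search question carries non-subject criteria, such as document type, language, or year of publication, which can be treated as the first search concept
    - The searcher would rather tolerate false drops than lose relevant articles
    - The time and money spent adding further facets would probably exceed the cost of simply printing the results
    - Relevant literature is scarce and the searcher is willing to examine some less relevant articles

Pairwise Facets (1/3)
  - Pair the facets two at a time, intersect each pair, then take the union of the results
  - Suitable when
    - All facets are equally important
    - The facets differ little in specificity or vagueness
    - Combining all facets would yield zero records
  - Note: when there are many facets, use groups of 3-4 facets as the basic unit for intersection, to avoid confusion

Pairwise Facets (2/3)
  Building blocks search: A AND B AND C
  Pairwise facets search: (A AND B) OR (A AND C) OR (B AND C)

Pairwise Facets (3/3)
  Sample: A doctoral student wants a high recall bibliography prepared on the relationship between facial musculature and the physiological (autonomic) responding of emotions, e.g., fear.
  Diagram: Facet #1, Facet #2, and Facet #3 are combined two at a time with AND to give Solution Set A, Solution Set B, and Solution Set C
  FINAL SOLUTION SET: A OR B OR C

Citation Pearl Growing
  - Aims at high precision
  - Starts from 100% precision (articles known to be relevant) and works back toward recall
  - Repeatedly takes descriptors, identifiers, and words from known relevant documents and searches again
  - Suitable when
    - The database has no thesaurus or vocabulary list
    - The discipline is newly emerging
  - Often requires many repeated searches; not suitable for beginners

Other facet strategies
  - Multiple Brief search
    - Use different databases to obtain as high a recall as possible
  - Interactive Scanning
    - Most time-consuming and interactive
    - For example, using classification codes or natural language
  - Implied Concepts
    - Capture implied concepts and choose different terms according to the subject nature of the database
    - Example: possible health hazards from foods cooked using microwave ovens

Citation indexing strategies
  - Build the search strategy on the relationship between citing and cited documents
  - Offer highly interdisciplinary and multidisciplinary approaches to online searching
  - Search strategies
    - Cited publication, Cited Author, Cocited Authors
  - 國科會人文學研究中心 (National Science Council, Research Center for the Humanities)
    - 人文學引用文獻資料庫 (THCI)

Non-subject searching
  - Document type, year of publication, language, author, corporate source
  - Double limiting

Fact searching
  - Search for a known item

Multiple database searching
  - Pay attention to the fields covered and to controlled vocabulary usage

Search heuristics
  - Language Heuristics
  - Command Language, Database and File Structure Heuristics
  - Recall and Precision Heuristics
    - Heuristics for Increasing Recall
    - Heuristics for Increasing Precision
  - Personal Heuristics

Language Heuristics (1/2)
  - Use natural-language searching when
    - One or more of the concepts of interest involves a subtle nuance of meaning
    - One or more of the concepts of interest is high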

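The three facet-based strategies above (Building Blocks, Successive Facets, Pairwise Facets) can be sketched with Python sets. This is only a minimal illustration under assumed data: the POSTINGS index, the document IDs, the function names, and the small_enough stopping threshold are all hypothetical and do not come from the slides or from any real retrieval system.

```python
from itertools import combinations

# Hypothetical toy inverted index: search term -> set of document IDs (postings).
POSTINGS = {
    "risk": {1, 2, 3, 4, 5},
    "measurement": {1, 2, 6},
    "assessment": {3, 7},
    "risk aversion": {2, 3, 8},
    "insurance": {3, 9},
}


def facet_set(terms, postings=POSTINGS):
    """OR together the postings of every term in one facet."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result


def building_blocks(facets, exclude=()):
    """AND all facet sets together, then NOT away any excluded facets."""
    sets = [facet_set(f) for f in facets]
    answer = set.intersection(*sets) if sets else set()
    for f in exclude:
        answer -= facet_set(f)
    return answer


def successive_facets(facets, small_enough=2):
    """AND facets in priority order, stopping once the set is small enough."""
    answer = facet_set(facets[0])
    for f in facets[1:]:
        if len(answer) <= small_enough:
            break  # remaining facets are optional and left unused
        answer &= facet_set(f)
    return answer


def pairwise_facets(facets):
    """AND every pair of facets, then OR the pairwise results."""
    sets = [facet_set(f) for f in facets]
    answer = set()
    for a, b in combinations(sets, 2):
        answer |= a & b
    return answer


# Three illustrative facets, each a list of OR-ed terms.
A = ["risk"]
B = ["measurement", "assessment"]
C = ["risk aversion"]

print(building_blocks([A, B, C]))
print(building_blocks([A, B, C], exclude=[["insurance"]]))
print(successive_facets([C, B, A]))
print(pairwise_facets([A, B, C]))
```

On this toy index the pairwise result, {1, 2, 3}, is a superset of the building-blocks result, {2, 3}. That matches the slides' point that pairing facets trades some precision for recall when intersecting every facet would leave too few records.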