Data Driven Approaches for Large-scale Knowledge Graph Construction
Yanghua Xiao
Fudan University
Knowledge Works at Fudan (kw.fudan.edu.cn)

Knowledge Graph
• A knowledge graph is a large-scale semantic network consisting of entities/concepts as well as the semantic relationships among them
  • Higher coverage over entities and concepts
  • Richer semantic relationships
  • Usually organized as RDF
  • Quality assurance by crowdsourcing
• Why knowledge graphs?
  • Understanding the semantics of text needs background knowledge
  • A robot brain needs a knowledge base to understand the world
• YAGO, WordNet, Freebase, Probase, NELL, Cyc, DBpedia, ...

Data Driven vs Hand Crafted
• Manually constructed knowledge graph
  • Examples: WordNet, Cyc
  • Size: Small (huge human cost)
  • Quality: Almost perfect (each relation is checked by experts)
• Auto-constructed knowledge graph
  • Automatically extracted from a huge web corpus
  • Examples: Probase, WikiTaxonomy, etc.
  • Size: Huge (from a huge corpus)
  • Quality: Good (the accuracy can't reach 100%)
  • Because of the huge size, there are many wrong facts

Pipeline of KG construction
Extraction
• End-to-end
• Domain specific
Completion
• Collaborative filtering based completion
• Transitivity inference based completion
Correction
• Graph structure based correction
Cost: Costly human efforts
Quality: Missing data
Quality: Wrong data

Jiaqing Liang, Yanghua Xiao, et al., Probase+: Inferring Missing Links in Conceptual Taxonomies, to be published in TKDE 2017

Probase
• A web-scale taxonomy derived from web pages by Hearst linguistic patterns
  • "... famous basketball players such as Michael Jordan ..."
  • domestic animals such as cats and dogs ...
  • China is a developing country.
  • Life is a box of chocolate.
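
The "such as" sentences above can be matched with a short sketch like the one below. This is not Probase's extractor: the function name `extract_isa`, the two regular expressions, and the approximation of a noun phrase as "one or two words" are all assumptions made for illustration. A real system would rely on a noun-phrase chunker and on many more patterns.

```python
import re

# Hypothetical, deliberately naive pattern for "NP such as NP, NP, ..., and|or NP".
# A noun phrase is approximated as one or two words before "such as"; a real
# extractor would use a noun-phrase chunker instead.
SUCH_AS = re.compile(r"\b((?:\w+ )?\w+),? such as ([^.;:!?…]+)")

# Separators between listed hyponyms: commas and/or the words "and" / "or".
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    match = SUCH_AS.search(sentence)
    if match is None:
        return []
    hypernym = match.group(1)
    hyponyms = [h.strip() for h in LIST_SEP.split(match.group(2))]
    return [(h, hypernym) for h in hyponyms if h]


if __name__ == "__main__":
    print(extract_isa("famous basketball players such as Michael Jordan"))
    print(extract_isa("domestic animals such as cats and dogs"))
```

The sketch only covers the "such as" pattern. The "is a" sentences on the slide would need a separate pattern, and nothing here filters out pairs that match a pattern but are not true isA relations.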
• Size: 10M concepts and 16M isA relations

Hearst patterns
• NP such as NP, NP, ..., and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP

Missing isA relationships in Probase
• "car" and "automobile" are synonyms
  • They should share hypernyms
  • "automobile" should be a "wheelbase vehicle"
• Missing isA relations hurt the understanding of the concepts of entities
  • Is Lincoln Zephyr a car?

Solution idea: CF-based missing isA inference
• User-based collaborative filtering!
  • Hypernyms --- Items
  • Concepts --- Users
  • Synonyms or siblings --- Similar users
• Concepts with similar meanings tend to share hypernyms/hyponyms in an isA taxonomy
• To find missing hypernyms for a concept c
  • First find c's synonyms and siblings
  • Then transfer their hypernyms to c
Idea: if most similar terms of c have h as a hypernym, c is likely to have the hypernym h.

Problems to be solved
• Effectiveness
  • Sparsity: How to design an effective similarity metric?
    • Noisy-or model amplifying the weak signals
  • Weight aware: How to estimate a frequency for the new isA relation?
    • Build a regression model
  • Diversity: How to select the final hypernyms?
    • Dynamically tuning k for the top-k selection
• Efficiency
  • How to reduce the quadratic complexity of pairwise similarity computation?
    • Upper-bound pruning

Results
• Recovers 5.1M missing edges, with precision 87% and recall 80%
• Probase+ has accuracy 91%
[Figures: case study; precision and recall]

Jiaqing Liang, Yi Zhang, Yanghua Xiao*, Haixun Wang, Wei Wang and Pinpin Zhu, On the Transitivity of Hypernym-hyponym Relations in Data-Driven Lexical Taxonomies (AAAI 2017)

Motivation
• We can use transitivity to find many missing isA relations
  • Example 1
• But it is not trivial: there are wrong cases
  • Examples 2 & 3
• If we can determine in which cases transitivity holds, we can generate many missing isA relations
• There are some examples; the "a isA c" relations are the missing isA relations that were found

... human-crafted taxonomies, transitivity in a lexical taxonomy is taken for granted, that is, given hyponym(A, B) and hyponym(B, C), we know hyponym(A, C) (Sang 2007), as shown in Example 1. Transitivity is thus one of the cornerstones in knowledge-based inferencing, and many applications rely on transitivity (e.g., finding all the super concepts of an instance).

Example 1. Is Einstein a scientist?
hyponym(einstein, physicist) ∧ hyponym(physicist, scientist) ⇒ hyponym(einstein, scientist)

Unfortunately, transitivity does not always hold in data-driven lexical taxonomies. Let us consider the following two examples:

Example 2. Is Einstein a profession?
hyponym(einstein, scientist) ∧ hyponym(scientist, profession) ⇏ hyponym(einstein, profession)

Example 3. Is a car seat a piece of furniture?
hyponym(car seat, chair) ∧ hyponym(chair, furniture) ⇏ hyponym(car seat, furniture)

It is obvious that Einstein is not a profession. However, in a data-driven lexical taxonomy such as Probase, we have strong evidence that hyponym(einstein, scientist) and hyponym(scientist, profession). If transitivity holds, we will draw a conclusion that conflicts with common sense. As for car seat and furniture, we are trapped in a similar situation. Thus, it is clear that transitivity does not always hold in d