Graph Data Management Lab at Fudan University (GDM@FUDAN)
Yunhai Hui (云海会), Shanghai, SAP Research Institute, 2012-11-20
Email: shawyh@fudan.edu.cn

Managing and Mining Knowledge Graphs – Challenges and Opportunities
Yanghua Xiao (肖仰华), Fudan University
GDM@FUDAN, http://gdm.fudan.edu.cn

Copyrights
• Fabian Suchanek & Gerhard Weikum, Knowledge Harvesting in the Big Data Era, SIGMOD 2013 Tutorials.
• Bin Shao, Haixun Wang, Yanghua Xiao, Managing and Mining Large Graphs: Systems and Implementations, SIGMOD 2012 Tutorials.
• Yanghua Xiao, Data Fusion and Management for Knowledge Graphs (面向知识图谱的数据融合与管理), workshop of the NSFC Key Program "Theory and Technology of Network Data Fusion", Suzhou, 2013-08-14.
• Yanghua Xiao, Chinese Knowledge Graph (中文知识图谱), Italk Salon, Shanghai, 2012.

Outline
• Preliminaries
• Opportunities
• Managing big knowledge graphs
• Building big knowledge graphs

What is a knowledge graph?
A knowledge graph contains entities/concepts as vertices and semantic relationships as edges.

What makes a knowledge graph different?
• Ontology
  – Domain dependent
  – Small scale
  – Edited by humans
• Semantic network
  – Focuses on concepts instead of entities
  – Low coverage
• Knowledge graph
  – Large scale
  – Covers both entities and concepts
  – Covers different semantic relationships
  – Automatically harvested from the Web or other large-scale corpora

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png
Web of Data: RDF, Tables, Microdata
60 billion
SPO triples (RDF) and growing
Cyc, TextRunner/ReVerb, WikiTaxonomy/WikiNet, SUMO, ConceptNet 5, BabelNet, ReadTheWeb

• 10M entities in 350K classes
• 120M facts for 100 relations
• 100 languages
• 95% accuracy

• 4M entities in 250 classes
• 500M facts for 6000 properties
• live updates

• 25M entities in 2000 topics
• 100M facts for 4000 properties
• powers Google knowledge graph

Ennio_Morricone type composer
Ennio_Morricone type GrammyAwardWinner
composer subclassOf musician
Ennio_Morricone bornIn Rome
Rome locatedIn Italy
Ennio_Morricone created Ecstasy_of_Gold
Ennio_Morricone wroteMusicFor The_Good,_the_Bad_,and_the_Ugly
Sergio_Leone directed The_Good,_the_Bad_,and_the_Ugly

Some Publicly Available Knowledge Bases
YAGO: yago-knowledge.org
DBpedia: dbpedia.org
Freebase: freebase.com
Entitycube: research.microsoft.com/en-us/projects/entitycube/
NELL: rtw.ml.cmu.edu
DeepDive: research.cs.wisc.edu/hazy/demos/deepdive/index.php/Steve_Irwin
Probase: research.microsoft.com/en-us/projects/probase/
KnowItAll / ReVerb: openie.cs.washington.edu, reverb.cs.washington.edu
PATTY: -inf.mpg.de/yago-naga/patty/
BabelNet: lcl.uniroma1.it/babelnet
WikiNet: -its.org/english/research/nlp/download/wikinet.php
ConceptNet: conceptnet5.media.mit.edu
WordNet: wordnet.princeton.edu
Linked Open Data: linkeddata.org

Google Knowledge Graph
• Source
  – CIA Factbook
  – Freebase
  – Wikipedia
• Current status
  – 500 million entities and more than 3.5 billion facts

Capture concepts in the human mind
Represent them in a computable form
Transform them to machines
Machines gain a better understanding of the human world

More than 2.7 million concepts automatically harnessed from 1.68 billion documents

Computation/reasoning enabled by scoring:
Consensus: e.g., is there a company called Apple?
Typicality: e.g., how likely are you to think of Apple when you think about companies?
Ambiguity: e.g., does the word Apple, sans any context, represent Apple the com