Introduction to Knowledge Graphs
Yanghua Xiao (肖仰华)
Knowledge Works Lab, Fudan University
shawyh@fudan.edu.cn
2017-7-13

Outline
• Origins
• Definition
• Value
• Classification

Origins
• Historical context
• Background
• Core advantages
    • Large scale
    • Semantically rich
    • High quality
    • Friendly structure

Historical context
• Artificial intelligence
• Knowledge engineering
• Knowledge representation
• Knowledge graph

AI (Artificial Intelligence): think, act, humanly or rationally
• "The exciting new effort to make computers think … machines with minds, in the full and literal sense." (Haugeland, 1985)
• "AI … is concerned with intelligent behavior in artifacts." (Nilsson, 1998)

KE (Knowledge engineering) is an engineering discipline that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise.

KR (Knowledge representation) is dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language.

KG (Knowledge graph) is a large-scale semantic network consisting of entities/concepts as well as the semantic relationships among them.

Background
• In May 2012, Google officially announced its Knowledge Graph
• Core need of search: let search lead to answers
    • Search engines could not understand search keywords
    • Search engines could not answer precisely
• Root problems
    • Lack of large-scale background knowledge
    • Traditional knowledge representation could not meet the need

KG advantage 1: large scale
• Higher coverage over entities and concepts

    KGs           # of Entities/Concepts    # of Relations
    YAGO          10 Million                120 Million
    DBpedia       28 Million                9.5 Billion
    Probase       2.7 Million               70 Billion
    BabelNet      14 Million                5 Billion
    CN-DBpedia    17 Million                200 Million

KG advantage 2: semantically rich
• Higher coverage over numerous semantic relationships

    KGs           # of Relations
    DBpedia       1,650
    YAGO1         14
    YAGO3         74
    CN-DBpedia    100 thousand

KG advantage 3: high quality
• High quality
    • Big data: cross-validation by multiple sources
    • Crowdsourcing: quality guarantee
[Yin et al., Truth Discovery with Multiple Conflicting Information Providers on the Web, KDD '07]

KG advantage 4: friendly structure
• Structured organization
    • By RDF
    • By graph

More and more knowledge graphs are emerging

    Date          Number of knowledge graphs
    2017-03-16    1,139
    2014-08-30    570
    2011-09-19    295
    2010-09-22    203
    2009-07-14    95
    2008-09-18    45
    2007-11-07    28
    2007-05-01    12

Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak.
Examples: WordNet, Freebase, Probase, NELL, Cyc, DBpedia, …

Definition
• Components of a KG
    • Nodes
        • Entity
        • Concept
        • Value
    • Edges
• Representation of a KG
    • Logical representation
    • Physical representation

KG components - Node - Entity
• Entity/Objects/Instances
• Wikipedia: An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not.
• Hegel, Shorter Logic (《小逻辑》): that which can exist independently and serves as the basis of all attributes and the origin of all things

KG components - Node - Concept
• Concept
    • In metaphysics, and especially ontology, a concept is a fundamental category of existence.
    • (Mental) representations of categories
• Category
    • Groups of entities which have something in common
• Type/class
    • Wiktionary: A grouping based on shared characteristics; a class.
• Categorization
    1. The process of formation of categories
    2. The process of identifying X as a member of a particular category Y
[Figures: DBpedia Types; Probase Categories]

KG components - Node - Value
• Date
    • Trump's date of birth: June 14, 1946
• String
    • Trump's summary: "Donald Trump, the 45th President of the United States, born June 14, 1946 in New York, a politician of the Republican Party"
• Numeric
    • Trump's age: 71

KG components - Edge
• Relation
    • Focuses on relationships between entities (individuals)
    • Examples:
        • Sitting-On: An apple sitting on a table
        • Taller-than: The Washington Monument is taller than the White House
• Property/Attribute/Quality
    • A characteristic/quality that describes an object
    • Examples:
        • Size, color, weight, composition, and so forth, of an object

Models of Knowledge Graph
• Entities
    • Concepts
    • Instance
    • Value
• Relationships
    • IsA
    • Co-occurrence
    • Synonyms
    • Others …

What is a graph?
• Knowledge graph
    • A collection of entities and the relationships between them
    • Entity
    • Relationships
• Euler
    • Seven Bridges of Königsberg

Models of graphs
[Figure labels: Entities, Relationships, Graph]
• Weighted graphs
• Directed graphs
• Probabilistic graphs
• Evolving graphs

Notations
• Vertices/Nodes
• Edges/arcs
• Neighbors of a vertex
• Degree of a vertex
• Subgraph
• Shortest path
• Example graph

Representation of a graph
• Adjacency list
    • Space-efficient on sparse graphs
• Matrix

RDF: Resource Description Framework
• A framework (not a language) for describing resources, recommended by W3C
• Facilitates reading and correct use of information by computers, not necessarily by people
• Resource, Property, Property Value = Subject, Predicate, Object of a statement
• RDF identifies resources with URIs
• RDF offers only binary predicates.
    • Think of them as P(x, y), where P is the relationship between the objects x and y.
    • From the example,
        • X = …
        • Y = Jan Egil Refsnes
        • P = author

RDF representations

    <?xml version="1.0"?>
    <rdf:RDF
      xmlns:rdf="…"
      xmlns:cd="…">
    <rdf:Description rdf:about="…">
      <cd:artist>Bob Dylan</cd:artist>
      <cd:country>USA</cd:country>
      <cd:company>Columbia</cd:company>
      <cd:price>10.90</cd:price>
      <cd:year>1985</cd:year>
    </rdf:Description>
    <rdf:Description rdf:about="…">
      <cd:artist>Bonnie Tyler</cd:artist>
      <cd:country>UK</cd:country>
      <cd:company>CBS Records</cd:company>
      <cd:price>9.90</cd:price>
      <cd:year>1988</cd:year>
    </rdf:Description>
    …
    </rdf:RDF>

• rdf:RDF: root element of RDF documents
• xmlns:rdf: source of namespace for elements with the rdf prefix
• xmlns:cd: source of namespace for elements with the cd prefix
• The rdf:Description element describes the resource identified by the rdf:about attribute.
• cd:country etc. are properties of the resource.

Value
• Theoretical significance
    • Knowledge is power in AI
    • From strings to things
    • Accurate big data analytics
    • Enable smart robot brain
    • Knowledge-powered tasks
• Application value
    • Search
    • Recommendation
    • Question answering
    • Healthcare
    • Risk control

Knowledge is power in AI
• AI system = knowledge + reasoning
• Edward Feigenbaum: father of expert systems
• "Knowledge is power, and the computer is an amplifier of that power. We are now at the dawn of a new computer revolution... Kn