Exploratory Analysis of Concept and Document Spaces with Connectionist Networks

Dieter Merkl
Institute of Software Technology, Vienna University of Technology
Resselgasse 3/188, A-1040 Vienna, Austria
E-mail: dieter@ifs.tuwien.ac.at

Erich Schweighofer
Institute of Public International Law, University of Vienna
Universitätsstraße 2, A-1090 Vienna, Austria
E-mail: Erich.Schweighofer@univie.ac.at

Werner Winiwarter
Department of Information Science, Kyoto University
Sakyo, Kyoto, 606-01 Japan
E-mail: ww@kuis.kyoto-u.ac.jp

Abstract. Exploratory analysis is an area of increasing interest in the computational linguistics arena. Pragmatically speaking, exploratory analysis may be paraphrased as natural language processing by means of analyzing large corpora of text. For the analysis itself, appropriate means are statistics on the one hand and artificial neural networks on the other. As a challenging application area for the exploratory analysis of text corpora we may certainly identify text databases, be they information retrieval or information filtering systems. In this paper we present recent findings of exploratory analysis based on both statistical and neural models applied to legal text corpora. Concerning the artificial neural networks, we rely on a model adhering to the unsupervised learning paradigm. This choice arises naturally from the specific properties of large text corpora: the input-output mappings required by supervised learning models cannot be provided beforehand to a satisfactory extent, because the contents of text archives change rapidly. In a nutshell, artificial neural networks are distinguished by their highly robust behavior with respect to the parameters for model optimization. In particular, we found statistical classification techniques much more susceptible to minor parameter variations than unsupervised artificial neural networks. In this paper we describe two different lines of research in exploratory analysis. First, we use the classification methods for concept analysis. The general goal is to uncover different meanings of one and the same natural language concept, a task that is obviously of particular importance during the creation of thesauri. As a convenient environment to present the results we selected the legal term of "neutrality", which is a perfect representative of a concept having a number of highly divergent meanings. Second, we describe the classification methods in the setting of document classification. The ultimate goal in such an application is to uncover semantic similarities of various text documents in order to increase the efficiency of an information retrieval system. In this sense, document classification has held a fixed position in information retrieval research from the very beginning. Nowadays, a massive renewal of interest in document classification may be witnessed due to the appearance of large-scale digital libraries.

The authors are affiliated with the Research Center for Computers and Law at the Institute of Public International Law.

Keywords: Legal Information Retrieval, Artificial Neural Networks, Unsupervised Learning, Exploratory Data Analysis, Natural Language Processing, Vector Space Model

1. Introduction

The information crisis in law [Simitis 1970] was the impetus for the development of legal information retrieval systems. As a result of an initial large-scale effort, a number of information retrieval systems with sufficient coverage of the underlying text corpora have been developed. Lawyers, however, need much more than just a documentation of the various legal acts of the relevant jurisdiction. Rather, the material has to be organized in a systematic manner in the form of a legal commentary. In this regard, we have to admit that the quality bar for useful systems is set rather high, given more than 2500 years of intellectual experience. This experience, obviously, represents the baseline against which improvements have to be measured.

The major challenge for such systems is finding an efficient way to formalize legal knowledge. A number of different approaches can be distinguished; some of the more influential ones are outlined below.

The first solution was to utilize various document types and fields representing semantic knowledge as a tractable means to represent deep structure in legal information retrieval systems [Schweighofer 1995]. Two major problems remain unsolved: first, users are not experienced enough to deal with these difficult but efficient search algorithms, and second, the documents have to be indexed manually, which makes development highly time-consuming. Similar problems occur when adhering to a knowledge-based approach [Bing 1987, Cross and Bessonet 1985] towards legal information representation. Here, too, the knowledge base has to be constructed manually, a time-consuming task regardless of the actual mechanism for encoding the semantics of legal concepts, i.e., irrespective of the particular knowledge representation technique, be it semantic networks [Paice 1991], conceptual graphs [Dick 1991], concept frames [Hafner 1981], diagnostic expert systems [Merkl et al. 1992], object-oriented programming [Mital et al. 1991], or case-based reasoning [Ashley 1990]. The other side of the coin, obviously, is the improved capability of the overall system with respect to retrieval efficiency.

ailj.tex; 21/04/1997; 15:28; no v.; p.2

Neural networks have attracted some attention for the encapsulation of legal knowledge, possibly because knowledge-based approaches have met with only limited success. Two main streams of research may be observed. First, neural networks are trained to represent vague concepts according to some