ExactandApproximateAggregationinConstraintQueryLanguagesMichaelBenediktBellLaboratories263ShumanBlvdNaperville,IL60566E-mail:benedikt@research.bell-labs.comLeonidLibkinBellLaboratories600MountainAvenueMurrayHill,NJ07974E-mail:libkin@bell-labs.comAbstractWeinvestigatetheproblemofhowtoextendconstraintquerylanguageswithaggregateoper-ators.Wedealwithstandardrelationalaggregation,andalsowithaggregatesspeci ctospatialdata,suchasvolume.Westudyseveralapproaches,includingtheadditionofanewclassofap-proximateaggregateoperatorswhichallowanerrortoleranceinthecomputation.WeshowhowtechniquesbasedonVC-dimensioncanbeusedtogivelanguageswithapproximationoperators,butalsoshowthattheselanguageshaveanumberofshortcomings.Wethengiveasetofresultsshowingthatitisimpossibletogetconstraint-basedlanguagesthatadmitde nableaggregationoperators,bothforexactoperatorsandforapproximateones.Theseresultsarequiterobust,inthattheyshowthatclosureunderaggregationisproblematicevenwhentheclassoffunctionspermittedinconstraintsisexpanded.Thismotivatesadi erentapproachtotheaggregationproblem.WeintroducealanguageFO+Poly+Sum,whichpermitsstandarddiscreteaggregationoperatorstobeappliedtotheoutputsofrange-restrictedconstraintqueries.Weshowthatthislanguagehasanumberofattractiveclosureandexpressivityproperties,andthatitcancomputevolumesoflinear-constraintdatabases.1IntroductionNewapplicationsofdatabasetechnology,suchasGeographicalInformationSystems,havespurredaconsiderableamountofresearchintogeneralizationsofthestandardrelationalmodeltodealwiththemanipulationofgeometricorspatialdata.Onecommonapproachtomodelingspatialdatabaseistoconsiderinputdatabasesasgivenbyasetofwell-behavedrelationsineuclideanspace{forexample,byasetofsemi-linearorsemi-algebraicsets.Thereareanumberofproposedquerylanguagesthatextendclassicalrelationalalgebratothissetting,languagesthatallowtheuseofvariousgeometricoperationsinmanipulatingspatialdatabases.Oneofthemostwell-developedmodelsforspatialqueriesistheconstraintdatabasemodel[23].Inthismodel,spatialdatabasesarerepresentedassetsoflinearorpolynomialconstraints.Databasesarequeriedusingstandardrelationalcalculuswithlinear(resp.polynomial)inequalitiesasselectioncriteria,see[4,5,6,19,20,32,38].Theselanguages,denotedbyFO+LinandFO+Poly,havebecomethedominantonesintheconstraintdatabaseliterature.Theyhaveaveryimportantclosureproperty:theapplicationofaFO+Linquerytoalinearconstraintsetyieldsanewsetoflinearconstraints;similarlyFO+Polyqueriesonsetsde nablewithpolynomialconstraintsproducesetsthatcanstillbede nedwithpolynomialconstraints.1ConstraintQueryLanguages,then,giveanaturalanalogofrelationalcalculusinthegeometriccontext.Acrucialquestion,though,concernshowtoextendstandardaggregationconstructsfromtherelationalmodeltothegeometricsetting.Thisquestionhastwocomponents.First,wewouldlikeourlanguagestobeabletoapplystandardSQLoperatorssuchasTOTALandAVGtospatialqueries,whenevertheseoperatorsmakesense.Sincetheoutputofqueriesinconstraintquerylanguages(andinotherspatialquerylanguages)maybemerely nitelyrepresentable(thatis,representablebysome nitemeans,e.g.,a nitesetofconstraints)andnot nite,theaggregationoperatorscannotbeallowedtobeappliedtoanyconstraintqueryoutput.Oneproblemthen,istodesignalanguagethatallowsthesafeapplicationofclassicalaggregates.Thesecondcomponentofthe‘aggregationquestion’concernsaggregationnotionsthatarespeci ctothespatialdatabases.Mostcommonly,givenadatabaseandtheoutputofaqueryoverit,onewishestoformnewqueriesaboutthevolumeofthisoutput.OnemayalsoextendstandardaggregatessuchasAVG,andaskfortheaveragevalueofapolynomialoveraspatialobject.SuchaggregatesarisebothfrompracticalconcernsofGIS,andalsoasthenaturalcontinuousanalogsofclassicalaggregationqueries.Thus,wewouldliketoextendconstraintquerylanguagestoincorporatetheabilitytocalculatevolumesandotheraggregatesarisinginthespatialsetting.Inattemptingtoaddaggregationtoconstraintquerylanguages,oneimmediatelyencounterssomedauntingobstacles.Whilestandardconstraintdatabasesareclosedunder rst-orderoperationssuchasjoinandprojection,theyareclearlynotclosedundertakingofvolumes.Thisfactiswell-knownintheliterature[23,27,12],andstemsfromthefactthatneitherthesemi-linearnorsemi-algebraicsetsareclosedunderintegrals.Totakeanexamplefromthesemi-algebraicsetting,aqueryaskingforthevolumeofinitialslicesoftheepigraphof1=xoutputsthegraphofthelnfunction,whileiteratingvolumequeriesinthisfashionwouldgiveasoutputtranscendentalfunctionsthatarenotevenexpressibleusing eldoperations,logarithmsandexponents.Thus,onecannothopetoaddageneralvolumeoperatortoexisting rst-orderconstraintquerylanguagessuchasFO+Polyandgetaclosedlanguagewhilestillremainingwithinthedomainofpolynomialconstraintdatabases.Thereareseveralapproachestothevolumeproblemmentionedabove.First,onecouldweakentherequirementthatvolumesbecomputedexactlyandinsteadaimonlytocomputeapproximatevolumes.Thusaquerymighthaveatoleranceassociatedwitheachinstanceofavolumeoperator,withoutputrequiredonlytobecorrectwithinthegiventolerance.Thereareanumberofpracticalandtheoreticalmotivationsfo