Tailoring a broad-coverage grammar for the analysis of dictionary definitions

266 EURALEX '92 PROCEEDINGS

[...] for each dictionary definition. There are two main reasons for parsing the definition. First, it is possible to abstract away from variations in the surface realization of the same pattern, which exist despite the regularity typical of a dictionary. Second, the results are expected to be more reliable. We can specify the level of embedding at which the defining formula is to be found, rather than accepting the defining formula wherever it occurs; moreover, since the actual extraction process consists of identifying the relevant complements of the defining formulae, accessing the structural information again yields more reliable results (see Montemagni & Vanderwende for a discussion of string patterns versus structural patterns). During the second stage, a pattern-matching mechanism maps structural patterns onto the syntactic analysis computed at the previous stage, thereby deriving, and making explicit, the semantic knowledge implicitly stored in any standard printed dictionary. The general framework we are using and tailoring for our purposes was originally developed by Jensen and Binot for acquiring the semantic information necessary to resolve prepositional phrase attachment ambiguities (Jensen & Binot 1987). Others have also been using syntactic analyses and structural patterns for some time: Klavans (1990), Ravin (1990) and Vanderwende (1990), all of whom use the PLNLP English Parser to provide the structural information.

This paper focuses on the first stage of the extraction process: computing a syntactic analysis for each dictionary definition. This stage is crucial: it creates the data structures on which later processing stages operate, and thereby determines the quantity and quality of the information that can be extracted. We first experimented with the syntactic analyses as computed by a broad-coverage Italian grammar. With this grammar it is possible to produce on average one parse per definition. In addition, by setting a switch it is possible to force a parse of any desired category (NP, VP, etc.), and even if no parse is available for the entire string, pieces can be assembled, or "fitted", together, so that there will always be some analysis for any given string. Although the output of the broad-coverage grammar was already adequate, we chose to add a post-processor that would modify and improve the parses based on the peculiarities of the dictionary text type. The post-processor thus captures the differences between general text and dictionary text, a contrastive study that would not be possible if a dictionary-specific parser were constructed. We found that the post-processor is a very small component compared to the broad-coverage grammar, reflecting our preliminary observation that the constructions found in dictionaries are as complex as, and not very different from, those of general text. Using a broad-coverage grammar followed by a small post-processor provides a robust parsing tool that is both efficient with respect to the reusability of components and interesting for contrastive reasons.

2 Syntactic parsing

The broad-coverage Italian grammar used for this study was written following the general strategy called the "relaxed approach", aimed at accepting unrestricted input text (Jensen 1986, 1988, 1989). Sentences are analyzed according to syntactic information formalized in augmented phrase structure rules, with a bottom-up, parallel parsing algorithm producing an attribute-value analysis structure that can be displayed as a parse tree (Heidorn 1975). The lexicon supporting this analysis contains very limited information: parts of speech, morphology and essential word-class features. A grammar constructed in this way computes preliminary syntactic sketches that are syntactically consistent but not necessarily semantically valid. The analyses contain syntactic and, whenever possible, functional information, but no semantic or other information beyond the functional level. In Italian, in some cases even the functional roles cannot be assigned on the basis of purely syntactic information, but only after background semantic and/or contextual information has been acquired and evaluated within the initial [...]

Montemagni: Tailoring a broad-coverage grammar 267

The analysis of a sentence using only syntactic information may contain many ambiguities. We have just mentioned the ambiguity of assigning functional roles; attaching modifiers to their appropriate heads is the other main source of ambiguity. Both kinds of ambiguity are dealt with by packing the different syntactic descriptions into the same structure whenever possible. For attachment ambiguity, modifiers are attached to the closest possible head, and alternative attachment sites are marked so that they can be tracked down during later semantic processing. For functional ambiguity, the possible interpretations are coded within the same structure so that they are ready for further processing stages. This is why we usually think of the resulting analysis as a syntactic sketch. This attachment and assignment strategy, which allows the grammar to produce on average one parse per sentence, avoids combinatorial explosion while preserving all the necessary information.

3 Parsing dictionary definitions with a broad-coverage Italian grammar

The first question to be answered at this point is whether, and how well, dictionary definitions can be analyzed by a general-purpose grammar. The formulaic language of dictionary text mentioned above reflects the frequent occurrence of lexical and syntactic patterns expressing particular conceptual categories or semantic relations, and the higher frequency of generic defining terms (see Calzolari 1984). These formulae, however crucial to the extraction of semantic information, can be considered almost irrelevant from the point of view of parsing, because the syntactic constructions in which they are manifested are as varied as those found in text corpora. The same holds for the vocabulary used within definitions since, unfortunately, none of the Italian dictionaries uses a restricted defining vocabulary, unlike the LONGMAN DICTIONARY OF CONTEMPORARY ENGLISH. These two factors combined make a robust analysis even more necessary: such a variety of lexical choices and phrasal constructions means that dictionary definitions pose the same range of problems a parser faces in analyzing ordinary texts.

Dictionary text does, of course, differ from general text in some predictable ways. First, and most obviously, the definition text rarely forms a complete sentence. Fortunately, the syntactic form of the definition text is largely predictable from the part of speech of the definiendum. It is therefore important that the parser provide a switch indicating whether the input should be parsed as a nominal, verbal, adjectival, adverbial or prepositional phrase, or as a relative clause, depending on the part of speech of the definiendum and on the definition itself.

For example, the parse trees in Figure 1 show the parse of a very simple definition in Garzanti for the noun arancia 'orange', before and after setting the switch that forces an NP analysis. The definition reads frutto dell'arancio 'fruit of the orange tree'.

[Figure 1: Parse trees for the definition text frutto dell'arancio (panels: Before, After).]
In the first parse, made without the NP switch, the definition has been analyzed as a complete sentence with an empty subject, headed by the verb fruttare 'to yield' (in the financial field) and followed by the noun phrase dell'arancio as object. According to this analysis, the string would be translated as 'I yield some orange tree'. While this analysis is syntactically valid, it is not semantically well-formed, and this interpretation can be ruled out only on the basis of semantic information. First, it is very unlikely, if not impossible, for the noun arancio to be the object of the verb fruttare. Second, the partitive determiner dello cannot premodify a countable singular noun. The more appropriate second parse is obtained by forcing the analysis of the input string to be an NP. Thus, although a sentential parse is possible, the correct NP parse is computed, given that this text is the definition text of a noun definiendum. The category switch is an essential tool because it allows the correct phrasal parse to be computed without any modification to the broad-coverage grammar.

Second, definition text also differs from general text because it does not always form even a complete phrase, but often only fragments of phrases, e.g. obligatorily transitive verbs without objects. It is therefore necessary that the parser provide a form of fitted parsing (Jensen et al. 1983) for handling fragments, as well as gaps in the grammar itself, to ensure that it never fails to produce an analysis. Fitted parsing is accomplished by a set of procedures which assign a reasonable approximate structure to the input when no parse covering the entire string can be computed. Such a rough parse is still useful as input to further processing stages and to the extraction procedure itself, even if the results of this extraction have to be treated differently from those derived from a complete analysis. Examples of the results of the fitting procedure applied to dictionary definitions will be given in the following section.

Using only the broad-coverage Italian grammar and the parser described above, we began parsing definition text extracted from IL NUOVO DIZIONARIO GARZANTI and the ITALIAN DMI DATABASE (mainly based on the Zingarelli dictionary). It did not take long to identify two main areas of the grammar which needed to be tailored to give more appropriate parsing results for dictionary text:

1. Resolution of ambiguous assignment

The default strategy for attachment ambiguity, namely attachment to the closest possible head, should sometimes be changed for dictionary text. In this way, some attachments which would remain ambiguous in ordinary texts can be disambiguated in the context of dictionary definitions. This is the case, for instance, with the attachment of post-modifiers to coordinated genus terms of certain classes. Similarly, functional role assignment, which is ambiguous in general text, can almost always be disambiguated in the context of dictionary definitions. We assume that constructions used within dictionary definitions are always in unmarked SVO order, and that the ambiguity stemming from potentially marked orderings of sentence constituents, such as SOV, OVS and so forth, is very unlikely to occur in this specific context.

2. Analysis of specific dictionary language constructions

Definition texts should be seen as fragments of wider text corpora. Very rarely do they appear as complete sentences, in which case they are exceptions within the definition language. They are usually formulated as NPs, VPs, AdjPs, AdvPs, PPs or relative clauses, but because they are condensed fragments of real texts, obligatory elements are sometimes elided, which makes the definition syntactically ill-formed and interpretable only by reference to a wider context.
From this perspective, what a general grammar treats as syntactic deviance is a typical occurrence within dictionary definitions. Such is the case with noun definitions formulated as a noun phrase premodified by a prepositional phrase, where the PP specifies the usage domain of the word sense expressed by the NP. Because a PP + NP construction, with the PP premodifying the NP, is a syntactically deviant order within the core grammar of Italian, the grammar is unable to produce an NP node covering the whole input string, in spite of the switch forcing an NP analysis.

These observations call for a revision of the grammar output, in order to make the extraction of semantic information from natural language definitions more efficient and reliable. This revision has been carried out (a) by ruling out some ambiguous constructions and (b) by handling and regularizing otherwise ill-formed input. The next section will describe when and how these tasks are performed in relation to the whole parsing [...]

4 Disambiguating and reshaping the syntactic analysis of the definitions

We decided not to intervene in the general grammar itself, which in our opinion should remain restricted to the description of the central, agreed-upon grammatical structures of language. Instead, the initial syntactic analysis is revised during a post-processing stage. The disambiguation task is carried out by a module specifically conceived for this purpose, the Dictionary Definition Disambiguator (DDD). This component, still in an embryonic stage, has been designed to resolve, whenever possible, what was left undecided during the first stage of processing. The task of reshaping incomplete parses is handled by modifying the fitting procedure to deal properly with constructions that are ill-formed but, in the context of dictionary language, common. This minor addition to the overall architecture of the general parsing system led to a marked improvement in the parsing results. Since these parses are the input to the c[...]
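Two of the post-processing tasks discussed above, resolving packed functional-role ambiguity under the unmarked SVO assumption and reshaping a fitted PP + NP sequence into a single NP, can be sketched on a simplified node list. This is a hedged illustration rather than the DDD itself: the Node representation, the SUBJ/OBJ labels, the domain field and all function names are assumptions, the Italian inputs are invented, and the rule for post-modifiers of coordinated genus terms is left out because its details are not given here.

```python
# Hypothetical post-processing rules; not the DDD implementation.
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    cat: str                        # syntactic category: "NP", "PP", "VERB", ...
    text: str                       # surface string covered by the node
    roles: frozenset = frozenset()  # packed candidate functional roles
    domain: Optional[str] = None    # usage-domain label set by reshaping


def resolve_roles_svo(nodes: List[Node]) -> List[Node]:
    """Resolve packed SUBJ/OBJ ambiguity assuming unmarked SVO order: an
    ambiguous NP before the verb is read as subject, one after it as object."""
    out, verb_seen = [], False
    for n in nodes:
        if n.cat == "VERB":
            verb_seen = True
        elif n.cat == "NP" and {"SUBJ", "OBJ"} <= n.roles:
            n = replace(n, roles=frozenset({"OBJ" if verb_seen else "SUBJ"}))
        out.append(n)
    return out


def reshape_domain_np(pieces: List[Node]) -> List[Node]:
    """Merge a fitted PP + NP sequence into one NP whose domain is the PP."""
    if len(pieces) == 2 and pieces[0].cat == "PP" and pieces[1].cat == "NP":
        pp, np_ = pieces
        return [replace(np_, text=f"{pp.text} {np_.text}", domain=pp.text)]
    return pieces


def post_process(pieces: List[Node], pos: str) -> List[Node]:
    """Reshape noun definitions first, then disambiguate functional roles."""
    if pos == "noun":
        pieces = reshape_domain_np(pieces)
    return resolve_roles_svo(pieces)


# Invented inputs, not taken from Garzanti or the DMI database.
print(post_process([Node("PP", "in medicina"), Node("NP", "infiammazione acuta")], "noun"))
print(resolve_roles_svo([Node("VERB", "produce"),
                         Node("NP", "frutti", frozenset({"SUBJ", "OBJ"}))]))
```

Reshaping is applied before role resolution so that later stages see a single NP covering the whole string; that ordering is a choice made for this sketch, not something the text specifies.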
