Model of Vietnamese linking grammar - 8

After studying and drawing out the features of this language model, the thesis will focus on solving the following problems:

  • Parsing problem. Parsing must be addressed whenever a new syntactic representation model is built.
  • Machine translation problem. The linking grammar model captures many distinctive features of Vietnamese that must be transformed when translating into another language. The thesis therefore chooses Vietnamese-English translation, to take advantage of the model's ability to represent direct relationships between words.



2.1. Linking grammar for Vietnamese

From the formal definition of a linking grammar, it can be seen that the most important task in building a grammar is to map words to linking nodes.

While the elementary unit for parsing in some languages is the morpheme, in Vietnamese that unit is the word. According to documents of the Social Science Committee [28], each Vietnamese word can consist of several morphemes. Word boundaries in the text are detected by an automatic word segmenter.
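The section does not say which segmenter the thesis uses. Purely as an illustration of what word boundary detection involves, here is a minimal sketch of greedy longest-match segmentation over whitespace-separated syllables. The function name `segment`, the `max_len` parameter, and the toy `LEXICON` are assumptions made for this example and are not taken from the thesis.

```python
def segment(text, lexicon, max_len=4):
    """Greedy longest-match word segmentation (illustrative sketch only).

    Vietnamese text is written as space-separated syllables, and one word
    may span several syllables, so syllables are grouped into the longest
    sequence found in the lexicon.
    """
    syllables = text.split()
    words, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, falling back to a single syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon, assumed for this example only.
LEXICON = {"sinh viên", "tiếng Việt", "học"}

print(segment("sinh viên học tiếng Việt", LEXICON))
```

Practical segmenters usually add statistical disambiguation on top of dictionary matching; the sketch shows only the basic idea that the parser's input units are words rather than syllables.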

Vietnamese differs from many other languages. Semantically, no meaning is carried by morphological categories (such as gender, number, or case); in sentence formation, grammatical relationships are expressed not through inflection but through word order [16]. The links of a linking grammar can represent these relationships well.

Links appear when words are combined. According to Nguyen Tai Can [2], there are three main types of combination: coordination, clause, and phrase. Coordination and clauses are considered in the more complex parsing steps, which are covered in the next chapter. A phrase is a combination consisting of a head (center) connected to its subordinate elements by a head-subordinate relation [2]. Depending on the type of head, phrases are divided into noun phrases, verb phrases, and adjective phrases. The links are built on the structure of these phrases. In addition, some relationships are not marked by any relation word: for example, “my mother” and “Ao anh” are each formed from two nouns placed side by side, where the second noun indicates the owner of the first. This is one of many special phenomena of Vietnamese syntax. Representing these relationships explicitly gives effective support to a machine translation system whose source language is Vietnamese.

All linking cases will be stored in the linking grammar dictionary.

2.1.1. Link dictionary structure

The English linking grammar dictionary was built by Sleator and Temperley [111]. In 2003, Szolovits added a set of medical terms [113]. From 2008 to 2011, the dictionary was updated by Linas Vepstas, who added clause relations; Mike Ross also added some new entries, mainly related to subordinate clauses with the word “than” and to linking words of the “wh” form [137].

The system is divided into 12 large sections with 7 categories for English words: nouns, determiners, pronouns, verbs, adjectives, adverbs and prepositions. Also included are the following items:

  • Number formats.
  • Words indicating time and place.
  • Conjunctions, question words.
  • Comparison words.
  • Punctuation, other words.

To keep the dictionary easy to organize, [111] defines a notation for writing the formulas that express the linking rules:

Link direction:

A “+” after a link name means the link attaches only to a word on the right.

A “-” after a link name means the link attaches only to a word on the left.

Operators:

&: both component links must occur together.

or: either component link, or both, may occur.

xor: exactly one of the two component links is chosen. The thesis adds this operator to the Vietnamese parser for cases where only one of two ways of linking is allowed; for example, the word “beautiful” can form “very beautiful” or can combine with “wonderful”, but it cannot take both modifiers at once.

{C}: C is optional; it may or may not appear.

@C: several links of type C may occur; for example, in the phrase “the cute red hat”, the two adjectives “cute” and “red” both modify the noun “hat”.

Macro: macros can be defined to make formulas more concise and easier to read; for example, a macro that defines a clause:

: {({@COd-} & (C- or )) or ({@CO-} & (Wd- & {CC+})) or [Rn-]};

In subsequent formulas, every occurrence of the expression on the right-hand side can be replaced by the macro name.
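To make the notation concrete, the sketch below expands a formula into the set of link combinations it allows, following the operator descriptions given in this section (including the thesis's reading of “or” as “either or both” and its added “xor”). It is an illustrative sketch under stated assumptions, not the thesis's parser: the tuple encoding, the function names `expand` and `direction`, and the connector names `RD-`, `AM+`, `D-`, and `A-` are all hypothetical.

```python
from itertools import product


def direction(connector):
    """Return the side a connector links to: '+' means right, '-' means left."""
    return "right" if connector.endswith("+") else "left"


def expand(formula):
    """Return the set of connector tuples that a formula allows.

    A formula is either a connector string such as "A-" or "@A-", or a
    tuple (operator, operand, ...) with operator in
    {"and", "or", "xor", "opt"}.
    """
    if isinstance(formula, str):           # a single connector
        return {(formula,)}
    op, *args = formula
    parts = [expand(a) for a in args]
    if op == "opt":                        # {C}: C may or may not appear
        return {()} | parts[0]
    if op == "and":                        # &: every component must be used
        return {sum(combo, ()) for combo in product(*parts)}
    if op == "xor":                        # xor: exactly one component
        return set().union(*parts)
    if op == "or":                         # or: either component, or both
        left, right = parts
        both = {l + r for l, r in product(left, right)}
        return left | right | both
    raise ValueError(f"unknown operator: {op}")


# Hypothetical connector names, used only for illustration.
beautiful = ("xor", "RD-", "AM+")               # one modifier or the other
hat = ("and", ("opt", "D-"), ("opt", "@A-"))    # {D-} & {@A-}

for name, formula in [("beautiful", beautiful), ("hat", hat)]:
    print(name, sorted(expand(formula)))
```

With this encoding, replacing “xor” by “or” in the formula for “beautiful” would additionally allow the combination that uses both connectors, which is exactly the case the thesis's “xor” operator is meant to exclude.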

The Vietnamese link dictionary has the same structure as the English link dictionary: each formula is defined for words of the same type. According to [16], Vietnamese words are divided into the categories shown in Table 2.1 below:

Table 2.1. Types of Vietnamese words

No.  Type code  Type name
1    N          noun
2    V          verb
3    A          adjective
4    M          numeral
5    P          pronoun
6    R          adverb
7    E          preposition
8    C          conjunction
9    I          auxiliary word
10   O          interjection
11   D          determiner
12   Z          word elements (real, no, etc.)
13   X          unknown


Words are further divided into subcategories. Table 2.2 below lists these subcategories, which follow the hierarchy of [16] and add a number of subcategories needed to distinguish links during translation in the thesis's machine translation system.

Table 2.2. Vietnamese word subcategories

No.  Symbol  Type code  Subtype name
1    Np      N          proper noun
2    Nc      N          monosyllabic noun
3    Ng      N          collective noun
4    Na      N          abstract noun
5    Ns      N          classifier noun
6    Nu      N          unit noun
7    Nl      N          position noun
8    Vi      V          intransitive verb
9    Vt      V          transitive verb
10   Vs      V          stative verb
11   Vm      V          modal verb
12   Vr      V          relational verb
13   Ap      A          adjective
14   Ar      A          relational adjective
15   Ao      A          onomatopoeic adjective
16   Ai      A          pictographic adjective
17   Mc      M          cardinal numeral
18   Mo      M          ordinal numeral
19   Pp      P          pronoun of address
20   Pd      P          demonstrative pronoun
21   Pq      P          quantity pronoun
22   Pi      P          interrogative pronoun
23   Rt      R          adverb of present time
24   Rp      R          adverb of past time
25   Rf      R          adverb of future time
26   Rl      R          adverb of degree
27   Rc      R          comparative adverb
28   Ra      R          affirmative adverb
29   Rn      R          negative adverb
30   Rs      R          adverb of range
31   Es      E          preposition of range
32   Ep      E          preposition of position
33   Eo      E          possessive preposition
34   Em      E          preposition of material
35   Eg      E          preposition of purpose
36   Cs      C          subordinating conjunction
37   Cc      C          coordinating conjunction
38   I       I          auxiliary word
39   O       O          interjection
40   Dp      D          quantity determiner
41   Dp      D          plural determiner
42   Ds      D          singular determiner
43   Z       Z          word elements (real, no, etc.)
44   X       X          unknown
