After studying this grammar model and identifying its characteristic features, the thesis focuses on the following problems:
- Parsing. Any new model of syntactic representation must come with a parser.
- Machine translation. The link grammar model captures many distinctive features of Vietnamese that must be transformed when translating into another language. The thesis therefore takes up Vietnamese-English translation, exploiting the model's ability to represent direct relationships between words.
CHAPTER 2
VIETNAMESE LINK GRAMMAR MODEL
2.1. Link grammar for Vietnamese
From the formal definition of a link grammar, it follows that the most important task in building a grammar is to associate each word with its connectors.
In some languages the elementary unit of parsing is the morpheme; in Vietnamese it is the word. According to the Social Science Committee [28], a Vietnamese word may consist of several morphemes. Word boundaries in the text are detected by an automatic word segmenter.
Vietnamese differs from many other languages: it has no morphological categories (gender, number, case), and grammatical relations in a sentence are expressed by word order rather than by inflection [16]. The links of a link grammar are well suited to representing these relations.
Links appear when words combine. According to Nguyen Tai Can [2], there are three main types of combination: coordination, clause, and phrase. Coordination and clauses are considered in the complex parsing steps covered in the next chapter. A phrase is a combination in which a center is connected to its subordinate elements by a head-subordinate relation [2]. Depending on the type of center, phrases are divided into noun phrases, verb phrases and adjective phrases. The links are built on the structure of these phrases. In addition, some relations are not marked by any relational word. For example, in “my mother” and “Ao anh”, two nouns stand side by side and the second noun indicates the owner of the first. This is one of many special phenomena of Vietnamese syntax. Representing such relations effectively supports a machine translation system whose source language is Vietnamese.
All linking cases are stored in the link grammar dictionary.
2.1.1. Link dictionary structure
The English link grammar dictionary was built by Sleator and Temperley [111]. In 2003, Szolovits added a set of medical terms [113]. From 2008 to 2011, Linas Vepstas updated the dictionary with clause relations, and Mike Ross added new entries, mainly for subordinate clauses with the word “than” and for words of the “wh” form [137].
The dictionary is divided into 12 large sections. Seven of them cover the English word classes: nouns, determiners, pronouns, verbs, adjectives, adverbs and prepositions. The remaining sections are:
- Number formats.
- Words indicating time and place.
- Conjunctions, question words.
- Comparative words.
- Punctuation, other words.
To make the dictionary easy to store, [111] defines a notation for writing the formulas that represent linking rules:
Link direction:
A "+" sign after a connector name means the connector links only to a word on the right.
A "-" sign after a connector name means the connector links only to a word on the left.
Operators:
&: both component links must occur together.
or: either component link, or both, may occur.
xor: exactly one of the two component links is chosen. The thesis adds this operator to the Vietnamese parser for cases where only one of two linking options is allowed. For example, the word "beautiful" can combine with "very" or with "wonderful", but not with both at once.
{C}: C may or may not appear.
@C: several connectors of type C may occur. For example, in the phrase "the cute red hat", the two adjectives "cute" and "red" both modify the noun "hat".
Macro: macros can be defined to make formulas more concise and easier to read, for example a macro that defines a clause:
<CLAUSE>: {({@COd-} & (C- or <clause-conjoin>)) or ({@CO-} & (Wd- & {CC+})) or [Rn-]};
In subsequent formulas, every occurrence of the right-hand-side expression is replaced by the macro name.
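The operators above can be read as rules for enumerating the sets of connectors a word is allowed to use. The following is a minimal Python sketch of that reading, not the thesis's parser. It assumes the inclusive meaning of `or` given in the text, it does not model the `@` multi-connector, and the connector names `RL-` and `AR+` are hypothetical placeholders rather than entries from the actual dictionary.

```python
from itertools import product

# A "disjunct" is a tuple of connector strings such as "RL-" or "AR+".
# Every formula expands to the list of disjuncts it permits.

def conn(name):
    """A single connector, e.g. 'X-' (links left) or 'X+' (links right)."""
    return [(name,)]

def and_(a, b):
    """a & b: both component links occur together."""
    return [x + y for x, y in product(a, b)]

def xor(a, b):
    """a xor b: exactly one of the two component links is chosen."""
    return a + b

def or_(a, b):
    """a or b, as described in the text: either component, or both."""
    return a + b + and_(a, b)

def opt(a):
    """{a}: the sub-formula may be used or left out."""
    return [()] + a

# Hypothetical connectors for two ways of attaching a degree word
# to an adjective (names invented for this illustration).
left_degree = conn("RL-")
right_degree = conn("AR+")

exclusive = opt(xor(left_degree, right_degree))
inclusive = opt(or_(left_degree, right_degree))

print(exclusive)  # [(), ('RL-',), ('AR+',)]
print(inclusive)  # [(), ('RL-',), ('AR+',), ('RL-', 'AR+')]
```

With `xor`, the combination that uses both connectors never appears among the disjuncts, which is the behaviour the thesis needs for words that accept only one of two links.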
The Vietnamese link dictionary has the same structure as the English one: each formula is defined for a group of words of the same type. According to [16], Vietnamese words are divided into the categories shown in Table 2.1 below:
Table 2.1. Types of Vietnamese words
No. | Type code | Type name |
1 | N | noun |
2 | V | verb |
3 | A | adjective |
4 | M | numeral |
5 | P | pronoun |
6 | R | adverb |
7 | E | preposition |
8 | C | conjunction |
9 | I | auxiliary word |
10 | O | interjection |
11 | D | determiner |
12 | Z | word element (real, no, etc.) |
13 | X | unknown |
Word types are further divided into subcategories. Table 2.2 below lists the subcategories, which follow the hierarchy of [16] with some subcategories added so that links can be distinguished as required by the thesis's machine translation system.
Table 2.2. Vietnamese word subcategories
No. | Symbol | Type code | Subtype name |
1 | Np | N | proper noun |
2 | Nc | N | monosyllabic noun |
3 | ENGLISH | N | collective noun |
4 | Na | N | abstract noun |
5 | Ns | N | classifier noun |
6 | Nu | N | unit noun |
7 | Nl | N | position noun |
8 | Vi | V | intransitive verb |
9 | Vt | V | transitive verb |
10 | Vs | V | state verb |
11 | Vm | V | modal verb |
12 | Vr | V | relational verb |
13 | Ap | A | adjective |
14 | Ar | A | relational adjective |
15 | Ao | A | onomatopoeic adjective |
16 | Ai | A | pictographic adjective |
17 | Mc | M | cardinal numeral |
18 | Mo | M | ordinal numeral |
19 | Pp | P | personal (address) pronoun |
20 | Pd | P | demonstrative pronoun |
21 | Pq | P | quantity pronoun |
22 | Pi | P | interrogative pronoun |
23 | Rt | R | present-time adverb |
24 | Rp | R | past-time adverb |
25 | Rf | R | future-time adverb |
26 | Rl | R | adverb of degree |
27 | Rc | R | comparative adverb |
28 | Ra | R | affirmative adverb |
29 | Rn | R | negative adverb |
30 | Rs | R | adverb of range |
31 | Es | E | range preposition |
32 | Ep | E | position preposition |
33 | Eo | E | possessive preposition |
34 | Em | E | material preposition |
35 | Eg | E | purpose preposition |
36 | Cs | C | subordinating conjunction |
37 | Cc | C | coordinating conjunction |
38 | I | I | auxiliary word |
39 | O | O | interjection |
40 | Dp | D | quantity determiner |
41 | Dp | D | plural determiner |
42 | Ds | D | singular determiner |
43 | Z | Z | word element (real, no, etc.) |
44 | X | X | unknown |
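Because each formula in the dictionary is set up for words of the same type, a natural way to organize entries is to key them by subtype symbol and recover the broader type when needed. The sketch below is only an illustration under one assumption: that the type code is the leading letter of the subtype symbol, as in the rows for Ar, Pq, Pi, Ep and Eg. The helper names `type_code` and `group_by_type` are hypothetical.

```python
# A few subtype symbols copied from Table 2.2 (adjective, pronoun and
# preposition rows), with their subtype names.
SUBTYPE_NAMES = {
    "Ar": "relational adjective",
    "Pq": "quantity pronoun",
    "Pi": "interrogative pronoun",
    "Ep": "position preposition",
    "Eg": "purpose preposition",
}

def type_code(subtype):
    """Return the type code, assumed to be the symbol's leading letter."""
    return subtype[0]

def group_by_type(subtypes):
    """Group subtype symbols under their type code, keeping input order."""
    groups = {}
    for symbol in subtypes:
        groups.setdefault(type_code(symbol), []).append(symbol)
    return groups

groups = group_by_type(SUBTYPE_NAMES)
print(groups)  # {'A': ['Ar'], 'P': ['Pq', 'Pi'], 'E': ['Ep', 'Eg']}
```

Grouping by type code lets a formula written for a whole type (for example all pronouns) be shared, while subtype-specific formulas are attached only where the translation system needs the finer distinction.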