A tree-adjoining grammar (TAG) was also built for Vietnamese in [22] by extracting it from the Vietnamese treebank. In terms of expressive power, tree-adjoining grammars can represent some context-sensitive languages. This approach is effective when the Vietnamese treebank is large enough.
1.2. Approach through feature structures and unification grammar
Unification grammar is built on the unification (merging) of feature structures. A feature structure is represented as an attribute-value matrix (AVM) of the form:
Feature 1 Value 1
Feature 2 Value 2
…
Feature n Value n
For example, the feature structure of an English noun phrase with the features category: noun phrase, number: singular, person: 3 is written as follows:
CAT NP
NUMBER SG
PERSON 3
A feature structure is defined as a mapping F → VF, where F is the set of features and VF is the set of values that can be assigned to the features.
The example above is a feature structure over the feature set F = { CAT, NUMBER, PERSON } with the value set VF = { NP, SG, 3 }.
An augmented grammar contains rules of the form A → X1…Xn, where A is the parent feature structure and X1, …, Xn are the child feature structures.
Rules in an augmented grammar are written as feature structures containing variables, so that one rule can apply to many different situations. For example, the augmented rule for a simple noun phrase:
(NP NUMBER ?n) → (ART NUMBER ?n) (N NUMBER ?n )
expresses number agreement between the article and the noun.
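The agreement rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: attribute-value structures are modeled as plain dictionaries, and the function name `apply_np_rule` and the sample lexical entries are hypothetical, not taken from the source.

```python
def apply_np_rule(art, noun):
    """Apply (NP NUMBER ?n) -> (ART NUMBER ?n) (N NUMBER ?n).

    Returns the attribute-value structure of the resulting NP, or None
    when the rule does not apply (wrong categories or NUMBER mismatch).
    """
    if art.get("CAT") != "ART" or noun.get("CAT") != "N":
        return None
    n = art.get("NUMBER")  # bind the variable ?n from the article
    if n is None or noun.get("NUMBER") != n:
        return None        # ?n cannot take two different values
    return {"CAT": "NP", "NUMBER": n}


# Hypothetical lexical entries used only for illustration.
a = {"CAT": "ART", "NUMBER": "SG"}
book = {"CAT": "N", "NUMBER": "SG"}
books = {"CAT": "N", "NUMBER": "PL"}

print(apply_np_rule(a, book))   # the rule applies: both are singular
print(apply_np_rule(a, books))  # the rule fails: SG versus PL
```

Because the same variable ?n appears on both sides of the rule, a single rule covers singular and plural noun phrases and rejects mixed combinations.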
If feature structures are represented as directed graphs, then the graphs of several feature structures can be merged into one larger graph. This merging operation, called unification, is the main component of unification grammar.
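A minimal sketch of this merging operation is given below. It assumes that attribute-value structures are nested Python dictionaries and that merging fails when two atomic values clash. Shared substructures (reentrancy), which full unification formalisms support, are deliberately left out, and the function name `unify` is an illustrative choice.

```python
def unify(fs1, fs2):
    """Merge two attribute-value structures given as (nested) dicts.

    Returns the merged structure, or None if the two structures assign
    incompatible values to the same attribute.
    """
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value            # new information is added
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)  # merge substructures
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            return None                        # clash: merging fails
    return result


np1 = {"CAT": "NP", "NUMBER": "SG"}
np2 = {"NUMBER": "SG", "PERSON": 3}
print(unify(np1, np2))                           # compatible structures
print(unify({"NUMBER": "SG"}, {"NUMBER": "PL"}))  # clashing NUMBER values
```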
Unification grammar can represent type-0 languages, the largest language class in Chomsky's hierarchy [63]. According to Tran Ngoc Tuan's group [26], unification grammar can handle some phenomena in Vietnamese, such as the combination of certain words. Words can combine only when their feature structures can be unified. For example, the word “book” with the feature SHAPE: square/thin combines only with words for objects that carry the same SHAPE feature, such as “book”. However, describing most phenomena of Vietnamese grammar in enough detail to build a concrete parser is too complicated. The authors of [26] deal only with a subset of Vietnamese nouns.
1.3. Dependency approach
1.3.1. Some concepts
Dependency grammar has its origins in the work of the ancient Indian grammarian Pāṇini; the modern model was introduced by Lucien Tesnière [75]. The study of dependency grammar flourished for Slavic languages [92] and Turkish because of their free word order.
The central notion in the dependency grammar model is an asymmetric relation called a dependency (or head-dependent) relation. It holds between a dependent word and the word on which it depends, which is called the head word .
A dependency grammar uses two alphabets: the set of terminal symbols and the set of auxiliary symbols.
Each element of the terminal symbol set is a smallest syntactic unit (primitive unit), i.e. a morpheme (in morphologically rich languages), a syllable, or a word. An utterance is regarded as a string of elements of the terminal symbol set.
The auxiliary symbol set is the set of names of the categories of the terminal symbols. Auxiliary symbols are not allowed to be ambiguous; each symbol has fixed syntactic properties .
There are different models of dependency grammars. The first model was formally described by Hays [62] and Gaifman [57].
Definition 1.3 . [57]
A dependency grammar is a quadruple DG = ( L, C, F, R ), where
L: The terminal alphabet.
C: The auxiliary alphabet.
F: L → C, the assignment function.
R: The set of dependency rules, each of which has one of the following three forms:
- Xi(Xj1, Xj2,… ,*, …, Xjn), where Xi is the head word, Xj1, Xj2,…, Xjn are its dependent words, and n is the number of dependents. The order of the words in this rule is the order in which they appear in the sentence (other words may intervene between the words mentioned in the rule). The * marks the position of the head word among its dependents in the utterance.
- Xi (*), indicating that a terminal of category Xi can appear without any dependent word.
- *(Xi), indicating that the unit corresponding to Xi can occur without a head word. Such a unit is the center (root) of the utterance in which it appears.
For example:
Grammar DG = ( L, C, F, R )
L = { John, loves, a, woman }
C = { N, V, Det }
F: John → N, woman → N, loves → V, a → Det
R includes the rules:
- *(V)
- V(N, *, N)
- N(Det, *)
- N(*)
- Det(*)
Usually, an artificial ROOT word is added so that root elements such as V are easier to handle. The sentence “ John loves a woman ” can be represented as a tree, as shown in Figure 1.4 below:
Figure 1.4 . Analysis of the sentence “ John loves a woman ” in a dependent grammar model
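The example grammar can be encoded and used to check a candidate analysis, as in the sketch below. The encoding is an assumption made for illustration: each rule is stored as a triple (head category, categories of left dependents, categories of right dependents), an analysis is a list giving the index of each word's head, and the function name `licensed` is hypothetical. The check covers only rule licensing; it does not test for cycles or projectivity.

```python
# Assignment function F and rule set R of the example grammar.
F = {"John": "N", "woman": "N", "loves": "V", "a": "Det"}
ROOTS = {"V"}                       # *(V)
RULES = {
    ("V", ("N",), ("N",)),          # V(N, *, N)
    ("N", ("Det",), ()),            # N(Det, *)
    ("N", (), ()),                  # N(*)
    ("Det", (), ()),                # Det(*)
}


def licensed(words, heads):
    """Check that every word and its dependents match some rule.

    heads[i] is the index of the head of words[i], or None for the root.
    """
    cats = [F[w] for w in words]
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1 or cats[roots[0]] not in ROOTS:
        return False
    for i, cat in enumerate(cats):
        # Dependents are collected in sentence order, left and right of i.
        left = tuple(cats[j] for j, h in enumerate(heads) if h == i and j < i)
        right = tuple(cats[j] for j, h in enumerate(heads) if h == i and j > i)
        if (cat, left, right) not in RULES:
            return False
    return True


words = ["John", "loves", "a", "woman"]
print(licensed(words, [1, None, 3, 1]))  # John <- loves -> woman -> a
print(licensed(words, [1, None, 1, 2]))  # "a" attached to "loves" instead
```

In the first analysis "loves" is the root, "John" and "woman" depend on it, and "a" depends on "woman", which matches the rules V(N, *, N), N(Det, *), N(*) and Det(*). The second analysis attaches "a" to the verb, for which the grammar has no rule.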
Several important concepts and properties related to dependency grammars are discussed below.
The definitions below are taken from [75]
Definition 1.4.
A sentence is a sequence of tokens (words) denoted by S = w0w1…wn, where w0 is the artificial ROOT token.
For simplicity, assume that w1, …, wn are distinct tokens; for example, in the sentence “ Mary saw John and Fred saw Susan ”, the two occurrences of the word “ saw ” are considered distinct.
Definition 1.5.
Suppose R = { r1, … , rm } is a finite set of possible dependency relation types that can hold between two words in a sentence. A relation type r ∈ R is called the label of an arc.
Definition 1.6.
The dependency graph G = (V, A) is a directed graph consisting of a vertex set V and an arc set A such that for the sentence S = w0w1…wn and the label set R, the following statements are true:
- V ⊆ { w0, w1, … wn }.
- A ⊆ V × R × V.
- If (wi, r, wj) ∈ A then (wi, r', wj) ∉ A for all r' ≠ r.
Example : The dependency graph of the sentence " Economic news had little effect on financial markets. " is shown in Figure 1.5.
Figure 1.5 . Dependency graph of the sentence “ Economic news had little effect on financial markets. ”
G = (V, A)
V = VS = { ROOT, Economic, news, had, little, effect, on, financial, markets, . }
A = { (ROOT, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU, .), (news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, markets), (markets, ATT, financial) }
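The three conditions of Definition 1.6 can be checked mechanically, as in the sketch below. It encodes the example graph with word strings standing for tokens, which relies on the assumption of Definition 1.4 that all tokens are distinct. The full stop is included as a node so that the PU arc has both endpoints in V, the last word is written "markets" throughout, and the function name `is_dependency_graph` is an illustrative choice.

```python
S = ["ROOT", "Economic", "news", "had", "little", "effect",
     "on", "financial", "markets", "."]
R = {"PRED", "SBJ", "OBJ", "PU", "ATT", "PC"}
V = set(S)
A = {
    ("ROOT", "PRED", "had"), ("had", "SBJ", "news"), ("had", "OBJ", "effect"),
    ("had", "PU", "."), ("news", "ATT", "Economic"), ("effect", "ATT", "little"),
    ("effect", "ATT", "on"), ("on", "PC", "markets"),
    ("markets", "ATT", "financial"),
}


def is_dependency_graph(V, A, S, R):
    """Check the three conditions of Definition 1.6."""
    # Condition 1: V is a subset of the tokens of S.
    if not V <= set(S):
        return False
    # Condition 2: every arc is a triple from V x R x V.
    if any(h not in V or r not in R or d not in V for h, r, d in A):
        return False
    # Condition 3: at most one label per ordered pair (head, dependent).
    pairs = [(h, d) for h, _, d in A]
    return len(pairs) == len(set(pairs))


print(is_dependency_graph(V, A, S, R))
# Adding a second label to an existing pair violates condition 3.
print(is_dependency_graph(V, A | {("had", "OBJ", "news")}, S, R))
```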
The definition of a dependency (wi, r, wj) is not unique; it varies across linguistic theories .
Definition 1.7.
A well-formed dependency graph G = (V, A) for the sentence S and the dependency relation set R is a dependency graph that is a directed tree originating from node w0 and has the spanning node set V = VS, where VS = { w0, w1, …, wn } contains every token of S. Such a dependency graph is called a dependency tree .
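The tree condition of Definition 1.7 can be tested as sketched below, assuming that tokens are distinct strings and that w0 is written ROOT. The function name `is_dependency_tree`, the small example sentence, and its arc labels are chosen for illustration only. The check requires that V spans the sentence, that every node other than ROOT has exactly one head, and that following heads from any node reaches ROOT without a cycle.

```python
def is_dependency_tree(V, A, S, root="ROOT"):
    """Check that (V, A) is a directed tree rooted at `root` spanning S."""
    if V != set(S):                      # spanning node set: V = VS
        return False
    head_of = {}
    for h, _, d in A:
        if h not in V or d not in V:
            return False
        if d == root or d in head_of:    # root has no head; one head per node
            return False
        head_of[d] = h
    if set(head_of) != V - {root}:       # every non-root node needs a head
        return False
    for node in V - {root}:
        seen = set()
        while node != root:              # climb towards the root
            if node in seen:
                return False             # cycle: the root is never reached
            seen.add(node)
            node = head_of[node]
    return True


S = ["ROOT", "John", "loves", "a", "woman"]
tree = {("ROOT", "PRED", "loves"), ("loves", "SBJ", "John"),
        ("loves", "OBJ", "woman"), ("woman", "ATT", "a")}
cyclic = {("John", "ATT", "loves"), ("loves", "SBJ", "John"),
          ("loves", "OBJ", "woman"), ("woman", "ATT", "a")}

print(is_dependency_tree(set(S), tree, S))    # well-formed dependency tree
print(is_dependency_tree(set(S), cyclic, S))  # John and loves form a cycle
```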