# Simple Sentence Analysis Test Results And Simple Compound Sentences

The COUNT function takes as inputs the two words L and R and a pair of two pointers to join lists: l points to a connection in the list to the right of a selection of L and r points to a connection belonging to L and R. the list to the left of a selection form of R. COUNT returns a number, that is, the number of ways to draw connections of words from L to from R, with connections in the list pointed to by l and r.

The calculated result of the COUNT function at each call is stored in a hash array (just before returning). In the next time, the result of the previous calculation is looked up in the hash table. Since there is a hash table that stores the results, the runtime cost is O(c2 d) where d is the number of pools and c is the number of connections. For a given grammar, d = O(n) and c = O(n), so the running time is O(n3).

3.1.2. Trimming comb

With the formulas presented in Chapter 3, in order to cover Vietnamese syntactic phenomena, the number of selection forms to consider is very large. However, most conjunctions are not used because they contain connections that do not match a connection of a word in the sentence. In particular, suppose a word W has the form d with the connection C in the list on the right. If none of the words to the right of W have a join on the left that matches C, the candidate form d cannot be included in any correct analysis. Therefore, this type of recruitment can be deleted without changing the results of the association analysis. The deletion of that selection is called the pruning step .

Maybe you are interested!

The trimming comb process is divided into two steps: the trimming comb and the strong trimming comb.

Trimming comb

Sequentially traverse the words in the sentence from left to right, then right to left, and so on until no more forms can be eliminated.

Assume the mth word in the sentence under consideration. The set S of the connections in the list must be in a collection of the words 1,…, m – 1 stored in a hash table, with the hash using the initial capital letters of the connection name. This will save you a lot of time looking for a match to match it.

In fact, the parsing process of  shows that it never takes more than five passes to finish the pruning process.

Strong Trimming Comb

Call a connection shallow if it is the first connection in its list of connections.

In contrast the connection is deep. Strong pruning comb is based on the following criteria:

1. The nearest word criterion must be satisfied for both connections to form the association.
2. There can be no link between two deep connections.
3. The two connections of a link between two adjacent words must be the last in their list.
4. Two connections of a link between two non-adjacent words cannot simultaneously be the last connection in their list (Except in the case of a large connection).

Trimming comb on expression tree

Although according to , after building all the newly-trimmed categories, the thesis has chosen the method of English association analyzers , which is to build a tree representing the association formulas. the concatenation of each word, and then pruning the tree before constructing the aggregates. This treatment allows for much faster execution than the one introduced in .

If the connection name is treated as operands, &, or, xor are the operators, the join formula has the same structure as an arithmetic expression ({X} is converted to X or()). Figure 3.5.below depicts a tree representing the association formula

Figure 3.5 . Formula tree (NN- &{NN+}) or ({PqNt-} & {NN+})

When traversing the association of words for pruning as introduced in Chapter 4, if we find a connection that doesn't match any of the connections on the right, we will remove the nodes in the tree according to the following rules:

• If a child node of the node labeled “&” is deleted, delete the node.
• If the node labeled “or” “xor” has no children, remove the node.

In addition, the following three rules should be applied sequentially.

1. If there is more than one node labeled “( )” as a child of a certain node, only one node is kept
2. If the node labeled “&” has more than one child node including the node labeled “( )” then it will be removed from the tree.
3. If a node labeled “&” or “or” , “xor” contains only one child node, replace its label with the child node's label.

Of course, the process is still performed in the order of left → right then right → left, etc. The results are the same as the results of two steps of strong pruning comb and comb, but the execution speed is much faster.

The effect of trimming comb in Vietnamese

Because Vietnamese does not change morphemes, tenses, forms, numbers ... are expressed by adding words, so the number of initial forms of each word, especially nouns and verbs, is much larger than in English. However, the pruning algorithm is very effective: after two strong pruning and combing processes, the number of selection forms is only equivalent to English, and there are no sentences in the thesis's example set. 5 times trimming comb.

Figure 3.6 . Number of types after trimming combs and strong trimming combs

In Figure 3.6. is an image of the comb result produced by the analyzer with the sentence “we want to win the titles”.

3.1.3. Test results for analyzing simple sentences and simple compound sentences

Link parser built in Java, working on Windows environment. To test the parser model according to , the thesis has collected 200 sentences, typical for different types from articles on the Internet on a number of topics: Vietnamese conversation, science casual, sport, travel. Below are the results of the program with the sentence “ We want to win titles

Figure 3.7 . The results of the link analysis of the sentence " We want to win titles "