3.3.2. Conjugate Disambiguation
Conjugate ambiguity is ambiguity involving phrases that play equivalent roles in a sentence. As noted in , the word “and” plays a special role when a sentence is analyzed with the linking grammar model, because it can carry both regular connections and a large connection.
According to discourse structure theory, the word “and” is itself a discourse marker (discursive sign). It is therefore necessary to distinguish the case where “and” acts as a discourse marker from the case where it only connects two simple words or phrases rather than two clauses.
In , Lê Thanh Hương also mentions the problem of ambiguity in discourse segmentation, where a word can act as a discourse marker in one context and play another role elsewhere; the most obvious example is the word “and” in English. Whether “and” is a discourse marker is tested by checking if the sentence remains syntactically correct when the word is removed. An example is the sentence “Mary borrowed that book from our library, and she returned it this morning”. This test works convincingly in English, where “and” as a discourse marker is often accompanied by a comma and nouns are often accompanied by articles. The word “and” in Vietnamese plays a similar role when it is used as a conjunction. However, in Vietnamese “and” usually does not come with a comma, as in “I study and you sleep”. Moreover, removing “and” from the two-noun phrase “she and princess” can still leave a completely well-formed phrase (“princess”), so the removal test cannot flag it as syntactically incorrect.
If a discourse marker is found immediately after an occurrence of “and”, and if the left boundary of an elementary unit has been found to the left of “and”, then a new elementary unit is created whose right boundary lies immediately before “and”. In such a case, “and” is considered to have a discourse function.
For example, for the sentence “Even though it rained heavily and although everyone prevented it, it went ahead”, discourse segmentation yields [Even though it rained heavily] [and although everyone prevented it,] [it went ahead.]. In this sentence the word “and” has a discourse role because it is immediately followed by the word “although”, which is a marker of the concession relation.
Apart from the above case, the shallow analyzer in  ignores every other occurrence of “and”, assigning it the NOTHING action.
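The rule above can be illustrated with a minimal sketch. It assumes whitespace-tokenized input and a tiny, hypothetical marker list (`DISCOURSE_MARKERS`); the function name is also invented for illustration. It only handles the boundary before “and” and does not reproduce the full segmenter, so later boundaries (such as the one at the comma in the example) are left untouched.

```python
from typing import List

# Hypothetical marker list for illustration; a real segmenter would use a full lexicon.
DISCOURSE_MARKERS = {"although", "because", "but"}


def split_at_discursive_and(tokens: List[str]) -> List[List[str]]:
    """Close the current unit right before an "and" that is followed by a marker.

    Every other "and" is left alone (the NOTHING action).
    """
    units: List[List[str]] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        # "current" being non-empty stands in for "a left boundary exists to the left".
        if tok.lower() == "and" and nxt in DISCOURSE_MARKERS and current:
            units.append(current)
            current = []
        current.append(tok)
    if current:
        units.append(current)
    return units


sentence = "Even though it rained heavily and although everyone prevented it , it went ahead"
for unit in split_at_discursive_and(sentence.split()):
    print(" ".join(unit))
```

With this input the first unit ends right before “and”, while a sentence such as “I study and you sleep” stays in one unit because no marker follows “and”.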
In Vietnamese syntax, the subject of a clause in a compound sentence is usually a noun, and the predicate is usually a verb or an adjective. Other core types exist, such as a verb acting as the subject, but the thesis proposes a processing algorithm based on the following idea:
A phrase in a compound sentence is a true clause (proposition) if its linking grammar analysis contains at least one SV link (a link between subject and verb), at least one SA link (a link between subject and adjective), or a combination of the two links DT_LA and LA_DT (the links of the word “is”).
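As a minimal sketch, this criterion can be written as a predicate over the set of link labels found in a phrase's analysis. The labels SV, SA, DT_LA and LA_DT come from the text; the function name and the representation of an analysis as a plain list of labels are assumptions made for illustration.

```python
from typing import Iterable

# Link labels named in the text.
SUBJECT_VERB = "SV"
SUBJECT_ADJECTIVE = "SA"
COPULA_PAIR = {"DT_LA", "LA_DT"}  # both are required together (word "is")


def is_clause(link_labels: Iterable[str]) -> bool:
    """Return True if the labels of one analysis satisfy the clause criterion."""
    labels = set(link_labels)
    if SUBJECT_VERB in labels or SUBJECT_ADJECTIVE in labels:
        return True
    # The copula case needs both DT_LA and LA_DT in the same analysis.
    return COPULA_PAIR <= labels
```

A phrase whose analysis contains only one of DT_LA or LA_DT is not accepted, since the text requires the combination of the two.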
The thesis resolves this ambiguity by parsing the phrases that appear before and after the word “and”. If both phrases are clauses, “and” acts as a discourse marker. Otherwise, it acts as a conjugate. This is shown in the algorithm in Figure 3.11 and in the analysis of the sentence “I like cake and candy, you like wine and beer” in Figure 3.24.
Figure 3.24. Analysis of the sentence “I like cake and candy, you like wine and beer”
When analyzing the phrase “I was in Nghe An and Ho Chi Minh City”, the phrase “I was in Nghe An” is a clause, but the phrase “Ho Chi Minh City” is not. The word “and” is therefore not a discourse marker.
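The decision described above can be sketched as follows. This is not the algorithm of Figure 3.11 itself: it assumes the two phrases around “and” have already been parsed and are given as lists of link labels, and the label lists used in the example calls are hypothetical placeholders rather than actual parser output.

```python
from typing import Iterable


def is_clause(link_labels: Iterable[str]) -> bool:
    """Clause criterion from the text: SV, or SA, or both DT_LA and LA_DT."""
    labels = set(link_labels)
    return "SV" in labels or "SA" in labels or {"DT_LA", "LA_DT"} <= labels


def classify_and(left_labels: Iterable[str], right_labels: Iterable[str]) -> str:
    """Label "and" as a discourse marker only if both sides are clauses."""
    if is_clause(left_labels) and is_clause(right_labels):
        return "discourse"
    return "conjugate"


# Hypothetical label lists: "I was in Nghe An" has a subject-verb link,
# "Ho Chi Minh City" has none, so "and" is a conjugate here.
print(classify_and(["SV"], []))      # conjugate
# Two phrases that both contain a subject-verb link: "and" joins clauses.
print(classify_and(["SV"], ["SV"]))  # discourse
```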
With the sample corpus used for the compound sentence parser, conjugate disambiguation significantly improves the results of discourse analysis. A comparison of the results of discourse analysis with and without disambiguation is presented in Table 3.8 below.
Table 3.8. Comparison of discourse analysis results
|Input set||Number of compound sentences||Number of clauses||Number of correctly analyzed clauses (without disambiguation)||Number of correctly analyzed clauses (with disambiguation)|
|1||50||87||62 (71.26%)||87 (100%)|
|2||25||62||27 (43.54%)||36 (58.06%)|
The percentage of correctly analyzed clauses after disambiguation increases more or less depending on how often potentially ambiguous cue words occur. Incorrect results when resolving ambiguities related to “and”, “or”, and commas arise mainly because the clauses contain noun-adjective phrases. A noun-adjective phrase can be a clause core, but it can also be just a noun phrase acting as the subject. For example, in the sentence “Sapa is the "kingdom" of fruits, peach blossoms, big yellow peaches, small yellow peaches, queen plums, purple plums, tam hoa plums, lily flowers, plum blossoms, pear blossoms, peach blossoms, chrysanthemums, roses ... especially immortal flowers live forever with time”, the commas cause ambiguity. Phrases such as “big yellow peaches”, “small yellow peaches”, and “purple plums” are decomposed into separate clauses when in fact they are just nouns serving as examples for the assertion before the word “like”.
When acting as a conjugate, the word “and” has connections that let it play the role of each element in its list. The selection form of “and” has a large connection F. The connection F points to both sides of “and”; in addition, the other connections of “and” are an expansion of F, i.e. the original connections of F. This lets “and” join the two parts of the list together and act as those elements in the sentence, as discussed in chapter 1.
When applied on the link parser, the result is as shown in Figure 3.25.
Figure 3.25. An analysis with the F connection for the word “and”
However, this can produce a link between “brother” and “sister”. Although linking grammar allows cycles, this link does not represent an actual relationship in the sentence.
To remove this link,  adds some information to the large connection and adjusts the matching condition for connections. Each connection is assigned a priority of 0, 1, or 2. Normal connections (not large connections) have priority 0. A large connection on an ordinary word has priority 1, and large connections on the word “and” have priority 2. For two connections to match, they must first match according to the normal criteria, and their priorities must be compatible: 0 is compatible with 0, 1 is compatible with 2, and 2 is compatible with 1. No other combination of priorities is compatible.
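The priority rule can be sketched in a few lines. The `Connector` class and function name are hypothetical, and the "normal criteria" are reduced here to plain label equality, which is a simplifying assumption rather than the parser's actual matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    label: str         # e.g. "F" for the large connection
    priority: int = 0  # 0 = normal, 1 = large on a word, 2 = large on "and"


# Ordered (left, right) priority pairs that the text declares compatible.
COMPATIBLE_PRIORITIES = {(0, 0), (1, 2), (2, 1)}


def connectors_match(left: Connector, right: Connector) -> bool:
    """Match only if labels agree (simplified) and priorities are compatible."""
    if left.label != right.label:
        return False
    return (left.priority, right.priority) in COMPATIBLE_PRIORITIES


# A large connection on a word (priority 1) can attach to "and" (priority 2) ...
print(connectors_match(Connector("F", 1), Connector("F", 2)))  # True
# ... but two priority-1 connections cannot attach to each other, which
# rules out a direct link between the two list elements.
print(connectors_match(Connector("F", 1), Connector("F", 1)))  # False
```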
The method applied in the thesis handles a number of practical cases involving the word “and” effectively. However, some further phenomena involving “and” are treated according to .
The most common case is a list with more than two elements, where the elements of the “and” list are separated by commas, for example “grandpa, grandma, father and mother”. In that case, the comma has the selection form ((G₂) (G₁, G₂)). Here, the subscript represents the priority of the connection.
Figure 3.26. G connections joining multiple commas and the word “and”
In the example in Figure 3.26, the second comma uses that form. It connects to the first comma through a G connection with priority 2 (because the G connection of the first comma already has priority 1), its G connection with priority 1 connects it to the word “father”, and its other G connection with priority 2 connects it to the word “and”. (The G connection with priority 1 on “and” connects it to the word “mother”.)
Parsing is the crucial problem to solve when building a new syntactic model. With the linking grammar model built for Vietnamese, the link parser of the thesis addresses the following problems:
- Parsing for single sentences.
- Parsing for compound sentences with multiple clauses.
- Completely resolving the conjugate ambiguity problem.
- Testing the component disambiguation algorithm.
The experimental results of the parsing algorithms are acceptable. However, because of the complexity of natural language as well as time constraints, the thesis has not solved the following issues:
- Parsing sentence types in which some elements can appear in arbitrary positions. Linking grammar is by nature a dependency-type grammar, so this problem is not too difficult, although in some cases it may violate planarity.
- Parsing compound sentences without conjunctions. This problem can also potentially be solved. By the time it concludes that a sentence is not syntactically correct, the parser has already produced all possible analyses of every phrase in the sentence. A violation of connectivity in the analysis can be a sign of a missing conjunction. Fully solving this problem requires more in-depth study of the language as well as a large corpus.
- Parsing complex sentences. This is also a very difficult problem in other languages and requires statistical methods to find clause boundaries. Hopefully, this problem will be solved in the future, once a large enough corpus has been built.
Another development direction of interest is the integration of semantic links into the Vietnamese linking grammar. This is possible because the linking grammar model allows a sentence analysis to be represented by a link graph that contains cycles, but it is also a large problem that requires a substantial investment of time.