The above problem is solved by the Inside - Outside Algorithm proposed by Baker in 1979 for context-free grammar [81]. This is essentially a variant of the forward-backward algorithm of the Hidden Markov Model (Hidden Markov Model - HMM). The algorithm allows to calculate the inner and outer probabilities for the input sentence S recursively.

The hidden Markov model introduced by Manning and Schütze [87], is interested in the sequence of observations O1,…, Om produced by the laws Ni → NjN k and Ni → wj . Where Oi , i = 1, m are actually the terminating symbols (from) w1,…, wm of the input string.

According to the HMM model, the parameter matrix of the probability context-free grammar is α[i, j, k] and β[i, r] with:

α [i, j, k] = Pr ( Ni → NjNk | G )

Maybe you are interested!

- Distinguishing Rhetorical Metaphor from Vocabulary Metaphor
- Probability Density Curve According to Normal Distribution “Source: Nguyen Phuong, Nguyen Van Lam 2009”, [19]
- Can Use Tools To Check The Difficulty Level Of Vocabulary And How To Choose Materials Appropriate For Learners' Level

β [i, r] = Pr ( Ni→ r | G )

To be able to construct the above parameter matrix, the context-free grammar is assumed to be in Chomsky normal form. This does not reduce the generality of the model, because according to [63], any context-free grammar can be converted to Chomsky normal form. The following constraint is required for the parameters:

∑ α 8i, j, k; + ∑> β 8i, r; = 1 for all i.

This constraint (regarding the i-th non-terminating notation in the grammar) shows that every productive applicability where the left-hand side is the i-th non-terminated symbol can produce only or two non-terminating symbols. or a terminator (due to the grammar in Chomsky normal form).

Here is the notation convention according to [87]:

- The non-terminating symbol set of the grammar is denoted by { N1 , …, Nn }. The first symbol is N1 .
- The terminating symbol set of the grammar is {w1 , …, w V }.
- Sentence analyzed w1… wm.
- wpq is the part of the sentence that needs to be analyzed from the pth word to the qth word.
- NBC ? is a non-terminating notation Nj that produces a sequence of words from position p to q in the sentence.
- αj(p, q) is the outer probability.
- βj(p, q) is the inner probability.

*The probability in* βj (p, q) is the probability that the jth non-terminating symbol (Nj ) produces the observation (sequence of words) wp,… .wq. Formally,

βj ( p, q ) = Pr ( wpq | NBC ? , G )

*Outer probability* αj (p, q) is the probability that starting from the initial symbol N1 produces the non-terminating symbol NBC ? and the words of the input string are outside wp,… , wq. Formally, we have:

αj ( p, q ) = Pr ( w1(p-1), NBC ? , w(q+1)m | G )

The inner and outer probabilities are the basis for building algorithms related to two main problems in parsing probabilistic models, which are:

1. ** Recognition** : Calculate the probability that the first symbol N1 generates the sequence of observations O in the grammar G. Thus, with the algorithm in (Inside Algorithm), the probability that a sentence has m from w1… wm true (produced by the grammar G) is:

Pr ( w1m | G ) = Pr ( N1 ∗ ⇒ w1m | G ) = β1 ( 1, m )

The aforementioned probability is the true probability of the sentence, i.e. *the sum of the probabilities of the analyses* . To solve the ambiguity problem, it is necessary to find the analysis with the largest probability among the analyses. This problem is solved by *the Viterbi type algorithm * in the HMM model. Similar to the internal probability algorithm, but this algorithm finds the maximum value instead of the sum. In [87], the entire Viterbi-type algorithm was presented to find the best syntax tree for the sentence w1… .wm.

2. ** Training** : After finding the best parser for the input sentence, the parser needs to continue with the training phase. The training problem can be described as follows: redefine the probability of the rule set in the grammar G given the training sequence of sentences s1, s2,…, sn. The problem of training for a probabilistic context-free grammar was presented in [87].

According to [70], a probabilistic context-free grammar has the following disadvantages:

- It is not possible to model the dependency between structures on the syntax tree because the probability of each rule is calculated completely independent of each other.
- Lack of lexical information: Syntactic information may be related to particular words, but the context-free model is not descriptive. This leads to ambiguity in coordination, subcategory, and prepositional use.

**1.1.3. lexicalization probabilistic context-free grammar**

The lexicalized probabilistic context-free grammar not only shows the structure of phrases but also shows the relationship between words. In a Lexicalized Probabilistic Context Free Grammar, each non-terminating symbol is written as A(x), x = (w, t) where A is the label of the structure. The number of non-terminating symbols will increase dramatically, at most until |ν| × |τ| times, |ν| is the number of words in the vocabulary and |τ| is the number of type words of the language

The law of a context-free grammar of lexicalization probability takes the form:

*1. Internal law:*

P (h) → Ln(ln)…L1(l1) H(h) R1(r1) … Rm(rm) (1.1)

In which, h is the *word/label word pair of type* . H is the primary child of the rule, which will inherit the *word/label pair from the parent node's type* P. Component Ln(ln) … L1(l1) modifies H on the left and component R1(r1)… Rm(rm ) modifies the H on the right (n or m can be zero). The left and right ranges are extended by the STOP notation. So Ln+1 = Rm+1 = STOP

*2. Vocabulary law:*

P(h) → w, P is a type label word, h is a (w, t) pair (1.2)

Figure 1.3. The following illustrates a lexicalized probabilistic context-free grammar [43].

When calculating the probability for each production, adding lexical information makes the denominator infinitely large, the probability is almost zero.

To avoid the number of parameters being too large, in the model given by Collins [43], the probability of the internal law is calculated based on the probability series law.

The probability of generating a modifier can depend on any function of the previous modifiers, the domain of the central word or the central word. Therefore, distance is added [43] to the assumption of independence of modifiers.

This model has also been used by Le Thanh Huong group [22] to build a Vietnamese parser with the comment " *In Vietnamese, the boundary components of languages depend more on the component next to it than on the term boundary components. depends on the central component* ”. In [22], a formula for calculating the probability of a rule for boundary components has no appearance of the distance and a formula for calculating the probability of a rule with added probability values connecting words on either side of the component. main of the right-hand side.

**Judiciary (production)**

**Intrinsic Laws**

TOP → S(bought, VBD)

S(bought, VBD → NP(week, NN) NP(IBM, NNP) VP(bought, VBD)

NP(week, NN → JJ(Last, JJ) NN(week,NN)

NP(IBM, NNP) → NNP(IBM, NNP)

VP(bought, VBD) → VBD(bought,VBD) NP(Lotus,NNP)

NP(Lotus, NNP) → NNP(Lotus, NNP)

**Vocabulary rules**

JJ(last, JJ) → last

NP(week, NN) → week

NNP(IBM, NNP) → IBM

VBD(bought,VBD) → bought

NP(Lotus, NNP) → Lotus

** Structural tree for the sentence** "Last week IBM bought Lotus"

**Figure 1.3** . Probabilistic context-free grammar and structure tree of the sentence “ *Last week IBM bought Lotus* ”

**1.1.4. ****Tree connection grammar**

With the advent of treebanks, rewrite operations on the grammar can no longer take place on strings, but on the structure tree.

The base element of the Tree Adjoining Grammar (TAG) is the basic tree [69]. The underlying trees are joined together through two rewrites, combine and replace. The intermediate trees that are generated when substitutions and joins are applied are called parse trees.

A full parse tree is an parse tree in which every leaf node is labeled as the termination symbol. The parsing for a sentence can be understood as: starting from a basic tree whose root is the axiom, find a complete parse tree with leaf nodes corresponding to the sequence of words in the sentence.

The TAG grammar is lexicalized to become LTAG (Lexicalized Tree Adjoining Grammar). This is also a completely lexicalized grammar. Every elementary tree has at least one leaf node associated with a lexical unit called an anchor word. In addition, the grammar also satisfies the following conditions:

- Each LTAG initialization tree represents the components of an anchor word (component modifiers for the anchor word).
- Basic trees are minimal: the initialization tree must have the word anchor that is the center word of a principal element in the sentence and contain all the imperative opposites of the word anchor [20].