Our task is to make the two strings identical using three operations: (1) deleting one character, (2) inserting one character, and (3) substituting one character with another. We want to use as few operations as possible, and this minimum number of operations is the minimum edit distance. The problem can be solved with dynamic programming. To apply this strategy, we first need to identify its optimal substructure and its overlapping subproblems.
Assume that the lengths of string A and string B are \(m\) and \(n\). Let's first try to make the last characters of the two strings identical. If the last characters differ, we need to perform some operation. If they are the same, we only need to consider the prefixes: the \(1^{st}\) to the \((m - 1)^{th}\) characters of string A and the \(1^{st}\) to the \((n - 1)^{th}\) characters of string B.
When the last characters differ, we can either substitute the last character of string A with the last character of string B, or vice versa. Let's first consider the first case. The original strings in the question are:
fxpimu
xwrs
We substitute the last character "u" of string A with the last character "s" of string B. The strings are now:
fxpims
xwrs
Since the last characters of the two strings are now identical, we only need to find the minimum edit distance of their prefixes: the \(1^{st}\) to the \((m - 1)^{th}\) characters of string A and the \(1^{st}\) to the \((n - 1)^{th}\) characters of string B. Substituting the last character of string B with the last character of string A is similar.
We can also either insert the last character of string A at the end of string B, or vice versa. First, let's consider inserting the last character of string B at the end of string A. After the insertion, the strings are:
fxpimus
xwrs
The lengths of the two strings are now \(m + 1\) and \(n\). Their last characters are identical, so we only need the minimum edit distance of their prefixes: the \(1^{st}\) to the \(m^{th}\) characters of string A and the \(1^{st}\) to the \((n - 1)^{th}\) characters of string B. (After the insertion string A has \(m + 1\) characters, so dropping its last character leaves the \(1^{st}\) to the \(m^{th}\).) Inserting the last character of string A at the end of string B is similar.
Both insertion and substitution make the last characters of the two strings identical, but deletion does not: after a character is deleted, the new last character may still differ from the last character of the other string. Concretely, if we delete the last character "u" of string A, the strings become:
fxpim
xwrs
The last characters are still not identical, so we cannot drop the last character of string B. Instead, we must consider the \(1^{st}\) to the \((m - 1)^{th}\) characters of string A (its length decreases by one after the deletion) and all \(n\) characters of string B, from the \(1^{st}\) to the \(n^{th}\). Deleting the last character of string B is similar.
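The three worked examples above can be reproduced with plain `std::string` operations. The helper names below are hypothetical (they are not from the original post), and each assumes a non-empty input string.

```cpp
#include <string>

// Hypothetical helpers: apply one operation to the end of a
// non-empty string, mirroring the three cases discussed above.
std::string substituteLast(std::string s, char c) {
    s.back() = c;   // replace the last character with c
    return s;
}

std::string insertAtEnd(std::string s, char c) {
    s.push_back(c); // append c after the last character
    return s;
}

std::string deleteLast(std::string s) {
    s.pop_back();   // remove the last character
    return s;
}
```

With A = "fxpimu" and B = "xwrs", `substituteLast(A, B.back())` gives "fxpims" and `insertAtEnd(A, B.back())` gives "fxpimus". Both now end in 's' like B, so B's last character can be dropped. `deleteLast(A)` gives "fxpim", which still ends differently from B, so all of B must be kept.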
When one of the strings is empty, the minimum edit distance is the length of the non-empty string: the empty string can be turned into the non-empty one by inserting the characters of the non-empty string one by one, starting from the last.
When both strings are empty, no operation is needed, so the distance is 0.
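A small sketch of this base case: it builds a target string from the empty string by inserting its characters one at a time, starting from the last, and records each intermediate string. The function name `buildFromEmpty` is hypothetical and used only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration of the base case: turn "" into `target`
// by inserting its characters one by one, starting from the last.
// The number of recorded steps equals target.size().
std::vector<std::string> buildFromEmpty(const std::string& target) {
    std::vector<std::string> steps;
    std::string current;
    for (int k = static_cast<int>(target.size()) - 1; k >= 0; k--) {
        current.insert(current.begin(), target[k]); // insert in front
        steps.push_back(current);
    }
    return steps;
}
```

For "xwrs" the steps are "s", "rs", "wrs", "xwrs": four insertions, matching the length of the non-empty string. An empty target needs no steps at all.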
According to the analysis above, the task has optimal substructure: the optimal solution of the original task depends on the optimal solutions of its subtasks.
The task also has many overlapping subproblems. For example, the minimum edit distance between the \(1^{st}\) to \((m - 1)^{th}\) characters of A and the \(1^{st}\) to \((n - 1)^{th}\) characters of B is needed directly by the original task (the substitution case). It is needed again when solving the first \(m\) characters of A against the first \(n - 1\) characters of B, and when solving the first \(m - 1\) characters of A against all of B. Without storing results, the same subproblem would be computed several times.
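Let edit[i][j] denote the minimum edit distance between the first \(i\) characters of string A and the first \(j\) characters of string B, so the answer for strings of lengths \(m\) and \(n\) is edit[m][n]. With the optimal substructure, we can write down the equation for the task: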
\[
edit[i][j]=
\begin{cases}
0 & i = 0,\ j = 0 \\
j & i = 0,\ j > 0 \\
i & i > 0,\ j = 0 \\
\min\{edit[i - 1][j] + 1,\ edit[i][j - 1] + 1,\ edit[i - 1][j - 1] + notIdentical\} & i > 0,\ j > 0
\end{cases}
\]
where A[i] and B[j] are the \(i^{th}\) character of A and the \(j^{th}\) character of B (counting from 1), and notIdentical is defined as:
\[
notIdentical=
\begin{cases}
0 & A[i] = B[j] \\
1 & A[i] \neq B[j]
\end{cases}
\]
Note that \(edit[i - 1][j] + 1\) covers both (1) deleting the last character of string A and (2) inserting the last character of string A at the end of string B. \(edit[i][j - 1] + 1\) covers both (1) deleting the last character of string B and (2) inserting the last character of string B at the end of string A. \(edit[i - 1][j - 1] + notIdentical\) covers both (1) the case where the last characters of the two strings are already identical (no cost) and (2) the substitution operation (cost 1).
The equation can be simplified as:
\[
edit[i][j]=
\begin{cases}
i == 0\;?\;j\;:\;i & i == 0\;||\;j == 0 \\
\min\{edit[i - 1][j] + 1,\ edit[i][j - 1] + 1,\ edit[i - 1][j - 1] + int(!(A[i] == B[j]))\} & i > 0,\ j > 0
\end{cases}
\]
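Before turning to the table, note that the same equation can also be evaluated top-down with memoization, so that each edit[i][j] is computed at most once. This is a sketch of an alternative to the table-filling approach, not the post's own code; the function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch: the equation for edit[i][j], evaluated top-down.
// memo[i][j] caches edit[i][j]; -1 marks "not computed yet".
// Strings are 0-indexed here, so A[i] in the equation is a[i - 1].
int editMemo(const std::string& a, const std::string& b, int i, int j,
             std::vector<std::vector<int>>& memo) {
    if (i == 0) return j;  // edit[0][j] = j
    if (j == 0) return i;  // edit[i][0] = i
    if (memo[i][j] != -1) return memo[i][j];
    int notIdentical = (a[i - 1] == b[j - 1]) ? 0 : 1;
    memo[i][j] = std::min({editMemo(a, b, i - 1, j, memo) + 1,
                           editMemo(a, b, i, j - 1, memo) + 1,
                           editMemo(a, b, i - 1, j - 1, memo) + notIdentical});
    return memo[i][j];
}

// Hypothetical wrapper: allocates an (m + 1) x (n + 1) memo filled with -1.
int editDistanceTopDown(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> memo(a.size() + 1,
                                       std::vector<int>(b.size() + 1, -1));
    return editMemo(a, b, static_cast<int>(a.size()),
                    static_cast<int>(b.size()), memo);
}
```

The recursion can go up to \(m + n\) calls deep, which is one reason to prefer the bottom-up table for long strings.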
Solving a dynamic programming problem essentially comes down to filling a table. Before filling it, we need to know three things:
(1) The dimension of the table. Since we use a 2-dimensional matrix edit[i][j] to store the solutions, the table is 2D.
(2) Which entries to fill. The indices range over \(0 \leq i \leq m\) and \(0 \leq j \leq n\), so the whole \((m + 1) \times (n + 1)\) table must be filled.
(3) The order of filling. To calculate edit[i][j], we must first calculate edit[i - 1][j], edit[i][j - 1] and edit[i - 1][j - 1]. Their positions relative to edit[i][j] in the table are:
\[
\begin{matrix}
edit[i - 1][j - 1] & edit[i - 1][j] \\
edit[i][j - 1] & \boxed{edit[i][j]}
\end{matrix}
\]
Thus the table is filled row by row from top to bottom, and each row from left to right.
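The dependency argument can be checked mechanically. The hypothetical helper below walks the cells of an \((m + 1) \times (n + 1)\) table in a given order and reports whether every cell with \(i > 0\) and \(j > 0\) finds its three dependencies already filled. Row-major order (top to bottom, left to right) passes, while visiting the cells in reverse fails at the very first cell.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: returns true if, visiting cells in `order`,
// each cell (i, j) with i > 0 and j > 0 finds (i - 1, j), (i, j - 1)
// and (i - 1, j - 1) already filled.
bool dependenciesReady(int m, int n,
                       const std::vector<std::pair<int, int>>& order) {
    std::vector<std::vector<bool>> filled(m + 1, std::vector<bool>(n + 1, false));
    for (const std::pair<int, int>& cell : order) {
        int i = cell.first, j = cell.second;
        if (i > 0 && j > 0 &&
            !(filled[i - 1][j] && filled[i][j - 1] && filled[i - 1][j - 1])) {
            return false;  // a dependency has not been computed yet
        }
        filled[i][j] = true;
    }
    return true;
}

// Row by row from top to bottom, each row from left to right.
std::vector<std::pair<int, int>> rowMajorOrder(int m, int n) {
    std::vector<std::pair<int, int>> order;
    for (int i = 0; i <= m; i++)
        for (int j = 0; j <= n; j++)
            order.push_back({i, j});
    return order;
}
```

Going column by column (left to right, and top to bottom within each column) would also pass, since the same three neighbors are still visited earlier; the post's row-major order is simply the natural choice for the nested loops.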
With the filling order settled, we can translate the equation into code:
```cpp
for (int i = 0; i <= stringA.length(); i++) {      // from the top to the bottom
    for (int j = 0; j <= stringB.length(); j++) {  // from left to right
        if (i && j) {  // i > 0 and j > 0
            // (1) delete A[i - 1], or
            // (2) insert A[i - 1] at the end of B's first j characters
            int tmpEditTimes1 = editTimes[i - 1][j] + 1;
            // (1) delete B[j - 1], or
            // (2) insert B[j - 1] at the end of A's first i characters
            int tmpEditTimes2 = editTimes[i][j - 1] + 1;
            // (1) A[i - 1] == B[j - 1]: nothing to do, or
            // (2) substitute A[i - 1] with B[j - 1]: one operation
            int tmpEditTimes3 = editTimes[i - 1][j - 1] +
                                int(!(stringA[i - 1] == stringB[j - 1]));
            // find out the smallest edit distance
            editTimes[i][j] = min(
                tmpEditTimes1,
                tmpEditTimes2,
                tmpEditTimes3);
        }
        else {  // i = 0 or j = 0 or both equal 0
            editTimes[i][j] = i == 0 ? j : i;
        }
    }
}
```
The complete program:

```cpp
#include <iostream>
#include <string>
using namespace std;

// editTimes[i][j]: minimum edit distance between the first i characters
// of stringA and the first j characters of stringB.
// The fixed size supports strings of up to 2000 characters each;
// longer input would overflow the array.
int editTimes[2001][2001];

int min(int a, int b);
int min(int a, int b, int c);

int main(void) {
    // receive string A
    string stringA;
    getline(cin, stringA);
    // receive string B
    string stringB;
    getline(cin, stringB);
    // fill the table
    for (int i = 0; i <= stringA.length(); i++) {      // from the top to the bottom
        for (int j = 0; j <= stringB.length(); j++) {  // from left to right
            if (i && j) {  // i > 0 and j > 0
                // (1) delete A[i - 1], or
                // (2) insert A[i - 1] at the end of B's first j characters
                int tmpEditTimes1 = editTimes[i - 1][j] + 1;
                // (1) delete B[j - 1], or
                // (2) insert B[j - 1] at the end of A's first i characters
                int tmpEditTimes2 = editTimes[i][j - 1] + 1;
                // (1) A[i - 1] == B[j - 1]: nothing to do, or
                // (2) substitute A[i - 1] with B[j - 1]: one operation
                int tmpEditTimes3 = editTimes[i - 1][j - 1] +
                                    int(!(stringA[i - 1] == stringB[j - 1]));
                // find out the smallest edit distance
                editTimes[i][j] = min(
                    tmpEditTimes1,
                    tmpEditTimes2,
                    tmpEditTimes3);
            }
            else {  // i = 0 or j = 0 or both equal 0
                editTimes[i][j] = i == 0 ? j : i;
            }
        }
    }
    // display the minimum edit distance
    cout << editTimes[stringA.length()][stringB.length()];
    return 0;
}

int min(int a, int b) {
    return a < b ? a : b;
}

int min(int a, int b, int c) {
    return min(min(a, b), c);
}
```
The table-filling code consists of two nested loops, and every statement inside them takes \(O(1)\) time. Thus the time complexity is
\[T(m, n) = O(m) * O(n) = O(mn)\]
where \(m\) and \(n\) are the lengths of the two strings, respectively.
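To give a sense of scale: the program's fixed 2001 × 2001 array admits strings of up to 2000 characters each (a limit inferred from the array size, not stated in the problem). In that worst case the loops evaluate
\[
(m + 1)(n + 1) = 2001 \times 2001 = 4{,}004{,}001
\]
cells, each in constant time.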
We used a 2-dimensional array whose size is \((m + 1) * (n + 1)\). Thus the space complexity is
\[S(m, n) = O(m) * O(n) = O(mn)\]
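The space can be reduced with a common refinement that the post does not use: row \(i\) of the table depends only on row \(i - 1\) and on cells already computed in row \(i\), so two rows of \(n + 1\) entries suffice, giving \(O(n)\) space. (For comparison, assuming a 4-byte int, which is typical but platform-dependent, the program's fixed table takes \(2001 \times 2001 \times 4 = 16{,}016{,}004\) bytes, about 16 MB.) The sketch below is an illustration under these assumptions; the function name is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sketch: same equation as edit[i][j], keeping only two rows.
// prev holds row i - 1 and cur holds row i.
int editDistanceTwoRows(const std::string& a, const std::string& b) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    std::vector<int> prev(n + 1), cur(n + 1);
    for (int j = 0; j <= n; j++) prev[j] = j;  // row i = 0: edit[0][j] = j
    for (int i = 1; i <= m; i++) {
        cur[0] = i;                            // column j = 0: edit[i][0] = i
        for (int j = 1; j <= n; j++) {
            int notIdentical = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,                   // edit[i - 1][j] + 1
                               cur[j - 1] + 1,                // edit[i][j - 1] + 1
                               prev[j - 1] + notIdentical});  // edit[i - 1][j - 1] + notIdentical
        }
        std::swap(prev, cur);  // row i becomes the "previous" row
    }
    return prev[n];  // after the last swap, prev holds row m
}
```

Swapping the two vectors exchanges their buffers in constant time, so no row is copied. Only the final distance survives, though: recovering the actual sequence of edits would still need the full table.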
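To solve the problems in this chapter, we should first identify the optimal substructure and the overlapping subproblems, then write down the equation, and finally fill the table.
When filling the table, three things need to be determined: its dimension, the range of indices to fill, and the order in which to fill it.
Once these steps are done, the rest is straightforward.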
Reference: 动态规划之编辑距离问题 ("Dynamic Programming: The Edit Distance Problem")
Minimum Edit Distance with Dynamic Programming
Original post: https://www.cnblogs.com/Chunngai/p/11691697.html