Longest Common Substring

Date: 2015-03-08 11:38:44

Problem Statement

Given two strings $s_1$ and $s_2$, find their longest common substring (LCS). For example, with X = [111001] and Y = [11011], the longest common substring is [110], which has length 3.
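
To make the example concrete, here is a brute-force sketch that simply tries every pair of start positions and extends the match as far as it goes. It is not the method developed in this article, only a slow reference to check answers against; the function name `bruteForceLcs` is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>

// Brute force: for every pair of start positions (i, j), count how many
// consecutive characters match and remember the longest run found so far.
// Runs in O(n1 * n2 * min(n1, n2)) time, so it is only useful as a reference.
std::string bruteForceLcs(const std::string &x, const std::string &y)
{
    std::size_t bestStart = 0;
    std::size_t bestLen = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < y.size(); ++j) {
            std::size_t k = 0;
            while (i + k < x.size() && j + k < y.size() && x[i + k] == y[j + k])
                ++k;
            if (k > bestLen) {
                bestLen = k;
                bestStart = i;
            }
        }
    }
    return x.substr(bestStart, bestLen);
}
```

Calling `bruteForceLcs("111001", "11011")` returns `"110"`, in agreement with the example above.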

A concise way to solve this problem is Dynamic Programming (DP).

Instead of considering arbitrary substrings, we first consider only the common substrings that end at a given pair of characters, one in each string.

Define $dp[i][j]$ as the length of the longest common substring of $s_1[0..i]$ and $s_2[0..j]$ that ends at $s_1[i]$ and $s_2[j]$.

Then the LCS length is the maximum value in the array $dp$.

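To compute $dp[i][j]$, we check whether $s_1[i] = s_2[j]$. If so, the common substring ending at $s_1[i-1]$ and $s_2[j-1]$ is extended by one character, so $dp[i][j] = dp[i-1][j-1] + 1$; otherwise no common substring ends at this pair, and $dp[i][j] = 0$. Thus, for $i, j \geq 1$: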

dp[i][j] = (s1[i] == s2[j] ? (dp[i-1][j-1] + 1) : 0);
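
The recurrence can be sketched as a small function that fills the table and returns its maximum. This is a minimal sketch under two assumptions that are not in the original text: the name `lcsLength` is hypothetical, and cells in the first row or column treat the missing diagonal neighbour as 0 so that no separate initialization pass is needed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the length of the longest common substring of s1 and s2.
int lcsLength(const std::string &s1, const std::string &s2)
{
    const int n1 = static_cast<int>(s1.size());
    const int n2 = static_cast<int>(s2.size());

    // dp[i][j]: length of the longest common substring ending at s1[i] and s2[j]
    std::vector<std::vector<int> > dp(n1, std::vector<int>(n2, 0));

    int best = 0;
    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            if (s1[i] == s2[j]) {
                // Assumption: a missing diagonal neighbour counts as 0.
                const int prev = (i > 0 && j > 0) ? dp[i - 1][j - 1] : 0;
                dp[i][j] = prev + 1;
            }
            best = std::max(best, dp[i][j]);
        }
    }
    return best;
}
```

For the strings in the problem statement, `lcsLength("111001", "11011")` returns 3.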

Since we want the LCS string itself and not only its length, a few modifications are needed.

Whenever we find a $dp[i][j]$ larger than the current maxLen, we update maxLen to $dp[i][j]$:

if(dp[i][j] > maxLen)
    maxLen = dp[i][j];

At the same time, we can record the starting index of the new, longer substring. In $s_1$, the LCS ends at the current index $i$, so its starting index is $i$ plus 1 minus the length of the LCS, i.e.

if(dp[i][j] > maxLen){
    maxLen = dp[i][j];
    maxIndex = i + 1 - maxLen; 
}
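
Once `maxIndex` and `maxLen` are known, the substring itself can be read out of $s_1$ with `substr`. The following is a sketch under assumptions: the function name `lcsString` is hypothetical, and border cells are handled inline rather than in a separate initialization step. When several common substrings share the maximum length, this version keeps the first one encountered.

```cpp
#include <string>
#include <vector>

// Returns one longest common substring of s1 and s2 (empty if there is none).
std::string lcsString(const std::string &s1, const std::string &s2)
{
    const int n1 = static_cast<int>(s1.size());
    const int n2 = static_cast<int>(s2.size());
    std::vector<std::vector<int> > dp(n1, std::vector<int>(n2, 0));

    int maxLen = 0;
    int maxIndex = 0;
    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            if (s1[i] != s2[j])
                continue;  // dp[i][j] stays 0
            dp[i][j] = (i > 0 && j > 0) ? dp[i - 1][j - 1] + 1 : 1;
            if (dp[i][j] > maxLen) {
                maxLen = dp[i][j];
                // The match ends at index i, so it starts maxLen - 1 earlier.
                maxIndex = i + 1 - maxLen;
            }
        }
    }
    return s1.substr(maxIndex, maxLen);
}
```

In the running example the match [110] ends at $i = 3$ in X with length 3, so it starts at $3 + 1 - 3 = 1$, and `lcsString("111001", "11011")` returns `"110"`.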

 

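Finally, we need to initialize $dp$. The cells in the first row and first column have no diagonal predecessor, so each of them is simply 1 if the two characters match and 0 otherwise: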

for(int i = 0; i < s1.length(); ++i)
    dp[i][0] = (s1[i] == s2[0] ? 1 : 0);

for(int j = 0; j < s2.length(); ++j)
    dp[0][j] = (s1[0] == s2[j] ? 1 : 0);
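
To see what these two loops produce, here is a small sketch that builds the table with only its borders filled in. The helper name `initBorders` is an assumption made for illustration, and the function expects both strings to be non-empty because it reads `s1[0]` and `s2[0]`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds the dp table with only its first row and first column filled in.
// Assumes s1 and s2 are both non-empty.
std::vector<std::vector<int> > initBorders(const std::string &s1, const std::string &s2)
{
    std::vector<std::vector<int> > dp(s1.size(), std::vector<int>(s2.size(), 0));

    // First column: compare every character of s1 with s2[0].
    for (std::size_t i = 0; i < s1.size(); ++i)
        dp[i][0] = (s1[i] == s2[0] ? 1 : 0);

    // First row: compare s1[0] with every character of s2.
    for (std::size_t j = 0; j < s2.size(); ++j)
        dp[0][j] = (s1[0] == s2[j] ? 1 : 0);

    return dp;
}
```

For X = [111001] and Y = [11011], the first column becomes 1, 1, 1, 0, 0, 1 and the first row becomes 1, 1, 0, 1, 1. Note that a border cell can already hold the maximum (for instance when the only common character sits in the first row), so these cells have to be included when searching for the largest value.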

 

 


The complete code is:

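#include <string>
#include <vector>
using namespace std;

// Finds the longest common substring of s1 and s2.
// On return, sIndex is its starting index in s1 (-1 if there is none)
// and length is its length. The function name is chosen for illustration.
void longestCommonSubstring(const string &s1, const string &s2, int &sIndex, int &length)
{
    int n1 = static_cast<int>(s1.length());
    int n2 = static_cast<int>(s2.length());

    sIndex = -1;
    length = 0;
    if(0 == n1 || 0 == n2)
        return;

    // initialize dp: first row and first column
    vector<vector<int> > dp(n1, vector<int>(n2, 0));
    for(int i = 0; i < n1; ++i)
        dp[i][0] = (s1[i] == s2[0] ? 1 : 0);
    for(int j = 0; j < n2; ++j)
        dp[0][j] = (s1[0] == s2[j] ? 1 : 0);

    // compute max length and index; border cells are included in the search
    // because the longest match may end in the first row or first column
    for(int i = 0; i < n1; ++i){
        for(int j = 0; j < n2; ++j){
            // interior cells follow the recurrence; border cells are already set
            if(i > 0 && j > 0 && s1[i] == s2[j])
                dp[i][j] = dp[i-1][j-1] + 1;

            if(dp[i][j] > length){
                length = dp[i][j];
                sIndex = i + 1 - length;
            }
        }
    }
}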

 


Original article: http://www.cnblogs.com/kid551/p/4321392.html
