
String Pattern Matching Algorithms: The Sunday Algorithm

Date: 2017-10-26 18:55:33


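  The idea behind the Sunday algorithm is similar to the bad-character heuristic of the BM (Boyer-Moore) algorithm. The difference is that, after a mismatch, Sunday takes the character of the target string that sits immediately after the part currently aligned with the pattern, and uses that character for the bad-character lookup.

  Example: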

  [Figure: example target string and pattern]

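  After b and x mismatch, the bad character for BM is b (index 1). BM looks up the position of b in the pattern, aligns the pattern on it, and continues matching, as in the figure below: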

  [Figure: BM alignment after the bad-character shift]
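  The BM bad-character step described above can be sketched in C as follows. This is only an illustration: the function name bm_bad_char_shift and its parameters are hypothetical, and the sketch covers the bad-character rule alone (full BM also uses a good-suffix rule). It assumes single-byte characters and a NUL-terminated pattern.

```c
#include <string.h>

#define ALPHABET_SIZE 256

/* Hypothetical helper: BM bad-character shift for a single mismatch.
 *   pat : the pattern
 *   j   : index in pat where the comparison failed
 *   bad : the target character that was aligned with pat[j]
 * A real search would build the table once, not on every call. */
int bm_bad_char_shift(const char *pat, int j, char bad)
{
    int last[ALPHABET_SIZE];
    int m = (int)strlen(pat);

    for (int c = 0; c < ALPHABET_SIZE; c++)
        last[c] = -1;                       /* -1: character not in pattern */
    for (int i = 0; i < m; i++)
        last[(unsigned char)pat[i]] = i;    /* rightmost occurrence wins */

    int shift = j - last[(unsigned char)bad];
    return shift > 0 ? shift : 1;           /* never move backwards */
}
```

  Note that the shift depends on j, the position inside the window where the mismatch happened. That dependence is what the Sunday variant removes.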

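  After a mismatch, Sunday takes the character of the target string just past the part aligned with the pattern, which here is e, and uses e for the bad-character lookup. Since e does not occur in the pattern, the whole pattern is moved past it and matching continues, as in the figure below: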

  [Figure: Sunday alignment after the shift]
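  A minimal sketch of the Sunday shift just described, for comparison. The name sunday_shift is hypothetical, and the code assumes single-byte characters and a NUL-terminated pattern; it shows the shift for one mismatch only, not a complete search.

```c
#include <string.h>

#define ALPHABET_SIZE 256

/* Hypothetical helper: Sunday shift after a failed window.
 *   pat  : the pattern
 *   next : the target character immediately after the current window
 * A real search would build the table once, not on every call. */
int sunday_shift(const char *pat, char next)
{
    int last[ALPHABET_SIZE];
    int m = (int)strlen(pat);

    for (int c = 0; c < ALPHABET_SIZE; c++)
        last[c] = -1;                       /* -1: character not in pattern */
    for (int i = 0; i < m; i++)
        last[(unsigned char)pat[i]] = i;    /* rightmost occurrence wins */

    /* Align 'next' with its rightmost occurrence in the pattern;
     * if it is absent, the result is m + 1 and the window jumps past it. */
    return m - last[(unsigned char)next];
}
```

  The result is always between 1 and m + 1, and it does not depend on where inside the window the mismatch occurred.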

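  In this example Sunday shifts farther than BM, which is why Sunday is often faster than BM in practice. Its worst-case time complexity, however, is still O(n*m). Consider the target string baaaabaaaabaaaabaaaa and the pattern aaaaa, which has no match. With Sunday, most of the bad characters are a, and the pattern consists entirely of a, so after most mismatches the pattern can only move right by 1. An improved KMP algorithm, by contrast, is guaranteed to finish in linear time.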
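  To see the small shifts on this input, the following sketch counts how many window alignments and character comparisons a left-to-right Sunday search performs. The struct SundayStats and the function sunday_count_work are hypothetical names introduced for this illustration; the code assumes single-byte characters and NUL-terminated strings.

```c
#include <string.h>

#define ALPHABET_SIZE 256

/* Hypothetical counters for one search. */
typedef struct {
    int alignments;   /* number of window positions tried */
    int comparisons;  /* number of character comparisons  */
} SundayStats;

SundayStats sunday_count_work(const char *text, const char *pat)
{
    SundayStats s = {0, 0};
    int n = (int)strlen(text);
    int m = (int)strlen(pat);
    int last[ALPHABET_SIZE];

    if (m == 0 || n < m)
        return s;
    for (int c = 0; c < ALPHABET_SIZE; c++)
        last[c] = -1;
    for (int i = 0; i < m; i++)
        last[(unsigned char)pat[i]] = i;

    for (int start = 0; start <= n - m; ) {
        int j = 0;
        s.alignments++;
        while (j < m) {
            s.comparisons++;
            if (text[start + j] != pat[j])
                break;
            j++;
        }
        /* Stop on a match, or when no character follows the window. */
        if (j == m || start + m >= n)
            break;
        start += m - last[(unsigned char)text[start + m]];
    }
    return s;
}
```

  For the target baaaabaaaabaaaabaaaa and the pattern aaaaa, the windows start at 0, 6, 7, 8, 9 and 10: after the first jump of 6, four consecutive shifts are only 1 because the character after the window is a.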

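  Sunday does not require the window to be compared strictly left to right or right to left, because after a mismatch the shift is decided by the next character of the target that has not been examined yet, not by the position of the mismatch. One can therefore first count how often each character occurs in the pattern and always compare the position holding the rarest character first. That position is the most likely to mismatch, which reduces the number of comparisons and speeds up matching.

  Example: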

  [Figure: example pattern in which b occurs once and a and c each occur twice]

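  In the pattern, b occurs only once while a and c each occur twice, so the position of b is compared first (judging only by the characters of the pattern, b is the most likely to mismatch).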
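  The rarest-character-first idea can be sketched as below. This is one possible reading of the description above, not the original author's code: the names build_order and sunday_search_rarest_first, the 64-character limit MAX_PATTERN, and the sample pattern acbca used in the note are all assumptions made for the illustration.

```c
#include <string.h>

#define ALPHABET_SIZE 256
#define MAX_PATTERN   64   /* assumption: patterns are at most 64 chars */

/* Fill order[0..m-1] with the pattern indices, rarest character first.
 * Ties keep their left-to-right order (stable insertion sort). */
static void build_order(const char *pat, int m, int *order)
{
    int freq[ALPHABET_SIZE] = {0};

    for (int i = 0; i < m; i++)
        freq[(unsigned char)pat[i]]++;
    for (int i = 0; i < m; i++) {
        int j = i - 1;
        while (j >= 0 &&
               freq[(unsigned char)pat[order[j]]] > freq[(unsigned char)pat[i]]) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = i;
    }
}

/* Return the index of the first match of pat in text, or -1. */
int sunday_search_rarest_first(const char *text, int n, const char *pat, int m)
{
    int last[ALPHABET_SIZE];
    int order[MAX_PATTERN];

    if (m <= 0 || m > MAX_PATTERN || n < m)
        return -1;
    for (int c = 0; c < ALPHABET_SIZE; c++)
        last[c] = -1;
    for (int i = 0; i < m; i++)
        last[(unsigned char)pat[i]] = i;
    build_order(pat, m, order);

    for (int start = 0; start <= n - m; ) {
        int k = 0;
        /* Compare positions in rarest-first order instead of left to right. */
        while (k < m && text[start + order[k]] == pat[order[k]])
            k++;
        if (k == m)
            return start;
        if (start + m >= n)
            return -1;                      /* nothing after the window */
        start += m - last[(unsigned char)text[start + m]];
    }
    return -1;
}
```

  With the sample pattern acbca, index 2 (the only b) is compared first. The shift rule is unchanged, so only the order of comparisons inside a window differs from the plain version.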

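  In the best case Sunday runs in O(n/m) time, because each failed window can be skipped entirely with a shift of m+1. On random strings it is usually faster than many other matching algorithms.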
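  As a rough check of the favorable case, assume that every window mismatches on its first comparison and that the character just past the window never occurs in the pattern. Here n is the target length and m the pattern length; the count below is a back-of-the-envelope sketch under those assumptions, not a figure from the original post.

```latex
% Each failed window costs one comparison and shifts the pattern by m + 1,
% so the window starts are 0, (m+1), 2(m+1), ..., up to at most n - m.
\[
  \text{comparisons} \;=\; \left\lfloor \frac{n - m}{m + 1} \right\rfloor + 1
  \;\approx\; \frac{n}{m + 1}
\]
```

  For n = 30 and m = 5 this gives 5 comparisons, far fewer than the 30 characters of the target.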
  

  C implementation:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Build the bad-character table: pArray[c] is the rightmost index of
// character c in the pattern, or -1 if c does not occur in it.
bool BadChar(const char *pattern, int nLen, int *pArray, int nArrayLen)
{
    if (nArrayLen < 256)
    {
        return false;
    }
    for (int i = 0; i < 256; i++)
    {
        pArray[i] = -1;
    }
    for (int i = 0; i < nLen; i++)
    {
        // Cast to unsigned char so that a signed char can never
        // produce a negative array index.
        pArray[(unsigned char)pattern[i]] = i;
    }
    return true;
}

// Return the index of the first occurrence of pattern in dest, or -1.
// pArray must have been filled by BadChar for the same pattern.
int SundaySearch(const char *dest, int nDLen,
                 const char *pattern, int nPLen,
                 int *pArray)
{
    if (0 == nPLen)
    {
        return -1;
    }
    for (int nBegin = 0; nBegin <= nDLen - nPLen; )
    {
        // The loop condition guarantees nBegin + j < nDLen while j < nPLen.
        int j = 0;
        while (j < nPLen && dest[nBegin + j] == pattern[j])
        {
            j++;
        }
        if (j == nPLen)
        {
            return nBegin;
        }
        if (nBegin + nPLen >= nDLen)
        {
            // No character follows the window, so no later alignment exists.
            return -1;
        }
        // Line up the character just past the window with its rightmost
        // occurrence in the pattern, or jump past it if it is absent.
        nBegin += nPLen - pArray[(unsigned char)dest[nBegin + nPLen]];
    }
    return -1;
}

void TestSundaySearch(void)
{
    int nFind;
    int nBadArray[256] = {0};
    //                   0         1         2
    //                   0123456789012345678901234567
    const char dest[] = "abcxxxbaaaabaaaxbbaaabcdamno";
    const char pattern[][40] = {
        "a",
        "ab",
        "abc",
        "abcd",
        "x",
        "xx",
        "xxx",
        "ax",
        "axb",
        "xb",
        "b",
        "m",
        "mn",
        "mno",
        "no",
        "o",
        "",
        "aaabaaaab",
        "baaaabaaa",
        "aabaaaxbbaaabcd",
        "abcxxxbaaaabaaaxbbaaabcdamno",
    };

    for (size_t i = 0; i < sizeof(pattern) / sizeof(pattern[0]); i++)
    {
        int nPLen = (int)strlen(pattern[i]);

        BadChar(pattern[i], nPLen, nBadArray, 256);
        nFind = SundaySearch(dest, (int)strlen(dest), pattern[i], nPLen, nBadArray);
        if (-1 != nFind)
        {
            printf("Found    \"%s\" at %d \t%s\r\n", pattern[i], nFind, dest + nFind);
        }
        else
        {
            printf("Found    \"%s\" no result.\r\n", pattern[i]);
        }
    }
}

int main(void)
{
    TestSundaySearch();
    return 0;
}

  Output:

Found    "a" at 0       abcxxxbaaaabaaaxbbaaabcdamno
Found    "ab" at 0      abcxxxbaaaabaaaxbbaaabcdamno
Found    "abc" at 0     abcxxxbaaaabaaaxbbaaabcdamno
Found    "abcd" at 20   abcdamno
Found    "x" at 3       xxxbaaaabaaaxbbaaabcdamno
Found    "xx" at 3      xxxbaaaabaaaxbbaaabcdamno
Found    "xxx" at 3     xxxbaaaabaaaxbbaaabcdamno
Found    "ax" at 14     axbbaaabcdamno
Found    "axb" at 14    axbbaaabcdamno
Found    "xb" at 5      xbaaaabaaaxbbaaabcdamno
Found    "b" at 1       bcxxxbaaaabaaaxbbaaabcdamno
Found    "m" at 25      mno
Found    "mn" at 25     mno
Found    "mno" at 25    mno
Found    "no" at 26     no
Found    "o" at 27      o
Found    "" no result.
Found    "aaabaaaab" no result.
Found    "baaaabaaa" at 6       baaaabaaaxbbaaabcdamno
Found    "aabaaaxbbaaabcd" at 9         aabaaaxbbaaabcdamno
Found    "abcxxxbaaaabaaaxbbaaabcdamno" at 0    abcxxxbaaaabaaaxbbaaabcdamno

 



Original article: http://www.cnblogs.com/WJQ2017/p/7738034.html
