When the problem of matching a string S in a string T comes up, people
usually suggest KMP, Aho-Corasick, or a suffix array. But Mr Liu tells
Canoe that there is a technique called the Burrows–Wheeler Transform (BWT)
that solves the problem remarkably efficiently.
But how does BWT solve the S-in-T matching problem? Mr Liu tells Canoe its first three steps.
First, we append ‘$’ to the end of T; for convenience, we still call
the new string T. Then, for every suffix of T that starts at position i,
we append the prefix of T that ends at position (i – 1) to its end, which
gives every rotation of T. Second, we sort these new strings in dictionary
order and call the matrix formed by the sorted strings the Burrows-Wheeler
Matrix. Third, we read off the characters of the last column to get a new
string, which we call BWT(T). You can get more information from the
example below.
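The three steps can be sketched directly in C++ for the string “acaacg” used later in the statement. This is a naive illustration only: the function name `bwt_naive` is made up here, and the sketch assumes ‘$’ compares smaller than every lowercase letter, which holds in ASCII.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Naive BWT: append '$', build every rotation, sort them, read the last column.
// Assumes '$' (ASCII 36) sorts before all lowercase letters.
std::string bwt_naive(std::string t) {
    t += '$';
    const std::size_t n = t.size();
    std::vector<std::string> rows;
    for (std::size_t i = 0; i < n; ++i) {
        // suffix starting at i, followed by the prefix ending at i - 1
        rows.push_back(t.substr(i) + t.substr(0, i));
    }
    std::sort(rows.begin(), rows.end());   // the Burrows-Wheeler Matrix
    std::string last;
    for (const std::string& row : rows) {
        last += row.back();                // last column
    }
    return last;
}
```

For T = “acaacg” the sorted rows are $acaacg, aacg$ac, acaacg$, acg$aca, caacg$a, cg$acaa, g$acaac, so `bwt_naive("acaacg")` returns "gc$aaac". The sketch needs quadratic memory and is meant for small strings only.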
Then Mr Liu tells Canoe that we only need to store BWT(T) to solve the
matching problem. But can that really work? Mr Liu smiles and says yes.
Knowing only BWT(T), we can decide whether a string S such as “aac” is a
substring of a string T such as “acaacg”! What an amazing algorithm BWT
is! But Canoe is puzzled by the tricky method of matching S in T. Would
you please help Canoe find it? Given BWT(T) and a string S, can you
figure out whether S is a substring of T?
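One well-known way to answer such queries from BWT(T) alone is backward search, the technique behind the FM-index. The sketch below is an illustration under assumptions, not necessarily the method the problem setter had in mind: `BwtIndex`, `C`, and `occ` are names invented here, queries are assumed to be lowercase, and ‘$’ is assumed to sort before every letter. The solution code further down takes a different route: it reconstructs T and runs KMP.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: substring queries answered from BWT(T) by backward search.
struct BwtIndex {
    std::array<int, 256> C{};              // C[c] = number of BWT characters smaller than c
    std::vector<std::array<int, 26>> occ;  // occ[i][c] = count of letter c in bwt[0..i)

    explicit BwtIndex(const std::string& bwt) : occ(bwt.size() + 1) {
        std::array<int, 256> cnt{};
        for (std::size_t i = 0; i < bwt.size(); ++i) {
            occ[i + 1] = occ[i];
            unsigned char ch = static_cast<unsigned char>(bwt[i]);
            ++cnt[ch];
            if (ch >= 'a' && ch <= 'z') ++occ[i + 1][ch - 'a'];
        }
        int sum = 0;
        for (int c = 0; c < 256; ++c) { C[c] = sum; sum += cnt[c]; }
    }

    // True if s occurs in T. Processes s from its last character to its first,
    // narrowing the half-open range [lo, hi) of matrix rows that start with the suffix.
    bool contains(const std::string& s) const {
        int lo = 0, hi = static_cast<int>(occ.size()) - 1;
        for (int i = static_cast<int>(s.size()) - 1; i >= 0 && lo < hi; --i) {
            unsigned char ch = static_cast<unsigned char>(s[i]);
            if (ch < 'a' || ch > 'z') return false;  // only lowercase queries are supported
            lo = C[ch] + occ[lo][ch - 'a'];
            hi = C[ch] + occ[hi][ch - 'a'];
        }
        return lo < hi;
    }
};
```

With `BwtIndex idx("gc$aaac");`, `idx.contains("aac")` is true and `idx.contains("ag")` is false. Each query costs time proportional to length(S) after a one-time table build; the full `occ` table trades memory (26 counters per position) for simplicity.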
There are multiple test cases.
First line: the string BWT(T) (1 <= length(BWT(T)) <= 100086).
Second line: an integer n (1 <= n <= 10086), the number of S strings.
Then n lines follow.
Each line contains one string S (n * length(S) is less than 2000000, and all characters of S are lowercase).
For every S, if S is a substring of T, output “YES” on a line.
Otherwise, output “NO” on a line.
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
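// One character of BWT(T) together with its position in the last column.
struct node
{
    int id;   // index of this character in BWT(T), i.e. its row in the last column
    char r;   // the character itself
}str[100186];
char s[100186];      // BWT(T) as read from the input
char str2[100186];   // T reconstructed from BWT(T), without the trailing '$'
char T[2000100];     // the current query string S (used as the KMP pattern)
int Next[2000100];   // KMP failure table of the query
int tlen;            // length of the query
// Compare by character only; stable_sort keeps equal characters in last-column order.
bool cmp(node x,node y)
{
    return x.r<y.r;
}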
// Build the KMP failure table Next[0..tlen] for the query stored in T.
void getNext()
{
    int j, k;
    j = 0; k = -1; Next[0] = -1;
    while(j < tlen)
        if(k == -1 || T[j] == T[k])
            Next[++j] = ++k;
        else
            k = Next[k];
}
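// Search for the query T inside the text S of length slen.
// getNext() must have been called for the current query beforehand.
bool KMP_Index(char S[],int slen)
{
    int i = 0, j = 0;
    while(i < slen && j < tlen)
    {
        if(j == -1 || S[i] == T[j])
        {
            i++; j++;
        }
        else
            j = Next[j];
    }
    return j == tlen;   // the whole query was matched
}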
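int main()
{
    int n;
    while(scanf("%s",s)!=EOF)
    {
        int len=strlen(s);
        for(int i=0;i<len;i++)
        {
            str[i].id=i;
            str[i].r=s[i];
        }
        // After the stable sort, str[k].r is the first column of the matrix and
        // str[k].id is the last-column row that holds the same character.
        stable_sort(str,str+len,cmp);
        // Row 0 starts with '$'; following id yields T one character at a time.
        int now=0;
        for(int i=0;i<len-1;i++)
        {
            now=str[now].id;
            str2[i]=str[now].r;
        }
        len=len-1;      // drop the '$'
        str2[len]=0;
        scanf("%d",&n);
        while(n--)
        {
            scanf("%s",T);
            tlen=strlen(T);
            getNext();
            if(KMP_Index(str2,len))puts("YES");
            else puts("NO");
        }
    }
    return 0;
}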
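
The reconstruction step in main can be looked at in isolation. The sketch below restates it with standard containers; `inverse_bwt`, `order`, and `first` are names chosen here for illustration, and it assumes the input contains exactly one ‘$’ that sorts before every other character.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Recover T (without '$') from BWT(T) by stably sorting the last column.
// order[k] is the last-column row holding the same character occurrence as
// first-column row k; row order[k] is row k rotated left by one position.
std::string inverse_bwt(const std::string& bwt) {
    const int n = static_cast<int>(bwt.size());
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return bwt[a] < bwt[b]; });

    std::string first(n, ' ');             // first column of the sorted matrix
    for (int k = 0; k < n; ++k) first[k] = bwt[order[k]];

    std::string t;
    int row = 0;                           // row 0 is "$" followed by T
    for (int i = 0; i + 1 < n; ++i) {
        row = order[row];                  // move to the rotation starting one character later
        t += first[row];
    }
    return t;
}
```

For example, `inverse_bwt("gc$aaac")` returns "acaacg". Stability matters here: equal characters keep the same relative order in the first and last columns, which is what makes `order` a valid mapping between them.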