"Multidimensional spaces are completely out of style these days, unlike genetics problems" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing he faced the following problem concerning DNA sequences. We will further think of a DNA sequence as an arbitrary string of uppercase letters "A", "C", "G" and "T" (of course, this is a simplified interpretation).
Let w be a long DNA sequence and let s1, s2, ..., sm be a collection of short DNA sequences. We say that the collection filters w iff w can be covered with the sequences from the collection. Substrings corresponding to different positions of the string may intersect or even cover each other. More formally: denote by |w| the length of w, and let the symbols of w be numbered from 1 to |w|. Then for each position i in w there exists a pair of indices l, r (1 ≤ l ≤ i ≤ r ≤ |w|) such that the substring w[l ... r] equals one of the elements s1, s2, ..., sm of the collection.
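The formal definition above can be checked directly for a single string. The following is a minimal sketch, not part of the original solution: `isFiltered` is a made-up helper name, and it simply marks every position touched by any occurrence of any pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true iff every position of w lies inside at least one occurrence
// of some pattern from the collection (hypothetical helper, for illustration).
bool isFiltered(const std::string& w, const std::vector<std::string>& patterns) {
    std::vector<bool> covered(w.size(), false);
    for (const std::string& s : patterns) {
        assert(!s.empty());  // the statement guarantees nonempty patterns
        // Visit every occurrence of s in w, including overlapping ones.
        for (std::size_t pos = w.find(s); pos != std::string::npos;
             pos = w.find(s, pos + 1)) {
            for (std::size_t i = pos; i < pos + s.size(); ++i) covered[i] = true;
        }
    }
    for (bool c : covered)
        if (!c) return false;
    return true;
}
```

For example, with the collection {"CAT", "TACT"} the string "CATACT" passes the check (the two occurrences share the middle "T"), while "TACTAC" does not, because its last two letters are not inside any occurrence.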
Woll wants to calculate the number of DNA sequences of a given length filtered by a given collection, but he doesn't know how to deal with it. Help him! Your task is to find the number of different DNA sequences of length n filtered by the collection {si}.
The answer may be very large, so output it modulo 1000000009.
The first line contains two integers n and m (1 ≤ n ≤ 1000, 1 ≤ m ≤ 10): the length of the string and the number of sequences in the collection, respectively.
Next m lines contain the collection sequences si, one per line. Each si is a nonempty string of length not greater than 10. All the strings consist of uppercase letters "A", "C", "G", "T". The collection may contain identical strings.
The output should contain a single integer: the number of strings filtered by the collection, modulo 1000000009 (10^9 + 9).
Sample 1 input:
2 1
A
Sample 1 output:
1
Sample 2 input:
6 2
CAT
TACT
Sample 2 output:
2
In the first sample, a string has to be filtered by "A". Clearly, there is only one such string: "AA".
In the second sample, there exist exactly two different strings satisfying the condition: "CATCAT" and "CATACT".
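For small n the samples can be double-checked by brute force. The sketch below is only a sanity check under the assumption that n is tiny (it enumerates all 4^n strings); `bruteForceCount` is a made-up name, and the coverage test sweeps left to right while tracking how far the covered prefix reaches.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Counts strings of length n over {A, C, G, T} in which every position is
// inside some occurrence of a pattern. Exponential; intended for tiny n only.
long long bruteForceCount(int n, const std::vector<std::string>& patterns) {
    assert(n >= 1 && n <= 12);  // 4^12 is about 1.7e7 candidate strings
    const char letters[] = "ACGT";
    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;

    long long count = 0;
    std::string w(n, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode the base-4 number `code` into a candidate string.
        long long c = code;
        for (int i = 0; i < n; ++i) { w[i] = letters[c % 4]; c /= 4; }

        // end = length of the prefix known to be fully covered so far.
        int end = 0;
        bool ok = true;
        for (int p = 0; p < n; ++p) {
            if (p > end) { ok = false; break; }  // position `end` can no longer be covered
            for (const std::string& s : patterns)
                if (w.compare(static_cast<std::size_t>(p), s.size(), s) == 0)
                    end = std::max(end, p + static_cast<int>(s.size()));
        }
        if (ok && end == n) ++count;
    }
    return count;
}
```

Calling `bruteForceCount(2, {"A"})` and `bruteForceCount(6, {"CAT", "TACT"})` should reproduce the two sample answers.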
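Problem summary: in short, count the strings of length n over {A, C, G, T} in which every character is covered by an occurrence of at least one of the m given patterns.
Solution: I really racked my brains over this one. It took me two evenings and n rewrites; clearly my grasp of the AC automaton (Aho-Corasick) is not solid yet.
Let me walk through my thought process, which is to say, the process of me copying the editorial.
First, since this is about matching, and about several patterns at once, it is easy (??) to think of the AC automaton.
Then, since the problem sits in a DP problem set, we solve it with DP (formally, pretty much all of these count-the-ways problems are DP...).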
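Following the usual interval-DP routine, let dp[i][j] mean that we have matched up to position i and chosen j. That turns out not to work, so we change j to a node of the AC automaton,
and dp[i][j] becomes the number of ways to build the first i characters of the string and arrive at AC automaton node j.
But this state is still not enough, because occurrences of patterns may overlap one another.
So we add one more dimension: dp[i][j][k], where k is the number of leftover characters at the end. "Leftover" means that the last k characters of the string built so far are not covered yet, although a pattern matched later may still cover them.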
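From here the transitions follow quite easily. Append a character c, let j' = ch[j][c] be the next automaton node, and let val[j'] be the length of the longest pattern that ends at node j' (that is, the longest pattern that is a suffix of the current text).
dp[i][j][k] -> dp[i+1][j'][0] if val[j'] > k: a pattern matches and covers the new character together with the k leftover ones, so the leftover drops to 0.
dp[i][j][k] -> dp[i+1][j'][k+1] otherwise: nothing covers them yet, so the leftover grows by one (the code only keeps this state while k is below the longest pattern length).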
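The two cases can be written as a tiny helper. This is only a sketch: `nextUncovered` is a made-up name, `longest` stands for the length of the longest pattern ending at the new position (what the code below stores in val), and it prunes one step earlier than the posted code by dropping a state as soon as the leftover reaches the longest pattern length, since such a state can never be covered again.

```cpp
#include <cassert>

// k       : number of trailing characters not yet covered before appending
// longest : length of the longest pattern that is a suffix after appending
// maxLen  : length of the longest pattern in the collection
// Returns the new leftover count, or -1 if the state can never be completed.
int nextUncovered(int k, int longest, int maxLen) {
    assert(k >= 0 && longest >= 0 && maxLen >= 1);
    if (longest > k) return 0;         // covers the new character and all k pending ones
    if (k + 1 < maxLen) return k + 1;  // still coverable by a longer match later
    return -1;                         // k + 1 >= maxLen: no pattern is long enough
}
```

For instance, with patterns of length at most 4, a leftover of 2 and a suffix match of length 3 gives 0, while the same leftover with a suffix match of length 2 gives 3.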
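So the answer is the sum of dp[n][j][0] over all AC automaton nodes j.
Base case: dp[0][0][0] = 1.
There are at most 10m + 1 automaton nodes and at most 11 values of k, so this solves the problem in about O(100nm) time (times the alphabet size 4).
Here is the code: smelly, but not long.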
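To see where a bound like O(100nm) comes from, here is a rough count under the limits in the statement. The symbols S (number of automaton nodes) and L (longest pattern length) are my own shorthand, not the author's.

```latex
\[
  T \;=\; O\bigl(n \cdot S \cdot (L + 1) \cdot 4\bigr),
  \qquad S \le 10m + 1 \le 101, \quad L \le 10,
\]
\[
  \text{so at worst}\quad 1000 \cdot 101 \cdot 11 \cdot 4 \;\approx\; 4.4 \times 10^{6}
  \ \text{transitions.}
\]
```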
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const int MAXN = 1005;  // n <= 1000
const int MAXS = 105;   // at most 10 * 10 + 1 trie nodes
const int MAXL = 11;    // leftover k ranges over 0..10
const ll mod = 1000000009;

// f[i][j][k]: number of strings of length i that end at automaton node j
// with the last k characters not covered yet.
ll f[MAXN][MAXS][MAXL];
int n, m, ch[MAXS][4], val[MAXS], fail[MAXS], sz, root = 0, mx = 0;

// x and y are both in [0, mod)
void add(ll &x, ll y) {
    x += y;
    if (x >= mod) x -= mod;
}

int idx(char c) {
    if (c == 'A') return 0;
    if (c == 'C') return 1;
    return c == 'T' ? 3 : 2;
}

int newnode() {
    for (int i = 0; i < 4; i++) ch[sz][i] = 0;
    val[sz++] = 0;
    return sz - 1;
}

void insert(char *s, int len) {
    int u = 0;
    for (int i = 0; i < len; i++) {
        int v = idx(s[i]);
        if (ch[u][v] == 0) ch[u][v] = newnode();
        u = ch[u][v];
    }
    val[u] = max(val[u], len);  // val[u]: longest pattern ending at node u
}

void build() {
    queue<int> q;
    for (int i = 0; i < 4; i++) {
        if (ch[root][i] == 0) ch[root][i] = root;
        else fail[ch[root][i]] = root, q.push(ch[root][i]);
    }
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int i = 0; i < 4; i++) {
            int &v = ch[u][i];
            if (v == 0) {
                v = ch[fail[u]][i];  // missing edge: follow the fail link
            } else {
                fail[v] = ch[fail[u]][i];
                q.push(v);
                // a pattern ending at fail[v] also ends at v
                val[v] = max(val[v], val[fail[v]]);
            }
        }
    }
}

void dp() {
    f[0][0][0] = 1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < sz; j++)
            for (int k = 0; k <= mx; k++) {
                ll v = f[i][j][k];
                if (v == 0) continue;
                for (int p = 0; p < 4; p++) {
                    int now = ch[j][p];
                    if (val[now] > k) add(f[i + 1][now][0], v);           // everything covered
                    else if (k < mx) add(f[i + 1][now][k + 1], v);        // one more leftover
                }
            }
    ll ans = 0;
    for (int i = 0; i < sz; i++) add(ans, f[n][i][0]);
    cout << ans << "\n";
}

char str[200];

int main() {
    root = newnode();
    cin >> n >> m;
    for (int i = 1; i <= m; i++) {
        scanf("%s", str);
        int len = strlen(str);
        mx = max(mx, len);
        insert(str, len);
    }
    build();
    dp();
    return 0;
}
Original post: http://www.cnblogs.com/foreverpiano/p/7225867.html