
AC automaton + DP (CodeForces - 86C)

Time: 2017-07-23 21:16:25


"Multidimensional spaces are completely out of style these days, unlike genetics problems" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing he faced the following problem concerning DNA sequences. We will further think of a DNA sequence as an arbitrary string of uppercase letters "A", "C", "G" and "T" (of course, this is a simplified interpretation).

Let w be a long DNA sequence and let s1, s2, ..., sm be a collection of short DNA sequences. We say that the collection filters w iff w can be covered with the sequences from the collection. Substrings corresponding to different positions of the string may intersect or even cover each other. More formally: denote by |w| the length of w, and let the symbols of w be numbered from 1 to |w|. Then for each position i in w there exists a pair of indices l, r (1 ≤ l ≤ i ≤ r ≤ |w|) such that the substring w[l ... r] equals one of the elements s1, s2, ..., sm of the collection.

Woll wants to calculate the number of DNA sequences of a given length filtered by a given collection, but he doesn't know how to deal with it. Help him! Your task is to find the number of different DNA sequences of length n filtered by the collection {si}.

Answer may appear very large, so output it modulo 1000000009.

Input

The first line contains two integers n and m (1 ≤ n ≤ 1000, 1 ≤ m ≤ 10): the length of the string and the number of sequences in the collection, respectively.

Next m lines contain the collection sequences si, one per line. Each si is a nonempty string of length not greater than 10. All the strings consist of uppercase letters "A", "C", "G", "T". The collection may contain identical strings.

Output

Output should contain a single integer: the number of strings filtered by the collection modulo 1000000009 (10^9 + 9).

Example
Input
2 1
A
Output
1
Input
6 2
CAT
TACT
Output
2
Note

In the first sample, a string has to be filtered by "A". Clearly, there is only one such string: "AA".

In the second sample, there exist exactly two different strings satisfying the condition (see the pictures below).
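The covering definition can be checked directly on short strings. The following is a minimal sketch, not part of the original post: the helper names isFiltered and countBrute are hypothetical, and the enumeration of all 4^n strings is only usable for very small n, so it serves as a cross-check for the samples rather than as a solution.

```cpp
#include <string>
#include <vector>

// True if every position of w lies inside some occurrence of a pattern.
// (Hypothetical helper: a literal reading of the definition in the statement.)
bool isFiltered(const std::string& w, const std::vector<std::string>& pats) {
    std::vector<bool> covered(w.size(), false);
    for (const std::string& p : pats) {
        for (std::size_t l = 0; l + p.size() <= w.size(); ++l) {
            if (w.compare(l, p.size(), p) == 0) {
                // Mark every position of this occurrence as covered.
                for (std::size_t i = l; i < l + p.size(); ++i) covered[i] = true;
            }
        }
    }
    for (bool c : covered) {
        if (!c) return false;
    }
    return true;
}

// Counts filtered strings of length n by trying all 4^n candidates.
// (Hypothetical helper: exponential, intended only for tiny n.)
long long countBrute(int n, const std::vector<std::string>& pats) {
    const std::string alphabet = "ACGT";
    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;
    long long count = 0;
    std::string w(n, 'A');
    for (long long code = 0; code < total; ++code) {
        long long c = code;
        // Decode the counter into a string over the 4-letter alphabet.
        for (int i = 0; i < n; ++i) {
            w[i] = alphabet[c % 4];
            c /= 4;
        }
        if (isFiltered(w, pats)) ++count;
    }
    return count;
}
```

Calling countBrute with n = 2 and the single pattern "A", or with n = 6 and the patterns "CAT" and "TACT", should reproduce the two sample answers, 1 and 2.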

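Problem summary: in short, you must build a string of length n out of m given patterns so that every character of the string is covered by some occurrence of a pattern, and count how many such strings exist.

Solution: this problem cost me a lot of effort. I spent two evenings on it and rewrote it many times, so my grasp of the AC automaton is clearly not solid.

Here is my thought process, which is really the process of working through someone else's editorial.

First, since this is string matching with several patterns, the AC automaton comes to mind fairly easily (??).

Then, since the problem sits in a DP problem set, we solve it with DP (counting problems like this are almost always DP).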

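Following the usual interval-DP habit, let dp[i][j] be the number of ways after matching up to position i with choice j. That does not work, so we change j to a node of the AC automaton,

and dp[i][j] becomes the number of ways to have built i characters and arrived at automaton node j.

This state is still not enough, because occurrences of patterns may overlap one another.

So we add a dimension: dp[i][j][k] means the same as before, with k characters "remaining" at the end. Remaining means the last k characters of the string built so far are not yet covered by any match, although a later pattern may still cover them.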

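From this the transitions are easy to derive. Let j' be the node reached from j by appending one letter.

dp[i][j][k] -> dp[i+1][j'][0]: a pattern longer than k ends at j', so everything is matched and the remainder goes from k to 0.

dp[i][j][k] -> dp[i+1][j'][k+1]: nothing long enough matched, so the new character joins the remainder.

The answer is the sum of dp[n][j][0] over all nodes j of the AC automaton.

Boundary: dp[0][0][0]=1;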
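To see why the single counter k is enough, the same transition can be applied to one fixed string instead of counting over all strings. This is a sketch under stated assumptions, not the author's code: the names longestPatternEndingAt and coveredByCounter are hypothetical, and the longest pattern ending at each position is found by naive comparison, standing in for the value the automaton stores per node.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Length of the longest pattern that ends exactly at w[0..end), or 0 if none.
// (Hypothetical helper: a naive stand-in for the per-node value in the automaton.)
int longestPatternEndingAt(const std::string& w, std::size_t end,
                           const std::vector<std::string>& pats) {
    int best = 0;
    for (const std::string& p : pats) {
        if (p.size() <= end && w.compare(end - p.size(), p.size(), p) == 0) {
            best = std::max(best, static_cast<int>(p.size()));
        }
    }
    return best;
}

// Walks w left to right, keeping k = number of trailing uncovered characters.
// (Hypothetical helper: the DP transition applied to a single string.)
bool coveredByCounter(const std::string& w, const std::vector<std::string>& pats) {
    int k = 0;
    for (std::size_t end = 1; end <= w.size(); ++end) {
        int best = longestPatternEndingAt(w, end, pats);
        if (best > k) {
            k = 0;   // the match covers the new character and all k pending ones
        } else {
            ++k;     // the new character joins the uncovered suffix
        }
    }
    return k == 0;   // filtered iff nothing is left uncovered at the end
}
```

For "CATACT" with the patterns "CAT" and "TACT", the counter goes 1, 2, 0, 1, 2, 0, so the string is accepted; for "TACTCA" it ends at 2 and the string is rejected.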

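This solves the problem in roughly O(100nm) time: n positions, about 100 automaton nodes, at most 10 values of k, and 4 letters per transition.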

Here is the code: ugly, but not long.

#include<bits/stdc++.h>
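#define N 1100  // upper bound on n
#define M 105   // upper bound on automaton nodes: 10 patterns * 10 letters + root
using namespace std;
const int mod =1e9+9;
// plain int is enough: every stored value is < mod, so a sum of two stays below 2^31
int f[N][M][11],n,m,ch[M][4],val[M],fail[M],sz,root=0,mx=0;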
void add(int &x,int y)
{
	x+=y;
	if(x<0) x+=mod;
	while(x>=mod) x-=mod;
}

int idx(char c)
{
	if(c=='A') return 0;
	if(c=='C') return 1;
	return c=='T'?3:2;
}

int newnode()
{
	for(int i=0;i<4;i++) ch[sz][i]=0;
	val[sz++]=0;return sz-1;
}

void insert(char *s,int len)
{
	int u=0;
	for(int i=0;i<len;i++)
	{
		int v=idx(s[i]);
		if(ch[u][v]==0) ch[u][v]=newnode();
		u=ch[u][v];
	}
	val[u]=max(val[u],len); // val[u]: length of the longest pattern ending at node u
}
void build()
{
	queue<int> q;
	for(int i=0;i<4;i++)
	{
		if(ch[root][i]==0) ch[root][i]=root;
		else fail[ch[root][i]]=root,q.push(ch[root][i]);
	}
	while(!q.empty())
	{
		int u=q.front();q.pop();
		for(int i=0;i<4;i++)
		{
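			int &v=ch[u][i];
			// missing edge: reuse the transition of the fail node
			if(v==0) v=ch[fail[u]][i];
			// real child: set its fail link and inherit the longest pattern that is a suffix
			else fail[v]=ch[fail[u]][i],q.push(v),val[v]=max(val[v],val[fail[v]]);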
		}
	}
	
}

void dp()
{
	// f[i][j][k]: strings of length i ending at node j whose last k characters are still uncovered
	f[0][0][0]=1;
	for(int i=0;i<n;i++)
		for(int j=0;j<sz;j++)
			for(int k=0,v;k<=mx;k++)if((v=f[i][j][k]))
			{
				for(int p=0;p<4;p++)
				{
					int now=ch[j][p];
					// a pattern longer than k ends here: it covers the new character and all k pending ones
					if(val[now]>k) add(f[i+1][now][0],v);
					// otherwise the new character joins the uncovered suffix
					else if(k<mx) add(f[i+1][now][k+1],v);
				}
			}

	int ans=0;
	for(int i=0;i<sz;i++) add(ans,f[n][i][0]);
	cout<<ans<<"\n";
}
char str[200];
signed main()
{
	//freopen("1.txt","w",stdout);
	root=newnode();
	cin>>n>>m;
	for(int i=1,len;i<=m;i++) 
		scanf("%s",str),len=strlen(str),mx=max(mx,len),insert(str,len);
	build();
	dp();
	return 0;	
}

 


Original source: http://www.cnblogs.com/foreverpiano/p/7225867.html
