标签:pattern ons useful search duplicate sed sel 传递闭包 tps
也许是OI生涯最后一场正式比赛了,说是省选前模板,其实都是非常基础的东西,穿插了英文介绍和部分代码实现
祝各位参加JXOI2019的都加油吧
也希望今年JX能翻身,在国赛中夺金
KMP算法,又称模式匹配算法,是用来在一个文本串(text string)s
中找到所有模式串(pattern)w
出现的位置.
它是通过当失配(mismatch)发生时,模式串本身能提供足够的信息来决定下一个匹配能从哪里开始,这样就可以避开对前面已经匹配好的字符的再次遍历检查(即暴力做法)
"In computer science, the Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a "word" W
within a main "text string" S
by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters."
来源: 维基百科
我们需要构造一个函数next
,next[n]
表示的是模式串w
中以第n个字符结尾的非前缀子串(substring)与w
的前缀(prefix)能够匹配的最大长度
具体实现过程与思路见OI WIKI
简单说一下比较难懂的在w[i]!=w[j+1]
,为什么指针j
跳向next[j]
显然w[1~j]
是等于i-1
前j
个字母的字串的,但是我们比较发现w[j+1]!=w[i]
我们想要的新的j
是使得除了第\(j+1\)个字符需要与w[i]
比较外,w[1]~w[j]
的前缀等于i-1
(包括i-1
)前j
个字母的后缀(suffix)
又由于i-1
前任意k(k<=next[j])
个字符后缀等于next[j]
前任意k
个字母的后缀
又因为w[1~next[next[j]]]
等于next[j]
前next[next[j]]
长度的后缀,即等于i-1
前next[next[j]]
长度的后缀
所以新的j=next[j]
就满足上述条件
n=strlen(a+1);m=strlen(b+1);//a is the text string
ne[1]=0;//next
for(ri i=2,j=0;i<=m;i++){
while(j>0&&b[i]!=b[j+1])j=ne[j];
if(b[i]==b[j+1])j++;
ne[i]=j;
}
for(ri i=1,j=0;i<=n;i++){
while(j>0&&a[i]!=b[j+1])j=ne[j];
if(a[i]==b[j+1])j++;
fail[i]=j;
if(fail[i]==m){
printf("%d\n",i-m+1);
fail[i]=1;
j=ne[j];
}
}
不想中文介绍了QAQ,OI中常用于数据的离散化(Discretization)
关于更多离散化内容
但是哈希函数有可能发生哈希冲突(Hash Collision),为了避免这种情况,我们可以采取双哈希或是拉链
"A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file. One such application is finding similar stretches in DNA sequences. They are also useful in cryptography."
来源:维基百科
字符串哈希我一般用Rabin-Karp Hash,更多相关内容在链接里
#define ull unsigned long long
ull x,hash_table[maxn];
char str[maxn];
void make_hash(int n){
x=0;
for(ri i=0;str[i];i++){
x=x*131+a[i]-31;
hash_table[i]=x;
}
return ;
}
对于两个串,我们可以将它们按上述方法映射(mapping)到哈希表后通过二分(binary serach)或是倍增(QAQ这个英语怎么说啊)来快速匹配
主要根据这个可以\(O(1)\)地判断两个字串是否相等
\(w_{str_{i,j}}\)
\(=(\) \(a_i\) \(p^{j-i}\)+\(a_{i+1}\) \(p^{j-i-1}\)+...+\(a_{j}\) \(p^0\))
\(=\) \(w_{pre_{j}}\) \(-\) \(w_{pre_{i-1}}\) \(p^{j-i+1}\)
栈是只有一端能进出元素的线性数据结构,所以它是后进后出(LIFO,Last In First Out)的
有入栈(push)与出栈(pop)两种基本操作
"In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations:
The order in which elements come off a stack gives rise to its alternative name, LIFO (last in, first out). Additionally, a peek operation may give access to the top without modifying the stack.[1]The name "stack" for this type of structure comes from the analogy to a set of physical items stacked on top of each other, which makes it easy to take an item off the top of the stack, while getting to an item deeper in the stack may require taking off multiple other items first.[2]"
来源:维基百科
队列是一种"先进先出(FIFO,First In Fitst Out)"的线性数据结构.一般元素从右端入队,从左端出队.为了节省空间实现上可以采用循环队列
(这几个都感觉没什么好说的)
讲一下邻接表(Adjacency List)吧,它可以看成是"带有索引数组的多个数据链表",这种所引就是链表的表头,表头又构成了一个表头(headers)数组
存图常用邻接表或是邻接矩阵(Adjacency Matrix)
邻接表
int num_edge=1;
struct Edge{
int ne,to,dis;
}edge[maxm];
void add_edge(int f,int to,int dis){
edge[++num_edge].ne=h[f];
edge[num_edge].to=to;
edge[num_edge].dis=dis;
h[f]=num_edge;
return;
}
字典树是一种用于实现字符串快速检索的多叉树结构,其节点都拥有字符指针.
"Trie is an efficient information reTrieval data structure.Using trie,search complexities can be brought too optimal limit(O(M),M is key length).However the penalty is on Trie storage requirements."
int trie[maxn][27],tot=1;
bool end[maxn][27];
void Insert(){
int t=tot,x;
for(ri i=1;i<=len;i++){
x=str[i]-'a';
if(trie[t][x]==0)trie[t][x]=++tot;
t=trie[t][x];
}
end[t][x]=1;
return ;
}
bool Match_Ok(){
int t=tot,x;
for(ri i=1;i<=len;i++){
x=str[i]-'a';
t=trie[t][x];
if(!t)return false;
}
return end[t][x];
}
二叉堆是一颗满足堆性质的完全二叉树.根据排序方式可以分为大根堆(Max-Heap)和小根堆(Min-Heap)
对于大根堆,任意一个非根节点的权值都小于等于其父节点的权值;小根堆则反之
"A binary heap is a complete binary tree which satisfies the heap ordering property.The ordering can be one of two types:the min-heap property and the max-heap property.The value of each node of the max-heap is less than or equal to the value of its parent,with the maximum-value element at the root"
来源:cs.cmu.edu
int heap[maxn],n;//max-heap
void up_modify(int p){
while(p>1){
if(heap[p]>heap[p>>1]){
swap(heap[p],heap[p>>1]);
p=p>>1;
}
else break;
}
return ;
}
void insert(int val){
heap[++n]=val;
up(n);
}
void down(int p){
int s=p<<1;
while(s<=n){
if(s<n&&heap[s]<heap[s+1])s++;//choost the maximum one of the two sons
if(heap[s]>heap[p]){
swap(heap[s],heap[p]);
p=s,s=s<<1;
}
else break;
}
return;
}
void extract_top(){
heap[1]=heap[n--];
down(1);
return ;
}
咕
DIjkstra算法是为了解决单源最短路径问题(Single Source Shortest Path,SSSP)的,简单来说,就是求原点到其它所有点(vertex pl.vertices)的最短路(shortest path)
它是基于一个贪心思想,要求所有边都是非负的.大致就是不断拓展一颗最短路生成树
"Dijkstra‘s algorithm is very similar to Prim‘s algorithm for minimum spanning tree(MST).Like Prim‘s MST,we generate a SPT(shortest path tree) with given source as root.We maintain two sets,one set contain vertices included in the shortest path tre,other set includes vertices not yet included in shortest path tree.At every step,we find a vertex which is in the other set and has a mimnum distance from the source."
#define ri register int
struct Vertex{
int id,d;
bool operator <(const Vertex & a)const{
return d>a.d;
}
Vertex(){;}
Vertex(int x,int y){id=x,d=y;}
};
priority_queue <Vertex> q;
void add_edge(int f,int to,int dis){
edge[++num_edge].ne=h[f];
edge[num_edge].to=to;
edge[num_edge].dis=dis;
h[f]=num_edge;
}
int dis[maxn];
bool vis[maxn];
void dijkstra(int s){
int u,v,w;
for(ri i=1;i<=n;i++){
dis[i]=INF,vis[i]=0;
}
dis[s]=0;
q.push(Vertex(s,0));
while(q.size()){
u=q.top().id,w=q.top().d;
q.pop();
if(vis[u])continue;
for(ri i=h[u];i;i=edge[i].ne){
v=edge[i].to;
if(dis[v]>dis[u]+w){
dis[v]=dis[u]+w;
q.push(Vertex(v,dis[v]))
}
}
vis[u]=1;
}
for(ri i=1;i<=n;i++)printf("%d\n",dis[i]);
return ;
}
忽然发现ddl干不完了,因此介绍咕了
queue <int> q;
void spfa(){
int u,v,w;
for(int i=1;i<=n;i++)dis[i]=INF,vis[i]=0;
dis[s]=0,vis[s]=1;
q.push(s);
while(q.size()){
u=q.top();q.pop();
vis[u]=0;
for(int i=h[u];i;i=edge[i].ne){
v=edge[i].to,w=edge[i].dis;
if(dis[v]>dis[u]+w){
dis[v]=dis[u]+w;
if(!dis[v]){q.push(v);dis[v]=1;}
}
}
}
for(int i=1;i<=n;i++)printf("%d\n",dis[i]);
return ;
}
Floyd算法是基于动态规划思想来解决任意两点之间最短路径(All Pairs Shortest Path problem,APSP)的算法
亦可用于可传递的关系(传递闭包),这个也先挖坑吧
memset(d,0x3f,sizeof(d));
for(int i=1;i<=n;i++)dis[i][i]=0;
for(int i=1;i<=m;i++){
scanf("%d %d %d",&x,&y,&z);
dis[x][y]=min(dis[x][y],z);
}
for(int k=1;k<=n;k++){
for(int i=1;i<=n;i++){
for(int j=1;j<=n;j++){
dis[i][j]=min(dis[i][j],dis[i][k]+dis[k][j]);
}
}
}
Kruskal算法是求出一张给定联通边带权无向图的最小生成树算法.那么什么是最小生成树?
"Given a connected and undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees. A minimum spanning tree (MST) or minimum weight spanning tree for a weighted, connected and undirected graph is a spanning tree with weight less than or equal to the weight of every other spanning tree. The weight of a spanning tree is the sum of weights given to each edge of the spanning tree."
来源:Geeksforgeeks 强烈安利这个网站,写的浅显易懂而且最重要的是,it is not blocked
Kruskal算法维护无向图的最小生成森林,每次在连接两个森林的边中找一条权值最小的将两个森林连起来.对于连通性我们使用并查集维护
struct Edge{
int u,v,dis;
bool operator <(const Edge & a)const {
return dis<a.dis;
}
}edge[maxm];
int fa[maxn];
int get(int x){
if(fa[x]!=x)fa[x]=get(fa[x]);
return fa[x];
}
void kruskal(){
int u,v,w,cnt=0;
for(int i=1;i<=m;i++){
scanf("%d %d %d",&edge[i].u,&edge[i].v,&edge[i].dis);
}
sort(edge+1,edge+1+m);
for(int i=1;i<=m;i++){
u=edge[i].u,v=edge[i].v;
u=get(u),v=get(v);
if(u!=v){
fa[u]=v;
cnt++;
}
if(cnt==n-1){
break;
}
}
return ;
}
详细介绍咕了
将树上操作转化为线段树,由于重链数量级是\(log N\)的,线段树时间复杂度是\(log N\),总时间复杂度是\(log^2 N\)
还可以同时求LCA,但是操作时有许多细节需要注意,尤其是dfn[]
和rnk[]
int fa[maxn],dfn[maxn],rnk[maxn],w[maxn],dep[maxn],top[maxn],son[maxn],size[maxn];
int cnt=0;
void dfs_1(int now){
int v;size[now]=1;
for(int i=h[now];i;i=edge[i].to){
v=edge[i].to;
if(v==fa[now])continue;
fa[v]=now,dep[v]=dep[now]+1;
dfs_1(v);
size[now]+=size[v];
if(!son[now]||size[v]>size[son[now]])son[now]=v;
}
return ;
}
void dfs_2(int now,int t){
int v;
top[now]=t,dfn[now]=++cnt,rnk[cnt]=now;
if(!son[now])return ;
dfs_2(son[now],t);
for(int i=h[now];i;i=edge[i].ne){
v=edge[i].to;
if(v==son[now]||v==fa[now])continue;
dfs_2(v,v);
}
return ;
}
int L,R;
void operation_on_path(int x,int y){
while(top[x]!=top[y]){
if(dep[top[x]]<dep[top[y]])swap(x,y);
L=dfn[top[x]],R=dfn[x];
operation_on_segmenttree(L,R);
x=fa[top[x]];
}
if(dfn[x]>dfn[y])swap(x,y);//x is the LCA
L=dfn[x],R=dfn[y];
operation_on_segmenttree(L,R);
return;
}
void operation_on_subtree(int x){
L=dfn[x],R=dfn[x]+size[x]-1;
operation_on_segmenttree(L,R);
return ;
}
估计要咕了
标签:pattern ons useful search duplicate sed sel 传递闭包 tps
原文地址:https://www.cnblogs.com/Rye-Catcher/p/10847144.html