In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
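The short C++ sketch below makes the example concrete. It concatenates code words; the helper name `encode` is my own, not part of the problem. With the first code set, "aaaxuaxz" takes 14 bits. With the faulty set, "aaaxuaxz" and "aazuaxax" produce the same bit string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: concatenate the code word of every character in `text`.
std::string encode(const std::string& text, const std::map<char, std::string>& code) {
    std::string bits;
    for (char ch : text) {
        assert(code.count(ch) == 1);  // every character needs a code word
        bits += code.at(ch);
    }
    return bits;
}
```

Usage: with `bad = {{'a',"0"},{'x',"01"},{'u',"011"},{'z',"001"}}`, both `encode("aaaxuaxz", bad)` and `encode("aazuaxax", bad)` return "00001011001001". A decoder cannot tell the two strings apart.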
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is a string of '0's and '1's.
Output Specification:
For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No
#include <iostream>
#include <string>
#include <map>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;

struct HuffmanCodesNode {
    bool isLeaf;
    int frequence;
    char c;
    int d;
    HuffmanCodesNode *Child[2];
    // Leaf node: a character and its frequency
    HuffmanCodesNode(char ch, int f) {
        Child[0] = NULL;
        Child[1] = NULL;
        c = ch;
        frequence = f;
        isLeaf = true;
        d = 0;
    }
    // Internal node (also used as a plain trie node in checkPrefix)
    HuffmanCodesNode(HuffmanCodesNode *h1 = NULL, HuffmanCodesNode *h2 = NULL)
        : isLeaf(false), frequence(0), c(0), d(0) {
        Child[0] = h1;
        Child[1] = h2;
    }
    // Recursively free all descendants (not the node itself)
    void clear() {
        if (Child[0] != NULL) { Child[0]->clear(); delete Child[0]; }
        if (Child[1] != NULL) { Child[1]->clear(); delete Child[1]; }
    }
    // Add depth * frequency of every leaf below this node to WPL
    int computeWPL(int depth, unsigned int &WPL) {
        if (Child[0] == NULL && Child[1] == NULL)
            WPL += depth * this->frequence;
        else {
            Child[0]->computeWPL(depth + 1, WPL);
            Child[1]->computeWPL(depth + 1, WPL);
        }
        return 0;
    }
    // Reversed comparison so that priority_queue pops the smallest frequency first
    friend bool operator<(const HuffmanCodesNode &a, const HuffmanCodesNode &b) {
        return a.frequence > b.frequence;
    }
};

unsigned int computeWPLwithFrequencyMap(map<char, int> &f) {
    priority_queue<HuffmanCodesNode> q;
    HuffmanCodesNode *p1, *p2;
    unsigned int WPL = 0;
    for (map<char, int>::iterator iter = f.begin(); iter != f.end(); ++iter) {
        HuffmanCodesNode temp(iter->first, iter->second);
        q.push(temp);
    }
    // Repeatedly merge the two lightest trees into one
    while (q.size() > 1) {
        p1 = new HuffmanCodesNode(q.top()); q.pop();
        p2 = new HuffmanCodesNode(q.top()); q.pop();
        HuffmanCodesNode temp(p1, p2);
        temp.frequence = p1->frequence + p2->frequence;
        q.push(temp);
    }
    p1 = new HuffmanCodesNode(q.top());
    p1->computeWPL(0, WPL);
    p1->clear();  // free the whole Huffman tree
    delete p1;
    return WPL;
}

// Order by length first, then lexicographically
bool compare(const string &a, const string &b) {
    if (a.length() < b.length()) {
        return true;
    } else if (a.length() == b.length() && a < b) {
        return true;
    }
    return false;
}

bool checkPrefix(vector<string> &s) {
    sort(s.begin(), s.end(), compare);
    HuffmanCodesNode *root = new HuffmanCodesNode, *p = root;
    int t;
    bool check = false;
    for (vector<string>::iterator i = s.begin(); i != s.end(); ++i) {
        p = root;
        for (string::iterator iter = i->begin(); iter != i->end(); iter++) {
            t = *iter - '0';
            if (p->Child[t] == NULL) p->Child[t] = new HuffmanCodesNode;
            p = p->Child[t];
            // The path passes through the end of an earlier code: prefix found
            if (p->isLeaf) { root->clear(); delete root; return false; }
        }
        // This code ends on a node that already has children: it is a prefix of another code
        if (p->Child[0] != NULL || p->Child[1] != NULL) check = true;
        if (check) { root->clear(); delete root; return false; }
        p->isLeaf = true;
    }
    root->clear();
    delete root;
    return !check;
}

int main(int argc, const char *argv[]) {
    ios::sync_with_stdio(false);
    int N; cin >> N;
    char c; int frequence;
    string s;
    map<char, int> f;
    for (int i = 0; i < N; i++) {
        cin >> c >> frequence;
        f[c] = frequence;
    }
    unsigned int WPL = computeWPLwithFrequencyMap(f);
    int M; cin >> M;
    for (int i = 0; i < M; i++) {
        unsigned int t_WPL = 0;
        vector<string> t;
        for (int j = 0; j < N; j++) {
            cin >> c >> s;
            t_WPL += f[c] * s.size();
            t.push_back(s);
        }
        if (t_WPL == WPL && checkPrefix(t)) {
            cout << "Yes" << endl;
        } else {
            cout << "No" << endl;
        }
    }
}
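A Huffman code has two characteristic properties:
1. Its WPL (weighted path length) is minimal; we can compute this minimum by building a Huffman tree.
2. Every code is unambiguous, that is, no code is a prefix of another code.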
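Property 1 can be stated precisely. Write |code[i]| for the length of code[i]. The quantity being minimized, checked against the sample by my own arithmetic, is:

```latex
\mathrm{WPL} = \sum_{i=1}^{N} f[i] \cdot \lvert \mathrm{code}[i] \rvert

% Sample, submission 1 (A..G have frequencies 1, 1, 1, 3, 3, 6, 6)
\mathrm{WPL}_1 = 1\cdot 5 + 1\cdot 5 + 1\cdot 4 + 3\cdot 3 + 3\cdot 2 + 6\cdot 2 + 6\cdot 2 = 53

% Sample, submission 3 (every code has length 3)
\mathrm{WPL}_3 = (1+1+1+3+3+6+6)\cdot 3 = 63
```

A Huffman tree for these frequencies has WPL 53, so submission 3 fails property 1. Submission 4 also has WPL 53 but fails property 2, because E = 00 is a prefix of A = 00000. Submissions 1 and 2 satisfy both properties, which matches the sample output Yes Yes No No.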
My approach is as follows.
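First, we store the input frequencies in a map and pass it to
unsigned int computeWPLwithFrequencyMap(map<char,int> &f)
to compute the WPL. This function builds a Huffman tree and then traverses it to obtain the WPL; there should be a better way.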
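One alternative is sketched below. It is a swapped-in technique, not the author's code. The WPL of a Huffman tree equals the sum of the weights of all its internal nodes, which is the sum of every merge performed. A min-heap of plain integers is therefore enough, and no tree has to be built or freed. `huffmanWPL` is a hypothetical name:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <vector>

// Minimum WPL for the given frequencies, without building an explicit tree.
// Each merge of the two lightest subtrees pushes every leaf below them one
// level deeper, so the WPL is the sum of all merged weights.
long long huffmanWPL(const std::map<char, int>& freq) {
    assert(freq.size() >= 2);  // the problem guarantees N >= 2
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap;
    for (const auto& kv : freq) heap.push(kv.second);
    long long wpl = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        wpl += a + b;      // cost of this merge
        heap.push(a + b);  // the merged subtree goes back into the heap
    }
    return wpl;
}
```

For the frequencies 4, 2, 1, 1 of "aaaxuaxz" the merges are 2, 4 and 8, giving 14. For the sample input the merges are 2, 3, 6, 9, 12 and 21, giving 53.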
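We then compare this WPL with each student's WPL, which is the sum of each character's frequency times the length of its code. A submission with a larger WPL is certainly not a Huffman code. A smaller WPL is also impossible for a prefix-free code, since no prefix code can beat the Huffman WPL, so the program simply requires the two values to be equal.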
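Computing a submission's WPL, as `t_WPL` does inside main, can be sketched as a standalone function. `submissionWPL` and its pair-based input are hypothetical, chosen only for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// WPL of one student submission: the sum of frequency * code length.
long long submissionWPL(const std::map<char, int>& freq,
                        const std::vector<std::pair<char, std::string>>& codes) {
    long long wpl = 0;
    for (const auto& entry : codes) {
        auto it = freq.find(entry.first);
        assert(it != freq.end());  // every submitted character must have a frequency
        wpl += static_cast<long long>(it->second) * static_cast<long long>(entry.second.size());
    }
    return wpl;
}
```

A submission passes this stage only if the value equals the minimum WPL computed from the frequencies. For the sample, the first submission gives 53 and the third gives 63.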
Next, to check for ambiguity there are two methods (the two I have come up with so far):
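1. Sort the codes by length (ascending) and exhaustively check whether each code is a prefix of any later code. This is plain string comparison. Because it is a prefix test rather than a general substring search, comparing the leading characters directly is enough, and KMP is not really needed.
2. Build a trie and check whether the path of any code passes through the end of another code.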
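Method 1 can be sketched as follows. The function name is hypothetical, and it takes the codes by value so the caller's order is untouched. `std::string::compare(pos, count, str)` compares only the first `count` characters, which is exactly the prefix test:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Method 1: sort by length, then test every code against every later one.
// Returns true if no code is a prefix of (or equal to) another code.
bool isPrefixFreeBySorting(std::vector<std::string> codes) {
    std::sort(codes.begin(), codes.end(),
              [](const std::string& a, const std::string& b) { return a.size() < b.size(); });
    for (std::size_t i = 0; i < codes.size(); ++i) {
        assert(!codes[i].empty());  // the problem never gives an empty code
        for (std::size_t j = i + 1; j < codes.size(); ++j) {
            // codes[j] is at least as long as codes[i]; compare only its leading part
            if (codes[j].compare(0, codes[i].size(), codes[i]) == 0) return false;
        }
    }
    return true;
}
```

With N <= 63 codes, the O(N^2 * L) double loop is cheap. Duplicate codes have equal length and compare equal, so they are rejected too.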
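I used method 2. I first sort the codes by length in ascending order with sort() (a shortcut; sorting is not strictly necessary), then insert the codes into the trie one by one, marking the node where each code's last bit ends. In a Huffman code this node must be a leaf, so if any later code's path passes through it, the submission is not a Huffman code. Without sorting, the opposite case must also be caught: a code that ends on a node that a longer code has already passed through. checkPrefix covers this by rejecting a code whose final node already has children.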
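For comparison, here is a sketch of method 2 that skips the sort. The function name is hypothetical. Nodes live in a std::vector and are referenced by index instead of the heap-allocated HuffmanCodesNode, so nothing has to be freed. It checks both directions of the prefix relation:

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Method 2 without sorting: insert the codes into a binary trie stored in a vector.
bool isPrefixFreeByTrie(const std::vector<std::string>& codes) {
    struct Node {
        std::array<int, 2> child{{-1, -1}};  // index of the '0' / '1' child, -1 if absent
        bool isEnd = false;                  // some code ends at this node
    };
    std::vector<Node> trie(1);  // trie[0] is the root
    for (const std::string& code : codes) {
        assert(!code.empty());
        int cur = 0;
        for (char bit : code) {
            assert(bit == '0' || bit == '1');
            int b = bit - '0';
            if (trie[cur].child[b] == -1) {
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.push_back(Node{});  // may reallocate, so always index through trie[...]
            }
            cur = trie[cur].child[b];
            if (trie[cur].isEnd) return false;  // an earlier code is a prefix of (or equal to) this one
        }
        // An earlier, longer code already passed through here: this code is its prefix
        if (trie[cur].child[0] != -1 || trie[cur].child[1] != -1) return false;
        trie[cur].isEnd = true;
    }
    return true;
}
```

This version accepts the codes in any order. For example, {"00000", "00"} is rejected because "00" ends on a node that already has a child.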
Original post: http://www.cnblogs.com/weierpeng/p/4392536.html