Huffman Codes

Posted: 2015-04-04 18:26:49

Huffman Codes (30)

In 1952, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
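To make the ambiguity concrete, here is a minimal C++11 sketch that encodes both strings with the invalid code from the statement. The helper names `encode` and `badCode` are made up for this illustration and are not part of the solution further down.

```cpp
#include <map>
#include <string>

// Concatenate the code of every character in text.
// Assumes every character of text has an entry in code.
std::string encode(const std::string& text, const std::map<char, std::string>& code) {
    std::string bits;
    for (char ch : text) bits += code.at(ch);
    return bits;
}

// The invalid code from the problem statement:
// "0" is a prefix of "01", "011" and "001".
std::map<char, std::string> badCode() {
    return {{'a', "0"}, {'x', "01"}, {'u', "011"}, {'z', "001"}};
}
```

Both `encode("aaaxuaxz", badCode())` and `encode("aazuaxax", badCode())` yield 00001011001001, so a decoder cannot tell the two strings apart.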

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a string of '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

#include <iostream>
#include <string>
#include <map>
#include <queue>
#include <vector>
#include <algorithm>
using namespace std;


struct HuffmanCodesNode {
    bool isLeaf;
    int frequence;
    char c;
    int d;
    HuffmanCodesNode *Child[2];

    HuffmanCodesNode(char ch,int f){
        Child[0]=NULL;
        Child[1]=NULL;
        c=ch;
        frequence=f;
        isLeaf=true;
        d=0;
    }
    HuffmanCodesNode(HuffmanCodesNode *h1=NULL,HuffmanCodesNode *h2=NULL):isLeaf(false),frequence(0),c(0),d(0){
        Child[0]=h1;
        Child[1]=h2;
    }

    void clear(){
        if(Child[0]!=NULL){Child[0]->clear();delete Child[0];}
        if(Child[1]!=NULL){Child[1]->clear();delete Child[1];}
        
    }
    int computeWPL(int depth,unsigned int &WPL){
        if(Child[0]==NULL && Child[1]==NULL)WPL+=depth*this->frequence;
        else{
            Child[0]->computeWPL(depth+1, WPL);
            Child[1]->computeWPL(depth+1, WPL);
        }
        return 0;
    }

    friend bool operator<(HuffmanCodesNode a,HuffmanCodesNode b){
        return a.frequence>b.frequence;
    }

};

unsigned int computeWPLwithFrequencyMap(map<char,int> &f){
    priority_queue<HuffmanCodesNode> q;
    HuffmanCodesNode *p1,*p2;
    unsigned int WPL=0;
    for (map<char,int>::iterator iter=f.begin(); iter!=f.end(); ++iter) {
        HuffmanCodesNode temp(iter->first, iter->second);
        q.push(temp);
    }
    while (q.size()>1) {
        p1=new HuffmanCodesNode(q.top());
        q.pop();
        p2=new HuffmanCodesNode(q.top());
        q.pop();
        HuffmanCodesNode temp(p1,p2);
        temp.frequence=p1->frequence+p2->frequence;
        q.push(temp);
    }
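    // Copy the root out of the queue, sum depth*frequency over the leaves, then free the tree.
    p1=new HuffmanCodesNode(q.top());
    p1->computeWPL(0, WPL);
    p1->clear();
    delete p1;
    return WPL;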
}
bool compare(const string& a ,const string& b){
    if (a.length() < b.length()) {
        return true;
    }else if(a.length() == b.length() && a < b){
        return true;
    }
    return false;
}
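// Returns true if no code in s is a prefix of another code (checked with a binary trie).
bool checkPrefix(vector<string>& s){
    // Shorter codes first, so a prefix is normally inserted before the code it prefixes.
    sort(s.begin(), s.end(),compare);
    HuffmanCodesNode *root = new HuffmanCodesNode;
    bool ok=true;
    for (vector<string>::iterator i=s.begin();ok && i!=s.end();++i){
        HuffmanCodesNode *p=root;
        for (string::iterator iter = i->begin(); iter!=i->end(); ++iter) {
            int t=*iter-'0';
            if(p->Child[t]==NULL)p->Child[t]=new HuffmanCodesNode;
            p=p->Child[t];
            // An earlier code ends on this path, so it is a prefix of the current code.
            if(p->isLeaf){ok=false;break;}
        }
        // The current code ends at an internal node, so it is a prefix of an earlier code.
        if(ok && (p->Child[0]!=NULL || p->Child[1]!=NULL))ok=false;
        p->isLeaf = true;
    }
    // Free the whole trie, including the root, on every exit path.
    root->clear();
    delete root;
    return ok;
}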
int main(int argc,const char* argv[])
{
    ios::sync_with_stdio(false);
    
    int N;cin>>N;
    char c;int frequence;
    string s;
    map<char, int> f;
    for (int i=0; i<N; i++) {
        cin>>c;
        cin>>frequence;
        f[c]=frequence;
    }
    unsigned int WPL=computeWPLwithFrequencyMap(f);
    int M;cin>>M;
    for (int i=0; i<M; i++) {
        unsigned int t_WPL=0;
        vector<string> t;
        for (int j=0; j<N; j++) {
            cin>>c;
            cin>>s;
            t_WPL+=f[c]*s.size();
            t.push_back(s);
        }
        if ( t_WPL == WPL && checkPrefix(t)){
            cout<<"Yes"<<endl;
        }else{
            cout<<"No"<<endl;
        }
    }    
}

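A Huffman code has two defining properties:

1. Its WPL (weighted path length) is minimal. We can compute this minimum by building a Huffman tree.

2. Every code is unambiguous: no code is a prefix of another code.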

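My approach is as follows.

We first store the input frequencies in a map, then pass it to

unsigned int computeWPLwithFrequencyMap(map<char,int> &f)

to compute the WPL. The function builds a Huffman tree and traverses it to obtain the WPL; there is probably a better way.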
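One possible alternative, sketched here as an assumption and not as the method used in the code above: the minimum WPL equals the sum of all merged weights produced while repeatedly combining the two smallest frequencies, so no tree is needed. The function name `minimalWPL` is made up for this sketch.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Minimum weighted path length for the given frequencies, without building a tree.
// Each merge of the two smallest weights adds one bit to every symbol beneath it,
// so the sum of all merged weights equals the WPL.
unsigned int minimalWPL(const std::vector<int>& freqs) {
    std::priority_queue<int, std::vector<int>, std::greater<int> >
        heap(freqs.begin(), freqs.end());  // min-heap of weights
    unsigned int total = 0;
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();
        int b = heap.top(); heap.pop();
        total += static_cast<unsigned int>(a + b);
        heap.push(a + b);
    }
    return total;
}
```

For the sample frequencies 1 1 1 3 3 6 6 the merged weights are 2, 3, 6, 9, 12 and 21, which sum to 53. For the frequencies 4 2 1 1 of "aaaxuaxz" the result is 14, matching the 14 bits quoted in the problem statement.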

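We then compare this WPL with the WPL of each student's submission. A submission whose WPL is larger than ours is certainly not a Huffman code.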
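The WPL of a submission is the sum of frequency times code length over all characters. A minimal sketch of that computation follows; `submissionCost` is a hypothetical helper, and the solution above accumulates the same sum inline while reading the input.

```cpp
#include <map>
#include <string>

// Total encoded length of a submission: sum of frequency * code length.
// Assumes every character in codes also appears in freq.
unsigned int submissionCost(const std::map<char, int>& freq,
                            const std::map<char, std::string>& codes) {
    unsigned int cost = 0;
    for (std::map<char, std::string>::const_iterator it = codes.begin();
         it != codes.end(); ++it) {
        cost += static_cast<unsigned int>(freq.at(it->first) * it->second.size());
    }
    return cost;
}
```

On the sample input, the first submission costs 53 and the third costs 63, so the third is rejected on cost alone. The fourth also costs 53, which shows that matching the minimum is necessary but not sufficient: the prefix check still has to run.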

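After that, there are two ways (that I have thought of so far) to check for ambiguity:

1. Sort the codes by length in ascending order, then check for each code whether it is a prefix of any later code. Only a match at position 0 matters, so a direct comparison of the leading characters suffices and a general substring search such as KMP is unnecessary.

2. Build a trie and check whether the path of any code passes through another code.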
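Approach 1 can be sketched roughly as below. This variant is an assumption on my part: it checks every ordered pair instead of sorting first, which is simple and fast enough for N <= 63. The names `isPrefixOf` and `isPrefixFree` are made up for the sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if a is a prefix of b (equal strings count as prefixes of each other).
bool isPrefixOf(const std::string& a, const std::string& b) {
    return a.size() <= b.size() && b.compare(0, a.size(), a) == 0;
}

// True if no code is a prefix of another one.
// Every ordered pair is checked, so no sorting is needed.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i)
        for (std::size_t j = 0; j < codes.size(); ++j)
            if (i != j && isPrefixOf(codes[i], codes[j])) return false;
    return true;
}
```

Duplicate codes are rejected as well, because two equal strings are prefixes of each other.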

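I used approach 2. I first sort the codes by length in ascending order with sort() (out of laziness; sorting is not strictly required), then insert them into the trie one by one, marking the node where the last bit of each code ends. In a Huffman code this node must be a leaf, so if the path of a later code passes through it, the submission is certainly not a Huffman code.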
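As noted, the sort can be skipped. Below is a minimal sketch of a trie check that accepts the codes in any order. It is an illustration under my own assumptions, not the code above: nodes live in a std::vector so nothing has to be freed by hand, and `TrieNode` and `prefixFreeByTrie` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One trie node: indices of the two children (-1 if absent) and an end-of-code mark.
struct TrieNode {
    int child[2];
    bool isEnd;
    TrieNode() : isEnd(false) { child[0] = child[1] = -1; }
};

// True if the codes are prefix-free. Codes may be inserted in any order.
// Assumes every code is a non-empty string of '0' and '1'.
bool prefixFreeByTrie(const std::vector<std::string>& codes) {
    std::vector<TrieNode> trie(1);  // node 0 is the root
    for (std::size_t i = 0; i < codes.size(); ++i) {
        int cur = 0;
        bool createdNode = false;
        for (std::size_t k = 0; k < codes[i].size(); ++k) {
            int bit = codes[i][k] - '0';
            if (trie[cur].child[bit] == -1) {
                int next = static_cast<int>(trie.size());
                trie.push_back(TrieNode());
                trie[cur].child[bit] = next;
                createdNode = true;
            }
            cur = trie[cur].child[bit];
            // An earlier code ends here, so it is a prefix of this one.
            if (trie[cur].isEnd) return false;
        }
        // No new node was created: this code is a prefix of an earlier one.
        if (!createdNode) return false;
        trie[cur].isEnd = true;
    }
    return true;
}
```

The `createdNode` flag replaces the sort: a code that walks only existing nodes must end inside the path of a longer code inserted earlier.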

Original article: http://www.cnblogs.com/weierpeng/p/4392536.html