PTA Huffman Codes

Posted: 2016-05-13 00:25:31

Problem Statement

In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification

Each input file contains one test case. For each case, the first line gives an integer $N$ ( $2\le N\le 63$ ), then followed by a line that contains all the $N$ distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer $M$ ( $\le 1000$ ), then followed by $M$ student submissions. Each student submission consists of $N$ lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 ‘0’s and ‘1’s.

Output Specification

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output

Yes
Yes
No
No

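Problem Summary

Given a sequence of character frequencies, decide whether each submitted set of codes is equivalent to a Huffman code.

There are two requirements: the codes must be unambiguous (no code is a prefix of another), and the total encoded length must be minimal.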

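Solution

Computing the Huffman encoded length for the frequencies

Taken on its own, the frequency sequence determines a unique Huffman encoded length (the sum of frequency times code length over all characters), even though the codes themselves are not unique. This is also the optimal length.

Following the Huffman algorithm, repeatedly extract the two smallest weights and merge them. The sum of all the merged weights is the minimum length.

Insertion sort works here, or a priority queue can be used directly to speed things up.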
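The merging step above can be sketched with a min-heap as follows. This is a minimal sketch rather than the original author's code; the function name `optimalLength` is a hypothetical name chosen for illustration.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Minimum total encoded length for the given frequencies.
// Repeatedly merges the two smallest weights; every merge adds one bit to
// each symbol below the merged node, so the merged weight is added to the total.
// (Hypothetical helper name, not from the original post.)
long long optimalLength(const std::vector<int>& freqs) {
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        heap(freqs.begin(), freqs.end());
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        total += a + b;
        heap.push(a + b);
    }
    return total;
}
```

For the sample input (frequencies 1 1 1 3 3 6 6) this gives 53, which matches the first submission: 5 + 5 + 4 + 9 + 6 + 12 + 12 = 53. A submission passes the length check when the sum of f[i] times the length of code[i] equals this value.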

Checking for ambiguous codes

Build a trie from the submitted codes. Each node of the trie stores:

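struct Trie {
    bool isVisited = false;             // some code has passed through or ended at this node
    bool isMarked = false;              // some code ends at this node
    Trie *next[2] = {nullptr, nullptr}; // children for bits '0' and '1'
};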

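While inserting each code string, set the following flags along the way:

Whenever a node is visited, set isVisited = true;

Whenever the end of a code is reached, set isMarked = true;

These flags give two checks:

A node that already has isVisited set cannot be the end of a new code; otherwise the new code would be a prefix of an existing code.
If the path passes through a node with isMarked set, stop: an existing code is a prefix of the new code.

This guarantees that every code ends at a leaf node.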
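The two flag checks could look roughly like this. It is a sketch under a few assumptions: it requires C++14 for `std::make_unique`, every code contains only '0' and '1' and is non-empty (as the problem guarantees), and the names `TrieNode`, `insertCode` and `isPrefixFree` are hypothetical, not taken from the original post.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    bool isVisited = false;             // some code passes through or ends here
    bool isMarked = false;              // some code ends here
    std::unique_ptr<TrieNode> next[2];  // children for '0' and '1'
};

// Inserts one code; returns false if it conflicts with an earlier code.
// Assumes the code is non-empty and contains only '0' and '1'.
bool insertCode(TrieNode& root, const std::string& code) {
    TrieNode* node = &root;
    for (std::size_t i = 0; i < code.size(); ++i) {
        auto& child = node->next[code[i] - '0'];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
        // An earlier code ends here, so it is a prefix of (or equal to) this one.
        if (node->isMarked) return false;
        // This code would end on a node another code already uses,
        // so this code is a prefix of an earlier one.
        bool isLast = (i + 1 == code.size());
        if (isLast && node->isVisited) return false;
        node->isVisited = true;
    }
    node->isMarked = true;
    return true;
}

// True if no code in the set is a prefix of another.
bool isPrefixFree(const std::vector<std::string>& codes) {
    TrieNode root;
    for (const auto& code : codes) {
        if (!insertCode(root, code)) return false;
    }
    return true;
}
```

Using `std::unique_ptr` for the children is a design choice of this sketch: each submission gets a fresh root, and the whole trie is released automatically when the root goes out of scope.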

Implementation

PTA Huffman Codes With Trie Tree


Original article: http://blog.csdn.net/zccz14/article/details/51348084
