Word Frequency Count

Posted: 2016-09-05 12:12:08

(1) Overview:

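This program counts how many times each word occurs in a given article. A HashMap stores the counts: the key is the word being counted and the value is the number of times it has occurred. At the end, the entries are printed in ascending order of key.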
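The counting idea described above can be tried on an in-memory string before reading a file. The following is only a minimal sketch, not the author's code: the class name WordCountSketch and the method countWords are hypothetical, the delimiter set ";: ,.!" is taken from the full program in this post, and Map.merge assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper class, used here only for illustration.
class WordCountSketch {

    // Counts the words in text and returns the counts sorted by key.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer st = new StringTokenizer(text, ";: ,.!");
        while (st.hasMoreTokens()) {
            // merge() stores 1 for a new word, otherwise adds 1 to the old count.
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        // Copying into a TreeMap orders the entries by ascending key.
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat saw the dog. The dog ran!"));
    }
}
```

Run as written, this prints {The=2, cat=1, dog=2, ran=1, saw=1, the=1}. "The" and "the" are counted separately because the keys are compared exactly as they appear in the text.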

(2) Implementation:

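import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class CountOccurrenceOfWords {

    public static void main(String[] args) throws Exception {
        String filename = "Text.txt";
        // Start from an empty builder. Appending to a null String would put
        // the literal text "null" in front of the first word.
        StringBuilder text = new StringBuilder();

        // Open the article (Text.txt) and read the whole of it into memory.
        try (BufferedReader infile = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = infile.readLine()) != null) {
                // readLine() drops the line terminator, so add a space to keep
                // the last word of one line apart from the first word of the next.
                text.append(line).append(' ');
            }
        }

        Map<String, Integer> hashMap = new HashMap<>();
        // Split the text into words; ';', ':', ' ', ',', '.' and '!' are the delimiters.
        StringTokenizer st = new StringTokenizer(text.toString(), ";: ,.!");

        while (st.hasMoreTokens()) {
            String key = st.nextToken();
            Integer count = hashMap.get(key);
            // First occurrence starts at 1; later occurrences increment the count.
            hashMap.put(key, count == null ? 1 : count + 1);
        }

        // Print the entries in ascending key order.
        Map<String, Integer> treeMap = new TreeMap<>(hashMap);
        for (Map.Entry<String, Integer> entry : treeMap.entrySet()) {
            System.out.println(entry);
        }
    }
}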

(3) Partial output:

As=1
But=2
Environment=1
Everybody=1
Fourthly=1
How=1
I=3
In=2
It=1
One=1
Our=1
Ourselves=1
People=1
Point=2
Protect=2
Protecting=1
Secondly=1
So=1
The=4
Then=1
There=1
They=1
Thirdly=1
Though=1
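
Every key in this excerpt starts with an uppercase letter because TreeMap sorts String keys by their natural ordering, which compares character codes, and uppercase letters come before lowercase ones. If a dictionary-style order is preferred, one possible approach is to pass String.CASE_INSENSITIVE_ORDER to the TreeMap constructor. The sketch below is an assumption for illustration and not part of the original post: the class OrderingSketch and the method caseInsensitive are hypothetical names. With this comparator, keys that differ only in case (such as "The" and "the") are treated as the same key, so their counts are added together.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, used here only for illustration.
class OrderingSketch {

    // Re-sorts counts case-insensitively; keys differing only in case share one entry.
    static Map<String, Integer> caseInsensitive(Map<String, Integer> counts) {
        Map<String, Integer> sorted = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // merge() adds the count to an existing entry instead of overwriting it.
            sorted.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> natural = new TreeMap<>();
        natural.put("The", 4);
        natural.put("apple", 1);
        natural.put("But", 2);
        System.out.println(natural);                  // natural String order
        System.out.println(caseInsensitive(natural)); // case-insensitive order
    }
}
```

Run as written, the first line printed is {But=2, The=4, apple=1} and the second is {apple=1, But=2, The=4}.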

Original article: http://www.cnblogs.com/yinll314/p/5841347.html
