码迷,mamicode.com
首页 > 系统相关 > 详细

Linux Shell编程实战---统计特定文件中单词的词频

时间:2017-09-08 18:27:48      阅读:193      评论:0      收藏:0      [点我收藏+]

标签:linux

方法1:使用sed

Shell>cat a1.txt

123a123,555

456.333

566555!88,thisis a good boy.


Shell>cat a1.txt|sed ‘s/[[:space:]|[:punct:]]/\n/g‘|sed ‘/^$/d‘|sort|uniq -c|sort -n-k1 -r

      2 555

      1 this

      1 is

      1 good

      1 boy

      1 a123

      1 a

      1 88

      1 566

      1 456

      1 333

      1 123

Shell>


sed ‘s/[[:space:]|[:punct:]]/\n/g‘

[]表示正则表达式集合,[:space:]代表空格。[:punct:]代表标点符号。

[[:space:]|[:punct:]]代表匹配空格或者标点

s/[[:space:]|[:punct:]]/\n/g代表把空格或标点替换成\n换行符


sed ‘/^$/d‘ 删除掉空行



方法2:使用awk

#!/bin/bash


filename=$1


cat$filename|awk ‘{

  #getline var;

  split($0,a,/[[:space:]|[:punct:]]/);

  for(i in a) {

    word=a[i];

    b[word]++;

  }

}

  END{

   printf("%-14s%s\n","Word","Count");

    for(i in b) {

        printf("%-14s%d\n",i,b[i])|"sort-r -n -k2";

    }


  }

运行结果

[root@Test01awk]# cat a1.txt

123a123,555

456.333

566555!88,thisis a good boy.


[root@Test01awk]# ./word_freq.sh a1.txt

Word          Count

555           2

this          1

is            1

good          1

boy           1

a123          1

a             1

88            1

566           1

456           1

333           1

123           1

              1

[root@Test01awk]#



方法3:使用tr

[root@Test01awk]# cat a1.txt

123a123,555

456.333

566i555!88,this is a good boy.


[root@Test01awk]# cat a1.txt |tr ‘[:space:]|[:punct:]‘ ‘\n‘|tr -s ‘\n‘|sort|uniq -c|sort -n-k1 -r

      2 555

      1 this

      1 is

      1 good

      1 boy

      1 a123

      1 a

      1 88

      1 566i

      1 456

      1 333

      1 123

[root@Test01awk]#







本文出自 “微小信的运维之道” 博客,请务必保留此出处http://weixiaoxin.blog.51cto.com/13270051/1963641

Linux Shell编程实战---统计特定文件中单词的词频

标签:linux

原文地址:http://weixiaoxin.blog.51cto.com/13270051/1963641

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!