uniq
a. Without options, uniq only removes duplicate lines that are adjacent to each other
uniq test.txt
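A minimal sketch of this behaviour, assuming a throwaway sample file created with mktemp (the contents are made up for illustration and are not the author's test.txt):

```shell
# Hypothetical sample data: "a" appears twice in a row, then once more after "b".
tmp=$(mktemp)
printf 'a\na\nb\na\n' > "$tmp"

# Only the adjacent pair of "a" lines is collapsed; the later "a" survives.
adjacent_only=$(uniq "$tmp")
printf '%s\n' "$adjacent_only"

rm -f "$tmp"
```

The result still contains two "a" lines, which is why the input is normally sorted first.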
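b. Use sort to make duplicate lines adjacent
Sort the file first so that identical lines end up next to each other, then remove the duplicates with uniq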
sort test.txt | uniq
sort -u gives the same result in a single command; -u stands for --unique
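As a quick check, the two forms can be compared on a small made-up file (the sample data and variable names below are assumptions for illustration). They agree when whole lines are compared, as here:

```shell
# Hypothetical unsorted sample with repeated lines.
tmp=$(mktemp)
printf 'b\na\nb\na\nc\n' > "$tmp"

# Two-step form: sort, then drop adjacent duplicates.
via_pipe=$(sort "$tmp" | uniq)

# One-step form: let sort discard duplicates itself.
via_flag=$(sort -u "$tmp")

printf '%s\n' "$via_flag"

rm -f "$tmp"
```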
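c. Counting duplicates
The -c (--count) option removes duplicates and prefixes each remaining line with the number of times it occurred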
sort test.txt | uniq -c
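A small sketch of the counting form on made-up data. The width of the padding in front of each count differs between uniq implementations, so the trailing awk step (an addition for this example only) normalises it to "count value":

```shell
# Hypothetical sample: "b" three times, "a" once.
tmp=$(mktemp)
printf 'b\na\nb\nb\n' > "$tmp"

# Sort, count each distinct line, then squeeze the leading padding.
counts=$(sort "$tmp" | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$counts"

rm -f "$tmp"
```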
Filtering duplicate information from a file
Method 1:
1. File contents
[root@localhost ~]# cat test.txt
https://www.baidu.com/index.php?tn=monline_3_dg
https://vip.iqiyi.com/waimeizhy1-pc.html/?fv=zz_5993b5deb9f24
https://www.jd.com/?cu=true&utm_source=cps.youmai.com&utm_medium=tuiguang&utm_campaign=t_1000049399_85292009&utm_term=4a4074858f4a46e6bc796373fd8931a2
https://pjjx.1688.com/?tracelog=cps&clickid=988602c34d86e07dd5a6c4e804992287
https://www.ctrip.com/?AllianceID=263200&sid=712562&ouid=&app=0101F00
https://vacations.ctrip.com/grouptravel
https://www.baidu.com/222
https://vip.iqiyi.com/waimei
https://vip.iqiyi.com/waimei
https://pjjx.1688.com/?tracelog=cps
2. Use awk -F with "/" as the field separator to extract the required column (the third field is the host name)
[root@localhost ~]# awk -F / '{print $3}' test.txt
www.baidu.com
vip.iqiyi.com
www.jd.com
pjjx.1688.com
www.ctrip.com
vacations.ctrip.com
www.baidu.com
vip.iqiyi.com
vip.iqiyi.com
pjjx.1688.com
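3. Sort the host names, then deduplicate and count them with uniq: awk -F / '{print $3}' test.txt | sort | uniq -c
4. Use awk -F / '{print $3}' test.txt | sort | uniq -c | sort -rn to print the counts in descending order (-n makes sort compare the counts as numbers rather than as text)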
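The four steps can be tried end to end with the sketch below. It assumes a shortened sample (a few of the URLs above with the query strings trimmed) written to a temporary file, and it uses sort -rn so the counts are compared numerically; the final awk step only strips the padding that uniq -c adds:

```shell
# Shortened, illustrative sample based on the URLs in test.txt.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
https://www.baidu.com/index.php
https://vip.iqiyi.com/waimei
https://www.jd.com/
https://vip.iqiyi.com/waimei
https://www.baidu.com/222
https://vip.iqiyi.com/waimei
EOF

# Field 3 of a "/"-separated URL is the host: "https:", "", "host", ...
ranked=$(awk -F / '{print $3}' "$tmp" \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $1, $2}')
printf '%s\n' "$ranked"

rm -f "$tmp"
```

The most frequent host comes first, so piping the result into head -n 1 would give the top entry only.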
Method 2:
Use cut to extract the same field
[root@localhost ~]# cut -d / -f3 test.txt
www.baidu.com
vip.iqiyi.com
www.jd.com
pjjx.1688.com
www.ctrip.com
vacations.ctrip.com
www.baidu.com
vip.iqiyi.com
vip.iqiyi.com
pjjx.1688.com
[root@localhost ~]# cut -d / -f3 test.txt | sort -r |uniq -c
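Note that sort -r placed before uniq -c orders the output by host name in reverse, not by count. The sketch below contrasts that with a count-ordered variant, using a shortened made-up sample and hypothetical variable names; the awk step only normalises the padding from uniq -c:

```shell
# Shortened, illustrative sample based on the URLs in test.txt.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
https://www.baidu.com/index.php
https://vip.iqiyi.com/waimei
https://www.jd.com/
https://vip.iqiyi.com/waimei
https://www.baidu.com/222
https://vip.iqiyi.com/waimei
EOF

# Reverse sort by name first: the counts follow the name order.
by_name=$(cut -d / -f3 "$tmp" | sort -r | uniq -c | awk '{print $1, $2}')

# Count first, then sort numerically in reverse: highest count on top.
by_count=$(cut -d / -f3 "$tmp" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

printf 'by name:\n%s\n' "$by_name"
printf 'by count:\n%s\n' "$by_count"

rm -f "$tmp"
```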
Original article: https://www.cnblogs.com/Simplelearning/p/12291012.html