Notes on a data-processing task

Date: 2016-11-05 23:58:28

Tags: mobile phone numbers, SMS, excel, director, industry

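My company works mainly in the SMS industry, so I deal with mobile phone numbers all the time, and odd requests are common. Recently a director handed me one of them: pick out the phone numbers that appear in both of two files. My programming and Excel skills are limited, so I had to find another way. Each file has several fields, but all of the data is structured. The first file looks like this: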

15994710001,2016/11/3 0:24,53100010
15994710001,2016/11/3 0:24,53100010
15001313373,2016/11/3 3:39,53100010
13937713309,2016/11/3 6:16,53100010
13758943333,2016/11/3 7:19,53100010
13868044333,2016/11/3 8:33,53100010
13500732333,2016/11/3 10:29,53100010
13523072333,2016/11/3 10:30,53100010
15138132777,2016/11/3 10:31,53100010
13960985779,2016/11/3 10:45,53100010
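This file has more than 4,000 lines.
File 2 has more fields. Part of its content happens to be garbled, which conveniently also protects personal privacy.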
"311-SD10658"2114781676479382330","13703774555","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","15920510111","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","18319609333","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","15221090555","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","13905879555","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","13818586777","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","13916387773","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","13882133333","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","18200980999","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"

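Approach:

Since only the matching numbers are needed, I processed the files on Linux with ordinary text-processing tools: first reduce each file to a list containing only phone numbers, then do the rest of the processing on those lists.

Either cut or awk can extract the relevant column. I am not very familiar with awk, so I used cut here. You only need to get the delimiter and the column number right.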
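The extraction step could look roughly like the sketch below. It is only an illustration: the names file1.csv, file2.csv, phones1.txt and phones2.txt are hypothetical, the sample rows are copied from the two formats shown above, and it assumes the phone number is field 1 in the first file and field 2 in the second, with no commas inside the garbled fields.

```shell
# Work in a scratch directory; all file names here are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/file1.csv" <<'EOF'
15994710001,2016/11/3 0:24,53100010
15001313373,2016/11/3 3:39,53100010
EOF

cat > "$workdir/file2.csv" <<'EOF'
"311-SD10658"2114781676479382330","13703774555","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
"311-SD10658"2114781676479382330","15920510111","11λP50rit","1","2016/11/3 10:07:43","2016/11/3 10:07:41","0","DELIVRD"
EOF

# File 1: the number is the first comma-separated field.
cut -d, -f1 "$workdir/file1.csv" > "$workdir/phones1.txt"

# File 2: the number is the second field; tr removes the surrounding quotes.
cut -d, -f2 "$workdir/file2.csv" | tr -d '"' > "$workdir/phones2.txt"

cat "$workdir/phones1.txt" "$workdir/phones2.txt"
```

Stripping the quotes matters for the later comparison, because "13703774555" with quotes and 13703774555 without them would not compare as equal lines.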

After that, grep can do the comparison. I also tried diff, but the results were not satisfactory.

1. Find the lines that the two text files have in common

grep -Ff file1 file2
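A small variation may be worth considering. With -F alone, grep treats each line of the first file as a fixed string and matches it anywhere inside a line of the second file, so a pattern also matches lines that merely contain it. Adding -x requires the whole line to match, and sort -u drops repeated results. The sketch below assumes both inputs have already been reduced to one phone number per line; the file names and the overlap between the two lists are made up for illustration.

```shell
# Hypothetical phone-only files; the overlap is invented for this example.
workdir=$(mktemp -d)
printf '%s\n' 15994710001 15994710001 13937713309 13758943333 > "$workdir/phones1.txt"
printf '%s\n' 13703774555 13937713309 15994710001 > "$workdir/phones2.txt"

# -F: fixed strings (no regex), -x: match whole lines only,
# -f FILE: read the patterns from FILE.
# Prints the lines of phones2.txt that also occur in phones1.txt.
common=$(grep -Fxf "$workdir/phones1.txt" "$workdir/phones2.txt" | sort -u)

printf '%s\n' "$common"
```

Duplicate lines in the pattern file are harmless here, since grep prints each matching line of the searched file once per occurrence in that file.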


2. Find the lines that differ: the lines that are in file1 but not in file2

grep  -vFf  file2 file1
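The direction of this comparison is easy to get backwards. With -v, grep prints the lines of the searched file (the last argument) that match none of the patterns read with -f, so swapping the two file names gives the opposite difference. Here is a minimal sketch of both directions, again using hypothetical phone-only files with invented overlap and adding -x for whole-line matching:

```shell
# Hypothetical phone-only files; the overlap is invented for this example.
workdir=$(mktemp -d)
printf '%s\n' 15994710001 15994710001 13937713309 13758943333 > "$workdir/phones1.txt"
printf '%s\n' 13703774555 13937713309 15994710001 > "$workdir/phones2.txt"

# Lines of phones1.txt that do not occur in phones2.txt.
only_in_1=$(grep -vFxf "$workdir/phones2.txt" "$workdir/phones1.txt" | sort -u)

# Lines of phones2.txt that do not occur in phones1.txt.
only_in_2=$(grep -vFxf "$workdir/phones1.txt" "$workdir/phones2.txt" | sort -u)

printf 'only in phones1: %s\n' "$only_in_1"
printf 'only in phones2: %s\n' "$only_in_2"
```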


This article comes from the “坚持梦想” blog. Please keep this source link: http://dreamlinux.blog.51cto.com/9079323/1869844
