Shell and Python crawler examples

Posted: 2018-02-25 19:09:51

1. Shell crawler example:

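[root@db01 ~]# vim pa.sh
#!/bin/bash
www_link='http://www.cnblogs.com/clsn/default.html?page='
for i in {1..8}
do
  # Fetch each listing page and append "URL@content" lines to bb.txt.
  # @ is a separator I picked myself; awk field 7 is the link URL, field 9 the link text.
  curl "${www_link}${i}" 2>/dev/null | grep homepage | grep -v "ImageLink" | awk -F '[><"]' '{print $7"@"$9}' >>bb.txt
done
# Drop the pager lines so that only the content and its URL remain, all in one file.
grep -v "pager" bb.txt >ma.txt
# Remove the spaces in the file: the for loop below splits on whitespace, so a line
# containing a space would become two items instead of one. This pitfall cost me a long time.
b=`sed "s# ##g" ma.txt`
for i in $b
do
  c=`echo $i | awk -F @ '{print $1}'`   # c = content URL
  d=`echo $i | awk -F @ '{print $2}'`   # d = content
  echo "<a href='${c}' target='_blank'>${d}</a> " >>cc.txt   # cc.txt holds the generated <a> tags
done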
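
The whitespace pitfall mentioned in the comments can also be avoided without deleting spaces. The following is only a sketch of one possible alternative, not the method used above: it reads the file line by line with `while read` and splits each line at the first @. The function name `make_links` and the sample URL are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read "URL@content" lines from stdin and print one <a> tag per line.
make_links() {
  # With IFS=@, read puts the text before the first @ into url and everything
  # after it (spaces and any further @ included) into title.
  while IFS=@ read -r url title
  do
    # Skip lines that have no URL, such as blank lines.
    [ -n "$url" ] || continue
    printf "<a href='%s' target='_blank'>%s</a>\n" "$url" "$title"
  done
}

# Sample input invented for this sketch; the title keeps its spaces.
printf '%s\n' 'http://example.com/p/1.html@A title with spaces' | make_links
```

Assuming ma.txt has the same "URL@content" layout as in the script, it could be used as `make_links <ma.txt >>cc.txt`, which would make the sed step that strips spaces unnecessary.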

Crawler output: the crawl results for the 惨绿少年 blog, as saved in the archive file.


Original article: https://www.cnblogs.com/machangwei-8/p/8469698.html
