JavaScript: Simulating a Web Crawler

Date: 2016-05-22 20:09:05

<!doctype html>
<html>
 <head>
  <meta charset="UTF-8">
  <title>Document</title>
  <script>
    function getUrls(){
      // Matches an opening <a> tag and captures its href value in group 1
      var reg=
        /<a\s+[^>]*?href=['"]([^'"]+?)['"][^>]*?>/g;
      var arr=null; // holds the result of each match; starts as null
      // Read the markup inside the body element into the variable html
      var html=document.body.innerHTML;
      // Call reg.exec repeatedly: with the g flag each call resumes after the
      // previous match and returns null once no matches remain
      while((arr=reg.exec(html))!=null){
        // arr: ["<a ....>","http://..."]
        // Print the href of the <a> element found on this pass
        console.log(arr[1]);
        // RegExp.$1 (a legacy static property) also holds the first capture group of this match
      }
    }
  </script>
 </head>
 <body>
  <link href="index.css"/><a class="header" href="http://tedu.cn">go to tedu</a><h1>welcome</h1><a name="top"></a><div>Hello World</div><a href="http://tmooc.cn" target="_blank">go to tmooc</a>
  <button onclick="getUrls()">Start crawling</button>
 </body>
</html>
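
The same matching loop can be tried outside the browser. The following is a minimal sketch, assuming the markup is already available as a string; the function name extractHrefs and the sample variable are hypothetical and do not appear in the original page. It uses the pattern from the listing with straight quote characters, and returns the captured href values instead of logging them.

```javascript
// Hypothetical helper (not part of the original page): collect the href
// value of every opening <a> tag found in an HTML string.
function extractHrefs(html) {
  // Pattern from the listing: group 1 captures the quoted href value.
  const reg = /<a\s+[^>]*?href=['"]([^'"]+?)['"][^>]*?>/g;
  const urls = [];
  let match;
  // With the g flag, exec resumes from reg.lastIndex on each call and
  // returns null when there are no further matches.
  while ((match = reg.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Sample markup modeled on the body of the page above.
const sample =
  '<link href="index.css"/>' +
  '<a class="header" href="http://tedu.cn">go to tedu</a>' +
  '<h1>welcome</h1><a name="top"></a><div>Hello World</div>' +
  '<a href="http://tmooc.cn" target="_blank">go to tmooc</a>';

console.log(extractHrefs(sample));
```

The link element and the anchor that has only a name attribute are skipped, because the pattern requires the tag to start with <a followed by whitespace and to contain an href attribute before the closing >. A regular expression is enough for a demonstration like this one, but it is not a general HTML parser; in a browser, document.querySelectorAll("a[href]") is the more robust way to find links.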

Original article: http://www.cnblogs.com/chenzeyongjsj/p/5517499.html
