Scraping Web Pages with a PHP Crawler

Posted: 2017-03-08 17:45:11


Goal: use PHP to scrape the list content of a website

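Description: In a chat group, someone asked how to scrape the list content of a website. It reminded me of an article-collection problem I once handled at work, but on reflection this was different, since it was list scraping. Then I remembered the "crawlers" that experts often talk about and figured one could solve the problem. Being a PHP beginner, I searched online for crawler implementations, but most were too long to read through. One of them pointed me to fopen(). What does it do? The answer I found: it "binds a specified file or URL resource to a stream". That sounded promising, so I looked into how streams are used and found stream_get_contents(), which is defined as "converts a resource stream into a string". Together, these two functions are enough to fetch a page. Whether because of my own limitations or something else, though, the approach is not very efficient: a test over 500 links yielding 10,000 items took close to 10 minutes.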
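The two functions described above can be tried in isolation before pointing them at a real site. The following is a minimal sketch, not the author's code: `read_all()` is a hypothetical helper name, and a temporary local file stands in for a web page so the example runs without network access. Opening an `http://` URL the same way assumes `allow_url_fopen` is enabled.

```php
<?php
// read_all() is a hypothetical helper, not part of PHP.
// It opens a file path or URL as a stream and returns its full contents.
function read_all(string $target): string
{
    $fp = @fopen($target, 'r');          // bind the file or URL to a stream
    if ($fp === false) {
        throw new RuntimeException("Cannot open: $target");
    }
    $body = stream_get_contents($fp);    // read the rest of the stream into a string
    fclose($fp);                         // release the stream once it has been read
    if ($body === false) {
        throw new RuntimeException("Cannot read: $target");
    }
    return $body;
}

// A temporary file plays the role of a fetched page (assumption for the demo).
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<li class="product" style="">demo</li>');
echo read_all($tmp), PHP_EOL;
unlink($tmp);
```

Throwing on failure, rather than passing `false` on to the next step, makes a dead link visible immediately instead of surfacing later as an empty result.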

Code:

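<?php
// Base URL of the listing; the page number is appended to it.
define("URL", "http://www.zcy.gov.cn/search?shopId=6&pageNo=");
$start_time = microtime(true);
$array = array();
// Matches each product <li> block; the /s flag lets "." span newlines.
$preg = '/<li class="product\s*\w*" style="\s*\w*\s*">.*?<\/li>/s';
echo '<ul class="view-mode-thumb">';
for ($index = 1; $index <= 5; $index++) {
    $url = URL . $index;
    $fp = @fopen($url, 'r');               // requires allow_url_fopen to be enabled
    if ($fp === false) {
        continue;                          // skip pages that cannot be opened
    }
    $all_str = stream_get_contents($fp);   // read the whole response into a string
    fclose($fp);                           // close each stream as soon as it has been read
    if ($all_str !== false && preg_match_all($preg, $all_str, $arr)) {
        $array = array_merge($array, $arr[0]);
    }
}

foreach ($array as $value) {
    echo $value;
}

echo '</ul>';
$end_time = microtime(true);
// Log the item count and the elapsed time to the browser console.
echo "<script>console.log('Items: " . count($array) . "  Elapsed: " . ($end_time - $start_time) . "s')</script>";

exit();
?>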
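The extraction step can be checked on its own, without any network request. The sketch below wraps the same regular expression in a hypothetical helper, `extract_products()`, and runs it against made-up sample markup. The sample HTML is an assumption for illustration and not the real structure of the target site.

```php
<?php
// extract_products() is a hypothetical helper that wraps the pattern from the post.
function extract_products(string $html): array
{
    $pattern = '/<li class="product\s*\w*" style="\s*\w*\s*">.*?<\/li>/s';
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];            // no match (0) or regex error (false)
    }
    return $matches[0];       // index 0 holds the full text of each match
}

// Made-up sample markup: two product items and one unrelated item.
$sample = '<ul>'
    . '<li class="product first" style="">A</li>'
    . '<li class="other" style="">X</li>'
    . "<li class=\"product\" style=\" hidden \">B\nline two</li>"
    . '</ul>';

foreach (extract_products($sample) as $item) {
    echo $item, PHP_EOL;
}
```

The pattern depends on the exact attribute order and quoting of the page, so a small change in the site's markup can make it return nothing. Nested `<li>` elements inside a product would also cut a match short at the first `</li>`.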

Result: [screenshot not preserved]

Note: my skills are limited, so the code may not be very polished. Thanks for your understanding!
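One likely reason for the slow run mentioned in the description is that the pages are fetched one after another, so every request waits for the previous one to finish. A possible improvement, swapped in here and not part of the original code, is to download several pages concurrently with PHP's cURL multi interface. The sketch below assumes the cURL extension is installed; `fetch_all()` and its `$timeout` parameter are hypothetical names.

```php
<?php
// fetch_all() is a hypothetical helper: it downloads several URLs concurrently
// and returns the response bodies under the same keys as the input array.
function fetch_all(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,      // follow redirects
            CURLOPT_TIMEOUT        => $timeout,  // give up on slow pages
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0);         // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $body = curl_multi_getcontent($ch);
        $bodies[$key] = is_string($body) ? $body : '';  // failed transfers yield ''
        curl_multi_remove_handle($mh, $ch);
    }
    return $bodies;   // handles are released when they go out of scope
}
```

A call such as `fetch_all([URL . '1', URL . '2'])` would return both page bodies, each of which can then go through the same `preg_match_all()` step. Sending all 500 requests at once could overload the target server, so splitting the URL list into small batches, for example with `array_chunk()`, is the safer choice.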

Original article: http://www.cnblogs.com/dream-w/p/6520435.html
