C# Web Crawler

Date: 2020-05-24 11:33:10


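I started learning C# only recently and wanted to try writing a crawler in it; until now I had done my scraping in Python. (In practice, little changes beyond the syntax.)
Below is a simple example.

Scraping images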

Create the project: dotnet new console --name crawler
Install HtmlAgilityPack (run inside the crawler directory): dotnet add package HtmlAgilityPack --version 1.11.23

// Put the using directives at the top of Program.cs and the statements inside Main.
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

string url = "https://www.iqiyi.com/dianying_new/i_list_paihangbang.html";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30 * 1000; // 30 seconds, in milliseconds
// Build the User-Agent by concatenation: a verbatim string broken across
// lines would embed a newline in the header value.
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  + "Chrome/81.0.4044.122 Safari/537.36";
request.Accept = "text/html"; // a GET has no body, so send Accept rather than Content-Type
request.CookieContainer = new CookieContainer();
string html;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode != HttpStatusCode.OK)
    {
        return;
    }
    // Decode the body as UTF-8; the using block disposes the reader.
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        html = sr.ReadToEnd();
    }
}

HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
string liXPath = "//*[@id='widget-tab-0']/div[2]/div/div[1]/ul/li";
HtmlNodeCollection liNodeList = document.DocumentNode.SelectNodes(liXPath);
if (liNodeList == null)
{
    return; // SelectNodes returns null when nothing matches
}
foreach (HtmlNode liNode in liNodeList)
{
    // A leading "." makes the XPath relative to the current <li>,
    // so there is no need to re-parse its OuterHtml.
    HtmlNode imgNode = liNode.SelectSingleNode(".//a/img");
    string imgUrl = imgNode?.GetAttributeValue("src", null);
    if (imgUrl != null)
    {
        Console.WriteLine(imgUrl);
    }
}
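
The example stops once it has read each image's src value. To actually save the images, something like the following could be a next step. This is only a sketch under assumptions, not part of the original post: the ImageDownloader class and its method names are hypothetical, the "image.bin" fallback name is arbitrary, and it assumes a src may be relative or protocol-relative (such as "//host/a.jpg"), so it is resolved against the page URL first. It uses HttpClient rather than HttpWebRequest.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not from the original post.
public static class ImageDownloader
{
    // One shared HttpClient is reused for every download.
    private static readonly HttpClient Client = new HttpClient();

    // Turns a src value (absolute, relative, or protocol-relative)
    // into an absolute URI, using the page URL as the base.
    public static Uri ResolveImageUrl(string pageUrl, string src)
    {
        return new Uri(new Uri(pageUrl), src.Trim());
    }

    // Derives a local file name from the last segment of the URI path.
    public static string FileNameFor(Uri imageUri)
    {
        string name = Path.GetFileName(imageUri.AbsolutePath);
        return string.IsNullOrEmpty(name) ? "image.bin" : name;
    }

    // Downloads one image into targetDir and returns the saved file path.
    public static async Task<string> DownloadAsync(Uri imageUri, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        string path = Path.Combine(targetDir, FileNameFor(imageUri));
        byte[] bytes = await Client.GetByteArrayAsync(imageUri);
        await File.WriteAllBytesAsync(path, bytes);
        return path;
    }
}
```

Inside the foreach loop, one could resolve the address with ImageDownloader.ResolveImageUrl(url, imgUrl) and pass the result to ImageDownloader.DownloadAsync together with a folder name such as "images". Because DownloadAsync is asynchronous, the calling method would need to await it.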


Original article: https://www.cnblogs.com/hwxing/p/12949020.html
