C# Crawler

Posted: 2020-05-24 11:33:10

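I started learning C# not long ago and wanted to try writing a crawler in it; until now I had done my crawling in Python. (Really, only the syntax is different.)
Below is a simple example.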

Crawling images

Create the project: dotnet new console --name crawler
Install the HTML parser: dotnet add package HtmlAgilityPack --version 1.11.23

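// Program.cs: the usings go at the top of the file; the statements below run
// as top-level statements (C# 9+) or inside Main.
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

string url = "https://www.iqiyi.com/dianying_new/i_list_paihangbang.html";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30 * 1000; // 30 seconds, in milliseconds
// Keep the User-Agent on one line: a verbatim string broken across lines
// embeds the line break and indentation in the header value.
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36";
request.CookieContainer = new CookieContainer();

string html;
// GetResponse() already throws a WebException for most error status codes,
// so the StatusCode check is only a safety net.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode != HttpStatusCode.OK)
    {
        return;
    }
    // Dispose the reader (and the response stream) even if reading fails.
    using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        html = sr.ReadToEnd();
    }
}

HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
// The XPath must use straight ASCII quotes around the id value.
string liXpath = "//*[@id='widget-tab-0']/div[2]/div/div[1]/ul/li";
HtmlNodeCollection liNodeList = document.DocumentNode.SelectNodes(liXpath);
// SelectNodes returns null, not an empty collection, when nothing matches.
if (liNodeList == null)
{
    Console.WriteLine("No <li> nodes matched; the page layout may have changed.");
    return;
}
foreach (HtmlNode liNode in liNodeList)
{
    // The leading "." makes the XPath relative to liNode; a bare "//" would
    // search the whole document again, so no second HtmlDocument is needed.
    HtmlNode imgNode = liNode.SelectSingleNode(".//a/img");
    string imgUrl = imgNode?.Attributes["src"]?.Value;
    if (imgUrl != null)
    {
        Console.WriteLine(imgUrl);
    }
}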
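The snippet above stops once it has each imgUrl; nothing is saved to disk yet. Below is a minimal sketch of that last step, assuming .NET Core 2.1 or later. It swaps in HttpClient for the downloads (HttpWebRequest is marked obsolete from .NET 6 on), and it assumes a src value may be absolute, protocol-relative (//host/...) or relative to the page, which has not been checked against the iQiyi page. The class and method names (ImageDownloader, ToAbsoluteUri, FileNameFor, DownloadAsync) are hypothetical.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not part of HtmlAgilityPack or the original post) that
// turns the src values collected by the crawler into files on disk.
public static class ImageDownloader
{
    // One HttpClient reused for every download instead of one per request.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Turns a src attribute into an absolute URI, assuming it is one of:
    // absolute ("https://..."), protocol-relative ("//host/...") or
    // relative to the page ("/img/a.jpg", "a.jpg").
    public static Uri ToAbsoluteUri(Uri pageUri, string src)
    {
        if (src.StartsWith("//", StringComparison.Ordinal))
        {
            // Protocol-relative: reuse the page's scheme (http or https).
            return new Uri(pageUri.Scheme + ":" + src);
        }
        if (src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return new Uri(src);
        }
        // Relative path: resolve it against the page URL. Building the relative
        // Uri with UriKind.Relative avoids "/img/a.jpg" being parsed as an
        // absolute file path, which can happen on Unix with RelativeOrAbsolute.
        return new Uri(pageUri, new Uri(src, UriKind.Relative));
    }

    // Uses the last path segment as the file name ("/image/a.jpg" -> "a.jpg");
    // the query string is not part of AbsolutePath, so it is dropped.
    public static string FileNameFor(Uri imageUri)
    {
        string name = Path.GetFileName(imageUri.AbsolutePath);
        return string.IsNullOrEmpty(name) ? "image" : name;
    }

    // Downloads one image into the directory (created if missing) and
    // returns the path it was saved to.
    public static async Task<string> DownloadAsync(Uri imageUri, string directory)
    {
        Directory.CreateDirectory(directory);
        string path = Path.Combine(directory, FileNameFor(imageUri));
        byte[] bytes = await Client.GetByteArrayAsync(imageUri);
        await File.WriteAllBytesAsync(path, bytes);
        return path;
    }
}
```

A possible call site is inside the foreach loop once imgUrl is non-null, for example `await ImageDownloader.DownloadAsync(ImageDownloader.ToAbsoluteUri(new Uri(url), imgUrl), "images");`, which requires an async Main or top-level statements. Images that share a file name overwrite each other in this sketch.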

Original article: https://www.cnblogs.com/hwxing/p/12949020.html