HtmlAgilityPack HTML操作类库的使用

时间：2016-03-30 22:26:50 阅读：223 评论：0 收藏：0 [点我收藏+]

标签：

1、读取网络中html网页内容，获取网页中元素body内的html，处理所有img元素的src属性后以字符串返回

                    if (l_sWenBenHtmlFtpPath.Substring(l_sWenBenHtmlFtpPath.LastIndexOf(".") + 1) == "html")　　
                    {
                        HtmlWeb htmlWeb = new HtmlWeb();
                        HtmlDocument htmlDoc = htmlWeb.Load(l_sWenBenHtmlFtpPath);
                        HtmlNode htmlNode = htmlDoc.DocumentNode;                                                
                        HtmlNodeCollection nodes = htmlNode.SelectNodes("//body");  //使用xpath语法进行查询
                        if (nodes != null)
                        {
                            foreach (HtmlNode bodyTag in nodes)
                            {                                
                                HtmlNodeCollection nodes2 = htmlNode.SelectNodes("//img");  //使用xpath语法进行查询                                
                                if (nodes2 != null)
                                {
                                    foreach (HtmlNode imgTag in nodes2)
                                    {
                                        string imgHttpPath = imgTag.Attributes["src"].Value;
                                        imgTag.Attributes["src"].Value = l_sWenBenHtmlFtpPath.Substring(0, l_sWenBenHtmlFtpPath.LastIndexOf("/") + 1) + imgHttpPath;
                                    }
                                }
                                l_sWenBenHtml = bodyTag.InnerHtml;
                            }
                        }
                    }

2、通过HtmlAgilityPack Html操作类库将html格式的字符串加载为html文档对象，再对html dom进行操作

                                //1.解码前台提交的html字串
                                string sDecodeString = HttpUtility.HtmlDecode(HttpUtility.UrlDecode(sEncodeString));
                                //2.拼接成完整的html字串
                                sDecodeString = @"<!DOCTYPE html><html><head><meta http-equiv=""content-type"" content=""text/html;charset=UTF-8""/>"
                                    + @"</head><body><div>" 
　　　　　　　　　　　　　　　　　　+ sDecodeString + @"</div></body></html>";
                                //3.处理html的img标签的src属性-C#的HTML DOM操作
                                HtmlDocument doc = new HtmlDocument();
                                doc.LoadHtml(sDecodeString.Replace("\n", " "));
                                HtmlNode node = doc.DocumentNode;
                                HtmlNodeCollection nodes = node.SelectNodes("//img");   //使用xpath语法进行查询
                                if (nodes != null)  //没有img节点时出错
                                {
                                    //处理html字符串中img标签的src属性
                                    foreach (HtmlNode imgTag in nodes)
                                    {
                                        string imgHttpPath = imgTag.Attributes["src"].Value;
                                        imgHttpPath = imgHttpPath.Substring(imgHttpPath.LastIndexOf("/") + 1);
                                        imgTag.Attributes["src"].Value = imgHttpPath;                                      
                                    }
                                }
                                //4.获取处理后的html字符串
                                sHtmlString = node.OuterHtml;    //处理img中src属性后的html字符串
　　　　　　　　　　　　　　　　//5.将字符串存入html格式的文件中
　　　　　　　　　　　　　　　　//do something

持续更新中，敬请期待...

HtmlAgilityPack HTML操作类库的使用

标签：

原文地址：http://www.cnblogs.com/yuzhihui/p/5339103.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行