
Filtering and Sanitizing HTML Tags in Java

Date: 2018-10-31 18:27:30


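OWASP HTML Sanitizer is a simple, fast Java library whose main purpose is to prevent XSS.

Its advantages are as follows:

  1. Easy to use. No cumbersome XML configuration is required, only a small amount of Java code.

  2. Maintained by Mike Samuel (a Google engineer).

  3. Passes more than 95% of AntiSamy's unit tests.

  4. High performance and low memory consumption.

  5. About 4 times as fast as AntiSamy's DOM-based sanitization.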

 

1. Add the dependency to the POM

        <!-- HTML tag filtering -->
        <dependency>
            <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
            <artifactId>owasp-java-html-sanitizer</artifactId>
            <version>r136</version>
        </dependency>

2. Utility class

import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

import java.util.List;

/**
 * @author : RandySun
 * @date : 2018-10-08  10:32
 * Comment :
 */
public class HtmlUtils {

    // Allowed tags
    private static final String[] allowedTags = {"h1", "h2", "h3", "h4", "h5", "h6",
            "span", "strong",
            "img", "video", "source",
            "blockquote", "p", "div",
            "ul", "ol", "li",
            "table", "thead", "caption", "tbody", "tr", "th", "td", "br",
            "a"
    };

    // Tags that need to be converted
    private static final String[] needTransformTags = {"article", "aside", "command","datalist","details","figcaption", "figure",
            "footer","header", "hgroup","section","summary"};

    // Tags that carry links
    private static final String[] linkTags = {"img","video","source","a"};

    public static String sanitizeHtml(String htmlContent){
        PolicyFactory policy = new HtmlPolicyBuilder()
                // All allowed tags
                .allowElements(allowedTags)
                // Convert content tags to div
                .allowElements( new ElementPolicy() {
                    @Override
                    public String apply(String elementName, List<String> attributes){
                        return "div";
                    }
                },needTransformTags)
                .allowAttributes("src","href","target").onElements(linkTags)
                // Only allow URLs that use the https protocol
                .allowUrlProtocols("https")
                .toFactory();
        String safeHTML = policy.sanitize(htmlContent);
        return safeHTML;
    }

    public static void main(String[] args){
        String inputHtml = "<img src=\"https://a.jpb\"/>";
        System.out.println(sanitizeHtml(inputHtml));
    }
}

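Here, .allowElements(allowedTags) adds all of the allowed HTML tags.

The following handles the tags that need to be converted: every element listed in needTransformTags is rewritten as a div.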
    // Convert content tags to div
    .allowElements(new ElementPolicy() {
        @Override
        public String apply(String elementName, List<String> attributes) {
            return "div";
        }
    }, needTransformTags)
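
To make the renaming rule concrete without pulling in the library, the per-tag decision that this policy encodes can be modeled in plain Java. This is only a sketch of the idea, not the sanitizer's implementation: `TagDecisionSketch` and `decide` are hypothetical names, and the two tag sets are shortened copies of the arrays in the utility class. As with `ElementPolicy.apply`, a `null` result stands for "do not emit this tag".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, library-free model of the keep / rename / drop decision. */
public class TagDecisionSketch {

    // Assumption: shortened copies of allowedTags and needTransformTags from HtmlUtils.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("p", "div", "span", "a", "img"));
    private static final Set<String> RENAMED_TO_DIV =
            new HashSet<>(Arrays.asList("article", "section", "header", "footer"));

    /** Returns the tag name to emit, or null if the tag should not be emitted. */
    public static String decide(String elementName) {
        String name = elementName.toLowerCase(Locale.ROOT);
        if (ALLOWED.contains(name)) {
            return name;   // allowed as-is
        }
        if (RENAMED_TO_DIV.contains(name)) {
            return "div";  // structural HTML5 tag rewritten as a plain div
        }
        return null;       // on neither list: the tag is dropped
    }

    public static void main(String[] args) {
        System.out.println(decide("section")); // div
        System.out.println(decide("p"));       // p
        System.out.println(decide("script"));  // null
    }
}
```

Rewriting tags such as section or article to div keeps the block structure of the content while limiting the output to a small, predictable set of elements.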

 

.allowAttributes("src","href","target").onElements(linkTags) specifies which attributes are allowed, and on which tags.
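
The effect of restricting attributes to particular tags can be illustrated with a small library-free sketch. `AttributeFilterSketch` and `filter` are hypothetical names, and the flat list of alternating names and values is just a convenient representation chosen for this example; the sanitizer's real attribute handling is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of "allow src, href, target, but only on link tags". */
public class AttributeFilterSketch {

    private static final Set<String> LINK_TAGS =
            new HashSet<>(Arrays.asList("img", "video", "source", "a"));
    private static final Set<String> LINK_ATTRS =
            new HashSet<>(Arrays.asList("src", "href", "target"));

    /**
     * @param element        tag name
     * @param nameValuePairs alternating attribute names and values
     * @return the name/value pairs that survive, in their original order
     */
    public static List<String> filter(String element, String... nameValuePairs) {
        List<String> kept = new ArrayList<>();
        if (!LINK_TAGS.contains(element)) {
            return kept; // this policy allows no attributes on other tags
        }
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            if (LINK_ATTRS.contains(nameValuePairs[i])) {
                kept.add(nameValuePairs[i]);
                kept.add(nameValuePairs[i + 1]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The event handler is not on the allow-list, so only src remains.
        System.out.println(filter("img", "src", "https://a.jpb", "onerror", "alert(1)"));
        // p is not a link tag, so href is removed as well.
        System.out.println(filter("p", "href", "https://example.com"));
    }
}
```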

.allowUrlProtocols("https") means that only the https protocol is allowed in href and src URLs.
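
The idea behind a protocol allow-list can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the library's code: `SchemeAllowListSketch` and `isAllowed` are hypothetical names, and treating URLs without a scheme as relative (and therefore acceptable) is a choice made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified check of a URL's scheme against an allow-list. */
public class SchemeAllowListSketch {

    public static boolean isAllowed(String url, String... allowedSchemes) {
        Set<String> allowed = new HashSet<>(Arrays.asList(allowedSchemes));
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                // A path, query or fragment starts before any ':', so there is
                // no scheme. Assumption: such relative URLs are accepted (this
                // also covers //host/... URLs, which a stricter check might reject).
                return true;
            }
            if (c == ':') {
                // Everything before the first ':' is the scheme; compare case-insensitively.
                String scheme = url.substring(0, i).toLowerCase(Locale.ROOT);
                return allowed.contains(scheme);
            }
        }
        return true; // no ':' at all: a relative reference such as "a.png"
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("https://a.jpb", "https"));       // true
        System.out.println(isAllowed("http://example.com", "https"));  // false
        System.out.println(isAllowed("javascript:alert(1)", "https")); // false
    }
}
```

An allow-list is used rather than a block-list because the set of dangerous schemes (javascript:, data:, vendor-specific ones) is open-ended, while the set of schemes an application actually needs is usually tiny.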


 


Original article: https://www.cnblogs.com/qizhelongdeyang/p/9884716.html
