OWASP Java HTML Sanitizer

A fast, easy-to-configure HTML sanitizer written in Java that lets you include HTML authored by third parties in your web application while protecting against XSS.

The only dependencies are Guava and JSR 305; the other jars are needed only by the test suite. JSR 305 is a compile-only dependency, needed only for annotations.

This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.

Getting Started

Getting Started includes instructions on how to get started with or without Maven.

Prepackaged Policies

You can use prepackaged policies:

PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
String safeHTML = policy.sanitize(untrustedHTML);

Crafting a policy

The tests show how to configure your own policy:

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);

Custom Policies

You can write custom policies to do things like changing h1s to divs with a certain class:

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("p")
    .allowElements(
        (String elementName, List<String> attrs) -> {
          // Add a class attribute.
          attrs.add("class");
          attrs.add("header-" + elementName);
          // Return elementName to include, null to drop.
          return "div";
        }, "h1", "h2", "h3", "h4", "h5", "h6")
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);

Please note that the elements "a", "font", "img", "input" and "span" need to be explicitly whitelisted using the allowWithoutAttributes() method if you want them to be allowed through the filter when these elements do not include any attributes.
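
As a minimal sketch of the difference, assuming the sanitizer jar is on the classpath, the two policies below treat an attribute-less span differently. The class name SpanWithoutAttributes and the sample input are made up for illustration.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

/** Illustrative only: contrasts a policy with and without the opt-in. */
public final class SpanWithoutAttributes {
  // "span" is allowed, but an attribute-less <span> is still dropped;
  // only its text content is kept.
  static final PolicyFactory DEFAULT_POLICY = new HtmlPolicyBuilder()
      .allowElements("span")
      .toFactory();

  // Explicitly opting in lets an attribute-less <span> through.
  static final PolicyFactory OPT_IN_POLICY = new HtmlPolicyBuilder()
      .allowElements("span")
      .allowWithoutAttributes("span")
      .toFactory();

  public static void main(String[] args) {
    String input = "<span>hello</span>";
    System.out.println(DEFAULT_POLICY.sanitize(input));
    System.out.println(OPT_IN_POLICY.sanitize(input));
  }
}
```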

Attribute policies allow running custom code too. Adding an attribute policy will not water down any default policy like style or URL attribute checks.

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("div", "span")
    .allowAttributes("data-foo")
        .matching(
            (String elementName, String attributeName, String value) -> {
              // Return the value to use for the attribute, or null to drop it.
              // This example passes the value through unchanged.
              return value;
            })
        .onElements("div", "span")
    .toFactory();

Preprocessors

Preprocessors allow inserting text and making large-scale structural changes.

PolicyFactory policy = new HtmlPolicyBuilder()
    // Use a preprocessor to be backwards compatible with the
    // <plaintext> element which 
    .withPreprocessor(
        (HtmlStreamEventReceiver r) -> {
          // Provide user with info about links before they click.
          // Before:                       <a href="https://example.com/...">
          // After:  (https://example.com) <a href="https://example.com/...">
          return new HtmlStreamEventReceiverWrapper(r) {
            @Override public void openTag(String elementName, List<String> attrs) {
              if ("a".equals(elementName)) {
                for (int i = 0, n = attrs.size(); i < n; i += 2) {
                  if ("href".equals(attrs.get(i))) {
                    String url = attrs.get(i + 1);
                    String origin;
                    try {
                      URI uri = new URI(url);
                      String scheme = uri.getScheme();
                      String authority = uri.getRawAuthority();
                      if (scheme == null && authority == null) {
                        origin = null;
                      } else {
                        origin = (scheme != null ? scheme + ":" : "")
                               + (authority != null ? "//" + authority : "");
                      }
                    } catch (URISyntaxException ex) {
                      origin = "about:invalid";
                    }
                    if (origin != null) {
                      text(" (" + origin + ") ");
                    }
                  }
                }
              }
              super.openTag(elementName, attrs);
            }
          };
        })
    .allowElements("a")
    ...
    .toFactory();

Preprocessing happens before a policy is applied, so it cannot affect the security of the output.

Telemetry

When a policy rejects an element or attribute it notifies an HtmlChangeListener.

You can use this to keep track of policy violation trends and find out when someone is making an effort to breach your security.

PolicyFactory myPolicyFactory = ...;
// If you need to associate reports with some context, you can do so.
MyContextClass myContext = ...;

String sanitizedHtml = myPolicyFactory.sanitize(
    unsanitizedHtml,
    new HtmlChangeListener<MyContextClass>() {
      @Override
      public void discardedTag(MyContextClass context, String elementName) {
        // ...
      }
      @Override
      public void discardedAttributes(
          MyContextClass context, String elementName, String... attributeNames) {
        // ...
      }
    },
    myContext);
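
To track violation trends, the listener callbacks could delegate to a small thread-safe tally. The sketch below uses only the JDK; ViolationCounter, its method names, and its key format are hypothetical and not part of the sanitizer's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: tallies rejected elements and attributes by name. */
public final class ViolationCounter {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

  /** Intended to be called from a discardedTag callback. */
  public void recordTag(String elementName) {
    increment("tag:" + elementName);
  }

  /** Intended to be called from a discardedAttributes callback. */
  public void recordAttributes(String elementName, String... attributeNames) {
    for (String attributeName : attributeNames) {
      increment("attr:" + elementName + "@" + attributeName);
    }
  }

  /** Returns how often the given key was recorded, or 0 if never. */
  public long count(String key) {
    LongAdder adder = counts.get(key);
    return adder == null ? 0L : adder.sum();
  }

  private void increment(String key) {
    counts.computeIfAbsent(key, k -> new LongAdder()).increment();
  }
}
```

Inside the listener, discardedTag would call recordTag(elementName) and discardedAttributes would call recordAttributes(elementName, attributeNames); the totals can then be exported to whatever monitoring system is in use.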

Note: A string that sanitizes with no change notifications is not necessarily safe to use as-is. Only use the output of the sanitizer.

The sanitizer ensures that the output is in a sub-set of HTML that commonly used HTML parsers will agree on the meaning of, but the absence of notifications does not mean that the input is in such a sub-set, only that it does not contain elements or attributes that were removed.

See "Why sanitize when you can validate" for more on this topic.

Questions?

If you wish to report a vulnerability, please see AttackReviewGroundRules.

Subscribe to the mailing list to be notified of known vulnerabilities and important updates.

Contributing

If you would like to contribute, please ping @mvsamuel or @manicode.

We welcome issue reports and PRs. PRs that change behavior or that add functionality should include both positive and negative tests.

Please be aware that contributions fall under the Apache 2.0 License.

Credits

Thanks to everyone who has helped with criticism and code.
