
Comments (2)

krizhanovsky commented on August 10, 2024

Motivation

#628 implements strict alphabet validation. However, if, for example, an alphabet prohibits space (0x20) in the URI, then a request can bypass the validator using simple hex encoding: GET /foo%20bar HTTP/1.1. One dangerous real-life example is a response splitting attack:

    /redir_lang.jsp?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0a
    HTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0a
    Content-Length:%2019%0d%0a%0d%0a<html>Shazam</html>

Allowed characters (bytes) must be taken from the same configuration options as for #628.

The encodings must be validated; see, for example, validate_url_encoding() from ModSecurity/apache2/re_operators.c.
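
To make the check concrete, here is a minimal sketch in the spirit of ModSecurity's validate_url_encoding(): it only verifies that every % is followed by two hex digits. The function name and signature are ours for illustration, not ModSecurity's API.

    #include <ctype.h>
    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Check that every '%' in @s (of length @len) is followed by exactly
     * two hex digits. Returns false for truncated or malformed sequences,
     * so requests like "GET /?%product=..." can be blocked early.
     */
    static bool
    percent_encoding_valid(const char *s, size_t len)
    {
        size_t i;

        for (i = 0; i < len; i++) {
            if (s[i] != '%')
                continue;
            if (i + 2 >= len)
                return false; /* truncated, e.g. "%2" at the end */
            if (!isxdigit((unsigned char)s[i + 1])
                || !isxdigit((unsigned char)s[i + 2]))
                return false; /* e.g. "%zz" */
            i += 2; /* skip the two validated digits */
        }
        return true;
    }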

Traffic normalization for intrusion detection is well studied; see, for example, Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics for L3-L4 NIDS.

HPACK & QPACK

The Huffman decoder and encoder should be reviewed: at the moment we use a 1-character decoding table, which shows better performance than the nghttp2 and Nginx decoders (#1176 (comment)). However, LiteSpeed uses large tables and batching to speed up Huffman encoding and decoding. Probably the allowed characters (in the sense of #628), already decoded (in the sense of this issue), can be encoded in the large table. Also see #1207.
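
For illustration, a hedged sketch of the batched, table-driven approach follows: the decoder consumes a whole input byte per step and looks up a precomputed transition. All names, the state count, and the table contents are hypothetical; a real HPACK decoder would generate the table from the RFC 7541 Huffman code (the placeholder table below is zero-filled only so the sketch compiles).

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    #define HF_NSTATES 256  /* hypothetical number of decoder states */
    #define HF_FAIL    0x01 /* the 8 bits hit an invalid code prefix */

    struct hf_trans {
        uint8_t next_state; /* decoder state after consuming 8 bits */
        uint8_t emit;       /* symbols produced by this step (0..3) */
        uint8_t sym[3];     /* the decoded symbols, if any */
        uint8_t flags;
    };

    /* [state][next input byte] -> transition; filled by a generator. */
    static const struct hf_trans hf_table[HF_NSTATES][256];

    static ssize_t
    hf_decode(const uint8_t *src, size_t len, uint8_t *dst, size_t dlen)
    {
        uint8_t state = 0;
        size_t i, out = 0;
        int e;

        for (i = 0; i < len; i++) {
            const struct hf_trans *t = &hf_table[state][src[i]];

            if (t->flags & HF_FAIL)
                return -1;
            for (e = 0; e < t->emit; e++) {
                if (out == dlen)
                    return -1;
                dst[out++] = t->sym[e];
            }
            state = t->next_state;
        }
        return out;
    }

Compared to a bit-by-bit tree walk, each step costs one table load per input byte, which is where the LiteSpeed-style speedup comes from; the price is the table's cache footprint.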


Modes of operation

To avoid hurting performance in cases which don't require strong security, the feature should be configurable per-vhost and per-location, in the same sense as #688.

The transformation logic (as described in RFC 7230 5.7.2) for cookies and URIs must be controlled by a configuration option (see also #902):

    http_norm <uri|cookie>
    content_security_mode <strict|transform|log>

For example:

    http_norm uri cookie;
    content_security_mode strict;

The following checks and transformations must be done:

  • url - decode the percent-encoded string (double percent hex and first/second/double nibble hex aren't allowed). Messages with wrong hex strings (e.g. http://foo.com/?%product=select 1,2,3, where % isn't followed by 2 hex digits) must be blocked; see the decoding sketch after this list. Spaces may be represented in many ways, e.g. with + or %20 (see the HTML URL Encoding Reference) - we don't need to do anything with that. RFC 3986 allows percent encoding in all parts of a URI, but it's unclear how to deal e.g. with a UTF-8 hostname, so we decode the URI abs_path only.

  • utf8 - validate UTF-8 encoding: decode percent-encoded bytes and validate them as UTF-8.

  • path - remove path traversals like /../ or // (see ngx_http_parse_complex_uri()) and translate \ to /; a sketch follows below.

  • pollution (subject of #1276) - handle HTTP parameter pollution for the URI or a POST body. In validation mode (without the content_security_mode=log attribute) just build a map of the parameters and ensure that there is no pollution. In content_security_mode=transform mode take the 1st occurrence of a polluted parameter and rewrite the URI accordingly (available for URIs only); for content_security_mode=strict drop the request and write a warning. HTTP parameter fragmentation, e.g.

      http://test.com/url?a=1+select&b=1+from&c=base
    

    is left for application-side WAF.
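
A minimal sketch of the url decoding step described above, assuming in-place rewriting (the decoded string is never longer than the original, so the leftover gap can later be closed by the skb fragmentation discussed under the implementation requirements); all names are ours:

    #include <stddef.h>
    #include <sys/types.h>

    /* Map an ASCII hex digit to its value, or -1 if it isn't one. */
    static int
    hex_val(unsigned char c)
    {
        if (c >= '0' && c <= '9')
            return c - '0';
        if (c >= 'a' && c <= 'f')
            return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
            return c - 'A' + 10;
        return -1;
    }

    /*
     * Decode %XX sequences in @s of length @len in place and return the
     * new length, or -1 if a '%' isn't followed by two hex digits (such
     * messages must be blocked).
     */
    static ssize_t
    percent_decode(char *s, size_t len)
    {
        size_t r, w = 0;
        int hi, lo;

        for (r = 0; r < len; r++) {
            if (s[r] != '%') {
                s[w++] = s[r];
                continue;
            }
            if (r + 2 >= len)
                return -1;
            hi = hex_val((unsigned char)s[r + 1]);
            lo = hex_val((unsigned char)s[r + 2]);
            if (hi < 0 || lo < 0)
                return -1;
            s[w++] = (char)(hi << 4 | lo);
            r += 2;
        }
        return w;
    }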

Additional alphabets must be introduced to validate the strings after all the decodings. These alphabets may prevent double percent encoding (e.g. %2541, which is essentially %41 after the first hex decoding and A after the second) by prohibiting %.

path must be executed after string decoding, e.g. the path /a/b/abba/%2e%2e/abba must be decoded first and only then the .. removed. Also, the allowed alphabets must be verified after the decodings to block messages with CR, LF, or a zero byte.
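
A sketch of that ordering for the path step, run strictly after percent_decode() above and similar in spirit (not in code) to ngx_http_parse_complex_uri(); the function is hypothetical:

    #include <stddef.h>

    /*
     * Normalize an already percent-decoded abs_path in place: translate
     * '\' to '/', collapse "//", and resolve "/./" and "/../" segments.
     * Returns the new length; @p is assumed to start with '/'.
     */
    static size_t
    path_normalize(char *p, size_t len)
    {
        size_t r, w = 0;

        for (r = 0; r < len; r++)
            if (p[r] == '\\')
                p[r] = '/';

        for (r = 0; r < len; ) {
            if (p[r] == '/') {
                if (r + 1 < len && p[r + 1] == '/') {
                    r++; /* "//" -> "/" */
                    continue;
                }
                if (r + 1 < len && p[r + 1] == '.'
                    && (r + 2 == len || p[r + 2] == '/')) {
                    r += 2; /* drop "/." */
                    continue;
                }
                if (r + 2 < len && p[r + 1] == '.' && p[r + 2] == '.'
                    && (r + 3 == len || p[r + 3] == '/')) {
                    /* "/..": pop the previously written segment. */
                    while (w && p[w - 1] != '/')
                        w--;
                    if (w)
                        w--;
                    r += 3;
                    continue;
                }
            }
            p[w++] = p[r++];
        }
        if (!w)
            p[w++] = '/';
        return w;
    }

E.g. path_normalize() turns the decoded /a/b/abba/../abba into /a/b/abba and never lets /../ climb above the root.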

Implementation requirements

If none of the normalization options is specified, then the HTTP parser must not perform detailed processing and must just validate the allowed alphabet as it does now, i.e. there must be zero performance overhead if normalization isn't required by the configuration.

All the decoders in log and strict modes must copy an observed string to some buffer, because we need to forward the percent-encoded URI. Since all the encodings are larger than the original data, the content_security_mode=transform mode must percent-recode the decoded string in-place, rewriting the original string. skb fragmentation should be used to handle the data gap between the shortened URI and the HTTP/ part. The fragmentation must be done only once, when all the decoders have finished. Fall back to full data copying if the number of fragments per buffer (#498) grows beyond a compile-time constant.

The normalization must be done before cache processing to minimize the number of different URI keys stored in the cache.

Since it's undesirable to grow the current set of HTTP parser states, the logic must be implemented in a pluggable (external) FSM reached by a conditional unlikely jump (there is no need to support the compilation directive any more).
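
A sketch of how this could look in the parser's hot path; all structure and function names here are hypothetical. With normalization disabled the only cost is a single predicted-not-taken branch:

    struct tfw_http_req; /* hypothetical request descriptor */

    struct tfw_location {
        unsigned int norm_flags; /* TFW_NORM_URI, TFW_NORM_COOKIE, ... */
    };

    #define TFW_NORM_URI 0x01

    #ifndef unlikely
    #define unlikely(x) __builtin_expect(!!(x), 0)
    #endif

    /* The pluggable normalization FSM, compiled outside the parser. */
    int tfw_http_norm_fsm(struct tfw_http_req *req);

    static int
    tfw_http_parse_uri_chunk(struct tfw_http_req *req,
                             const struct tfw_location *loc)
    {
        /* ...fast-path allowed-alphabet validation as now... */

        /*
         * Jump to the external FSM only when the vhost/location
         * configuration asks for it, keeping the parser's own state
         * set unchanged.
         */
        if (unlikely(loc->norm_flags & TFW_NORM_URI))
            return tfw_http_norm_fsm(req);

        return 0;
    }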

Also, please fix the TODO for the URI abs_path to allow more accurate filtering of injection attacks; e.g., it'd be good to be able to prohibit / in the query string.

SIMD

There are SIMD implementations of UTF-8 validation and recoding (e.g. to/from UTF-16 or UTF-32). See for example:

  1. https://github.com/lemire/fastvalidate-utf-8 and the paper https://arxiv.org/pdf/2010.03090.pdf
  2. https://r-libre.teluq.ca/2400/3/Transcoding%20Billions%20of%20Unicode%20Characters%20per%20Second%20with%20SIMD%20Instructions.pdf
  3. https://nullprogram.com/blog/2017/10/06/
  4. Adventures in SIMD-Thinking (part 2 of 2) - Bob Steagall - CppCon 2020, UTF-8 to UTF-32 Conversion Algorithm

However, it probably makes sense to sacrifice SIMD in order to do percent-decoding, UTF-8 validation, validation of the allowed character sets (in the sense of #628), and transformations (path or arguments) in a single pass.
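
As a building block for such a single pass, a common trick is to scan wide blocks for "interesting" bytes (here: '%' escapes and non-ASCII) and drop to a scalar slow path only around them; disallowed ASCII characters can be folded into the same pass with range comparisons. Below is a userspace-style SSE2 sketch (in-kernel SIMD would additionally need kernel_fpu_begin()/kernel_fpu_end()); the function name is ours:

    #include <emmintrin.h>
    #include <stddef.h>

    /*
     * Return the offset of the first byte that needs slow-path handling
     * (a '%' escape or a non-ASCII byte), or @len if the whole buffer
     * is plain ASCII without escapes.
     */
    static size_t
    scan_fast(const unsigned char *s, size_t len)
    {
        const __m128i pct = _mm_set1_epi8('%');
        size_t i = 0;
        int m;

        for ( ; i + 16 <= len; i += 16) {
            __m128i c = _mm_loadu_si128((const __m128i *)(s + i));

            /* High bits flag non-ASCII bytes; cmpeq flags '%'. */
            m = _mm_movemask_epi8(c)
                | _mm_movemask_epi8(_mm_cmpeq_epi8(c, pct));
            if (m)
                return i + __builtin_ctz(m);
        }
        for ( ; i < len; i++)
            if (s[i] == '%' || s[i] >= 0x80)
                break;
        return i;
    }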

Tests and docs

Please update the https://github.com/tempesta-tech/tempesta/wiki/Web-security Wiki page when finishing the task.

  • At least one functional test for detection a response splitting attack is required.
  • Functional test for custom character sets tempesta-tech/tempesta-test#3
  • Tempesta must block requests like GET /vulnerabilities/xss_d/?default=/%0aSet-Cookie:crlf-injection HTTP/2

Further possible extensions

We leave back-end server personality normalization for further development, if there are any real requests for it. Probably it won't be needed, since we're going to provide full Web server functionality and leave really heavy processing logic to dedicated WAF solutions.

HTTP responses also aren't normalized - we target filtering of the initial attacks rather than filtering of their consequences.

Also, the decoder set is very restricted, e.g. there is no lower-case conversion, Microsoft %U decoding, or Unicode normalization, so please keep in mind possible further extensions of the decoder.


krizhanovsky commented on August 10, 2024

The issue was wrongly closed.

