Motivation
#628 implements strict alphabet validation. However, if an alphabet prohibits, for example, the space character (0x20) in a URI, a request can bypass the validator with simple hex (percent) encoding: GET /foo%20bar HTTP/1.1. A dangerous real-life example is a response splitting attack:
/redir_lang.jsp?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0a
HTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0a
Content-Length:%2019%0d%0a%0d%0a<html>Shazam</html>
Allowed characters (bytes) must be taken from the same configuration options as for #628.
The encodings must be validated; see, for example, validate_url_encoding() in ModSecurity/apache2/re_operators.c.
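The bypass described above can be illustrated with a minimal sketch. It assumes a strict decoder that rejects any % not followed by two hex digits and a prohibited byte set that stands in for the #628 configuration. The names strict_percent_decode, violates_alphabet and PROHIBITED are hypothetical and are not taken from Tempesta FW or ModSecurity.

```python
HEX_DIGITS = frozenset(b"0123456789abcdefABCDEF")

# Hypothetical prohibited set: space, CR, LF and the zero byte.
PROHIBITED = frozenset(b" \r\n\x00")


def strict_percent_decode(raw: bytes) -> bytes:
    """Decode %XX sequences; raise ValueError on a malformed sequence."""
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        if raw[i] != 0x25:  # not '%', copy the byte as is
            out.append(raw[i])
            i += 1
            continue
        # '%' must be followed by exactly two hex digits.
        if i + 2 >= n or raw[i + 1] not in HEX_DIGITS or raw[i + 2] not in HEX_DIGITS:
            raise ValueError("malformed percent-encoding at offset %d" % i)
        out.append(int(raw[i + 1:i + 3], 16))
        i += 3
    return bytes(out)


def violates_alphabet(data: bytes, prohibited: frozenset = PROHIBITED) -> bool:
    """Return True if data contains at least one prohibited byte."""
    return any(b in prohibited for b in data)


raw_uri = b"/foo%20bar"
# The raw URI contains no prohibited byte, so a check on raw bytes passes...
bypasses_raw_check = not violates_alphabet(raw_uri)
# ...while the same check on the decoded bytes catches the space.
caught_after_decoding = violates_alphabet(strict_percent_decode(raw_uri))
```

The same check applied to the decoded response splitting URI above finds the CR and LF bytes that the raw string hides.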
Traffic normalization for intrusion detection is well studied; see, for example, Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics for L3-L4 NIDS.
HPACK & QPACK
Huffman decoders and encoders should be reviewed: at the moment we use a 1-character decoding table, which shows better performance than the nghttp2 and Nginx decoders (#1176 (comment)). However, LiteSpeed uses large tables and batching to speed up Huffman encoding and decoding. Probably the allowed characters (in the sense of #628), already decoded (in the sense of this issue), can be encoded in the large table. Also see #1207.
References:
- HTTP_IDS_evasions.pdf
- http_parameter_pollution.pdf
- A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages!
Modes of operation
To avoid hurting performance in cases that don't require strong security, the feature should be configurable per-vhost and per-location in the same sense as #688.
The transformation logic (as described in RFC 7230 5.7.2) for cookies and URI must be controlled by configuration options (see also #902):
http_norm <uri|cookie>
content_security_mode <strict|transform|log>
For example:
http_norm uri cookie;
content_security_mode strict;
The following checks and transformations must be done:
- url - decode percent-encoded strings (double percent hex and first/second/double nibble hex encodings aren't allowed). Messages with malformed hex strings (e.g. http://foo.com/?%product=select 1,2,3, where % isn't followed by 2 hex digits) must be blocked. Spaces may be represented in many ways, e.g. with + or %20 (see HTML URL Encoding Reference); we don't need to do anything with that. RFC 3986 allows percent-encoding in all parts of a URI, but it's unclear how to deal with, e.g., a UTF-8 hostname, so we decode the URI abs_path only.
- utf8 - validate UTF-8 encoding: decode percent-encoded strings and validate the UTF-8 bytes.
- path - remove path traversals like /../ or // (see ngx_http_parse_complex_uri()) and translate \ to /.
- pollution (subject for #1276) - take the 1st occurrence of a polluted HTTP parameter for URI or POST in content_security_mode=transform mode. In validation mode (without the content_security_mode=log attribute), just build a map of the parameters and ensure that there is no pollution. In content_security_mode=transform mode, rewrite the URI (available for URIs only); in content_security_mode=strict mode, drop the request and write a warning. HTTP parameter fragmentation, e.g. http://test.com/url?a=1+select&b=1+from&c=base, is left for an application-side WAF.
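The pollution check can be sketched as follows. This is an illustration under assumptions, not the planned implementation: the function name check_pollution, the returned action strings and the rule that parameter names are compared after percent-decoding are all hypothetical. The mode values mirror the content_security_mode option described above.

```python
from typing import Tuple
from urllib.parse import unquote_plus


def check_pollution(query: str, mode: str) -> Tuple[str, str]:
    """Return (action, query) for a raw query string.

    action is "pass", "block" or "log" (hypothetical labels).
    """
    seen = set()
    kept = []
    polluted = False
    for part in query.split("&"):
        if not part:  # keep empty segments untouched
            kept.append(part)
            continue
        # Compare decoded names so that "id" and "%69d" count as the same.
        name = unquote_plus(part.split("=", 1)[0])
        if name in seen:
            polluted = True
            continue
        seen.add(name)
        kept.append(part)

    if not polluted:
        return ("pass", query)
    if mode == "strict":
        return ("block", query)  # drop the request, write a warning
    if mode == "transform":
        return ("pass", "&".join(kept))  # keep the 1st occurrence only
    return ("log", query)  # forward unchanged, only report


action, rewritten = check_pollution("id=1&id=2", "transform")
```

The raw name=value segments are kept as received, so the rewritten query preserves the client's original encoding for the parameters that survive.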
Additional alphabets must be introduced to validate the strings after all the decodings. These alphabets may prevent double percent encodings (e.g. %2541, which is essentially %41 after the first hex decoding and A after the second) by prohibiting %.
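A minimal sketch of such a post-decoding alphabet, assuming it prohibits %, CR, LF and the zero byte. The set POST_DECODE_PROHIBITED and the function passes_post_decode_alphabet are hypothetical names. The sketch uses Python's urllib.parse.unquote_to_bytes, which is lenient about malformed sequences, so it illustrates only the alphabet step and not strict decoding.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical alphabet applied after a single decoding pass.
POST_DECODE_PROHIBITED = frozenset(b"%\r\n\x00")


def passes_post_decode_alphabet(raw_uri: bytes) -> bool:
    """Decode once, then reject any prohibited byte in the result."""
    decoded = unquote_to_bytes(raw_uri)
    return not any(b in POST_DECODE_PROHIBITED for b in decoded)


single = passes_post_decode_alphabet(b"/a%41")    # decodes to "/aA"
double = passes_post_decode_alphabet(b"/a%2541")  # decodes to "/a%41"
```

Prohibiting % after decoding also rejects a legitimately encoded literal percent sign (%25), which is the trade-off of this approach.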
path must be executed after string decoding, e.g. the path /a/b/abba/%2e%2e/abba must be decoded first and only after that is .. removed. Also, the allowed alphabets must be verified after the decodings to block messages with CR, LF or zero bytes.
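The ordering requirement can be sketched like this. The function normalize_path is a hypothetical name, and two behaviors are assumptions of this sketch rather than requirements from the issue: a .. that would climb above the root raises an error, and a trailing slash is dropped.

```python
from urllib.parse import unquote


def normalize_path(raw_path: str) -> str:
    """Percent-decode first, then resolve '.', '..', '//' and '\\'."""
    path = unquote(raw_path).replace("\\", "/")
    stack = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # collapses '//' and drops '.'
        if segment == "..":
            if not stack:
                # Assumption: treat escaping the root as an error.
                raise ValueError("path traversal above the root")
            stack.pop()
        else:
            stack.append(segment)
    return "/" + "/".join(stack)


normalized = normalize_path("/a/b/abba/%2e%2e/abba")
```

If the traversal removal ran before the decoding, the segment %2e%2e would not match .. and would pass through unchanged.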
Implementation requirements
If none of the normalization options is specified, then the HTTP parser must not perform detailed processing and must just validate the allowed alphabet as it does now, i.e. there must be zero performance overhead if normalization isn't required by the configuration.
All the decoders and the log and strict modes must copy the observed string to a buffer, because we need to forward the percent-encoded URI. Since all the encoded forms are larger than the original data, the content_security_mode=transform mode must percent-recode the decoded string in place, rewriting the original string. skb fragmentation should be used to handle the data gap between the shortened URI and the HTTP/ part. The fragmentation must be done only once, when all the decoders finish. There must be a fallback to full data copying if the number of fragments per buffer (#498) grows beyond a compile-time constant.
The normalization must be done before the cache processing to minimize the number of different URI keys stored in the cache.
Since it's undesirable to grow the current set of HTTP parser states, the logic must be implemented in a pluggable (external) FSM entered by a conditional unlikely jump (no need to support the compilation directive any more).
Also, please fix the TODO for URI abs_path for more accurate filtering of injection attacks, e.g. it'd be good to be able to prohibit / in the query string.
SIMD
There are SIMD implementations of UTF-8 validation or recoding (e.g. to/from UTF-16 or UTF-32). See for example
- https://github.com/lemire/fastvalidate-utf-8 and the paper https://arxiv.org/pdf/2010.03090.pdf
- https://r-libre.teluq.ca/2400/3/Transcoding%20Billions%20of%20Unicode%20Characters%20per%20Second%20with%20SIMD%20Instructions.pdf
- https://nullprogram.com/blog/2017/10/06/
- Adventures in SIMD-Thinking (part 2 of 2) - Bob Steagall - CppCon 2020, UTF-8 to UTF-32 Conversion Algorithm
However, it probably makes sense to sacrifice SIMD in order to do percent-decoding, UTF-8 validation, validation of allowed character sets (in the sense of #628) and transformations (path or arguments) in a single pass.
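A scalar single-pass check could look roughly like the sketch below, which combines strict percent-decoding, a prohibited byte set and UTF-8 validation in one loop and leaves out the path and argument transformations. The names check_single_pass and PROHIBITED are hypothetical. The UTF-8 part follows the usual byte-range rules that reject overlong forms, surrogates and code points above U+10FFFF.

```python
HEXVAL = {c: int(chr(c), 16) for c in b"0123456789abcdefABCDEF"}
PROHIBITED = frozenset(b"\x00\r\n")  # hypothetical alphabet


def check_single_pass(raw: bytes) -> bool:
    """Return True if raw passes all three checks in one scan."""
    need = 0             # continuation bytes still expected
    lo, hi = 0x80, 0xBF  # valid range for the next continuation byte
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        i += 1
        if b == 0x25:  # '%': must be followed by two hex digits
            if i + 1 >= n or raw[i] not in HEXVAL or raw[i + 1] not in HEXVAL:
                return False
            b = (HEXVAL[raw[i]] << 4) | HEXVAL[raw[i + 1]]
            i += 2
        if b in PROHIBITED:  # alphabet check on the decoded byte
            return False
        if need:  # inside a multi-byte UTF-8 sequence
            if not lo <= b <= hi:
                return False
            lo, hi = 0x80, 0xBF
            need -= 1
        elif b <= 0x7F:
            pass  # ASCII
        elif 0xC2 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
            if b == 0xE0:
                lo = 0xA0  # reject overlong 3-byte forms
            elif b == 0xED:
                hi = 0x9F  # reject UTF-16 surrogates
        elif 0xF0 <= b <= 0xF4:
            need = 3
            if b == 0xF0:
                lo = 0x90  # reject overlong 4-byte forms
            elif b == 0xF4:
                hi = 0x8F  # reject code points above U+10FFFF
        else:
            return False  # 0x80-0xC1 and 0xF5-0xFF never start a sequence
    return need == 0  # a truncated sequence is invalid


valid = check_single_pass(b"/caf%C3%A9")
overlong = check_single_pass(b"/%C0%AF")
```

Each input byte is read once and no intermediate buffer is built, which is the property the single-pass approach trades SIMD for.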
Tests and docs
Please update https://github.com/tempesta-tech/tempesta/wiki/Web-security Wiki page on finishing the task.
- At least one functional test for detecting a response splitting attack is required.
- Functional test for custom character sets tempesta-tech/tempesta-test#3
- Tempesta must block requests like
GET /vulnerabilities/xss_d/?default=/%0aSet-Cookie:crlf-injection HTTP/2
Further possible extensions
We leave back-end server personality normalization for further development if there are any real requests for it. Probably this won't be needed since we're going to provide full Web server functionality and leave really heavy processing logic to dedicated WAF solutions.
HTTP responses also aren't normalized - we target initial attacks filtering instead of filtration of their consequences.
Also, the set of decoders is very restricted, e.g. there is no lower-case conversion, Microsoft %U decoding or Unicode normalization, so please keep in mind possible further extension of the decoder.
The issue was wrongly closed