Comments (13)

vdmit11 commented on August 10, 2024

Well, the TFW_SCHED_MAX_SERVERS was added for a purpose.
A fixed-size array of servers has certain advantages over a linked list:

  • You can use binary search (a linked list can be transformed into a skip list, but that is more complicated).
  • You can allocate per-CPU arrays easily; the small value of TFW_SCHED_MAX_SERVERS allows doing that statically (see the sketch below). Dynamic per-CPU linked lists are not that easy.
  • All the memory is packed together, which is good for caching, hence better performance.
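
A minimal sketch of that per-CPU static allocation, assuming illustrative names (sched_cpu_data, sched_data) rather than actual Tempesta FW definitions; TfwServer is the Tempesta FW server type:

#include <linux/percpu.h>

#define TFW_SCHED_MAX_SERVERS 64

struct sched_cpu_data {
    size_t    n_srv;
    /* Contiguous storage keeps the hot data cache-friendly. */
    TfwServer *servers[TFW_SCHED_MAX_SERVERS];
};

/* A fixed-size array per CPU, allocated statically at compile time. */
static DEFINE_PER_CPU(struct sched_cpu_data, sched_data);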

Of course, we can re-allocate the array as needed and so on,
but to me it looks easier to have two separate modules for the two cases:

  • A module for small groups of servers that are online most of the time.
  • Another module for large groups of servers that go offline all the time.

These two cases require different implementations and involve different optimizations, so I think we really need separate modules.

krizhanovsky commented on August 10, 2024
    ....to have two separate modules for the two cases:

    A module for small groups of servers that are online most of the time.
    Another module for large groups of servers that go offline all the time.

This is not different logic, these are just different cases, so they should be handled in the same code base. Probably you can just allocate an array for a small server set and use a hash table or tree to handle thousands of servers. But the different containers should be processed by the same logic.

Or please give an example of logic which is fundamentally different between the two cases.
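
For illustration, a hedged sketch of "same logic, different containers" — the names (srv_group_ctnr, sg_nth_srv) are made up, not Tempesta FW API:

#include <linux/hashtable.h>

enum sg_ctnr_type { SG_ARRAY, SG_HASH };

struct srv_group_ctnr {
    enum sg_ctnr_type type;
    union {
        struct {                      /* small, mostly-online groups */
            TfwServer **srv;
            size_t    n;
        } arr;
        DECLARE_HASHTABLE(ht, 10);    /* thousands of servers */
    };
};

/* The scheduler calls a single accessor; only the container differs. */
static TfwServer *
sg_nth_srv(struct srv_group_ctnr *c, size_t i)
{
    if (c->type == SG_ARRAY)
        return i < c->arr.n ? c->arr.srv[i] : NULL;
    /* SG_HASH: walk the bucket for key i; elided for brevity. */
    return NULL;
}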

krizhanovsky commented on August 10, 2024

The system should dynamically establish new connections to busy upstream servers and also dynamically shrink redundant connections (also applicable to the forward proxy case).

UPD. It still makes sense to be able to change the number of connections to upstream servers. However, Tempesta FW will not support forward proxying. With wide HTTPS usage, forward proxying is limited to corporate networks and other small installations which do not process millions of requests per second; there is no ISP usage any more. So this is a completely different use case with a different environment and requirements.

UPD 2. I created a new issue #710 for the functionality, so no need to implement it this time.

krizhanovsky commented on August 10, 2024

As we've seen in our performance benchmarks, and as shown in third-party benchmarks, HTTP servers like Nginx or Apache HTTPD show quite low performance on 4 concurrent connections, so our current default of 4 server connections and the maximum of 32 are just inadequate. I'd say 32 connections as the default, with VMs running Tempesta together with a user-space HTTP server in mind, and 32768 as the maximum (USHORT_MAX - 1024, which is 64512, is the maximum number of ephemeral ports).
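
For illustration, a Tempesta FW configuration fragment with such a default (the address is made up; conns_n is the per-server connections option):

srv_group app {
    # 32 connections per server instead of the old default of 4.
    server 192.168.1.10:8080 conns_n=32;
}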

The main consequence of the issue is that all current scheduling algorithms must be reworked to support dynamically sized arrays.
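
A minimal sketch of that rework, under the assumption of a growable array (sched_grp and sched_add_srv are illustrative names):

#include <linux/slab.h>

struct sched_grp {
    TfwServer **srv;
    size_t    n, cap;
};

static int
sched_add_srv(struct sched_grp *g, TfwServer *srv)
{
    if (g->n == g->cap) {
        /* Double the capacity on demand instead of a fixed limit. */
        size_t cap = g->cap ? 2 * g->cap : 8;
        TfwServer **p = krealloc(g->srv, cap * sizeof(*p), GFP_KERNEL);

        if (!p)
            return -ENOMEM;
        g->srv = p;
        g->cap = cap;
    }
    g->srv[g->n++] = srv;
    return 0;
}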

A naive solution could be to keep scheduler data per CPU and establish a number of upstream connections equal to N * CPU_NUM. However, Tempesta FW can service thousands of weak virtualized servers, so if it runs on, say, 128-core hardware, then even N = 4 means 512 connections to every server; we would maintain too many redundant connections and cause unnecessary load on the weak servers.

The issue relates to #51, since that also updates the schedulers code.

krizhanovsky commented on August 10, 2024

While the 2-tier schedulers certainly should be modified to support dynamically sized arrays, the real performance issue is with the HTTP scheduler, which in practice must be able to process thousands of server groups. The problem is in tfw_http_match_req(), which traverses a list of thousands of rules and performs string matching against each item. The matcher must be reworked to keep the rules in a hash table, so that we can make a quick jump by a rule key. The key can be calculated from the string and the ID of the HTTP field.
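
A rough sketch of that hash-based matcher, assuming the rule key mixes the HTTP field ID into a hash of the field value (rule_ht, match_rule, rule_lookup are illustrative names, not the actual rework):

#include <linux/hashtable.h>
#include <linux/jhash.h>
#include <linux/string.h>

struct match_rule {
    int               field_id;   /* e.g. Host or URI */
    const char        *val;
    size_t            len;
    struct hlist_node node;
};

static DEFINE_HASHTABLE(rule_ht, 12);    /* 4096 buckets */

static u32
rule_key(int field_id, const char *s, size_t len)
{
    return jhash(s, len, field_id);      /* the field ID seeds the hash */
}

/* One bucket probe replaces the linear walk over thousands of rules. */
static struct match_rule *
rule_lookup(int field_id, const char *s, size_t len)
{
    struct match_rule *r;

    hash_for_each_possible(rule_ht, r, node, rule_key(field_id, s, len))
        if (r->field_id == field_id && r->len == len
            && !strncasecmp(r->val, s, len))
            return r;
    return NULL;
}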

In the current milestone these constants should be eliminated in PRs #670 and #666.

UPD. This comment was split out into a new issue #732, so it shouldn't be done in the context of #76.

vankoven commented on August 10, 2024

All the requirements are already implemented or moved to separate issues/tasks.

krizhanovsky commented on August 10, 2024

It seems the issue is done, but we still have no results from the #680 test. Let's close it if the test shows that we are really able to efficiently handle 1M hosts.

vladtcvs commented on August 10, 2024

Creating many backends, with 1 backend per server group, causes problems. Creating 16 interfaces with 64 ports per interface produces:

ERROR: start() for module 'sock_srv' returned the error: -12 - ENOMEM

8x32 (interfaces × ports): TCP: Too many orphaned sockets plus kmemleak messages.
8x128: many more TCP: Too many orphaned sockets messages and much more kmemleak output.

Backends are created with nginx, a single nginx instance per interface; each nginx config contains a server {} block for each port, as sketched below.

Ports used: 16384, 16375, etc. for each interface.
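
A hypothetical minimal nginx.conf fragment matching that setup (addresses and ports are illustrative):

http {
    # One server {} block per port, 64 such blocks per interface.
    server { listen 192.168.100.1:16384; return 200; }
    server { listen 192.168.100.1:16385; return 200; }
    # ... and so on for the remaining ports
}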

vladtcvs commented on August 10, 2024

testing: test_1M.py from vlts-680-1M

krizhanovsky commented on August 10, 2024

I didn't notice TCP: Too many orphaned sockets messages, but using the tempesta_fw.conf generated by @ikoveshnikov's script and his nginx.conf (both are attached), I see that sysctl -w net.tempesta.state=stop (on --restart) or sysctl -w net.tempesta.state=start (on --reload) takes about 20 seconds for a 1000-backend config. The call stack for the sysctl process (note synchronize_sched(), a full RCU grace period, apparently executed for every released server group):

[<ffffffffafecb533>] __wait_rcu_gp+0xc3/0xf0
[<ffffffffafecfc9c>] synchronize_sched.part.65+0x3c/0x60
[<ffffffffafecfdf0>] synchronize_sched+0x30/0x90
[<ffffffffc027f169>] tfw_sched_ratio_del_grp+0x49/0x80 [tfw_sched_ratio]
[<ffffffffc043e462>] tfw_sg_release+0x22/0x80 [tempesta_fw]
[<ffffffffc043e512>] tfw_sg_release_all+0x52/0xb0 [tempesta_fw]
[<ffffffffc0443656>] tfw_sock_srv_stop+0xb6/0xd0 [tempesta_fw]
[<ffffffffc043c19c>] tfw_ctlfn_state_io+0x19c/0x530 [tempesta_fw]
[<ffffffffb004e025>] proc_sys_call_handler+0xe5/0x100
[<ffffffffb004e04f>] proc_sys_write+0xf/0x20
[<ffffffffaffca322>] __vfs_write+0x32/0x160
[<ffffffffaffcb660>] vfs_write+0xb0/0x190
[<ffffffffaffcca83>] SyS_write+0x53/0xc0
[<ffffffffb03dd72e>] entry_SYSCALL_64_fastpath+0x1c/0xb1

scrip_cfg.tar.gz

krizhanovsky commented on August 10, 2024

After the fix 6d11ff1, perf top shows the following for a 100K-server reconfiguration:

    76.25%  [kernel]            [k] strcasecmp
    16.01%  [tempesta_fw]       [k] tfw_cfgop_begin_srv_group
     5.64%  [tempesta_fw]       [k] tfw_sg_lookup_reconfig

krizhanovsky commented on August 10, 2024

After the fix 94b18ed, the performance profile became:

    62.33%  [tempesta_fw]  [k] tfw_cfgop_begin_srv_group
     9.25%  [tempesta_fw]  [k] tfw_apm_prcntl_tmfn
     7.98%  [tempesta_fw]  [k] __tfw_stricmp_avx2

However, reloading 10K server groups takes about 30 seconds, the same as a full restart. tempesta_fw.conf for 10K servers is about 1MB, so all the parsing and server-group manipulation, e.g. tfw_cfgop_begin_srv_group(), takes time.

krizhanovsky commented on August 10, 2024

With the commit c58993a (also https://github.com/tempesta-tech/linux-4.9.35-tfw/commit/f20d5703592ce3078d3415edbc5b2703f614d9b7 for the kernel) I still cannot normally start Tempesta FW with 30K backends using the configuration from #680 (comment). (Surely it'd be better to use many IP addresses and ports to avoid lock contention on a single TCP socket.) The system hangs on softirq softlockups. Only the following patch allows Tempesta FW to start "normally":

diff --git a/tempesta_fw/apm.c b/tempesta_fw/apm.c
index b82a3ce..5f78ee1 100644
--- a/tempesta_fw/apm.c
+++ b/tempesta_fw/apm.c
@@ -1034,9 +1034,10 @@ tfw_apm_add_srv(TfwServer *srv)
 
        /* Start the timer for the percentile calculation. */
        set_bit(TFW_APM_DATA_F_REARM, &data->flags);
+       goto AK_DBG;
        setup_timer(&data->timer, tfw_apm_prcntl_tmfn, (unsigned long)data);
        mod_timer(&data->timer, jiffies + TFW_APM_TIMER_INTVL);
-
+AK_DBG:
        srv->apmref = data;
 
        return 0;
diff --git a/tempesta_fw/sock_srv.c b/tempesta_fw/sock_srv.c
index dc9e0ba..3b4e361 100644
--- a/tempesta_fw/sock_srv.c
+++ b/tempesta_fw/sock_srv.c
@@ -227,7 +227,12 @@ tfw_sock_srv_connect_try_later(TfwSrvConn *srv_conn)
        /* Don't rearm the reconnection timer if we're about to shutdown. */
        if (unlikely(!ss_active()))
                return;
-
+{
+       static unsigned long delta = 0;
+       timeout = 1000 + delta;
+       delta += 10;
+       goto AK_DBG_end;
+}
        if (srv_conn->recns < ARRAY_SIZE(tfw_srv_tmo_vals)) {
                if (srv_conn->recns)
                        TFW_DBG_ADDR("Cannot establish connection",
@@ -249,7 +254,7 @@ tfw_sock_srv_connect_try_later(TfwSrvConn *srv_conn)
                timeout = tfw_srv_tmo_vals[ARRAY_SIZE(tfw_srv_tmo_vals) - 1];
        }
        srv_conn->recns++;
-
+AK_DBG_end:
        mod_timer(&srv_conn->timer, jiffies + msecs_to_jiffies(timeout));
 }
 
@@ -2119,7 +2124,7 @@ static TfwCfgSpec tfw_srv_group_specs[] = {
        },
        {
                .name = "server_connect_retries",
-               .deflt = "10",
+               .deflt = "1", // AK_DBG "10",
                .handler = tfw_cfgop_in_conn_retries,
                .spec_ext = &(TfwCfgSpecInt) {
                        .range = { 0, INT_MAX },

The reason is #736: TIMER_SOFTIRQ is the highest-priority softirq, we set up about 60K timers for the 30K-group test, and the timer functions aren't so lightweight. So the timers just block any other activity in the system and don't allow it to make progress (the debug patch above works around this by staggering the reconnect timeouts with a growing delta).
