fortinet / autoscale-core

Autoscale - Core Module

License: MIT License

TypeScript 99.71% Groovy 0.08% JavaScript 0.21%

autoscale-core's People

fortinetps jaydenliang manikantanandyala ccaiccie

autoscale-core's Issues

heartbeat calculation improvement based on heartbeat request sequence and timestamp

internal issue id: 0741713

It will make use of the two new properties (sequence and time) in the incoming heartbeat request sent from FOS (7.0.2 or later).

Example request body:
{"instance": "i-0bd8fd4d2a36a841f", "interval": 30, "sequence":968, "time":"2021-08-19T04:40:38Z"}

use the new properties to determine heartbeat sequence and timing in order to keep or discard a certain heartbeat request.

the change should be compatible with existing FOS versions (7.0.1 and older) which send the heartbeat request in the format:
{"instance": "i-0bd8fd4d2a36a841f", "interval": 30}

code should not break without the presence of the new properties.
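
A minimal sketch of how both request formats could be parsed without breaking on the older one. The type and function names are hypothetical and not taken from the project. It also assumes that interval is given in seconds and that a heartbeat is discarded when its sequence is not greater than the last recorded one; neither is confirmed by the issue.

```typescript
// Hypothetical shape of the heartbeat request body. `sequence` and `time`
// are optional because FOS 7.0.1 and older do not send them.
interface HeartbeatRequest {
    instance: string;
    interval: number; // assumed to be in seconds
    sequence?: number;
    time?: string; // ISO 8601 timestamp
}

// Normalized form; null marks a property the device did not send.
interface ParsedHeartbeat {
    instance: string;
    intervalMs: number;
    sequence: number | null;
    sentTimeMs: number | null;
}

function parseHeartbeat(body: string): ParsedHeartbeat {
    const req = JSON.parse(body) as HeartbeatRequest;
    // Date.parse returns NaN for an unparsable string.
    const sentTimeMs = typeof req.time === 'string' ? Date.parse(req.time) : NaN;
    return {
        instance: req.instance,
        intervalMs: req.interval * 1000,
        sequence: typeof req.sequence === 'number' ? req.sequence : null,
        sentTimeMs: isNaN(sentTimeMs) ? null : sentTimeMs
    };
}

// Assumed policy: keep legacy heartbeats unconditionally, and keep new-format
// heartbeats only when their sequence advances past the last recorded one.
function shouldKeepHeartbeat(hb: ParsedHeartbeat, lastSequence: number | null): boolean {
    if (hb.sequence === null || lastSequence === null) {
        return true;
    }
    return hb.sequence > lastSequence;
}
```

With this shape, a request from FOS 7.0.1 simply yields null for both new properties, so the caller can fall back to the existing arrival-time logic.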

(for AWS) simplify the nic attachment process as AWS has support for multiple nic deployment in ASG launch template

see it here: Amazon EC2 Auto Scaling now supports attaching multiple network interfaces at launch

remove caching of load settings

internal issue id: 0753007

incorrect value shown in logs for heartbeat sync

internal issue id: 0733213

the code snippet below, which logs the heartbeat sync result, displays an incorrect heartbeat expected arrive time:

in the file: autoscale-context.ts, lines 307 to 316:

this.proxy.logAsInfo(
    `Heartbeat sync result: ${this.result},` +
        ` heartbeat sequence: ${oldSeq}->${targetHealthCheckRecord.seq},` +
        ` heartbeat expected arrive time: ${targetHealthCheckRecord.nextHeartbeatTime} ms,` +
        ` heartbeat actual arrive time: ${heartbeatArriveTime} ms,` +
        ` heartbeat delay allowance: ${delayAllowance} ms,` +
        ` heartbeat calculated delay: ${delay} ms,` +
        ` heartbeat interval: ${oldInterval}->${newInterval} ms,` +
        ` heartbeat loss count: ${oldLossCount}->${newLossCount}.`
);

the heartbeat expected arrive time actually displays the time for the next heartbeat, but it should display the time for the current heartbeat. This confuses people during troubleshooting.

also suggest changing heartbeat calculated delay: ${delay} ms to heartbeat calculated delay: XXXX ms (after deducting XXXX delay allowance). Hopefully this gives a much clearer picture to whoever is inspecting the logs.

heartbeat timing calculation will be based on previous arrival time

the calculation for the next heartbeat will be based on the current arrival time.
the result is then saved to the db for reference by the next heartbeat.
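
A minimal sketch of that calculation, under stated assumptions: the record and function names below are hypothetical, all times are epoch milliseconds, and the delay is taken as the arrival time minus the expected time minus the delay allowance. It reads the expected time for the current heartbeat from the stored record before computing the value to store for the next one, so a log line can report the current expected time rather than the next.

```typescript
// Hypothetical health check record as stored in the db.
interface StoredHeartbeatRecord {
    nextHeartbeatTime: number; // expected arrival (ms epoch) of the upcoming heartbeat
}

interface HeartbeatTiming {
    expectedArriveTime: number; // expected arrival of the CURRENT heartbeat
    delay: number; // ms, after deducting the delay allowance
    late: boolean;
    nextHeartbeatTime: number; // value to save for the next heartbeat
}

function evaluateHeartbeatTiming(
    record: StoredHeartbeatRecord,
    arriveTime: number,
    intervalMs: number,
    delayAllowanceMs: number
): HeartbeatTiming {
    // The value saved by the previous heartbeat is the expectation for this one.
    const expectedArriveTime = record.nextHeartbeatTime;
    const delay = arriveTime - expectedArriveTime - delayAllowanceMs;
    return {
        expectedArriveTime: expectedArriveTime,
        delay: delay,
        late: delay > 0,
        // Based on the current arrival time, not on the previous expectation.
        nextHeartbeatTime: arriveTime + intervalMs
    };
}
```

Basing the next expectation on the actual arrival time means one late heartbeat does not make every following heartbeat look late.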

license assignment bug

this issue is reproduced in the azure implementation (3.3.2).

the outcome:
one FortiGate is assigned a license which is physically deleted from the license file storage location.

steps to reproduce:

1. have 2 license files uploaded to the storage (license A and B).
2. keep the scaling group at 0 vms before scaling out.
3. scale out 1 BYOL vm and wait for it to be assigned a license (say, A).
4. scale in this vm and wait for a moment (say, 10 minutes).
5. physically delete the license file A.
6. scale out 1 BYOL vm and wait for it to be assigned a license.
7. check the license assignment. if the license assigned during step 6 is B, repeat step 4, then repeat steps 6 and 7; otherwise, move on to step 8.
8. there is a chance that license A is still assigned, which shouldn't be allowed since the license file is physically unavailable.

expected behavior:
a physically deleted license file should not be assigned in any case.
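
One way to enforce the expected behavior is to check each candidate license record against the files currently in storage before assigning it. The sketch below only illustrates that check. The record shape and function name are hypothetical, and the list of file names is assumed to come from a fresh listing of the license storage location.

```typescript
// Hypothetical license usage record kept in the db.
interface LicenseRecord {
    fileName: string;
    assignedVmId: string | null; // null when the license is free
}

// Returns the first free license whose file still exists in storage,
// or null when no such license is available.
function pickAssignableLicense(
    records: LicenseRecord[],
    filesInStorage: string[]
): LicenseRecord | null {
    const assignable = records.filter(
        rec => rec.assignedVmId === null && filesInStorage.indexOf(rec.fileName) >= 0
    );
    return assignable.length > 0 ? assignable[0] : null;
}
```

In the scenario above, the record for license A would still exist after step 5, but the storage listing would no longer contain the file, so only B could be returned.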

introduce the 'user-custom' configset directory

Scenario:
The project provides a set of pre-configured configset files by default for the initial deployment. However, in real life, users should be able to adjust the initial deployment to meet their needs. So that the Autoscale project can keep maintaining the pre-configured configset from version to version, we'd also like to extend the project to support user customization.

The idea is:
Autoscale will look for the 'user-custom' sub-directory within the configset location. If the 'user-custom' directory exists, all configset files inside it will be loaded.

This change should be implemented at the core level. Every platform-specific implementation will inherit it without additional changes in the code.
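
A minimal, platform-neutral sketch of the lookup, assuming the platform layer can already list the object keys (or file paths) under the configset location. The function name and the key layout are hypothetical.

```typescript
const USER_CUSTOM_DIR = 'user-custom';

// Given all keys found under the configset location, return the ones that
// live inside the 'user-custom' sub-directory, in a stable order.
function selectUserCustomConfigSets(configSetLocation: string, keys: string[]): string[] {
    // Strip trailing slashes so 'configset' and 'configset/' behave the same.
    const prefix = configSetLocation.replace(/\/+$/, '') + '/' + USER_CUSTOM_DIR + '/';
    return keys
        // Skip the directory placeholder itself (a key equal to the prefix).
        .filter(key => key.indexOf(prefix) === 0 && key.length > prefix.length)
        .sort();
}
```

An empty result means the directory is absent or empty, in which case only the pre-configured configset files would be loaded. Note that this sketch also returns files in nested sub-directories; whether those should be loaded is left open by the issue.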

out-of-sync vm recovery (when vm termination is disabled)

If vm termination is disabled, we allow the out-of-sync vm to become in-sync if the following conditions are met:

the vm is still sending heartbeat requests to Autoscale.
all those heartbeat requests arrive on time.
the total number of heartbeat requests that arrived on time reaches the health check recovery threshold (set in the Autoscale settings).
no late heartbeat happens during the recovery; otherwise the count of on-time heartbeats is reset and the recovery starts over.
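
The conditions above can be sketched as a small state transition applied on each heartbeat. The state shape and function name are hypothetical; the threshold is assumed to be read from the Autoscale settings by the caller.

```typescript
// Hypothetical recovery state tracked per vm.
interface RecoveryState {
    inSync: boolean;
    onTimeCount: number; // consecutive on-time heartbeats while recovering
}

function applyHeartbeatToRecovery(
    state: RecoveryState,
    arrivedOnTime: boolean,
    recoveryThreshold: number
): RecoveryState {
    if (state.inSync) {
        return state; // nothing to recover
    }
    if (!arrivedOnTime) {
        // A late heartbeat resets the count; recovery starts over.
        return { inSync: false, onTimeCount: 0 };
    }
    const onTimeCount = state.onTimeCount + 1;
    return { inSync: onTimeCount >= recoveryThreshold, onTimeCount: onTimeCount };
}
```

With a threshold of 3, the sequence on-time, on-time, late, on-time, on-time, on-time brings the vm back in-sync only on the last heartbeat.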

aws sns integration for unhealthy vm instance

as a follow-up improvement to #62, it publishes an sns message to subscribers about the unhealthy vm instance.

add handling for vmss scaling events to improve accuracy

internal issue id: 0739446

background:
When a vm has been deleted, the corresponding autoscale record still remains in the DB. Ideally, the record should be deleted.
There is also an API cache to reduce ARM API call consumption. If a vm no longer exists, its cache entry should ideally be deleted as well.

solution:
create Azure VMSS alert rules that are triggered by vmss activity logs and send events to the Autoscale handler Azure function.
add handling in the function for each of these events (can be developed one by one).

initially, the first event to be handled should be action Microsoft.Compute/virtualMachineScaleSets/delete/action

[azure] custom asset directory updates

the custom-asset-container, and custom-asset-directory need some changes regarding the following (incorrect) behaviors:

current behavior:
the custom-asset-container is currently fixed to 'configset'. that means if users want to use the custom asset feature, all custom assets must be placed inside the 'configset' container in the storage account.

expected behavior:
users should also be able to customize the container instead of being limited to the fixed 'configset' container.

deal with vm ip changes conditions

there are conditions that cause the vm IP (private or public) to change.

if the ip changes on the primary vm, is there any impact on the current primary election? will it start a new primary election? will the current secondary still be able to connect to the primary?

3.2.5 causes webpack error in dependent project

the exports field in package.json causes a webpack error. see the following code block:
"exports": { ".": "./dist/index.js", "./azure/": "./dist/azure/", "./aws/": "./dist/aws/" },

The resolution is to remove the trailing / from both the key and the value of "./azure/": "./dist/azure/" and "./aws/": "./dist/aws/".

Allow for target configuration based on primary / secondary load balancer status

The ability to configure the FortiGate based on whether the instance is a primary or a secondary instance would allow for configuration on a per-instance basis, for example IPsec VPN tunnels.

allow to keep unhealthy vm in the scaling groups instead of terminating it

add a toggle in the settings to allow for keeping the unhealthy vms in the auto scaling group instead of terminating them (terminating is currently enforced by default)

Add extra debug logs for other service processing time

internal bug id: 0735941

add extra log output for other services called within one function invocation to show the in-depth processing time of each service call.

0735911 - Lambda function is unable to update the delay interval properly

internal bug id: 0735911

the new hb interval is not saved to the db.

multiple primary records can be observed that happened at almost the same time

related to fortinet/fortigate-autoscale-azure#34

add one Autoscale Setting for each of external load balancer ip and internal load balancer ip

background:

currently the autoscale settings can save the load balancer dns names in the db. These load balancer dns names are used for the AWS load balancing solution.

The load balancer ip also needs to be saved in order to support Autoscaling + load balancing on the Azure platform.

Primary election notification email shows incorrect information

internal issue id: 0768171

There are 3 issues with the primary election notification email message:

1. There is a typo in the email subject: Autoscale Primary Election Occurred (Sucess)
2. The primary VM ID is frequently incorrect
3. The primary VM IP address (internal) is frequently incorrect (if 2 is true)

Attached is the email received from the Amazon SNS service. The email shows an incorrect VM as the primary.

To Reproduce:

1. Trigger an event that will start the primary election process (terminate the primary vm, reboot, shutdown)
2. After the primary election is complete, a notification email will be sent to the email address supplied with the template

make no-vm-termination work without the new setting item 'TerminateUnhealthyVm'

#62 requires a new setting item TerminateUnhealthyVm (terminate-unhealthy-vm). Making the 'no-vm-termination' action default and not depending on the new setting item is the solution for backward compatibility with v3.2.8.

Replacing the source code of v3.2.8 with the latest code in the running lambda function is the simplest way to incorporate this feature.

a brand new vm elected as the new primary will lead to existing configuration loss

internal issue id: 0768444
issue description:
Consider a set of running vms in the cluster, with one primary and multiple secondary devices all properly in-sync. The configuration has been modified and differs from the initial bootstrap configuration. It is current and working well.
If a new device is scaled out and happens to be elected as the next primary device, the configuration on the other devices will be lost, because the new device is made primary before it is able to sync the latest configuration from the others. As a consequence, the configuration on the new primary, which is just the initial bootstrap configuration, will be synced across the secondary devices.

expected behaviour:
such a new vm must be excluded from primary election and assigned the secondary role until it has become in-sync.
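
A minimal sketch of the exclusion rule. The candidate shape, the hasSyncedConfig flag and the "first eligible wins" ordering are assumptions made for illustration; the issue does not say how sync status is tracked or how ties are broken. The case where no candidate has synced yet (for example a brand new cluster) is deliberately left to the caller.

```typescript
// Hypothetical election candidate.
interface ElectionCandidate {
    vmId: string;
    hasSyncedConfig: boolean; // false for a vm that has never been in-sync
}

interface ElectionResult {
    primaryVmId: string | null; // null when no candidate is eligible
    secondaryVmIds: string[];
}

function electPrimary(candidates: ElectionCandidate[]): ElectionResult {
    // Only vms that already hold the synced configuration may become primary.
    const eligible = candidates.filter(c => c.hasSyncedConfig);
    const primaryVmId = eligible.length > 0 ? eligible[0].vmId : null;
    // Everyone else, including brand new vms, takes the secondary role.
    const secondaryVmIds = candidates
        .filter(c => c.vmId !== primaryVmId)
        .map(c => c.vmId);
    return { primaryVmId: primaryVmId, secondaryVmIds: secondaryVmIds };
}
```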

[azure] peer invocation module failed for missing function key

issue:
peer invocations between functions within the same Function App do not include access keys properly, so such invocations fail if the Function App has a secured access level set (e.g. function-level or admin-level).

expected:
peer invocations will be made with a proper access key.
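
Azure Functions HTTP triggers accept an access key in the x-functions-key request header (or in the code query string parameter). A minimal sketch of attaching it to a peer invocation is shown below. The function name is hypothetical, and how the key is obtained (app setting, secret store) is outside this sketch.

```typescript
// Builds the headers for invoking another function in the same Function App.
// The key is optional so anonymous-level functions keep working unchanged.
function buildPeerInvocationHeaders(functionKey?: string): { [name: string]: string } {
    const headers: { [name: string]: string } = {
        'Content-Type': 'application/json'
    };
    if (functionKey) {
        headers['x-functions-key'] = functionKey;
    }
    return headers;
}
```

Sending the key in a header rather than in the query string keeps it out of request URLs, which are more likely to end up in logs.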

improve primary election by making use of the device sync info

depends on the completion of #84