k8s-autoscale's People

Contributors

ahal, bhearsum, dandarnell, dependabot[bot], escapewindow, gbrownmozilla, jcristau, jfx2006, jmaher, johanlorenzo, masterwayz, nthomas-mozilla, srfraser, tomprince

k8s-autoscale's Issues

Noop API request should be skipped

Right now we submit PATCH requests even when the desired number of replicas equals the number of running replicas. The request changes nothing, so it would be better to skip the API call entirely.
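A minimal sketch of the guard, assuming the scaling step receives the running and desired replica counts and a callable that performs the PATCH. The names `scale_deployment` and `patch_scale` are hypothetical and not taken from the k8s-autoscale code.

```python
from typing import Callable


def scale_deployment(
    patch_scale: Callable[[str, str, dict], None],  # hypothetical wrapper around the k8s PATCH call
    deployment_name: str,
    deployment_namespace: str,
    running: int,
    desired: int,
) -> bool:
    """Send a PATCH only when the replica count would actually change.

    Returns True if a request was sent, False if it was skipped as a no-op.
    """
    if running == desired:
        # Nothing to change, so skip the API call entirely.
        return False
    patch_scale(deployment_name, deployment_namespace, {"spec": {"replicas": desired}})
    return True
```

The boolean return value makes it easy to log whether a request was actually sent.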

Lint configs in CI

It would be great to check the configs against a schema in CI, so we don't make silly mistakes.
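One possible shape for such a check, written with the standard library only. The field list is an assumption pieced together from the log fields and the poll_interval setting mentioned in these issues; the real config layout may differ, and `lint_worker_config` is a hypothetical name.

```python
from typing import List

# Assumed per-worker-type fields and their expected types (not the project's actual schema).
REQUIRED_FIELDS = {
    "worker_type": str,
    "provisioner": str,
    "deployment_name": str,
    "deployment_namespace": str,
    "min_replicas": int,
    "poll_interval": int,
}


def lint_worker_config(entry: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the entry passed."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            errors.append(f"missing field: {name}")
            continue
        value = entry[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        elif expected is int and value < 0:
            errors.append(f"{name}: must not be negative")
    return errors
```

A CI job could run this over every config entry and fail the build if any list of errors is non-empty.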

running tasks get spuriously killed

From time to time a running scriptworker task gets killed. When k8s-autoscale periodically scans each pool, it counts pending tasks and running workers, and if it thinks there are too many running workers it tells k8s to stop them. scriptworker gets SIGUSR1, which tells it to stop after the current task. However, if the task is not done after terminationGracePeriodSeconds (currently 20 minutes, except for treescript where it's 1 hour), scriptworker gets SIGTERM and terminates the running task, which then has to be rerun for no good reason.
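To make the timing concrete, here is a small sketch of the condition described above. It only encodes the grace periods quoted in this issue; the function names and the idea of knowing a task's remaining run time are assumptions for illustration.

```python
# Grace periods quoted in this issue, in seconds.
GRACE_PERIOD_SECONDS = {
    "default": 20 * 60,     # 20 minutes for most workers
    "treescript": 60 * 60,  # 1 hour for treescript
}


def grace_period_for(worker: str) -> int:
    """Look up terminationGracePeriodSeconds for a worker, falling back to the default."""
    return GRACE_PERIOD_SECONDS.get(worker, GRACE_PERIOD_SECONDS["default"])


def task_survives_scale_down(worker: str, remaining_task_seconds: float) -> bool:
    """True if the current task can finish before SIGTERM arrives after a scale-down."""
    return remaining_task_seconds <= grace_period_for(worker)
```

For example, a task with 25 minutes left would be killed on a worker with the default grace period but would finish on treescript.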

Use poll_interval to sleep between runs

At the moment we sleep for a hardcoded period between polls. It would be better to use async code and a separate poll_interval (already in the configs) for each worker type.
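A rough sketch of what this could look like with asyncio, assuming each config entry carries `worker_type` and `poll_interval` keys and that handling a worker type is an awaitable callable. `poll_worker_type`, `run_all`, and the `iterations` parameter are hypothetical; `iterations` exists only so the loop can be stopped in tests.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def poll_worker_type(
    worker_type: str,
    poll_interval: float,
    handle: Callable[[str], Awaitable[None]],
    iterations: Optional[int] = None,
) -> None:
    """Handle one worker type repeatedly, sleeping for its own poll_interval between runs."""
    count = 0
    while iterations is None or count < iterations:
        await handle(worker_type)
        count += 1
        await asyncio.sleep(poll_interval)


async def run_all(
    worker_types: List[dict],
    handle: Callable[[str], Awaitable[None]],
    iterations: Optional[int] = None,
) -> None:
    """Run one independent polling loop per worker type, all concurrently."""
    await asyncio.gather(
        *(
            poll_worker_type(wt["worker_type"], wt["poll_interval"], handle, iterations)
            for wt in worker_types
        )
    )
```

Because each worker type has its own loop, a slow or infrequently polled type does not delay the others.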

Add a LICENSE file

This Mozilla repository has been identified as lacking a LICENSE.md file. This repository does have licensing information in the README.md file. To make it easier for users (and scanning tools) to find licensing information please add a LICENSE.md file with that information to the root directory of the project.

Mozilla staff can access more information in our Software Licensing Runbook – search for “Licensing Runbook” in Confluence to find it.

If you have any questions you can contact Daniel Nazer who can be reached at dnazer on Mozilla email or Slack.

READMELIC-2023-01

k8s-autoscale hung

The last few log messages were:

{
  "insertId": "n1eotk8hjonkexfb2",
  "jsonPayload": {
    "Type": "k8s_autoscale.main",
    "Fields": {
      "min_replicas": 0,
      "provisioner": "scriptworker-k8s",
      "deployment_name": "bouncer-prod-relengworker-firefoxci-comm-3-1",
      "deployment_namespace": "prod-bouncer",
      "msg": "Handling worker type. Getting the number of running replicas...",
      "worker_type": "comm-3-bouncer"
    },
    "Hostname": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2",
    "EnvVersion": "2.0",
    "Timestamp": 1600935067087724500,
    "Pid": 1,
    "Severity": 6,
    "Logger": "Dockerflow"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "relengworker-prod-v1",
      "project_id": "moz-fx-relengworker-prod-a67d",
      "container_name": "k8s-autoscale",
      "location": "us-west1",
      "namespace_name": "prod-k8s-autoscale",
      "pod_name": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2"
    }
  },
  "timestamp": "2020-09-24T08:11:07.087987874Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/pod-template-hash": "8566fb748b",
    "k8s-pod/app_kubernetes_io/managed-by": "jenkins",
    "k8s-pod/app_kubernetes_io/part-of": "k8s-autoscale",
    "k8s-pod/fullname": "k8s-autoscale-prod-relengworker-app-1",
    "k8s-pod/jenkins-build-id": "1373",
    "k8s-pod/app_kubernetes_io/name": "k8s-autoscale",
    "k8s-pod/app_kubernetes_io/version": "1.0.0",
    "k8s-pod/app_kubernetes_io/instance": "prod",
    "k8s-pod/app_kubernetes_io/component": "scriptworker"
  },
  "logName": "projects/moz-fx-relengworker-prod-a67d/logs/stdout",
  "receiveTimestamp": "2020-09-24T08:11:12.049170032Z"
}
{
  "insertId": "n1eotk8hjonkexfb3",
  "jsonPayload": {
    "Logger": "Dockerflow",
    "EnvVersion": "2.0",
    "Fields": {
      "running": 0,
      "provisioner": "scriptworker-k8s",
      "deployment_name": "bouncer-prod-relengworker-firefoxci-comm-3-1",
      "min_replicas": 0,
      "msg": "Calculating capacity",
      "worker_type": "comm-3-bouncer",
      "deployment_namespace": "prod-bouncer"
    },
    "Severity": 6,
    "Timestamp": 1600935067106041900,
    "Pid": 1,
    "Hostname": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2",
    "Type": "k8s_autoscale.main"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "relengworker-prod-v1",
      "namespace_name": "prod-k8s-autoscale",
      "container_name": "k8s-autoscale",
      "project_id": "moz-fx-relengworker-prod-a67d",
      "location": "us-west1",
      "pod_name": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2"
    }
  },
  "timestamp": "2020-09-24T08:11:07.111627894Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/app_kubernetes_io/component": "scriptworker",
    "k8s-pod/app_kubernetes_io/instance": "prod",
    "k8s-pod/pod-template-hash": "8566fb748b",
    "k8s-pod/app_kubernetes_io/name": "k8s-autoscale",
    "k8s-pod/app_kubernetes_io/version": "1.0.0",
    "k8s-pod/app_kubernetes_io/managed-by": "jenkins",
    "k8s-pod/app_kubernetes_io/part-of": "k8s-autoscale",
    "k8s-pod/fullname": "k8s-autoscale-prod-relengworker-app-1",
    "k8s-pod/jenkins-build-id": "1373"
  },
  "logName": "projects/moz-fx-relengworker-prod-a67d/logs/stdout",
  "receiveTimestamp": "2020-09-24T08:11:12.049170032Z"
}
{
  "insertId": "n1eotk8hjonkexfb4",
  "jsonPayload": {
    "Severity": 6,
    "Logger": "Dockerflow",
    "Fields": {
      "provisioner": "scriptworker-k8s",
      "worker_type": "comm-3-bouncer",
      "msg": "Checking pending",
      "deployment_namespace": "prod-bouncer",
      "running": 0,
      "min_replicas": 0,
      "capacity": 1,
      "deployment_name": "bouncer-prod-relengworker-firefoxci-comm-3-1"
    },
    "Type": "k8s_autoscale.main",
    "EnvVersion": "2.0",
    "Timestamp": 1600935067106374100,
    "Pid": 1,
    "Hostname": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "container_name": "k8s-autoscale",
      "cluster_name": "relengworker-prod-v1",
      "project_id": "moz-fx-relengworker-prod-a67d",
      "location": "us-west1",
      "namespace_name": "prod-k8s-autoscale",
      "pod_name": "k8s-autoscale-prod-relengworker-app-1-8566fb748b-wskc2"
    }
  },
  "timestamp": "2020-09-24T08:11:07.111681044Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/app_kubernetes_io/component": "scriptworker",
    "k8s-pod/app_kubernetes_io/part-of": "k8s-autoscale",
    "k8s-pod/app_kubernetes_io/managed-by": "jenkins",
    "k8s-pod/app_kubernetes_io/name": "k8s-autoscale",
    "k8s-pod/app_kubernetes_io/instance": "prod",
    "k8s-pod/jenkins-build-id": "1373",
    "k8s-pod/pod-template-hash": "8566fb748b",
    "k8s-pod/fullname": "k8s-autoscale-prod-relengworker-app-1",
    "k8s-pod/app_kubernetes_io/version": "1.0.0"
  },
  "logName": "projects/moz-fx-relengworker-prod-a67d/logs/stdout",
  "receiveTimestamp": "2020-09-24T08:11:12.049170032Z"
}

Add missing comm-1 worker types

I'd like to get to a point where the entire release process can be tested on try-comm-central. There are a couple of missing worker types:

comm-1-beetmover
comm-1-bouncer
comm-1-tree
