terraform-aws-eks's Introduction

AWS EKS Terraform module

Terraform module which creates AWS EKS (Kubernetes) resources

External Documentation

Please note that we strive to provide a comprehensive suite of documentation for configuring and utilizing the module(s) defined here. Documentation regarding EKS itself (including EKS managed node groups, self-managed node groups, and Fargate profiles) and Kubernetes features, usage, etc. is better left to their respective sources:

Reference Architecture

The examples provided under examples/ offer a comprehensive suite of configurations that demonstrate nearly all of the settings that can be used with this module. However, these examples are not representative of clusters you would typically run for production workloads. For reference architectures that utilize this module, please see the following:

Usage

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.29"

  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
  }

  vpc_id                   = "vpc-1234556abcdef"
  subnet_ids               = ["subnet-abcde012", "subnet-bcde012a", "subnet-fghi345a"]
  control_plane_subnet_ids = ["subnet-xyzde987", "subnet-slkjf456", "subnet-qeiru789"]

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    instance_types = ["m6i.large", "m5.large", "m5n.large", "m5zn.large"]
  }

  eks_managed_node_groups = {
    example = {
      min_size     = 1
      max_size     = 10
      desired_size = 1

      instance_types = ["t3.large"]
      capacity_type  = "SPOT"
    }
  }

  # Cluster access entry
  # To add the current caller identity as an administrator
  enable_cluster_creator_admin_permissions = true

  access_entries = {
    # One access entry with a policy associated
    example = {
      kubernetes_groups = []
      principal_arn     = "arn:aws:iam::123456789012:role/something"

      policy_associations = {
        example = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
          access_scope = {
            namespaces = ["default"]
            type       = "namespace"
          }
        }
      }
    }
  }

  tags = {
    Environment = "dev"
    Terraform   = "true"
  }
}

Cluster Access Entry

When enabling authentication_mode = "API_AND_CONFIG_MAP", EKS will automatically create an access entry for the IAM role(s) used by managed node group(s) and Fargate profile(s); no additional action is required by users. For self-managed node groups and the Karpenter sub-module, this project adds the access entry on behalf of users, so again no additional action is required.

On clusters that were created prior to cluster access management (CAM) support, there will be an existing access entry for the cluster creator. This entry was previously not visible when using the aws-auth ConfigMap, but it becomes visible once access entries are enabled.

Bootstrap Cluster Creator Admin Permissions

Setting bootstrap_cluster_creator_admin_permissions is a one-time operation performed when the cluster is created; it cannot be modified later through the EKS API. This project hardcodes it to false. Users who want the same functionality can achieve it through an access entry, which can be enabled or disabled at any time via the variable enable_cluster_creator_admin_permissions.

Enabling EFA Support

When enabling EFA support via enable_efa_support = true, there are two locations this can be specified - one at the cluster level, and one at the nodegroup level. Enabling at the cluster level will add the EFA required ingress/egress rules to the shared security group created for the nodegroup(s). Enabling at the nodegroup level will do the following (per nodegroup where enabled):

  1. All EFA interfaces supported by the instance will be exposed on the launch template used by the nodegroup
  2. A placement group with strategy = "cluster", per the EFA requirements, is created and passed to the launch template used by the nodegroup
  3. Data sources will reverse lookup the availability zones that support the instance type selected based on the subnets provided, ensuring that only the associated subnets are passed to the launch template and therefore used by the placement group. This avoids the placement group being created in an availability zone that does not support the instance type selected.

Tip

Use the aws-efa-k8s-device-plugin Helm chart to expose the EFA interfaces on the nodes as an extended resource, and allow pods to request the interfaces be mounted to their containers.

The EKS AL2 GPU AMI comes with the necessary EFA components pre-installed - you just need to expose the EFA devices on the nodes via their launch templates, ensure the required EFA security group rules are in place, and deploy the aws-efa-k8s-device-plugin in order to start utilizing EFA within your cluster. Your application container will need to have the necessary libraries and runtime in order to utilize communication over the EFA interfaces (NCCL, aws-ofi-nccl, hwloc, libfabric, aws-neuronx-collectives, CUDA, etc.).
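
For reference, here is a minimal Terraform sketch of installing the device plugin via the Helm provider; the repository URL and chart name are assumptions based on the AWS eks-charts repository and should be verified against the chart's own documentation:

# Sketch only: installs the EFA device plugin through the Terraform Helm provider.
# The repository URL and chart name below are assumptions - verify before use.
resource "helm_release" "aws_efa_k8s_device_plugin" {
  name       = "aws-efa-k8s-device-plugin"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-efa-k8s-device-plugin"
  namespace  = "kube-system"
}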

If you disable the creation and use of the managed nodegroup custom launch template (create_launch_template = false and/or use_custom_launch_template = false), this will interfere with the EFA functionality provided. In addition, if you do not supply an instance_type for self-managed nodegroup(s), or instance_types for the managed nodegroup(s), this will also interfere with the functionality. In order to support the EFA functionality provided by enable_efa_support = true, you must utilize the custom launch template created/provided by this module, and supply an instance_type/instance_types for the respective nodegroup.

The logic behind supporting EFA uses a data source to look up the instance type and retrieve the number of interfaces it supports in order to enumerate and expose those interfaces on the launch template created. For managed nodegroups where a list of instance types is supported, the first instance type in the list is used to calculate the number of EFA interfaces supported. Mixing instance types with varying numbers of interfaces is not recommended for EFA (and in some cases, mixing instance types is not supported at all - e.g. p5.48xlarge and p4d.24xlarge). In addition to exposing the EFA interfaces and updating the security group rules, a placement group is created per the EFA requirements, and only the availability zones that support the selected instance type are used from the subnets provided to the nodegroup.

In order to enable EFA support, you will have to specify enable_efa_support = true on both the cluster and each nodegroup that you wish to enable EFA support for:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # Truncated for brevity ...

  # Adds the EFA required security group rules to the shared
  # security group created for the nodegroup(s)
  enable_efa_support = true

  eks_managed_node_groups = {
    example = {
      instance_types = ["p5.48xlarge"]

      # Exposes all EFA interfaces on the launch template created by the nodegroup(s)
      # This would expose all 32 EFA interfaces for the p5.48xlarge instance type
      enable_efa_support = true

      pre_bootstrap_user_data = <<-EOT
        # Mount NVME instance store volumes since they are typically
        # available on instance types that support EFA
        setup-local-disks raid0
      EOT

      # EFA should only be enabled when connecting 2 or more nodes
      # Do not use EFA on a single node workload
      min_size     = 2
      max_size     = 10
      desired_size = 2
    }
  }
}

Examples

Contributing

We are grateful to the community for contributing bugfixes and improvements! Please see below to learn how you can take part.

Requirements

Name Version
terraform >= 1.3.2
aws >= 5.40
time >= 0.9
tls >= 3.0

Providers

Name Version
aws >= 5.40
time >= 0.9
tls >= 3.0

Modules

Name Source Version
eks_managed_node_group ./modules/eks-managed-node-group n/a
fargate_profile ./modules/fargate-profile n/a
kms terraform-aws-modules/kms/aws 2.1.0
self_managed_node_group ./modules/self-managed-node-group n/a

Resources

Name Type
aws_cloudwatch_log_group.this resource
aws_ec2_tag.cluster_primary_security_group resource
aws_eks_access_entry.this resource
aws_eks_access_policy_association.this resource
aws_eks_addon.before_compute resource
aws_eks_addon.this resource
aws_eks_cluster.this resource
aws_eks_identity_provider_config.this resource
aws_iam_openid_connect_provider.oidc_provider resource
aws_iam_policy.cluster_encryption resource
aws_iam_policy.cni_ipv6_policy resource
aws_iam_role.this resource
aws_iam_role_policy_attachment.additional resource
aws_iam_role_policy_attachment.cluster_encryption resource
aws_iam_role_policy_attachment.this resource
aws_security_group.cluster resource
aws_security_group.node resource
aws_security_group_rule.cluster resource
aws_security_group_rule.node resource
time_sleep.this resource
aws_caller_identity.current data source
aws_eks_addon_version.this data source
aws_iam_policy_document.assume_role_policy data source
aws_iam_policy_document.cni_ipv6_policy data source
aws_iam_session_context.current data source
aws_partition.current data source
tls_certificate.this data source

Inputs

Name Description Type Default Required
access_entries Map of access entries to add to the cluster any {} no
attach_cluster_encryption_policy Indicates whether or not to attach an additional policy for the cluster IAM role to utilize the encryption key provided bool true no
authentication_mode The authentication mode for the cluster. Valid values are CONFIG_MAP, API or API_AND_CONFIG_MAP string "API_AND_CONFIG_MAP" no
cloudwatch_log_group_class Specifies the log class of the log group. Possible values are: STANDARD or INFREQUENT_ACCESS string null no
cloudwatch_log_group_kms_key_id If a KMS Key ARN is set, this key will be used to encrypt the corresponding log group. Please be sure that the KMS Key has an appropriate key policy (https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html) string null no
cloudwatch_log_group_retention_in_days Number of days to retain log events. Default retention - 90 days number 90 no
cloudwatch_log_group_tags A map of additional tags to add to the cloudwatch log group created map(string) {} no
cluster_additional_security_group_ids List of additional, externally created security group IDs to attach to the cluster control plane list(string) [] no
cluster_addons Map of cluster addon configurations to enable for the cluster. Addon name can be the map keys or set with name any {} no
cluster_addons_timeouts Create, update, and delete timeout configurations for the cluster addons map(string) {} no
cluster_enabled_log_types A list of the desired control plane logs to enable. For more information, see Amazon EKS Control Plane Logging documentation (https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html) list(string) ["audit", "api", "authenticator"] no
cluster_encryption_config Configuration block with encryption configuration for the cluster. To disable secret encryption, set this value to {} any {"resources": ["secrets"]} no
cluster_encryption_policy_description Description of the cluster encryption policy created string "Cluster encryption policy to allow cluster role to utilize CMK provided" no
cluster_encryption_policy_name Name to use on cluster encryption policy created string null no
cluster_encryption_policy_path Cluster encryption policy path string null no
cluster_encryption_policy_tags A map of additional tags to add to the cluster encryption policy created map(string) {} no
cluster_encryption_policy_use_name_prefix Determines whether cluster encryption policy name (cluster_encryption_policy_name) is used as a prefix bool true no
cluster_endpoint_private_access Indicates whether or not the Amazon EKS private API server endpoint is enabled bool true no
cluster_endpoint_public_access Indicates whether or not the Amazon EKS public API server endpoint is enabled bool false no
cluster_endpoint_public_access_cidrs List of CIDR blocks which can access the Amazon EKS public API server endpoint list(string) ["0.0.0.0/0"] no
cluster_identity_providers Map of cluster identity provider configurations to enable for the cluster. Note - this is different/separate from IRSA any {} no
cluster_ip_family The IP family used to assign Kubernetes pod and service addresses. Valid values are ipv4 (default) and ipv6. You can only specify an IP family when you create a cluster, changing this value will force a new cluster to be created string "ipv4" no
cluster_name Name of the EKS cluster string "" no
cluster_security_group_additional_rules List of additional security group rules to add to the cluster security group created. Set source_node_security_group = true inside rules to set the node_security_group as source any {} no
cluster_security_group_description Description of the cluster security group created string "EKS cluster security group" no
cluster_security_group_id Existing security group ID to be attached to the cluster string "" no
cluster_security_group_name Name to use on cluster security group created string null no
cluster_security_group_tags A map of additional tags to add to the cluster security group created map(string) {} no
cluster_security_group_use_name_prefix Determines whether cluster security group name (cluster_security_group_name) is used as a prefix bool true no
cluster_service_ipv4_cidr The CIDR block to assign Kubernetes service IP addresses from. If you don't specify a block, Kubernetes assigns addresses from either the 10.100.0.0/16 or 172.20.0.0/16 CIDR blocks string null no
cluster_service_ipv6_cidr The CIDR block to assign Kubernetes pod and service IP addresses from if ipv6 was specified when the cluster was created. Kubernetes assigns service addresses from the unique local address range (fc00::/7) because you can't specify a custom IPv6 CIDR block when you create the cluster string null no
cluster_tags A map of additional tags to add to the cluster map(string) {} no
cluster_timeouts Create, update, and delete timeout configurations for the cluster map(string) {} no
cluster_version Kubernetes <major>.<minor> version to use for the EKS cluster (i.e.: 1.27) string null no
control_plane_subnet_ids A list of subnet IDs where the EKS cluster control plane (ENIs) will be provisioned. Used for expanding the pool of subnets used by nodes/node groups without replacing the EKS control plane list(string) [] no
create Controls if resources should be created (affects nearly all resources) bool true no
create_cloudwatch_log_group Determines whether a log group is created by this module for the cluster logs. If not, AWS will automatically create one if logging is enabled bool true no
create_cluster_primary_security_group_tags Indicates whether or not to tag the cluster's primary security group. This security group is created by the EKS service, not the module, and therefore tagging is handled after cluster creation bool true no
create_cluster_security_group Determines if a security group is created for the cluster. Note: the EKS service creates a primary security group for the cluster by default bool true no
create_cni_ipv6_iam_policy Determines whether to create an AmazonEKS_CNI_IPv6_Policy bool false no
create_iam_role Determines whether an IAM role is created or to use an existing IAM role bool true no
create_kms_key Controls if a KMS key for cluster encryption should be created bool true no
create_node_security_group Determines whether to create a security group for the node groups or use the existing node_security_group_id bool true no
custom_oidc_thumbprints Additional list of server certificate thumbprints for the OpenID Connect (OIDC) identity provider's server certificate(s) list(string) [] no
dataplane_wait_duration Duration to wait after the EKS cluster has become active before creating the dataplane components (EKS managed nodegroup(s), self-managed nodegroup(s), Fargate profile(s)) string "30s" no
eks_managed_node_group_defaults Map of EKS managed node group default configurations any {} no
eks_managed_node_groups Map of EKS managed node group definitions to create any {} no
enable_cluster_creator_admin_permissions Indicates whether or not to add the cluster creator (the identity used by Terraform) as an administrator via access entry bool false no
enable_efa_support Determines whether to enable Elastic Fabric Adapter (EFA) support bool false no
enable_irsa Determines whether to create an OpenID Connect Provider for EKS to enable IRSA bool true no
enable_kms_key_rotation Specifies whether key rotation is enabled bool true no
fargate_profile_defaults Map of Fargate Profile default configurations any {} no
fargate_profiles Map of Fargate Profile definitions to create any {} no
iam_role_additional_policies Additional policies to be added to the IAM role map(string) {} no
iam_role_arn Existing IAM role ARN for the cluster. Required if create_iam_role is set to false string null no
iam_role_description Description of the role string null no
iam_role_name Name to use on IAM role created string null no
iam_role_path Cluster IAM role path string null no
iam_role_permissions_boundary ARN of the policy that is used to set the permissions boundary for the IAM role string null no
iam_role_tags A map of additional tags to add to the IAM role created map(string) {} no
iam_role_use_name_prefix Determines whether the IAM role name (iam_role_name) is used as a prefix bool true no
include_oidc_root_ca_thumbprint Determines whether to include the root CA thumbprint in the OpenID Connect (OIDC) identity provider's server certificate(s) bool true no
kms_key_administrators A list of IAM ARNs for key administrators. If no value is provided, the current caller identity is used to ensure at least one key admin is available list(string) [] no
kms_key_aliases A list of aliases to create. Note - due to the use of toset(), values must be static strings and not computed values list(string) [] no
kms_key_deletion_window_in_days The waiting period, specified in number of days. After the waiting period ends, AWS KMS deletes the KMS key. If you specify a value, it must be between 7 and 30, inclusive. If you do not specify a value, it defaults to 30 number null no
kms_key_description The description of the key as viewed in AWS console string null no
kms_key_enable_default_policy Specifies whether to enable the default key policy bool true no
kms_key_override_policy_documents List of IAM policy documents that are merged together into the exported document. In merging, statements with non-blank sids will override statements with the same sid list(string) [] no
kms_key_owners A list of IAM ARNs for those who will have full key permissions (kms:*) list(string) [] no
kms_key_service_users A list of IAM ARNs for key service users list(string) [] no
kms_key_source_policy_documents List of IAM policy documents that are merged together into the exported document. Statements must have unique sids list(string) [] no
kms_key_users A list of IAM ARNs for key users list(string) [] no
node_security_group_additional_rules List of additional security group rules to add to the node security group created. Set source_cluster_security_group = true inside rules to set the cluster_security_group as source any {} no
node_security_group_description Description of the node security group created string "EKS node shared security group" no
node_security_group_enable_recommended_rules Determines whether to enable recommended security group rules for the node security group created. This includes node-to-node TCP ingress on ephemeral ports and allows all egress traffic bool true no
node_security_group_id ID of an existing security group to attach to the node groups created string "" no
node_security_group_name Name to use on node security group created string null no
node_security_group_tags A map of additional tags to add to the node security group created map(string) {} no
node_security_group_use_name_prefix Determines whether node security group name (node_security_group_name) is used as a prefix bool true no
openid_connect_audiences List of OpenID Connect audience client IDs to add to the IRSA provider list(string) [] no
outpost_config Configuration for the AWS Outpost to provision the cluster on any {} no
prefix_separator The separator to use between the prefix and the generated timestamp for resource names string "-" no
putin_khuylo Do you agree that Putin doesn't respect Ukrainian sovereignty and territorial integrity? More info: https://en.wikipedia.org/wiki/Putin_khuylo! bool true no
self_managed_node_group_defaults Map of self-managed node group default configurations any {} no
self_managed_node_groups Map of self-managed node group definitions to create any {} no
subnet_ids A list of subnet IDs where the nodes/node groups will be provisioned. If control_plane_subnet_ids is not provided, the EKS cluster control plane (ENIs) will be provisioned in these subnets list(string) [] no
tags A map of tags to add to all resources map(string) {} no
vpc_id ID of the VPC where the cluster security group will be provisioned string null no

Outputs

Name Description
access_entries Map of access entries created and their attributes
access_policy_associations Map of EKS cluster access policy associations created and their attributes
cloudwatch_log_group_arn ARN of the CloudWatch log group created
cloudwatch_log_group_name Name of the CloudWatch log group created
cluster_addons Map of attribute maps for all EKS cluster addons enabled
cluster_arn The Amazon Resource Name (ARN) of the cluster
cluster_certificate_authority_data Base64 encoded certificate data required to communicate with the cluster
cluster_endpoint Endpoint for your Kubernetes API server
cluster_iam_role_arn IAM role ARN of the EKS cluster
cluster_iam_role_name IAM role name of the EKS cluster
cluster_iam_role_unique_id Stable and unique string identifying the IAM role
cluster_id The ID of the EKS cluster. Note: currently a value is returned only for local EKS clusters created on Outposts
cluster_identity_providers Map of attribute maps for all EKS identity providers enabled
cluster_ip_family The IP family used by the cluster (e.g. ipv4 or ipv6)
cluster_name The name of the EKS cluster
cluster_oidc_issuer_url The URL on the EKS cluster for the OpenID Connect identity provider
cluster_platform_version Platform version for the cluster
cluster_primary_security_group_id Cluster security group that was created by Amazon EKS for the cluster. Managed node groups use this security group for control-plane-to-data-plane communication. Referred to as 'Cluster security group' in the EKS console
cluster_security_group_arn Amazon Resource Name (ARN) of the cluster security group
cluster_security_group_id ID of the cluster security group
cluster_service_cidr The CIDR block where Kubernetes pod and service IP addresses are assigned from
cluster_status Status of the EKS cluster. One of CREATING, ACTIVE, DELETING, FAILED
cluster_tls_certificate_sha1_fingerprint The SHA1 fingerprint of the public key of the cluster's certificate
cluster_version The Kubernetes version for the cluster
eks_managed_node_groups Map of attribute maps for all EKS managed node groups created
eks_managed_node_groups_autoscaling_group_names List of the autoscaling group names created by EKS managed node groups
fargate_profiles Map of attribute maps for all EKS Fargate Profiles created
kms_key_arn The Amazon Resource Name (ARN) of the key
kms_key_id The globally unique identifier for the key
kms_key_policy The IAM resource policy set on the key
node_security_group_arn Amazon Resource Name (ARN) of the node shared security group
node_security_group_id ID of the node shared security group
oidc_provider The OpenID Connect identity provider (issuer URL without leading https://)
oidc_provider_arn The ARN of the OIDC Provider if enable_irsa = true
self_managed_node_groups Map of attribute maps for all self managed node groups created
self_managed_node_groups_autoscaling_group_names List of the autoscaling group names created by self-managed node groups

License

Apache 2 Licensed. See LICENSE for full details.

Additional information for users from Russia and Belarus

terraform-aws-eks's People

Contributors

aliartiza75 antonbabenko archifleks barryib betajobot brandonjbjelland bryantbiggs bshelton229 chenrui333 daroga0002 dpiddockcmp erks huddy jimbeck laverya max-rocket-internet nauxliu ozbillwang rothandrew sc250024 sdavids13 semantic-release-bot shanmugakarna sidprak sppwf stefansedich stevehipwell stijndehaes tculp yutachaos

terraform-aws-eks's Issues

root_block_device missed

I have issues

After running the EKS cluster for a week, I hit a disk space issue; I need a feature to control the root volume and extend it to a larger size.

I'm submitting a...

  • [X ] feature request

What is the current behavior?

There is no control over the root block device, so disk space runs out quickly

What's the expected behavior?

The root_block_device mapping supports the following:

volume_type - (Optional) The type of volume. Can be "standard", "gp2", or "io1". (Default: "standard").
volume_size - (Optional) The size of the volume in gigabytes.
iops - (Optional) The amount of provisioned IOPS. This must be set with a volume_type of "io1".
delete_on_termination - (Optional) Whether the volume should be destroyed on instance termination (Default: true).

Currently, only delete_on_termination is enabled
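
To make the request concrete, the sketch below shows what exposing the root block device could look like in the worker_groups input; the root_volume_size and root_volume_type keys are proposed/assumed here, mirroring usage that appears elsewhere on this page, and are not confirmed inputs of the affected version:

# Sketch only: fragment of the eks module inputs. root_volume_size and
# root_volume_type are proposed/assumed keys, not confirmed module inputs.
worker_groups = [
  {
    name             = "workers"
    instance_type    = "m4.large"
    root_volume_size = "100"  # size in GiB
    root_volume_type = "gp2"
  },
]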

Are you able to fix this problem and submit a PR? Link here if you have already.

Yes, I will

Environment details

  • Affected module version: v1.1.0
  • OS: Ubuntu
  • Terraform version: 0.11.7

Any other relevant info

Is there a way to modify the aws-k8s-cni yaml before creating the worker groups?

I have issues

The AWS CNI by default pre-allocates the max number of IPs per node which results in unnecessary depletion of my IP pool. As of CNI 1.1, you can fix this by setting WARM_IP_TARGET in the aws-k8s-cni.yaml but this needs to be applied before the EC2 instances are created.

Is there a way I can have Terraform apply a k8s config between creating the cluster and creating the worker groups? My current workaround is specifying 0 nodes in the Terraform module, applying my custom aws-k8s-cni.yaml, then changing the worker node count to my actual desired number.

Thanks!

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

The cluster is created using the release aws-k8s-cni.yaml which does not have WARM_IP_TARGET set.

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 1.3.0
  • OS: MacOS
  • Terraform version: v0.11.7

Any other relevant info

Feature Request: Key Pair for Worker Nodes

I'm submitting a

  • feature request

What is the current behavior

I want to access the worker nodes via SSH using a key pair (private/public key). However, there is no option that allows me to specify a key pair to be installed on the worker nodes.

What's the expected behavior

Provide an option to specify a key pair (local or already in AWS) and install it on the worker nodes. It should be possible to set the key name via a variable. For inspiration, have a look at the Terraform-DC/OS module: https://github.com/dcos/terraform-dcos/tree/master/aws#configure-aws-ssh-keys
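
For illustration, a sketch of the requested behaviour under these assumptions: the aws_key_pair resource is standard Terraform, while the key_name worker group key mirrors usage shown in another issue on this page:

# Sketch only: create an EC2 key pair and reference it per worker group.
resource "aws_key_pair" "workers" {
  key_name   = "eks-workers"
  public_key = "${file("~/.ssh/id_rsa.pub")}"
}

# Inside the eks module block (key_name is an assumed worker group key):
# worker_groups = [
#   {
#     name     = "workers"
#     key_name = "${aws_key_pair.workers.key_name}"
#   },
# ]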

aws_auth config fails to apply while getting started

I have issues

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

Terraform apply fails with

* module.eks.null_resource.update_config_map_aws_auth: Error running command 'kubectl apply -f ./config-map-aws-auth_beam-eks.yaml --kubeconfig ./kubeconfig_beam-eks': exit status 1. Output: error: unable to recognize "./config-map-aws-auth_beam-eks.yaml": Unauthorized

If this is a bug, how to reproduce? Please include a code sample if relevant.

This is my configuration for the eks module.

I have a really basic vpc created via terraform-aws-modules/vpc/aws.

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "beam-eks"
  subnets      = "${module.vpc.public_subnets}"
  vpc_id       = "${module.vpc.vpc_id}"
}

What's the expected behavior?

Apply succeeds

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: "1.4.0"
  • OS: MacOS 10.13.3 (17D47)
    Terraform v0.11.8
  • provider.aws v1.33.0
  • provider.http v1.0.1
  • provider.local v1.1.0
  • provider.null v1.0.0
  • provider.template v1.0.0

Generated config not saved to correct output path

I have issues with where the generated config map is output when a folder is specified

I'm submitting a...

[ * ] bug report

What is the current behavior?

The generated config-map-aws-auth***.yaml and kubeconfig files are saved to the root folder; the value of the config_output_path variable is concatenated onto the file name instead of being used as the output directory

If this is a bug, how to reproduce? Please include a code sample if relevant.

Create a cluster with a custom config_output_path

What's the expected behavior?

The config-map-aws-auth***.yaml file should be saved to the config_output_path

Are you able to fix this problem and submit a PR? Link here if you have already.

aws_auth.tf
line 3 - missing /
filename = "${var.config_output_path}/config-map-aws-auth_${var.cluster_name}.yaml"

line 9 - missing /
command = "kubectl apply -f ${var.config_output_path}/config-map-aws-auth_${var.cluster_name}.yaml --kubeconfig ${var.config_output_path}/kubeconfig_${var.cluster_name}"

kubectl.tf
line 3 - missing /
filename = "${var.config_output_path}/kubeconfig_${var.cluster_name}"

Feature Request: Worker Configuration

I'm submitting a

  • feature request

What is the current behavior

Worker node number and instance type cannot be configured.

What's the expected behavior

Configuration options for worker node number and instance type can be specified in module inputs.
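
A sketch of what such inputs could look like, using the worker_groups list-of-maps convention that appears elsewhere on this page (the exact key names are assumptions):

# Sketch only: fragment of the eks module inputs; key names are assumed.
worker_groups = [
  {
    name                 = "default"
    instance_type        = "m4.large"
    asg_min_size         = 1
    asg_desired_capacity = 3
    asg_max_size         = 5
  },
]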

Error running command to update_config_map_aws_auth

I have issues

Please Help! I ran everything with defaults other then setting the VPC and Subnet

I'm submitting a...

  • bug report
  • feature request
  • [X ] support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

I get this error running the Terraform apply
Error: Error applying plan:

1 error(s) occurred:

  • null_resource.update_config_map_aws_auth: Error running command 'kubectl apply -f .//config-map-aws-auth_EKSClusterTest.yaml --kubeconfig .//kubec
    onfig_EKSClusterTest': exit status 1. Output: Unable to connect to the server: getting token: exec: exec: "aws-iam-authenticator": executable file n
    ot found in %PATH%

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

Set the config on the instance

Are you able to fix this problem and submit a PR? Link here if you have already.

No can I please have help?

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

First time setting this up!

ASG workers on spot instances

I have issues

It is great that with this module I can use more than one ASG worker pool. It would be nice to also be able to use spot instances, e.g. for background jobs or any application that can recover quickly from a replaced node.

I'm submitting a...

  • feature request

What is the current behavior?

Spot instances cannot be used as worker nodes (or at least I do not know how)

What's the expected behavior?

I could define that one (or all) of my worker nodes ASG are using spot instances.
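
For illustration, a sketch of what a spot-backed worker group could look like; the spot_price key is an assumption and may not exist in the module version referenced in this issue:

# Sketch only: fragment of the eks module inputs for a spot-backed worker group.
# The spot_price key is an assumption, not a confirmed module input.
worker_groups = [
  {
    name                 = "spot-workers"
    instance_type        = "m4.large"
    spot_price           = "0.05"  # maximum bid in USD per hour
    asg_desired_capacity = 2
    asg_max_size         = 10
  },
]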

Cluster and worker security group specification doesn't work

I have issues

I am creating an EKS cluster while providing a cluster_security_group_id and worker_security_group_id.

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

When specifying a security group, i.e. cluster_security_group_id = "sg-123" or worker_security_group_id = "sg-123", I get:

Error: Error running plan: 1 error(s) occurred:
module.eks.local.cluster_security_group_id: local.cluster_security_group_id: Resource 'aws_security_group.cluster' not found for variable 'aws_security_group.cluster.id'

If this is a bug, how to reproduce? Please include a code sample

Create an eks cluster with a cluster_security_group_id or worker_security_group_id specified.

Terraform does not support short-circuit evaluation in its ternary operator. The fix for this issue is specified here.
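
To make the failure mode concrete, here is a simplified sketch of the kind of expression involved (names differ from the actual module code): because Terraform 0.11 evaluates both branches of the conditional, the reference to aws_security_group.cluster.id fails whenever that resource is created with count = 0.

# Simplified sketch of the problematic pattern; both branches of the conditional
# are evaluated, so the resource reference errors when the security group
# resource has count = 0 (i.e. when an external security group ID is supplied).
locals {
  cluster_security_group_id = "${var.cluster_security_group_id == "" ? aws_security_group.cluster.id : var.cluster_security_group_id}"
}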

What's the expected behavior

We should be able to create an EKS cluster while specifying cluster sg or worker sg as the documentation currently specifies.

Environment

  • Affected module version: 1.1.0
  • OS:
  • Terraform version: 0.11.7

Other relevant info

AMI eks-worker-* query returned no results

I have issues

I'm submitting a...

  • bug report

What is the current behavior?

If region is set to us-west-1:

Error: Error refreshing state: 1 error(s) occurred:

  • module.eks.data.aws_ami.eks_worker: 1 error(s) occurred:

  • module.eks.data.aws_ami.eks_worker: data.aws_ami.eks_worker: Your query returned no results. Please change your search criteria and try again.

If this is a bug, how to reproduce? Please include a code sample if relevant.

module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "test-eks-cluster"
vpc_id = "${module.vpc.default_vpc_id}"
subnets = "${module.vpc.public_subnets}"

tags = {
Environment = "test"
Terraform = "true"
}
}

What's the expected behavior?

An available AWS AMI ID.

Are you able to fix this problem and submit a PR? Link here if you have already.

Not sure how.

Environment details

  • Affected module version: 1.3.0
  • OS: Linux
  • Terraform version: 0.11.7

Any other relevant info

Worker ASG names should be exposed

I have issues

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

Worker ASG ARNs are exposed, but not names. ASG names are used in aws_autoscaling_attachment among other things.
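
For context, a sketch of the kind of usage that needs the name rather than the ARN; the output name workers_asg_names and the aws_elb.ingress resource are hypothetical:

# Sketch only: attach an existing classic ELB to a worker ASG by name.
# module.eks.workers_asg_names and aws_elb.ingress are hypothetical names.
resource "aws_autoscaling_attachment" "workers" {
  autoscaling_group_name = "${element(module.eks.workers_asg_names, 0)}"
  elb                    = "${aws_elb.ingress.name}"
}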

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

Both are exposed

Are you able to fix this problem and submit a PR? Link here if you have already.

PR: #77

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

No logs exported to CloudWatch

I have issues

I have difficulty troubleshooting EKS node issues, for example an OutOfDisk issue. When I go to CloudWatch, there are no instance logs or /var/log/messages logs.

OutOfDisk Unknown Fri, 13 Jul 2018 00:11:01 +0000 Fri, 13 Jul 2018 00:11:43 +0000 NodeStatusUnknown Kubelet stopped posting node status.

Currently there is no key pair set, and I can't log in to the EKS nodes to do further checks.

I'm submitting a...

  • feature request
  • support request

What is the current behavior?

No logs are available from the EKS cluster.

What's the expected behavior?

I need a way to review the logs when something happens.

Are you able to fix this problem and submit a PR? Link here if you have already.

not sure how to fix this issue, need help.

Environment details

  • Affected module version: v1.1.0
  • OS: Ubuntu
  • Terraform version: 0.11.7

Any other relevant info

Experience with blue/green using this module?

I have issues

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

Currently, in order to achieve blue/green deployment with worker groups (i.e. updating to a new AMI), I have to add a new worker group with the updated AMI, let them spin up, drain the old nodes so pods transition, then scale down the old worker group (set min/max/desired to 0).

This is not a terrible way of doing it but the problem is that the old ASG (and related resources) sticks around forever and there doesn't seem to be a way to clean up the old stuff without major surgery. If I change the AMI 3 times, I now have 3 worker groups - 2 inactive and scaled to 0 and one active.

Is there a better way of doing this with this module? There's a distinct possibility I'm missing some fundamental terraform concepts but this seems like a complex issue to me. My code ends up looking like this after a new worker group is fully deployed and the old is scaled down (you can see how even semi-frequent deployments would make this list long and leave a lot of trailing garbage):

                  map(
                      "name", "k8s-worker-179fc16f",
                      "ami_id", "ami-179fc16f",
                      "asg_desired_capacity", "0",
                      "asg_max_size", "0",
                      "asg_min_size", "0",
                      ),
                  map(
                      "name", "k8s-worker-67a0841f",
                      "ami_id", "ami-67a0841f",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","${lookup(var.worker_sizes, "${terraform.workspace}")}",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}",
                      "root_volume_size", "48"
                      )

Environment details

  • Affected module version: latest with customizations
  • OS: OSX and AL2
  • Terraform version: 0.11.7

Any other relevant info

I have seen other code around the internet that does blue/green ASGs but those are for much simpler use-cases IMO - a create_before_destroy and letting it rip would bring a K8s cluster down. I have no qualms with multiple apply steps - its the cleanup part that I'm after.

It seems the cluster it is running with Authorization enabled (like RBAC) and there is no permissions for the ingress controller. Please check the configuration

I have issues when deploy alb-ingress-controller

It seems the cluster it is running with Authorization enabled (like RBAC) and there is no permissions for the ingress controller. Please check the configuration

I'm submitting a

  • bug report
  • support request

What is the current behavior

I can't deploy alb-ingress-controller with the above error.

If this is a bug, how to reproduce? Please include a code sample

I am not 100% sure it is a bug.

After creating the EKS cluster with this module, I followed the steps up to step 4 and got this error.

This step has additional help, but I am not sure how to apply it to an EKS cluster created with this module.

Deploy the modified alb-ingress-controller.

$ kubectl apply -f alb-ingress-controller.yaml

The manifest above will deploy the controller to the kube-system namespace. If you deploy it outside of kube-system and are using RBAC, you may need to adjust RBAC roles and bindings.

What's the expected behavior

Should work without error.

Environment

  • Affected module version: latest (a80c6e6)
  • OS: AWS EKS
  • Terraform version: 0.11.7

Other relevant info

Avoid using hardcoded value for max pod per node

Right now in the user-data script we have

sed -i s,MAX_PODS,20,g /etc/systemd/system/kubelet.service

The value 20 is hardcoded right now. Since AWS released the numbers in their CloudFormation template, I think we can extract the value and use a lookup function to get the proper value.

A proposal:

locals {
  # Mapping from the node type that we selected and the max number of pods that it can run
  # Taken from https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-06-05/amazon-eks-nodegroup.yaml
  max_pod_per_node = {
    # Keys are quoted because HCL treats unquoted dotted keys as nested maps
    "c4.large"    = 29
    "c4.xlarge"   = 58
    "c4.2xlarge"  = 58
    "c4.4xlarge"  = 234
    "c4.8xlarge"  = 234
    "c5.large"    = 29
    "c5.xlarge"   = 58
    "c5.2xlarge"  = 58
    "c5.4xlarge"  = 234
    "c5.9xlarge"  = 234
    "c5.18xlarge" = 737
    "i3.large"    = 29
    "i3.xlarge"   = 58
    "i3.2xlarge"  = 58
    "i3.4xlarge"  = 234
    "i3.8xlarge"  = 234
    "i3.16xlarge" = 737
    "m3.medium"   = 12
    "m3.large"    = 29
    "m3.xlarge"   = 58
    "m3.2xlarge"  = 118
    "m4.large"    = 20
    "m4.xlarge"   = 58
    "m4.2xlarge"  = 58
    "m4.4xlarge"  = 234
    "m4.10xlarge" = 234
    "m5.large"    = 29
    "m5.xlarge"   = 58
    "m5.2xlarge"  = 58
    "m5.4xlarge"  = 234
    "m5.12xlarge" = 234
    "m5.24xlarge" = 737
    "p2.xlarge"   = 58
    "p2.8xlarge"  = 234
    "p2.16xlarge" = 234
    "p3.2xlarge"  = 58
    "p3.8xlarge"  = 234
    "p3.16xlarge" = 234
    "r3.xlarge"   = 58
    "r3.2xlarge"  = 58
    "r3.4xlarge"  = 234
    "r3.8xlarge"  = 234
    "r4.large"    = 29
    "r4.xlarge"   = 58
    "r4.2xlarge"  = 58
    "r4.4xlarge"  = 234
    "r4.8xlarge"  = 234
    "r4.16xlarge" = 737
    "t2.small"    = 8
    "t2.medium"   = 17
    "t2.large"    = 35
    "t2.xlarge"   = 44
    "t2.2xlarge"  = 44
    "x1.16xlarge" = 234
    "x1.32xlarge" = 234
  }

  workers_userdata = <<USERDATA
#!/bin/bash -xe
CA_CERTIFICATE_DIRECTORY=/etc/kubernetes/pki
CA_CERTIFICATE_FILE_PATH=$CA_CERTIFICATE_DIRECTORY/ca.crt
mkdir -p $CA_CERTIFICATE_DIRECTORY
echo "${aws_eks_cluster.this.certificate_authority.0.data}" | base64 -d >  $CA_CERTIFICATE_FILE_PATH
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.this.endpoint},g /var/lib/kubelet/kubeconfig
sed -i s,CLUSTER_NAME,${var.cluster_name},g /var/lib/kubelet/kubeconfig
sed -i s,REGION,${data.aws_region.current.name},g /etc/systemd/system/kubelet.service
sed -i s,MAX_PODS,${lookup(local.max_pod_per_node, var.workers_instance_type)},g /etc/systemd/system/kubelet.service
sed -i s,MASTER_ENDPOINT,${aws_eks_cluster.this.endpoint},g /etc/systemd/system/kubelet.service
sed -i s,INTERNAL_IP,$INTERNAL_IP,g /etc/systemd/system/kubelet.service
DNS_CLUSTER_IP=10.100.0.10
if [[ $INTERNAL_IP == 10.* ]] ; then DNS_CLUSTER_IP=172.20.0.10; fi
sed -i s,DNS_CLUSTER_IP,$DNS_CLUSTER_IP,g /etc/systemd/system/kubelet.service
sed -i s,CERTIFICATE_AUTHORITY_FILE,$CA_CERTIFICATE_FILE_PATH,g /var/lib/kubelet/kubeconfig
sed -i s,CLIENT_CA_FILE,$CA_CERTIFICATE_FILE_PATH,g  /etc/systemd/system/kubelet.service
systemctl daemon-reload
systemctl restart kubelet kube-proxy
USERDATA
}

@brandoconnor Please let me know if this is OK, I'll create a fork and a pull request later

Bug - EKS can not create load balancers after module provisioned in new AWS account

I have issues

Provisioning an EKS cluster in a new AWS account will result in an error when attempting to provision a load balancer, if no load balancers of any kind have been provisioned in the account before.

I'm submitting a...

  • bug report

What is the current behavior?

No previous load balancers exist (i.e. the service-linked role AWSServiceRoleForElasticLoadBalancing doesn't exist):

AccessDenied: User: <MODULE-PROVISIONED-ROLE> is not authorized to perform: iam:CreateServiceLinkedRole on resource: arn:aws:iam::<ACCOUNT-ID>:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing

because EKS attempts to create the ELB service-linked role for you, and the roles created by the module lack iam:CreateServiceLinkedRole.

If this is a bug, how to reproduce? Please include a code sample if relevant.

  • Provision an EKS cluster using the module into a new account (or ensure the service-linked role AWSServiceRoleForElasticLoadBalancing doesn't exist)
  • Attempt to provision a Service of type LoadBalancer via Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1 
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.2
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginxservice
spec:
  type : LoadBalancer
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

What's the expected behavior?

EKS should provision load balancer.

The module should optionally provision (via a flag) an aws_iam_service_linked_role resource, or include updated IAM policies (iam:CreateServiceLinkedRole) to allow the EKS cluster to provision the required service-linked role. Alternatively, if this is deemed not the responsibility of the module, the "Assumptions" section in README.md should note the issue.
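
For reference, a minimal sketch of the optional resource mentioned above; the create_elb_service_linked_role flag name is hypothetical:

# Sketch only: optionally create the ELB service-linked role so EKS can
# provision load balancers in a brand new account. The flag name is hypothetical.
variable "create_elb_service_linked_role" {
  default = false
}

resource "aws_iam_service_linked_role" "elb" {
  count            = "${var.create_elb_service_linked_role ? 1 : 0}"
  aws_service_name = "elasticloadbalancing.amazonaws.com"
}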

Are you able to fix this problem and submit a PR? Link here if you have already.

Possibly, depending on the choice of solution (implementation change, documentation update)

Environment details

  • Affected module version: All

Any other relevant info

AWS Service Link FAQ:
https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/elb-service-linked-roles.html#create-service-linked-role

Security group "workers_ingress_cluster" is very limiting

Currently, in workers.tf, we have this security group:

resource "aws_security_group_rule" "workers_ingress_cluster" {
  description              = "Allow workers Kubelets and pods to receive communication from the cluster control plane."
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.workers.id}"
  source_security_group_id = "${local.cluster_security_group_id}"
  from_port                = 1025
  to_port                  = 65535
  type                     = "ingress"
  count                    = "${var.worker_security_group_id == "" ? 1 : 0}"
}

Basically, this setting makes it impossible for Kubernetes services to access pods that have containerPort set to anything below 1025, which is a huge issue since so many of them use port 80 (e.g. nginx). So, from_port should be set to 0, not 1025.

I realize this is copied from CloudFormation in the official EKS guide, so I'll also submit an issue there.

Allow worker nodes to be created in private subnets if eks cluster has both private and public subnets

I have issues

I'm submitting a

  • bug report
  • [x ] feature request
  • support request

What is the current behavior

Based on this guide from AWS, it is recommended that you specify both public and private subnets when creating your EKS cluster, but that you only create your worker nodes in your private subnets. The current behaviour of this module uses the same subnets for creating the EKS cluster as for placing the worker nodes.

If this is a bug, how to reproduce? Please include a code sample

What's the expected behavior

I believe it would be a good feature to add an additional (optional) list variable to the module called worker_subnets that will be used to create the worker nodes within. This means you can add private and public subnets to the subnets variable, but only add private subnets to the worker_subnets variable.
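
To make the proposal concrete, here is a sketch of how the suggested worker_subnets variable might be used; worker_subnets does not exist in the module today, it is the feature being requested:

# Sketch only: worker_subnets is the hypothetical variable proposed in this issue.
module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "example"
  vpc_id       = "${module.vpc.vpc_id}"

  # The control plane can use both public and private subnets
  subnets = "${concat(module.vpc.public_subnets, module.vpc.private_subnets)}"

  # Proposed: place worker nodes only in the private subnets
  worker_subnets = "${module.vpc.private_subnets}"
}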

Environment

  • Affected module version:
  • OS:
  • Terraform version:

Other relevant info

I have a branch with this feature on a fork, I will add a PR to be looked at.

Support for the new amazon-eks-node-* AMI with bootstrap script

I have issues

The new amazon-eks-node-* AMI with bootstrap script has been released. However, it's not backward compatible with the old AMI and doesn't work with this module.

https://aws.amazon.com/blogs/opensource/improvements-eks-worker-node-provisioning/

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

This module only works with the eks-worker-* AMIs.

If this is a bug, how to reproduce? Please include a code sample if relevant.

N/A

What's the expected behavior?

This module should also work with the new amazon-eks-node-* AMI. The entire userdata.sh.tpl can be reduced to something like this:

# Allow user supplied pre userdata code
${pre_userdata}

# Bootstrap and join the cluster
/etc/eks/bootstrap.sh --b64-cluster-ca '${cluster_auth_base64}' --apiserver-endpoint '${endpoint}' --kubelet-extra-args '${kubelet_extra_args}' '${cluster_name}'

# Allow user supplied userdata code
${additional_userdata}

Are you able to fix this problem and submit a PR? Link here if you have already.

I can contribute, but would like to discuss on how we want to approach backward compatibility first.

Environment details

  • Affected module version: 1.4.0
  • OS: all
  • Terraform version: all

Any other relevant info

See:

Fix for AWS EKS "is not authorized to perform: iam:CreateServiceLinkedRole"

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

After deploying EKS via this TF module in a brand new AWS account, the internet-facing k8s service I created could not create a load balancer. It turns out this is because no ELB has ever been created in this brand new account, and the AWS user guide (as well as this module) assumes that AWSServiceRoleForElasticLoadBalancing already exists.

https://stackoverflow.com/questions/51597410/aws-eks-is-not-authorized-to-perform-iamcreateservicelinkedrole

Recommend adding

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "arn:aws:iam::*:role/aws-service-role/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeAccountAttributes"
                ],
                "Resource": "*"
            }
        ]
    }

To the cluster role policy.

asg size changes should be ignored.

I have issues

asg size changes should be ignored.

I'm submitting a

  • feature request

What is the current behavior

After updating the ASG size post-deploy, terraform apply detects the changes, which should be ignored.

At least changes in desired_capacity should be ignored.

  ~ module.eks.aws_autoscaling_group.workers
      desired_capacity:         "2" => "1"
      max_size:                 "5" => "3"
      min_size:                 "2" => "1"

What's the expected behaviour

Ignore the changes, since we don't want the running system to be re-sized.
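
One way the module could implement this, sketched with the standard lifecycle block (resource names and arguments are simplified relative to the module):

# Sketch only: ignore desired_capacity drift so terraform apply does not
# resize a running cluster. Arguments are simplified relative to the module.
resource "aws_autoscaling_group" "workers" {
  name_prefix          = "eks-workers-"
  launch_configuration = "${aws_launch_configuration.workers.id}"
  vpc_zone_identifier  = ["${var.subnets}"]
  min_size             = 1
  max_size             = 3
  desired_capacity     = 2

  lifecycle {
    ignore_changes = ["desired_capacity"]
  }
}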

Environment

  • Affected module version: 1.0.0
  • OS: ubuntu
  • Terraform version: 0.11.7

Other relevant info

If you are fine to ignore change in desired_capacity, I can raise PR for this feature, please confirm.

kube-proxy doesn't exist in the latest AWS worker node AMI

I have issues

I'm submitting a

  • bug report

What is the current behavior

kube-proxy doesn't exist in the latest AWS worker node AMI, but the userdata template tries to restart it, which results in the error below:

Failed to restart kube-proxy.service: Unit not found.

What's the expected behavior

Remove kube-proxy from the restart step.

Automatic deployment of Cluster Autoscaler

I have issues

Although worker nodes are deployed as an autoscaling group, when EKS cannot schedule more pods because of missing resources (e.g. CPU), additional nodes are not started by the ASG. It would be nice to deploy the Cluster Autoscaler automatically (or at least document in the README how to do this) so we can benefit from the ASG.

I'm submitting a...

  • feature request

What is the current behavior?

New ASG workers are not started, even when EKS cannot schedule more pods because of missing CPU and we are still below the maximum size of the workers ASG.

What's the expected behaviour?

The expected behaviour would be:

  1. Deploy Cluster Autoscaler: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
     (these permissions need to be added to the EC2 EKS IAM role; see the sketch after this list:
     "autoscaling:DescribeAutoScalingGroups",
     "autoscaling:DescribeAutoScalingInstances",
     "autoscaling:SetDesiredCapacity",
     "autoscaling:TerminateInstanceInAutoScalingGroup")
  2. Scale a sample application so it needs more CPU than a single VM provides.
  3. See that the Autoscaler adds more nodes.
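
A sketch of attaching those permissions with Terraform; the module output name worker_iam_role_name is an assumption and may differ between module versions:

# Sketch only: attaches the Cluster Autoscaler permissions listed above to the
# worker node IAM role. module.eks.worker_iam_role_name is an assumed output name.
resource "aws_iam_role_policy" "cluster_autoscaler" {
  name = "cluster-autoscaler"
  role = "${module.eks.worker_iam_role_name}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}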

Environment details

EKS in us-east-1

  • Terraform version:
    v0.11.7

AWS Profile in kubeconfig template

I'm submitting a

  • feature request

For my current delivery, the customer has credentials with multiple profiles and not only needs to specify different profiles per cluster, but has no default profile.

It would be great if the kubeconfig.tpl could be modified:

...
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "${cluster_name}"
      env:
        - name: AWS_PROFILE
          value: ${aws_profile}

where the default Terraform value used to populate the template would be "default", to ensure no regression.

I wanted to start a discussion before a PR to ensure best path forward on this. Thanks!

Bring your own security group

I have issues...

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

Currently the module only supports creation of the security groups for the cluster and workers from within the module itself. Some of the rules are 100% necessary and others are just commonplace and therefore useful. The rules rely on dynamic values but could be applied just the same to a security group passed to the module instead of created within the module. This would give flexibility to the module consumer to provide their own security group and a predefined set of rules that might be tighter than what the module currently prescribes.

If this is a bug, how to reproduce? Please include a code sample

NA

What's the expected behavior

The module should be able to accept a security group ID as input for both the cluster and workers with rules defined outside the module.

Environment

  • Affected module version: current (0.2.0)
  • OS: All
  • Terraform version: 0.11.x

Assign public IPs to EKS workers in private subnets.

I have issues

I created an EKS cluster in private subnets. We have also discussed this topic in several tickets and agreed to create EKS workers in private subnets only.

Now it's time to decide: should we keep the feature that assigns public IPs to EKS workers?

If it is not required any more, I will raise a PR to remove this line directly. Otherwise, I will have to add a condition. Which way would you prefer?

resource "aws_launch_configuration" "workers" {
  name_prefix                 = "${var.cluster_name}-${lookup(var.worker_groups[count.index], "name", count.index)}"
-  associate_public_ip_address = "${lookup(var.worker_groups[count.index], "public_ip", lookup(var.workers_group_defaults, "public_ip"))}"
  security_groups             = ["${local.worker_security_group_id}"]
  iam_instance_profile        = "${aws_iam_instance_profile.workers.id}"
  image_id                    = "${lookup(var.worker_groups[count.index], "ami_id", data.aws_ami.eks_worker.id)}"
  instance_type               = "${lookup(var.worker_groups[count.index], "instance_type", lookup(var.workers_group_defaults, "instance_type"))}"

I'm submitting a...

  • bug report
  • feature request

What is the current behavior?

When creating workers in private subnets, public IPs are assigned to those workers.

Are you able to fix this problem and submit a PR? Link here if you have already.

Yes, I will

Environment details

  • Affected module version: v1.1.0
  • OS: ubuntu
  • Terraform version: 0.11.7

Any other relevant info

Specify multiple cluster/worker security groups

I have issues

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

The EKS module currently only supports passing in a single cluster security group and a single worker security group by ID.

If this is a bug, how to reproduce? Please include a code sample

What's the expected behavior

I think it would make sense to support specifying an array of security group ids.

Environment

  • Affected module version:
  • OS:
  • Terraform version:

Other relevant info

We have a use case where we need to attach multiple security groups, some of which are predefined.

Be able to define per-ASG tags

I have issues

Tags are currently too prescriptive. I have a use case where I need to tag different ASGs with different tags: I'm using the ability to push these tags down to node labels and taints to drive different workloads on my Kubernetes cluster. At the moment it seems I can only define tags once on the top-level EKS module, and those tags are used globally throughout. I would like to be able to define tags per ASG. A sensible place to provide these seems to be the list of worker_groups maps.

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

Tags are defined once in var.tags and used throughout both to tag the cluster resources itself, as well as, all ASGs that are created.

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

It should be possible to provide tags in the list of worker_groups maps, and those should be used to tag the corresponding ASG created for each respective worker group. If tags are not set for a group, they can default to the existing top-level global tag variable.
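A rough sketch of the proposed shape (the tags key below is a proposal, not an existing module input; since Terraform 0.11 treats these entries as maps of strings, the exact encoding, e.g. a delimited string or a parallel variable, would need to be settled in the PR):

worker_groups = [
  {
    name          = "apps"
    instance_type = "m4.large"
    # proposed: tags applied only to this group's ASG and propagated to its instances
    tags          = "Workload=apps"
  },
  {
    name          = "monitoring"
    instance_type = "m4.xlarge"
    tags          = "Workload=monitoring"
  },
]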

Are you able to fix this problem and submit a PR? Link here if you have already.

I can submit a PR if this is reasonable.

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

How to launch worker in private subnet

In the getting started example

module "vpc" {
  source             = "terraform-aws-modules/vpc/aws"
  version            = "1.14.0"
  name               = "test-vpc"
  cidr               = "10.0.0.0/16"
  azs                = ["${data.aws_availability_zones.available.names[0]}", "${data.aws_availability_zones.available.names[1]}", "${data.aws_availability_zones.available.names[2]}"]
  private_subnets    = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets     = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
  tags               = "${merge(local.tags, map("kubernetes.io/cluster/${local.cluster_name}", "shared"))}"
}

module "eks" {
  source             = "../.."
  cluster_name       = "${local.cluster_name}"
  subnets            = ["${module.vpc.public_subnets}", "${module.vpc.private_subnets}"]
  tags               = "${local.tags}"
  vpc_id             = "${module.vpc.vpc_id}"
  worker_groups      = "${local.worker_groups}"
  worker_group_count = "1"
  map_roles          = "${var.map_roles}"
  map_users          = "${var.map_users}"
  map_accounts       = "${var.map_accounts}"
}

Both private and public subnets are passed to the EKS module as a single variable. How does the module determine which subnets are public and which are private, so that it launches workers into the private subnets only?
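If your module version supports the per-worker-group subnets key (it appears later in this document in a v1.3.0 example), one possible approach is to give the control plane all subnets while pinning the worker group to the private ones:

module "eks" {
  source       = "../.."
  cluster_name = "${local.cluster_name}"
  vpc_id       = "${module.vpc.vpc_id}"

  # the control plane can see all subnets, as in the getting started example...
  subnets = ["${module.vpc.public_subnets}", "${module.vpc.private_subnets}"]

  worker_groups = [
    {
      name = "private-workers"
      # ...while this worker group is restricted to the private subnets only
      subnets = "${join(",", module.vpc.private_subnets)}"
    },
  ]
}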

investigate adding create_before_destroy to worker asg to prevent downtime when recreating

I have issues changing the instance type.

I'm submitting a

  • bug report

What is the current behavior

* module.eks.aws_launch_configuration.workers: 1 error(s) occurred:

* aws_launch_configuration.workers: Error creating launch configuration: AlreadyExists: Launch Configuration by this name already exists - A launch configuration already exists with the name eks-path-prod-0
	status code: 400, request id: xxx-xxx-xxx-xxx-xxx

If this is a bug, how to reproduce? Please include a code sample

Deploy an EKS cluster, then change the instance type and apply again.

What's the expected behavior

The launch configuration should be replaced without error.
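For reference, a common pattern that avoids the AlreadyExists error combines name_prefix with create_before_destroy, so the replacement launch configuration gets a fresh generated name before the old one is destroyed. A minimal sketch of the general technique (not the module's exact resource; var.instance_type is illustrative):

resource "aws_launch_configuration" "workers" {
  # a prefix (rather than a fixed name) lets Terraform generate a new name
  # for each replacement launch configuration
  name_prefix          = "${var.cluster_name}-workers-"
  image_id             = "${data.aws_ami.eks_worker.id}"
  instance_type        = "${var.instance_type}"
  iam_instance_profile = "${aws_iam_instance_profile.workers.id}"

  lifecycle {
    # create the replacement before destroying the old one, so the ASG is
    # never left pointing at a deleted launch configuration
    create_before_destroy = true
  }
}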

Environment

  • Affected module version: 1.0.0
  • OS: Ubuntu
  • Terraform version: 0.11.7

Other relevant info

Assumption Missing: Install Kubectl

I have issues

I'm submitting a

  • bug report

What is the current behavior

There is no mention of the requirement to have kubectl installed before running the script. The module will fail while applying the plan.

Error: Error applying plan:

1 error(s) occurred:

* module.eks.null_resource.configure_kubectl: Error running command 'kubectl apply -f .//config-map-aws-auth.yaml --kubeconfig .//kubeconfig': exit status 127. Output: /bin/sh: 1: kubectl: not found

What's the expected behavior

Have a note and a link to the install instructions in the Assumptions section of the README.md.

Cluster DNS does not function

I have issues

I'm submitting a...

  • support request

DNS

How is cluster DNS supposed to work? I have not been able to get pods to resolve any cluster addresses (including kubernetes.default) using EKS. I suspect it's a function of how the AWS VPC CNI works (or doesn't), and figured other people using this module must be running into the same problem; however, I can't seem to find much on the internet about this in EKS.

locals {
  worker_groups = "${list(
                  map(
                      "name", "k8s-worker",
                      "ami_id", "ami-73a6e20b",
                      "asg_desired_capacity", "5",
                      "asg_max_size", "8",
                      "asg_min_size", "5",
                      "instance_type","m4.large",
                      "key_name", "${aws_key_pair.infra-deployer.key_name}"
                      ),
  )}"
  tags = "${map("Environment", "${terraform.workspace}")}"
}

data "aws_vpc" "vpc" {
  filter {
    name   = "tag:env"
    values = ["${terraform.workspace}"]
  }

  filter {
    name   = "tag:Name"
    values = ["${terraform.workspace}-us-west-2"]
  }
}

data "aws_subnet_ids" "eks_subnets" {
  vpc_id = "${data.aws_vpc.vpc.id}"

  tags {
    env  = "${terraform.workspace}"
    Name = "${terraform.workspace}-eks*"
  }
}

module "eks" {
  source                = "terraform-aws-modules/eks/aws"
  cluster_name          = "${terraform.workspace}"
  subnets               = "${data.aws_subnet_ids.eks_subnets.ids}"
  vpc_id                = "${data.aws_vpc.vpc.id}"
  kubeconfig_aws_authenticator_env_variables = "${map("AWS_PROFILE", "infra-deployer" )}"
  map_accounts          = ["${lookup(var.aws_account_ids, "prod")}"]
  worker_groups         = "${local.worker_groups}"
  tags                  = "${local.tags}"
}

Trying DNS on a brand new cluster:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:		172.20.0.10
Address:	172.20.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- cat /etc/resolv.conf
nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local staging.thinklumo.com us-west-2.compute.internal
options ndots:5

I've tried too many things to list here, and at this point I suspect it's an issue with EKS itself, so I'm hoping someone has been down this path already.

Are you able to fix this problem and submit a PR? Link here if you have already.

N/A

Environment details

  • Affected module version: latest
  • OS: AL2
  • Terraform version:
Terraform v0.11.7
+ provider.aws v1.25.0

Any other relevant info

Using computed values in worker group parameters results in `value of 'count' cannot be computed` error

I have issues

I'm submitting a...

  • bug report

What is the current behavior?

terraform plan produces this output when any worker group parameters are computed values:

laverya:~/dev$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.http.workstation_external_ip: Refreshing state...
data.aws_region.current: Refreshing state...
data.aws_availability_zones.available: Refreshing state...
data.aws_iam_policy_document.cluster_assume_role_policy: Refreshing state...
data.aws_iam_policy_document.workers_assume_role_policy: Refreshing state...
data.aws_ami.eks_worker: Refreshing state...

Error: Error refreshing state: 1 error(s) occurred:

* module.eks.data.template_file.userdata: data.template_file.userdata: value of 'count' cannot be computed

If this is a bug, how to reproduce? Please include a code sample if relevant.

provider "aws" {
  version = "~> 1.27"
  region  = "us-east-1"
}

data "aws_availability_zones" "available" {}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "1.37.0"
  name    = "eks-vpc"
  cidr    = "10.0.0.0/16"
  azs     = ["${data.aws_availability_zones.available.names[0]}", "${data.aws_availability_zones.available.names[1]}", "${data.aws_availability_zones.available.names[2]}"]

  private_subnets    = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets     = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]

  tags = "${map("kubernetes.io/cluster/terraform-eks", "shared")}"
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "1.3.0"
  cluster_name = "terraform-eks"
  subnets = ["${module.vpc.private_subnets}", "${module.vpc.public_subnets}"]
  tags    = "${map("Environment", "test")}"
  vpc_id = "${module.vpc.vpc_id}"

  worker_groups = [
    {
      name          = "default-m5-large"
      instance_type = "m5.large"

      subnets = ""
      # subnets = "${join(",", module.vpc.private_subnets)}"
    },
  ]
}

Uncomment subnets = "${join(",", module.vpc.private_subnets)}" to replace subnets = "" in the worker_groups config and run terraform plan.

What's the expected behavior?

terraform plan completes and a plan is produced.

Are you able to fix this problem and submit a PR? Link here if you have already.

I have not yet identified the root cause.

Environment details

  • Affected module version: 1.3.0
  • OS: Ubuntu 16.04
  • Terraform version: Terraform v0.11.7

Any other relevant info

This makes it rather difficult to assign subnets to worker groups.

Should we manage k8s resources with this module?

This is a general question about the direction of this module.

We get requests that would require this module to manage or create Kubernetes resources. Some examples:

  • Modifying CNI configuration before worker ASG creation: #96
  • Deploying cluster autoscaler: #71
  • Manage add-ons: #19

I think we should have a clear position on these types of issues.

Include autoscaling-related IAM policies for workers for the cluster-autoscaler

Currently we have to add the policy outside this module, but I think 90% of people will use the cluster-autoscaler, so it would be cool to have it included in this module, perhaps enabled with a variable.
kops currently has this by default here.

The policy would look something like this:

data "aws_iam_policy_document" "eks_node_autoscaling" {
  statement {
    sid    = "eksDemoNodeAll"
    effect = "Allow"

    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:DescribeTags",
      "autoscaling:GetAsgForInstance",
    ]

    resources = ["*"]
  }

  statement {
    sid    = "eksDemoNodeOwn"
    effect = "Allow"

    actions = [
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "autoscaling:UpdateAutoScalingGroup",
    ]

    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "autoscaling:ResourceTag/Name"
      values   = ["xxxx-eks_asg"]
    }
  }
}

This would allow the cluster-autoscaler the access it needs to run correctly.
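For completeness, a sketch of how the policy document above could be attached to the worker role outside the module today (the worker_iam_role_name reference is illustrative; substitute whatever your configuration or module version actually exposes):

resource "aws_iam_policy" "eks_node_autoscaling" {
  name_prefix = "eks-node-autoscaling-"
  policy      = "${data.aws_iam_policy_document.eks_node_autoscaling.json}"
}

resource "aws_iam_role_policy_attachment" "eks_node_autoscaling" {
  # illustrative reference; use the worker IAM role your setup actually creates
  role       = "${module.eks.worker_iam_role_name}"
  policy_arn = "${aws_iam_policy.eks_node_autoscaling.arn}"
}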

What do you think?

Allow adding new users, roles, and accounts to the configmap/aws-auth

I have issues

Amazon's EKS access control is managed via the aws-auth ConfigMap, which allows multiple IAM users and roles (cross-account capable) to be granted group membership. The current implementation only allows worker node access; this should be configurable so that additional access control rules can be specified per the documentation: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

The current implementation only allows worker node access.

If this is a bug, how to reproduce? Please include a code sample

What's the expected behavior

Ability to specify role/user/account mappings for group membership.
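As a sketch of what the mapping inputs might look like (the key names have varied between module versions, so treat role_arn/user_arn/group here as illustrative):

map_roles = [
  {
    role_arn = "arn:aws:iam::111122223333:role/ops-admins"
    username = "ops-admins"
    group    = "system:masters"
  },
]

map_users = [
  {
    user_arn = "arn:aws:iam::111122223333:user/alice"
    username = "alice"
    group    = "system:masters"
  },
]

map_accounts = ["444455556666"]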

Environment

  • Affected module version: 1.1.0
  • OS: Linux
  • Terraform version: 0.11.7

Other relevant info

We should ignore changes to node ASG desired_capacity

I have issues

I'm submitting a

  • feature request

The reason is that after cluster creation, almost everyone will run the Kubernetes cluster autoscaler. The autoscaler changes desired_capacity to suit the resources required by the cluster, so when the cluster has autoscaled and Terraform is run again later, you see something like this:

  ~ module.cluster_1.aws_autoscaling_group.workers
      desired_capacity:   "5" => "3"

You can just add a lifecycle statement to resource aws_autoscaling_group.workers:

  lifecycle {
    ignore_changes = [ "desired_capacity" ]
  }

Workstation cidr possibly not doing what's intended?

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

If this is a bug, how to reproduce? Please include a code sample if relevant.

This is much more a question than an issue. I see the workstation CIDR being allowed to access port 443 in the security group attached to the EKS cluster, and I see the same thing in the Terraform EKS getting-started post. My assumption was that this would limit access to the Kubernetes API (control plane) to that CIDR. It doesn't do that: the control plane endpoint is fully accessible on the internet. Was the intention to allow only that CIDR to access the control plane? I really wish that were possible. I'm likely completely missing the reason for allowing that ingress.

What's the expected behavior?

I expected access to the control plane to be limited to the IP addresses in the CIDR. My expectation may be completely wrong, in which case a different variable description might help.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

How to define nodeSelector with autoscaling?

I have issues

with defining a nodeSelector in an autoscaling environment.

I'm submitting a

  • support request

What is the current behavior

Pods can currently be scheduled onto any node.

What's the expected behavior

I can manually label several nodes and set a matching nodeSelector, but how do I achieve this in an autoscaling environment?

I found the worker_groups code, but I'm not sure how to use it for labelling.

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_test_fixture/main.tf#L19-L34

  # the commented out worker group list below shows an example of how to define
  # multiple worker groups of differing configurations
  # worker_groups = "${list(
  #                   map("asg_desired_capacity", "2",
  #                       "asg_max_size", "10",
  #                       "asg_min_size", "2",
  #                       "instance_type", "m4.xlarge",
  #                       "name", "worker_group_a",
  #                   ),
  #                   map("asg_desired_capacity", "1",
  #                       "asg_max_size", "5",
  #                       "asg_min_size", "1",
  #                       "instance_type", "m4.2xlarge",
  #                       "name", "worker_group_b",
  #                   ),
  # )}"
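One way this is commonly handled (a sketch that assumes your module version supports a kubelet_extra_args key in the worker group maps; check workers_group_defaults for your release) is to label nodes at boot via kubelet flags:

  worker_groups = "${list(
                    map(
                        "name", "apps",
                        "instance_type", "m4.xlarge",
                        "kubelet_extra_args", "--node-labels=nodegroup=apps"
                    ),
                    map(
                        "name", "monitoring",
                        "instance_type", "m4.large",
                        "kubelet_extra_args", "--node-labels=nodegroup=monitoring"
                    ),
  )}"

Pods can then target a group with a nodeSelector such as nodegroup: monitoring, and the cluster-autoscaler scales whichever group the pending pods select.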

Environment

  • Affected module version: 1.0.0
  • OS: Ubuntu
  • Terraform version: 0.11.7

Other relevant info

Name tags are too prescriptive; allow them more flexibility but provide sensible defaults

I have issues

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

It's not possible to define exactly what the Name tag of any resource will be. This should be user-definable.

Other relevant info

I think another variable map containing tag_defaults, or a local variable (since computation is needed), will come in handy here. I will explore this in next week's cycles.

It works with multiple worker groups in one EKS cluster, thanks.

I want to say thanks for the hidden feature that lets me manage multiple worker groups in one EKS cluster; the version tested is v1.3.0.

So the code below works properly.
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_test_fixture/main.tf#L19-L34

My use case is that I need to manage two groups of nodes, one for applications and one for the monitoring service only. Later I will add more node groups (labelled for nodeSelector) for different purposes.

I'm submitting a...

  • kudos, thank you, warm fuzzy

Any other relevant info

Should we uncomment these lines or add another test case?

Use of name_prefix

Currently we have:

resource "aws_iam_role" "workers" {
  name_prefix        = "${aws_eks_cluster.this.name}"
  assume_role_policy = "${data.aws_iam_policy_document.workers_assume_role_policy.json}"
}

resource "aws_iam_instance_profile" "workers" {
  name_prefix = "${aws_eks_cluster.this.name}"
  role        = "${aws_iam_role.workers.name}"
}

Is there a reason to use name_prefix instead of just name? I ask because the resultant names are things like my-cluster-20180808095045107900000005.

We have to create cross-account IAM policies for things like ECR, and it would be nice to have a predictable and consistent name for the roles 🙂

Not Clear on EC2PrivateDNSName

I have issues

  • support request

Should I be exporting my bastion's EC2 private DNS name before I run terraform apply? I used a bastion host to provision the EKS cluster, and I am not clear on what the EC2PrivateDNSName variable refers to.

can't update launch configuration.

I have issues

Recently I upgraded from release 1.1.0 to 1.3.0 and made some changes to the launch configuration, such as the key name.

I'm submitting a...

  • bug report

What is the current behavior?

* aws_launch_configuration.workers (deposed #0): ResourceInUse: Cannot delete launch configuration project-prod-020180630105107074900000001 because it is attached to AutoScalingGroup project-prod-monitoring
	status code: 400, request id: 46f32656-8661-11e8-9e77-51ef4818b760

If this is a bug, how to reproduce? Please include a code sample if relevant.

Change the AMI ID, add/remove the key pair name, or make any other change that requires creating a new launch configuration.

What's the expected behavior?

The launch configuration should be updated smoothly.

Are you able to fix this problem and submit a PR? Link here if you have already.

I am still investigating this issue; if I can fix it, I will raise a PR.

Environment details

  • Affected module version: v1.1.0 -> v1.3.0
  • OS: Ubuntu
  • Terraform version: 0.11.7

Any other relevant info

Here is the fix someone mentioned:

hashicorp/terraform#532 (comment)

Allow pre-userdata script on worker launch config

I have issues

I want to be able to run additional user data before the module's own user data on the worker launch configuration.
I am behind a proxy and need to configure the proxy settings before anything else happens.

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

The module only provides a way to specify additional user data that runs after its own user data.
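A sketch of what this could look like as a worker group option (pre_userdata below is a proposed, illustrative key that the module would render before its own bootstrap user data):

worker_groups = [
  {
    name          = "proxied-workers"
    instance_type = "m4.large"

    # proposed key: shell rendered before the module's bootstrap user data,
    # e.g. to set proxy configuration before anything else runs
    pre_userdata = "echo 'proxy=http://proxy.internal:3128' >> /etc/yum.conf"
  },
]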

If this is a bug, how to reproduce? Please include a code sample

What's the expected behavior

Environment

  • Affected module version: 1.1.0
  • OS:
  • Terraform version: 0.11.7

Other relevant info

Override the default ingress rule that allows communication with the EKS cluster API.

I have issues

I would prefer to use the default security groups created for the cluster, but do not want the default API/32 to be used.

I'm submitting a

  • bug report
  • feature request
  • support request

What is the current behavior

Currently, if you use the default security groups, the module creates a security group rule that allows communication with the EKS cluster API from the current /32 CIDR.

If this is a bug, how to reproduce? Please include a code sample

What's the expected behavior

I want to override the default /32 CIDR and specify my own.
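As an interim illustration of managing such a rule outside the module (assuming the cluster security group ID is exposed as an output; the CIDR below is a placeholder):

resource "aws_security_group_rule" "cluster_api_access" {
  description       = "Allow the office network to reach the EKS cluster API"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["203.0.113.0/24"]
  security_group_id = "${module.eks.cluster_security_group_id}"
}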

Environment

  • Affected module version: 1.1.0
  • OS:
  • Terraform version: 0.11.7

Other relevant info

Better support for multiple clusters

I'm submitting a

  • feature request

A couple of changes would make it easier to work with multiple clusters.

  1. Include the cluster name in the file name here by default. This way other clusters won't overwrite the same file.
  2. Include the cluster name in the configuration here. This will make some keys in here unique, which makes it easier to merge the configuration without manual adjustments.
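A sketch of a possible workaround in the meantime, assuming a config_output_path variable controls where the files are written (treat the variable name and paths as illustrative):

module "cluster_a" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "cluster-a"
  subnets      = "${module.vpc.private_subnets}"
  vpc_id       = "${module.vpc.vpc_id}"

  # give each cluster its own output directory so the generated kubeconfig
  # files cannot overwrite each other
  config_output_path = "./kubeconfig/cluster-a/"
}

module "cluster_b" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "cluster-b"
  subnets      = "${module.vpc.private_subnets}"
  vpc_id       = "${module.vpc.vpc_id}"

  config_output_path = "./kubeconfig/cluster-b/"
}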

Bug: Module ignores custom AMI ID

I have issues

I want to use the (custom) Ubuntu EKS AMI, but my settings are ignored by the module.

I'm submitting a...

  • bug report

What is the current behaviour?

A custom AMI ID is ignored by the module. I tried to set ami_id in the workers_group_defaults section as follows:

workers_group_defaults = {
      ...
      ami_id               = "ami-39397a46"  # Ubuntu Image
      ...
}

The specified AMI is the Ubuntu EKS image for us-east-1. Ubuntu released a statement that they will support and update an image specifically for EKS. The AMI IDs can be found here: https://cloud-images.ubuntu.com/aws-eks/?_ga=2.56651242.1343651116.1533683680-508754220.1533683680

I would like to use the Ubuntu image instead of the Amazon AMI to make my environment more portable.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Try to set ami_id in workers_group_defaults to the Ubuntu EKS AMI ID ami-39397a46 (us-east-1) or ami-6d622015 (us-west-2).

What's the expected behaviour?

Custom AMI IDs can be used for worker nodes; specifically, the Ubuntu EKS image can be used with this module.

Are you able to fix this problem and submit a PR? Link here if you have already.

Maybe this is just a misunderstanding, or it is super easy to fix. If not, let me know.

Environment details

  • Affected module version: 1.4.0
  • OS: Ubuntu 18.04 LTS (Container)
  • Terraform version: v0.11.7
