[K8S Resource] ClusterAutoScaler

ekwkqk12 2023. 10. 22. 15:21

Cluster Autoscaler (CA) is a tool that automatically adjusts the number of nodes in a Kubernetes cluster. It adds or removes nodes according to workload demand, which helps optimize resource usage.

AWS IAM role

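The first statement grants the read-only permissions CA needs to discover ASGs. The second and third statements are alternatives for the scaling permissions, so keep only one of them: the second lets CA scale any ASG, while the third limits scaling to ASGs that carry the tags in its Condition. If you keep only the third, move the three read-only actions of the second statement (ec2:DescribeImages, ec2:GetInstanceTypesFromInstanceRequirements, eks:DescribeNodegroup) into the first.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyDiscovery",
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": ["*"]
    },
    {
      "Sid": "ScaleAnyAsg",
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    },
    {
      "Sid": "ScaleTaggedAsgOnly",
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/enabled": "true",
          "aws:ResourceTag/k8s.io/cluster-autoscaler/<my-cluster>": "owned"
        }
      }
    }
  ]
}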

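Cluster Autoscaler node scaling conditions

Scale In
Every scan-interval, CA queries the API server for unneeded nodes and removes them.
A node is considered unneeded when the sum of the requests (CPU and memory) of all pods on the node is below scale-down-utilization-threshold (default: 0.5 = 50%) of the node's allocatable capacity.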
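
As a rough sketch of how these two settings appear in the CA container arguments, the fragment below sets them to the default values quoted above. The worked numbers in the comments (a node with 4 allocatable CPUs whose pods request 1.5 CPUs in total) are made up for illustration.

```yaml
# Hypothetical excerpt from the cluster-autoscaler container spec.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # How often CA re-evaluates the cluster.
  - --scan-interval=10s
  # A node becomes a scale-in candidate below this request/allocatable ratio.
  # Example: 1.5 CPU requested / 4 CPU allocatable = 0.375 < 0.5 -> candidate.
  - --scale-down-utilization-threshold=0.5
```

Raising the threshold makes scale-in more aggressive; lowering it keeps lightly loaded nodes around longer.
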
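Scale Out
Every scan-interval, CA checks the API server for unschedulable pods, looks for an existing node that can take them, and adds a node if none is found.
A pod is unschedulable when the Kubernetes scheduler cannot find a node that can accommodate it (insufficient CPU or memory, the per-node pod limit, and so on); the pod's PodScheduled condition is then set to False. If a node requested during scale-up has not registered with the cluster within --max-node-provision-time (default: 15m), CA stops waiting for it, and if the pod is still pending it tries to scale up a different node group.
If the pending pod has a volume attached, scale-out only works when the ASGs are split per zone, because a zonal volume such as EBS can only be mounted by a node in the same zone.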
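
For reference, this is roughly what the unschedulable state looks like in the pod status (for example in the output of kubectl get pod <pod-name> -o yaml). The message text and node counts are illustrative, not taken from a real cluster.

```yaml
# Illustrative status excerpt of a pending pod that CA would react to.
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: "False"          # the scheduler found no node that fits
      reason: Unschedulable
      message: "0/3 nodes are available: 3 Insufficient cpu."
```
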

ASG discovery modes

Auto
CA discovers the ASGs to scale by their tag keys (it never scales an ASG below its min size).
--node-group-auto-discovery=asg:tag=tagKey,anotherTagKey

# cluster-autoscaler-autodiscover.yaml
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --node-group-auto-discovery=asg:tag=<tagKey>,<anotherTagKey>
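
A filled-in variant might look like the following, assuming the ASGs carry the two tag keys used in the IAM policy condition above and a hypothetical cluster name my-cluster. Note that the upstream CA documentation spells the flag --node-group-auto-discovery.

```yaml
# Sketch only: "my-cluster" is a placeholder cluster name.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # Every ASG tagged with BOTH keys is managed by CA.
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```

With this mode, the min and max node counts come from the ASG's own settings rather than from CA flags.
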

Manual
CA scales only the ASGs that are listed explicitly by name.
--nodes=<min>:<max>:<asg-name>

# cluster-autoscaler-multi-asg.yaml
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --nodes=1:10:k8s-worker-asg-1
  - --nodes=1:3:k8s-worker-asg-2
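
Following the earlier note about pods with volumes, a per-zone layout could be sketched like this. The three ASG names and the min/max sizes are hypothetical; --balance-similar-node-groups is the setting described in the next section.

```yaml
# Sketch: one ASG per Availability Zone so CA can add a node in the
# zone where a pending pod's volume lives. ASG names are placeholders.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --balance-similar-node-groups=true
  - --nodes=1:5:k8s-worker-asg-az-a
  - --nodes=1:5:k8s-worker-asg-az-b
  - --nodes=1:5:k8s-worker-asg-az-c
```
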

ASG Scale In/Out settings

pod annotation
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
CA does not remove a node that is running a pod with this annotation.
node annotation
cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
CA does not remove a node that has this annotation.
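
As an illustration, the two annotations could be set as shown below: safe-to-evict goes in the pod's metadata, while scale-down-disabled goes on the Node object. The pod name, image, and node name are hypothetical.

```yaml
# Pod that CA must not evict; the node hosting it is kept.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical name
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: worker
      image: busybox:1.36       # placeholder image
      command: ["sleep", "3600"]
---
# Node that CA must never scale in.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ap-northeast-2.compute.internal   # hypothetical name
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
```

In practice the node annotation is usually added to an existing node with kubectl annotate rather than by applying a Node manifest.
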
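scan-interval (default: 10s)
How often CA checks the API server for scale-up and scale-down candidates.
max-graceful-termination-sec (default: 600 seconds = 10m)
How long CA waits for the pods on a node to terminate gracefully; once this time has passed, they are terminated regardless.
skip-nodes-with-system-pods (default: true)
By default CA does not remove nodes that run kube-system pods; set this to false to allow their removal.
skip-nodes-with-local-storage (default: true)
By default CA does not remove nodes that run pods using local storage; set this to false to allow their removal.
balance-similar-node-groups
Keeps the node counts of similar ASGs, such as ASGs split per zone, balanced.
scale-down-unneeded-time (default: 10m)
How long a node must stay under-utilized (unneeded) before it becomes eligible for removal; this prevents premature scale-in.
scale-down-unready-time (default: 20m)
How long an unready node must stay unneeded before it becomes eligible for removal. For unready nodes this value is used in place of scale-down-unneeded-time.
scale-down-delay-after-*
- scale-down-delay-after-add (default: 10m)
  How long scale-down evaluation waits after a scale-up.
- scale-down-delay-after-delete (default: scan-interval)
  How long scale-down evaluation waits after a node deletion.
- scale-down-delay-after-failure (default: 3m)
  How long scale-down evaluation waits after a failed scale-down.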
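expander
- random
  Picks the ASG to scale up at random.
- most-pods
  Scales up the ASG that would schedule the most pending pods.
- least-waste
  Scales up the ASG that would leave the least CPU/memory idle after the scale-up.
- priority
  Scales up the node group with the highest user-assigned priority.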
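
For the priority expander, the priorities are read from a ConfigMap named cluster-autoscaler-priority-expander in the namespace where CA runs, and CA must be started with --expander=priority. The sketch below assumes CA runs in kube-system and uses two hypothetical ASG naming patterns; a higher number means a higher priority.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system        # assumed CA namespace
data:
  # Maps a priority to a list of regexes matched against node group names.
  priorities: |-
    10:
      - .*on-demand.*
    50:
      - .*spot.*
```

With these values, CA would try ASGs whose name matches .*spot.* first and fall back to the on-demand groups.
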

Grafana Dashboard

Cluster Autoscaler Stats, Grafana dashboard ID 12623 (linked under Reference).

Reference

https://aws.github.io/aws-eks-best-practices/cluster-autoscaling/
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-is-cluster-autoscaler
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
https://dev.to/zenika/eks-10-tips-to-reduce-the-bill-up-to-90-on-aws-managed-kubernetes-clusters-epe
https://grafana.com/grafana/dashboards/12623-cluster-autoscaler-stats/ (Grafana dashboard)
