[Monitoring] Prometheus-Stack

ekwkqk12 2021. 10. 31. 01:10

Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm pull prometheus-community/kube-prometheus-stack --untar

kubectl create ns prometheus-stack
helm upgrade --install prometheus-stack -f values.yaml ./ -n prometheus-stack

Email and Ingress settings (values.yaml)

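etcd, kube-controller-manager, and kube-scheduler are disabled.
On recent EKS clusters the ports these components listen on have changed, so to receive their metrics you have to change the bind address, in the same way as the kube-proxy metrics setting described later in this post. In addition, it seems that the ports of the corresponding services in prometheus-stack have to be changed and https has to be enabled in their ServiceMonitors.
kube-controller-manager > 10257
kube-scheduler > 10259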
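
If you do want to try scraping these components instead of disabling them, the port and https changes mentioned above could look roughly like the sketch below. This is only an assumption-based example: the function name control_plane_override and the file name control-plane-values.yaml are made up here, insecureSkipVerify is turned on purely to keep the example short, and it only helps when Prometheus can actually reach the control-plane endpoints, which is usually not the case with a managed control plane.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a values override that points the chart's
# control-plane scrape targets at the secure ports 10257 / 10259.
# Assumption: the endpoints are reachable from the Prometheus pods.
control_plane_override() {
  cat <<'EOF'
kubeControllerManager:
  enabled: true
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeScheduler:
  enabled: true
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
EOF
}

# Example: write the override next to values.yaml
# control_plane_override > control-plane-values.yaml
```

The generated file can then be passed as a second -f argument, e.g. helm upgrade --install prometheus-stack -f values.yaml -f control-plane-values.yaml ./ -n prometheus-stack, so the main values.yaml stays untouched.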

## Rule Disable
defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: false
    general: true
    k8s: true
    kubeApiserver: true
    kubeApiserverAvailability: true
    kubeApiserverError: true
    kubeApiserverSlos: true
    kubelet: true
    kubePrometheusGeneral: true
    kubePrometheusNodeAlerting: true
    kubePrometheusNodeRecording: true
    kubernetesAbsent: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeScheduler: false
    kubeStateMetrics: true
    network: true
    node: true
    prometheus: true
    prometheusOperator: true
    time: true

kubeControllerManager:
  enabled: false

kubeEtcd:
  enabled: false

kubeScheduler:
  enabled: false

## Email Alert
  config:
    global:
      resolve_timeout: 5m
      smtp_from: '<sender address>'
      smtp_smarthost: smtp.gmail.com:587
      smtp_auth_username: '<username>'
      smtp_auth_password: '<password>'
    route:
      group_by: ['job', 'node', 'namespace', 'pod_name', 'instance', 'alert']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 12h
      receiver: 'alert_email'
    receivers:
      - name: 'alert_email'
        email_configs:
          - to: '<recipient address>'
    templates:
    - '/etc/alertmanager/config/*.tmpl'

## AlertManager
  ingress:
    enabled: true

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx

    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/group.order: '4'
      alb.ingress.kubernetes.io/group.name: external-alb
      alb.ingress.kubernetes.io/success-codes: 200,301,302

    labels: {}

    ## Hosts must be provided if Ingress is enabled.
    ##
    hosts:
      - "<domain>"

    ## Paths to use for ingress rules - one path should match the alertmanagerSpec.routePrefix
    ##
    paths:
      - "/*"

## Grafana
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: '<web UI password>'

  ingress:
    ## If true, Grafana Ingress will be created
    ##
    enabled: true

    ## Annotations for Grafana Ingress
    ##
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/group.order: '5'
      alb.ingress.kubernetes.io/group.name: external-alb
      alb.ingress.kubernetes.io/success-codes: 200,301,302
    ## Labels to be added to the Ingress
    ##
    labels: {}

    ## Hostnames.
    ## Must be provided if Ingress is enabled.
    ##
    hosts:
      - "<domain>"
    #hosts: []

    ## Path for grafana ingress
    path: "/"


## Prometheus
  ingress:
    enabled: true

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx

    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/group.order: '6'
      alb.ingress.kubernetes.io/group.name: external-alb
      alb.ingress.kubernetes.io/success-codes: 200,301,302
    labels: {}

    ## Hostnames.
    ## Must be provided if Ingress is enabled.
    ##
    hosts:
      - "<domain>"
    ## Paths to use for ingress rules - one path should match the prometheusSpec.routePrefix
    ##
    paths:
      - "/*"

Exposing kube-proxy metrics

Change the metricsBindAddress value so that Prometheus can collect kube-proxy metrics.

kubectl edit cm kube-proxy-config -n kube-system
metricsBindAddress: 127.0.0.1:10249 > metricsBindAddress: 0.0.0.0:10249
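
If you would rather not edit the ConfigMap by hand, the same change can be sketched as a small filter. The function name patch_metrics_bind_address is made up for this example, and the commented pipeline assumes the ConfigMap is named kube-proxy-config as above. kube-proxy reads its configuration at startup, so the DaemonSet most likely has to be restarted for the new address to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a kube-proxy config (or the whole ConfigMap
# manifest) on stdin and rewrites every metricsBindAddress line to
# 0.0.0.0:10249, keeping the original indentation.
patch_metrics_bind_address() {
  sed -E 's/^([[:space:]]*metricsBindAddress:).*$/\1 0.0.0.0:10249/'
}

# Possible usage against a live cluster (not executed here):
# kubectl get cm kube-proxy-config -n kube-system -o yaml \
#   | patch_metrics_bind_address \
#   | kubectl replace -f -
# kubectl rollout restart daemonset kube-proxy -n kube-system
```

Binding to 0.0.0.0 exposes the metrics port on every node interface, so it is worth checking that the node security groups only allow it from inside the cluster.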

Removing CRD resources

Even after the Helm chart is removed, its CRD resources remain, so they have to be removed separately.

kubectl delete crd alertmanagerconfigs.monitoring.coreos.com 
kubectl delete crd alertmanagers.monitoring.coreos.com 
kubectl delete crd podmonitors.monitoring.coreos.com 
kubectl delete crd probes.monitoring.coreos.com 
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com 
kubectl delete crd servicemonitors.monitoring.coreos.com 
kubectl delete crd thanosrulers.monitoring.coreos.com
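
The eight deletions can also be written as a loop. The sketch below is just one way to do it; the function names are made up, and it only prints the commands unless "run" is passed, to avoid deleting CRDs that another operator installation in the same cluster might still be using.

```shell
#!/usr/bin/env bash
# The eight CRDs in the monitoring.coreos.com group that the chart leaves behind.
prometheus_operator_crds() {
  for kind in alertmanagerconfigs alertmanagers podmonitors probes \
              prometheuses prometheusrules servicemonitors thanosrulers; do
    echo "${kind}.monitoring.coreos.com"
  done
}

# Prints the delete commands by default; pass "run" to actually execute them.
delete_prometheus_operator_crds() {
  local mode="${1:-print}"
  prometheus_operator_crds | while read -r crd; do
    if [ "$mode" = "run" ]; then
      kubectl delete crd "$crd" --ignore-not-found
    else
      echo "kubectl delete crd $crd"
    fi
  done
}

# Example:
# delete_prometheus_operator_crds        # dry run, prints the commands
# delete_prometheus_operator_crds run    # deletes the CRDs
```

Deleting a CRD also deletes every custom resource of that kind, so any ServiceMonitor or PrometheusRule objects created outside the chart go away with it.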

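Metrics are collected correctly and mail is delivered correctly, but no alert fires when Pods are scaled, so it seems the rules that decide when an alert fires need to be modified. The mail captured in the original post's screenshot is the one that arrived after I deleted Prometheus to test alerting.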
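
One possible direction for the rule change is a custom PrometheusRule. The sketch below is untested against this setup and everything in it is an assumption: the rule and alert names are made up, the expression simply fires when the desired replica count of a Deployment changed within the last 5 minutes (using the kube-state-metrics metric kube_deployment_spec_replicas), and the release: prometheus-stack label assumes the chart's default rule selector, which matches on the Helm release name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an example PrometheusRule manifest.
# All names, labels and the 5m window are illustrative assumptions.
replica_change_rule() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: replica-change-example
  namespace: prometheus-stack
  labels:
    release: prometheus-stack
spec:
  groups:
    - name: replica-change.rules
      rules:
        - alert: DeploymentReplicasChanged
          expr: changes(kube_deployment_spec_replicas[5m]) > 0
          for: 0m
          labels:
            severity: info
          annotations:
            summary: "Replica count changed for {{ $labels.namespace }}/{{ $labels.deployment }}"
EOF
}

# Possible usage (not executed here):
# replica_change_rule | kubectl apply -f -
```

With the route configured earlier, an alert from this rule would go to the same alert_email receiver as everything else.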
