Background

WeChat official account: 运维开发故事, author: 华仔
In a previous article we walked through installing VictoriaMetrics (VM) on Kubernetes, but the metrics there were collected with opentelemetry. Does VM ship a collection component of its own? It does: vmagent can replace opentelemetry, scraping metrics and writing them into vmstorage.
What is vmagent

Here is the official introduction:
vmagent is a tiny but mighty agent which helps you collect metrics from various sources and store them in VictoriaMetrics or any other Prometheus-compatible storage systems that support the remote_write protocol
In short: vmagent is a tiny but mighty agent that collects metrics from various sources and stores them in VM or any other Prometheus-compatible storage system that supports the remote_write protocol. The official docs also provide an architecture diagram showing where vmagent sits in the pipeline.

What are vmagent's features
- It can serve as a drop-in replacement for Prometheus when scraping targets such as node_exporter.
- It can read data from Kafka, and can also write data to Kafka.
- It can add, remove, and modify labels via Prometheus relabeling, and can filter data before sending it to remote storage (see the sketch after this list).
- It accepts data over all the ingestion protocols VM supports.
- It can replicate collected data to multiple remote storage systems.
- Compared with Prometheus, it uses less CPU, memory, disk IO, and network bandwidth.
- ...and more.
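The relabeling/filtering feature uses standard Prometheus relabeling rules. A minimal sketch of a rules file that vmagent can apply to all samples before remote write via its -remoteWrite.relabelConfig flag (the metric pattern and label values here are purely illustrative):

# relabel.yml: applied to every sample before it is sent to remote storage
# drop all series whose metric name starts with go_ (illustrative filter)
- action: drop
  source_labels: [__name__]
  regex: "go_.*"
# attach a static env label to everything that remains (illustrative)
- target_label: env
  replacement: prod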
How to install vmagent

Note: we reuse the environment from the previous article. Keep only vmcluster and node exporter, stop opentelemetry, and wipe the previously collected VM data, so VM starts out empty. vmagent is deployed directly on Kubernetes.
Installation

1. Create the namespace and RBAC (vmagent needs access to the Kubernetes API)
# Create the namespace
kubectl create ns monitoring-system

# Create the RBAC for vmagent; the yaml can be found in the install files
# downloaded in the previous article
kubectl apply -f vmagent_rbac.yaml

# rbac yaml
cat vmagent_rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vmagent
  namespace: monitoring-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmagent
rules:
  - apiGroups: ["", "networking.k8s.io", "extensions"]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - endpointslices
      - pods
      - app
      - ingresses
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - namespaces
      - configmaps
    verbs: ["get"]
  - nonResourceURLs: ["/metrics", "/metrics/resources"]
    verbs: ["get"]
  - apiGroups:
      - route.openshift.io
      - image.openshift.io
    resources:
      - routers/metrics
      - registry/metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmagent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vmagent
subjects:
  - kind: ServiceAccount
    name: vmagent
    namespace: monitoring-system
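Before moving on, you can sanity-check that the ClusterRoleBinding took effect by impersonating the ServiceAccount and probing a couple of the granted verbs:

# ask the API server whether the vmagent ServiceAccount may perform these actions
kubectl auth can-i list pods --as=system:serviceaccount:monitoring-system:vmagent
kubectl auth can-i watch endpoints --as=system:serviceaccount:monitoring-system:vmagent
# both commands should print "yes"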
2. Install vmagent
# Install vmagent
kubectl apply -f vmagent.yaml

# vmagent.yaml
cat vmagent.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vm-vmagent
  namespace: monitoring-system
spec:
  selectAllByDefault: true
  podMetadata:
    labels:
      victoriaMetrics: vmagent
  replicaCount: 1
  serviceAccountName: vmagent
  image:
    pullPolicy: IfNotPresent
    repository: images.huazai.com/release/vmagent
    tag: v1.63.0-cluster
  resources:
    requests:
      cpu: "250m"
      memory: "350Mi"
    limits:
      cpu: "500m"
      memory: "850Mi"
  extraArgs:
    memory.allowedPercent: "40"
  remoteWrite:
    - url: "http://lb_vip:8480/insert/0/prometheus/api/v1/write"
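One of the features listed earlier is replicating collected data to multiple remote storage systems: remoteWrite is a list, and vmagent sends every sample to each URL. A minimal sketch (the second cluster behind backup_vip is hypothetical):

  remoteWrite:
    - url: "http://lb_vip:8480/insert/0/prometheus/api/v1/write"
    # hypothetical second VM cluster receiving a full copy of the data
    - url: "http://backup_vip:8480/insert/0/prometheus/api/v1/write"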
3. Check the installation result
[root@kube-control-1 ~]# kubectl get po -n monitoring-system
NAME                                  READY   STATUS    RESTARTS   AGE
vmagent-vm-vmagent-55bbbd9f6d-497m5   2/2     Running   4          2d17h

# The pod runs two containers, yet our VMAgent yaml seemingly defined only one.
# What is the second container for?
[root@kube-control-1 ~]# kubectl get po -n monitoring-system -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2022-01-14T08:15:12Z"
    generateName: vmagent-vm-vmagent-55bbbd9f6d-
    labels:
      app.kubernetes.io/component: monitoring
      app.kubernetes.io/instance: vm-vmagent
      app.kubernetes.io/name: vmagent
      managed-by: vm-operator
      pod-template-hash: 55bbbd9f6d
      victoriaMetrics: vmagent
    name: vmagent-vm-vmagent-55bbbd9f6d-497m5
    namespace: monitoring-system
  spec:
    containers:
    - args:
      - --reload-url=http://localhost:8429/-/reload
      - --config-envsubst-file=/etc/vmagent/config_out/vmagent.env.yaml
      - --watched-dir=/etc/vm/relabeling
      - --config-file=/etc/vmagent/config/vmagent.yaml.gz
      command:
      - /bin/prometheus-config-reloader
      env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      image: quay.io/prometheus-operator/prometheus-config-reloader:v0.48.1
      imagePullPolicy: IfNotPresent
      name: config-reloader
      resources:
        limits:
          cpu: 100m
          memory: 25Mi
        requests:
          cpu: 100m
          memory: 25Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: FallbackToLogsOnError
      volumeMounts:
      - mountPath: /etc/vmagent/config
        name: config
      - mountPath: /etc/vmagent/config_out
        name: config-out
      - mountPath: /etc/vm/relabeling
        name: relabeling-assets
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: vmagent-vm-vmagent-token-6ngn2
        readOnly: true
    - args:
      - -httpListenAddr=:8429
      - -memory.allowedPercent=40
      - -promscrape.config=/etc/vmagent/config_out/vmagent.env.yaml
      - -remoteWrite.maxDiskUsagePerURL=1073741824
      - -remoteWrite.tmpDataPath=/tmp/vmagent-remotewrite-data
      - -remoteWrite.url=http://lb_vip:8480/insert/0/prometheus/api/v1/write
      image: images.huazai.com/release/vmagent:v1.63.0-cluster
      imagePullPolicy: IfNotPresent
      name: vmagent
      ports:
      - containerPort: 8429
        name: http
        protocol: TCP
      resources:
        limits:
          cpu: 500m
          memory: 850Mi
        requests:
          cpu: 250m
          memory: 350Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: FallbackToLogsOnError
      volumeMounts:
      - mountPath: /tmp/vmagent-remotewrite-data
        name: persistent-queue-data
      - mountPath: /etc/vmagent/config_out
        name: config-out
        readOnly: true
      - mountPath: /etc/vmagent-tls/certs
        name: tls-assets
        readOnly: true
      - mountPath: /etc/vm/relabeling
        name: relabeling-assets
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: vmagent-vm-vmagent-token-6ngn2
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    nodeName: kube-control-2
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: vmagent-vm-vmagent
    serviceAccountName: vmagent-vm-vmagent
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - emptyDir: {}
      name: persistent-queue-data
    - name: config
      secret:
        defaultMode: 420
        secretName: vmagent-vm-vmagent
    - name: tls-assets
      secret:
        defaultMode: 420
        secretName: tls-assets-vmagent-vm-vmagent
    - emptyDir: {}
      name: config-out
    - configMap:
        defaultMode: 420
        name: relabelings-assets-vmagent-vm-vmagent
      name: relabeling-assets
    - name: vmagent-vm-vmagent-token-6ngn2
      secret:
        defaultMode: 420
        secretName: vmagent-vm-vmagent-token-6ngn2
So the operator injects a second container: a config-reloader that watches for configuration changes and tells vmagent to reload. For its image (quay.io/prometheus-operator/prometheus-config-reloader), I pulled it ahead of time and loaded it onto each node, since I haven't yet found a place to configure it.
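If your nodes cannot reach quay.io directly, a minimal way to stage the image by hand, assuming a docker runtime and ssh access to each node (adjust for containerd/crictl environments):

# pull the config-reloader image on a machine with internet access
docker pull quay.io/prometheus-operator/prometheus-config-reloader:v0.48.1
# ship it to each node and load it into the local image store
docker save quay.io/prometheus-operator/prometheus-config-reloader:v0.48.1 | \
  ssh root@kube-node-1 'docker load'

Now let's look at vmagent's default scrape config: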
global:
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring-system/vm-vmagent
scrape_configs:
- job_name: monitoring-system/vmagent-vm-vmagent/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring-system
  metrics_path: /metrics
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_component
    regex: monitoring
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_instance
    regex: vm-vmagent
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app_kubernetes_io_name
    regex: vmagent
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_managed_by
    regex: vm-operator
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - target_label: endpoint
    replacement: http
The config above covers too little, so let's bring over the scrape configuration we previously used with opentelemetry and add it through the inlineScrapeConfig field.
# Re-apply vmagent
kubectl apply -f vmagent.yaml

# vmagent.yaml
cat vmagent.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vm-vmagent1
  namespace: monitoring-system
spec:
  selectAllByDefault: true
  podMetadata:
    labels:
      victoriaMetrics: vmagent
  replicaCount: 1
  serviceAccountName: vmagent
  inlineScrapeConfig: |
    # the prometheus job is only here as a test
    - job_name: "prometheus"
      static_configs:
        - targets: ["ip1:9100", "ip2:9100"]
    - job_name: coredns
      kubernetes_sd_configs:
        - namespaces:
            names:
              - kube-system
          role: endpoints
      relabel_configs:
        - action: keep
          regex: coredns;metrics
          source_labels:
            - __meta_kubernetes_service_name
            - __meta_kubernetes_endpoint_port_name
        - source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: node
        - source_labels:
            - __meta_kubernetes_pod_host_ip
          target_label: host_ip
    - job_name: kube-state-metrics
      kubernetes_sd_configs:
        - namespaces:
            names:
              - kube-system
          role: service
      metric_relabel_configs:
        - regex: ReplicaSet;([\w|-]+)-[0-9|a-z]+
          replacement: $1
          source_labels:
            - created_by_kind
            - created_by_name
          target_label: created_by_name
        - regex: ReplicaSet
          replacement: Deployment
          source_labels:
            - created_by_kind
          target_label: created_by_kind
      relabel_configs:
        - action: keep
          regex: kube-state-metrics
          source_labels:
            - __meta_kubernetes_service_name
    - job_name: node-exporter
      kubernetes_sd_configs:
        - namespaces:
            names:
              - kube-system
          role: endpoints
      relabel_configs:
        - action: keep
          regex: node-exporter
          source_labels:
            - __meta_kubernetes_service_name
        - source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: node
        - source_labels:
            - __meta_kubernetes_pod_host_ip
          target_label: host_ip
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kube-apiserver
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: keep
          regex: true
          source_labels:
            - __meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master
        - regex: ([^:]+)(?::\d+)?
          replacement: $1:6443
          source_labels:
            - __address__
          target_label: __address__
        - source_labels:
            - __meta_kubernetes_node_name
          target_label: node
        - source_labels:
            - __meta_kubernetes_node_address_InternalIP
          target_label: host_ip
      scheme: https
      tls_config:
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kube-scheduler
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: keep
          regex: true
          source_labels:
            - __meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master
        - regex: ([^:]+)(?::\d+)?
          replacement: $1:10259
          source_labels:
            - __address__
          target_label: __address__
        - source_labels:
            - __meta_kubernetes_node_name
          target_label: node
        - source_labels:
            - __meta_kubernetes_node_address_InternalIP
          target_label: host_ip
      scheme: https
      tls_config:
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubelet
      kubernetes_sd_configs:
        - role: node
      metric_relabel_configs:
        - action: keep
          regex: kubelet_.+
          source_labels:
            - __name__
      relabel_configs:
        - source_labels:
            - __meta_kubernetes_node_address_InternalIP
          target_label: host_ip
        - source_labels:
            - __meta_kubernetes_node_name
          target_label: node
      scheme: https
      tls_config:
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: cadvisor
      kubernetes_sd_configs:
        - role: node
      metrics_path: /metrics/cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
    - job_name: ingress-nginx
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: keep
          regex: ingress-nginx
          source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
  image:
    pullPolicy: IfNotPresent
    repository: images.huazai.com/release/vmagent
    tag: v1.63.0-cluster
  resources:
    requests:
      cpu: "250m"
      memory: "350Mi"
    limits:
      cpu: "500m"
      memory: "850Mi"
  extraArgs:
    memory.allowedPercent: "40"
  remoteWrite:
    - url: "http://lb_vip:8480/insert/0/prometheus/api/v1/write"

# check that vmagent started correctly
[root@kube-control-1 ~]# kubectl get po -n monitoring-system
NAME                                  READY   STATUS    RESTARTS   AGE
vmagent-vm-vmagent-55bbbd9f6d-497m5   2/2     Running   4          50s

# inspect the scrape targets through the vmagent API
[root@kube-control-1 opt]# curl http://10.233.108.50:8429/targets
job="cadvisor" (4/4 up)
    state=up, endpoint=https://node1:10250/metrics/cadvisor, labels={instance="kube-control-1",job="cadvisor",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=12.500s ago, scrape_duration=0.131s, samples_scraped=8247, error=""
    state=up, endpoint=https://master1:10250/metrics/cadvisor, labels={instance="kube-control-3",job="cadvisor",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=30.140s ago, scrape_duration=0.152s, samples_scraped=8445, error=""
    state=up, endpoint=https://master2:10250/metrics/cadvisor, labels={instance="kube-node-1",job="cadvisor",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=19.945s ago, scrape_duration=0.118s, samples_scraped=6445, error=""
    state=up, endpoint=https://master3:10250/metrics/cadvisor, labels={instance="kube-control-2",job="cadvisor",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=1.918s ago, scrape_duration=0.097s, samples_scraped=6218, error=""
job="coredns" (2/2 up)
    state=up, endpoint=http://10.233.108.1:9153/metrics, labels={host_ip="master2",instance="10.233.108.1:9153",job="coredns",node="kube-control-2",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=26.846s ago, scrape_duration=0.003s, samples_scraped=240, error=""
    state=up, endpoint=http://10.233.109.1:9153/metrics, labels={host_ip="master1",instance="10.233.109.1:9153",job="coredns",node="kube-control-1",prometheus="monitoring-system/vm-vmagent1"}, last_scrape=8.383s ago, scrape_duration=0.002s, samples_scraped=238, error=""
...omitted...
4. Query in vmui to verify the data. As the screenshot showed (vmui query result), metric data is coming through, which confirms our configuration works.
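If you prefer the command line over vmui, the same check can be made against vmselect's Prometheus-compatible query API. A sketch, assuming the cluster layout from the previous article (lb_vip fronting vmselect on its default 8481 port, tenant 0):

# query the `up` series through vmselect; a JSON response with "status":"success"
# and one result per scrape target means data is flowing end to end
curl -s 'http://lb_vip:8481/select/0/prometheus/api/v1/query?query=up'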
Config reload

There are two ways to make vmagent reload its configuration, both shown in the sketch after this list:
- Send a SIGHUP signal to the vmagent process
- Send an HTTP request to http://vmagent:8429/-/reload
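A minimal sketch of both methods (pgrep is assumed to be available inside the vmagent container; otherwise look up the PID by hand):

# option 1: signal the process (run inside the vmagent container)
kill -HUP "$(pgrep vmagent)"

# option 2: hit the reload endpoint over HTTP
curl http://vmagent:8429/-/reload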
vmagent is a critical link in the collection pipeline, so monitoring it is indispensable too. It exposes many metrics at http://vmagent-host:8429/metrics, such as vmagent_remotewrite_conns (connections to remote storage) and vm_allowed_memory_bytes (memory vmagent is allowed to use). Collecting the important ones and visualizing them in grafana makes it much easier to reason about vmagent's state (see the sketch after this list). vmagent also exposes key state through the following endpoints:
- http://vmagent-host:8429/targets shows whether the scrape targets are healthy
- http://vmagent-host:8429/ready shows whether vmagent is ready
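A quick way to eyeball those signals from a shell, using the two metric names mentioned above (vmagent-host stands for the pod IP, 10.233.108.50 in this walkthrough):

# readiness: returns 200 once vmagent has finished initializing
curl -sf http://vmagent-host:8429/ready && echo ready

# pull just the two metrics discussed above out of the full /metrics dump
curl -s http://vmagent-host:8429/metrics | \
  grep -E '^(vmagent_remotewrite_conns|vm_allowed_memory_bytes)'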
Troubleshooting

You may run into all kinds of problems while operating vmagent. The official docs offer some starting points:
- Monitor vmagent so you always know its state; as noted above, it exposes its own metrics for scraping and inspection.
- When scraping a large number of targets, increase the system's maximum number of open files.
- When the remote storage returns HTTP 400 or 409, vmagent drops the offending data block.
- ...for more, see the official docs (a buffer-tuning sketch follows this list).
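One knob worth knowing when remote storage is flaky: vmagent buffers unsent data on disk at -remoteWrite.tmpDataPath, capped by -remoteWrite.maxDiskUsagePerURL (both flags appear in the pod args above, where the operator set the cap to 1GiB). A sketch of raising the cap through the VMAgent resource's extraArgs, assuming the operator passes these flags through verbatim:

spec:
  extraArgs:
    memory.allowedPercent: "40"
    # allow up to 2GiB of buffered, not-yet-sent samples per remote storage URL
    remoteWrite.maxDiskUsagePerURL: "2147483648"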
Summary

Overall the installation is fairly simple; just mind the format of the inlineScrapeConfig value and the image of the config-reloader container. This article is only a brief introduction to vmagent; many more features are left for you to explore. If it looks good to you too, give it a try!
References
https://docs.victoriametrics.com/vmagent.html
https://github.com/VictoriaMetrics/operator/blob/master/docs/quick-start.MD#VMAgent
https://github.com/VictoriaMetrics/operator/blob/master/docs/api.MD#vmagentspec
https://prometheus.io/