prometheus2

kubernetes, helm, gpu monitoring command summary

gpu operator helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update

helm install gpu operator
helm install --wait --generate-name \
  -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false

helm delete gpu operator
helm ls -n gpu-operator
helm delete -n gpu-operator gpu-operator-1675669830

helm custom gpu monitor
https://grafana.com/..

2023. 2. 6.

Ubuntu, kubernetes, nvidia gpu monitoring summary

Ubuntu 20.04, with containerd as the Kubernetes CRI
Uses helm, prometheus, and grafana

1. Install nvidia-container-toolkit (on both the master node and the worker nodes)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources..

2023. 2. 1.
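Both excerpts depend on values that are only known at run time. The first is the `$distribution` string built from /etc/os-release. The second is the release name that `--generate-name` produced (gpu-operator-1675669830 in the first post), which is why `helm ls` has to run before `helm delete`.

This is a minimal sketch of both lookups as shell functions. The function names `get_distribution` and `delete_gpu_operator_releases` are hypothetical. The sketch also assumes that generated release names start with `gpu-operator-`, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions; the names are illustrative, not from the original posts.

# Print the distribution string used in the libnvidia-container repo URL,
# e.g. "ubuntu20.04" on Ubuntu 20.04. The file is sourced in a subshell so
# ID and VERSION_ID do not leak into the caller's environment.
get_distribution() {
  local os_release="${1:-/etc/os-release}"
  ( . "$os_release"; echo "${ID}${VERSION_ID}" )
}

# Delete every gpu-operator release in a namespace. Because the chart was
# installed with --generate-name, the release name is only known after
# listing it. Assumption: generated names start with "gpu-operator-".
delete_gpu_operator_releases() {
  local ns="${1:-gpu-operator}"
  # -q (--short) makes `helm ls` print release names only, one per line
  helm ls -n "$ns" -q | while IFS= read -r release; do
    case "$release" in
      gpu-operator-*) helm delete -n "$ns" "$release" ;;
    esac
  done
}
```

For example, `delete_gpu_operator_releases gpu-operator` replaces the manual `helm ls` / `helm delete` pair. On Ubuntu 20.04, `get_distribution` prints `ubuntu20.04`.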