
Troubleshooting

This page collects some common problems and their solutions.

Not all pods are started - error when creating "/kadalu/templates/csi-driver-object.yaml"

The Kadalu operator spins up several pods, such as csi-provisioner and csi-nodeplugin. If you only see the operator pod and none of the others, check the operator pod's log.

$ kubectl get pods -n kadalu
NAME                        READY   STATUS    RESTARTS   AGE
operator-68649f4bb6-zq7fp   1/1     Running   0          126m
...
Traceback (most recent call last):
  File "/kadalu/main.py", line 475, in <module>
    main()
  File "/kadalu/main.py", line 458, in main
    deploy_csi_pods(core_v1_client)
  File "/kadalu/main.py", line 394, in deploy_csi_pods
    execute(KUBECTL_CMD, CREATE_CMD, "-f", filename)
  File "/kadalu/kadalulib.py", line 60, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1] b'' b'Error from server (AlreadyExists): error when creating "/kadalu/templates/csi-driver-object.yaml": csidrivers.storage.k8s.io "kadalu" already exists'

If the log complains about ` error when creating "/kadalu/templates/csi-driver-object.yaml"`, you can delete the stale CSIDriver object as follows:

$ kubectl delete CSIDriver kadalu

Note: Use the cleanup script to properly clean up kadalu.

Storage cannot be created - Failed to create file system fstype=xfs device=/dev/md3

If storage cannot be created, check the logs. If you see the following error:

+ pid=0
+ cmd=/usr/bin/python3
+ script=/kadalu/server.py
+ trap 'kill ${!}; term_handler' SIGTERM
+ pid=6
+ true
+ /usr/bin/python3 /kadalu/server.py
+ wait 7
+ tail -f /dev/null
[2020-01-06 13:21:41,200] ERROR [glusterfsd - 107:create_and_mount_brick] - Failed to create file system fstype=xfs device=/dev/md3
Check your disk configuration and ensure that the disk has no partitions and, in particular, no partition table. The following commands can be used to delete the partition table:

$ dd if=/dev/zero of=/dev/md3 bs=512 count=1
$ wipefs -a -t dos -f /dev/md3

Note: replace 'md3' above with the proper device of your choice.
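To see what the `dd` command above actually does without touching a real device, the sketch below zeroes the first 512-byte sector (where a DOS partition table lives) of a throwaway scratch file; the file name is arbitrary:

```shell
# Create a 1 MiB scratch file standing in for the block device
dd if=/dev/urandom of=/tmp/fake-disk bs=1024 count=1024 2>/dev/null

# Zero only the first sector; conv=notrunc keeps the rest of the file intact
dd if=/dev/zero of=/tmp/fake-disk bs=512 count=1 conv=notrunc 2>/dev/null

# Verify: the first 512 bytes contain no non-zero byte (prints 0)
head -c 512 /tmp/fake-disk | tr -d '\0' | wc -c
```

On a real device `conv=notrunc` is not needed, but on a regular file omitting it would truncate the file to 512 bytes.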

Different Pods and where to look for logs

If everything is fine, the kadalu namespace has many pods running, including the storage pods. Let's look at which pod has the information you need when you run into an error.

operator

This pod is the first pod to be started in the namespace, and starts other required pods. This is the pod which keeps a watch on CRD, and starts the storage service too.

If storage pods fail to start, check the logs here.

csi-provisioner

This pod creates the PV, and assigns the size (quota) to the PV. If PV creation fails, this pod’s log is what we need to check.

csi-nodeplugin

If PVC is successfully created, but it failed to move to Bound state, then this is where the issue can be. This performs the mount of all the PVs.

server-*-N

These are the pods that run the glusterfsd processes, exporting the storage provided in the storage config. You may need to check the server logs too if PVC creation fails.

All pods' log using CLI

If you have installed the kubectl_kadalu package, you can run the command below to get the logs of all pods running in the kadalu namespace. This is helpful when you are not sure where to look for errors.

$ kubectl kadalu logs

Quota of PVCs

kadalu uses the simple-quota feature of glusterfs, which is present only in the kadalu storage releases of glusterfs.

As this is a new feature of glusterfs, a user may hit a bug that blocks its use in production. Hence, we have provided an option to disable the quota limit check on PVCs of a particular storage pool. Use the steps below to do this:

$ kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
# setfattr -n glusterfs.quota.disable-check -v "1" /mnt/${storage-pool-name}

Disabling the check is a runtime-only fix: if the server pods are restarted, the command may need to be issued again. Similarly, to re-enable the check, pass the value "0".

Troubleshooting External GlusterFS with native directory quota

If PVCs can grow beyond their defined size while using External GlusterFS with quota management delegated to GlusterFS, verify that:

  • The 'kadalu' namespace is created before the 'glusterquota-ssh-secret' secret

  • The SSH private key and user are defined in the 'glusterquota-ssh-secret' secret and kadalu-provisioner has the correct values:

# kubectl get secret glusterquota-ssh-secret -o json | jq '.data | map_values(@base64d)'
# kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'df -h | grep -P secret-volume; echo $SECRET_GLUSTERQUOTA_SSH_USERNAME'
  • The value of 'ssh-privatekey' is not padded (trailing "=" characters are added when the length of the base64-encoded string is not a multiple of 4)

  • glusterquota-ssh-username is a valid user on all Gluster nodes and can execute the gluster binaries

  • Verify that the quota is enabled on the gluster volume

  • KadaluStorage is of type 'External'

  • You gave enough time for GlusterFS to realize that the PVC is full and to block any write
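To illustrate the padding point above: base64 pads its output with "=" whenever the input length is not a multiple of 3 bytes. A quick local demonstration (the input strings are arbitrary):

```shell
# 3-byte input: encodes cleanly, no padding
printf 'abc' | base64     # YWJj

# 2-byte input: one '=' of padding appears
printf 'ab' | base64      # YWI=

# Count the trailing '=' characters in an encoded value
printf 'ab' | base64 | tr -cd '=' | wc -c
```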

PVC is pending and the error 'No Hosting Volumes available, add more storage' appears in the logs, but there is enough space

If the PVC remains in Pending state and the PV is not created, check the following:

  • Check the logs. Sample error message:

    W0111 07:10:42.877267       1 controller.go:943] Retrying syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8", failure 0
    E0111 07:10:42.878137       1 controller.go:966] error syncing claim "056b1267-2f62-4554-8625-5fc1686b1ac8": failed to provision volume with StorageClass "kadalu.gluster": rpc error: code = ResourceExhausted desc = No Hosting Volumes available, add more storage
  • Connect to the provisioner and verify that the volume is mounted and writable:

    kubectl -n kadalu exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- bash
    df -h /mnt/<KadaluStorage's name>
    dd if=/dev/zero of=/mnt/<KadaluStorage's name>/iotest_file bs=1M count=10
  • Verify that the PVC request is at least 10% smaller than the KadaluStorage pool size. Kadalu adheres to Gluster’s reserve requirement (10%) and will refuse to create the PV/PVC if the PVC request > (total size - reserve)
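As a quick sketch of the reserve rule above (the 100 GiB pool size is a made-up example), the largest admissible PVC request can be computed in the shell:

```shell
# Hypothetical pool size in GiB
TOTAL=100

# Gluster's reserve requirement: 10% of the pool
RESERVE=$(( TOTAL * 10 / 100 ))

# Largest PVC request that Kadalu will still provision
MAX_PVC=$(( TOTAL - RESERVE ))
echo "max PVC size: ${MAX_PVC}GiB"     # max PVC size: 90GiB
```

A 95 GiB PVC against this 100 GiB pool would therefore be rejected with the 'No Hosting Volumes available' error even though the pool looks far from full.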

POD is unable to attach or mount volumes "driver name kadalu not found in the list of registered CSI drivers"

Describing a pod shows the following events:

Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        10m                 default-scheduler        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               10m                 default-scheduler        Successfully assigned openshift-monitoring/alertmanager-main-0 to okd4-compute-1
  Normal   SuccessfulAttachVolume  10m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066"
  Warning  FailedMount             8m32s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy]: timed out waiting for the condition
  Warning  FailedMount             6m18s               kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[config-volume tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv]: timed out waiting for the condition
  Warning  FailedMount             4m4s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume tls-assets alertmanager]: timed out waiting for the condition
  Warning  FailedMount             106s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[alertmanager], unattached volumes=[tls-assets alertmanager secret-alertmanager-main-tls secret-alertmanager-main-proxy secret-alertmanager-kube-rbac-proxy alertmanager-trusted-ca-bundle kube-api-access-srkrv config-volume]: timed out waiting for the condition
  Warning  FailedMount             11s (x13 over 10m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-a047ee57-d5b3-4f37-a217-995e26d2f066" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name kadalu not found in the list of registered CSI drivers

Reapply the csi-nodeplugin-platform.yaml manifest from the latest Kadalu release.

© 2022 Kadalu Community(https://github.com/kadalu). All Rights Reserved.