Troubleshooting GitLab Cloud Native chart deployments
What is KubeSOS
kubeSOS is a tool that uses `kubectl` and `helm` to retrieve GitLab cluster configuration and logs from GitLab Cloud Native chart deployments. This information is archived into a compressed tar file that can be shared with the Support team to help with troubleshooting GitLab deployments.
Requirements
- kubectl client v1.16+
- helm 3.3.1+
Usage
You can either download the script:
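For example, with `git` (the repository URL below is an assumption; kubeSOS is maintained in the GitLab Support toolbox group):

```shell
# Clone the kubeSOS project and run the script against your GitLab release
# (URL assumed; namespace and release names are examples)
git clone https://gitlab.com/gitlab-com/support/toolbox/kubesos.git
cd kubesos
bash kubeSOS.sh -n gitlab -r gitlab
```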
Or use `curl`:
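A sketch of running the script directly from the repository (the raw URL and branch name are assumptions):

```shell
# Fetch the script and pass options through to it (URL and branch assumed)
curl -fsSL "https://gitlab.com/gitlab-com/support/toolbox/kubesos/-/raw/main/kubeSOS.sh" | bash -s -- -n gitlab -r gitlab
```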
| Flags | Description | Required | Default |
|---|---|---|---|
| `-n` | namespace | No | `default` |
| `-r` | helm chart release | No | `gitlab` |
| `-l app` | application label to match for logs (can be used multiple times) | No | |
| `-L` | select apps for logs interactively | No | n/a |
| `-s time` | Only return logs newer than a relative duration like `5s`, `2m`, or `3h` | No | `0` = all logs |
| `-t time_stamp` | Only return logs after a specific date (RFC3339) | No | all logs |
| `-m maxlines` | Override the default maximum lines output per log (`-1` = no limit) | No | 10000 |
| `-p` | Prepend log entries with pod and container names | No | n/a |
| `-w log_timeout` | Log generation wait time (seconds). Increase this if log collection does not complete in time | No | 60 |
Data will be archived to `kubesos-<timestamp>.tar.gz`.
Extracting the archive
Use the `tar` Linux utility to extract the data into a folder:
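A minimal sketch, assuming the default archive name shown above:

```shell
# Extract the archive into a dedicated directory
mkdir kubesos-output
tar -xzf kubesos-<timestamp>.tar.gz -C kubesos-output
```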
Troubleshoot a GitLab installation
There are two main areas to check when troubleshooting a cloud native application like GitLab:
- Cluster setup: We assume the cluster is set up correctly per our recommendations and that enough resources have been allocated to the nodes. We will look at a few commands that help confirm this.
- Application failures: This is the primary area of focus; we will try to identify why GitLab is not working or not behaving correctly.
Cluster setup
We recommend a cluster with 8 vCPU and 30 GB of RAM, so one of the first things to check is whether the nodes have enough resources. Use Unix commands such as `top` and `free` to confirm this.
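A quick sketch of checking resources directly on a node:

```shell
# Memory headroom on the node
free -h
# Snapshot of CPU and memory usage (batch mode, single iteration)
top -b -n 1 | head -n 15
```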
Check that the nodes are registered correctly, and verify that all of the nodes you expect to see are present and in the `Ready` state.
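A live-cluster check of the node state:

```shell
# All nodes should report a STATUS of Ready
kubectl get nodes -o wide
```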
To get detailed information about the overall health of your cluster, use the following command:
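A sketch (the specific command shown here is an assumption):

```shell
# Control plane endpoints
kubectl cluster-info
# Detailed cluster state (large output, so redirect it to a file)
kubectl cluster-info dump > cluster-dump.txt
```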
To delve deeper into troubleshooting the cluster, have a look at Troubleshoot Clusters, which explains which logs to inspect.
GitLab requirements
In order to deploy GitLab on Kubernetes, ensure the setup meets the documented requirements.
Checking kubeSOS output
kubectl-check
To check the version of `kubectl` installed:
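The live equivalent of the captured check:

```shell
# Client version, and server version if the cluster is reachable
kubectl version
```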
Helm version
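Similarly, for Helm (which must be 3.3.1 or later per the requirements above):

```shell
helm version
```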
Debugging Pods
Check the current state of the pods in the `get_pods` file. All pods should be `Running` or `Completed`.
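A live-cluster equivalent of the `get_pods` file (the namespace is an example):

```shell
# All pods should show a STATUS of Running or Completed
kubectl get pods -n gitlab
```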
Any pod in `Pending` status indicates a possible problem, which you can confirm by checking the recent events in the `describe_pods` file. If a pod is stuck in `Pending`, it cannot be scheduled onto a node, often because the cluster lacks resources such as CPU or memory. For more information, see Debugging Pods.
Services
For services, the main thing to confirm is that the `LoadBalancer` service has been assigned an external IP and is not stuck in the `pending` state.
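A live-cluster check (namespace is an example):

```shell
# The LoadBalancer service should show an EXTERNAL-IP rather than <pending>
kubectl get services -n gitlab
```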
In AWS, the external address is typically a load balancer hostname rather than an IP.
Further checks involve confirming that all the services have been assigned an endpoint:
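For example:

```shell
# Each service should list at least one endpoint (pod IP and port)
kubectl get endpoints -n gitlab
```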
Ingress
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource. Confirm that the hosts and address are configured correctly and that the Ingress has been assigned an IP.
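A live-cluster check (namespace is an example):

```shell
# HOSTS and ADDRESS should be populated for each Ingress
kubectl get ingress -n gitlab
```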
Deployments
To quickly confirm which applications are set up, check the deployments output captured in the archive.
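The live-cluster equivalent (namespace is an example):

```shell
# READY should match the desired replica count for each deployment
kubectl get deployments -n gitlab
```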
If any of the deployments are not ready, use the `describe_deployments` file to check the reason for failure. It is also worth checking for errors in the `describe_pods` file.
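Against a live cluster, the same information comes from `kubectl describe` (the deployment name is an example):

```shell
kubectl describe deployment gitlab-webservice-default -n gitlab
```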
Persistent Volumes and Claims
GitLab uses persistent volumes to store data, so if any of the pods are in `Pending` status, check that the volumes exist and that their status is `Bound`. Confirm the amount of space allocated to each and, if required, allocate more.
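A live-cluster check:

```shell
# Volumes and claims should be in the Bound state, with the expected capacity
kubectl get pv
kubectl get pvc -n gitlab
```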
User supplied values
Where there is more than one Helm revision (`helm history <release>`), we capture the `user_supplied_values.yaml` and `all_values.yaml` for each revision. This is useful for comparing changes that were applied between revisions. For example:
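A sketch of making the same comparison against a live cluster with Helm (the release name and revision numbers are examples):

```shell
# Export the user-supplied values for two revisions and compare them
helm get values gitlab --revision 7 > values-r7.yaml
helm get values gitlab --revision 8 > values-r8.yaml
diff values-r7.yaml values-r8.yaml
```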
In this case, the comparison showed that a change was made to the CertManager configuration between revisions 7 and 8.
If the YAML files are not present, it is likely that kubeSOS was not run against the correct namespace or release; `helm list -A` shows all Helm-deployed releases. Be sure to run `kubeSOS.sh` with the appropriate `-n <namespace>` and `-r <release>` options.
Application logs
Finally, `kubeSOS.sh` collects all the application logs, which can be used to debug specific application issues.
Logging is more limited in a Kubernetes environment; you should note:
- By default, a container's current log is limited to a size of 10 MB, at which point it is rotated.
- Whilst Kubernetes rotates logs, it is not possible to retrieve rotated logs remotely via `kubectl logs`; direct access to the node is required (see Additional logs).
- Kubernetes retains the log of a failed container, but this is limited to the previous instance of the container only.
It is also worth noting that `kubeSOS.sh` only obtains logs from pods and containers that are currently running (or completed, for init containers). If a log is not present and was not intentionally filtered out, it is likely the pod was not active when `kubeSOS.sh` was run. Check the `get_pods` file to see which pods were active. Note also that empty log files are not added to the archive.
Logs are captured for each container. Many pods run more than one container; for example, `webservice` could return five logs:
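A sketch of listing the captured `webservice` logs after extracting the archive (actual file names depend on the containers in the pod):

```shell
# One log file per container, named <application>_<container>.log
ls webservice_*.log
```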
Log file naming consists of `<application name>_<container name>.log`. The application name is determined from the `app` metadata label assigned to pods. If a container fails, its log is retained; `kubeSOS.sh` retrieves it via the `kubectl logs --previous` option, and it is identified by `*_previous.log`.
Additional logs
As mentioned, `kubectl logs` is limited in the logs it can retrieve. Additional logs exist on the worker nodes hosting the containers. These logs are usually found in `/var/log/containers` on the host node.