GitLab CI/CD - Hands-On Lab: Investigating Broken Pipelines
Estimated time to complete: 15 minutes
Preface
Tanuki Enterprises is ready to take the final step in their CI/CD journey: automated deployments to production servers. However, the team quickly encountered major issues:
- Cryptic Error Messages: Jobs fail with unclear errors like “error in libcrypto” that provide no obvious solution
- Trial and Error Debugging: Developers waste hours trying random fixes without a systematic troubleshooting approach
- Hidden Configuration Issues: SSH keys, environment variables, and formatting problems cause silent failures
- No Troubleshooting Framework: The team lacks a structured methodology for diagnosing and resolving pipeline failures
- Fear of Deployments: Without confidence in their ability to debug issues, developers avoid setting up deployment automation
Objectives
In this lab, you’ll learn systematic troubleshooting techniques for CI/CD pipelines. You’ll intentionally introduce a common SSH key formatting error, practice isolating the root cause, leverage GitLab documentation to find solutions, and apply best practices for organizing job scripts using before_script sections for better maintainability.
Task A. Set Up the SSH Connection
As a part of this course, you were provided with an SSH key to use for deployments. You will need to add this SSH key to GitLab to use it during your CI/CD process. To do this:
- Navigate to your project in GitLab.
- In the left sidebar, go to Settings > CI/CD.
- Select Expand next to Variables.
- In Group variables (inherited), you will see a variable named SSH_PRIVATE_KEY.
To demonstrate a common SSH-related error, we will copy the SSH key and create a new variable based on its value:
- You are currently viewing the inherited variables at project level. To interact directly with these group-level variables, click on the group name next to the SSH_PRIVATE_KEY variable.
- Select Expand next to Variables.
- Select the Copy icon next to the value of the SSH_PRIVATE_KEY variable.
- Select Add variable.
- Set the Type to File.
- In the Key field, enter SSH_INVALID_KEY.
- Set the variable Visibility to Visible.
- In the Value field, paste your SSH key.
- Delete the new line at the end of the key value. Doing this will create an error when we try to use the key.
- Select Add variable.
Your new SSH key variable will now be accessible during any CI/CD jobs you run in your group. Now, let’s create a job to test the SSH connection.
- Navigate to your CI/CD project.
- Select your .gitlab-ci.yml file.
- Select Edit > Edit in pipeline editor.
- Add a new stage named deploy:
    stages:
      - test
      - build
      - run
      - release
      - deploy
- For this stage, we will have a single job named deploy app. The first thing this job will do is check if there is an SSH agent available, and install one if not.
    deploy app:
      stage: deploy
      script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
- Next, we will set up a pem file from our SSH key variable, setting the required permissions on the file so it can be used for SSH purposes. Note that we are using the SSH_INVALID_KEY variable.
    deploy app:
      stage: deploy
      script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
- We can add a simple SSH command to test if the connection is working.
    deploy app:
      stage: deploy
      script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
- Finally, we will add in an environment keyword to enable us to track the deployment environment.
    deploy app:
      stage: deploy
      script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
      environment:
        name: prod
        url: "http://$ip:80"
When you commit these changes, you will see an error in your deploy job. To view this error, navigate to the Build > Pipelines page and view the failed pipeline and job. The output should look similar to the one below:
$ eval $(ssh-agent -s)
Agent pid 3211
$ chmod 400 "$SSH_INVALID_KEY"
$ ssh-add "$SSH_INVALID_KEY"
Error loading key "/builds/training-users/session-eff7bd34/iuztj7px/cicd-demo.tmp/SSH_INVALID_KEY": error in libcrypto
Let’s try to figure out what happened!
Task B.1. Isolate the command that causes the error
The first logical step is to isolate the command that is causing the error. The job log shows that the failure occurs immediately after the ssh-add "$SSH_INVALID_KEY" command, so that command is the most likely cause.
With the command isolated, we can consider some ways to verify the command. One option would be to try running the command locally if possible. This can help rule out potential issues with the runner. For this example, we can assume the runners are working correctly, meaning the issue lies in the actual command itself.
In these cases, the variable or input passed to the command is often the main source of the error. From the error message, it looks like the key is not formatted correctly. Let’s consult the documentation to see why.
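Because the error points at the key's formatting, the input file can also be inspected directly. The following is a minimal sketch, not part of the lab's pipeline; the helper names ends_with_newline and check_key_file are hypothetical, and the check only looks at the final byte of the file, not at the key's overall validity.

```shell
# Hypothetical helper (not part of the lab): succeeds only if the file
# is non-empty and its last byte is a newline.
ends_with_newline() {
  # tail -c 1 prints the final byte. Command substitution strips a
  # trailing newline, so an empty result means that byte was a newline.
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Print a short diagnosis for the file given as the first argument.
check_key_file() {
  if ends_with_newline "$1"; then
    echo "ok: file ends with a newline"
  else
    echo "problem: file is empty or missing its trailing newline"
    return 1
  fi
}
```

In a job, a call such as check_key_file "$SSH_INVALID_KEY" placed just before ssh-add would make a missing trailing newline visible in the job log.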
Task B.2. Search the Documentation
Often, common errors will be present in our documentation with solutions to the problems. To find this error:
- Try searching for the error in the documentation. The first result you get is an article titled: Using SSH keys with GitLab CI/CD. Open this page.
- If you scroll to the Troubleshooting section of this page, you will see a section on the exact error we are facing.
The documentation tells us that the issue is not having a new line at the end of the key. Let’s try adding one and see if it fixes the problem.
- Return to your variables by selecting Settings > CI/CD > Variables.
- Select the group next to the SSH_INVALID_KEY variable.
- Expand the group variable section and select the Edit icon next to the SSH_INVALID_KEY variable.
- Add a new line to the end of the variable value, then select Save Changes.
To test if this fixes the error:
- Navigate back to your CI/CD project.
- Select Build > Pipelines from the left sidebar.
- Select New pipeline.
- Leave all values as default and select New pipeline again. You will now see the job complete successfully!
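To guard against the same formatting problem recurring, a job could normalize the key file before handing it to ssh-add. The following is a minimal sketch under stated assumptions: the helper name normalize_key and the destination path are hypothetical, and this step is not part of the lab's pipeline or of GitLab's documented fix, which is to correct the variable value itself.

```shell
# Hypothetical helper (not part of the lab): write a copy of a key file
# that is guaranteed to end with exactly one newline.
normalize_key() {
  src=$1
  dest=$2
  # Run in a subshell so the restrictive umask does not leak to the caller;
  # with umask 077, a newly created copy is accessible by its owner only.
  (
    umask 077
    # Command substitution drops every trailing newline; printf adds one back.
    printf '%s\n' "$(cat "$src")" > "$dest"
  )
}
```

For example, normalize_key "$SSH_INVALID_KEY" "$fixed_key" followed by ssh-add "$fixed_key" would load the corrected copy, where fixed_key is an assumed variable holding a writable path on the runner.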
Task C. Clean Up Deploy Job
Now that the job has been fixed, it is worth cleaning it up so that its steps are clearer. For example, we can move parts of the job from the script section to the before_script section.
- Let’s move the steps from 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )' to chmod 700 ~/.ssh into a before_script section. That way, it is clear which parts of the job are for setup, and which are the actual tasks being performed. Also, remember to change the SSH key variable back to SSH_PRIVATE_KEY.
  The deploy job should now look like this:
    deploy app:
      stage: deploy
      before_script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_PRIVATE_KEY"
        - ssh-add "$SSH_PRIVATE_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
      script:
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
      environment:
        name: prod
        url: "http://$ip:80"
- Run the pipeline to make sure the changes did not break anything in the pipeline.
Postface
By working through a real-world SSH deployment error, you developed a systematic troubleshooting framework that transformed your team’s confidence. Your team learned to isolate failing commands, search documentation for known issues, and verify fixes methodically rather than guessing randomly. The structured approach (identify the error, isolate the command, consult documentation, apply the fix, and verify) reduced average debugging time from hours to minutes. Additionally, by refactoring the deploy job to use before_script for setup tasks, your team improved code readability and made their pipelines easier to maintain and troubleshoot in the future. With these troubleshooting skills and a working SSH deployment configuration, Tanuki Enterprises now confidently automates deployments to production, knowing they can quickly diagnose and resolve any pipeline issues that arise.
Lab Guide Complete
You have completed this lab exercise. You can view the other lab guides for this course.
