GitLab CI/CD - Hands-On Lab: Investigating Broken Pipelines

This Hands-On Guide demonstrates how to troubleshoot and fix CI/CD pipelines

Estimated time to complete: 15 minutes

Preface

Tanuki Enterprises is ready to take the final step in their CI/CD journey: automated deployments to production servers. However, the team quickly encountered major issues:

  • Cryptic Error Messages: Jobs fail with unclear errors like “error in libcrypto” that provide no obvious solution
  • Trial and Error Debugging: Developers waste hours trying random fixes without a systematic troubleshooting approach
  • Hidden Configuration Issues: SSH keys, environment variables, and formatting problems cause silent failures
  • No Troubleshooting Framework: The team lacks a structured methodology for diagnosing and resolving pipeline failures
  • Fear of Deployments: Without confidence in their ability to debug issues, developers avoid setting up deployment automation

Objectives

In this lab, you’ll learn systematic troubleshooting techniques for CI/CD pipelines. You’ll intentionally introduce a common SSH key formatting error, practice isolating the root cause, leverage GitLab documentation to find solutions, and apply best practices for organizing job scripts using before_script sections for better maintainability.

Task A. Setup the SSH Connection

As a part of this course, you were provided with an SSH key to use for deployments. You will need to add this SSH key to GitLab to use it during your CI/CD process. To do this:

  1. Navigate to your project in GitLab.

  2. In the left sidebar, go to Settings > CI/CD.

  3. Select Expand next to Variables.

  4. In Group variables (inherited), you will see a variable named SSH_PRIVATE_KEY

    To demonstrate a common SSH related error, we will copy the SSH key and create a new variable based on its value:

  5. You are currently viewing the inherited variables at project level. To interact directly with these group-level variables, click on the group name next to the SSH_PRIVATE_KEY variable.

  6. Select Expand next to Variables.

  7. Select the Copy icon next to the value of the SSH_PRIVATE_KEY variable .

  8. Select Add variable.

  9. Set the Type to file.

  10. In the key field, enter SSH_INVALID_KEY.

  11. Set the variable Visibility to Visible.

  12. In the value, paste your SSH key.

  13. Delete the new line at the end of the key value. Doing this will create an error when we try to use the key.

  14. Select Add variable.

    Your new SSH key variable will now be accessible during any CI/CD jobs you run in your group. Now, let’s create a job to test the SSH connection.

  15. Navigate to your CI/CD project.

  16. Select your .gitlab-ci.yml file.

  17. Select Edit > Edit in pipeline editor.

  18. Add a new stage named deploy:

    stages:
      - test
      - build
      - run
      - release
      - deploy
    
  19. For this stage, we will have a single job named deploy app. The first thing this job will do is check if there is an SSH agent available, and install one if not.

    deploy app:
      stage: deploy
      script: 
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
    
  20. Next, we will set up a pem file from our SSH key variable, setting the required permissions on the file so it can be used for SSH purposes. Note that we are using the SSH_INVALID_KEY variable.

    deploy app:
      stage: deploy
      script: 
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
    
  21. We can add a simple SSH command to test if the connection is working.

    deploy app:
      stage: deploy
      script: 
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
    
  22. Finally, we will add in an environment keyword to enable us to track the deployment environment.

    deploy app:
      stage: deploy
      script: 
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_INVALID_KEY"
        - ssh-add "$SSH_INVALID_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
      environment:
        name: prod
        url: "http://$ip:80"
    

When you commit these changes, you will see an error in your deploy job. To view this error, navigate to the Build > Pipelines page and view the failed pipeline and job. The output should look similar to the one below:

$ eval $(ssh-agent -s)
Agent pid 3211
$ chmod 400 "$SSH_INVALID_KEY"
$ ssh-add "$SSH_INVALID_KEY"
Error loading key "/builds/training-users/session-eff7bd34/iuztj7px/cicd-demo.tmp/SSH_INVALID_KEY": error in libcrypto

Let’s try to figure out what happened!

Task B.1. Isolate the command that causes the error

The first logical step is to isolate the command that is causing the error. We can see in the logs that the ssh-add "$SSH_INVALID_KEY" command looks to cause the error.

With the command isolated, we can consider some ways to verify the command. One option would be to try running the command locally if possible. This can help rule out potential issues with the runner. For this example, we can assume the runners are working correctly, meaning the issue lies in the actual command itself.

In these cases, often the variable/input of the command is the main source of the error. From the error message, it looks that the key is not formatted correctly. Let’s consult the documentation to see why.

Task B.2. Search the Documentation

Often, common errors will be present in our documentation with solutions to the problems. To find this error:

  1. Try searching for the error in the documentation. The first result you get is an article titled: Using SSH keys with GitLab CI/CD. Click onto this page.

  2. If you scroll to the Troubleshooting section of this page, you will see a section on the exact error we are facing.

    The documentation tells us that the issue is not having a new line at the end of the key. Let’s try adding one and see if it fixes the problem.

  3. Return to your variables by selecting Settings > CI/CD > Variables.

  4. Select the group next to the SSH_INVALID_KEY variable.

  5. Expand the group variable section and select the Edit icon next to the SSH_INVALID_KEY variable.

  6. Add a new line to the end of the variable value, then select Save Changes.

    To test if this fixes the error:

  7. Navigate back to your CI/CD project.

  8. Select Build > Pipelines from the left sidebar.

  9. Select New pipeline.

  10. Leave all values as default and select New pipeline again. You will now see the job complete successfully!

Task C. Clean Up Deploy Job

Now that the job has been fixed, it is important to clean up the job so that the steps of the job are more clear. For example, we can move parts of the jobs from the script section to the before_script section.

  1. Let’s move the steps from the 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )' to chmod 700 ~/.ssh into a before_script section. That way, it is clear which parts of the job are for setup, and which are the actual tasks being performed. Also, remember to change the SSH key variable back to the SSH_PRIVATE_KEY.

    The deploy job should now look like this:

    deploy app:
      stage: deploy
      before_script:
        - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
        - eval $(ssh-agent -s)
        - chmod 400 "$SSH_PRIVATE_KEY"
        - ssh-add "$SSH_PRIVATE_KEY"
        - mkdir -p ~/.ssh
        - chmod 700 ~/.ssh
      script:
        - ssh-keyscan -t rsa,ed25519 $ip >> ~/.ssh/known_hosts
        - ssh root@$ip 'ls /'
      environment:
        name: prod
        url: "http://$ip:80"
    
  2. Run the pipeline to make sure the changes did not break anything in the pipeline.

Postface

By working through a real-world SSH deployment error, you developed a systematic troubleshooting framework that transformed their team’s confidence. Your team learned to isolate failing commands, search documentation for known issues, and verify fixes methodically rather than guessing randomly. The structured approach—identify the error, isolate the command, consult documentation, apply the fix, and verify—reduced average debugging time from hours to minutes. Additionally, by refactoring the deploy job to use before_script for setup tasks, your team improved code readability and made their pipelines easier to maintain and troubleshoot in the future. With these troubleshooting skills and a working SSH deployment configuration, Tanuki Enterprises now confidently automates deployments to production, knowing they can quickly diagnose and resolve any pipeline issues that arise.

Lab Guide Complete

You have completed this lab exercise. You can view the other lab guides for this course.

Suggestions?

If you wish to make a change to the lab, please submit your changes via Merge Request.

Last modified December 2, 2025: Fixes CICD Lab (ed6769aa)