Troubleshooting
Video Tutorial
Watch this comprehensive walkthrough of troubleshooting GitLab Duo workflows:
Tools
Duo Flows uses the following logging and monitoring tools:
- LangSmith - collects traces scoped to the underlying graph execution, including information such as LLM completions and tool calls
- Google Cloud (GCP) Logs Explorer - holds the logs from the Duo Workflow Service
  - Users can provide you with a “session ID” that you can search for in these GCP logs. For example, if the user has a session ID of `1234`, you can perform a search for `resource.labels.service_name="duo-workflow-svc" AND jsonPayload.workflow_id="1234"` to get all log entries related to that user session.
  - In the GCP logs you will also see the `correlation_id`. This `correlation_id` can be used to correlate with logs from Rails and Workhorse. You can find these logs at https://log.gprd.gitlab.net/. In the dropdown in the top left corner, select `pubsub-rails-inf-gprd-*` for Rails logs or `pubsub-workhorse-inf-gprd-*` for Workhorse logs.
  - In Kibana, use the plus button next to the search bar to filter by `json.correlation_id.keyword`. You can find more Kibana tips at https://handbook.gitlab.com/handbook/engineering/monitoring/#logs and https://handbook.gitlab.com/handbook/support/workflows/kibana/.
- Sentry error tracking - collects error traces
- Runway monitoring dashboard - a Grafana dashboard that tracks hardware resource consumption for the Duo Workflow Service
- Tableau dashboard for internal events tracking - displays aggregated data collected with internal event tracking, showing additional product metrics such as the total number of workflows or the distribution between different workflow outcomes
Google Cloud (GCP) Logs explorer
The following projects hold logs for different pieces of Runway deployments:

- `gitlab-runway-production` - holds logs for production Runway deployments
- `gitlab-runway-staging` - holds logs for staging Runway deployments
When browsing Runway logs, you can narrow the scope to the piece of infrastructure you are interested in using the following filters:
- To filter only load balancer logs, use:

  ```
  resource.type="http_load_balancer" resource.labels.forwarding_rule_name="duo-workflow-https"
  ```

- To filter only Duo Workflow Service deployment logs, use:

  ```
  resource.labels.service_name="duo-workflow-svc"
  ```
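For scripted queries, the same filters can be run from a terminal with the `gcloud` CLI. This is a minimal sketch, assuming you are authenticated (`gcloud auth login`) with access to the Runway project; the freshness and limit values are arbitrary examples:

```shell
#!/bin/sh
# Query Duo Workflow Service logs from the command line instead of the
# Logs Explorer UI.
WORKFLOW_ID="1234"  # session ID reported by the user

# Same filter as in the Logs Explorer, combined with the workflow ID:
FILTER="resource.labels.service_name=\"duo-workflow-svc\" AND jsonPayload.workflow_id=\"${WORKFLOW_ID}\""
echo "$FILTER"

# Only run the query when gcloud is actually installed:
if command -v gcloud >/dev/null 2>&1; then
  gcloud logging read "$FILTER" \
    --project=gitlab-runway-production \
    --freshness=7d \
    --limit=100 \
    --format=json
fi
```

The filter string can also be pasted directly into the Logs Explorer query box.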
gRPCurl
`grpcurl` is a CLI tool that enables you to interact with gRPC servers just like `curl` does for HTTP ones.

An example usage of `grpcurl` for Agent Foundations is shown below:
- Agent Foundations credentials can be obtained via `curl`:

  ```shell
  curl -X POST -H "Authorization: Bearer $GITLAB_API_PRIVATE_TOKEN" https://gitlab.com/api/v4/ai/duo_workflows/direct_access
  ```
- With credentials assigned to environment variables, `grpcurl` can be used to start a bidirectional channel to the Duo Workflow Service:

  ```shell
  grpcurl -keepalive-time 20 -H "x-gitlab-global-user-id":"$GLOBAL_USER_ID" \
    -H "x-gitlab-instance-id":"ea8bf81......." -H "x-gitlab-realm":"saas" \
    -H "x-gitlab-authentication-type":"oidc" \
    -H authorization:"bearer $GRPC_TOKEN" -d @ -vv -proto ../duo-workflow-service/contract/contract.proto \
    -import-path ../duo-workflow-service/contract cloud.gitlab.com:443 DuoWorkflow/ExecuteWorkflow

  Resolved method descriptor:
  rpc ExecuteWorkflow ( stream .ClientEvent ) returns ( stream .Action );

  Request metadata to send:
  authorization: bearer eyJhbGc.....
  x-gitlab-authentication-type: oidc
  x-gitlab-global-user-id: Rf9.........
  x-gitlab-instance-id: ea8bf810-..........
  x-gitlab-realm: saas
  ```
- With the channel established, messages can be sent via stdin:

  ```json
  {
    "startRequest": {
      "workflowID": "12344",
      "goal": "create hello world in go",
      "workflowMetadata": "{\"extended_logging\":true,\"git_sha\":\"e621c52bb0f3af0a102a06cf2e485aa961f60d8c\",\"git_url\":\"gitlab.com/gitlab-org/analytics-section/analytics-instrumentation/metric-dictionary.git\"}"
    }
  }
  ```
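Rather than typing JSON into the interactive stdin channel, the startRequest payload can be built in a shell variable and piped into `grpcurl`. A minimal sketch, using the example workflow ID and goal from above (the `grpcurl` flags are elided and mirror the invocation shown earlier):

```shell
#!/bin/sh
# Build the startRequest message programmatically so it can be reused
# across troubleshooting sessions.
WORKFLOW_ID="12344"
GOAL="create hello world in go"

START_REQUEST=$(printf '{"startRequest": {"workflowID": "%s", "goal": "%s"}}' \
  "$WORKFLOW_ID" "$GOAL")
echo "$START_REQUEST"

# Pipe the message into the channel (same flags as the interactive example):
# echo "$START_REQUEST" | grpcurl -keepalive-time 20 ... \
#   cloud.gitlab.com:443 DuoWorkflow/ExecuteWorkflow
```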
Enhanced Logging for Team Members
Enabling Enhanced Logging
To enable enhanced logging for better troubleshooting, use the following slash command in the #production Slack channel:

```
/chatops run feature set duo_workflow_extended_logging --user=your_user_name true
```

Replace `your_user_name` with your actual GitLab username.
This will enable detailed tracing in LangSmith, which provides the most comprehensive view of workflow execution including LLM completions, tool calls, and execution flow.
Important Privacy and Security Considerations
⚠️ Data Privacy Warning: Extended logging captures detailed workflow execution data in LangSmith, including LLM completions, tool calls, prompts, and model responses.
- NO RED DATA: Do not use Agentic Chat with RED data when the `duo_workflow_extended_logging` feature flag is enabled.
- Forward-only logging: This feature only logs new interactions after it is enabled. If you experienced an issue before enabling the flag, you will need to reproduce the issue after enabling enhanced logging.
- Access restrictions: Only GitLab AI Engineering team members have access to the LangSmith logs for troubleshooting purposes
What to Share with AI Engineers
When requesting assistance, please provide:
- Workflow ID: Essential for tracing the specific execution
- Expected vs. Actual behavior: What you expected to happen vs. what actually happened
- Steps to reproduce: If the issue is reproducible
- Timestamp: When the issue occurred (helps narrow down logs)
- Any error messages: Screenshots or copied text of error messages
Getting Your Workflow ID
The workflow_id is the same as the session_id. For Agentic Duo Chat, it can be found in the UI.
Tips and tricks
A typical investigation of a problematic Agent Foundations execution follows the steps listed below:
Based on a user report:
1. Ask the user for the `workflow_id` of the problematic workflow, which is displayed in the list of workflows.
2. Use the `workflow_id` from the previous step to filter LangSmith traces by applying a filter on `metadata` with `thread_id=[workflow_id]`.
3. Use the `workflow_id` from the first step to filter logs in the GCP Logs Explorer: `jsonPayload.workflow_id="123456789"`.
Based on a Sentry issue:
1. Use the Agent Foundations Sentry issue to locate the problematic workflow’s `correlation_id`.
2. Use the `correlation_id` from the previous step to filter logs in the GCP Logs Explorer, example filter: `jsonPayload.correlation_id="e7171f28-706d-4a47-be25-29d9b3751c0e"`.
In addition, one can take a workflow’s `workflow_id`, recorded either in Sentry or in the Logs Explorer, and use it to filter LangSmith traces by comparing it against the `thread_id` filter in the trace metadata.
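The steps above can be sketched as a small shell snippet that derives the Logs Explorer filters from the identifiers gathered during the investigation; the IDs below are the example values from this page:

```shell
#!/bin/sh
# Derive the two Logs Explorer filters used during an investigation
# from the workflow_id (user report path) and correlation_id (Sentry path).
WORKFLOW_ID="123456789"
CORRELATION_ID="e7171f28-706d-4a47-be25-29d9b3751c0e"

WF_FILTER="jsonPayload.workflow_id=\"${WORKFLOW_ID}\""
CORR_FILTER="jsonPayload.correlation_id=\"${CORRELATION_ID}\""

echo "$WF_FILTER"
echo "$CORR_FILTER"
```

Either filter can be pasted into the Logs Explorer query box; the `workflow_id` additionally doubles as the `thread_id` metadata filter in LangSmith.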
Past in-depth investigations
- Faulty network proxy via Cloudflare investigation issue (b5d640f9)
