Secret Detection Service: General FAQs

This page answers general questions about the Secret Detection (SD) Service. Anyone who wants to understand the technical aspects of the service can use this runbook.

General FAQs

Where is the service deployed?

The service is deployed on Runway, which internally uses Google Cloud Run to manage containers.

In how many environments is the service deployed?

The service is deployed in two environments: Staging (https://secret-detection.staging.runway.gitlab.net) and Production (https://secret-detection.production.runway.gitlab.net).

In which regions is the service deployed in the production environment?

The service is deployed only in us-east1, the same region where the GitLab Rails monolith is deployed.

Does the service use environment variables?

Yes. It currently uses two: ENV (non-sensitive), which identifies the environment the application is running in, and AUTH_TOKEN (sensitive), which is matched against the token embedded in each API request.
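The following Python sketch shows one way a service could load and validate these two variables at startup. It assumes ENV takes the values "staging" and "production" (an assumption based on the two deployment environments, not something this page confirms); the ServiceConfig and load_config names are hypothetical. Keeping the token out of the dataclass repr keeps it out of logs and tracebacks.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Assumed ENV values, one per deployment environment (hypothetical).
VALID_ENVIRONMENTS = frozenset({"staging", "production"})


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical container for the service's runtime settings."""
    environment: str                     # from ENV (non-sensitive)
    auth_token: str = field(repr=False)  # from AUTH_TOKEN (sensitive); excluded from repr()


def load_config(environ: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read ENV and AUTH_TOKEN, failing fast if either is missing or invalid."""
    env_vars = os.environ if environ is None else environ
    environment = env_vars.get("ENV", "").strip().lower()
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(
            f"ENV must be one of {sorted(VALID_ENVIRONMENTS)}, got {environment!r}"
        )
    token = env_vars.get("AUTH_TOKEN", "")
    if not token:
        # Never include the token value itself in error messages.
        raise ValueError("AUTH_TOKEN is not set")
    return ServiceConfig(environment=environment, auth_token=token)
```

Failing at startup when either variable is missing surfaces a misconfigured deployment immediately, rather than on the first scan request.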

Where are the service's environment variables stored, and who can modify them?

Non-sensitive variables are stored in env-<environment>.yml files in the project repository, whereas sensitive variables are stored in HashiCorp Vault. Currently, the ~"group::secret detection" team has access to the Vault.

Where exactly in the Vault can I find the SD service variables?

You can find them here for the staging environment, and here for the production environment.

Are the service APIs publicly accessible?

It depends on the environment. The staging service is currently publicly accessible, although we might make it private later. To reduce security exposure, the production service is accessible only to the GitLab Rails monolith instance.

What kind of APIs does the service expose?

The service exposes only gRPC endpoints for Secret Detection scans. Read here for more details.

Is authentication required to access the service APIs?

Yes. The service uses a basic form of token-based authentication: the client embeds a token in each request, and the service matches it against its AUTH_TOKEN. Note that authentication applies only to the secret detection RPC endpoints. Read more about it here.
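A minimal Python sketch of this check, assuming the token travels as a gRPC metadata entry. The metadata key x-sd-auth and the method prefix /gitlab.secret_detection.Scanner/ are hypothetical placeholders, not the service's real names. Any method outside that prefix (for example, a standard gRPC health check at /grpc.health.v1.Health/Check) skips the check, and hmac.compare_digest keeps the comparison time independent of how much of the token matched.

```python
import hmac
from typing import Mapping

# Hypothetical names: the real RPC paths and metadata key live in the
# service's protobuf definitions and auth code, not on this page.
PROTECTED_METHOD_PREFIX = "/gitlab.secret_detection.Scanner/"
AUTH_METADATA_KEY = "x-sd-auth"


def is_authorized(method: str, metadata: Mapping[str, str], expected_token: str) -> bool:
    """Return True if a call to `method` with this metadata may proceed."""
    if not method.startswith(PROTECTED_METHOD_PREFIX):
        # Authentication only applies to secret detection RPCs.
        return True
    supplied = metadata.get(AUTH_METADATA_KEY, "")
    if not supplied or not expected_token:
        # Missing token on either side is always a rejection.
        return False
    # Constant-time comparison; encode so non-ASCII tokens are also accepted.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```

In a real gRPC server this logic would typically sit in a server interceptor so every protected handler gets the same check.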

How do I confirm that the service is running? Is there a health-check endpoint?

The service exposes a health-check RPC endpoint that Runway uses to verify the service's health. You can find it here. However, because the production instance isn't publicly accessible, we might have to rely on logs to detect health-check failures there. Alternatively, since the Rails monolith can reach the service, we could call the endpoint from a Teleport console.

Which GitLab services access the SD service?

Only the Rails monolith service accesses the SD service to invoke scans.

Where can I access the service logs?

The logs are available in Google Cloud Run. You can also view them in the GCP Logs Explorer for custom filtering and querying.

Where can I access the service dashboard for monitoring purposes?

You can find the dashboards here for the staging environment and here for the production environment.

Is rate limiting applied to the APIs?

Application-level rate limiting is not implemented. However, the service is still subject to the limits that Cloud Run applies to its instances.

What Service Level Indicators (SLIs) determine the service's availability?

We use Runway's default SLIs (runway_ingress), which cover the Apdex score, request rate, and error rate.

What Service Level Objectives (SLOs) are configured for the service?

The SLOs require an Apdex score of at least 99.9% (0.999) and an error-ratio target of 99.9% (0.999), meaning at most 0.1% of requests may fail. These are Runway's default SLO values, and we are keeping them because they appear sufficient for the service. We can change them here whenever necessary.
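To make the 0.999 targets concrete, the following Python sketch computes the error budget they leave. It assumes a 30-day window, which this page does not state; Runway's actual evaluation windows may differ. The function names are hypothetical, and Decimal is used so that 1 - 0.999 is exactly 0.001 rather than a rounded float.

```python
from decimal import Decimal

MINUTES_PER_DAY = 24 * 60


def allowed_bad_fraction(slo_target: str) -> Decimal:
    """Share of requests (or time) allowed to miss the objective, e.g. '0.999' -> 0.001."""
    target = Decimal(slo_target)  # string input keeps the arithmetic exact
    if not Decimal(0) < target < Decimal(1):
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1 - target


def budget_minutes(slo_target: str, window_days: int = 30) -> Decimal:
    """Minutes of complete unavailability the target tolerates over the window (assumed 30 days)."""
    return window_days * MINUTES_PER_DAY * allowed_bad_fraction(slo_target)


def max_bad_requests(slo_target: str, total_requests: int) -> int:
    """Most failed (or unsatisfactory) requests that still meet the target, rounded down."""
    return int(total_requests * allowed_bad_fraction(slo_target))


if __name__ == "__main__":
    print(budget_minutes("0.999"))             # 43.200
    print(max_bad_requests("0.999", 250_000))  # 250
```

Under these assumptions, a 0.999 target allows about 43.2 minutes of full outage per 30 days, or 250 bad requests out of every 250,000.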

Are alerts configured in case of an SLO violation?

Yes. The alerts are configured here. Alerts are triggered for Apdex violations, error-rate violations, traffic ceased (server signal present, but no traffic), and traffic absent (no server signal, including health checks) over the past 30 minutes.
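The sketch below illustrates only how these four conditions can be told apart within one evaluation window: no server signal at all means traffic is absent, while a live server that receives zero requests means traffic has ceased. The WindowStats fields and the direct comparison against the 0.999 targets are simplifying assumptions; the real alert rules are defined in the alert configuration mentioned above and likely use different thresholds and windows.

```python
from dataclasses import dataclass
from typing import List

APDEX_TARGET = 0.999     # minimum share of satisfactory requests
MAX_ERROR_RATIO = 0.001  # 1 - 0.999: maximum share of failed requests


@dataclass(frozen=True)
class WindowStats:
    """Hypothetical summary of one 30-minute evaluation window."""
    server_signal_present: bool  # any metrics from the service, including health checks
    request_count: int           # ingress requests observed in the window
    apdex: float                 # satisfactory requests / total requests
    error_ratio: float           # failed requests / total requests


def alerts_for(stats: WindowStats) -> List[str]:
    """Return the (simplified) alerts that would fire for this window."""
    if not stats.server_signal_present:
        # Nothing at all from the service, not even health checks.
        return ["traffic_absent"]
    if stats.request_count == 0:
        # The service is up and reporting, but no requests arrive.
        return ["traffic_ceased"]
    fired = []
    if stats.apdex < APDEX_TARGET:
        fired.append("apdex_violation")
    if stats.error_ratio > MAX_ERROR_RATIO:
        fired.append("error_rate_violation")
    return fired
```

Checking the traffic conditions first avoids reporting meaningless Apdex or error-rate values for a window with no requests.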

What happens in the event of an SLO violation?

When an SLO violation occurs, Alertmanager sends all alerts to the #feed_alerts-general Slack channel and a copy to the #g_secure-secret-detection Slack channel.

What severity will be assigned to the triggered alert?

Because the service uses Runway's SLI defaults, it also inherits the default severity of S4. We can change it to a severity appropriate to our needs (this requires Readiness Review approval).

Does it page the SRE team in the event of an SLO violation?

No. Only alerts with severity S1 or S2 page the SRE team. The group::secret detection team is responsible for monitoring incidents.
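The alert routing described in the last three answers can be summarized as a small Python function. It is an illustration only: the channel names, the S4 default, and the S1/S2 paging rule come from this page, while route_alert and its return shape are hypothetical and not part of any Alertmanager configuration.

```python
from typing import Dict, List, Union

# Slack channels named on this page.
GENERAL_ALERTS_CHANNEL = "#feed_alerts-general"
TEAM_ALERTS_CHANNEL = "#g_secure-secret-detection"

# Only these severities page the SRE team.
PAGING_SEVERITIES = frozenset({"S1", "S2"})
KNOWN_SEVERITIES = frozenset({"S1", "S2", "S3", "S4"})

# Runway's default severity, inherited by the service's alerts.
DEFAULT_SEVERITY = "S4"


def route_alert(severity: str = DEFAULT_SEVERITY) -> Dict[str, Union[List[str], bool]]:
    """Describe where an SLO-violation alert of this severity goes (hypothetical helper)."""
    normalized = severity.strip().upper()
    if normalized not in KNOWN_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return {
        # Every alert goes to the general feed, with a copy to the team channel.
        "slack_channels": [GENERAL_ALERTS_CHANNEL, TEAM_ALERTS_CHANNEL],
        # SRE paging happens only for S1 and S2.
        "pages_sre": normalized in PAGING_SEVERITIES,
    }
```

With the default S4 severity, route_alert() reports both Slack channels and no SRE page, which matches the answers above.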


Last modified October 28, 2024: Add Secret Detection Service related runbooks (257e22f5)