Datadog

Connecting runtime data to understand impact and blast radius of changes

By extracting trace data from Application Performance Monitoring (APM) sources such as Datadog and matching it to code, we can identify the running services that a change affects as part of the review workflow. The objective is to identify errors in the application and correlate them with specific changes in the codebase.

Integration

To integrate Baz with your Datadog organization, you must create an API key and an app key. Ensure the app key belongs to a user or service account that has the Datadog Read Only Role, or a custom role with the APM Read scope. Once both keys exist, open the Datadog integration form in Baz and fill it in. The integration should show up shortly afterwards.

How it works

1. Span Search Request

Initialization: The process begins by creating a SpanSearchRequest, which specifies the query parameters such as the time range and the filter criteria for searching spans (units of work in trace data).
Request Attributes: The request includes attributes like the query string, start and end times, pagination details, and sorting criteria.
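
As a rough illustration of the request described above, the sketch below builds a search body as a plain dictionary. The field layout follows the general shape of Datadog's v2 span search body as we understand it, but treat the exact nesting as an assumption. `build_span_search_request` and the example query are hypothetical names, not Baz's implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_span_search_request(
    query: str,
    start: datetime,
    end: datetime,
    limit: int = 100,
    sort: str = "-timestamp",
    cursor: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable span search body (field layout is assumed)."""
    page = {"limit": limit}
    if cursor is not None:
        # The cursor returned by a previous page, used for pagination.
        page["cursor"] = cursor
    return {
        "data": {
            "type": "search_request",
            "attributes": {
                "filter": {
                    "query": query,
                    "from": start.isoformat(),
                    "to": end.isoformat(),
                },
                "page": page,
                "sort": sort,  # "-timestamp" requests newest spans first
            },
        }
    }


# Example: error spans for a hypothetical "checkout" service, last 15 minutes.
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
request_body = build_span_search_request(
    query="service:checkout status:error",
    start=end - timedelta(minutes=15),
    end=end,
)
```

Keeping the body as a plain dictionary makes it easy to serialize with `json.dumps` and to copy when a later page needs a different cursor.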

2. Retrieving Span Data

API Interaction: The search request is sent to the Datadog API to retrieve spans that match the specified criteria. The API response contains detailed span data, including attributes like start and end timestamps, service name, environment, and custom attributes related to errors and HTTP requests.
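
The retrieval step might look roughly like the following, using only the Python standard library. The endpoint path, the `DD-API-KEY` and `DD-APPLICATION-KEY` headers, and the `meta.page.after` cursor reflect our understanding of Datadog's v2 API and should be checked against the API reference. The function names and the injectable `send` parameter are hypothetical.

```python
import json
import urllib.request


def build_http_request(body, api_key, app_key, site="datadoghq.com"):
    """Wrap a search body in a POST request carrying both Datadog keys."""
    return urllib.request.Request(
        f"https://api.{site}/api/v2/spans/events/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


def send_request(request):
    """Send the request over the network and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_spans(body, api_key, app_key, send=send_request, max_pages=10):
    """Collect spans across pages by following the response cursor."""
    spans = []
    for _ in range(max_pages):
        page = send(build_http_request(body, api_key, app_key))
        spans.extend(page.get("data", []))
        cursor = page.get("meta", {}).get("page", {}).get("after")
        if not cursor:
            break
        # Copy the body instead of mutating the caller's dictionary.
        attributes = dict(body["data"]["attributes"])
        attributes["page"] = {**attributes.get("page", {}), "cursor": cursor}
        body = {"data": {**body["data"], "attributes": attributes}}
    return spans
```

Passing `send` as a parameter lets the pagination logic be exercised with a stub that returns canned pages, without real credentials or network access. The `max_pages` cap is a simple guard against unbounded loops.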

3. Parsing Span Data

Deserialization: The response from Datadog is deserialized into a structured format using predefined data structures. This includes extracting specific attributes from each span, such as error messages, stack traces, code references, and Git information.
Error and Code Attributes: Key attributes extracted from the span data include:
- Error Messages: Descriptions of errors that occurred during the span.
- Code References: File paths and line numbers indicating where in the code the error occurred.
- Git Attributes: Information about the commit SHA and repository URL to correlate the span with specific code changes.
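
To make the parsing step concrete, here is a minimal sketch that pulls the attributes listed above out of one raw span. The dotted paths (`error.message`, `code.filepath`, `git.commit.sha`, and so on) and their placement under `attributes.custom` are assumptions about how the instrumented service tags its spans. `ErrorInfo`, `dig`, and `parse_error_info` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


def dig(data: Any, path: str) -> Any:
    """Return the value at a dotted path, or None if any segment is missing."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


@dataclass
class ErrorInfo:
    message: Optional[str]
    stack: Optional[str]
    file_path: Optional[str]
    line: Optional[int]
    commit_sha: Optional[str]
    repository_url: Optional[str]


def parse_error_info(span: dict) -> ErrorInfo:
    """Extract error, code, and Git attributes from one raw span."""
    custom = dig(span, "attributes.custom") or {}
    line = dig(custom, "code.lineno")
    return ErrorInfo(
        message=dig(custom, "error.message"),
        stack=dig(custom, "error.stack"),
        file_path=dig(custom, "code.filepath"),
        # Line numbers may arrive as strings; keep only plain integers.
        line=int(line) if str(line).isdigit() else None,
        commit_sha=dig(custom, "git.commit.sha"),
        repository_url=dig(custom, "git.repository_url"),
    )


# Illustrative span; every value here is made up.
raw_span = {
    "attributes": {
        "service": "checkout",
        "custom": {
            "error": {"message": "KeyError: 'sku'", "stack": "Traceback ..."},
            "code": {"filepath": "app/cart.py", "lineno": "42"},
            "git": {
                "commit": {"sha": "abc123"},
                "repository_url": "https://example.com/org/repo",
            },
        },
    }
}
info = parse_error_info(raw_span)
```

Missing attributes come back as `None` rather than raising, since not every span carries error or Git tags.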

4. Correlating Errors with Code Changes

Code Attribution: Using the extracted code references and Git attributes, the process matches errors to specific lines of code. This involves identifying the file and line number where the error occurred and linking it to the corresponding code in the repository.
Entry Points: Additional context is provided by identifying the entry points of the span, such as HTTP routes or messaging systems, which helps in understanding the broader impact of the error.
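
One way to picture the correlation step is the sketch below. It assumes a change is described as changed line ranges per file, numbered against a known base commit, and groups the matching errors by entry point. This is a simplified stand-in rather than the actual matching logic, and all names (`SpanError`, `impacted_entry_points`, `base_sha`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# File path -> inclusive (start, end) line ranges touched by the change.
ChangedLines = Dict[str, List[Tuple[int, int]]]


@dataclass(frozen=True)
class SpanError:
    file_path: str
    line: int
    commit_sha: str
    message: str
    http_route: Optional[str] = None
    messaging_destination: Optional[str] = None


def touches_change(error: SpanError, changed: ChangedLines) -> bool:
    """True if the error's file and line fall inside a changed range."""
    ranges = changed.get(error.file_path, [])
    return any(start <= error.line <= end for start, end in ranges)


def entry_point(error: SpanError) -> str:
    """Describe how the failing request entered the application."""
    if error.http_route:
        return f"http {error.http_route}"
    if error.messaging_destination:
        return f"messaging {error.messaging_destination}"
    return "unknown"


def impacted_entry_points(
    errors: List[SpanError], changed: ChangedLines, base_sha: str
) -> Dict[str, List[SpanError]]:
    """Group errors that hit changed lines by their entry point."""
    impacted: Dict[str, List[SpanError]] = {}
    for error in errors:
        # Line numbers are only comparable when the span was recorded
        # against the same commit the changed ranges refer to.
        if error.commit_sha == base_sha and touches_change(error, changed):
            impacted.setdefault(entry_point(error), []).append(error)
    return impacted
```

Filtering on the commit SHA first matters because a line number taken from a different revision of the file may point at unrelated code.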

5. Creating Span Objects

Span Representation: Each span is represented as an object containing detailed information about the error, including the duration of the span, the service and environment it occurred in, and any associated error stack traces.
Entry Point Details: The entry point of each span (e.g., HTTP route or messaging event) is also recorded, providing context for where and how the error occurred in the application.
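
A span object along these lines could be modeled as below. The sketch assumes the raw span carries ISO 8601 `start_timestamp` and `end_timestamp` strings and nests `http.route`, `messaging.destination`, and `error.stack` under `attributes.custom`. Both the field names and the `Span` and `EntryPoint` classes are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EntryPoint:
    kind: str  # "http" or "messaging"
    name: str  # route or destination name


@dataclass
class Span:
    span_id: str
    service: str
    env: str
    duration: timedelta
    error_stack: Optional[str]
    entry_point: Optional[EntryPoint]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def make_span(raw: dict) -> Span:
    """Build a Span from one raw span dictionary."""
    attrs = raw.get("attributes", {})
    custom = attrs.get("custom", {})
    start = parse_timestamp(attrs["start_timestamp"])
    end = parse_timestamp(attrs["end_timestamp"])
    route = custom.get("http", {}).get("route")
    destination = custom.get("messaging", {}).get("destination")
    if route:
        entry = EntryPoint("http", route)
    elif destination:
        entry = EntryPoint("messaging", destination)
    else:
        entry = None
    return Span(
        span_id=raw.get("id", ""),
        service=attrs.get("service", ""),
        env=attrs.get("env", ""),
        duration=end - start,
        error_stack=custom.get("error", {}).get("stack"),
        entry_point=entry,
    )


# Illustrative input; every value here is made up.
span = make_span({
    "id": "span-1",
    "attributes": {
        "service": "checkout",
        "env": "prod",
        "start_timestamp": "2024-01-01T12:00:00.000Z",
        "end_timestamp": "2024-01-01T12:00:00.250Z",
        "custom": {
            "http": {"route": "/cart/{id}"},
            "error": {"stack": "Traceback ..."},
        },
    },
})
```

Storing the duration as a `timedelta` rather than a raw number avoids ambiguity about units when spans are compared or displayed later.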

Conclusion

Correlating errors with specific code changes makes it easier to pinpoint the root cause of an issue. By extracting and parsing span data, this approach gives reviewers a view of the application errors tied to the code under review, which helps in maintaining and improving the codebase.

