Default Reviewers

Baz's managed default reviewers analyze change requests for global naming, typing, and logic bugs.

Overview

Default Reviewers use an agentic retrieval and analysis system to process code changes within the context of the entire codebase. Code is divided into manageable chunks using a LangChain-based framework, with tree-sitter handling parsing for supported languages like Python. For embedding and similarity search, Baz relies on Voyage-Code-3, a model optimized for code representation. This setup enables Baz to analyze pull requests while accounting for dependencies and broader repository context, identifying issues such as breaking changes, outdated comments, and log errors.
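
The chunking step can be illustrated with a minimal sketch. This is not Baz's implementation: it swaps in Python's standard-library `ast` module for tree-sitter and LangChain so the example runs without extra dependencies, and the `Chunk` type and `chunk_python_source` function are hypothetical names introduced only for illustration. It splits a module into one chunk per top-level definition, which is the kind of unit that would then be embedded.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # function or class name
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str        # source text of the definition


def chunk_python_source(source: str) -> list[Chunk]:
    """Split a module into one chunk per top-level function or class.

    Assumption: `ast` stands in for tree-sitter here; a real pipeline
    would use a parser that covers more languages than Python.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, text))
    return chunks


SAMPLE = '''\
import os

def load(path):
    return open(path).read()

class Parser:
    def parse(self, text):
        return text.split()
'''

chunks = chunk_python_source(SAMPLE)
for c in chunks:
    print(c.name, c.start_line, c.end_line)
```

Chunking along definition boundaries keeps each embedded unit semantically whole, which tends to make similarity search results easier to interpret than fixed-size windows.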

Baz automates several steps in the code review process by integrating directly with GitHub. It evaluates outdated comments based on commit metadata and prior comment payloads, determining whether issues have been resolved. Log errors are identified by parsing GitHub Actions logs and attaching detailed comments at the relevant lines. Baz also identifies specific issues like typos, generic variable names, missing test assertions, and type mismatches. These insights are delivered as structured comments, enabling developers to address them directly in the GitHub interface.
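
As a rough sketch of how log errors could be turned into line-level comments, the example below scans log text for GitHub Actions `::error file=...,line=...::message` workflow-command annotations. The page does not describe the log format Baz actually parses, so the choice of annotation format is an assumption, and `LineComment` and `comments_from_log` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches a workflow-command annotation such as:
#   ::error file=app.py,line=10::Undefined name 'foo'
ANNOTATION = re.compile(r"::error\s+(?P<props>[^:]*)::(?P<message>.*)")


@dataclass
class LineComment:
    path: str  # file the comment attaches to
    line: int  # 1-indexed line number
    body: str  # comment text taken from the log message


def comments_from_log(log_text: str) -> list[LineComment]:
    """Collect one comment per error annotation that names a file and line."""
    comments = []
    for raw in log_text.splitlines():
        match = ANNOTATION.search(raw)
        if not match:
            continue
        # Properties are comma-separated key=value pairs.
        props = dict(
            pair.split("=", 1)
            for pair in match.group("props").split(",")
            if "=" in pair
        )
        line = props.get("line", "")
        if "file" in props and line.isdigit():
            comments.append(
                LineComment(props["file"], int(line), match.group("message").strip())
            )
    return comments


LOG = """\
Run pytest
::error file=src/app.py,line=42::AssertionError: expected 3, got 4
::warning file=src/app.py,line=7::unused import
::error file=tests/test_app.py,line=5,col=1::NameError: name 'foo' is not defined
"""

comments = comments_from_log(LOG)
for c in comments:
    print(f"{c.path}:{c.line}: {c.body}")
```

Each resulting record carries the path and line a review comment would need; warnings and annotations without a file or line are skipped in this sketch.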

The system is designed for efficient processing and scalability. Repository and organization data are stored in a single multi-tenant table, filtered by organization ID, repository name, and file path. Embeddings are stored in a pgvector database, enabling similarity searches to locate relevant code sections. When files are updated, Baz reprocesses only the changed files, so indexing work tracks the size of each change rather than the size of the repository. This keeps insights up to date with minimal overhead, including on large repositories.
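
The storage behaviour described above can be sketched in memory. This is an illustration under stated assumptions, not the actual schema: a Python dict stands in for the multi-tenant table and pgvector, `toy_embed` is a hypothetical letter-frequency stand-in for a real embedding model, and change detection by content hash is one plausible way to reprocess only changed files.

```python
import hashlib
import math
from dataclasses import dataclass

Key = tuple[str, str, str]  # (organization ID, repository name, file path)


@dataclass
class Row:
    content_hash: str       # SHA-256 of the file content at indexing time
    embedding: list[float]  # vector used for similarity search


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: letter frequencies."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Index:
    def __init__(self) -> None:
        self.rows: dict[Key, Row] = {}

    def upsert(self, org: str, repo: str, files: dict[str, str]) -> list[str]:
        """Re-embed only files whose content hash changed; return their paths."""
        changed = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            row = self.rows.get((org, repo, path))
            if row is None or row.content_hash != digest:
                self.rows[(org, repo, path)] = Row(digest, toy_embed(text))
                changed.append(path)
        return changed

    def search(self, org: str, repo: str, query: str, k: int = 3) -> list[str]:
        """Rank files by cosine similarity, restricted to one tenant and repo."""
        q = toy_embed(query)
        scored = [
            (cosine(q, row.embedding), path)
            for (o, r, path), row in self.rows.items()
            if o == org and r == repo
        ]
        return [path for _, path in sorted(scored, reverse=True)[:k]]


index = Index()
first = index.upsert("acme", "api", {
    "a.py": "def add(x, y): return x + y",
    "b.py": "class User: pass",
})
second = index.upsert("acme", "api", {
    "a.py": "def add(x, y): return x + y",  # unchanged, so skipped
    "b.py": "class User: name = ''",        # changed, so re-embedded
})
index.upsert("other", "api", {"c.py": "def add(x, y): return x + y"})
hits = index.search("acme", "api", "def add(x, y)", k=1)
print(first, second, hits)
```

Filtering by organization and repository before scoring keeps one tenant's files out of another tenant's results, even though all rows share a single store.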

Data Model

Default Reviewers are underpinned by a structured data model that persists code elements across commits. This model ensures that Baz can accurately trace, detect, and evaluate changes across the codebase. Common use cases include:

API Endpoint Mapping: Each API endpoint is identified and linked to its corresponding entry point in the code, such as function definitions or class methods. This mapping is established through in-file connections (e.g., callables linked to function definitions) and cross-module imports.
Parameter and Return Type Linking: Function parameters and return types are traced back to their definitions, accounting for alias imports, re-exports, and class hierarchies. This linkage supports complex data structures like destructured TypeScript parameters or JSON payloads in Rust.
Change Identification: Every change in the codebase is associated with an element ID that corresponds to a function, parameter, or return type. Baz identifies the enclosing range of the change and evaluates whether the affected element is API-related.
Change Evaluation: Changes are evaluated using an LLM, with Baz determining whether a modification is breaking. The data model supports both naïve and pre-processed approaches, such as recursively checking relationships or pre-marking API-relevant elements.
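
The change identification and API-relevance checks above can be sketched as follows. The `Element` fields, the `used_by` links, and all function names are hypothetical and chosen only for illustration; the LLM evaluation step is omitted. The sketch shows locating the enclosing element of a changed line, the naïve recursive check, and the pre-marking alternative.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    element_id: str
    kind: str        # "function", "parameter", or "return_type"
    path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    is_api_entry: bool = False
    # IDs of elements that reference this one (hypothetical link field).
    used_by: list[str] = field(default_factory=list)


def enclosing_element(elements, path, line):
    """Return the smallest element whose range contains the changed line."""
    candidates = [
        e for e in elements.values()
        if e.path == path and e.start_line <= line <= e.end_line
    ]
    return min(candidates, key=lambda e: e.end_line - e.start_line, default=None)


def is_api_related(elements, element_id, seen=None):
    """Naive approach: follow used_by links until an API entry point is found."""
    seen = set() if seen is None else seen
    if element_id in seen:  # guard against reference cycles
        return False
    seen.add(element_id)
    element = elements[element_id]
    if element.is_api_entry:
        return True
    return any(is_api_related(elements, parent, seen) for parent in element.used_by)


def premark_api_relevant(elements):
    """Pre-processed approach: compute the API-relevant set once, up front."""
    return {eid for eid in elements if is_api_related(elements, eid)}


elements = {
    "fn:get_user": Element("fn:get_user", "function", "routes.py", 10, 20,
                           is_api_entry=True),
    "fn:load_user": Element("fn:load_user", "function", "db.py", 1, 8,
                            used_by=["fn:get_user"]),
    "param:load_user.user_id": Element("param:load_user.user_id", "parameter",
                                       "db.py", 1, 1, used_by=["fn:load_user"]),
    "fn:format_log": Element("fn:format_log", "function", "util.py", 1, 5),
}

changed = enclosing_element(elements, "db.py", 4)
marked = premark_api_relevant(elements)
print(changed.element_id, changed.element_id in marked)
```

The recursive check does the graph walk on every query, while pre-marking pays that cost once and turns later lookups into set membership tests; which trade-off fits depends on how often the element graph changes.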


Last updated 5 days ago
