08 Release Engineering¶

Recap¶

Review¶

The key topics include the role of release engineers, the principles guiding release engineering at Google, tools like Rapid, Blaze/Bazel, Midas Package Manager (MPM), and Sisyphus. Also, there's a focus on philosophies like self-service models, high velocity, hermetic builds, and policy enforcement.

The role of release engineers, core principles, tools and workflows, configuration management, and key takeaways. The user might be looking for a concise summary that highlights how Google approaches release engineering, emphasizing automation, scalability, and reliability.

The four major principles: self-service model, high velocity, hermetic builds, and enforcement of policies. Also, mention the tools like Rapid, Blaze, MPM, and Sisyphus, explaining their roles in the release process. The deployment strategies like canary deployments and the use of Sisyphus for rollouts are important too.

The user might also be interested in the collaboration between SREs and release engineers, especially in managing configurations and ensuring reliable releases. The conclusion about starting release engineering early in the project lifecycle is a key takeaway.

Highlight the importance of hermetic builds, automation, and the balance between velocity and reliability.

Definitions¶

1. Role of Release Engineers¶

Purpose: Ensure software is built, tested, packaged, and deployed reliably, consistently, and automatically.
Responsibilities:
- Define release processes and best practices.
- Develop tools (e.g., Rapid, Midas Package Manager) for scalable, repeatable releases.
- Collaborate with SREs and developers to align release practices with reliability goals.

2. Core Principles of Release Engineering¶

Self-Service Model:
- Teams own their release processes via automated tools (e.g., Rapid).
- Enables high velocity with minimal manual intervention.
High Velocity:
- Frequent releases (e.g., hourly builds) reduce delta between versions, easing testing and troubleshooting.
- "Push on Green": Deploy automatically when tests pass.
Hermetic Builds:
- Builds are isolated from external dependencies (e.g., libraries, tools).
- Ensures reproducibility: same code + same revision = identical binaries.
- Cherry-picking: Fix bugs in production by rebuilding the original revision with specific changes.
Policy Enforcement:
- Security via access controls (code reviews, gated operations).
- Audit trails track changes (e.g., who approved a cherry-pick).

3. Tools & Workflow¶

Blaze/Bazel:
- Google’s build tool for compiling code (C++, Java, Go, etc.).
- Defines dependencies and build targets.
Rapid:
- Automates releases using blueprints (config files defining build, test, deployment rules).
- Integrates with Borg for scalable execution.
- Workflow:
  1. Create a release branch from the mainline.
  2. Compile code and run tests in parallel.
  3. Deploy via canary (small rollout) or full production push.****
Midas Package Manager (MPM):
- Packages binaries/configurations with unique hashes and labels (e.g., canary, production).
- Labels auto-update to the latest version (e.g., deploying canary always fetches the newest labeled package).
Sisyphus:
- SRE-developed rollout framework for complex deployments (e.g., phased geographic rollouts).

4. Configuration Management¶

Strategies to avoid instability:

Mainline Configuration:
- Store configs in the main repository; decouples binary releases from config changes.
- Risk: Skew between checked-in and running configs.
Configs in MPM Packages:
- Bundle configs with binaries for simplicity.
Configuration Packages:
- Snapshot configs alongside binaries using build IDs.
- Enables independent updates (e.g., fix a flag without rebuilding the binary).
External Stores:
- Dynamic configs stored in Chubby, Bigtable, or source-based filesystems.

5. Deployment Strategies¶

Risk-Based Rollouts:
- Canary: Deploy to small clusters first.
- Exponential: Rapid expansion for user-facing services.
- Multi-Day: Gradual rollouts for critical infrastructure.
Rollback:
- Automated via Sisyphus or MPM labels (revert to a prior version).

6. Key Takeaways¶

Automation ≠ Autonomy: Tools like Rapid and Sisyphus reduce toil but require human oversight for edge cases.
Hermeticity is Critical: Ensures builds are reproducible and auditable.
Start Early: Integrate release engineering at the project’s start to avoid retrofitting.
Collaboration: SREs, developers, and release engineers must align on policies and tools.

Example Workflow: Code commit → Blaze build → Rapid tests → MPM packaging → Sisyphus rollout → Production.

This chapter underscores that release engineering is foundational to reliability, not an afterthought. By automating builds, enforcing policies, and decoupling configurations, Google ensures scalable, consistent releases while minimizing human error.