This document describes the automated agents, controllers, and background services used in the ERT (Errata Reliability Team) release-tests project. These automated systems help manage OpenShift z-stream releases by handling testing, notifications, and release workflows.
- Overview
- Automated Agents and Services
- OAR CLI Tool (OpenShift Automatic Release)
- OAR Core Modules
- Integration and Workflow
- Deployment Considerations
- Monitoring and Observability
- Troubleshooting
- Version Compatibility
The ERT release-tests project contains several automated agents and services that work together to streamline the OpenShift release process. These systems monitor release streams, trigger tests, aggregate results, send notifications, and support release operations.
Location: oar/controller/detector.py
Purpose: Automatically detects new z-stream releases by comparing the latest z-stream version from ART (the OpenShift Automated Release Tooling team) with the latest stable version from the release stream.
How it works:
- Fetches the latest z-stream version from a GitHub repository file maintained by ART
- Fetches the latest stable version from the OpenShift release stream API
- Compares versions using semantic versioning
- When a new z-stream release is detected, automatically creates a test report to kick off the QE release flow (see the create-test-report CLI command)
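The comparison step above can be sketched as follows. This is a minimal illustration, not the project's ReleaseDetector code: the helper names `parse_version` and `is_new_zstream` are hypothetical, and the sketch assumes plain `x.y.z` version strings.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split an 'x.y.z' string into integers so the comparison is numeric."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_new_zstream(art_version: str, stable_version: str) -> bool:
    """Return True when ART's z-stream version is ahead of the latest stable one."""
    return parse_version(art_version) > parse_version(stable_version)


# A string comparison would rank "4.13.9" above "4.13.10"; tuples of ints do not.
print(is_new_zstream("4.13.10", "4.13.9"))
```

Comparing integer tuples rather than raw strings is what keeps a two-digit patch level from sorting below a one-digit one.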
Trigger: Can be run on-demand via CLI command:
oarctl start-release-detector -r <minor-release>
Key Components:
- ReleaseDetector.get_latest_zstream_version() - Gets latest version from ART
- ReleaseDetector.get_latest_stable_version() - Gets latest stable from release stream
- ReleaseDetector.compare_versions() - Compares versions
- ReleaseDetector.start() - Entry point that orchestrates detection and report creation
Location: prow/job/controller.py
Purpose: Monitors OpenShift release streams for new builds (nightly or stable) and automatically triggers Prow test jobs when new builds are detected.
How it works:
- Continuously monitors release streams for new builds across multiple architectures (amd64, arm64, multi, ppc64le, s390x)
- Compares the latest build with the current tracked build
- When a new build is detected:
  - Updates the current build tracking file in GitHub
  - Triggers configured Prow jobs based on the job registry
  - Saves test result tracking files to GitHub
- Supports both nightly builds and stable builds
- Handles upgrade jobs and regular installation jobs
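One monitoring cycle from the steps above might look like the sketch below. It is an illustration under stated assumptions: `poll_once` and its callable parameters are hypothetical stand-ins for the release stream, the GitHub tracking file, and the Prow trigger, not the actual JobController methods.

```python
from typing import Callable


def poll_once(
    get_latest_build: Callable[[], str],
    get_current_build: Callable[[], str],
    save_current_build: Callable[[str], None],
    trigger_jobs: Callable[[str], None],
) -> bool:
    """Run one monitoring cycle; return True if a new build was handled."""
    latest = get_latest_build()
    if latest == get_current_build():
        return False  # nothing new on the release stream
    save_current_build(latest)  # update the tracking record first
    trigger_jobs(latest)  # then start the configured test jobs
    return True


# Usage with in-memory stand-ins for the release stream and GitHub.
state = {"current": "build-1"}
triggered: list[str] = []
handled = poll_once(
    get_latest_build=lambda: "build-2",
    get_current_build=lambda: state["current"],
    save_current_build=lambda build: state.update(current=build),
    trigger_jobs=triggered.append,
)
```

Passing the external calls in as parameters keeps the cycle testable without network access; a continuous service would simply call it in a loop with a delay.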
Trigger: Can be run as a continuous monitoring service or on-demand:
jobctl start-controller -r <release> --nightly --arch <architecture>
Key Components:
- JobController - Main controller class
- JobController.get_latest_build() - Fetches latest build from release stream
- JobController.get_current_build() - Gets currently tracked build from GitHub
- JobController.trigger_prow_jobs() - Triggers all configured test jobs for a build
- TestJobRegistry - Manages test job definitions per release/architecture
- GithubUtil - Handles file operations in the GitHub repository
Environment Variables Required:
- GITHUB_TOKEN - GitHub authentication
- APITOKEN - Prow/Gangway API authentication
Location: prow/job/controller.py
Purpose: Continuously processes test results from Prow jobs, determines pass/fail status, implements retry logic for failed tests, and promotes builds when all required tests pass.
How it works:
- Scans GitHub repository for test result files
- For each test result file:
- Checks if all jobs are completed
- Fetches detailed test results from GCS artifacts
- Implements retry logic for failed non-optional jobs
- Calculates metrics (total, success, failed, pending, required)
- Determines if build is QE accepted (all required jobs pass)
- Updates releasepayload with QE acceptance label in OpenShift cluster
- Marks result files as aggregated when complete
- Deletes result files for recycled/old nightly builds
Trigger: Runs as a continuous monitoring service:
jobctl start-aggregator --arch <architecture>
Key Components:
- TestResultAggregator - Main aggregator class
- TestResultAggregator.start() - Main processing loop
- TestMetrics - Tracks test result statistics
- ProwJobResult - Represents individual Prow job results
- TestJobResult - Represents test job with retries
- Artifacts - Fetches test reports from GCS
Environment Variables Required:
- GITHUB_TOKEN - GitHub authentication
- APITOKEN - Prow/Gangway API authentication
- GCS_CRED_FILE - Google Cloud Storage credentials for artifact access
Acceptance Criteria: A build is marked as "QE Accepted" when all required (non-optional) jobs pass, either on first run or after retries.
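The acceptance rule can be illustrated with a small sketch. The `JobOutcome` record, the status strings, and `summarize` are hypothetical names chosen for this example; the real TestMetrics and TestJobResult classes may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class JobOutcome:
    """Hypothetical record for one test job and its retry attempts."""
    name: str
    optional: bool = False
    attempts: list[str] = field(default_factory=list)  # e.g. ["failure", "success"]

    @property
    def status(self) -> str:
        if "success" in self.attempts:
            return "success"  # a passing retry counts as a pass
        if not self.attempts or self.attempts[-1] == "pending":
            return "pending"
        return "failed"


def summarize(jobs: list[JobOutcome]) -> dict:
    """Count job states and decide acceptance from the required jobs only."""
    required = [job for job in jobs if not job.optional]
    counts = {
        "total": len(jobs),
        "required": len(required),
        "success": sum(job.status == "success" for job in jobs),
        "failed": sum(job.status == "failed" for job in jobs),
        "pending": sum(job.status == "pending" for job in jobs),
    }
    counts["accepted"] = all(job.status == "success" for job in required)
    return counts


jobs = [
    JobOutcome("install", attempts=["failure", "success"]),
    JobOutcome("upgrade", attempts=["success"]),
    JobOutcome("optional-e2e", optional=True, attempts=["failure"]),
]
summary = summarize(jobs)
```

In this example the failed optional job is counted in the metrics but does not block acceptance, because only required jobs feed the `accepted` flag.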
Location: oar/notificator/jira_notificator.py
Purpose: Automatically monitors Jira issues in ON_QA status and sends escalating notifications to QA contacts, team leads, and managers when issues remain unverified for extended periods.
How it works:
- Queries Jira for OCPBUGS issues in ON_QA status for active z-stream releases
- Tracks when issues transitioned to ON_QA status
- Implements a three-tier escalation process based on weekday hours:
  - After 24 weekday hours: Notifies QA Contact
  - After 48 weekday hours: Notifies Team Lead (currently still notifies QA contact)
  - After 72 weekday hours: Notifies Manager
- If contacts are missing, falls back to notifying assignee and their manager
- Integrates with LDAP to look up manager information
- Only counts weekday hours (Monday-Friday) for time calculations
- Adds Jira comments with @mentions to notify responsible people
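The weekday-hour calculation and the escalation tiers can be sketched as below. This is an illustrative approximation rather than the NotificationService implementation: `weekday_hours_between` and `escalation_tier` are hypothetical helpers, and the sketch assumes naive datetimes in a single time zone.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def weekday_hours_between(start: datetime, end: datetime) -> float:
    """Hours elapsed between two naive datetimes, skipping Saturday and Sunday."""
    total = timedelta()
    cursor = start
    while cursor < end:
        # Walk forward one calendar day at a time.
        next_midnight = datetime.combine(cursor.date() + timedelta(days=1), time.min)
        segment_end = min(next_midnight, end)
        if cursor.weekday() < 5:  # Monday=0 ... Friday=4
            total += segment_end - cursor
        cursor = segment_end
    return total.total_seconds() / 3600


def escalation_tier(hours_on_qa: float) -> Optional[str]:
    """Map elapsed weekday hours to the tier that should be notified."""
    if hours_on_qa > 72:
        return "MANAGER"
    if hours_on_qa > 48:
        return "TEAM_LEAD"
    if hours_on_qa > 24:
        return "QA_CONTACT"
    return None


# Moved to ON_QA on a Friday at noon, checked the following Monday at 13:00.
hours = weekday_hours_between(datetime(2024, 1, 5, 12), datetime(2024, 1, 8, 13))
tier = escalation_tier(hours)
```

Here the weekend contributes nothing, so 12 hours on Friday plus 13 hours on Monday give 25 weekday hours and the first tier is reached.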
Trigger: Can be run on-demand or scheduled via cron:
oarctl jira-notificator [--dry-run] [--from-date YYYY-MM-DD]
Key Components:
- NotificationService - Main notification service class
- NotificationType - Enum defining notification types (QA_CONTACT, TEAM_LEAD, MANAGER, ASSIGNEE)
- NotificationService.check_issue_and_notify_responsible_people() - Main logic
- NotificationService.is_more_than_24_weekday_hours() - Calculates weekday hours
- NotificationService.get_on_qa_issues() - Queries Jira for ON_QA issues
- LdapHelper - LDAP integration for manager lookup
Environment Variables Required:
- JIRA_TOKEN - Jira authentication token
Options:
- --dry-run - Test mode that doesn't send actual Jira comments
- --from-date - Only process issues that transitioned to ON_QA after this date
- --search-batch-size - Number of issues to fetch per batch (default: 100)
Location: tools/slack_message_receiver.py
Purpose: Provides a Slack bot interface that listens for messages in Slack channels and executes OAR CLI commands or answers questions using an AI model.
How it works:
- Connects to Slack using Socket Mode (WebSocket connection)
- Listens for messages and @mentions in configured channels
- Detects OAR commands (messages starting with oar or oarctl)
- Executes OAR commands and posts results back to the Slack thread
- Optionally integrates with an LLM (large language model) to answer general questions
- Sets environment variables to enable background Slack notifications from OAR commands
- Splits large command outputs into multiple messages to avoid Slack limits
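Command detection and output splitting, as described in the list above, could be sketched like this. Both functions are hypothetical illustrations (the real bot uses is_oar_related_message() and the project's own splitting helper), and the character limit is passed in as a parameter because the source does not state the value used.

```python
def is_oar_command(text: str) -> bool:
    """True when the first word of the message is 'oar' or 'oarctl'."""
    words = text.strip().split(maxsplit=1)
    return bool(words) and words[0] in ("oar", "oarctl")


def split_output(output: str, limit: int) -> list[str]:
    """Break output into chunks of at most `limit` characters, on line boundaries where possible."""
    chunks: list[str] = []
    current = ""
    for line in output.splitlines(keepends=True):
        while len(line) > limit:  # a single over-long line is cut hard
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


parts = split_output("aaa\nbbb\nccc\n", limit=8)
```

Splitting on line boundaries keeps each posted message readable, and joining the chunks back together reproduces the original output exactly.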
Trigger: Runs as a long-running service/daemon:
python tools/slack_message_receiver.py
Key Components:
- SocketModeClient - Slack WebSocket client
- process() - Main message handler
- is_oar_related_message() - Detects OAR commands
- send_prompt_to_ai_model() - Optional LLM integration
- get_username() - Fetches Slack user information
Environment Variables Required:
- SLACK_APP_TOKEN - Slack app-level token for Socket Mode
- SLACK_BOT_TOKEN - Slack bot token for API calls
Optional (for AI features):
- MODEL_API_BASE - OpenAI-compatible API endpoint
- MODEL_API_KEY - API authentication key
- MODEL_API_NAME - Model name to use
Features:
- Command execution in Slack threads
- Automatic code formatting of command output
- Thread-based responses to keep conversations organized
- Optional AI-powered question answering
Location: tools/auto_release_test_result_checker.py
Purpose: Monitors the GitHub repository for rejected test results and sends Slack notifications when builds fail QE acceptance criteria.
How it works:
- Scans the _releases directory in the GitHub repository for test result JSON files
- Checks if test results have "accepted": false (build rejected)
- Ignores manually promoted builds
- Tracks previously notified files to avoid duplicate notifications
- Sends Slack notifications with links to rejected build result files
- Limits initial notifications when first run
- Sorts files by name (reverse order) to process newest first
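The filtering logic above can be sketched as follows. Only the "accepted" key comes from the description; the "manual_promotion" field name, the file names, and the `find_rejected` helper are assumptions made for illustration.

```python
import json
from typing import Optional


def find_rejected(files: dict[str, str], notified: set[str],
                  limit: Optional[int] = None) -> list[str]:
    """Return names of rejected result files that have not been reported yet.

    `files` maps file name to JSON text. The "manual_promotion" key is an
    assumed field name used only for this illustration.
    """
    rejected = []
    for name in sorted(files, reverse=True):  # newest first by name
        if name in notified:
            continue  # already reported, avoid a duplicate notification
        result = json.loads(files[name])
        if result.get("manual_promotion"):
            continue  # manually promoted builds are ignored
        if result.get("accepted") is False:
            rejected.append(name)
    return rejected if limit is None else rejected[:limit]


files = {
    "build-001.json": '{"accepted": false}',
    "build-002.json": '{"accepted": true}',
    "build-003.json": '{"accepted": false, "manual_promotion": true}',
    "build-004.json": '{"accepted": false}',
}
to_notify = find_rejected(files, notified={"build-001.json"})
```

After sending notifications, the checker would add the returned names to its tracking file so that the next run skips them.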
Trigger: Typically run on a schedule (cron job):
python tools/auto_release_test_result_checker.py \
--repo-name openshift/release-tests \
--slack-channel <channel-name> \
[--notified-file-path notified_files.txt] \
[--limit 5]
Key Components:
- TestResultChecker - Main checker class
- TestResultChecker.iterate_test_result_files() - Scans and processes files
- TestResultChecker.check_file() - Checks individual result file
- TestResultChecker.send_slack_notification() - Sends notification
- File tracking system to avoid duplicate notifications
Environment Variables Required:
- GITHUB_TOKEN - GitHub authentication
- SLACK_BOT_TOKEN - Slack bot token
Options:
- --repo-name - GitHub repository (format: owner/repo)
- --slack-channel - Target Slack channel for notifications
- --notified-file-path - File to track previously notified builds (default: notified_files.txt)
- --limit - Maximum notifications on first run (default: 5)
- --path - Path in repo to scan (default: _releases)
- --branch - Branch to scan (default: record)
The OAR CLI is the primary command-line interface for managing OpenShift z-stream releases. It provides a comprehensive set of commands that automate QE tasks throughout the release lifecycle.
Location: oar/cli/
Purpose: Provides interactive commands for release managers and QE engineers to execute release tasks, manage advisories, handle bug tracking, trigger tests, and coordinate release workflows.
git clone git@github.com:openshift/release-tests.git
cd release-tests
pip3 install -e .
OAR CLI:
- OAR_JWK - Encryption key for config_store.json (stored in Bitwarden: openshift-qe-trt-env-vars)
- JIRA_TOKEN - Jira personal access token for API access
- GCP_SA_FILE - Google Cloud Platform service account credentials file path (optional for new releases using StateBox; required for old releases with Google Sheets)
- SLACK_BOT_TOKEN - Slack bot token for sending notifications
- JENKINS_USER - Jenkins username (email)
- JENKINS_TOKEN - Jenkins API token
- GITLAB_TOKEN - GitLab personal access token for shipment MR operations
- Kerberos ticket - Required for Errata Tool and LDAP access: kinit $kid@$domain
- Bugzilla credentials - Cached in ~/.config/python-bugzilla/bugzillarc
Release Detector:
- All OAR CLI environment variables (calls the create-test-report command)
- GITHUB_TOKEN - GitHub personal access token for monitoring repository file changes
Job Controller:
- GITHUB_TOKEN - GitHub personal access token for repository operations
- APITOKEN - Prow/Gangway API token for triggering test jobs
Test Result Aggregator:
- GITHUB_TOKEN - GitHub personal access token for repository operations
- APITOKEN - Prow/Gangway API token for triggering test jobs
- GCS_CRED_FILE - Google Cloud Storage credentials file path for test artifact access
Jira Notificator:
- JIRA_TOKEN - Jira personal access token for API access
- Kerberos ticket - For LDAP manager lookup
Slack Message Receiver (Release Bot):
- SLACK_APP_TOKEN - Slack app-level token for Socket Mode
- SLACK_BOT_TOKEN - Slack bot token for API calls
- All OAR CLI environment variables (executes OAR commands)
Test Result Checker:
- GITHUB_TOKEN - GitHub personal access token for repository operations
- SLACK_BOT_TOKEN - Slack bot token for sending notifications
Note: OAR_SLACK_CHANNEL and OAR_SLACK_THREAD are set internally by the Slack bot when executing commands and should not be configured manually by users.
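A small preflight check can catch missing credentials before an agent starts. The sketch below is an assumption-laden illustration: the agent labels and the `missing_env` helper are hypothetical, and only a subset of the variables listed above is included.

```python
import os

# Variable names are taken from the lists above; the agent labels are hypothetical.
REQUIRED_ENV = {
    "job-controller": ["GITHUB_TOKEN", "APITOKEN"],
    "test-result-aggregator": ["GITHUB_TOKEN", "APITOKEN", "GCS_CRED_FILE"],
    "jira-notificator": ["JIRA_TOKEN"],
    "test-result-checker": ["GITHUB_TOKEN", "SLACK_BOT_TOKEN"],
}


def missing_env(agent: str, environ=os.environ) -> list[str]:
    """Return the required variables that are unset or empty for an agent."""
    return [name for name in REQUIRED_ENV[agent] if not environ.get(name)]


# Example with an explicit mapping instead of the real process environment.
gaps = missing_env("test-result-aggregator", {"GITHUB_TOKEN": "x", "APITOKEN": "y"})
```

Running such a check at startup turns a late authentication failure into an immediate, readable error message.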
The OAR CLI is built using the Click framework with a group-based command structure:
oar -r <release-version> [OPTIONS] COMMAND [ARGS]
Global Options:
- -r, --release - z-stream release version (e.g., 4.13.6) - Required
- -v, --debug - Enable debug logging
- -V, --version - Show version information
- -h, --help - Show help message
Command: oar -r <release> create-test-report
Purpose: Initializes release state tracking for a new z-stream release.
What it does:
- For new releases: Creates StateBox at _releases/{y-stream}/statebox/{release}.yaml with:
  - Metadata from ConfigStore (advisories, JIRA ticket, candidate builds, shipment MR)
  - Initial task status (all "Not Started")
  - Empty issues list for tracking blocking problems
- Sends Slack notification to QE release team
- For old releases: Detects existing Google Sheets test report (backward compatibility)
- Automatically determines whether to use StateBox or Google Sheets
Output:
- New releases: StateBox GitHub URL
- Old releases: Google Sheets URL (if exists)
Note: StateBox is the primary system for new releases. Google Sheets is used only for legacy releases created before the StateBox migration.
Command: oar -r <release> take-ownership -e <owner-email>
Purpose: Assigns ownership of advisory and related JIRA subtasks to a specified person.
Options:
- -e, --email - Email address of the new owner
What it does:
- Takes ownership of the release advisory in Errata Tool
- Updates ownership of QE-related JIRA subtasks created by ART team
- Ensures proper assignment throughout the release workflow
Command: oar -r <release> update-bug-list
Purpose: Synchronizes bug status between Bugzilla/Jira and the test report, and sends notifications.
What it does:
- Fetches latest bug status from Bugzilla/Jira
- Updates test report with current bug states (Verified/Closed)
- Appends newly attached bugs to the report
- Sends Slack notifications to QA Contacts for bugs needing attention
- Should be run multiple times throughout the release cycle
Use Case: Run periodically to keep bug tracking up-to-date (Errata flow only).
Note: This command raises an error if executed for Konflux flow releases.
Command:
oar -r <release> image-consistency-check
oar -r <release> image-consistency-check -n <build-number>
Purpose: Verifies that images in the release payload are consistent with advisory contents.
Options:
- -n, --build-number - Jenkins build number to check status (for subsequent runs)
What it does:
- Triggers a Jenkins job to verify image consistency
- Compares images in release payload with images in advisories
- Returns build number on first run
- Can check job status on subsequent runs with build number
Workflow:
- First run: Triggers job, returns build number
- Subsequent runs: Check status using -n <build-number>
Command: oar -r <release> check-greenwave-cvp-tests
Purpose: Validates that all Greenwave CVP (Container Verification Pipeline) tests have passed or been waived.
What it does:
- Checks CVP test status for all advisories in the release
- Reports tests with status: PASSED, WAIVED, or FAILED
- Provides test IDs and advisory numbers for failed tests
- Enables triggering "Refetch" for failed tests
Expected Result: All tests should be PASSED or WAIVED before proceeding.
Note: If tests fail after refetch, contact CVP team via Google Spaces [CVP].
Command: oar -r <release> check-cve-tracker-bug
Purpose: Identifies any CVE tracker bugs that may have been missed for the current release.
What it does:
- Calls rh-elliott to scan for CVE tracker bugs
- Checks if all CVE tracker bugs are properly tracked
- Sends Slack notification to ART team if any bugs are found missing
- Helps ensure security vulnerabilities are properly tracked
Command: oar -r <release> push-to-cdn-staging
Purpose: Triggers the push job to promote release artifacts to CDN staging environment.
Prerequisites: All Greenwave CVP tests must be PASSED or WAIVED.
What it does:
- Triggers push job for default "stage" target
- Does not interrupt existing running jobs
- Prepares release for stage testing
Command:
oar -r <release> stage-testing
oar -r <release> stage-testing -n <build-number>
Purpose: Triggers the stage pipeline to perform testing in the staging environment.
Options:
- -n, --build-number - Jenkins build number to check status (for subsequent runs)
What it does:
- Triggers Jenkins stage testing pipeline
- Returns build number on first run
- Can check job status on subsequent runs with build number
Workflow:
- First run: Triggers job, returns build number
- Subsequent runs: Check status using -n <build-number>
Command: oar -r <release> image-signed-check
Purpose: Verifies that the release payload images are properly signed.
What it does:
- Automatically retrieves digest of stable build
- Checks if the signed image can be found on the mirror site
- Validates signature integrity
- Ensures release meets signing requirements
Command: oar -r <release> drop-bugs
Purpose: Manages bugs that are not yet verified, intelligently handling high-severity cases.
What it does:
- Scans all bugs from advisories that are not verified
- Identifies "high severity" bugs:
  - Critical priority bugs
  - CVE Tracker bugs
  - Customer Case-related bugs
- For high severity bugs: Sends Slack notification to bug owner for confirmation
- For other bugs: Automatically drops them from the advisory
- Helps clean up advisory before release
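The split between bugs that need confirmation and bugs that can be dropped automatically might be sketched as below. The `Bug` record and `plan_drop` function are hypothetical; the real logic lives in the Jira and advisory modules and may apply further criteria.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    """Hypothetical, simplified view of a bug attached to an advisory."""
    key: str
    verified: bool
    critical: bool = False
    cve_tracker: bool = False
    customer_case: bool = False

    @property
    def high_severity(self) -> bool:
        return self.critical or self.cve_tracker or self.customer_case


def plan_drop(bugs: list[Bug]) -> tuple[list[str], list[str]]:
    """Return (keys needing owner confirmation, keys that can be dropped)."""
    unverified = [bug for bug in bugs if not bug.verified]
    needs_confirmation = [bug.key for bug in unverified if bug.high_severity]
    can_drop = [bug.key for bug in unverified if not bug.high_severity]
    return needs_confirmation, can_drop


bugs = [
    Bug("OCPBUGS-1", verified=True),
    Bug("OCPBUGS-2", verified=False, cve_tracker=True),
    Bug("OCPBUGS-3", verified=False),
]
confirm, drop = plan_drop(bugs)
```

Verified bugs never appear in either list, so they stay on the advisory untouched.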
Command: oar -r <release> change-advisory-status
Purpose: Changes advisory status (e.g., to REL_PREP) and finalizes QE tasks.
What it does:
- Changes advisory status in Errata Tool
- Closes QE-related JIRA subtasks
- Checks for blocking secalerts (for RHSA advisories)
- Throws appropriate error messages if validation fails
- Final step before release approval
A typical z-stream release workflow using OAR commands:
1. create-test-report # Initialize release tracking
2. take-ownership # Assign ownership
3. update-bug-list # First bug sync
(wait for builds/testing)
4. update-bug-list # Periodic bug updates (run multiple times)
5. image-consistency-check # Verify payload images
6. check-greenwave-cvp-tests # Validate CVP tests
7. check-cve-tracker-bug # Verify CVE coverage
8. push-to-cdn-staging # Push to staging
9. stage-testing # Run stage tests
10. image-signed-check # Verify signatures
11. drop-bugs # Clean up unverified bugs
12. change-advisory-status # Finalize and approve
Location: oar/core/configstore.py
The ConfigStore class manages release-specific configuration:
- Loads encrypted configuration from config_store.json
- Stores advisory information, Jira references, Google Sheet URLs
- Provides context for all OAR commands
- Uses JWE encryption for sensitive data (decrypted using the OAR_JWK environment variable)
StateBox (Primary for New Releases):
All OAR commands use util.log_task_status() to output status markers:
- Logs format: "task [{Display Name}] status is changed to [{Status}]"
- cli_result_callback automatically parses last line of command output
- Task states: "Not Started" → "In Progress" → "Pass" / "Fail"
- Task results and timestamps recorded in StateBox YAML
- Issues tracked with blocker/non-blocker classification
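Parsing the status marker from the last line of command output can be sketched with a regular expression. This is an illustration of the idea, not the cli_result_callback implementation; `parse_task_status` is a hypothetical helper and the display name in the example is made up.

```python
import re
from typing import Optional

# Matches the documented marker: task [{Display Name}] status is changed to [{Status}]
STATUS_LINE = re.compile(r"task \[(?P<task>.+?)\] status is changed to \[(?P<status>.+?)\]")


def parse_task_status(output: str) -> Optional[tuple[str, str]]:
    """Extract (task display name, status) from the last line of command output."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    match = STATUS_LINE.search(lines[-1])
    return (match["task"], match["status"]) if match else None


output = "checking images...\ntask [Image consistency check] status is changed to [In Progress]"
parsed = parse_task_status(output)
```

Because only the last line is inspected, a command must log its status marker as its final output for the state to be picked up.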
Google Sheets (Legacy Releases Only):
- Old releases use WorksheetManager for test report tracking
- Task status updated via explicit report.update_task_status() calls
- Overall status tracked as "Green" / "Red"
- Provides backward compatibility for releases before StateBox migration
The OAR CLI can be:
- Executed manually by release managers and QE engineers
- Triggered automatically by the Slack bot (via tools/slack_message_receiver.py)
- Called programmatically by automated agents (e.g., Release Detector calls create-test-report)
The OAR CLI is built on a foundation of core modules located in oar/core/ that provide essential functionality for interacting with external systems and managing release workflows.
All core modules follow a consistent pattern:
- Manager/Helper classes that encapsulate API interactions
- Exception handling with custom exception types
- Integration with ConfigStore for configuration management
- Logging for observability
Purpose: Manages interactions with Red Hat Errata Tool for advisory operations.
Key Classes:
- AdvisoryManager - High-level manager for advisory operations
- Advisory - Wrapper around Erratum with extended functionality
Key Functionality:
- Get/update advisories for a release
- Change advisory ownership (QE email)
- Check Greenwave CVP test status
- Push advisories to CDN (stage/live)
- Change advisory status (QE → REL_PREP → SHIPPED LIVE)
- Drop bugs from advisories
- Check CVE tracker bugs using Elliott
- Validate advisory health grades (A/B/C/D/F)
- Check for blocking security alerts
- Manage advisory dependencies (blocking advisories)
Key Methods:
- get_advisories() - Get all advisories for current release
- change_ad_owners() - Change QA owner across all advisories
- check_greenwave_cvp_tests() - Validate CVP test results
- push_to_cdn_staging() - Trigger CDN push jobs
- change_advisory_status() - Move advisory through workflow states
- drop_bugs() - Remove unverified non-CVE bugs
- check_cve_tracker_bug() - Find missing CVE tracker bugs
Dependencies: Requires Kerberos ticket for Errata Tool access
Purpose: Manages Google Sheets test reports for tracking release progress.
Key Classes:
- WorksheetManager - Creates and manages test report worksheets
- TestReport - Wrapper for worksheet operations
Key Functionality:
- Create test reports from templates
- Update advisory/shipment information
- Update candidate build information
- Track bug status (ON_QA, Verified, Closed, Dropped)
- Update task checklist status (Pass/Fail/In Progress)
- Manage overall status (Green/Red)
- Add hyperlinks with advanced formatting
- Track CVE tracker bugs
- Support both Errata and Konflux workflows
Key Methods:
- create_test_report() - Create new report from template
- update_bug_list() - Sync bug status with Jira/Bugzilla
- update_task_status() - Update checklist item status
- are_all_bugs_verified() - Check if all bugs are verified
- update_cell_with_hyperlinks() - Advanced cell formatting with links
Dependencies: Requires Google Service Account credentials
Purpose: Manages interactions with Red Hat Jira for issue tracking.
Key Classes:
- JiraManager - Jira API client wrapper
- JiraIssue - Wrapper for Jira issue with helper methods
Key Functionality:
- Get/create/update Jira issues
- Query issue status and metadata
- Change issue assignees
- Manage ART subtasks
- Identify high-severity issues (Critical, Blocker, Customer Cases, CVE)
- Filter issues by verification status
- Create CVP failure tracking issues
Key Methods:
- get_issue() - Fetch issue by key
- create_issue() - Create new Jira issue
- get_sub_tasks() - Get subtasks from parent ticket
- change_assignee_of_qe_subtasks() - Reassign QE subtasks
- close_qe_subtasks() - Close QE subtasks when release completes
- get_unverified_cve_issues() - Find unverified CVE bugs
- get_high_severity_and_can_drop_issues() - Categorize bugs for dropping
Issue Classification:
- is_cve_tracker() - CVE security bugs
- is_critical_issue() - Critical/Blocker priority or TestBlocker label
- is_customer_case() - Has SFDC cases attached
- is_high_severity_issue() - Any of the above
Purpose: Sends notifications via Slack (and optionally email) for release events.
Key Classes:
- NotificationManager - High-level notification orchestrator
- SlackClient - Slack API wrapper
- MessageHelper - Formats messages for different notification types
Key Functionality:
- Send Slack messages to channels and threads
- User/group ID lookup for @mentions
- Format messages with hyperlinks
- Split large messages to respect Slack limits
- Support for threaded responses (OAR_SLACK_CHANNEL/OAR_SLACK_THREAD env vars)
- Message template system for consistent formatting
Notification Types:
- New StateBox creation (new releases)
- New test report creation (legacy releases)
- Ownership changes (advisories/subtasks)
- Bug verification requests
- High severity bug confirmations
- CVE tracker bug alerts
- Advisory health warnings
- Jenkins build status
- Shipment MR updates
- Release approval completion
Key Methods:
- share_new_statebox() - Notify about new StateBox creation (new releases)
- share_new_report() - Notify about new test report (legacy releases)
- share_bugs_to_be_verified() - Remind QA contacts to verify bugs
- share_high_severity_bugs() - Confirm dropping high-severity bugs
- share_new_cve_tracker_bugs() - Alert about missing CVE trackers
- share_release_approval_completion() - Notify about release completion
Purpose: Manages GitLab merge requests for Konflux shipment data.
Key Classes:
- ShipmentData - Main interface for shipment operations
- GitLabMergeRequest - GitLab MR wrapper with rich functionality
- GitLabServer - Server-level GitLab operations
- ImageHealthData - Container for image health check results
Key Functionality:
- Parse and interact with shipment YAML files
- Extract Jira issues from shipment data
- Add QE approval to shipment MRs
- Check pipeline status (stage-release, prod-release)
- Create/update "drop bugs" merge requests
- Check container image health via Pyxis API
- Add comments and suggestions to MRs
- Support for forked repository workflows
Key Methods:
- get_jira_issues() - Extract Jira issues from shipment YAMLs
- add_qe_approval() - Approve shipment MR
- is_stage_release_success() - Check if stage release completed
- drop_bugs() - Create MR to remove unverified bugs
- check_component_image_health() - Validate container image freshness grades
- add_image_health_summary_comment() - Report image health to MR
- check_cve_tracker_bug() - Find missing CVE trackers in shipment
GitLab Features:
- File content retrieval with caching
- Pipeline and stage monitoring
- Auto-merge support
- Comment and suggestion management
- Branch creation and management
Purpose: Triggers and monitors Jenkins CI/CD jobs.
Key Classes:
- JenkinsHelper - Jenkins job orchestration
Key Functionality:
- Trigger stage testing pipeline
- Trigger image consistency check jobs
- Monitor job queue and execution
- Validate job parameters match release version
- Get build status with detailed error handling
Supported Jobs:
- stage-pipeline - Stage environment testing
- image-consistency-check - Verify payload images match advisories
Key Methods:
- call_stage_job() - Trigger stage testing
- call_image_consistency_job() - Trigger image consistency validation
- get_build_status() - Check job status by build number
- is_job_enqueue() - Check if job is queued
Purpose: Common utility functions used across OAR.
Key Functions:
- Version validation: is_valid_z_release(), validate_release_version(), get_y_release()
- URL builders: get_jira_link(), get_advisory_link(), get_ocp_test_result_url()
- Email validation: is_valid_email()
- MR parsing: parse_mr_url() - Extract project/MR ID from GitLab URLs
- Logging: init_logging() - Configure logging with SSL warning suppression
- Message splitting: split_large_message() - Split content for Slack limits
- Payload validation: is_payload_metadata_url_accessible() - Check if release metadata is available
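As an illustration of the version helpers, the sketch below shows one way z-stream validation and y-stream extraction could work. The functions `looks_like_z_release` and `y_release_of` are hypothetical stand-ins, not the project's is_valid_z_release() or get_y_release(), and they assume a strict `x.y.z` format.

```python
import re

# Three dot-separated integers, e.g. 4.13.6 (assumed format for this sketch).
Z_RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def looks_like_z_release(version: str) -> bool:
    """True when the whole string is an x.y.z version."""
    return Z_RELEASE.fullmatch(version) is not None


def y_release_of(version: str) -> str:
    """Return the x.y part of an x.y.z version, or raise ValueError."""
    match = Z_RELEASE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a z-stream version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


y_stream = y_release_of("4.13.6")
```

Using `fullmatch` rather than `match` rejects trailing text such as a stray suffix or newline after the version.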
Purpose: Centralized configuration management for releases.
Key Functionality:
- Load/decrypt config_store.json using JWE encryption (OAR_JWK env var)
- Store release-specific settings (advisories, Jira tickets, builds, owners)
- Provide access to external service credentials
- Support both Errata and Konflux workflow modes
Configuration Data:
- Release version and metadata
- Advisory IDs (image, extras, metadata, rpm, microshift)
- ART Jira ticket reference
- Candidate nightly builds
- Owner email address
- Google Sheets template and service account
- External service URLs and tokens
- Slack channels and user groups
- Shipment MR URL (Konflux flow)
Purpose: Git repository operations for shipment data management.
Key Classes:
- GitHelper - Git command wrapper
Key Functionality:
- Clone repositories with authentication
- Create and checkout branches
- Configure remote repositories
- Commit and push changes
- Support for forked repository workflows
Purpose: LDAP queries for organizational hierarchy.
Key Classes:
- LdapHelper - LDAP client for Red Hat directory
Key Functionality:
- Look up manager email by employee email
- Support Jira notification escalation workflows
Purpose: Provides composite operator classes that orchestrate operations across multiple core modules for complex workflows.
Key Classes:
- ReleaseOwnershipOperator - Handles ownership updates across advisories and shipments
- BugOperator - Manages bug operations across both advisory and shipment sources
- ApprovalOperator - Handles approval operations based on release flow type (Errata or Konflux)
- ImageHealthOperator - Checks image container health for both advisory and shipment data
- CVETrackerOperator - Handles CVE tracker bug checking across both sources
- NotificationOperator - Manages notifications based on release flow type
- LogCaptureHandler - Custom logging handler to capture log messages for background processes
Key Functionality:
- Orchestrate complex operations across multiple modules
- Support both Errata and Konflux workflow modes
- Handle background processes with proper locking mechanisms
- Schedule periodic checks with timeout handling
- Aggregate data from multiple sources (advisories + shipments)
Key Methods:
- ReleaseOwnershipOperator.update_owners() - Update ownership across all sources
- BugOperator.get_jira_issues() - Get combined Jira issues from all sources
- BugOperator.drop_bugs() - Execute bug drop across advisories and shipments
- ApprovalOperator.approve_release() - Execute approval with metadata URL checking
- ImageHealthOperator.check_image_health() - Check container health across sources
- CVETrackerOperator.check_cve_tracker_bugs() - Find missed CVE tracker bugs
Background Processing:
- Implements scheduler with file-based locking to prevent duplicate instances
- Periodic metadata URL accessibility checking (every 30 minutes)
- Automatic advisory status changes when metadata becomes accessible
- Timeout handling (2 days default)
- Proper cleanup and notification on completion/timeout/error
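The combination of file-based locking, periodic checks, and a timeout can be sketched as below. This is a simplified illustration under stated assumptions, not the operators' scheduler: `acquire_lock` and `wait_until` are hypothetical names, elapsed time is approximated by summing the sleep intervals, and the defaults mirror the 30-minute interval and 2-day timeout mentioned above.

```python
import os
import time
from typing import Callable


def acquire_lock(path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def wait_until(check: Callable[[], bool], lock_path: str,
               interval_s: float = 30 * 60, timeout_s: float = 2 * 24 * 3600,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `check` until it succeeds or the timeout elapses."""
    if not acquire_lock(lock_path):
        return "already-running"  # another instance holds the lock
    try:
        waited = 0.0  # approximated by summing sleep intervals
        while waited < timeout_s:
            if check():
                return "done"
            sleep(interval_s)
            waited += interval_s
        return "timeout"
    finally:
        os.remove(lock_path)  # always release the lock on exit
```

A caller would pass a function that tests whether the release metadata URL is reachable and then act on the returned state, for example changing the advisory status on "done" or sending a notification on "timeout".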
Dependencies: Integrates AdvisoryManager, ShipmentData, JiraManager, NotificationManager, WorksheetManager
ConfigStore (foundation)
├── AdvisoryManager
├── WorksheetManager
├── JiraManager
├── NotificationManager
├── ShipmentData
├── JenkinsHelper
└── Utility functions
External APIs:
├── Errata Tool (advisory.py) → Kerberos auth
├── Jira (jira.py) → JIRA_TOKEN
├── Google Sheets (worksheet.py) → Service Account
├── Slack (notification.py) → SLACK_BOT_TOKEN
├── GitLab (shipment.py) → GITLAB_TOKEN
├── Jenkins (jenkins.py) → JENKINS_USER/JENKINS_TOKEN
└── LDAP (ldap.py) → Kerberos auth
Custom exception types in oar/core/exceptions.py:
- AdvisoryException - Errata Tool errors
- StateBoxException - StateBox YAML state management errors
- WorksheetException - Google Sheets errors
- JiraException / JiraUnauthorizedException - Jira errors
- NotificationException - Slack/email errors
- ShipmentDataException - GitLab/shipment errors
- GitLabMergeRequestException / GitLabServerException - GitLab API errors
- JenkinsException - Jenkins errors
These automated agents work together to provide an end-to-end automated release workflow:
- Release Detector identifies new z-stream releases and creates initial test reports
- Job Controller monitors for new builds and triggers Prow test jobs
- Test Result Aggregator processes test results, retries failures, and accepts builds
- Test Result Checker notifies the team about rejected builds via Slack
- Jira Notificator ensures bugs in ON_QA status are verified in a timely manner
- Slack Bot provides interactive access to OAR commands from Slack
Most of these agents are designed to run as:
- Continuous services (controllers, aggregator, Slack bot) - typically deployed in containers or as systemd services
- Scheduled jobs (detectors, checkers, notificators) - typically deployed as cron jobs or Kubernetes CronJobs
Continuous Services:
- Job Controller (per architecture)
- Test Result Aggregator (per architecture)
- Slack Message Receiver
Scheduled Jobs:
- Release Detector (e.g., every 4-6 hours)
- Jira Notificator (e.g., every 2-4 hours)
- Test Result Checker (e.g., every 30 minutes)
All agents use Python's standard logging module with configurable log levels. Key events to monitor:
- New releases detected
- New builds found
- Test jobs triggered
- Test results aggregated
- Builds accepted/rejected
- Notifications sent
- Errors and exceptions
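A minimal sketch of a configurable log level with Python's standard logging module is shown below. The `configure_logging` helper, the logger name `oar.example`, and the message format are assumptions for illustration, not the project's actual setup.

```python
import logging


def configure_logging(verbose=False):
    """Configure the root logger; verbose=True lowers the level to DEBUG."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        # Timestamped records make it easier to correlate agent events.
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    # Hypothetical logger name used only for this example.
    return logging.getLogger("oar.example")
```

With this in place, `configure_logging(verbose=True).debug("New build found")` would emit a timestamped DEBUG record, while the default level would suppress it.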
Most agents rely on:
- Environment variables for credentials and API tokens
- GitHub repository for persistent state (build tracking, test results)
- Job registry files in the repository to define which tests to run
- OAR configuration for release-specific settings
See individual agent sections above for specific environment variables required.
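Because missing credentials tend to surface only as authentication errors later on, a small preflight check can help. The sketch below is illustrative: the variable names are taken from this document, but the grouping in `EXAMPLE_VARS` and the `missing_env_vars` helper are hypothetical, and each agent needs only its own subset.

```python
import os

# Variable names mentioned in this document; which agent needs which
# subset is described in the individual agent sections.
EXAMPLE_VARS = ["JIRA_TOKEN", "SLACK_BOT_TOKEN", "GITLAB_TOKEN", "OAR_JWK"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(EXAMPLE_VARS)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Passing `environ` explicitly keeps the helper easy to test without touching the real environment.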
Problem: JIRA_TOKEN authentication fails
- Solution: Regenerate your Jira personal access token at https://issues.redhat.com
- Verify: Test with
curl -H "Authorization: Bearer $JIRA_TOKEN" https://issues.redhat.com/rest/api/2/myself
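The same check can be scripted in Python with only the standard library. This is a sketch that mirrors the curl command above; the helper names `jira_myself_request` and `token_is_valid` are hypothetical, and treating 401/403 as "invalid token" is an assumption about how the server reports a rejected token.

```python
import urllib.error
import urllib.request


def jira_myself_request(token, base_url="https://issues.redhat.com"):
    """Build a request for the 'myself' endpoint with a Bearer token."""
    return urllib.request.Request(
        f"{base_url}/rest/api/2/myself",
        headers={"Authorization": f"Bearer {token}"},
    )


def token_is_valid(token, timeout=10):
    """Return True on HTTP 200, False if the server rejects the token."""
    try:
        with urllib.request.urlopen(jira_myself_request(token), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise  # other HTTP errors are not about the token itself
```

For example, `token_is_valid(os.environ["JIRA_TOKEN"])` (after `import os`) would perform the network call.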
Problem: Kerberos ticket expired for Errata Tool access
- Solution: Renew your Kerberos ticket: kinit $kid@$domain
- Verify: Check ticket status with klist
Problem: GitHub token permissions insufficient
- Solution: Ensure token has repo scope for private repositories
- Verify: Test with gh auth status
Release Detector
- Problem: Cannot detect new releases
- Check: Verify GitHub repository access and ART's version file location
- Debug: Run with -v flag for verbose logging: oarctl start-release-detector -r 4.19 -v
Job Controller
- Problem: Prow jobs not triggering
- Check: Verify APITOKEN for Gangway authentication
- Check: Ensure job registry files are properly configured in the repository
- Debug: Check GitHub tracking files in _releases directory
Test Result Aggregator
- Problem: Results not being aggregated
- Check: Verify GCS_CRED_FILE path and credentials
- Check: Ensure result files exist in GitHub repository
- Debug: Check for errors in aggregator logs regarding GCS access
Jira Notificator
- Problem: Notifications not sent to Jira
- Check: Verify Jira token has permission to comment on issues
- Check: Confirm LDAP connectivity for manager lookups
- Debug: Use --dry-run flag to test without sending actual comments
Slack Bot
- Problem: Bot not responding to commands
- Check: Verify both SLACK_APP_TOKEN and SLACK_BOT_TOKEN are set
- Check: Ensure bot is invited to the channel
- Debug: Check WebSocket connection logs for Socket Mode
Problem: ConfigStore cannot decrypt configuration
- Solution: Verify OAR_JWK environment variable is set correctly
- Location: JWK is stored in Bitwarden: openshift-qe-trt-env-vars
Problem: Google Sheets API access fails
- Solution: Verify Google Service Account credentials are properly configured
- Check: Ensure service account has edit permissions on the spreadsheet
Problem: Advisory operations fail
- Solution: Ensure you have a valid Kerberos ticket: kinit $kid@$domain
- Check: Test Errata Tool access: visit https://errata.devel.redhat.com
Problem: CVE tracker bug check fails
- Solution: Verify elliott command is available in PATH
- Check: Ensure ocp-build-data repository is accessible
Problem: Approval operator background process not running
- Check: Look for lock file in /tmp/oar_scheduler_*.lock
- Debug: Check logs in /tmp/oar_logs/metadata_checker_*.log
- Solution: Remove stale lock file if process is not actually running
Problem: Metadata URL accessibility check times out
- Expected: Default timeout is 2 days
- Solution: Check if advisory metadata URL is actually accessible for the release
- Debug: Manually test URL: curl -I https://access.redhat.com/errata/RHBA-2025:17662
- Expected Response: HTTP 200 OK indicates the advisory page is accessible
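The accessibility check can be pictured as a polling loop with an overall deadline. The sketch below is not the project's implementation: `url_is_accessible`, `wait_until_accessible`, and the polling interval are assumptions, and only the 2-day overall timeout comes from this document.

```python
import time
import urllib.error
import urllib.request


def url_is_accessible(url, timeout=10):
    """Send a HEAD request (like curl -I) and report whether it returns 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_accessible(url, max_wait_seconds, poll_seconds,
                          check=url_is_accessible,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until check(url) succeeds or the deadline would be exceeded."""
    deadline = clock() + max_wait_seconds
    while True:
        if check(url):
            return True
        # Stop if another full poll interval would overrun the deadline.
        if clock() + poll_seconds > deadline:
            return False
        sleep(poll_seconds)
```

A call such as `wait_until_accessible(url, max_wait_seconds=2 * 24 * 3600, poll_seconds=1800)` would match the 2-day default with a hypothetical 30-minute interval. The `check`, `sleep`, and `clock` parameters exist so the loop can be exercised without network access or real waiting.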
- Enable verbose logging: Most commands support -v or --debug flag
- Check environment variables: env | grep -E '(JIRA|SLACK|GITHUB|OAR)'
- Review recent logs: Agents use Python logging module with timestamps
- Test individual components: Use dry-run modes where available
- Verify external service status: Check status pages for Jira, GitHub, Slack
- Check network connectivity: Ensure access to internal Red Hat services
This project supports the following OpenShift versions for z-stream releases:
Currently Supported Versions:
- OpenShift 4.12.z
- OpenShift 4.13.z
- OpenShift 4.14.z
- OpenShift 4.15.z
- OpenShift 4.16.z
- OpenShift 4.17.z
- OpenShift 4.18.z
- OpenShift 4.19.z
- OpenShift 4.20.z
Workflow Support:
- Errata Flow: Traditional advisory-based workflow (all supported versions)
- Konflux Flow: GitLab MR-based shipment workflow (newer versions)
Note: Version support is configured in:
- Jira Notificator query filters (oar/notificator/jira_notificator.py)
- Job registry files for test definitions
- ConfigStore release validation
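To make the idea of release validation concrete, here is a hedged sketch that checks a release string against the supported minor versions listed above. It does not reproduce ConfigStore's actual logic, which is not shown in this document; `SUPPORTED_MINORS`, `minor_of`, and `is_supported` are hypothetical names.

```python
# Minor versions from the "Currently Supported Versions" list (4.12 to 4.20).
SUPPORTED_MINORS = [f"4.{minor}" for minor in range(12, 21)]


def minor_of(release):
    """Extract 'X.Y' from a release string such as '4.19.3' or '4.19'."""
    parts = release.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts[:2]):
        raise ValueError(f"not a valid release string: {release!r}")
    return f"{parts[0]}.{parts[1]}"


def is_supported(release):
    """Return True if the release belongs to a supported minor version."""
    return minor_of(release) in SUPPORTED_MINORS
```

Keeping the list in one place means that adding a new minor version is a single-line change in a check like this.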
When adding support for new OpenShift versions, update:
- Jira query filters in the notificator
- Job registry configurations
- Test report templates
- Add new ci-profile for stage-testing pipeline
- Add new release version to parameter VERSION of image-consistency-check job
- Update configstore config to add new test template doc ID and slack group alias for release leads