
Add smoke test infrastructure #91

Open
janisz wants to merge 16 commits into main from smoke_test

Conversation

Contributor

@janisz janisz commented Mar 24, 2026

Description

Add smoke tests that run against a real StackRox Central deployment.
The tests verify end-to-end functionality.

Tests read ROX_ENDPOINT and ROX_PASSWORD from environment variables.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Validation


codecov-commenter commented Mar 24, 2026

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|-----------------|--------|--------|---------|
| 361             | 2      | 359    | 12      |
View the top 2 failed test(s) by shortest run time
::policy 1
Stack Traces | 0s run time
- test violation 1
- test violation 2
- test violation 3
::policy 4
Stack Traces | 0s run time
- testing multiple alert violation messages 1
- testing multiple alert violation messages 2
- testing multiple alert violation messages 3

To view more test analytics, go to the Test Analytics Dashboard


github-actions bot commented Mar 24, 2026

E2E Test Results

Commit: 42cd1b6
Workflow Run: View Details
Artifacts: Download test results & logs

=== Evaluation Summary ===

  ✓ list-clusters (assertions: 3/3)
  ✓ cve-detected-workloads (assertions: 3/3)
  ✓ cve-detected-clusters (assertions: 3/3)
  ~ cve-nonexistent (assertions: 2/3)
      - MaxToolCalls: Too many tool calls: expected <= 5, got 7
  ✗ cve-cluster-does-exist (assertions: 3/3)
      one or more verification steps failed
  ~ cve-cluster-does-not-exist (assertions: 2/3)
      - ToolsUsed: Required tool not called: server=stackrox-mcp, tool=, pattern=list_clusters
  ✓ cve-clusters-general (assertions: 3/3)
  ✓ cve-cluster-list (assertions: 3/3)
  ✓ cve-log4shell (assertions: 3/3)
  ✓ cve-multiple (assertions: 3/3)
  ~ rhsa-not-supported (assertions: 1/2)
      - MaxToolCalls: Too many tool calls: expected <= 4, got 13

Tasks:      10/11 passed (90.91%)
Assertions: 29/32 passed (90.62%)
Tokens:     ~67574 (estimate - excludes system prompt & cache)
MCP schemas: ~12738 (included in token total)
Agent used tokens:
  Input:  15361 tokens
  Output: 27509 tokens
Judge used tokens:
  Input:  50242 tokens
  Output: 43938 tokens

Implements comprehensive smoke testing infrastructure that deploys StackRox
Central in a Kind cluster and runs integration tests against real APIs.

## Key Components

- Smoke test suite (`smoke/smoke_test.go`):
  - Tests cluster listing, CVE queries, and deployment detection
  - Runs against real StackRox deployment in CI
  - Uses build tag `smoke` to separate from unit tests

- Authentication helpers (`smoke/token_helper.go`):
  - API token generation using HTTP basic auth
  - Health check polling with exponential backoff
  - Proper context handling and HTTP method constants

- GitHub Actions workflow (`.github/workflows/smoke.yml`):
  - Deploys StackRox Central and vulnerable workload in Kind
  - Optimized for CI resources (reduced replicas, disabled features)
  - Waits for cluster health before running tests
  - Uploads test results and coverage to Codecov

- Test utilities (`internal/testutil/test_helpers.go`):
  - Moved from integration_helpers.go for better organization
  - Port allocation for tests
  - Server readiness polling

## CI Optimizations

- Kind cluster configured to maximize available CPU
- Minimal StackRox deployment (no admission controller, no collector)
- Resource constraints removed from sensor and scanner pods
- Scanner image scanning skipped in CI to save resources
- Port-forwarding to Central for API access

## Testing

- Smoke tests run in dedicated workflow
- Excluded from standard test target (uses build tags)
- Test artifacts and logs collected on failure
- Integration with Codecov for coverage tracking

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz and others added 14 commits March 26, 2026 18:43
Remove unnecessary CVE vulnerability testing and scanner deployment to
simplify smoke tests and reduce CI resource usage.

Changes:
- Remove CVE test cases (get_deployments_for_cve tests)
- Remove waitForImageScan function
- Delete vulnerable workload deployment (nginx:1.14)
- Disable scanner deployment (SCANNER_REPLICAS: 0)
- Replace bash cluster health check with Go code using testify Eventually
- Add IsClusterHealthy function for cleaner cluster status checking
- Remove scanner resource constraints and log collection
- Keep only list_clusters test for basic connectivity verification

Benefits:
- Faster test execution (no image scanning wait)
- Lower CI resource usage (no scanner pod)
- Simpler test code (1 test case instead of 3)
- Better error handling with testify Eventually
- Less bash scripting, more Go code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The custom kind config was setting CPU reservations to 0 to maximize
allocatable resources, but since we remove sensor resource constraints
in a later step anyway, the custom config is unnecessary. Using default
kind cluster configuration simplifies the workflow.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace git clone with actions/checkout@v4 for better GitHub Actions
integration. Remove manual sensor resource constraint removal since
simplified deployment (no scanner, no vulnerable workload) should
allow sensor to schedule without intervention.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The smoke tests already use build tags (//go:build smoke) to separate
them from regular unit tests, so the testing.Short() check is redundant
and unnecessary.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace manual if checks with require.NotEmpty from testify for cleaner
and more idiomatic test code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace manual error checks with require.NoError from testify for
consistent error handling throughout the test.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add //go:build smoke tag to token_helper.go
- Remove parameterized test structure since there's only one test case
- Directly call list_clusters test without map iteration

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Rename token_helper.go to central_helper_test.go
- Change helper functions to accept testing.T and use require.NoError
- Replace custom backoff loop in WaitForCentralReady with assert.Eventually
- Add t.Helper() to all helper functions
- Simplify error handling by using require instead of returning errors

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The failed smoke tests showed that assert.Eventually doesn't stop the
test when the condition times out - it just logs an error and continues.
This caused confusing secondary failures when the cluster health check
timed out but the test continued to run list_clusters.

Changes:
- Use require.Eventually for cluster health check (fail fast on timeout)
- Use require.Eventually for Central ready check (fail fast on timeout)
- Use require.NotEmpty for cluster list validation
- Remove unused assert import from both test files

This ensures the test fails immediately with a clear message when
prerequisites aren't met, rather than continuing and generating
misleading secondary failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Root cause: Sensor pods were failing to schedule due to insufficient CPU
on kind cluster nodes. The deployment had resource requests that exceeded
available capacity.

Error: "0/1 nodes are available: 1 Insufficient cpu"

Fix:
- Add step to remove resource requests from sensor deployment
- This triggers a rollout with pods that can be scheduled
- Wait for sensor pods to become ready after constraint removal
- Remove fallback message from sensor wait (fail fast if not ready)

This restores functionality that was accidentally removed when we
eliminated the custom kind config.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous approach tried to remove the requests field, but the pod
still failed with insufficient CPU even after patching. This suggests
either the limits field was still present, or the remove operation
didn't work as expected with the Helm-deployed sensor.

New approach:
- Use JSON patch "replace" operation instead of "remove"
- Set entire resources field to empty object {}
- This removes both limits and requests in one operation
- Use kubectl rollout status to wait for deployment to complete
  instead of kubectl wait on pods (which fails when pods don't exist)

The replace operation is more reliable than remove when the structure
might vary (Helm vs direct deployment).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Previous attempts failed even after patching because:
1. Only patched container[0], but there may be multiple containers
2. Need to verify patch actually removed resources
3. May have LimitRange enforcing minimum resources

Changes:
- Add debugging to show containers and resources before/after patch
- Check for LimitRange in stackrox namespace
- Dynamically detect number of containers and patch all of them
- Show resources before and after patching to verify it worked

This will help diagnose why pods still fail with "Insufficient cpu"
even after resources field is set to empty object.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous debugging revealed that the patch was successful (resources
were set to {}), but pods still couldn't schedule. The issue is that we
need to force pod recreation after removing resources.

Restore the approach that was working before commit b66e3f4:
1. Use `kubectl set resources` with cpu=0,memory=0 (cleaner than patch)
2. Delete sensor pods to force immediate recreation
3. Wait for new pods with empty resources to be created and ready

This approach worked in earlier successful runs. The key insight is that
just patching the deployment and waiting for rollout isn't enough - we
need to delete and recreate the pods to ensure they schedule with the
updated (empty) resource spec.
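The three steps above could look roughly like the following workflow fragment. This is a hedged sketch: the step name, the `deployment/sensor` target, and the `app=sensor` label are assumptions, not copied from the actual `.github/workflows/smoke.yml`.

```yaml
- name: Remove sensor resource constraints
  run: |
    # 1. Clear CPU/memory requests and limits on the sensor deployment.
    kubectl -n stackrox set resources deployment/sensor \
      --requests=cpu=0,memory=0 --limits=cpu=0,memory=0
    # 2. Delete existing pods so replacements schedule with the empty spec.
    kubectl -n stackrox delete pods -l app=sensor
    # 3. Wait for the recreated pods to become ready.
    kubectl -n stackrox wait --for=condition=ready pod -l app=sensor --timeout=300s
```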

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The sensor resource removal now works correctly - sensor pod became ready!
But the port-forward step failed because it tried to redirect output to
logs/port-forward.log before the logs directory existed.

The "Collect logs" step creates the logs directory, but that happens later.
The port-forward step runs earlier and needs the directory to exist.

Simple fix: Create logs directory at the start of the port-forward step.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@janisz janisz requested a review from mtodor March 27, 2026 13:04
@janisz janisz marked this pull request as ready for review March 27, 2026 13:05
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
