Implements comprehensive smoke testing infrastructure that deploys StackRox Central in a Kind cluster and runs integration tests against real APIs.

## Key Components

- Smoke test suite (`smoke/smoke_test.go`):
  - Tests cluster listing, CVE queries, and deployment detection
  - Runs against a real StackRox deployment in CI
  - Uses the `smoke` build tag to separate it from unit tests
- Authentication helpers (`smoke/token_helper.go`):
  - API token generation using HTTP basic auth
  - Health check polling with exponential backoff
  - Proper context handling and HTTP method constants
- GitHub Actions workflow (`.github/workflows/smoke.yml`):
  - Deploys StackRox Central and a vulnerable workload in Kind
  - Optimized for CI resources (reduced replicas, disabled features)
  - Waits for cluster health before running tests
  - Uploads test results and coverage to Codecov
- Test utilities (`internal/testutil/test_helpers.go`):
  - Moved from integration_helpers.go for better organization
  - Port allocation for tests
  - Server readiness polling

## CI Optimizations

- Kind cluster configured to maximize available CPU
- Minimal StackRox deployment (no admission controller, no collector)
- Resource constraints removed from sensor and scanner pods
- Scanner image scanning skipped in CI to save resources
- Port-forwarding to Central for API access

## Testing

- Smoke tests run in a dedicated workflow
- Excluded from the standard test target (uses build tags)
- Test artifacts and logs collected on failure
- Integration with Codecov for coverage tracking

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unnecessary CVE vulnerability testing and scanner deployment to simplify smoke tests and reduce CI resource usage.

Changes:
- Remove CVE test cases (get_deployments_for_cve tests)
- Remove the waitForImageScan function
- Delete the vulnerable workload deployment (nginx:1.14)
- Disable scanner deployment (SCANNER_REPLICAS: 0)
- Replace the bash cluster health check with Go code using testify's Eventually
- Add an IsClusterHealthy function for cleaner cluster status checking
- Remove scanner resource constraints and log collection
- Keep only the list_clusters test for basic connectivity verification

Benefits:
- Faster test execution (no image scanning wait)
- Lower CI resource usage (no scanner pod)
- Simpler test code (1 test case instead of 3)
- Better error handling with testify's Eventually
- Less bash scripting, more Go code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
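An `IsClusterHealthy`-style check could be sketched as below. The JSON shape is a simplified assumption modelled on a clusters-list response, not the exact StackRox API schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cluster is a simplified, assumed shape for one entry in a
// clusters-list response; only the health field is inspected.
type cluster struct {
	Name         string `json:"name"`
	HealthStatus struct {
		OverallHealthStatus string `json:"overallHealthStatus"`
	} `json:"healthStatus"`
}

// isClusterHealthy reports whether at least one cluster is registered
// and every registered cluster reports HEALTHY.
func isClusterHealthy(body []byte) (bool, error) {
	var resp struct {
		Clusters []cluster `json:"clusters"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	if len(resp.Clusters) == 0 {
		return false, nil // no cluster registered yet
	}
	for _, c := range resp.Clusters {
		if c.HealthStatus.OverallHealthStatus != "HEALTHY" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	body := []byte(`{"clusters":[{"name":"kind","healthStatus":{"overallHealthStatus":"HEALTHY"}}]}`)
	ok, err := isClusterHealthy(body)
	fmt.Println(ok, err)
}
```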
The custom kind config was setting CPU reservations to 0 to maximize allocatable resources, but since we remove sensor resource constraints in a later step anyway, the custom config is unnecessary. Using default kind cluster configuration simplifies the workflow. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace git clone with actions/checkout@v4 for better GitHub Actions integration. Remove manual sensor resource constraint removal since simplified deployment (no scanner, no vulnerable workload) should allow sensor to schedule without intervention. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The smoke tests already use build tags (//go:build smoke) to separate them from regular unit tests, so the testing.Short() check is redundant and unnecessary. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace manual if checks with require.NotEmpty from testify for cleaner and more idiomatic test code. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace manual error checks with require.NoError from testify for consistent error handling throughout the test. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add //go:build smoke tag to token_helper.go - Remove parameterized test structure since there's only one test case - Directly call list_clusters test without map iteration Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Rename token_helper.go to central_helper_test.go - Change helper functions to accept testing.T and use require.NoError - Replace custom backoff loop in WaitForCentralReady with assert.Eventually - Add t.Helper() to all helper functions - Simplify error handling by using require instead of returning errors Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The failed smoke tests showed that assert.Eventually doesn't stop the test when the condition times out; it just logs an error and continues. This caused confusing secondary failures when the cluster health check timed out but the test continued to run list_clusters.

Changes:
- Use require.Eventually for the cluster health check (fail fast on timeout)
- Use require.Eventually for the Central ready check (fail fast on timeout)
- Use require.NotEmpty for cluster list validation
- Remove the unused assert import from both test files

This ensures the test fails immediately with a clear message when prerequisites aren't met, rather than continuing and generating misleading secondary failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Root cause: Sensor pods were failing to schedule due to insufficient CPU on Kind cluster nodes. The deployment had resource requests that exceeded available capacity.

Error: "0/1 nodes are available: 1 Insufficient cpu"

Fix:
- Add a step to remove resource requests from the sensor deployment
- This triggers a rollout with pods that can be scheduled
- Wait for sensor pods to become ready after constraint removal
- Remove the fallback message from the sensor wait (fail fast if not ready)

This restores functionality that was accidentally removed when we eliminated the custom kind config.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous approach tried to remove the requests field, but the pod
still failed with insufficient CPU even after patching. This suggests
either the limits field was still present, or the remove operation
didn't work as expected with the Helm-deployed sensor.
New approach:
- Use JSON patch "replace" operation instead of "remove"
- Set entire resources field to empty object {}
- This removes both limits and requests in one operation
- Use kubectl rollout status to wait for deployment to complete
instead of kubectl wait on pods (which fails when pods don't exist)
The replace operation is more reliable than remove when the structure
might vary (Helm vs direct deployment).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
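The replace-style patch described above might look like the following workflow step. The namespace, deployment name, container index, and timeout are assumptions based on a typical StackRox install, not the exact contents of `smoke.yml`:

```yaml
- name: Remove sensor resource constraints
  run: |
    # "replace" the whole resources field with {} so both
    # requests and limits are cleared in one operation.
    kubectl -n stackrox patch deployment sensor --type=json \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources","value":{}}]'
    # Wait on the rollout rather than on pods that may not exist yet.
    kubectl -n stackrox rollout status deployment sensor --timeout=300s
```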
Previous attempts failed even after patching because:
1. Only patched container[0], but there may be multiple containers
2. Need to verify the patch actually removed resources
3. May have a LimitRange enforcing minimum resources

Changes:
- Add debugging to show containers and resources before/after the patch
- Check for a LimitRange in the stackrox namespace
- Dynamically detect the number of containers and patch all of them
- Show resources before and after patching to verify it worked

This will help diagnose why pods still fail with "Insufficient cpu" even after the resources field is set to an empty object.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous debugging revealed that the patch was successful (resources
were set to {}), but pods still couldn't schedule. The issue is that we
need to force pod recreation after removing resources.
Restore the approach that was working before commit b66e3f4:
1. Use `kubectl set resources` with cpu=0,memory=0 (cleaner than patch)
2. Delete sensor pods to force immediate recreation
3. Wait for new pods with empty resources to be created and ready
This approach worked in earlier successful runs. The key insight is that
just patching the deployment and waiting for rollout isn't enough - we
need to delete and recreate the pods to ensure they schedule with the
updated (empty) resource spec.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
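The restored set-resources-then-recreate sequence might be sketched as a workflow step like this; the namespace, deployment name, label selector, and timeout are assumptions, not the actual `smoke.yml` contents:

```yaml
- name: Clear sensor resources and force pod recreation
  run: |
    # Zero out requests and limits, then delete the old pods so the
    # replacements are scheduled with the empty resource spec.
    kubectl -n stackrox set resources deployment sensor \
      --requests=cpu=0,memory=0 --limits=cpu=0,memory=0
    kubectl -n stackrox delete pod -l app=sensor
    kubectl -n stackrox wait pod -l app=sensor \
      --for=condition=Ready --timeout=300s
```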
The sensor resource removal now works correctly - the sensor pod became ready! But the port-forward step failed because it tried to redirect output to logs/port-forward.log before the logs directory existed.

The "Collect logs" step creates the logs directory, but that happens later. The port-forward step runs earlier and needs the directory to exist.

Simple fix: create the logs directory at the start of the port-forward step.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
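The fixed port-forward step could look like the fragment below; the service name and ports are assumptions based on a standard Central install:

```yaml
- name: Port-forward to Central
  run: |
    mkdir -p logs  # create the log directory before redirecting into it
    kubectl -n stackrox port-forward svc/central 8443:443 \
      > logs/port-forward.log 2>&1 &
```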
Description
Add smoke tests that run against a real StackRox Central deployment.

- Tests verify end-to-end functionality.
- Tests read ROX_ENDPOINT and ROX_PASSWORD from environment variables.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Validation