
[ci] Increase test timeout to work around presumed system contention #15297

Open
cjllanwarne wants to merge 1 commit into hail-is:main from cjllanwarne:cjl_raise_timeout_for_python_test

@cjllanwarne
Collaborator

@cjllanwarne cjllanwarne commented Feb 23, 2026

Change Description

Addresses the failure of test_hail_python_service_backend_gcp_11 in https://batch.hail.is/batches/8362516/jobs/172.

This test appears to be unusually susceptible to cluster contention because (1) it is relatively heavyweight and (2) it has a relatively short custom timeout: all tests have a global 10m timeout, but this one overrides it with 4m. In the batch above, I suspect that a previous, preempted attempt left just enough work still running that the retry was slightly too slow to finish within 4m.

(It might be nicer to cancel previous attempts' jobs at the start of preemption retries, but that is a much bigger change. Hopefully this will reduce flakiness in the test suite with minimal upfront effort.)
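To make the failure mode concrete, here is a minimal sketch of the reasoning above, assuming only what the description states (a 10m global timeout and a 4m per-test override). The helper names `effective_timeout` and `finishes_in_time`, and the 200s baseline runtime used below, are hypothetical illustrations rather than Hail's actual test configuration or measured runtimes.

```python
from typing import Optional

# Values from the description above: a 10-minute global timeout that this
# particular test overrides with a 4-minute one.
GLOBAL_TIMEOUT_S = 10 * 60
TEST_OVERRIDE_S = 4 * 60


def effective_timeout(override_s: Optional[int], global_s: int) -> int:
    """Hypothetical helper: a per-test override, when set, replaces the global default."""
    return override_s if override_s is not None else global_s


def finishes_in_time(baseline_runtime_s: float, slowdown: float, timeout_s: int) -> bool:
    """Hypothetical helper: True if a run slowed by `slowdown` (e.g. 1.25 means
    25% slower because of leftover work on the cluster) still completes
    before the timeout fires."""
    return baseline_runtime_s * slowdown <= timeout_s
```

With a hypothetical 200s baseline, `finishes_in_time(200, 1.25, TEST_OVERRIDE_S)` is `False` (250s exceeds 240s), while the same run under the 600s global timeout passes. A modest contention-induced slowdown is therefore enough to push a test with little headroom past a short custom timeout.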

Security Assessment

  • This change potentially impacts the Hail Batch instance as deployed by Broad Institute in GCP

Impact Rating

  • This change has a low security impact

Impact Description

Just a test-timeout change, intended to reduce flakiness.

Appsec Review

  • Required: The impact has been assessed and approved by appsec

@cjllanwarne cjllanwarne requested review from a team and grohli March 9, 2026 20:16